Examining Smokeloader's Anti Hooking technique
Hooking is a technique for intercepting function calls, messages or events passed between software components (or, in this case, malware). It can be used for malicious as well as defensive purposes.
Rootkits, for example, can hook API calls to hide themselves from analysis tools, while we as defenders can use hooking to learn more about malware or to build detection mechanisms that protect customers.
Cybersecurity continues to be a game of cat and mouse: while we build protections, blackhats will always try to bypass them. Today I want to show you how SmokeLoader bypasses hooks on ntdll.dll and how Frida can be used to hook library functions.
The bypass has also been explained in a blog article from Checkpoint[1] by Israel Gubi. That article covers a lot more of Smokeloader than I do here, so it is definitely worth reading too.
Hooking with Frida
If you've read my previous blog articles about QBot, you are familiar with its process iteration and AV detection[3]. QBot iterates over the running processes and compares each process name with the entries of a blacklist containing process names of common AV products. If a process name matches an entry, QBot quits execution.
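A minimal Python sketch of that check, assuming a case-insensitive comparison like the one lstrcmpiA performs. The blacklist entries are hypothetical placeholders, not QBot's real list.

```python
# Hypothetical blacklist entries, for illustration only
AV_BLACKLIST = ["avp.exe", "mcshield.exe", "msmpeng.exe"]


def find_av_process(process_names, blacklist=AV_BLACKLIST):
    """Return the first running process name that matches a blacklist entry, else None.

    str.casefold() stands in for lstrcmpiA's case-insensitive comparison.
    """
    lowered = {entry.casefold() for entry in blacklist}
    for name in process_names:
        if name.casefold() in lowered:
            return name
    return None


running = ["explorer.exe", "MsMpEng.exe", "notepad.exe"]
if find_av_process(running) is not None:
    print("AV product found, the sample would quit here")
```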
Frida is a dynamic instrumentation toolkit for writing dynamic analysis scripts in high-level languages, in this case JavaScript. If you want to know more about it, I advise you to visit its website[4] and read the documentation.
We can write a small Frida script that hooks the lstrcmpiA function to find out which process names are on the blacklist.
```python
import sys

import frida


def main():
    """Main."""
    # argv[1] is our malware sample
    pid = frida.spawn(sys.argv[1])
    sess = frida.attach(pid)
    script = sess.create_script("""
    console.log("[+] Starting Frida script");
    // Static address of lstrcmpiA (only valid with ASLR off / a fixed snapshot)
    var lstrcmpiA = ptr("0x76B43E8E");
    console.log("[+] Hooking lstrcmpiA at " + lstrcmpiA);

    Interceptor.attach(lstrcmpiA, {
        onEnter: function (args) {
            console.log("[+][+] Called lstrcmpiA");
            console.log("[+][+] Arg1Addr = " + args[0]);
            console.log("[+][+] Buffer");
            pretty_print(args[0], 0x30);
            console.log("[+][+] Arg2Addr = " + args[1]);
            console.log("[+][+] Buffer");
            pretty_print(args[1], 0x30);
        },
        onLeave: function (retval) {
            console.log("[+][+] Returned from lstrcmpiA");
        }
    });

    // Read sz bytes starting at addr into a byte array and print them as a hexdump
    function pretty_print(addr, sz) {
        var bytearr = ptr(addr).readByteArray(sz);
        console.log(hexdump(bytearr));
    }
    """)
    script.load()
    frida.resume(pid)
    sys.stdin.read()  # keep the session alive until stdin is closed
    sess.detach()


if __name__ == "__main__":
    main()
```
We spawn the sample, attach to it and hook the lstrcmpiA function at a static address. When analysing malware, we (most of the time) have the privilege of controlling and adjusting our environment as much as we want. If you turn off ASLR and use snapshots, using Frida with static pointers is pretty convenient, because most functions will always be at the same address. However, it's also possible to calculate the addresses dynamically. lstrcmpiA takes 2 arguments, which are both pointers of type LPCSTR. So we just take each pointer, read 0x30 bytes starting at that address into a byte array and print it.
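The dumped 0x30 bytes usually contain the NUL-terminated string followed by whatever happens to be in memory behind it. A small hypothetical Python helper (not part of the original script) to turn such a dump back into the compared string:

```python
def cstring_from_dump(dump: bytes, encoding: str = "latin-1") -> str:
    """Return the bytes up to the first NUL, decoded as an ANSI (LPCSTR) string.

    latin-1 is an assumption; lstrcmpiA interprets the bytes in the current ANSI code page.
    """
    end = dump.find(b"\x00")
    if end == -1:
        end = len(dump)  # no terminator inside the dumped window
    return dump[:end].decode(encoding)


# 0x30 bytes as they might look in memory: the string, a terminator, then leftovers
dump = b"avp.exe\x00" + b"\xcc" * (0x30 - 8)
print(cstring_from_dump(dump))  # avp.exe
```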

Smokeloaderβs Anti Hooking technique
So how does Smokeloader bypass hooks? Well, it can do so at least for the ntdll.dll library. During execution, Smokeloader retrieves the Temp folder path and generates a random name. If a file with the generated name already exists in the Temp folder, it is deleted with DeleteFileW.

Next, the original ntdll.dll file is copied from system32 to the Temp folder under the name that was just generated, so a copy of the library now sits in the Temp directory.
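To make the sequence concrete, here is a benign Python re-creation of the staging steps described above: build a random file name in the temp directory, delete any existing file with that name, then copy the library there. The name length and character set are assumptions; the post does not describe how Smokeloader generates the name.

```python
import os
import random
import shutil
import string
import tempfile


def stage_private_copy(src_path, temp_dir=None, name_len=8):
    """Copy src_path into temp_dir under a random name and return the new path.

    Mirrors the order of operations described above: random name,
    delete a stale file with that name (DeleteFileW), then copy.
    """
    temp_dir = temp_dir or tempfile.gettempdir()
    # Hypothetical naming scheme: lowercase letters only
    name = "".join(random.choice(string.ascii_lowercase) for _ in range(name_len))
    dst = os.path.join(temp_dir, name)
    if os.path.exists(dst):
        os.remove(dst)  # equivalent of the DeleteFileW call
    shutil.copyfile(src_path, dst)
    return dst
```

Because the copy is a separate file mapped at a different address, hooks placed in the already loaded system ntdll.dll never see calls that go through it.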


Instead of using the real ntdll.dll, which is already mapped into every process, Smokeloader loads the copy into memory by calling LdrLoadDll.

Most AV vendors, as well as analysts, probably place their hooks on the system's ntdll.dll, so calls into the copied ntdll.dll file will be missed.
Smokeloader continues to call functions from this copied DLL, for example NtQueryInformationProcess to detect whether a debugger is attached to it.
Final Words
While analysing SmokeLoader at work, I stumbled across this anti-hooking mechanism, which I hadn't seen before, so I wanted to share it here :-).
I've also only scratched the surface of what Frida is capable of. I might work on something more complex next time.
Taming Virtual Machine Based Code Protection – 1
Overcoming obfuscation in binaries has always been an interesting topic for me, especially in combination with malware. Over the last weeks I've been playing around with virtualised code protection to see how well I could handle it.
I decided to download a simple crack-me challenge that is obfuscated with this technique. Reversing everything takes some time, so there will be at least 2 blog articles about my little project.

Virtualised Code Protection
Each architecture has a defined instruction set. By looking up which instruction corresponds to which bytes, we can translate these bytes into disassembly. The unit that actually executes these bytes is the CPU.
Virtual machine based code protection emulates a processor and thereby replaces our usual instruction set with a custom one. So in order to really understand what a binary hardened with a virtual machine is doing at a low level, we need to reverse the virtual machine first. This means we have to understand the custom instruction set.
I want to show you a practical example of what such a custom instruction can look like and how it can be discovered.
Practical Example
Preparing the virtual machine
The challenge asks for a username and a serial key, and both need specific values for the serial key to be valid. After a username and a serial key are entered, the lengths of both are checked first.

At the bottom of this routine, we can already spot two interesting functions, as well as the operations that push the success or failure message onto the stack.

The function InitialiseVM is where it gets interesting for us. If you quickly look through the disassembly in the figure below, you will see that multiple buffers are allocated and static values are written into an internal structure. Furthermore, the structure is filled with function pointers, each of which represents a custom instruction. This routine allocates the virtual address space our virtual machine uses for emulation, as well as a table from which custom instructions are selected.

Next is the CheckSerial function, which implements the virtual machine loop that emulates the virtual processor unit.

In the block at loc_4015E5, the function sub_4013DF is executed in each iteration. Afterwards, the byte that the address in ESI+0x7C points to is used as an index for the dynamic call at the end of this block (call dword ptr [esi+eax*4+80h]): the table of 4-byte function pointers starts at ESI+0x80, hence the *4. In other words, the byte that selects the function to enter decides which custom instruction is executed. Before we look at how some of the opcodes are actually parsed, let's review what the virtualised address space of this VM looks like.
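The dispatch pattern can be sketched in a few lines of Python. This is a toy model, not the crack-me's actual VM. The 2-byte instruction layout, the ADD and HALT handlers, and every opcode number except PUSH = 3 are invented for illustration. As with `call dword ptr [esi+eax*4+80h]`, the opcode byte is used directly as an index into a table of handler functions.

```python
def vm_push(vm, operand):
    vm["stack"].append(operand)


def vm_add(vm, _operand):
    b, a = vm["stack"].pop(), vm["stack"].pop()
    vm["stack"].append((a + b) & 0xFFFFFFFF)


def vm_halt(vm, _operand):
    vm["running"] = False


# Handler table indexed by opcode byte (hypothetical numbering, except PUSH = 3)
HANDLERS = [vm_halt, None, None, vm_push, None, None, None, vm_add]


def run(bytecode):
    vm = {"vip": 0, "stack": [], "running": True}
    while vm["running"]:
        # "parse" step: fixed 2-byte instructions (opcode, operand) in this toy
        opcode, operand = bytecode[vm["vip"]], bytecode[vm["vip"] + 1]
        vm["vip"] += 2
        HANDLERS[opcode](vm, operand)  # dynamic call selected by the opcode byte
    return vm["stack"]


print(run(bytes([3, 2, 3, 5, 7, 0, 0, 0])))  # [7]
```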

Executing custom instructions
The function sub_4013DF is called in each iteration and reads bytes from the buffer that contains the opcodes of the custom instructions. The first instruction is 5 bytes long, and each of its bytes is used by the virtual machine to translate the opcode into a valid operation. At the time of writing, I had not fully explored this function yet. However, I am confident that the last 2 bytes of an instruction are used to specify registers.

Upon returning from this function, the program takes the first byte of the ESI+0x7C structure and uses it to determine which function from the previously allocated function table is called. The first iteration yields EAX=3, so we are dealing with the custom instruction with ID 3.
Letβs jump into our first custom instruction.

The function sub_401271 has 31 XRefs and is used by every function in the function table. Before it is called, the pointer to ESI+0x7C (our 0x24-byte buffer holding the custom opcodes) is retrieved and 0xC is added, so we point at ESI+0x7C+0xC, the 4th DWORD in this buffer.
The routine accesses the third byte of the current opcode and is responsible for determining the instruction type. The first four bits decide whether it is an instruction using 2 registers, a memory read, or a move of an immediate value into a register. The second four bits determine the size of the data that will be moved around; they are zero-extended when loaded.
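Splitting the type byte is plain bit masking. A sketch, assuming "first four bits" means the high nibble. Which nibble values correspond to which instruction kind is not known at this point, so the function only returns the raw fields.

```python
def split_type_byte(type_byte):
    """Split the instruction-type byte into (kind, size) nibbles.

    Assumption: the high nibble selects the kind (register/register,
    memory read, immediate) and the low nibble the data size.
    """
    kind = (type_byte >> 4) & 0xF
    size = type_byte & 0xF  # zero-extended when loaded into a wider register
    return kind, size


print(split_type_byte(0x24))  # (2, 4)
```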

Take a look at the figure below. The result of our InstrType function is saved in ebp+0x4. Next, the pointer stored at ESI+0x20 is decremented and the value we just computed is written to the address it points to. Doesn't this look familiar? On x86, the stack pointer is decremented in the same way when data is pushed onto the stack.

It seems that the custom instruction we just investigated is a custom PUSH instruction, and that ESI+0x20 points to the virtual stack emulated by this virtual machine. Since the pointer at ESI+0x4C is incremented after an instruction, it might hold the virtual instruction pointer.
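The behaviour observed so far, expressed as a Python sketch. The memory size, the little-endian 4-byte write and the 5-byte VIP advance (the size of the first instruction) are assumptions taken from this analysis. This is not a complete model of the VM.

```python
import struct


class VirtualCPU:
    """Minimal model: virtual stack pointer (ESI+0x20) and VIP (ESI+0x4C)."""

    def __init__(self, mem_size=0x100):
        self.mem = bytearray(mem_size)
        self.vsp = mem_size  # the stack grows downwards from the top of memory
        self.vip = 0

    def push(self, value, instr_size=5):
        self.vsp -= 4  # decrement first, like x86 PUSH
        struct.pack_into("<I", self.mem, self.vsp, value & 0xFFFFFFFF)
        self.vip += instr_size  # advance to the next instruction


cpu = VirtualCPU()
cpu.push(0xDEADBEEF)
print(hex(cpu.vsp), hex(struct.unpack_from("<I", cpu.mem, cpu.vsp)[0]), cpu.vip)
# 0xfc 0xdeadbeef 5
```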
So far we have figured out what the first 3 bytes of an instruction do, and we have an idea of what the last 2 are responsible for. To say precisely how they are used, we need to look at more than a single virtual instruction execution.

Conclusion
So it took me a whole blog article just to explain how to reverse a single custom instruction of a binary hardened with virtualised code protection ;-). As you can see, this kind of software protection is very powerful.
I will finish this challenge for sure and will write a second blog article about how I solved it.
UPnP – Messing up Security for years
UPnP is a set of networking protocols that lets network devices discover each other's presence on a network and establish services for various functionalities.
Too lazy to set up port forwarding yourself? Just enable UPnP to automatically establish working configurations with devices! Dynamic device configuration like this certainly makes our lives more comfortable. Sadly, it also comes with many security issues.
In this blog article I focus on the stages of the UPnP protocol, give a quick introduction to security issues around UPnP, and show how QBot abuses UPnP to turn devices into proxy C2 servers.
UPnP in a nutshell
UPnP makes use of common networking protocols and stacks HTTP, SOAP and XML on top of the IP protocol to provide a variety of functionalities for users. Without going too deep into how UPnP works in detail, the following figure is enough for the basics.

Some services a UPnP-enabled node can offer (this really depends on the device):
- Port forwarding
- Switching power on and off for light bulbs
- etc.
This is very high level, of course. If you are interested in everything about UPnP, I recommend checking out Wikipedia[1] for a high-level introduction or reading this report, which goes into more detail[2].
For the rest of this blog article, only the first three stages are really relevant.
IoT Security and UPnP
Misconfiguration
Again, while it might be very convenient for customers to have devices autoconfigure themselves, it leads to huge security risks.
Many routers have UPnP enabled by default. Think of misconfigured IoT devices that send a command to forward a specific port, exposing that port to the internet.
It is well known that many IoT devices contain awful security flaws, like default credentials for telnet. If such a device is misconfigured like this and exposes its telnet port to the outside, it probably takes about 5 minutes until some script kiddie adds it to their botnet.
Exploitation
A blog post from TrendMicro[3] mentioned that many devices still use very old UPnP libraries that do not meet current security standards. This creates a larger attack surface for attackers. The most recent UPnP vulnerability at the time of writing is CallStranger.

It is caused by the CALLBACK header value of the UPnP SUBSCRIBE request. This field can be controlled by an attacker and enables a Server Side Request Forgery (SSRF)-like vulnerability, which can be used for the following malicious purposes:
- Exfiltrate data
- Scan networks
- Force nodes to participate in DDoS attacks
I recommend visiting the official domain[4] of this vulnerability if you want to learn more about it.
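To illustrate where the vulnerable field lives, here is a GENA SUBSCRIBE request built in Python. The CALLBACK header carries the URL(s) the device will later send NOTIFY requests to. CallStranger exploits vulnerable devices that accept arbitrary URLs in this header. The host, event path and callback URL below are placeholders.

```python
def build_subscribe(host, port, event_path, callback_url, timeout_s=1800):
    """Return a UPnP SUBSCRIBE request (GENA event subscription) as bytes."""
    lines = [
        f"SUBSCRIBE {event_path} HTTP/1.1",
        f"HOST: {host}:{port}",
        f"CALLBACK: <{callback_url}>",  # the attacker-controlled value in CallStranger
        "NT: upnp:event",
        f"TIMEOUT: Second-{timeout_s}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


req = build_subscribe("192.168.1.1", 5000, "/evt/WANIPConn1", "http://203.0.113.10/cb")
print(req.decode())
```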
UPnP abused by QBot
Security risks created by UPnP are, of course, not limited to the IoT landscape.
Another way to abuse UPnP is to install proxy C2 servers on devices that have the protocol enabled, as QBot does. Let's take a look at how this is done.
Diving into QBot's UPnP proxy module
This technique was first discovered by McAfee[4] in 2017. QBot first scans for UPnP-enabled devices that match one of the following device or service types:
- urn:schemas-upnp-org:device:InternetGatewayDevice:1
- urn:schemas-upnp-org:service:WANIPConnection:1
- urn:schemas-upnp-org:service:WANPPPConnection:1
- upnp:rootdevice
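Discovery happens over SSDP. An HTTP-formatted M-SEARCH request is sent via UDP to the multicast address 239.255.255.250:1900, and its ST header names the wanted device or service type. Below is a minimal Python sketch for the search targets listed above. The MX value and the timeout are arbitrary choices. Only run it in your own lab network.

```python
import socket

SSDP_ADDR = ("239.255.255.250", 1900)
SEARCH_TARGETS = [
    "urn:schemas-upnp-org:device:InternetGatewayDevice:1",
    "urn:schemas-upnp-org:service:WANIPConnection:1",
    "urn:schemas-upnp-org:service:WANPPPConnection:1",
    "upnp:rootdevice",
]


def build_msearch(search_target, mx=2):
    """Return an SSDP M-SEARCH request for one search target (ST)."""
    return (
        "M-SEARCH * HTTP/1.1\r\n"
        "HOST: 239.255.255.250:1900\r\n"
        'MAN: "ssdp:discover"\r\n'
        f"MX: {mx}\r\n"
        f"ST: {search_target}\r\n"
        "\r\n"
    ).encode("ascii")


def discover(timeout=3.0):
    """Send one M-SEARCH per target and collect (address, response) pairs."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.settimeout(timeout)
    responses = []
    try:
        for st in SEARCH_TARGETS:
            sock.sendto(build_msearch(st), SSDP_ADDR)
        while True:
            data, addr = sock.recvfrom(65507)
            responses.append((addr, data.decode(errors="replace")))
    except socket.timeout:
        pass  # no more answers within the timeout
    finally:
        sock.close()
    return responses
```

Each answer carries a LOCATION header with the URL of the device description, which is the input for the next step.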

If you are using INetSim for malware analysis, you will probably realise that it does not offer any functionality to fake an SSDP or UPnP service. However, we can use this Python script[5] by user GrahamCobb, which emulates a fake SSDP service, and adjust the device description to suit our needs.
Once devices are discovered, QBot requests their device descriptions and checks whether it is dealing with an Internet Gateway Device. This can be determined by looking at the device description itself.
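A Python sketch of that check using only the standard library. The description XML below is a trimmed, made-up example. Real descriptions use the urn:schemas-upnp-org:device-1-0 namespace and can contain embedded devices, so the search covers every nested device element.

```python
import xml.etree.ElementTree as ET

NS = {"d": "urn:schemas-upnp-org:device-1-0"}
IGD_TYPE = "urn:schemas-upnp-org:device:InternetGatewayDevice:1"


def is_internet_gateway(description_xml):
    """Return True if any (embedded) device in the description is an IGD."""
    root = ET.fromstring(description_xml)
    return any(
        (dt.text or "").strip() == IGD_TYPE
        for dt in root.iterfind(".//d:device/d:deviceType", NS)
    )


SAMPLE = """<?xml version="1.0"?>
<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <deviceType>urn:schemas-upnp-org:device:InternetGatewayDevice:1</deviceType>
    <friendlyName>Lab router</friendlyName>
  </device>
</root>"""

print(is_internet_gateway(SAMPLE))  # True
```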

If it is an Internet Gateway Device, QBot confirms that a connection exists by sending a GetStatusInfo request, then retrieves the device's external IP address with a GetExternalIPAddress request.
Next, it tries to add port forwarding rules to the device using the AddPortMapping command.
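The three requests are ordinary SOAP calls POSTed to the service's control URL, which is taken from the device description. The Python sketch below builds them. Argument names follow the WANIPConnection:1 service. The ports, internal client and description values are placeholders.

```python
from xml.sax.saxutils import escape

SERVICE = "urn:schemas-upnp-org:service:WANIPConnection:1"


def build_soap_request(action, args=None, service=SERVICE):
    """Return (headers, body) for a UPnP SOAP control action."""
    arg_xml = "".join(
        f"<{name}>{escape(str(value))}</{name}>" for name, value in (args or {}).items()
    )
    body = (
        '<?xml version="1.0"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        f'<s:Body><u:{action} xmlns:u="{service}">{arg_xml}</u:{action}></s:Body>'
        "</s:Envelope>"
    )
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPAction": f'"{service}#{action}"',
    }
    return headers, body


status = build_soap_request("GetStatusInfo")
external_ip = build_soap_request("GetExternalIPAddress")
mapping = build_soap_request("AddPortMapping", {
    "NewRemoteHost": "",
    "NewExternalPort": 8443,
    "NewProtocol": "TCP",
    "NewInternalPort": 8443,
    "NewInternalClient": "192.168.1.50",
    "NewEnabled": 1,
    "NewPortMappingDescription": "lab test",
    "NewLeaseDuration": 0,
})
```

Each pair can then be sent with `urllib.request.Request(control_url, data=body.encode(), headers=headers, method="POST")`.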

Afterwards, all rules are removed again and the ports that were successfully forwarded are reported to the C2 server in an HTTP POST request. The carrier protocol is HTTPS, and the request has the following form:
```
# destination address
https://[HARDCODED_IP]:[HARDCODED_PORT]/bot_serv
# POST data form; successfully forwarded ports are appended to ports
cmd=1&msg=%s&ports=
```
My analysis stops at this point for now. However, McAfee explains that a new binary is downloaded from the contacted C2 server, which re-adds the port forwarding rules and is responsible for the C2 communication. The blog article I've referenced above explains the whole functionality, so I recommend taking a look at it if you are interested in the next steps.
Final Words
As you can see, UPnP comes with many security flaws and can lead to a compromised network. If you have UPnP enabled in your company's network, I strongly recommend checking whether it is really needed and turning it off if it is not.
University exams are coming up next, so it will probably take some time until I can get my hands on the QBot C2 protocol or the proxy binary. I do, however, want to look at these two functionalities next.
Deobfuscating DanaBot's API Hashing
As you've probably already guessed from the title, API hashing is used to obfuscate a binary by hiding API names from static analysis tools, which hinders a reverse engineer in understanding the malware's functionality.
A first approach to getting an idea of an executable's functionality is to skim through the functions and look out for API calls. If, for example, CreateFileW is called in a specific subroutine, the routine itself or its cross references probably implement some file handling functionality. This won't be possible if API hashing is used.
Instead of calling a function directly by name, the malware refers to each API by a corresponding checksum/hash. A hardcoded hash value is retrieved and a checksum is computed for each exported function of a library. If a computed value matches the hash value we compare it against, we have found our target.

In this case, a reverse engineer needs to choose a different path to analyse the binary or deobfuscate it. This blog article covers how the DanaBot banking trojan implements API hashing and probably the easiest way to defeat it. The SHA256 hashes of the samples I am dissecting here are listed at the end of this blog post.
Deep diving into DanaBot
DanaBot is a banking trojan that has been around since at least 2018 and has been documented by ESET[1], among others. It is worth mentioning that it implements most of its functionality in plugins, which are downloaded from the C2 server. I will focus on deobfuscating the API hashing in DanaBot's first stage, a DLL that is dropped and persisted on the system and used to download further plugins.
Reversing the ResolvFuncHash routine
At the beginning of the function, the EAX register stores a pointer to the DOS header of the Dynamic Link Library that contains the function the binary wants to call. The hash of the yet unknown API function is stored in the EDX register. The routine also contains a pile of junk instructions that obscure what the function actually does.
The hash is computed solely from the function name, so the first step is to get a pointer to all function names of the target library. Each DLL contains a table of its exported functions, which is loaded into memory along with the DLL. This Export Directory is always the first entry in the Data Directory array, and the PE headers contain enough information to reach it: the DOS header's e_lfanew field leads to the NT headers, whose optional header ends with the Data Directory array.
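Below is a minimal Python sketch of that walk over a mapped image, where RVAs equal buffer offsets as they do for a loaded DLL. It handles PE32 and PE32+ but skips bounds checks and forwarded exports. It illustrates the header chain and is not DanaBot's code.

```python
import struct


def export_names(image: bytes):
    """Return the exported function names of a memory-mapped PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # skip the signature and IMAGE_FILE_HEADER
    magic = struct.unpack_from("<H", image, opt)[0]
    dd_offset = {0x10B: 96, 0x20B: 112}[magic]  # PE32 / PE32+ data directory start
    export_rva, _size = struct.unpack_from("<II", image, opt + dd_offset)  # entry 0
    if export_rva == 0:
        return []
    # IMAGE_EXPORT_DIRECTORY: NumberOfNames at +24, AddressOfNames at +32
    num_names = struct.unpack_from("<I", image, export_rva + 24)[0]
    names_rva = struct.unpack_from("<I", image, export_rva + 32)[0]
    names = []
    for i in range(num_names):
        name_rva = struct.unpack_from("<I", image, names_rva + 4 * i)[0]
        end = image.index(b"\x00", name_rva)
        names.append(image[name_rva:end].decode("ascii"))
    return names
```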

In the picture below, you can see an example of the junk instructions, as well as the critical block that compares the computed hash with the hash of the function we want to call. The routine iterates through all function names in the Export Directory and calculates their hashes. The loop breaks once the computed hash matches the value that has been stored in the EDX register since the beginning of this routine.
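The lookup loop itself, as a Python sketch. The hash function is passed in as a parameter so that the loop can be shown independently of DanaBot's algorithm. The toy hash in the example is made up.

```python
def resolve_by_hash(export_names, wanted_hash, hash_fn):
    """Return the first export name whose hash equals wanted_hash (the EDX value)."""
    for name in export_names:
        if hash_fn(name) == wanted_hash:
            return name
    return None  # no export matched


def toy_hash(name):
    """Made-up stand-in hash, NOT DanaBot's algorithm."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


exports = ["CloseHandle", "CreateFileW", "ReadFile"]
print(resolve_by_hash(exports, toy_hash("CreateFileW"), toy_hash))  # CreateFileW
```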

Reversing the hashing algorithm
The hashing algorithm itself is fairly simple; it is the junk instructions and opaque predicates that complicate reversing this routine.
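For each index i of the function name, the algorithm takes the character at index i and the character at index stringLength - i - 2 (wrapping around to the first character in the last iteration), as well as capitalised versions of both, resulting in a total of 4 characters. Each of these characters is XOR'd with the string length. In each loop iteration, the product of the first character, the second character and the capitalised first character is added to the hash value, which is then XOR'd with the capitalised second character. The value left after the last iteration is the hash.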
```python
def get_hash(funcname):
    """Calculate the hash value for a function name. Return the hash value as an integer."""
    strlen = len(funcname)
    # if the length is even, we encounter a different behaviour
    i = 0
    hashv = 0x0
    while i < strlen:
        if i == (strlen - 1):
            ch1 = funcname[0]
        else:
            ch1 = funcname[strlen - 2 - i]
        # init first character and capitalize it
        ch = funcname[i]
        uc_ch = ch.capitalize()
        # Capitalize the second character
        uc_ch1 = ch1.capitalize()
        # Calculate all XOR values
        xor_ch = ord(ch) ^ strlen
        xor_uc_ch = ord(uc_ch) ^ strlen
        xor_ch1 = ord(ch1) ^ strlen
        xor_uc_ch1 = ord(uc_ch1) ^ strlen
        # do the multiplication and XOR again with upper case character1
        hashv += ((xor_ch * xor_ch1) * xor_uc_ch)
        hashv = hashv ^ xor_uc_ch1
        i += 1
    return hashv
```
A Python script for calculating the hash of a given function name is also uploaded on my GitHub page[2] and free for everyone to use. I've also uploaded a text file with hashes for the exported functions of commonly used DLLs.
Deobfuscation by Commenting
Now that we have cracked the algorithm, we want to annotate our disassembly so we know which hash value represents which function. As I've already mentioned, we want to focus on simplicity. The easiest way is to compute hash values for the exported functions of commonly used DLLs and write them into a file.
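Below is a sketch of writing such a file and loading it back into a dictionary. The line format (hex hash, a space, then the function name) is an assumption. The post shows neither the format of the author's hash table file nor his get_api_table() helper. In practice you would pass get_hash() from the script above as hash_fn. The keys use Python's hex() format, matching the lookup in the IDAPython script below.

```python
def write_hash_table(path, func_names, hash_fn):
    """Write one '<hex hash> <function name>' line per export (assumed format)."""
    with open(path, "w", encoding="ascii") as f:
        for name in func_names:
            f.write(f"{hex(hash_fn(name))} {name}\n")


def load_hash_table(path):
    """Read the file back into a {hex_hash_string: function_name} dict."""
    table = {}
    with open(path, encoding="ascii") as f:
        for line in f:
            line = line.strip()
            if line:
                hash_hex, name = line.split(" ", 1)
                table[hash_hex] = name  # on a hash collision the last name wins
    return table
```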

With this file, we can write an IDAPython script that comments the library function name next to each API hashing call. Luckily, the API hashing function is always called with the same pattern:
- Move the wanted hash value into the EDX register
- Move a DWORD into the EAX register
First, we retrieve all XRefs of the API hashing function. Each XRef contains an address at which the API hashing function is called, which means that the mentioned pattern can be found within the few instructions before the call (the script searches the preceding 0x10 bytes). So we walk back instruction by instruction until we find the mov that loads the wanted hash value into EDX. Finally, we use this immediate to look up the corresponding API function in the hash values we generated before and comment the function name next to the XRef address.
```python
import idautils
import idc


def add_comment(addr, hashv, api_table):
    """Write a comment at addr with the matching API function.

    Return True if a corresponding API hash was found."""
    # remove the "h" at the end of the string and normalise to 0x... form
    hashv = hex(int(hashv[:-1], 16))
    if hashv in api_table:
        apifunc = api_table[hashv]
        print("Found ApiFunction = %s. Adding comment." % (apifunc,))
        idc.set_cmt(addr, apifunc, 0)
        comment_added = True
    else:
        print("Api function for hash = %s not found" % (hashv,))
        comment_added = False
    return comment_added


def main():
    """Main"""
    with open(
            "C:\\Users\\luffy\\Desktop\\Danabot\\05-07-2020\\Utils\\danabot_hash_table.txt",
            "r") as f:
        lines = f.readlines()
    # get_api_table() is not shown in the original post; it maps hex hash strings to API names
    api_table = get_api_table(lines)
    i = 0
    ii = 0
    for xref in idautils.XrefsTo(0x2f2858):
        i += 1
        currentaddr = xref.frm
        addr_minus = currentaddr - 0x10
        while currentaddr >= addr_minus:
            currentaddr = idc.prev_head(currentaddr)
            if idc.print_insn_mnem(currentaddr) != "mov":
                continue
            # needs to be the edx register to match the pattern
            if idc.print_operand(currentaddr, 0) != "edx":
                continue
            src = idc.print_operand(currentaddr, 1)
            # immediate always ends with 'h' in IDA
            if src.endswith("h"):
                if add_comment(xref.frm, src, api_table):
                    ii += 1
                break  # only the closest "mov edx, <imm>" belongs to this call
    print("Total xrefs found %d" % (i,))
    print("Total api hash functions deobfuscated %d" % (ii,))


if __name__ == '__main__':
    main()
```
Conclusion
As reverse engineers, we will probably keep encountering API hashing in many different forms. I hope I was able to show you a quick & dirty method, or at least give you some foundation for beating this obfuscation technique. I also hope that the next time a fellow blue teamer has to analyse DanaBot, this article comes in handy and saves them some time reverse engineering this banking trojan.
IoCs
- Dropper = e444e98ee06dc0e26cae8aa57a0cddab7b050db22d3002bd2b0da47d4fd5d78c
- DLL = cde01a2eeb558545c57d5c71c75e9a3b70d71ea6bbeda790a0b871fcb1b76f49
Linux/Windows Internals – Process structures
Having an overview of the running processes on the operating system is something we usually take for granted. We can't imagine working without fundamental features like that.
But how does the kernel keep track of the processes that are currently running? Today, we take a look at the structures Windows and Linux use to keep track of running processes.
Linux – Task structures
If you have used Linux before, you are probably familiar with the ps command, which lists the processes running on the system. We will dive into how the Linux kernel keeps track of these processes internally.
The kernel stores the processes in a circular doubly linked list called the task list. Each node in this list is a process descriptor of type task_struct. The definition of task_struct can be found in linux/sched.h[1] in Linus Torvalds' git repository.

If you check out the code, you will realise that this structure is pretty extensive, and we will not dive into every member. Our focus lies on understanding how the kernel handles the task list. As I've already explained, the kernel keeps track of all processes in a doubly linked list: each task structure holds a member tasks of type list_head.
```c
struct list_head {
	struct list_head *next, *prev;
};
```
As you've probably already guessed, the next pointer lets us reach the next task_struct and the prev pointer lets us take a step back. Strictly speaking, both point to the tasks member embedded in the neighbouring task_struct; the kernel recovers the enclosing structure with the list_entry()/container_of() macros. We can write a simple Linux kernel module that iterates through the task list and prints the names and process IDs of all processes on the current system.
Iterating through the linked list
Task structures live in kernel space, so accessing them directly requires kernel code such as a kernel module. The code is pretty straightforward. We use init_task, the idle task of the Linux system (PID 0), as the entry point. Iterating through the linked list is possible via the next_task macro. Then we use the printk function to log the comm member (the executable name) and the process ID of each task.
```c
#include <linux/sched/task.h>
#include <linux/sched/signal.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/rcupdate.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Andreas Klopsch");
MODULE_DESCRIPTION("Simple module for printing task structure members");
MODULE_VERSION("0.1");

// get the top element in the task doubly linked list
extern struct task_struct init_task;

static int __init action_init(void)
{
	struct task_struct *task;

	printk(KERN_INFO "Init task = %s\n", init_task.comm);
	printk(KERN_INFO "Getting next task\n");

	// Walk pointers instead of copying task_structs onto the small kernel stack.
	// The task list is RCU-protected, so hold the RCU read lock while walking it.
	rcu_read_lock();
	for (task = next_task(&init_task); task != &init_task; task = next_task(task))
		printk(KERN_INFO "Comm = %s pid = %d\n", task->comm, task->pid);
	rcu_read_unlock();

	return 0;
}

static void __exit action_exit(void)
{
	printk(KERN_INFO "Stopping task iterator\n");
}

module_init(action_init);
module_exit(action_exit);
```

Windows – EPROCESS
Windows takes a similar approach. Each process on Windows is represented by an EPROCESS structure, which is the kernel's representation of a process object. The EPROCESS structure embeds a KPROCESS structure (its first member, Pcb), which holds the information the kernel needs, for example for scheduling.
As on Linux, this structure contains various information about the corresponding process, such as:
- Virtual Address Descriptors (VADs), describing the layout of the process's virtual memory
- Process ID
- Image file name
Another similarity with Linux is the way processes are linked with each other. EPROCESS structures are connected via a doubly linked list through their ActiveProcessLinks member. The next process in the list is referenced by the Flink pointer and the previous one by the Blink pointer. As on Linux, these pointers reference the list entry inside the neighbouring EPROCESS, not the start of the structure. Enumerating processes can therefore again be implemented by walking the ActiveProcessLinks list.
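Below is a Python simulation of such a walk at the pointer level, over a fake address space. The field offsets are hypothetical. The real offsets of UniqueProcessId, ActiveProcessLinks and ImageFileName inside EPROCESS change between Windows builds, so a real tool has to look them up, for example from symbols. The list head corresponds to the kernel global PsActiveProcessHead. CONTAINING_RECORD boils down to subtracting the field offset from the LIST_ENTRY address.

```python
# Hypothetical EPROCESS layout for illustration only; real offsets vary per build.
OFF_PID = 0x10
OFF_LINKS = 0x18  # LIST_ENTRY { Flink @ +0, Blink @ +8 }
OFF_NAME = 0x28


def containing_record(links_addr, field_offset):
    """CONTAINING_RECORD: address of the structure that embeds the LIST_ENTRY."""
    return links_addr - field_offset


def walk_active_process_links(memory, list_head_addr):
    """Follow Flink pointers from the list head until the walk comes back around.

    memory maps addresses to values (a stand-in for reading kernel memory).
    """
    processes = []
    flink = memory[list_head_addr]  # head.Flink
    while flink != list_head_addr:
        eprocess = containing_record(flink, OFF_LINKS)
        processes.append((memory[eprocess + OFF_PID], memory[eprocess + OFF_NAME]))
        flink = memory[flink]  # entry.Flink
    return processes


# Fake circular list: head -> System (4) -> smss.exe (388) -> head
HEAD = 0x1000
procs = {0x2000: (4, "System"), 0x3000: (388, "smss.exe")}
memory = {}
order = [HEAD] + [base + OFF_LINKS for base in procs]
for i, entry in enumerate(order):
    memory[entry] = order[(i + 1) % len(order)]  # Flink
    memory[entry + 8] = order[i - 1]  # Blink
for base, (pid, name) in procs.items():
    memory[base + OFF_PID] = pid
    memory[base + OFF_NAME] = name

print(walk_active_process_links(memory, HEAD))  # [(4, 'System'), (388, 'smss.exe')]
```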

References
- Windows Internals, Part 1: System Architecture, Processes, Threads, Memory Management, and More
- Mastering Malware Analysis: The complete malware analyst's guide to combating malicious software, APT, cybercrime, and IoT attacks
DGAs – Generating domains dynamically
A domain generation algorithm (DGA) is a routine or program that generates domains dynamically. Think of the following example:
An actor registers the domain evil.com. The corresponding backdoor has this domain hardcoded. Once the attacker infects a target with this malware, it starts contacting its C2 server.
As soon as a security company obtains the malware, it might blacklist the registered domain evil.com. This prevents the malware from receiving commands from the original C2.
If a domain generation algorithm were used instead, the domain would be generated from a seed. The current date, for example, is a popular seed amongst malware authors. Simply blacklisting a domain would not solve the problem, and the security company would have to resort to different methods.
By generating domains dynamically, malware makes it harder for defenders to stop it from contacting its C2 server; defenders first need to understand the algorithm.
Example implementation of a DGA
A quick & dirty implementation of such an algorithm (loosely based on Wikipedia[1]) could look like this:
```python
"""Example implementation of a domain generation algorithm."""
import random
import sys
import time


def gen_domain(month, day, hour, minute):
    """Generate the domain based on time. Return domain"""
    print(f"[+] Gen domain based on month={month} day={day} hour={hour} min={minute}")
    domain = ""
    for _ in range(8):
        month = ((month * 8) ^ 0xF)
        day = ((day * 8) ^ 0xF)
        hour = ((hour * 8) ^ 0xF)
        minute = ((minute * 8) ^ 0xF)
        # map the product to a lowercase letter ('a' + 0..24)
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain + ".com"


try:
    while True:
        # random values stand in for the current date/time in this demo;
        # hours and minutes use their valid ranges 0-23 and 0-59
        d = gen_domain(random.randint(1, 12), random.randint(1, 30),
                       random.randint(0, 23), random.randint(0, 59))
        print(f"[+] Generated domain = {d}")
        time.sleep(5)
except KeyboardInterrupt:
    sys.exit()
```
In a real deployment, our DGA would use the current date and time as the seed; the demo feeds it random values instead. In each of the eight iterations, every parameter is multiplied by 8 and XOR'd with 0xF, and then all four values are multiplied with each other. Taking the product modulo 25 and adding 0x61 ('a') makes sure that we generate a lowercase character. The output of this program looks like this:
```
[+] Gen domain based on month=12 day=2 hour=4 min=4
[+] Generated domain = taavtaab.com
[+] Gen domain based on month=3 day=10 hour=11 min=36
[+] Generated domain = kugxfkvx.com
[+] Gen domain based on month=2 day=27 hour=4 min=1
[+] Generated domain = kaasuapn.com
```
Seed or Dictionary based
There are different main approaches when implementing a domain generation algorithm. For the sake of keeping this simple, we will not focus on the hybrid approach.

Seed based Approach
We already introduced the first one: our implementation is an algorithm based on a seed that is served as input. Another example is how APT34 used such a seed based algorithm in a campaign targeting a government organisation in the Middle East, which was discovered by FireEye[2].
The APT group used a domain generation algorithm in one of its downloaders, which FireEye named BONDUPDATER and which is implemented in PowerShell.

First, the first 12 characters of the UUID are extracted. Next, the program runs into a loop. In each iteration, a new random number is generated and a domain is built by concatenating hardcoded and generated values. GetHostAddresses then tries to resolve the generated domain. If that fails, a new iteration starts; once a registered domain is generated and resolved, the loop breaks.
Depending on the resolved IP address, the script triggers different actions.
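The control flow can be sketched in Python. Everything specific here is hypothetical: the domain layout, the hardcoded label and the random-number range are placeholders, not BONDUPDATER's actual values. The resolver is injected so the loop can be tested without network access, and max_tries is a safety cap that the loop described above does not have.

```python
import random
import socket


def find_live_domain(uuid_prefix, resolve=socket.gethostbyname, max_tries=1000, rng=None):
    """Generate candidates until one resolves; return (domain, ip) or None."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        rnd = rng.randint(100000, 999999)
        # Hypothetical layout: hardcoded label + random number + UUID prefix
        domain = f"upd{rnd}{uuid_prefix}.example.com"
        try:
            ip = resolve(domain)
        except OSError:
            continue  # not registered: try the next candidate
        return domain, ip  # the script then branches on the resolved IP
    return None
```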
Dictionary based Approach
The second approach is a dictionary based domain generation algorithm. Instead of deriving characters from a seed, the algorithm is given lists of words; it randomly selects words from these lists and concatenates them into a new domain. Suppobox[3] is a malware family that implements the dictionary based approach[4].
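A minimal Python sketch of the dictionary based approach. The word lists, the TLD and the use of the date as PRNG seed are hypothetical choices for illustration, not Suppobox's actual parameters.

```python
import datetime
import random

# Hypothetical word lists; real malware families ship much larger ones.
FIRST = ["river", "silver", "quiet", "morning", "table"]
SECOND = ["stone", "garden", "window", "letter", "forest"]


def dict_domain(date: datetime.date, tld=".net"):
    """Concatenate two dictionary words chosen by a PRNG seeded with the date."""
    rng = random.Random(date.toordinal())  # same date -> same domain
    return rng.choice(FIRST) + rng.choice(SECOND) + tld


print(dict_domain(datetime.date(2020, 7, 5)))
```

Because the output consists of real words, such domains are harder to flag just by how random the name looks.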
Defeating Domain Generation Algorithms
The straightforward way to counter these algorithms is to reverse engineer the routine and predict future domains. One famous case is Microsoft's takedown of the Necurs botnet[5]: by understanding the DGA, Microsoft was able to predict the domains for the next 25 months.
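Once the algorithm is known, predicting domains is a matter of running it ahead of time. The Python sketch below re-implements the example DGA from above (same arithmetic, plus the `.com` suffix that appears in its output) but seeds it with real timestamps to produce a blocklist for a time window. The window size is an arbitrary choice.

```python
import datetime


def gen_domain(month, day, hour, minute):
    """Same arithmetic as the example DGA above, returning the domain with .com."""
    domain = ""
    for _ in range(8):
        month = (month * 8) ^ 0xF
        day = (day * 8) ^ 0xF
        hour = (hour * 8) ^ 0xF
        minute = (minute * 8) ^ 0xF
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain + ".com"


def predict_domains(start: datetime.datetime, minutes: int):
    """Return the domains the DGA would produce for each minute in the window."""
    domains = []
    for offset in range(minutes):
        t = start + datetime.timedelta(minutes=offset)
        domains.append(gen_domain(t.month, t.day, t.hour, t.minute))
    return domains


print(predict_domains(datetime.datetime(2020, 12, 2, 4, 4), 1))  # ['taavtaab.com']
```

Since this seed changes every minute, covering a whole day means precomputing 1440 candidates.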
I am not an ML magician. However, even a quick Google search shows that a lot of research is going on, and machine learning based approaches to counter DGAs seem promising too.
Taming Virtual Machine Based Code Protection – 2
In the last episode…
As you've probably guessed, this is the second part of my journey to reverse engineer a virtual machine protected binary. If you haven't read the first part[1], I encourage you to do so, because I will not repeat everything here. While the first part explained the virtual environment and gave a first look into the virtual machine's custom instruction set, this time I will focus on disassembling the virtual machine code completely.
I might repeat some steps from the first part, mostly because I felt it was necessary :-).
Into the battle
We already explained the environment setup in the previous blog post and also identified the main loop, which is responsible for executing instructions.

In each iteration, an instruction is parsed, and the final CALL in the left branch of figure 1 executes it.
Critical functions
I briefly covered the instruction parsing process in my last blog article. But since we are going to build a disassembler, I will explain the most important routines once again.
0x4013DF / ParseInstruction
This function is called each iteration in the loop from figure 1 and is responsible for parsing the byte codes.

In each iteration, the Virtual Instruction Pointer (VIP), which points at the instruction to execute, is retrieved and the instruction is parsed. This function is solely responsible for transforming the raw bytes into a format that can be processed further. Let's take a look at how the first three instructions are parsed:

If you are interested in understanding this format fully, I recommend jumping to the disassembler code[2]. I will only cover the first instruction here.
So how do we get from 03 15 03 00 04 to the parsed format?
The first byte is always the instruction id; 03 is the id of the PUSH instruction. The second byte is split into its upper 6 bits and lower 2 bits, which represent the instruction size and the number of operands used by this instruction. The next bytes represent the single operand. In the example above, the first operand config, 00 03 00 00, is the configuration for USE 32 BIT OF REGISTER, SPECIFIED BY THE NEXT DWORD. The next DWORD is 04 00 00 00, which refers to virtual register 4 (VR4). So which native register is VR4? Let's take a quick look at the instructions.
PUSH VR4
MOV VR4, VR7
SUB VR7, 0xB4
This looks very similar to the usual function prologue ;-), so VR4 must be EBP!
PUSH EBP
MOV EBP, ESP
SUB ESP, 0xB4
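The header rules described above (instruction id in the first byte, size in the upper 6 bits and operand count in the lower 2 bits of the second byte) can be sketched as follows. Only the PUSH id (0x03) comes from the article; the `MNEMONICS` table and `decode_header` name are hypothetical, and the assumption that the size covers the whole instruction, so that VIP advances by it, is mine. For the example bytes 03 15 03 00 04, 0x15 yields a size of 5 and one operand, which matches the five bytes shown.

```python
from typing import NamedTuple

# Only 0x03 (PUSH) is confirmed by the article; extend as handlers are reversed.
MNEMONICS = {0x03: "PUSH"}


class Header(NamedTuple):
    mnemonic: str
    size: int          # upper 6 bits of the second byte
    num_operands: int  # lower 2 bits of the second byte
    next_vip: int      # assumption: size covers the whole instruction


def decode_header(code: bytes, vip: int) -> Header:
    inst_id = code[vip]
    meta = code[vip + 1]
    size = meta >> 2
    num_operands = meta & 0b11
    mnemonic = MNEMONICS.get(inst_id, f"UNK_{inst_id:02X}")
    return Header(mnemonic, size, num_operands, vip + size)
```

For example, `decode_header(bytes.fromhex("0315030004"), 0)` returns `Header(mnemonic='PUSH', size=5, num_operands=1, next_vip=5)`.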
0x401271 / GetOpval & 0x401322 / StoreOpval
I will not cover these two functions in depth here. If you take a look at figure 3 again, you will see that I mention the operand configs. These functions are responsible for filling the operands according to these configs.
In the example above, the SUB VR7, 0xB4 instruction uses 00030000 07000000 for the first operand and 00020000 B4000000 for the second one. If you reverse engineer every single option, you will find that the following configurations exist:
# First DWORD CONFIG
00000000 ==> LOWEST BYTE OF REG X         # e.g. AL
00010000 ==> SECOND LOWEST BYTE OF REG X  # e.g. AH
00020000 ==> LOWER 16 BIT OF REG X        # e.g. AX
00030000 ==> 32 BIT OF REG X              # e.g. EAX
01000000 ==> BYTE AT LOC
01010000 ==> BYTE AT LOC
01020000 ==> WORD AT LOC
01030000 ==> DWORD AT LOC
02000000 ==> BYTE FROM IMM.
02010000 ==> BYTE FROM IMM.
02020000 ==> WORD FROM IMM.
02030000 ==> DWORD FROM IMM.
# Second DWORD CONFIG, if register
00000000 ==> EAX
01000000 ==> EBX
02000000 ==> ECX
03000000 ==> EDX
04000000 ==> EBP
05000000 ==> ESI
06000000 ==> EDI
07000000 ==> ESP
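The table translates almost directly into a lookup. The sketch below takes both DWORDs as hex strings in the byte order shown above and returns a readable operand. The function and table names are hypothetical, and treating the second DWORD of a memory operand as an absolute address is an assumption, since the table only says AT LOC.

```python
# Register order exactly as listed in the table (VM-specific ordering).
REGISTERS = ["EAX", "EBX", "ECX", "EDX", "EBP", "ESI", "EDI", "ESP"]
REG_PARTS = {0x00: "lowest byte of ", 0x01: "second lowest byte of ",
             0x02: "lower 16 bit of ", 0x03: ""}
SIZES = {0x00: "BYTE", 0x01: "BYTE", 0x02: "WORD", 0x03: "DWORD"}


def describe_operand(config_hex: str, value_hex: str) -> str:
    config = bytes.fromhex(config_hex)          # e.g. "00030000"
    value = int.from_bytes(bytes.fromhex(value_hex), "little")
    kind, size = config[0], config[1]
    if kind == 0x00:                            # register operand
        return REG_PARTS[size] + REGISTERS[value]
    if kind == 0x01:                            # memory operand (address assumed)
        return f"{SIZES[size]} PTR [0x{value:X}]"
    if kind == 0x02:                            # immediate operand
        return f"0x{value:X}"
    raise ValueError(f"unknown operand config {config_hex}")
```

For example, `"PUSH " + describe_operand("00030000", "04000000")` gives `PUSH EBP`, matching the first decoded instruction.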
Eternal Debugging
Now we can use this knowledge to get an initial understanding of what is happening and to verify whether we are able to decode instructions manually.

If you take a look at the last instructions, you will see that some constants are pushed into memory. If you google these constants, you will come to the conclusion that this must be the MD5 Init routine[3]. The next step is to build a disassembler.
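The MD5 initial state words are well known (A = 0x67452301, B = 0xEFCDAB89, C = 0x98BADCFE, D = 0x10325476), so the googling step can also be scripted. The helper below is a hypothetical convenience, not part of the disassembler: it checks whether a list of immediates collected from the decoded code contains all four.

```python
from typing import Iterable

# MD5 initial chaining values (RFC 1321) as 32-bit words.
MD5_INIT = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476}


def looks_like_md5_init(immediates: Iterable[int]) -> bool:
    """True if all four MD5 initialisation constants appear."""
    return MD5_INIT <= set(immediates)
```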
Disassembling the code
I wrote this one in C++, and you can find the source code on my GitHub page[4]. Writing it in Python would have been possible too … and probably a lot easier and faster, but I chose C++ for learning purposes. If my C++ is awful, forgive me. We all start somewhere ;-).

Our disassembler does have some limitations, though. The disassembly was complex, and I believe that some memory address offsets and register sizes are wrong. Also, I did not reverse engineer all instructions. However, that should not be a problem, because we only need to understand what is happening on a higher level.
Identifying the algorithm
We already spotted the variables that we also found in the MD5.c source code (e.g. 0x2381bc0). However, the actual hashing algorithm does not match the original one, so it seems to be a modified version of it. Furthermore, we spot a routine that seems to be the XTEA algorithm[5].
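For comparison while reading the disassembly, here is a Python sketch of the reference XTEA cipher with the usual 32 rounds. It is the textbook algorithm, not code recovered from the challenge binary, and the sample key in the usage below is arbitrary. The constant 0x9E3779B9 (delta) is usually the giveaway when spotting XTEA in disassembled code.

```python
from typing import Sequence, Tuple

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9  # characteristic TEA/XTEA constant


def xtea_encrypt(v0: int, v1: int, key: Sequence[int],
                 rounds: int = 32) -> Tuple[int, int]:
    """Encrypt one 64-bit block given as two 32-bit halves; key = 4 words."""
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
    return v0, v1


def xtea_decrypt(v0: int, v1: int, key: Sequence[int],
                 rounds: int = 32) -> Tuple[int, int]:
    """Inverse of xtea_encrypt: run the rounds backwards."""
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
    return v0, v1
```

Masking with 0xFFFFFFFF after each update reproduces the 32-bit wraparound that the C reference implementation gets for free from `uint32_t`.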

Final words
So that's basically it. I don't know if and when I will write a third part covering the serial key generator. When I started this challenge, I was only interested in learning how to disassemble custom instruction sets.
If you are interested in how others solved this challenge, I recommend reading the tutorials by wagonono and kernelj; both of them completely solved this challenge[6]. Wagonono also created a disassembler, and his version is better than mine.
Catching Debuggers with Section Hashing
As a reverse engineer, you will always have to deal with various anti-analysis measures. The number of ways to hamper our work is endless. Not only will you have to deal with code obfuscation that hinders your static analysis, but also with tricks that prevent you from debugging the software you want to dig deeper into. Today, I want to present Section Hashing.
I will begin by explaining how software breakpoints work internally and then give you an example of a Section Hashing implementation.
Debuggers - How software breakpoints work
When you set a breakpoint at a specific instruction in your favourite debugger, the debugger temporarily overwrites the first byte of that instruction with an instruction that causes a fault or an interrupt. On x86, this is very often the INT 3 instruction, whose opcode is 0xCC. Let's examine what this looks like in RAM.
We open x32dbg.exe, debug a 32-bit PE and set a breakpoint near the entry point.

After setting the breakpoint, the debugger still displays the original instruction instead of the patched one. However, we can examine the same memory page in RAM with Process Hacker.

In memory, the byte 33 has changed to CC, which causes the program to halt when it is reached. The debugger then handles this software interrupt and restores the original byte.
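The bookkeeping a debugger performs here can be modelled in a few lines. In this sketch, the `SoftwareBreakpoints` class and the example bytes are hypothetical (55 8B EC 33 C0 is push ebp / mov ebp, esp / xor eax, eax); it shows that RAM holds 0xCC while the debugger's view shows the saved original byte, as described above.

```python
INT3 = 0xCC


class SoftwareBreakpoints:
    """Toy model of how a debugger patches and hides INT 3 breakpoints."""

    def __init__(self, memory: bytearray):
        self.memory = memory       # the process memory ("RAM")
        self.saved = {}            # address -> original byte

    def set(self, address: int) -> None:
        self.saved[address] = self.memory[address]
        self.memory[address] = INT3

    def remove(self, address: int) -> None:
        self.memory[address] = self.saved.pop(address)

    def debugger_view(self) -> bytes:
        """What the disassembly window shows: original bytes restored."""
        view = bytearray(self.memory)
        for address, original in self.saved.items():
            view[address] = original
        return bytes(view)
```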
Catching Breakpoints with Section Hashing
Now that we know how software breakpoints work, let's get to the real topic of this article. For this example, we will move to the Linux world.
A software breakpoint is actually nothing more than a modification of the executable code section in RAM: once a breakpoint is set, the .text section is modified. A well-known technique to catch such breakpoints in RAM is called Section Hashing.
Authors can embed the hash of the .text section in the binary. Upon execution, they use the same algorithm to generate a new hash from the .text section. If a software breakpoint is set, the hash will differ from the embedded hash. An example implementation can look like this:

In this case, a hash of the .text section is generated. Afterwards it is used to influence the generation of the flag. If a software breakpoint is set during execution, a wrong hash will be generated.
This is a simple example of Section Hashing. In combination with code obfuscation and other anti-analysis measures, this technique can be very hard to spot. It is also occasionally used by commercial packers.
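The idea can be sketched without touching a real process. The example below works on a byte string standing in for the .text section and uses SHA-256 purely as an example; the original implementation may use a different hash. The function names, the XOR-based flag derivation and the sample bytes are all hypothetical. A single 0xCC byte is enough to change both the hash and the derived flag.

```python
import hashlib


def section_hash(text_section: bytes) -> bytes:
    return hashlib.sha256(text_section).digest()


def breakpoint_detected(text_section: bytes, embedded_hash: bytes) -> bool:
    """Compare the runtime hash with the one embedded at build time."""
    return section_hash(text_section) != embedded_hash


def derive_flag(text_section: bytes, encrypted_flag: bytes) -> bytes:
    """XOR the flag with the section hash: a wrong hash yields garbage."""
    key = section_hash(text_section)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(encrypted_flag))
```

On Linux, a real implementation would hash the .text section of the running binary in memory; locating that section is omitted here.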
Defeating Section Hashing
There are multiple ways to defeat this technique, for example:
- Patching instructions
- Using hardware breakpoints
On x86, hardware breakpoints do not modify the code in RAM; instead, they use dedicated debug registers (DR0-DR3 hold the addresses, DR7 controls them) to halt execution. However, hardware breakpoints are still detectable.
On Windows, a program can fetch its thread CONTEXT via GetThreadContext to see whether the debug registers are in use. A great example of how this is implemented can be found here[1]. If you are interested in defeating the Section Hashing technique yourself, you can try it at root-me.org[2].
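The check itself boils down to inspecting a few register values. In the sketch below, the dictionary stands in for a CONTEXT structure that would be filled by GetThreadContext with the CONTEXT_DEBUG_REGISTERS flag on Windows; the function name and the dictionary input are assumptions made so the logic runs anywhere. Bits 0-7 of DR7 are the local and global enable bits for the four breakpoint slots.

```python
from typing import Mapping

ADDRESS_REGISTERS = ("Dr0", "Dr1", "Dr2", "Dr3")


def hardware_breakpoints_in_use(context: Mapping[str, int]) -> bool:
    """Return True if any debug register hints at a hardware breakpoint."""
    # A non-zero DR0-DR3 holds a breakpoint address.
    if any(context.get(name, 0) for name in ADDRESS_REGISTERS):
        return True
    # DR7 bits 0-7: L0/G0 ... L3/G3 enable bits.
    return (context.get("Dr7", 0) & 0xFF) != 0
```

The sketch only looks at the low byte of DR7 because other bits, such as the reserved bit 10, may read as 1 even when no breakpoint is set.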
PEB: Where Magic Is Stored
As a reverse engineer, every now and then you encounter a situation where you dive deeper into the internal structures of an operating system than usual, be it out of simple curiosity or because you need to understand how a binary uses specific parts of the operating system. One of the more interesting structures in Windows is the Process Environment Block (PEB). In this article, I'd like to introduce you to this structure and talk about various ways adversaries can abuse it for their own purposes.
Introducing PEB
The Process Environment Block is a critical structure in Windows, and most of its fields are not intended to be used by anything other than the operating system. It contains data structures that apply across a whole process and is stored in user-mode memory, which makes it accessible to the corresponding process. The structure contains valuable information about the running process, including:
- whether the process is being debugged or not
- which modules are loaded into memory
- the command line used to invoke the process
All this information gives adversaries a number of ways to abuse it. The definition below shows the layout of the PEB structure:
typedef struct _PEB {
    BYTE Reserved1[2];
    BYTE BeingDebugged;
    BYTE Reserved2[1];
    PVOID Reserved3[2];
    PPEB_LDR_DATA Ldr;
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
    PVOID Reserved4[3];
    PVOID AtlThunkSListPtr;
    PVOID Reserved5;
    ULONG Reserved6;
    PVOID Reserved7;
    ULONG Reserved8;
    ULONG AtlThunkSListPtr32;
    PVOID Reserved9[45];
    BYTE Reserved10[96];
    PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
    BYTE Reserved11[128];
    PVOID Reserved12[1];
    ULONG SessionId;
} PEB, *PPEB;
Now that we've talked a little bit about the layout and purpose of the structure, let's take a look at a few use cases.
Reading the BeingDebugged flag
The most obvious use case is to check the BeingDebugged flag to identify whether a debugger is attached to the process. By reading the flag directly from memory instead of calling the usual suspects like NtQueryInformationProcess or IsDebuggerPresent, malware can avoid noisy WinAPI calls, which makes this technique harder to spot.
However, most debuggers already take care of this. x64dbg, for example, has an option to hide the debugger by modifying the PEB at the start of the debugging session.
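To see where the PEB offsets come from, the leading fields of the definition above can be mirrored with Python's ctypes. This sketch models only the first fields and reads them from a raw byte snapshot instead of a live process; the `PEB_HEAD` name and the snapshot are hypothetical. BeingDebugged sits at offset 2, and Ldr lands at 0xC in a 32-bit Python and at 0x18 in a 64-bit Python, matching the 32-bit and 64-bit PEB layouts, because ctypes applies the native pointer size and alignment.

```python
import ctypes


class PEB_HEAD(ctypes.Structure):
    """Leading fields of the documented PEB layout (native pointer size)."""
    _fields_ = [
        ("Reserved1", ctypes.c_ubyte * 2),
        ("BeingDebugged", ctypes.c_ubyte),
        ("Reserved2", ctypes.c_ubyte * 1),
        ("Reserved3", ctypes.c_void_p * 2),
        ("Ldr", ctypes.c_void_p),
        ("ProcessParameters", ctypes.c_void_p),
    ]


def being_debugged(peb_snapshot: bytes) -> bool:
    """Read the flag directly, without calling IsDebuggerPresent."""
    return PEB_HEAD.from_buffer_copy(peb_snapshot).BeingDebugged != 0
```

For the BeingDebugged check alone, hiding the debugger comes down to clearing this single byte.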
Iterating through loaded modules
Another use case could be iterating over the loaded modules to discover DLLs that were injected into memory to monitor the running process. To understand how to achieve this, we need to take a look at the PEB_LDR_DATA structure, which the PEB references through its Ldr field (of type PPEB_LDR_DATA):
typedef struct _PEB_LDR_DATA {
    BYTE Reserved1[8];
    PVOID Reserved2[3];
    LIST_ENTRY InMemoryOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;
PEB_LDR_DATA contains the head of a doubly linked list named InMemoryOrderModuleList. Each item in this list is an LDR_DATA_TABLE_ENTRY structure, which contains all the information we need to iterate over the loaded modules. See the structure of LDR_DATA_TABLE_ENTRY below:
typedef struct _LDR_DATA_TABLE_ENTRY {
    PVOID Reserved1[2];
    LIST_ENTRY InMemoryOrderLinks;
    PVOID Reserved2[2];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    UNICODE_STRING FullDllName;
    BYTE Reserved4[8];
    PVOID Reserved5[3];
    union {
        ULONG CheckSum;
        PVOID Reserved6;
    };
    ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
So by iterating over the doubly linked list, we are able to discover the base address and full name of every module loaded into the memory of the running process. Because each Flink points at the InMemoryOrderLinks member of the next entry rather than at the start of the entry, the Proof of Concept below uses a modified structure that begins with InMemoryOrderLinks. It iterates over the linked list and prints the library names to stdout. I created it for the purpose of this blog article. You are free to use it, and I will also upload it to my GitHub repo in the upcoming days:
#include <Windows.h>
#include <cstdio>
#include <iostream>
#include <shlwapi.h>
#define NO_STDIO_REDIRECT

typedef struct _UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

// LDR_DATA_TABLE_ENTRY without Reserved1[2]: the list links point at InMemoryOrderLinks
typedef struct _LDR_DATA_TABLE_ENTRY_MOD {
    LIST_ENTRY InMemoryOrderLinks;
    PVOID Reserved2[2];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    UNICODE_STRING FullDllName;
    BYTE Reserved4[8];
    PVOID Reserved5[3];
    union {
        ULONG CheckSum;
        PVOID Reserved6;
    };
    ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY_MOD, * PLDR_DATA_TABLE_ENTRY_MOD;

// x86 only: MSVC does not support inline assembly for x64 targets
int main(int argc, char* argv[]) {
    PLIST_ENTRY head = NULL;
    PLDR_DATA_TABLE_ENTRY_MOD lib = NULL;
    _asm {
        mov eax, fs:[0x30]      // EAX = PEB
        mov eax, [eax + 0xC]    // EAX = PEB->Ldr
        lea ecx, [eax + 0x14]   // ECX = &Ldr->InMemoryOrderModuleList (list head)
        mov head, ecx
        mov eax, [ecx]          // EAX = head->Flink (first entry)
        mov lib, eax
    };
    printf("[+] Initialised pointer to first LDR_DATA_TABLE_ENTRY_MOD\n");
    // The list is circular: loop until we reach the list head again
    while ((PLIST_ENTRY)lib != head) {
        printf("[+] %S\n", lib->FullDllName.Buffer);
        lib = (PLDR_DATA_TABLE_ENTRY_MOD)lib->InMemoryOrderLinks.Flink;
    }
    printf("[+] Done!\n");
    return 0;
}
If you are wondering how I am able to access the PEB in the code above, take a look at the inline assembly in the main function, especially the instruction mov eax, fs:[0x30]. FS is a segment register, similar to GS. On 32-bit Windows, FS points to the Thread Environment Block (TEB), which holds thread-specific data, and offset 0x30 of the TEB contains the linear address of the Process Environment Block. 64-bit processes use GS instead, with the PEB pointer stored at gs:[0x60].
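Since the inline assembly only compiles for 32-bit MSVC targets, the pointer chase can also be replayed offline. The Python sketch below walks the same chain (PEB + 0xC, Ldr + 0x14, then the Flink pointers) over a byte buffer standing in for 32-bit process memory. The offsets inside each entry (DllBase at 0x10 and FullDllName at 0x1C relative to InMemoryOrderLinks) are derived from the modified structure above; the function names and the toy memory layout built by `fake_process_memory` are hypothetical. The walk stops when it arrives back at the list head, because the list is circular.

```python
import struct
from typing import List, Tuple

# 32-bit offsets derived from the structures above
PEB_LDR = 0x0C            # PEB->Ldr
LDR_MODULE_LIST = 0x14    # PEB_LDR_DATA.InMemoryOrderModuleList
LINK_DLLBASE = 0x10       # DllBase, relative to InMemoryOrderLinks
LINK_FULLDLLNAME = 0x1C   # FullDllName (UNICODE_STRING), relative to InMemoryOrderLinks


def u16(mem: bytes, addr: int) -> int:
    return struct.unpack_from("<H", mem, addr)[0]


def u32(mem: bytes, addr: int) -> int:
    return struct.unpack_from("<I", mem, addr)[0]


def walk_modules(mem: bytes, peb: int) -> List[Tuple[int, str]]:
    """Return (DllBase, FullDllName) for every entry in InMemoryOrderModuleList."""
    ldr = u32(mem, peb + PEB_LDR)
    head = ldr + LDR_MODULE_LIST
    link = u32(mem, head)                               # head->Flink
    modules = []
    while link != head:                                 # circular list
        base = u32(mem, link + LINK_DLLBASE)
        length = u16(mem, link + LINK_FULLDLLNAME)      # length in bytes
        buffer = u32(mem, link + LINK_FULLDLLNAME + 4)  # PWSTR Buffer
        modules.append((base, mem[buffer:buffer + length].decode("utf-16-le")))
        link = u32(mem, link)                           # Flink
    return modules


def fake_process_memory(modules: List[Tuple[int, str]]) -> Tuple[bytes, int]:
    """Build a toy 32-bit memory image (hypothetical layout) for testing."""
    mem = bytearray(0x1000)
    peb, ldr = 0x100, 0x200
    head = ldr + LDR_MODULE_LIST
    struct.pack_into("<I", mem, peb + PEB_LDR, ldr)
    links = [0x300 + i * 0x100 for i in range(len(modules))]
    struct.pack_into("<I", mem, head, links[0] if links else head)
    for i, (base, name) in enumerate(modules):
        link = links[i]
        nxt = links[i + 1] if i + 1 < len(links) else head
        encoded = name.encode("utf-16-le")
        buffer = link + 0x80
        struct.pack_into("<I", mem, link, nxt)                  # Flink
        struct.pack_into("<I", mem, link + LINK_DLLBASE, base)  # DllBase
        struct.pack_into("<HHI", mem, link + LINK_FULLDLLNAME,
                         len(encoded), len(encoded), buffer)    # UNICODE_STRING
        mem[buffer:buffer + len(encoded)] = encoded
    return bytes(mem), peb
```

For example, `walk_modules(*fake_process_memory([(0x400000, "app.exe")]))` returns `[(4194304, 'app.exe')]`.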
Finally, let's take a look at a real-world example of how the PEB can be abused.
How the MATA Framework abuses PEB
I came across this use case while reverse engineering a Windows variant of the MATA Framework. According to Kaspersky[1], the MATA Framework is used by the Lazarus group and targets multiple platforms.
Malware authors have a strong interest in obfuscation, because it increases the time needed to reverse engineer their code. One way to hide API calls is API Hashing. I have written before about DanaBot's API Hashing[2] and how to overcome it. MATA also uses this technique.
However, instead of using WinAPI calls to retrieve the base addresses of DLLs loaded into memory, MATA abuses the Process Environment Block to fetch them. Let's take a look at how MATA for Windows achieves this:
MATA API Hashing
The APIHashing method takes an integer as its only parameter: the hash of the corresponding API function.

Right after the prologue, it retrieves a pointer to the PEB by reading it from the Thread Environment Block via the segment register GS. Similar to our proof of concept above, MATA then fetches the address of the head of the linked list provided by InMemoryOrderModuleList. Each item of the linked list provides the base address of the corresponding loaded module.
From there, the malware reads the e_lfanew field, which contains the offset to the PE header (IMAGE_NT_HEADERS). By adding the base address, e_lfanew and 0x88, it jumps directly to the data directories of the corresponding PE; in a 64-bit PE, this offset points at the first data directory entry, which describes the export directory. From the data directories, MATA accesses the exported function names in a similar way as I've described in my blog article about DanaBot's API Hashing[3]. The hashing algorithm is fairly simple: in each iteration, the integer value of the next character is added to the hash, and the result is rotated right (ROR) by 0xD. If the final hash matches the input parameter, the address of the function is retrieved. The following figure explains the function at a high level:

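The two steps can be sketched in Python as follows. The export names are passed in as a plain dictionary instead of being parsed from the export table, and details MATA may handle differently, such as the initial hash value, case handling or whether the terminating null byte is hashed, are assumptions; all function names are hypothetical. The 0x88 offset is valid for 64-bit (PE32+) images only.

```python
import struct
from typing import Dict, Optional, Tuple


def ror32(value: int, count: int) -> int:
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def api_hash(name: str) -> int:
    """Add each character, then rotate right by 0xD (order as described above)."""
    h = 0
    for ch in name.encode("ascii"):
        h = ror32(h + ch, 0xD)
    return h


def export_directory(image: bytes) -> Tuple[int, int]:
    """Return (RVA, size) of the export directory of a mapped PE32+ image."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    # 0x88 = 4 (Signature) + 20 (IMAGE_FILE_HEADER) + 0x70 (DataDirectory in OptionalHeader64)
    return struct.unpack_from("<II", image, e_lfanew + 0x88)


def resolve_by_hash(exports: Dict[str, int], wanted: int) -> Optional[int]:
    """Walk the exported names and return the address whose hash matches."""
    for name, address in exports.items():
        if api_hash(name) == wanted:
            return address
    return None
```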
Learning from each other
That's it for this blog article, I hope you enjoyed it! There are probably many more use cases and real-world examples of how the PEB is and can be abused. If you can think of another one, feel free to leave a comment below and share it, so that we can learn from each other!
The DLL Search Order And Hijacking It
If you ever used Process Monitor to track activity of a process, you might have encountered the following pattern:

The image above is a snippet of events captured by Process Monitor during the execution of x32dbg.exe on Windows 7. DNSAPI.DLL and IPHLPAPI.DLL are stored in the System directory, so you might ask yourself:
Why would Windows try to search for either of these DLLs in the application directory first?
Operating systems are very complex, and so is the challenge of implementing a robust, fault-tolerant mechanism to search for dependencies like dynamic-link libraries. Today, we'll talk about DLL Search Order and DLL Search Order Hijacking, in particular how they work and how adversaries can abuse them.
DLL Search Order
First, we have to talk about what happens when a PE file is executed on a Windows system.
The majority of native binaries you encounter on Windows are dynamically linked. This means that, when execution starts, information embedded in the binary (its import table) is used to locate the DLLs that are essential for the process. In contrast to statically linked binaries, a dynamically linked executable uses the libraries provided by the OS instead of having them compiled into the executable itself.
Before the dynamically linked executable can use or load these libraries, it has to know where these dependencies are stored on disk or whether they are already in memory. This is where the DLL Search Order makes its appearance. To keep it simple, we will focus only on Windows desktop applications.
Pre-Checks and In-Memory Search
Before Windows starts searching for the needed DLL on disk, it first attempts to find the module in memory. If a DLL is already in memory, it will not be loaded again. This part is a little bit complicated and out of scope for this blog article, because we would have to define what "loaded" even means. If you are more interested in this first check, I advise you to look up the official Microsoft documentation[1].
If the memory check fails, Windows falls back to a list of known DLLs. If the needed library is part of that list, the system uses its copy of the known DLL. The list of known DLLs is stored in the Windows Registry under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\KnownDLLs.

On-Disk Search
If the first two checks fail, the OS has to search for the DLL on disk. Depending on the OS settings, Windows uses a different search order. By default, Windows enables the Safe DLL Search Mode feature to harden the system against DLL Search Order Hijacking attacks, a technique we will explain in the upcoming section.
The feature is controlled by the following registry value:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SafeDllSearchMode
Let's take a look at how the search order differs depending on whether SafeDllSearchMode is enabled or not.

We clearly see that the current directory is prioritised if SafeDllSearchMode is disabled, and this can be abused by adversaries. The art of abusing this search order flow is called DLL Search Order Hijacking.
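The lookup sequence described in this section can be modelled as a small resolver. The directory order follows Microsoft's documented search order for desktop applications with and without SafeDllSearchMode. Everything else is simplified: the `loaded` and `known_dlls` dictionaries stand in for the in-memory and KnownDLLs checks, a set of lowercase paths stands in for the file system, and manifests and other redirection mechanisms are ignored. All names are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict, List, Optional, Set


def search_order(app_dir: str, current_dir: str, system_dir: str,
                 system16_dir: str, windows_dir: str, path_dirs: List[str],
                 safe_mode: bool = True) -> List[str]:
    """On-disk directory order for desktop applications."""
    if safe_mode:
        # The current directory moves behind the system directories
        return [app_dir, system_dir, system16_dir, windows_dir,
                current_dir, *path_dirs]
    return [app_dir, current_dir, system_dir, system16_dir, windows_dir,
            *path_dirs]


def resolve_dll(name: str, loaded: Dict[str, str], known_dlls: Dict[str, str],
                files_on_disk: Set[str], directories: List[str]) -> Optional[str]:
    key = name.lower()
    if key in loaded:                 # 1. already in memory
        return loaded[key]
    if key in known_dlls:             # 2. KnownDLLs list
        return known_dlls[key]
    for directory in directories:     # 3. on-disk search, first hit wins
        candidate = str(PureWindowsPath(directory) / name)
        if candidate.lower() in files_on_disk:
            return candidate
    return None
```

In both modes the application directory comes first, which is exactly what the hijacking scenario below relies on.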
DLL Search Order Hijacking
Adversaries can abuse the search order flow displayed above to load their own malicious DLLs into memory instead of the legitimate ones. There are many ways this technique can be used; however, it is more effective for achieving persistence on the target system than for initial execution.
Let's take a step back and revisit our example from above:
- x32dbg.exe tries to load DNSAPI.DLL
- DNSAPI.DLL is not in the list of known DLLs and is also not loaded into memory.
- Since SafeDllSearchMode is enabled, it will fall back to the system directory if not found in the application directory
What would happen if we crafted a malicious DLL named DNSAPI.DLL and placed it in the application directory?
We would be able to hijack the search order flow and force a legitimate application to load our malicious code into memory.
Practical Use Case
Let's take a look at a simple practical example. Our application calls LoadLibraryA and tries to load dnsapi.dll, like in our example from above. Next, we craft a small DLL that does nothing but create a message box in its DllMain function. Once the DLL is loaded into memory, DllMain is called and the message box appears.
In the first run, we do not place the crafted DLL in the application directory. As expected, Windows loads dnsapi.dll from the system directory:

Next, we name our crafted DLL dnsapi.dll and place it in the application directory:

Whoops! I'm sure we can all imagine a couple of ways APT groups and malware can abuse this technique to achieve persistence on the victim's system.
Real world examples and APTs
For the sake of simplicity, and to explain the core principles behind this persistence technique, we've built a very simple use case here. Of course, the real world looks a little bit different, and attackers usually have to take into account:
- Endpoint security solutions with behaviour-based detections and signatures that prevent such attacks
- Programmatic dependencies, which won't allow you to just replace a DLL in an application directory and hope that it will work just fine
- and many more
However, if you had never heard of this technique before, I hope I was able to raise some awareness of it!