Normal view

There are new articles available, click to refresh the page.
Before yesterdayMalware

The DLL Search Order And Hijacking It

10 November 2021 at 07:52

If you ever used Process Monitor to track activity of a process, you might have encountered the following pattern:

Figure 1: Example of dnsapi.dll not being found in the application directory

The image above is a snippet from events captured by Process Monitor during the execution of x32dbg.exe on Windows 7. DNSAPI.DLL and IPHLPPAPI.DLL are persisted in the System directory, so you might question yourself:

Why would Windows try to search for either of these DLLs in the application directory first?

Operating Systems are very complex and so is the challenge of implementing an error-fault system to search for dependencies, like dynamic linked libraries. Today, we’ll talk about DLL Search Order and DLL Search Order Hijacking, in particular how it works and how adversaries can abuse it.

DLL Search Order

First, we have to talk about what happens when a PE File is executed on the Windows system.

The majority of native binaries you encounter on Windows are linked dynamically. Linked dynamically means that upon start of the execution, it uses information which are embedded inside the binary to locate DLLs that are essential for this process. In comparison with statically linked binaries, when linked dynamically the executable will use the libraries provided by the OS instead of having them compiled into the executable itself.

Before the dynamically linked executable can use or load these libraries, it will have to know where these dependencies are persisted on disk or if they are already in memory. This is where the DLL Search Order makes its appearance. To keep it simple, we will focus only on Windows Desktop Applications.

Pre-Checks and In-Memory Search

Before the Windows OS starts searching for the needed DLL on disk, it will first attempt to find the needed module in memory. If a DLL is already in memory, it will not loaded it again. Now this part is a little bit complicated and out of context for this blog article, we would have to define what “loaded” even means. If you are more interested in the first check, I advise you to look up the official Microsoft documentation[1].

If the memory check fails, Windows can fall back to using a list of known DLLs. if the needed library is part of that list, it will use the copy of the known DLL. The list of known DLLs are persisted in the Windows Registry.

Figure 2: List of KnownDlls on Windows 7

On-Disk Search

If the first two checks fail, the OS will have to search for the DLL on disk. Depending on the OS Settings, Windows will use a different search order. Per default, Windows enables the DLL Search Mode feature to harden the system and prevent DLL Search Order Hijacking attacks, a technique we will explain in the upcoming section.

The key to the feature is as follows:

  • HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SafeDllSearchMode

Let’s take a look at the differences of the search order depending whether SafeDllSearchMode is enabled or not.

Figure 3: DLL Search Order flow

We clearly see that the current directory is prioritised if SafeDllSearchMode is disabled and this can be abused by adversaries. The art of abusing this search order flow is called DLL Search Order Hijacking.

DLL Search Order Hijacking

Adversaries can abuse the search order flow displayed above to load their own malicious DLLs instead of the legitimate ones into memory. There are many ways this technique can be used. However, it is more effective in achieving persistence on the target system then initial execution.

Let’s take a step back and revisit our example from above:

  • x32dbg.exe tries to load DNSAPI.DLL
  • DNSAPI.DLL is not in the list of known DLLs and is also not loaded into memory.
  • Since SafeDllSearchMode is enabled, it will fall back to the system directory if not found in the application directory

What would happen, if we craft and place a malicious DLL, named DNSAPI.DLL into the application directory?

We would be able to hijack the search order flow and force a legitimate application to load our malicious code into memory.

Practical Use Case

Let’s take a look at a simple practical example. Our application calls LoadLibraryA and tries to load dnsapi.dll like in our example from above. Next we craft a small DLL file, which does nothing else but create a message box in the DLLMain function. Once the DLL is loaded into memory, the main function will be triggered.

In the first run, we do not place the crafted DLL in the application directory. As expected, Windows will load dnsapi.dll from the system directory:

Next, we will now name our crafted DLL dnsapi.dll and place it in the application directory:

Whoops! I think we can all think of a couple use cases of how APT groups and malware can abuse this technique to achieve persistence on the victim’s system.

Real world examples and APTs

For the sake of keeping it simple and explaining the core principles behind this persistence technique, we’ve build a very simple use case here. Of course, the real world looks a little bit different and usually attackers have to take into account:

  • Endpoint Security solutions with behaviour based detections, preventing such attacks with signatures
  • Programmatic dependencies, which won’t allow you to just replace a DLL in an application directory and hope that it will work just fine
  • and many more

However, if you never heard about this technique, I hope I was able to create some awareness for it!

PEB: Where Magic Is Stored

22 August 2021 at 15:49

As a reverse engineer, every now and then you encounter a situation where you dive deeper into the internal structures of an operating system as usual. Be it out of simple curiosity, or because you need to understand how a binary uses specific parts of the operating system in certain ways . One of the more interesting structures in Windows is the Process Environment Block/PEB. In this article, I’d like to introduce you to this structure and talk about various use cases of how adversaries can abuse this structure for their own purposes.

Introducing PEB

The Process Environment Block is a critical structure in the Windows OS, most of its fields are not intended to be used by other than the operating system. It contains data structures that apply across a whole process and is stored in user-mode memory, which makes it accessible for the corresponding process. The structure contains valuable information about the running process, including:

  • whether the process is being debugged or not
  • which modules are loaded into memory
  • the command line used to invoke the process

All these information gives adversaries a number of possibilities to abuse it. The figure below shows the layout of the PEB structure:

typedef struct _PEB {
  BYTE                          Reserved1[2];
  BYTE                          BeingDebugged;
  BYTE                          Reserved2[1];
  PVOID                         Reserved3[2];
  PPEB_LDR_DATA                 Ldr;
  PRTL_USER_PROCESS_PARAMETERS  ProcessParameters;
  PVOID                         Reserved4[3];
  PVOID                         AtlThunkSListPtr;
  PVOID                         Reserved5;
  ULONG                         Reserved6;
  PVOID                         Reserved7;
  ULONG                         Reserved8;
  ULONG                         AtlThunkSListPtr32;
  PVOID                         Reserved9[45];
  BYTE                          Reserved10[96];
  PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
  BYTE                          Reserved11[128];
  PVOID                         Reserved12[1];
  ULONG                         SessionId;
} PEB, *PPEB;

Now that we’ve talked a little bit about the layout and purpose of the structure, let’s take a look at a few use cases.

Reading the BeingDebugged flag

The most obvious way is to check the BeingDebugged to identify, whether a debugger is attached to the process or not. Through reading the variable directly from memory instead of using usual suspects like NtQueryInformationProcess or IsDebuggerPresent, malware can prevent noisy WINAPI calls. This makes it harder to spot this technique.

However, most debuggers already take care of this. X64dbg for example, has an option to hide the Debugger by modifying the PEB structure at start of the debugging session.

Iterating through loaded modules

Another use case, could be iterating the loaded modules and discover DLLs injected into memory with purpose to overwatch the running process. To understand how to achieve this, we need to take a look at the PPEB_LDR_DATA structure included in PEB, which is provided by the Ldr variable:

typedef struct _PEB_LDR_DATA {
  BYTE       Reserved1[8];
  PVOID      Reserved2[3];
  LIST_ENTRY InMemoryOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;

PPEB_LDR_DATA contains the head to a doubly linked list named InMemoryOrderModuleList. Each item in this list is a structure from type LDR_DATA_TABLE_ENTRY, which contains all the information we need to iterate loaded modules. See the structure of LDR_DATA_TABLE_ENTRY below:

typedef struct _LDR_DATA_TABLE_ENTRY {
    PVOID Reserved1[2];
    LIST_ENTRY InMemoryOrderLinks;
    PVOID Reserved2[2];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    UNICODE_STRING FullDllName;
    BYTE Reserved4[8];
    PVOID Reserved5[3];
    union {
        ULONG CheckSum;
        PVOID Reserved6;
    };
    ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;

So by iterating the doubly linked list, we are able to discover the base address and full name of all modules loaded into memory of the running process. The snippet below is a small Proof of Concept. It iterates the linked list and prints the library name to stdout. I created it for the purpose of this blog article. You are free to use it, however I will also upload it to my github repo the upcoming days:

#include <Windows.h>
#include <iostream>
#include <shlwapi.h>


#define NO_STDIO_REDIRECT

typedef struct _UNICODE_STRING
{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING, * PUNICODE_STRING;


typedef struct _LDR_DATA_TABLE_ENTRY_MOD {
    LIST_ENTRY InMemoryOrderLinks;
    PVOID Reserved2[2];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    UNICODE_STRING FullDllName;
    BYTE Reserved4[8];
    PVOID Reserved5[3];
    union {
        ULONG CheckSum;
        PVOID Reserved6;
    };
    ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY_MOD, * PLDR_DATA_TABLE_ENTRY_MOD_MOD;




int main(int argc, char** argv[]){

 
    PLDR_DATA_TABLE_ENTRY_MOD_MOD lib = NULL;
    _asm {
        xor eax, eax
        mov eax, fs:[0x30]
        mov eax, [eax + 0xC]
        mov eax, [eax + 0x14]
        mov lib, eax
    };
    printf("[+] Initialised pointer to first LDR_DATA_TABLE_ENTRY_MOD\n");
    

    // Loop as long as we don't reach the head of the linked list again
    while ( lib->FullDllName.Buffer != NULL ) {

        printf("[+] %S\n", lib->FullDllName.Buffer);
        lib = (PLDR_DATA_TABLE_ENTRY_MOD_MOD)lib->InMemoryOrderLinks.Flink;
    }
    
    printf("[+] Done!\n");



	return 0;

If you are wondering how I am able to access the PEB in the code below, you should take a look at the inline assembly in the main method, especially the instruction mov eax, fs:[0x30]. FS is a segment register, similar to GS. FS can be used to access thread-specific memory. Offset 0x30 allows you to access the linear address of the Process Environment Block.

Finally, we want to take a look at a real world example of how PEB can be abused.

How the MATA Framework abuses PEB

This use case was introduced to me while reverse engineering a Windows variant of the MATA Framework. According to Kaspersky[1], the MATA Framework is used by the Lazarus group and targets multiple platforms.

Malware authors have a high interest in obfuscation, because it increases the time needed to reverse engineer it. One way to hide API calls is to use API Hashing. I have written about Danabot’s API Hashing[2] before and how to overcome it. MATA also uses this technique.

However instead of using the WIN API calls to retrieve the address of DLLs loaded into memory, MATA abuses the Process Environment Block to fetch base addresses. Let’s take a look at how MATA for Windows achieves this:

MATA API Hashing

The input of the APIHashing method takes an integer as the only parameter, this is the hash for the corresponding API call.

Figure 1: Call to APIHash method

Right after the prologue, it retrieves a pointer to PEB by reading it from the Thread Environment Block via the segment register GS. Similar to our proof of concept above, MATA now fetches the address to the head of the linked list provided by InMemoryOrderModuleList. Each item of the linked list provides the DLL base address of the corresponding loaded module.

From there, the malware reads the e_lfanew field, which contains the offset to the file header. By adding the base address, e_lfsanew and 0x88 it jumps directly to the data directories of the corresponding PE. From the data directories, MATA accesses the exported function names in a similar way as I’ve described in my blog article about DanaBot’s API Hashing[3]. The hashing algorithm is fairly simple. Each integer representation of a character is added and the result of the addition is ROR'd by 0xD consecutively each iteration. If the final hash matches the input parameter, the address to the function is retrieved. The following figure explains the function at a high level:

High level overview of API Hashing of MATA malware

Learning from each other

That’s it with the blog article, I hope you enjoyed it! There are probably way more use cases and real world cases of how the PEB is and and can be abused. If you can think of another one, feel free to leave a comment below and share it, so that we can learn from each other!

Catching Debuggers with Section Hashing

24 January 2021 at 10:41

As a Reverse Engineer, you will always have to deal with various anti analysis measures. The amount of possibilities to hamper our work is endless. Not only you will have to deal with code obfuscation to hinder your static analysis, but also tricks to prevent you from debugging the software you want to dig deeper into. I want to present you Section Hashing today.

I will begin by explaining how software breakpoints work internally and then give you an example of a Section Hashing implementation.

Debuggers – How software breakpoints work

When you set a breakpoint in your favourite debugger at a specific instruction, the debugger software will replace it temporarily with another instruction, which causes a fault or an interrupt. On x86, this is very often the INT 3 instruction, which is the opcode 0xCC. We can examine how this looks like in RAM.

We open x32dbg.exe and debug a 32 bit PE and set a breakpoint near the entry point.

Disassembly view of debugged program

When setting a breakpoint, you will see the original instruction instead of the patched one in the debugger. However, we can examine the same memory page in RAM with ProcessHacker.

Code section in RAM during debug session

In volatile memory, the byte 33 changed to CC, which will cause the program to halt when reached. This software interrupt will then be handled by the debugger and the code will be replaced again.

Catching Breakpoints with Section Hashing

After explaining how software breakpoints work, I’ll get to the real topic of this article now. We will move to the Linux world now for this example.

A software breakpoint is actually nothing else than a code modification of the executable memory section in RAM. Once a breakpoint is set, the .text section will be modified. A very known technique to catch such breakpoints in RAM is called Section Hashing.

Authors can embed the hash of the .text section in the binary. Upon execution, they use the same algorithm to generate a new hash from the .text section. If a software breakpoint is set, the hash will differ from the embedded hash. An example implementation can look like this:

Example implementation of Section Hashing

In this case, a hash of the .text section is generated. Afterwards it is used to influence the generation of the flag. If a software breakpoint is set during execution, a wrong hash will be generated.

This is a simple example of Section Hashing. In combination with code obfuscation and other anti analysis measurements, it can be very hard to spot this technique. It is also occasionally used by commercial packers.

Defeating Section Hashing

There are multiple ways to defeat this technique, some of them could be:

  • Patching instructions
  • Using hardware breakpoints

Instead of modifying the code in Random Access Memory, in x86 hardware breakpoints use dedicated registers to halt the execution. Hardware Breakpoints are still detectable.

In Windows, the program can fetch the CONTEXT via GetThreadContext to see if the debugging registers are used. A great example on how this is implemented can be found here[1]. If you are interested in trying to defeat it by yourself, you can try to beat the Section Hashing technique by yourself at root-me.org[2].

Taming Virtual Machine Based Code Protection – 2

26 November 2020 at 14:02

In the last episode …

As you’ve probably guessed it, this is the second part of my journey to reverse engineer a virtual machine protected binary. If you haven’t read the first part[1], I encourage you to do so, because I will not repeat everything again here. While the first part dealt with explaining the virtual environment and giving an initial first look into the virtual machine’s custom instruction set, I will focus on disassembling the virtual machine code completely this time.

I might repeat some steps from the first part again, mostly because I felt that it was necessary to do so :-).

Into the battle

We already explained the environmental setup in the previous blog post and also identified the main loop, which is responsible for instruction execution.

Figure 1: Main loop responsible for instruction execution

Each iteration, an instruction is parsed and the final CALL in the left branch of figure 1 executes the instruction.

Critical functions

I covered the instruction parsing process in my last blog article a little bit. But since we are going to build a disassembler, I will explain the most important routines once again.

0x4013DF / ParseInstruction

This function is called each iteration in the loop from figure 1 and is responsible for parsing the byte codes.

Figure 2: ParseInstruction overview

Each loop, the Virtual Instruction Pointer/VIP is retrieved, pointing at the instruction to execute. Each instruction is parsed. This function is fully responsible for transforming the bytes into a further processable format. Let’s take a look at how the first three instructions are parsed:

Figure 3: Parsing instructions

If you are interested in understanding this format fully, I recommend you to jump to the disassembler code[2]. I will only cover the first instruction here.

So how do we get from 03 15 03 00 04 to the parsed format ?

The first byte is always the instruction id. 03 is the id for the PUSH instruction. The second byte is divided into its upper 6 bits and lower 2 bits, representing the instruction size and number of operands used for this instruction. The next bytes are used to represent a single operand. In the example above, the first operand config 00 03 00 00, is the configuration for USE 32 BIT OF REGISTER, SPECIFIED BY THE NEXT DWORD 04 00 00 00. The next DWORD is 04 00 00 00, which is the fourth virtual register. Now what is the fourth register here ? Let’s take a quick look at the instructions.

PUSH VR4
MOV VR4, VR7
SUB VR7, 0xB4

This looks very similar to the usual function prologue ;-). So the fourth register must be EBP!.

PUSH EBP
MOV EBP, ESP
SUB ESP, 0xB4

0x401271 / GetOpval & 0x401322 / StoreOpval

I will not cover these two functions in depth here. If you take a look at figure 3 again, you will see that I mention the operand configs. These functions are responsible for filling the operands according to these configs.

In the example above, the SUB VR7, 0xB4 instruction uses 00030000 07000000 for the first operand and 00020000 B4000000 for the second config. If you reverse engineer every single option, you will find out that the following configurations exist:

# First DWORD CONFIG
00000000 ==> LOWEST BYTE OF REG X # f.e AX
00010000 ==> SECOND LOWEST BYTE OF REG X # f.e. AH 
00020000 ==> LOWER 16 BIT OF REG X # f.e. AX
00030000 ==> 32 BIT OF REGX # f.e. EAX
01000000 ==> BYTE AT LOC
01010000 ==> BYTE AT LOC
01020000 ==> WORD AT LOC
01030000 ==> DWORD AT LOC
02000000 == BYTE FROM IMM.
02010000 ==> BYTE FROM IMM.
02020000 ==> WORD FROM IMM.
02030000 ==> DWORD FROM IMM.
# Second DWORD CONFIG, if register
00000000 ==> EAX
01000000 ==> EBX
02000000 ==> ECX
03000000 ==> EDX
04000000 ==> EBP
05000000 ==> ESI
06000000 ==> EDI
07000000 ==> ESP

Eternal Debugging

Now we can use the gained knowledge to gain an initial understanding of what is happening and to verify whether we are able to decode instructions manually.

Figure 4: Manually disassembled bytecode

If you take a look at the last instructions, you will see that there are some constants pushed into memory. If you google these constants, you will come to the conclusion that this must be the MD5 Init routine[3]. The next step is to build a disassembler.

Disassembling the code

I wrote this one in C++ and you can find the source code to it on my github page[4]. Writing this on Python would have been possible too … and probably a lot easier and faster, I chose C++ though for learning purposes. If my C++ is awful, forgive me. We all start somewhere ;-).

Figure 5: Output of decoded virtual machine bytes

Our disassembler does have some limitations though. The disassembly was complex and I believe that some memory address offsets and register sizes are wrong. Also, I did not reverse engineer all instructions. However though, that should not be a problem, because we only need to understand what is happening here on a higher level.

Identifying the algorithm

We already spotted the variables, which we also found in the MD5.c source code(f.e. 0x2381bc0). However, the actual hashing algorithm does not match the original one. Therefore it seems to be some kind of a modified version of it. Furthermore we spot a routine, which seems to be the XTEA algorithm[5].

Figure 6: Identified XTEA algorithm

Final words

So that’s basically it. I don’t know when and if I will a third part covering the serial key generator. When I started this challenge, I was only interested in learning how to disassemble custom instruction sets.

If you are interested in how others solved this challenge, I recommend you to read the tutorials from wagonono and kernelj, they both completely solved this challenge[6]. Wagonono also created a disassembler and his version is better than mine.

DGAs – Generating domains dynamically

5 November 2020 at 09:49

A domain generation algorithm is a routine/program that generates a domain dynamically. Think of the following example:

An actor registers the domain evil.com. The corresponding backdoor has this domain hardcoded into its code. Once the attacker infects a target with this malware, it will start contacting its C2 server.

As soon as a security company obtains the malware, it might blacklist the registered domain evil.com. This will hinder any attempts of the malware to receive commands from the original C2.

If a domain generation algorithm would have been used, the domain will be generated based on a seed. The current date for example is a popular seed amongst malware authors. A simple domain blacklisting would not solve the problem. The security company will have to resort to different methods.

By generating domains dynamically, it is harder for defenders to hinder the malware from contacting its C2 server. It will be necessary to understand the algorithm.

Example implementation of a DGA

A quick & dirty implementation(loosely based on Wikipedia)[1] of such algorithm could look like this:

"""Example implementation of a domain generation algorithm."""

import sys
import time
import random


def gen_domain(month, day, hour, minute):
    """Generate the domain based on time. Return domain"""
    print(
        f"[+] Gen domain based on month={month} day={day} hour={hour} min={minute}")
    domain = ""
    for i in range(8):
        month = (((month * 8) ^ 0xF))
        day = (((day * 8) ^ 0xF))
        hour = (((hour * 8) ^ 0xF))
        minute = (((minute * 8) ^ 0xF))
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain


try:
    while True:
        d = gen_domain(random.randint(1, 12), random.randint(1, 30),
                       random.randint(0, 24), random.randint(0, 60))
        print(f"[+] Generated domain = {d}")
        time.sleep(5)
except KeyboardInterrupt:
    sys.exit()

Our DGA algorithm would use the current date and time as a seed. Each parameter is multiplied with 8 and XOR’d with 0xF. Finally all four values are multiplied with each other. The final operations are used to make sure that we generate a character in small caps. The output of this program looks like this:

[+] Gen domain based on month=12 day=2 hour=4 min=4
[+] Generated domain = taavtaab.com
[+] Gen domain based on month=3 day=10 hour=11 min=36
[+] Generated domain = kugxfkvx.com
[+] Gen domain based on month=2 day=27 hour=4 min=1
[+] Generated domain = kaasuapn.com

Seed or Dictionary based

There are different main approaches when implementing a domain generation algorithm. For the sake of keeping this simple, we will not focus on the hybrid approach.

Different kinds of approaches

Seed based Approach

We already introduced the first one. Our implementation is an algorithm based on a seed, which is served as an input. Another example I can provide, is how APT34 used such seed based algorithm in a campaign targeting a government organisation in the Middle East. The campaign was discovered by FireEye[2].

The mentioned APT group used domain generation algorithms in one of their downloaders. The Downloader was named BONDUPDATER by FireEye and is implemented in the Powershell Scripting Language.

BONDLOADER DGA algorithm

The first 12 chars of the UUID is extracted. Next the program runs into a loop. Each iteration a new random number is generated and the domain is generated by concatenating hardcoded, as well as generated values. GetHostAddresses will try to resolve the generated domain. If it fails, a new iteration starts. Once a registered domain is generated and resolved, it will break the loop.

Depending on the resolved ip address, the script will trigger different actions.

Dictionary based Approach

The second approach is to create a dictionary based domain generation algorithm. Instead of focusing on a seed, a list of words could be provided. The algorithm randomly selects words from these lists, concatenates them and generates a new domain. Suppobox[3] is a malware, which implemented the dictionary based approach[4].

Defeating Domain Generation Algorithms

The straight forward way to counter these algorithms is to reverse engineer the routine and to predict future domains. One famous case of predicting future domains is the takedown of the Necurs Botnet by Microsoft[5]. By understanding the DGA, they were able to predict the domains for the next 25 months.

I am not a ML magician. However, just a quick google research shows that there is a lot research going on. Machine Learning based approaches to counter DGAs seems to be promising too.

Linux/Windows Internals – Process structures

9 August 2020 at 13:43

Having an overview of the running processes on the operating system is something we usually take for granted. We can’t think of working without fundamental features like that.

But how does the kernel keep track of the processes, which are currently running ? Today, we take a look at the corresponding structures of the Windows and the Linux system, which are responsible for holding track of the running processes.

Linux – Task structures

If you ever used Linux before, you are probably familiar with the ps command, which allows you to print the list of all processes currently running on the system. We will dive into how the Linux kernel keeps track of these processes internally.

The kernel stores a list of processes in a doubly linked list, called the task list. Each node in this list is a process descriptor of the type task_struct. The definition of this task struct can be found in linux/sched.h[1] of Linus Torvald’s git repository.

Some struct members of task_struct

If you checked out the code, you will realise that this structure is pretty extensive and we will not dive into every member of this structure. Our focus lies on understanding how the kernel handles this task list. As I’ve already explained, the kernel keeps track of all processes by a doubly linked list. Each task structure holds a member tasks of type list_head.

struct list_head {
    struct list_head *next, *prev;
};

As you’ve probably already guessed, the next pointer holds a reference, which allows us to retrieve the next task_struct and the prev field allows us to take a step back. We can write a simple to linux kernel module to iterate through the task list and print out all process names and process ids on the current system:

Iterating through the linked list

Task structures lie in kernel space, so accessing these is not possible without writing a kernel module. The code is pretty straight forward. We just use the init_task as an initial entry point, which is the idle task running on the linux system. Iterating through the linked list is possible via the next_task macro. Then we use the printk function to log the comm(process executable) member and the process id.

#include <linux/sched/task.h> 
#include <linux/sched/signal.h>
#include <linux/module.h>    
#include <linux/kernel.h>    
#include <linux/init.h>      

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Andreas Klopsch");
MODULE_DESCRIPTION("Simple module for printing task structure members");
MODULE_VERSION("0.1");
  
// get the top element in the task doubly linked list
extern struct task_struct init_task;


static int __init action_init(void){
	struct task_struct task;
	printk(KERN_INFO "Init task = %s", init_task.comm);
	printk(KERN_INFO "Getting next task");
	task = *(next_task(&init_task)); // deference pointer for convencience reasons
	while(task.pid != init_task.pid) {
		printk(KERN_INFO "Comm = %s pid = %d", task.comm, task.pid);
		task = *(next_task(&task)); // dereference again, use macro to not iterate through list_head
	}
	return 0;
 }
 

static void __exit action_exit(void){
	printk(KERN_INFO "Stopping task iterator");
}
 
module_init(action_init);
module_exit(action_exit);

dmesg output

Windows – EPROCESS

On Windows, there are similarities with Linux. Each process on Windows is represented by an EPROCESS structure, which is actually the representation of a process object. The EPROCESS structure also contains a KPROCESS structure, which holds information for the kernel.

As with Linux, this block contains various information relating to the corresponding process, like:

  • Virtual Address Descriptors, holding the map of the process virtual memory
  • Process ID
  • Image base name

Another similarity with the Linux system, is the way the processes are linked with each other. EPROCESS structures are connected to each other via a doubly linked list, called ActiveProcessLinks. The next process in the list is referenced by FLink and the previous process object is referenced by the BLink pointer. One way of how this could be implemented, is iterating through the ActiveProcessLinks structure again.

References

  • Windows Internals, Part 1: System Architecture, Processes, Threads, Memory Management, and More
  • Mastering Malware Analysis: The complete malware analyst’s guide to combating malicious software, APT, cybercrime, and IoT attacks 

Deobfuscating DanaBot’s API Hashing

12 July 2020 at 13:27

You probably already guessed it from the title’s name, API Hashing is used to obfuscate a binary in order to hide API names from static analysis tools, hindering a reverse engineer to understand the malware’s functionality.
A first approach to get an idea of an executable’s functionalities is to more or less dive through the functions and look out for API calls. If, for example a CreateFileW function is called in a specific subroutine, it probably means that cross references or the routine itself implement some file handling functionalities. This won’t be possible if API Hashing is used.

Instead of calling the function directly, each API call has a corresponding checksum/hash. A hardcoded hash value might be retrieved and for each library function a checksum is computed. If the computed value matches the hash value we compare it against, we found our target.

API Hashing used by DanaBot

In this case a reverse engineer needs to choose a different path to analyse the binary or deobfuscate it. This blog article will cover how the DanaBot banking trojan implements API Hashing and possibly the easiest way on how this can be defeated. The SHA256of the binary I am dissecting here is added at the end of this blog post.

Deep diving into DanaBot

DanaBot itself is a banking trojan and has been around since atleast 2018 and was first discovered by ESET[1]. It is worth mentioning that it implements most of its functionalities in plugins, which are downloaded from the C2 server. I will focus on deobfuscating API Hashing in the first stage of DanaBot, a DLL which is dropped and persisted on the system, used to download further plugins.

Reversing the ResolvFuncHash routine

At the beginning of the function, the EAX register stores a pointer to the DOS header of the Dynamic Linked Library which, contains the function the binary wants to call. The corresponding hash of the yet unknown API function is stored in the EDX register. The routine also contains a pile of junk instructions, obfuscating the actual use case for this function.

The hash is computed solely from the function name, so the first step is to get a pointer to all function names of the target library. Each DLL contains a table with all exported functions, which are loaded into memory. This Export Directory is always the first entry in the Data Directory array. The PE file format and its headers contain enough information to reach this mentioned directory by parsing header structures:

Cycling through the PE headers to obtain the ExportDirectory and AddressOfNames

In the picture below, you can see an example of the mentioned junk instructions, as well as the critical block, which compares the computed hash with the checksum of the function we want to call. The routine iterates through all function names in the Export Directory and calculates the hash.
The loop breaks once the computed hash matches the value that is stored in the EDX register since the beginning of this routine.

Graph overview of obfuscated API Hashing function

Reversing the hashing algorithm

The hashing algorithm is fairly simple and nothing too complicated. Junk instructions and opaque predicates complicate the process of reversing this routine.

The algorithm takes the nth and the stringLength-n-1th char of the function name and stores them, as well as capitalised versions into memory, resulting in a total of 4 characters. Each one of those characters is XOR'd with the string length. Finally they are multiplied and the values ​​are added up each time the loop is run and result in the hash value.

def get_hash(funcname):
    """Calculate the hash value for function name. Return hash value as integer"""
    strlen = len(funcname)
    # if the length is even, we encounter a different behaviour
    i = 0
    hashv = 0x0
    while i < strlen:
        if i == (strlen - 1):
            ch1 = funcname[0]
        else:
            ch1 = funcname[strlen - 2 - i]
        # init first character and capitalize it
        ch = funcname[i]
        uc_ch = ch.capitalize()
        # Capitalize the second character
        uc_ch1 = ch1.capitalize()
        # Calculate all XOR values
        xor_ch = ord(ch) ^ strlen
        xor_uc_ch = ord(uc_ch) ^ strlen
        xor_ch1 = ord(ch1) ^ strlen
        xor_uc_ch1 = ord(uc_ch1) ^ strlen
        # do the multiplication and XOR again with upper case character1
        hashv += ((xor_ch * xor_ch1) * xor_uc_ch)
        hashv = hashv ^ xor_uc_ch1
        i += 1
    return hashv

A python script for calculating the hash for a given function name is also uploaded on my github page[2] and free for everyone to use. I’ve also uploaded a text file with hashes for exported functions of commonly used DLLs.

Deobfuscation by Commenting

So now that we cracked the algorithm, we want to update our disassembly to know which hash value represents which function. As I’ve already mentioned, we want to focus on simplicity. The easiest way is to compute hash values for exported functions of commonly used DLLs and write them into a file.

Generated hashes

With this file, we can write an IdaPython script to comment the library function name next to the Api Hashing call. Luckily the Api Hashing function is always called with the same pattern:

  • Move the wanted hash value into the EDX register
  • Move a DWORD into EAX register

First we retrieve all XRefs of the Api Hashing function. Each XRef will contain an address where the Api Hashing function is called at, which means that in atleast the 5 previous instructions, we will find the mentioned pattern. So we will fetch the previous instruction until we extract the wanted hash value, which is being pushed into EDX. Finally we can use this immediate to extract the corresponding api function from the hash values we have generated before and comment the function name next to the Xref address.

def add_comment(addr, hashv, api_table):
    """Write a comment at addr with the matching api function.Return True if a corresponding api hash was found."""
    # remove the "h" at the end of the string
    hashv = hex(int(hashv[:-1], 16))
    keys = api_table.keys()
    if hashv in keys:
        apifunc = api_table[hashv]
        print "Found ApiFunction = %s. Adding comment." % (apifunc,)
        idc.MakeComm(addr, apifunc)
        comment_added = True
    else:
        print "Api function for hash = %s not found" % (hashv,)
        comment_added = False
    return comment_added


def main():
    """Main"""
    f = open(
        "C:\\Users\\luffy\\Desktop\\Danabot\\05-07-2020\\Utils\\danabot_hash_table.txt", "r")
    lines = f.readlines()
    f.close()
    api_table = get_api_table(lines)
    i = 0
    ii = 0
    for xref in idautils.XrefsTo(0x2f2858):
        i += 1
        currentaddr = xref.frm
        addr_minus = currentaddr - 0x10
        while currentaddr >= addr_minus:
            currentaddr = PrevHead(currentaddr)
            is_mov = GetMnem(currentaddr) == "mov"
            if is_mov:
                dst_is_edx = GetOpnd(currentaddr, 0) == "edx"
                # needs to be edx register to match pattern
                if dst_is_edx:
                    src = GetOpnd(currentaddr, 1)
                    # immediate always ends with 'h' in IDA
                    if src.endswith("h"):
                        add_comment(xref.frm, src, api_table)
                        ii += 1
    print "Total xrefs found %d" % (i,)
    print "Total api hash functions deobfuscated %d" % (ii,)


if __name__ == '__main__':
    main()

Conclusion

As reverse engineers, we will probably continue to encounter Api Hashing in various different ways. I hope I was able to show you some quick & dirty method or give you at least some fundament on how to beat this obfuscation technique. I also hope that, the next time a blue team fellow has to analyse DanaBot, this article might become handy to him and saves him some time reverse engineering this banking trojan.

IoCs

  • Dropper = e444e98ee06dc0e26cae8aa57a0cddab7b050db22d3002bd2b0da47d4fd5d78c
  • DLL = cde01a2eeb558545c57d5c71c75e9a3b70d71ea6bbeda790a0b871fcb1b76f49

UpnP – Messing up Security since years

21 June 2020 at 08:01

UpnP is a set of networking protocols to permit network devices to discover each other’s presence on a network and establish services for various functionalities.
Too lazy to port forward yourself ? Just enable UpnP to automatically establish working configurations with devices! Dynamic device configuration like this makes our life more comfortable for sure. Sadly it also comes with many security issues.

In this blog article I am focusing on mentioning the stages of the UpnP protocol, a quick introduction to security issues regarding UpnP and how QBot abuses the UpnP protocol to exploit devices as proxy C2 servers.

UpnP in a nutshell

UpnP takes usage of common networking protocols and stacks HTTP, SOAP and XML on top of the IP protocol in order to provide a variety of functionalities for users. Without going to deep into how UpnP works in detail, the following figure is enough for the basics.

Quick explanation of existing stages in UpnP protocol

Some services a node with UpnP enabled can offer (it really depends on the device):

  • Port forwarding
  • Switching power on and off for light bulbs
  • etc.

This is very high level of course. If you are interested in everything about UpnP, I recommend you to check out Wikipedia[1] for a high level introduction or read this report that goes more into detail[2].

For the following content of this blog article, only the first three stages are really relevant.

IoT Security and UpnP

Misconfiguration

Again, while it might be very convenient for customers to have devices autoconfigure themselves, it leads to huge security risks.

Many routers have UpnP enabled by default. Think of misconfigured IoT devices that sends a command to port forward a specific port, leading to a port exposure to the internet.

It is known that many IoT devices contain awful security flaws like default credentials for telnet. If devices like this have such misconfigurations and expose its telnet port to the outside, it probably takes about 5 minutes till some script kiddie adds this device to its botnet.

Exploitation

A blog post from TrendMicro[3] previously mentioned that many devices still use very old UpnP libraries which are not up to date to current security standards. This creates a larger attack surface for attackers. The newest one being CallStranger.

source : https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-12695

It is caused by the Callback header value in the UpnP SUBSCRIBE function. This field can be controlled by an attacker and enabled a Server Side Request Forgery like vulnerability. It can be used for the following malicious cases:

  • Exfilitrate data
  • Scan networks
  • Force nodes to participate in DDoS attacks

I recommend you to visit the official domain[4] of this vulnerability, if you want gain more knowledge about this vulnerability.

UpnP abused by QBot

Security risks created by UpnP are not limited to the IoT landscape of course.

Another method to use UpnP for malicious cases is to install Proxy C2 servers on devices which have the mentioned protocol enabled, like QBot does for example. Let’s take a look at how this is done.

Diving into QBot’s UpnP proxy module

This technique was first discovered by McAfee[4] in 2017. First QBot starts scanning for devices which have UpnP enabled and is one of the following device types:

  • urn:schemas-upnp-org:device:InternetGatewayDevice:1
  • urn:schemas-upnp-org:service:WANIPConnection:1
  • urn:schemas-upnp-org:service:WANPPPConnection:1
  • upnp:rootdevice
Disassembly of strcmp calls to check for device type

If you are using INETSIM for malware analysis, you will probably realise that it does not offer any functionality to fake a SSDP or UpnP service in any way. However, we can use this python script[5] by user GrahamCobb which emulates a fake SSDP service and adjust the device description to suit our needs.

Once the devices are discovered, it sends requests for device descriptions and checks whether it deals with an internet gateway device. This can be determined by looking at the device description itself.

Capture SSDP traffic, showing the MSEARCH request and retrieval of the device description

If it is an internet gateway device, it confirms whether a connection exists by sending a GetStatusInfo followed by retrieving the external ip address of this device by sending the GetExternalIPAddress command.

Next it tries to use the AddPortMapping command to add port forwarding rules to the device.

Port forwarding command sent to fake SSDP service

Afterwards all rules are removed again and the ports which were successfully port forwarded are sent as a HTTP-POST to the C2 server.
The carrier protocol is HTTPS and the response is sent in the following form:

# destination address
https://[HARDCODED_IP]:[HARDCODED_PORT]/bot_serv

# POST DATA form, successful port forwarded ports are appended to ports
cmd=1&msg=%s&ports=

From this point on, my analysis stopped for now. However, McAfee explains that a new binary is downloaded from the contacted C2 server, which re-adds the port forwarding rules and is responsible for the C2 communication. The blog article I’ve referenced above explains the whole functionality, so I recommend you to take a look at it, if you are interested in the next steps.

Final Words

As you can see UpnP contains many security flaws and can lead to a compromised network. If you have UpnP enabled in your company’s network, I really recommend to check whether this is really needed and turn it off if it is not necessary.

So exams at university are coming up next, it will probably take some time until I can get my hands on the QBot C2 protocol or the proxy binary. I do however, want to look at these two functionalities next.

Taming Virtual Machine Based Code Protection – 1

7 June 2020 at 09:09

Overcoming obfuscation in binaries has always been an interesting topic for me, especially in combination with malware. Over the last weeks I’ve been playing around with Virtualised Code Protection in order to see how well I could handle it.

I decided to download a simple crack-me challenge which is obfuscated with this technique. It takes me some time to reverse everything, so there will be atleast 2 blog articles about my little project.

Challenge from crackmes.de

Virtualised Code Protection

Each architecture has a defined instruction set. By looking up the instructions to the corresponding bytes, we are able to translate these bytes into disassembly. The unit that actually executes these bytes is the CPU.

Virtual machine based code protection emulates a processor and thus switches our usual instruction set against a custom one. So in order to really understand what a virtual machine hardened binary is doing on a low level basis, we need to reverse the virtual machine first. This means we have to understand the custom instruction set.

I want to show you a practical example of how such a custom instruction can look like and be discovered.

Practical Example

Preparing the virtual machine

The challenge demands a serial key and a username. Both of them need certain values for the serial key to be valid. After entering a username and a serial key, the length of both of them are checked first.

Next At the bottom of this routine, we can already spot 2 interesting functions and operations which push the success or failure message onto the stack.

Preparing the virtual machine and jumping to the serial key check

The function InitialiseVM is where it gets interesting for us. If you just look quickly through the disassembly in the figure below, you will see that there are multiple buffers allocated and static values written into an internal structure. Furthermore it is filled with function pointers. Each one of those functions represents a custom instruction. This routine is used to allocate the virtual address space our virtual machine will use for emulation, as well as a table to select custom instructions from.

InitialiseVM function

Next is the CheckSerial function, which implements the virtual machine loop that emulates the virtual processor unit.

Virtual machine loop at the bottom

In the block at loc_4015E5 the function sub_4013DF is executed each iteration. Afterwards the byte which the address in ESI+0x7C points to is used to calculate the dynamic call at the end of the current block we are talking about (call dword ptr [esi+eax*4+80h]). That means that the byte influencing which function to enter, is deciding which custom instruction to execute. Before we look at how some of the opcodes are actually parsed here, let’s review how the virtualised address space of this VM looks like.

Overview of the current vm address space

Executing custom instructions

The function sub_4013DF is called each iteration and reads bytes from the buffer which contains opcodes for custom instructions. The first one has a size of 5 bytes. Each of them is used by the virtual machine for translating these opcodes into a valid operation. At the moment of writing this article, I did not fully explore this function yet. However, I am confident that the last 2 bytes of an instruction are used to influence registers.

Upon returning from this function, the program takes the first byte of the ESI+0x7C structure and uses it to determine which function from the previously allocated function table is called. The first run returns EAX=3, so we are dealing with the custom instruction with instruction id 3.

Let’s jump into our first custom instruction.

Overview of function representing instruction id 3

The function sub_401271 has 31 XRefs and is used in every function from the function table. Before the function is called, the pointer to ESI+7C, our 0x24 buffer holding the custom opcodes are retrieved.0xC is added, that means we are pointing at the byte at ESI+7C+0xC, the 4th DWORD in this buffer.

The routine accesses the third byte of the current opcode and is responsible for determining the instruction type. The first four bits decide wether it is an instruction utilizing 2 registers, a memory read or moving an immediate value into a register. The second 4 bits influence the size of the byte that will be moved around. These 4 bits are zero extended into bytes.

Take a look at the figure below. The result of our InstrType function is saved in ebp+0x4. Next the memory address which ESI+0x20 points at is decreased and filled with the value we just computed. Doesn’t this look familiar ? The stack is also decreased if we put data onto it.

Block decreasing the virtual stack and writing the result into it

It seems that the custom instruction we just investigated is a custom PUSH instruction. ESI+0x20 points to the virtual stack that is emulated by this virtual machine. Since the pointer at ESI+0x4C is increased here after an instruction, it might hold the virtual instruction pointer.

So far we figured out what the first 3 opcodes do and we have an idea what the last 2 ones are responsible for. In order to give a proper answer on how they are used, it is needed to look at more than just 1 virtual instruction execution.

Final thoughts regarding opcodes

Conclusion

So it just took me a complete blog article to really explain how to reverse a single custom instruction of a binary hardened with Virtualised Code Protection ;-). As you can see, this kind of software protection is very powerful.

I will finish this challenge for sure and will write a second blog article about how I solved it.

Examining Smokeloader’s Anti Hooking technique

24 May 2020 at 09:22

Hooking is a technique to intercept function calls/messages or events passed between software, or in this case malware. The technique can be used for malicious, as well as defensive cases.

Rootkits for example can hook API calls to make themselves invisible from analysis tools, while we as defenders can use hooking to gain more knowledge of malware or build detection mechanisms to protect customers.

Cybersecurity continues to be a game of cat and mouses, and while we try to build protections, blackhats will always try to bypass these protection mechanisms. Today I want to show you how SmokeLoader bypasses hooks on ntdll.dll and how Frida can be used to hook library functions.

The bypass was also already explained in a blog article from Checkpoint[1] written by Israel Gubi. It also covers a lot more than I do regarding Smokeloader, so it is definitely worth reading too.

Hooking with Frida

If you’ve read my previous blog articles about QBot, you are familiar with the process iteration and AV detection[3]. It iterates over processes and compares the process name with entries in a black list containing process names of common AV products. If one process name matches with an entry, QBot quits its execution.

Frida is a Dynamic Instrumentation Toolkit which can be used to write dynamic analysis scripts in high level languages, in this case JavaScript. If you want to know more about this technology, I advice you to read to visit this website[4] and read its documentation.

We can write a small Frida script to hook the lstrcmpiA function in order to investigate which process names are in the black list.

def main():
    """Main."""
    # argv[1] is our malware sample
    pid = frida.spawn(sys.argv[1])
    sess = frida.attach(pid)
    script = sess.create_script("""
        console.log("[+] Starting Frida script")
        var lstrcmpiA = ptr("0x76B43E8E")
        console.log("[+] Hooking lstrcmpiA at " + lstrcmpiA)
        Interceptor.attach(lstrcmpiA, {
            onEnter: function(args) {
                console.log("[+][+] Called strcmpiA");
                console.log("[+][+] Arg1Addr = " + args[0]);
                console.log("[+][+] Buffer");
                pretty_print(args[0], 0x30);
                console.log("[+][+] Arg2Addr = " + args[1]);
                console.log("[+][+] Buffer");
                pretty_print(args[1], 0x30);
            },
            onLeave: function(retval) {
                console.log("[+][+] Returned from strcmpiA")
            }
        });

        function pretty_print(addr, sz) {
            var bufptr = ptr(addr);
            var bytearr = Memory.readByteArray(bufptr, sz);
            console.log(bytearr);
        };

        """)
    script.load()
    frida.resume(pid)
    sys.stdin.read()
    sess.detach()

We attach to the malicious process and hook the lstrcmpiA function at static address. When analysing malware, we have (most of the time) the privilege to control and adjust our environment as much as we want. If you turn off ASLR and use snapshots, using Frida with static pointers is pretty convenient, because most functions will always have the same address. However, it’s also possible to calculate the addresses dynamically. lstrcmpiA has 2 arguments, which are both pointers of type LPSTR. So we just resolve the pointers, fill 0x30 bytes starting at pointer address into a ByteArray and print it.

Result of Frida Script

Smokeloader’s Anti Hooking technique

So how does Smokeloader bypass hooks? Well it can do it atleast for the ntdll.dll library. During execution Smokeloader retrieves the Temp folder path and generates a random name. If a file with the generated name already exists in the temp folder, it is deleted with DeleteFileW.

drltrace output DeleteFileW call, deleting 9A26.tmp in Temp Folder

Next the original ntdll.dll file is copied from system32 to the temp folder with the exact name it just generated. This leads to a copy of this mentioned library being placed in the temp directory.

Meta data of disguised ntdll.dll
Export functions of the disguised ntdll file

Instead of loading the real ntdll.dll file, the copy is loaded into memory by calling LdrLoadDll.

9A26.tmp as ntdll.dll

Most AV vendors, as well as analysts probably implemented their hooks on ntdll.dll, so the references to the copied ntdll.dll file will be missed.

Smokeloader continues to call functions from this copied DLL, using for example function calls like NtQueryInformationProcess to detect wether a debugger is attached to it.

Final Words

While analysing SmokeLoader at work, I stumbled across this AntiHook mechanism, which I haven’t seen before, so I wanted to share it here :-).


I’ve also only scratched on the surface of what Frida is capable of. I might work on something more complex next time.

❌
❌