Threat Research

FLARE Script Series: Automating Obfuscated String Decoding

28 December 2015 at 14:01
Introduction

We are expanding our script series beyond IDA Pro. This post extends the FireEye Labs Advanced Reverse Engineering (FLARE) script series to an invaluable tool for the reverse engineer: the debugger. Just like IDA Pro, debuggers have scripting interfaces. For example, OllyDbg uses an asm-like scripting language, the Immunity debugger contains a Python interface, and Windbg has its own language. None of these options is ideal for rapidly creating string decoding debugger scripts. Both Immunity and OllyDbg only support 32-bit applications, and Windbg’s scripting language is specific to Windbg and, therefore, not as well-known. The pykd project was created to interface between Python and Windbg so that debugger scripts can be written in Python. Because malware reverse engineers love Python, we built our debugger scripting library on top of pykd for Windbg.

Here we release a library we call flare-dbg. This library provides several utility classes and functions to rapidly develop scripts to automate debugging tasks within Windbg. Stay tuned for future blog posts that will describe additional uses for debugger scripts!

String Decoding

Malware authors like to hide the intent of their software by obfuscating their strings. Deobfuscating those strings quickly lets you figure out what the malware is doing.

As stated in Practical Malware Analysis, there are generally two approaches to deobfuscating strings: self-decoding and manual programming. The self-decoding approach allows the malware to decode its own strings. Manual programming requires the reverse engineer to reimplement the decoding function logic. A subset of the self-decoding approach is emulation, where the execution of each assembly instruction is emulated. Unfortunately, emulation also requires emulating library calls, which is difficult and may produce inaccurate results. In contrast, a debugger is attached to the actual running process, so all the library functions run without issue. Each of these approaches has its place, but this post teaches a way to use debugger scripting to automatically self-decode all obfuscated strings.

Challenge

To decode all obfuscated strings, we need to find the string decoder function, every location where it is called, and the arguments passed at each of those call sites. We then need to run the function and read out the result. The challenge is to do this in a semi-automated way.

Approach

The first task is to find the string decoder function and get a basic understanding of its inputs and outputs. The next task is to identify each call to the string decoder function and all of the arguments to each call. For binary analysis without IDA, Vivisect is a handy Python project. Vivisect contains several heuristics for identifying functions and cross-references. Additionally, Vivisect can emulate and disassemble a series of opcodes, which can help us identify function arguments. If you haven’t already, be sure to check out the FLARE script series post on tracking function arguments using emulation, which also uses Vivisect.
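As a rough illustration of what recovering push arguments involves, the following sketch walks backwards from a call over an already-decoded instruction list. It does not use Vivisect: the `(address, mnemonic, operand)` tuple format and the `recover_push_args` helper are hypothetical, and only immediate push operands are handled, whereas an emulator would also track registers and stack adjustments.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical simplified instruction record: (address, mnemonic, immediate operand or None).
Insn = Tuple[int, str, Optional[int]]


def recover_push_args(insns: Sequence[Insn], call_va: int, nargs: int) -> Optional[List[int]]:
    """Return the first `nargs` stack arguments of the call at `call_va`, or None.

    Arguments are pushed right to left, so walking backwards from the call
    yields the first argument first. Anything other than an immediate push
    makes this sketch give up, where a real emulator would keep going.
    """
    for idx, (va, _mnem, _op) in enumerate(insns):
        if va == call_va:
            break
    else:
        raise ValueError("call address not found")

    args: List[int] = []
    for _va, mnem, operand in reversed(insns[:idx]):
        if len(args) == nargs:
            break
        if mnem != "push" or operand is None:
            return None
        args.append(operand)
    return args if len(args) == nargs else None


# Hypothetical call site: length, output buffer, and encoded-string offset are pushed.
insns = [
    (0x401100, "push", 0x3A),
    (0x401102, "push", 0x12FF00),
    (0x401107, "push", 0x403000),
    (0x40110C, "call", 0x401000),
]
print([hex(a) for a in recover_push_args(insns, 0x40110C, 3)])  # ['0x403000', '0x12ff00', '0x3a']
```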

Introducing flare-dbg

The FLARE team is introducing a Python project, flare-dbg, that runs on top of pykd. Its goal is to make Windbg scripting easy. The heart of the flare-dbg project is the DebugUtils class, which contains several functions to handle:

·      Memory and register manipulation
·      Stack operations
·      Debugger execution
·      Breakpoints
·      Function calling

In addition to the basic debugger utility functions, the DebugUtils class uses Vivisect to handle the binary analysis portion.

Example

I wrote a simple piece of malware that hides strings by encoding them. Figure 1 shows an HTTP User-Agent string being decoded by a function I named string_decoder.

Figure 1: String decoder function reference in IDA Pro

After a cursory look at the string_decoder function, we identify its three arguments, in order: an offset to an encoded string of bytes, an output address, and a length. The function writes the decoded string to the output address.

Now that we have a basic understanding of the string_decoder function, we test decoding using Windbg and flare-dbg. We begin by starting the process with Windbg and executing until the program’s entry point. Next, we start a Python interactive shell within Windbg using pykd and import flaredbg.

Next, we create a DebugUtils object, which contains the functions we need to control the debugger.

We then allocate 0x3A bytes of memory for the output string. We use the newly allocated memory as the second parameter and set up the remaining arguments.

Finally, we call the string_decoder function at virtual address 0x401000, and read the output string buffer.

After proving we can decode a string with flare-dbg, let’s automate decoding for every call to the string_decoder function. An example debugger script is shown in Figure 2. The full script is available in the examples directory of the GitHub repository.

Figure 2: Example basic debugger script

Let’s break this script down. First, we specify the virtual address of the string decoder function and create a DebugUtils object. Next, we use the DebugUtils function get_call_list to find the three pushed arguments for each call to string_decoder.

Once the call_list is generated, we iterate over all calling addresses and their associated arguments. In this example, the malware decodes the output string onto the stack. Because we are only executing the string decoder function and won’t have the same stack setup as the malware, we must allocate memory for the output string. We use the third parameter, the length, to specify the size of the allocation. Once we allocate memory for the output string, we set the newly allocated memory address as the second parameter to receive the output bytes.
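The argument fix-up described above can be modeled without a debugger. In this sketch, `FakeProcess` is a hypothetical stand-in for the debugged process, and the single-byte XOR in `string_decoder` is an assumed placeholder for the malware’s real decoding logic; neither reflects the flare-dbg API.

```python
class FakeProcess:
    """Hypothetical stand-in for a debugged process: flat memory plus a bump allocator."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.mem = bytearray(size)
        self.next_free = base + size // 2  # treat the upper half as a heap

    def write(self, va: int, data: bytes) -> None:
        off = va - self.base
        self.mem[off:off + len(data)] = data

    def read(self, va: int, length: int) -> bytes:
        off = va - self.base
        return bytes(self.mem[off:off + length])

    def malloc(self, length: int) -> int:
        if self.next_free + length > self.base + len(self.mem):
            raise MemoryError("fake heap exhausted")
        va = self.next_free
        self.next_free += length
        return va


def string_decoder(proc: FakeProcess, encoded_va: int, out_va: int, length: int) -> None:
    """Assumed decoding logic (XOR with 0x5A); the real algorithm is not shown in this post."""
    data = proc.read(encoded_va, length)
    proc.write(out_va, bytes(b ^ 0x5A for b in data))


def decode_all(proc: FakeProcess, call_list) -> dict:
    """Run the decoder once per call site with a freshly allocated output buffer."""
    results = {}
    for call_va, args in call_list:
        args = list(args)
        # The original output pointer referred to the caller's stack frame,
        # which does not exist when the decoder is called on its own.
        args[1] = proc.malloc(args[2])  # args[2] is the length
        string_decoder(proc, *args)
        results[call_va] = proc.read(args[1], args[2]).decode("latin-1")
    return results


proc = FakeProcess(0x400000, 0x1000)
proc.write(0x400100, bytes(b ^ 0x5A for b in b"hello"))  # hypothetical encoded string
print(decode_all(proc, [(0x401234, [0x400100, 0x12FF00, 5])])[0x401234])  # hello
```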

Finally, we run the string_decoder function by using the DebugUtils call function and read the result from our allocated buffer. The call function sets up the stack, sets any specified register values, and executes the function. Once all strings are decoded, the final step is to get these strings back into our IDB. The utils script contains utility functions to create IDA Python scripts. In this case, we output an IDA Python script that creates comments in the IDB.

Running this debugger script produces the following output:

The output IDA Python script creates repeatable comments on all encoded string locations, as shown in Figure 3.

Figure 3: Decoded string as comment
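A generated IDA Python script of this kind might resemble the output of the following sketch. The `make_ida_comment_script` helper is hypothetical and is not necessarily what the flare-dbg utils script emits; it writes calls to `idc.set_cmt(ea, comment, 1)`, the IDA 7 API for repeatable comments (older IDA versions used `MakeRptCmt`).

```python
def make_ida_comment_script(decoded: dict) -> str:
    """Build IDAPython source that sets a repeatable comment at each address.

    `decoded` maps the address to annotate (for example, an encoded string
    location) to its decoded text.
    """
    lines = ["import idc", ""]
    for ea, text in sorted(decoded.items()):
        # repr() yields a properly escaped Python string literal.
        lines.append("idc.set_cmt(0x%X, %r, 1)" % (ea, text))
    return "\n".join(lines) + "\n"


print(make_ida_comment_script({0x403000: "hello"}), end="")
```

Using `repr()` for the comment text keeps quotes and backslashes in decoded strings from breaking the generated script.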

Conclusion

Stay tuned for another debugger scripting series post that will focus on plug-ins! For now, head over to the flare-dbg GitHub project page to get started. The project requires pykd, winappdbg, and vivisect.

FLARE Script Series: flare-dbg Plug-ins

9 February 2016 at 12:00

Introduction

This post continues the FireEye Labs Advanced Reverse Engineering (FLARE) script series. In this post, we continue to discuss the flare-dbg project. If you haven’t read my first post on using flare-dbg to automate string decoding, be sure to check it out!

We created the flare-dbg Python project to support the creation of plug-ins for WinDbg. When we harness the power of WinDbg during malware analysis, we gain insight into runtime behavior of executables. flare-dbg makes this process particularly easy. This blog post discusses WinDbg plug-ins that were inspired by features from other debuggers and analysis tools. The plug-ins focus on collecting runtime information and interacting with malware during execution. Today, we are introducing three flare-dbg plug-ins, which are summarized in Table 1.

Table 1: flare-dbg plug-in summary

To demonstrate the functionality of these plug-ins, this post uses a banking trojan (MD5: 03BA3D3CEAE5F11817974C7E4BE05BDE) known to FireEye as TINBA.

injectfind

Background

A common technique used by malware is code injection. When malware allocates a memory region to inject code, the created region has certain characteristics that we can use to identify it in a process’s memory space. The injectfind plug-in finds and displays information about injected regions of memory from within WinDbg.

The injectfind plug-in is loosely based on the Volatility malfind plug-in. Given a memory dump, the Volatility variant searches memory for injected code and shows the analyst any injected code found within processes. Instead of requiring a memory dump, the injectfind WinDbg plug-in runs in a debugger. Similar to malfind, the injectfind plug-in identifies memory regions that may contain injected code and prints a hex dump and a disassembly listing of each one. A quick glance at the output helps us identify injected code or hooked functions. The following section shows an example of an analyst identifying injected code with injectfind.
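A minimal sketch of a malfind-style filter follows. It assumes region records shaped like the Win32 `MEMORY_BASIC_INFORMATION` fields (`State`, `Type`, `Protect`) and flags committed, executable memory that is not mapped from an image file. This is a common heuristic, not necessarily injectfind’s exact criteria, and the region list in the usage example is invented.

```python
from dataclasses import dataclass

# Win32 memory constants (values from winnt.h).
MEM_COMMIT = 0x1000
MEM_PRIVATE = 0x20000
MEM_IMAGE = 0x1000000
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
EXECUTABLE_MASK = PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY


@dataclass
class Region:
    base: int
    size: int
    state: int     # MEMORY_BASIC_INFORMATION.State
    mem_type: int  # MEMORY_BASIC_INFORMATION.Type
    protect: int   # MEMORY_BASIC_INFORMATION.Protect


def suspicious_regions(regions):
    """Yield committed, executable regions that are not mapped from an image file."""
    for r in regions:
        if r.state != MEM_COMMIT or r.mem_type == MEM_IMAGE:
            continue  # free/reserved memory and regular EXE/DLL mappings
        if r.protect & EXECUTABLE_MASK:
            yield r


# Hypothetical region list; types and protections are invented for illustration.
regions = [
    Region(0x1700000, 0x17000, MEM_COMMIT, MEM_PRIVATE, PAGE_EXECUTE_READWRITE),
    Region(0x1CD0000, 0x1000, MEM_COMMIT, MEM_PRIVATE, PAGE_EXECUTE_READWRITE),
    Region(0x7C900000, 0xB2000, MEM_COMMIT, MEM_IMAGE, PAGE_EXECUTE_READ),
    Region(0x2000000, 0x1000, MEM_COMMIT, MEM_PRIVATE, 0x04),  # PAGE_READWRITE
]
for r in suspicious_regions(regions):
    print(hex(r.base), hex(r.size))
```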

Example

After running the TINBA malware in an analysis environment, we observe that the initial loader process exits immediately, and the explorer.exe process begins making network requests to seemingly random domains. After attaching to the explorer.exe process with Windbg and running the injectfind plug-in, we see the output shown in Figure 1.

Figure 1: Output from the injectfind plug-in

The first memory region at virtual address 0x1700000 appears to contain references to Windows library functions and is 0x17000 bytes in size. It is likely that this memory region contains the primary payload of the TINBA malware.

The second memory region at virtual address 0x1CD0000 contains a single page, 0x1000 bytes in length, and appears to have two lines of meaningful disassembly. The disassembly shows the eax register being set to 0x30 and a jump five bytes into the NtCreateProcessEx function. Figure 2 shows the disassembly of the first few instructions of the NtCreateProcessEx function.

Figure 2: NtCreateProcessEx disassembly listing

The first instruction of NtCreateProcessEx is a jmp to an address outside of ntdll's memory. The destination address is within the first memory region that injectfind identified as injected code. Without leaving the Windbg debugger session, we can quickly conclude that the malware hooks process creation.
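The check we just did by eye can be sketched in code: decode a 5-byte `E9 rel32` jmp at the start of a function and test whether its destination falls outside the function’s module. The ntdll base and size, the function address, and the hook target are hypothetical values; only the injected region bounds (0x1700000, 0x17000 bytes) come from the injectfind output above. Only the relative near jmp encoding is handled.

```python
import struct
from typing import Optional


def jmp_rel32_target(insn_va: int, code: bytes) -> Optional[int]:
    """Return the destination of an `E9 rel32` jmp at `insn_va`, or None."""
    if len(code) < 5 or code[0] != 0xE9:
        return None
    (rel,) = struct.unpack_from("<i", code, 1)  # signed 32-bit displacement
    # The displacement is relative to the end of the 5-byte instruction.
    return (insn_va + 5 + rel) & 0xFFFFFFFF


def jumps_out_of_module(func_va: int, code: bytes, module_base: int, module_size: int) -> bool:
    """True if the function begins with a jmp whose target lies outside its own module."""
    target = jmp_rel32_target(func_va, code)
    if target is None:
        return False
    return not (module_base <= target < module_base + module_size)


# Hypothetical values for ntdll and NtCreateProcessEx.
NTDLL_BASE, NTDLL_SIZE = 0x7C900000, 0xB2000
func_va = 0x7C90D000
hook_target = 0x1701234  # inside the injected region 0x1700000-0x1717000
hooked = struct.pack("<Bi", 0xE9, hook_target - (func_va + 5))
original = bytes([0xB8, 0x30, 0x00, 0x00, 0x00])  # mov eax, 0x30

print(hex(jmp_rel32_target(func_va, hooked)))                          # 0x1701234
print(jumps_out_of_module(func_va, hooked, NTDLL_BASE, NTDLL_SIZE))    # True
print(jumps_out_of_module(func_va, original, NTDLL_BASE, NTDLL_SIZE))  # False
```

Note that `mov eax, 0x30` encodes to five bytes (`B8 30 00 00 00`), the same length as the jmp, so the trampoline at 0x1CD0000 most likely re-executes the instruction the hook overwrote before jumping to NtCreateProcessEx+5.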

membreak

Background

One feature missing from Windbg that is present in OllyDbg and x64dbg is the ability to set a breakpoint on an entire memory region. This type of breakpoint is known as a memory breakpoint. Memory breakpoints are used to pause a process when a specified region of memory is executed.

Memory breakpoints are useful when you want to break on code execution without specifying a single address. For example, many packers unpack their code into a new memory region and begin executing somewhere in this new memory. Setting a memory breakpoint on the new memory region would pause the debugger at the first execution anywhere within the new memory region. This obviates the need to tediously reverse engineer the unpacking stub to identify the original entry point.

One way to implement memory breakpoints is to change the memory protection of a memory region by adding the PAGE_GUARD memory protection flag. When the region is accessed, for example when code in it is executed, a STATUS_GUARD_PAGE_VIOLATION exception occurs. The debugger handles the exception and returns control to the user. The flare-dbg plug-in membreak uses this technique to implement memory breakpoints.
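The protection arithmetic and exception check behind this technique can be sketched as follows. `PAGE_GUARD` (0x100) and `STATUS_GUARD_PAGE_VIOLATION` (0x80000001) are the documented Win32 values; the helper names are hypothetical, and the sketch does not call `VirtualProtectEx` or drive a real debugger. Guard pages fire on any access, and Windows clears `PAGE_GUARD` on the page when it fires, so a real implementation has to distinguish execution from data access and re-arm the page after hits it does not want.

```python
PAGE_GUARD = 0x100                        # Win32 page protection modifier
STATUS_GUARD_PAGE_VIOLATION = 0x80000001  # exception code raised on guard-page access


def arm(protect: int) -> int:
    """Protection value that turns the region into a guard-page breakpoint."""
    return protect | PAGE_GUARD


def disarm(protect: int) -> int:
    """Original protection value, used to restore the region after a hit."""
    return protect & ~PAGE_GUARD


def is_execution_hit(exc_code: int, exc_address: int, region_base: int, region_size: int) -> bool:
    """Heuristic: a guard-page exception whose faulting instruction lies inside the region.

    Reads or writes of the region from code elsewhere also raise
    STATUS_GUARD_PAGE_VIOLATION, but their exception address points at
    the accessing instruction outside the region.
    """
    if exc_code != STATUS_GUARD_PAGE_VIOLATION:
        return False
    return region_base <= exc_address < region_base + region_size


PAGE_EXECUTE_READWRITE = 0x40
print(hex(arm(PAGE_EXECUTE_READWRITE)))          # 0x140
print(hex(disarm(arm(PAGE_EXECUTE_READWRITE))))  # 0x40
print(is_execution_hit(STATUS_GUARD_PAGE_VIOLATION, 0x1700010, 0x1700000, 0x17000))  # True
```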

Example

After locating the injected code using the injectfind plug-in, we set a memory breakpoint to pause execution within the injected code memory region. The membreak plug-in accepts one or multiple addresses as parameters. The plug-in takes each address, finds the base address for the corresponding memory region, and changes the entire region’s permissions. As shown in Figure 3, when the membreak plug-in is run with the base address of the injected code as the parameter, the debugger immediately begins running until one of these memory regions is executed.

Figure 3: membreak plug-in run in Windbg
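Mapping each address passed to a memory breakpoint back to its region base can be sketched as a binary search over a sorted list of `(base, size)` regions, such as one built by walking the process with `VirtualQueryEx`. The helper names are hypothetical and this is not membreak’s actual code; the sample regions reuse the two injected regions from the injectfind output.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def region_base_for(address: int, regions: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Return the base of the (base, size) region containing `address`, or None.

    `regions` must be sorted by base and non-overlapping.
    """
    bases = [base for base, _ in regions]
    i = bisect.bisect_right(bases, address) - 1
    if i < 0:
        return None
    base, size = regions[i]
    return base if address < base + size else None


def bases_for(addresses: Sequence[int], regions: Sequence[Tuple[int, int]]) -> List[int]:
    """Deduplicate: several addresses in one region need only one breakpoint."""
    seen: List[int] = []
    for a in addresses:
        b = region_base_for(a, regions)
        if b is not None and b not in seen:
            seen.append(b)
    return seen


regions = [(0x1700000, 0x17000), (0x1CD0000, 0x1000)]
print([hex(b) for b in bases_for([0x1700000, 0x1705000, 0x1CD0800], regions)])
```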

The output for the memory breakpoint hit shows a Guard page violation and a message about first-chance exceptions. As explained above, this is expected. Once the breakpoint is hit, the membreak plug-in restores the original page permissions and returns control to the analyst.

importfind

Background

Malware often loads Windows library functions at runtime and stores the resolved addresses as global variables. Sometimes it is trivial to resolve these statically in IDA Pro, but other times this can be a tedious process. To speed up the labeling of these runtime imported functions, we created a plug-in named importfind to find these function addresses. Behind the scenes, the plug-in parses each library's export table and finds all exported function addresses. The plug-in then searches the malware’s memory and identifies references to the library function addresses. Finally, it generates an IDAPython script that can be used to annotate an IDB workspace with the resolved library function names.
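The pointer scan at the heart of this approach can be sketched in a few lines. The sketch assumes export table parsing has already produced a map from absolute function address to name (not shown), assumes 32-bit little-endian pointers, and by default checks every byte offset because pointers stored after variable-length strings may not be aligned. The helper names and the sample export address are hypothetical; the generated script uses the IDA 7 `idc.set_name` API with `idc.SN_NOWARN`.

```python
import struct


def find_import_pointers(memory: bytes, base: int, exports: dict, step: int = 1) -> dict:
    """Map each location holding a known export address to that export's name."""
    found = {}
    for off in range(0, len(memory) - 3, step):
        (value,) = struct.unpack_from("<I", memory, off)
        name = exports.get(value)
        if name is not None:
            found[base + off] = name
    return found


def make_rename_script(found: dict) -> str:
    """Emit IDAPython that names each location after the API it points to."""
    lines = ["import idc", ""]
    used = {}
    for ea, name in sorted(found.items()):
        # IDA names must be unique, so repeated APIs get a numeric suffix.
        count = used.get(name, 0)
        used[name] = count + 1
        label = name if count == 0 else "%s_%d" % (name, count)
        lines.append("idc.set_name(0x%X, %r, idc.SN_NOWARN)" % (ea, label))
    return "\n".join(lines) + "\n"


exports = {0x7C801A28: "CreateFileA"}  # hypothetical export address
memory = b"CreateFileA\x00" + struct.pack("<I", 0x7C801A28)
print(make_rename_script(find_import_pointers(memory, 0x1700000, exports)), end="")
```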

Example

Going back to TINBA, we saw text referencing Windows library functions in the output from injectfind above. The screenshot of IDA Pro in Figure 4 shows this same region of data. Note that following each ASCII string containing an API name, there is a number that looks like a pointer. Unfortunately, IDA Pro does not have the same runtime insight as the debugger, so it cannot resolve these addresses to API functions and name them.

Figure 4: Unnamed library function addresses

We use the importfind plug-in to find the function names associated with these addresses, as shown in Figure 5.

Figure 5: importfind plug-in run in Windbg

The importfind plug-in generates an IDA Python script file that is used to rename the global variables shown in Figure 4. Figure 6 shows a screenshot from IDA Pro after the script has renamed the global variables to more meaningful names.

Figure 6: IDA Pro with named global variables

Conclusion

This blog post shows the power of using the flare-dbg plug-ins with a debugger to gain insight into how the malware operates at runtime. We saw how to identify injected code using the injectfind plug-in and create memory breakpoints using membreak. We also demonstrated the usefulness of the importfind plug-in for identifying and renaming runtime imported functions.

To find out how to set up and get started with flare-dbg, head over to the GitHub project page, where you’ll learn about setup and usage.
