Normal view

There are new articles available, click to refresh the page.

Before yesterdayThreat Research

Threat Research
Thinking Outside the Bochs: Code Grafting to Unpack Malware in EmulationMichael Bailey
7 April 2020 at 16:00

Thinking Outside the Bochs: Code Grafting to Unpack Malware in Emulation

7 April 2020 at 16:00

This blog post continues the FLARE script series with a discussion of patching IDA Pro database files (IDBs) to interactively emulate code. While the fastest way to analyze or unpack malware is often to run it, malware won’t always successfully execute in a VM. I use IDA Pro’s Bochs integration in IDB mode to sidestep tedious debugging scenarios and get quick results. Bochs emulates the opcodes directly from your IDB in a Bochs VM with no OS.

Bochs IDB mode eliminates distractions like switching VMs, debugger setup, neutralizing anti-analysis measures, and navigating the program counter to the logic of interest. Alas, where there is no OS, there can be no loader or dynamic imports. Execution is constrained to opcodes found in the IDB. This precludes emulating routines that call imported string functions or memory allocators. Tom Bennett’s flare-emu ships with emulated versions of these, but for off-the-cuff analysis (especially when I don’t know if there will be a payoff), I prefer interactively examining registers and memory to adjust my tactics ad hoc.

What if I could bring my own imported functions to Bochs like flare-emu does? I’ve devised such a technique, and I call it code grafting. In this post I’ll discuss the particulars of statically linking stand-ins for common functions into an IDB to get more mileage out of Bochs. I’ll demonstrate using this on an EVILNEST sample to unpack and dump next-stage payloads from emulated memory. I’ll also show how I copied a tricky call sequence from one IDB to another IDB so I could keep the unpacking process all in a single Bochs debug session.

EVILNEST Scenario

My sample (MD5 hash 37F7F1F691D42DCAD6AE740E6D9CAB63 which is available on VirusTotal) was an EVILNEST variant that populates the stack with configuration data before calling an intermediate payload. Figure 1 shows this unusual call site.

Figure 1: Call site for intermediate payload

The code in Figure 1 executes in a remote thread within a hollowed-out iexplore.exe process; the malware uses anti-analysis tactics as well. I had the intermediate payload stage and wanted to unpack next-stage payloads without managing a multi-process debugging scenario with anti-analysis. I knew I could stub out a few function calls in the malware to run all of the relevant logic in Bochs. Here’s how I did it.

Code Carving

I needed opcodes for a few common functions to inject into my IDBs and emulate in Bochs. I built simple C implementations of selected functions and compiled them into one binary. Figure 2 shows some of these stand-ins.

Figure 2: Simple implementations of common functions

I compiled this and then used IDAPython code similar to Figure 3 to extract the function opcode bytes.

Figure 3: Function extraction

I curated a library of function opcodes in an IDAPython script as shown in Figure 4. The nonstandard function opcodes at the bottom of the figure were hand-assembled as tersely as possible to generically return specific values and manipulate the stack (or not) in conformance with calling conventions.

Figure 4: Extracted function opcodes

On top of simple functions like memcpy, I implemented a memory allocator. The allocator referenced global state data, meaning I couldn’t just inject it into an IDB and expect it to work. I read the disassembly to find references to global operands and templatize them for use with Python’s format method. Figure 5 shows an example for malloc.

Figure 5: HeapAlloc template code

I organized the stubs by name as shown in Figure 6 both to call out functions I would need to patch, and to conveniently add more function stubs as I encounter use cases for them. The mangled name I specified as an alias for free is operator delete.

Figure 6: Function stubs and associated names

To inject these functions into the binary, I wrote code to find the next available segment of a given size. I avoided occupying low memory because Bochs places its loader segment below 0x10000. Adjacent to the code in my code segment, I included space for the data used by my memory allocator. Figure 7 shows the result of patching these functions and data into the IDB and naming each location (stub functions are prefixed with stub_).

Figure 7: Data and code injected into IDB

The script then iterates all the relevant calls in the binary and patches them with calls to their stub implementations in the newly added segment. As shown in Figure 8, IDAPython’s Assemble function saved the effort of calculating the offset for the call operand manually. Note that the Assemble function worked well here, but for bigger tasks, Hex-Rays recommends a dedicated assembler such as Keystone Engine and its Keypatch plugin for IDA Pro.

Figure 8: Abbreviated routine for assembling a call instruction and patching a call site to an import

The Code Grafting script updated all the relevant call sites to resemble Figure 9, with the target functions being replaced by calls to the stub_ implementations injected earlier. This prevented Bochs in IDB mode from getting derailed when hitting these call sites, because the call operands now pointed to valid code inside the IDB.

Figure 9: Patched operator new() call site

Dealing with EVILNEST

The debug scenario for the dropper was slightly inconvenient, and simultaneously, it was setting up a very unusual call site for the payload entry point. I used Bochs to execute the dropper until it placed the configuration data on the stack, and then I used IDAPython’s idc.get_bytes function to extract the resulting stack data. I wrote IDAPython script code to iterate the stack data and assemble push instructions into the payload IDB leading up to a call instruction pointing to the DLL’s export. This allowed me to debug the unpacking process from Bochs within a single session.

I clicked on the beginning of my synthesized call site and hit F4 to run it in Bochs. I was greeted with the warning in Figure 10 indicating that the patched IDB would not match the depictions made by the debugger (which is untrue in the case of Bochs IDB mode). Bochs faithfully executed my injected opcodes producing exactly the desired result.

Figure 10: Patch warning

I watched carefully as the instruction pointer approached and passed the IsDebuggerPresent check. Because of the stub I injected (stub_IsDebuggerPresent), it passed the check returning zero as shown in Figure 11.

Figure 11: Passing up IsDebuggerPresent

I allowed the program counter to advance to address 0x1A1538, just beyond the unpacking routine. Figure 12 shows the register state at this point which reflects a value in EAX that was handed out by my fake heap allocator and which I was about to visit.

Figure 12: Running to the end of the unpacker and preparing to view the result

Figure 13 shows that there was indeed an IMAGE_DOS_SIGNATURE (“MZ”) at this location. I used idc.get_bytes() to dump the unpacked binary from the fake heap location and saved it for analysis.

Figure 13: Dumping the unpacked binary

Through Bochs IDB mode, I was also able to use the interactive debugger interface of IDA Pro to experiment with manipulating execution and traversing a different branch to unpack another payload for this malware as well.

Conclusion

Although dynamic analysis is sometimes the fastest road, setting it up and navigating minutia detract from my focus, so I’ve developed an eye for routines that I can likely emulate in Bochs to dodge those distractions while still getting answers. Injecting code into an IDB broadens the set of functions that I can do this with, letting me get more out of Bochs. This in turn lets me do more on-the-fly experimentation, one-off string decodes, or validation of hypotheses before attacking something at scale. It also allows me to experiment dynamically with samples that won’t load correctly anyway, such as unpacked code with damaged or incorrect PE headers.

I’ve shared the Code Grafting tools as part of the flare-ida GitHub repository. To use this for your own analyses:

In IDA Pro’s IDAPython prompt, run code_grafter.py or import it as a module.
Instantiate a CodeGrafter object and invoke its graftCodeToIdb() method:
- CodeGrafter().graftCodeToIdb()
Use Bochs in IDB mode to conveniently execute your modified sample and experiment away!

This post makes it clear just how far I’ll go to avoid breaking eye contact with IDA. If you’re a fan of using Bochs with IDA too, then this is my gift to you. Enjoy!

Threat Research
Cmd and Conquer: De-DOSfuscation with flare-qdbMichael Bailey
20 November 2018 at 17:30

Cmd and Conquer: De-DOSfuscation with flare-qdb

Threat Research

By: Michael Bailey

20 November 2018 at 17:30

When Daniel Bohannon released his excellent DOSfuscation paper, I was fascinated to see how tricks I used as a systems engineer could help attackers evade detection. I didn’t have much to contribute to this conversation until I had to analyze a hideously obfuscated batch file as part of my job on the FLARE malware queue.

Previously, I released flare-qdb, which is a command-line and Python-scriptable debugger based on Vivisect. I previously wrote about how to use flare-qdb to instrument and modify malware behavior. Flare-qdb also made a guest appearance in Austin Baker and Jacob Christie’s SANS DFIR Summit 2017 briefing, inducing the Windows event log service to exclude process creation events. In this blog post, I will show how I used flare-qdb to bring “script block logging” to the Windows command interpreter. I will also share an Easter Egg that I found by flipping only a single bit in the process address space of cmd.exe. Finally, I will share the script that I added to flare-qdb so you can de-obfuscate malicious command scripts yourself by executing them (in a safe environment, of course). But first, I’ll talk about the analysis that led me to this solution.

At First Glance

Figure 1 shows a batch script (MD5 hash 6C8129086ECB5CF2874DA7E0C855F2F6) that has been obfuscated using the BatchEncryption tool referenced in Daniel Bohannon’s paper. This file does not appear in VirusTotal as of this writing, but its dropper does (the MD5 hash is ABD0A49FDA67547639EEACED7955A01A). My goal was to de-obfuscate this script and report on what the attacker was doing.

Figure 1: Contents of XYNT.bat

This 165k batch file is dropped as C:\Windows\Temp\XYNT.bat and executed by its dropper. Its commands are built from environment variable substrings. Figure 2 shows how to use the ECHO command to decode the first command.

Figure 2: Partial command decoding via the ECHO command

The script uses hundreds of commands to set environment variables that are ultimately expanded to de-obfuscate malicious commands. A tedious approach to de-obfuscating this script would be to de-fang each command by prepending an ECHO statement to print each de-obfuscated command to the console. Unfortunately, although the ECHO command can “decode” each command, BatchEncryption needs the SET commands to be executed to decode future commands. To decode this script while allowing the full malicious functionality to run as expected, you would have to iteratively and carefully echo and selectively execute a few hundred obfuscated SET commands.

The irony of BatchEncryption is that batch scripts are viewed as being easy to de-obfuscate, making binary code the safer place to hide logic from the prying eyes of network defenders. But BatchEncryption adds a formidable barrier to analysis by its extensive, layered use of environment variables to rebuild the original commands.

Taking Cmd of the Situation

I decided to see if it would be easier to instrument cmd.exe to log commands rather than de-obfuscating the script myself. To begin, I debugged cmd.exe, set a breakpoint on CreateProcessW, and executed a program from the command prompt. Figure 3 shows the call stack for CreateProcessW as cmd.exe executes notepad.

Figure 3: Call stack for CreateProcessW in cmd.exe

Starting from cmd!ExecPgm, I reviewed the disassembly of the above functions in cmd.exe to trace the origin of the command string up the call stack. I discovered cmd!Dispatch, which receives not a string but a structure with pointers to the command, arguments, and any I/O redirection information (such as redirecting the standard output or error streams of a program to a file or device). Testing revealed that these strings had all their environment variables expanded, which means we should be able to read the de-obfuscated commands from here.

Figure 4 is an exploration of this structure in WinDbg after running the command "echo hai > nul". This command prints the word hai to the standard output stream but uses the right-angle bracket to redirect standard output to the NUL device, which discards all data. The orange boxes highlight non-null pointers that got my attention during analysis, and the arrows point to the commands I used to discover their contents.

Figure 4: Exploring the interesting pointers in 2nd argument to cmd!Dispatch

Because users can redirect multiple I/O streams in a single command, cmd.exe represents I/O redirection with a linked list. For example, the command in Listing 1 shows redirection of standard output (stream #1 is implicit) to shares.txt and standard error (stream #2 is explicitly referenced) to errors.txt.

net use > shares.txt 2>errors.txt

Listing 1: Command-line I/O redirection example

Figure 5 shows the command data structure and the I/O redirection linked list in block diagram format.

Figure 5: Command data structure diagram

By inspection, I found that cmd!Dispatch is responsible for executing both shell built-ins and executable programs, so unlike breaking on CreateProcess, it will not miss commands that do not result in process creation. Based on these findings, I wrote a flare-qdb script to parse and dump commands as they are executed.

Introducing De-DOSfuscator

De-DOSfuscator uses flare-qdb and Vivisect to hook the Dispatch function in cmd.exe and parse commands from memory. The De-DOSfuscator script runs in a 64-bit Python 2 interpreter and dumps commands to both the console and a log file. The script comes with the latest version of flare-qdb and is installed as a Python entry point named dedosfuscator.exe.

De-DOSfuscator relies on the location of the non-exported Dispatch function to log commands, and its location varies per system. For convenience, if an Internet connection is available, De-DOSfuscator automatically retrieves this function’s offset using Microsoft’s symbol server. To allow offline use, you can supply the path to a copy of cmd.exe from your offline machine to the --getoff switch to obtain this offset. You can then supply that output as the argument to the --useoff switch in your offline machine to inform De-DOSfuscator where the function is located. Alternately, you can use De-DOSfuscator with a downloaded PDB or a local symbol cache containing the correct symbols.

Figure 6 demonstrates getting and using the offset in a single session. Note that for this to work in an isolated VM, you would instead specify the path to a copy of the guest’s command interpreter specific to that VM.

Figure 6: Getting and using offsets and testing De-DOSfuscator

This works great on the BatchEncrypted script in Figure 1. Let’s have a look.

Results

Figure 7 shows the log created by De-DOSfuscator after running XYNT.bat. Hundreds of lines of SET statements progressively build environment variables for composing further commands. Keen eyes will also note a misspelling of the endlocal command-line extension keyword.

Figure 7: Beginning of dumped commands

These environment variable manipulations give way to real commands as shown in Figure 8. One of the script’s first actions is to use reg.exe to check the NUMBER_OF_PROCESSORS environment variable. This analysis system only had one vCPU, which can be seen in the set "a=1" output on line 620. After this, the script executes goto del, which branches to a batch label that ultimately deletes the script and other dropped files.

Figure 8: Anti-sandbox measure

This is a batch-oriented spin on a common sandbox evasion trick. It works because many malware analysis sandboxes run with a single CPU to minimize hypervisor resources, whereas most modern systems have at least two CPU cores. Now that we can easily read the script’s commands, it is trivial to circumvent this evasion by, for example, increasing the number of vCPUs available to the VM. Figure 9 shows De-DOSfuscator log after inducing the rest of the code to run.

Figure 9: After circumventing anti-sandbox measure

XYNT.bat calls a dropped binary to create and start a Windows service for persistence. The largest dropped binary is a variant of the XMRig cryptocurrency miner, and many of the services and executables referenced by the script also appear to be cryptocurrency-related.

Happy Easter

Easter is a long way off, but I must present you with a very early Easter Egg because it is such a neat little find. During my journey through cmd.exe, I noticed a variable named fDumpParse having only one cross-reference that seemed to control an interesting behavior. The lone cross-reference and the relevant code are shown in Figure 10. Although fDumpParse is inaccessible anywhere else in the code, it controls whether a function is called to dump information about the command that has been parsed.

Figure 10: fDumpParse evaluation and cross-refs (EDI is NULL)

To experiment with this, you can use De-DOSfuscator’s --fDumpParse switch. You will then be greeted with a command prompt that is more transparent about what it has parsed. Figure 11 shows an example along with a graphical representation of the abstract syntax tree (AST) of parsed command tokens.

Figure 11: Command interpreter with fDumpParse set

Microsoft probably inserted the fDumpParse flag so developers could debug issues with cmd.exe. Unfortunately, as nifty as this is, it has drawbacks for bulk de-obfuscation:

This output is harder to read than plain commands, because it dumps the tree in preorder traversal rather than inorder like it was typed.
Output copied from the console may contain extraneous line breaks depending on the console host program’s text wrapping behavior.
Scrolling in the command interpreter to read or copy output can be tedious.
The console buffer is limited, so not everything may be captured.
Malicious script authors can still use the CLS command to clear the screen and make all the fDumpParse output disappear.
Gratuitous joining of commands with command separators (as found in XYNT.bat) yields unreadable ASTs that exceed the console width and wrap around, as in Figure 12.

Figure 12: fDumpParse result exceeding console width

Consequently, fDumpParse is not ideal for de-obfuscating large, malicious batch files; however, it is still interesting and useful for de-obfuscating short scripts or one-off commands. You can get the offset De-DOSfuscator needs for offline use via --getoff and use it via --useoff, as with normal operation.

Wrapping Up

I have given you an example of a heavily obfuscated command script and I have shared a useful tool for de-obfuscating it, along with the analytical steps that I followed to synthesize it. The De-DOSfuscator script code comes with the latest version of flare-qdb and is accessible as a script entry point (dedosfuscator.exe) when you install flare-qdb. It is my hope that this not only helps you to conveniently analyze malicious batch scripts, but also inspires you to devise your own creative ways to employ flare-qdb against malware.

Loading Kernel Shellcode

Threat Research

By: Michael Bailey

23 April 2018 at 15:00

In the wake of recent hacking tool dumps, the FLARE team saw a spike in malware samples detonating kernel shellcode. Although most samples can be analyzed statically, the FLARE team sometimes debugs these samples to confirm specific functionality. Debugging can be an efficient way to get around packing or obfuscation and quickly identify the structures, system routines, and processes that a kernel shellcode sample is accessing.

This post begins a series centered on kernel software analysis, and introduces a tool that uses a custom Windows kernel driver to load and execute Windows kernel shellcode. I’ll walk through a brief case study of some kernel shellcode, how to load shellcode with FLARE’s kernel shellcode loader, how to build your own copy, and how it works.

As always, only analyze malware in a safe environment such as a VM; never use tools such as a kernel shellcode loader on any system that you rely on to get your work done.

A Tale of Square Pegs and Round Holes

Depending upon how a shellcode sample is encountered, the analyst may not know whether it is meant to target user space or kernel space. A common triage step is to load the sample in a shellcode loader and debug it in user space. With kernel shellcode, this can have unexpected results such as the access violation in Figure 1.

Figure 1: Access violation from shellcode dereferencing null pointer

The kernel environment is a world apart from user mode: various registers take on different meanings and point to totally different structures. For instance, while the gs segment register in 64-bit Windows user mode points to the Thread Information Block (TIB) whose size is only 0x38 bytes, in kernel mode it points to the Processor Control Region (KPCR) which is much larger. In Figure 1 at address 0x2e07d9, the shellcode is attempting to access the IdtBase member of the KPCR, but because it is running in user mode, the value at offset 0x38 from the gs segment is null. This causes the next instruction to attempt to access invalid memory in the NULL page. What the code is trying to do doesn’t make sense in the user mode environment, and it has crashed as a result.

In contrast, kernel mode is a perfect fit. Figure 2 shows WinDbg’s dt command being used to display the _KPCR type defined within ntoskrnl.pdb, highlighting the field at offset 0x38 named IdtBase.

Figure 2: KPCR structure

Given the rest of the code in this sample, accessing the IdtBase field of the KPCR made perfect sense. Determining that this was kernel shellcode allowed me to quickly resolve the rest of my questions, but to confirm my findings, I wrote a kernel shellcode loader. Here’s what it looks like to use this tool to load a small, do-nothing piece of shellcode.

Using FLARE’s Kernel Shellcode Loader

I booted a target system with a kernel debugger and opened an administrative command prompt in the directory where I copied the shellcode loader (kscldr.exe). The shellcode loader expects to receive the name of the file on disk where the shellcode is located as its only argument. Figure 3 shows an example where I’ve used a hex editor to write the opcodes for the NOP (0x90) and RET (0xC3) instructions into a binary file and invoked kscldr.exe to pass that code to the kernel shellcode loader driver. I created my file using the Windows port of xxd that comes with Vim for Windows.

Figure 3: Using kscldr.exe to load kernel shellcode

The shellcode loader prompts with a security warning. After clicking yes, kscldr.exe installs its driver and uses it to execute the shellcode. The system is frozen at this point because the kernel driver has already issued its breakpoint and the kernel debugger is awaiting commands. Figure 4 shows WinDbg hitting the breakpoint and displaying the corresponding source code for kscldr.sys.

Figure 4: Breaking in kscldr.sys

From the breakpoint, I use WinDbg with source-level debugging to step and trace into the shellcode buffer. Figure 5 shows WinDbg’s disassembly of the buffer after doing this.

Figure 5: Tracing into and disassembling the shellcode

The disassembly shows the 0x90 and 0xc3 opcodes from before, demonstrating that the shellcode buffer is indeed being executed. From here, the powerful facilities of WinDbg are available to debug and analyze the code’s behavior.

Building It Yourself

To try out FLARE’s kernel shellcode loader for yourself, you’ll need to download the source code.

To get started building it, download and install the Windows Driver Kit (WDK). I’m using Windows Driver Kit Version 7.1.0, which is command line driven, whereas more modern versions of the WDK integrate with Visual Studio. If you feel comfortable using a newer kit, you’re welcomed to do so, but beware, you’ll have to take matters into your own hands regarding build commands and dependencies. Since WDK 7.1.0 is adequate for purposes of this tool, that is the version I will describe in this post.

Once you have downloaded and installed the WDK, browse to the Windows Driver Kits directory in the start menu on your development system and select the appropriate environment. Figure 6 shows the WDK program group on a Windows 7 system. The term “checked build” indicates that debugging checks will be included. I plan to load 64-bit kernel shellcode, and I like having Windows catch my mistakes early, so I’m using the x64 Checked Build Environment.

Figure 6: Windows Driver Kits program group

In the WDK command prompt, change to the directory where you downloaded the FLARE kernel shellcode loader and type ez.cmd. The script will cause prompts to appear asking you to supply and use a password for a test signing certificate. Once the build completes, visit the bin directory and copy kscldr.exe to your debug target. Before you can commence using your custom copy of this tool, you’ll need to follow just a few more steps to prepare the target system to allow it.

Preparing the Debug Target

To debug kernel shellcode, I wrote a Windows software-only driver that loads and runs shellcode at privilege level 0. Normally, Windows only loads drivers that are signed with a special cross-certificate, but Windows allows you to enable testsigning to load drivers signed with a test certificate. We can create this test certificate for free, and it won’t allow the driver to be loaded on production systems, which is ideal.

In addition to enabling testsigning mode, it is necessary to enable kernel debugging to be able to really follow what is happening after the kernel shellcode gains execution. Starting with Windows Vista, we can enable both testsigning and kernel debugging by issuing the following two commands in an administrative command prompt followed by a reboot:

bcdedit.exe /set testsigning on

bcdedit.exe /set debug on

For debugging in a VM, I install VirtualKD, but you can also follow your virtualization vendor’s directions for connecting a serial port to a named pipe or other mechanism that WinDbg understands. Once that is set up and tested, we’re ready to go!

If you try the shellcode loader and get a blue screen indicating stop code 0x3B (SYSTEM_SERVICE_EXCEPTION), then you likely did not successfully connect the kernel debugger beforehand. Remember that the driver issues a software interrupt to give control to the debugger immediately before executing the shellcode; if the debugger is not successfully attached, Windows will blue screen. If this was the case, reboot and try again, this time first confirming that the debugger is in control by clicking Debug -> Break in WinDbg. Once you know you have control, you can issue the g command to let execution continue (you may need to disable driver load notifications to get it to finish the boot process without further intervention: sxd ld).

How It Works

The user-space application (kscldr.exe) copies the driver from a PE-COFF resource to the disk and registers it as a Windows kernel service. The driver implements device write and I/O control routines to allow interaction from the user application. Its driver entry point first registers dispatch routines to handle CreateFile, WriteFile, DeviceIoControl, and CloseHandle. It then creates a device named \Device\kscldr and a symbolic link making the device name accessible from user-space. When the user application opens the device file and invokes WriteFile, the driver calls ExAllocatePoolWithTag specifying a PoolType of NonPagedPool (which is executable), and writes the buffer to the newly allocated memory. After the write operation, the user application can call DeviceIoControl to call into the shellcode. In response, the driver sets the appropriate flags on the device object, issues a breakpoint to pass control to the kernel debugger, and finally calls the shellcode as if it were a function.

While You’re Here

Driver development opens the door to unique instrumentation opportunities. For example, Figure 7 shows a few kernel callback routines described in the WDK help files that can track system-wide process, thread, and DLL activity.

Figure 7: WDK kernel-mode driver architecture reference

Kernel development is a deep subject that entails a great deal of study, but the WDK also comes with dozens upon dozens of sample drivers that illustrate correct Windows kernel programming techniques. This is a treasure trove of Windows internals information, security research topics, and instrumentation possibilities. If you have time, take a look around before you get back to work.

Wrap-Up

We’ve shared FLARE’s tool for loading privileged shellcode in test environments so that we can dynamically analyze kernel shellcode. We hope this provides a straightforward way to quickly triage kernel shellcode if it ever appears in your environment. Download the source code now.

Do you want to learn more about these tools and techniques from FLARE? Then you should take one of our Black Hat classes in Las Vegas this summer! Our offerings include Malware Analysis Crash Course, macOS Malware for Reverse Engineers, and Malware Analysis Master Class.

Threat Research
Introducing Linux Support for FakeNet-NG: FLARE’s Next Generation Dynamic Network Analysis ToolMichael Bailey
5 July 2017 at 15:00

Introducing Linux Support for FakeNet-NG: FLARE’s Next Generation Dynamic Network Analysis Tool

Threat Research

By: Michael Bailey

5 July 2017 at 15:00

Introduction

In 2016, FLARE introduced FakeNet-NG, an open-source network analysis tool written in Python. FakeNet-NG allows security analysts to observe and interact with network applications using standard or custom protocols on a single Windows host, which is especially useful for malware analysis and reverse engineering. Since FakeNet-NG’s release, FLARE has added support for additional protocols. FakeNet-NG now has out-of-the-box support for DNS, HTTP (including BITS), FTP, TFTP, IRC, SMTP, POP, TCP, and UDP as well as SSL.

Building on this work, FLARE has now brought FakeNet-NG to Linux. This allows analysts to perform basic dynamic analysis either on a single Linux host or using a separate, dedicated machine in the same way as INetSim. INetSim has made amazing contributions to the productivity of the security community and is still the tool of choice for many analysts. Now, FakeNet-NG gives analysts a cross-platform tool for malware analysis that can directly integrate with all the great Python-based infosec tools that continually emerge in the field.

Getting and Installing FakeNet-NG on Linux

If you are running REMnux, then good news: REMnux now comes with FakeNet-NG installed, and existing users can get it by running the update-remnux command.

For other Linux distributions, setting up and using FakeNet-NG will require the Python pip package manager, the net-tools package, and the development files for OpenSSL, libffi, and libnetfilterqueue. Here is how to quickly obtain the appropriate prerequisites for a few common Linux distributions:

Debian and Ubuntu: sudo apt-get install python-pip python-dev libssl-dev libffi-dev libnetfilter-queue-dev net-tools
Fedora 25 and CentOS 7:
- yum -y update;
- yum -y install epel-release; # <-- If CentOS
- yum -y install redhat-rpm-config; # <-- If Fedora
- yum -y groupinstall 'Development Tools'; yum -y install python-pip python-devel openssl-devel libffi-devel libnetfilter_queue-devel net-tools

Once you have the prerequisites, you can download the latest version of FakeNet-NG and install it using setup.py install.

A Tale of Two Modes

On Linux, FakeNet-NG can be deployed in MultiHost mode on a separate host dedicated to network simulation, or in the experimental SingleHost mode for analyzing software locally. Windows only supports SingleHost mode. FakeNet-NG is configured by default to run in NetworkMode: Auto, which will automatically select SingleHost mode on Windows or MultiHost mode on Linux. Table 1 lists the currently supported NetworkMode settings by operating system.

	SingleHost	MultiHost
Windows	Default (Auto)	Unsupported
Linux	Experimental	Default (Auto)

Table 1: FakeNet-NG NetworkMode support per platform

FakeNet-NG’s support for SingleHost mode on Linux currently has limitations.

First, FakeNet-NG does not yet support conditional redirection of specific processes, hosts, or ports on Linux. This means that settings like ProcessWhiteList will not work as expected. We plan to add support for these settings in a later release. In the meantime, SingleHost mode supports redirecting all Internet-bound traffic to local listeners, which is the main use case for malware analysts.

Second, the python-netfilterqueue library is hard-coded to handle datagrams of no more than 4,012 octets in length. Loopback interfaces are commonly configured with high maximum transmittal unit (MTU) settings that allow certain applications to exceed this hard-coded limit, resulting in unanticipated network behavior. An example of a network application that may exhibit issues due to this would be a large file transfer via FTP. A workaround is to recompile python-netfilterqueue with a larger buffer size or to decrease the MTU for the loopback interface (i.e. lo) to 4,012 or less.

Configuring FakeNet-NG on Linux

In addition to the new NetworkMode setting, Linux support for FakeNet-NG introduces the following Linux-specific configuration items:

LinuxRedirectNonlocal: For MultiHost mode, this setting specifies a comma-delimited list of network interfaces for which to redirect all traffic to the local host so that FakeNet-NG can reply to it. The setting in FakeNet-NG’s default configuration is *, which configures FakeNet-NG to redirect on all interfaces.
LinuxFlushIptables: Deletes all iptables rules before adding rules for FakeNet-NG. The original rules are restored as part of FakeNet-NG’s shutdown sequence which is triggered when you hit Ctrl+C. This reduces the likelihood of conflicting, erroneous, or duplicate rules in the event of unexpected termination, and is enabled in FakeNet-NG’s default configuration.
LinuxFlushDnsCommand: Specifies the command to flush the DNS resolver cache. When using FakeNet-NG in SingleHost mode on Linux, this ensures that name resolution requests are forwarded to a DNS service such as the FakeNet-NG DNS listener instead of using cached answers. The setting is not applicable on all distributions of Linux, but is populated by default with the correct command for Ubuntu Linux. Refer to your distribution’s documentation for the proper command for this behavior.

Starting FakeNet-NG on Linux

Before using FakeNet-NG, also be sure to disable any services that may bind to ports corresponding to the FakeNet-NG listeners you plan to use. An example is Ubuntu’s use of a local dnsmasq service. You can use netstat to find such services and should refer to your Linux distribution’s documentation to determine how to disable them.

You can start FakeNet-NG by invoking fakenet with root privileges, as shown in Figure 1.

Figure 1: Starting FakeNet-NG on Linux

You can alter FakeNet-NG’s configuration by either directly editing the file displayed in the first line of FakeNet-NG’s output, or by creating a copy and specifying its location with the -c command-line option.

Conclusion

FakeNet-NG now brings the convenience of a modern, Python-based, malware-oriented network simulation tool to Linux, supporting the full complement of listeners that are available on FakeNet-NG for Windows. Users of REMnux can make use of FakeNet-NG already, while users of other Linux distributions can download and install it using standard package management tools.

Threat Research
FLARE Script Series: Querying Dynamic State using the FireEye Labs Query-Oriented Debugger (flare-qdb)Michael Bailey
4 January 2017 at 14:02

FLARE Script Series: Querying Dynamic State using the FireEye Labs Query-Oriented Debugger (flare-qdb)

Threat Research

By: Michael Bailey

4 January 2017 at 14:02

Introduction

This post continues the FireEye Labs Advanced Reverse Engineering (FLARE) script series. Here, we introduce flare-qdb, a command-line utility and Python module based on vivisect for querying and altering dynamic binary state conveniently, iteratively, and at scale. flare-qdb works on Windows and Linux, and can be obtained from the flare-qdb github project.

Motivation

Efficiently understanding complex or obfuscated malware frequently entails debugging. Often, the linear process of following the program counter raises questions about parallel or previous register states, state changes within a loop, or if an instruction will ever be executed at all. For example, a register’s value may not become interesting until the analyst learns it was an important input to a function that has already been executed. Restarting a debug session and cataloging the results can take the analyst out of the original thought process from which the question arose.

A malware analyst must constantly judge whether such inquiries will lend indispensable observations or extraneous distractions. The wrong decision wastes precious time. It would be useful to query malware state like a database, similar to the following:

SELECT eax, poi(ebp-0x14) FROM malware.exe WHERE eip = 0x401072

FLARE has devised a command-line tool to efficiently query dynamic state and more in a similar fashion. The following is a description of this tool with examples of how it has been used within FLARE to analyze malware, simulate new scenarios, and even solve challenges from the 2016 FLARE-On challenge.

Usage

Drawing heavily from vivisect, flare-qdb is an open source tool for efficiently posing sophisticated questions and obtaining simple answers from binaries. flare-qdb’s command-line syntax is as follows:

flareqdb "<cmdline>" -at <address> "<python>"

flare-qdb allows an analyst to execute a command line of their choosing, break on arbitrary program counter values, optionally check conditions, and display or alter program state with ad-hoc Python code. flare-qdb implements several WinDbg-like builtins for querying and modifying state. Table 1 lists a few illustrative example queries.

Experiment or Alteration	Query
What two DWORD arguments are passed to kernel32!Beep? (WinDbg analog: dd)	-at kernel32.Beep "dd('esp+4', 2)"
Terminate if eax is null at 0x401072 (WinDbg analog: .kill)	-at-if 0x401072 eax==0 "kill()"
Alter ecx programmatically (WinDbg analog: r)	-at malwaremodule+0x102a "r('ecx', '(ebp-0x14)*eax')
Alter memory programmatically	-at 0x401003 "memset('ebp-0x14', 0x2a, 4)"

Table 1: Example flare-qdb queries

Using the flareqdb Command Line

The usefulness of flare-qdb can be seen in cases such as loops dealing with strings. Figure 1 shows the flareqdb command line utility being used to dump the Unicode string pointed to by a stack variable for each iteration of a loop. The output reveals that the variable is used as a runner pointer iterating through argv[1].

Figure 1: Using flareqdb to monitor a string within a loop

Another example is challenge 4 from the 2016 FLARE-On Challenge (spoiler alert: partial solution presented below, full walkthrough is here).

In flareon2016challenge.dll, a decoded PE file contains a series of calls to kernel32!Beep that must be tracked in order to construct the correct sequence of calls to ordinal #50 in the challenge binary. Figure 2 shows a flareqdb one-liner that forwards each kernel32!Beep call to ordinal #50 in the challenge binary to obtain the flag.

Figure 2: Using flareqdb to solve challenge 4 of the 2016 FLARE-On Challenge

flareqdb can also force branches to be taken, evaluate function pointer values, and validate suspected function addresses by disassembling. For example, consider the subroutine in Figure 3, which is only invoked if a set of conditions is satisfied and which calls a C++ virtual function. Identifying this function could help the analyst identify its caller and discover what kind of data to provide through the command and control (C2) channel to exercise it.

Figure 3: Unidentified function with virtual function call

Using the flareqdb command-line utility, it is possible to divert the program counter to bypass checks on the C2 data that was provided and subsequently dump the address of the function pointer that is called by the malware at program counter 0x4029a4. Thanks to vivisect, flare-qdb can even disassemble the instructions at the resulting address to validate that it is indeed a function. Figure 4 shows the flareqdb command-line utility being used to force control flow at 0x4016b5 to proceed to 0x4016bb (not shown) and later to dump the function pointer called at 0x4029a4.

Figure 4: Forcing a branch and resolving a C++ virtual function call

The function pointer resolves to 0x402f32, which IDA has already labeled as basic_streambuf::xsputn as shown in Figure 5. This function inserts a series of characters into a file stream, which suggests a file write capability that might be exercised by providing a filename and/or file data via the C2 channel.

Figure 5: Resolved virtual function address

Using the flareqdb Python Module

flare-qdb also exists as a Python module that is useful for more complex cases. flare-qdb allows for ready use of the powerful vivisect library. Consider the logic in Figure 6, which is part of a privilege escalation tool. The tool checks GetVersionExW, NetWkstaGetInfo, and IsWow64Process before exploiting CVE-2016-0040 in WMI.

Figure 6: Privilege escalation platform check

It appears as if the tool exploits 32-bit Windows installations with version numbers 5.1+, 6.0, and 6.1. Figure 7 shows a script to quickly validate this by executing the tool 12 times, simulating different versions returned from GetVersionExW and NetWkstaInfo. Each time the script executes the malware, it indicates whether the malware reached the point of attempting the privilege escalation or not. The script passes a dictionary of local variables to the Qdb instance for each execution in order to permit the user callback to print the friendly name of each Windows version it is simulating for the binary. The results of GetVersionExW are modified prior to return using the vstruct definition of the OSVERSIONINFOEXW; NetWkstaGetInfo is fixed manually for brevity and in the absence of a definition corresponding to the WKSTA_INFO_100 structure.

Figure 7: Script to test version check

Figure 8 shows the output, which confirms the analysis of the logic from Figure 6.

Figure 8: Script output

Next, consider an example in which the analyst must devise a repeatable process to unpack a binary and ascertain the locations of unpacked PE-COFF files injected throughout memory. The script in Figure 9 does this by setting a breakpoint relative to the tail call and using vivisect’s envi module to enumerate all the RWX memory locations that are not backed by a named file. It then uses flare-qdb’s park() builtin before calling detach() so that the binary runs in an endless loop, allowing the analyst to attach a debugger and resume manual analysis.

Figure 9: Unpacker script that parks its debuggee after unpacking is complete

Figure 10 shows the script announcing the locations of the self-injected modules before parking the process in an infinite loop and detaching.

Figure 10: Result of unpacker script

Attaching with IDA Pro via WinDbg as in Figure 11 shows that the program counter points to the infinite loop written in memory allocated by flare-qdb. The park() builtin stored the original program counter value in the bytes following the jmp instruction. The analyst can return the program to its original location by referring to those bytes and entering the WinDbg command r eip=1DC129B.

Figure 11: Attaching to the parked process

The parked process makes it convenient to snapshot the malware execution VM and repeatedly connect remotely to exercise and annotate different code areas with IDA Pro as the debugger. Because the same OS process can be reused for multiple debug sessions, the memory map announced by the script remains the same across debugging sessions. This means that the annotations created in IDA Pro remain relevant instead of becoming disconnected from the varying data and code locations that would result from the non-deterministic heap addresses returned by VirtualAlloc if the program were simply executed multiple times.

Conclusion

flare-qdb provides a command-line tool to quickly query dynamic binary state without derailing the thought process of an ongoing debugging session. In addition to querying state, flare-qdb can be used to alter program state and simulate new scenarios. For intricate cases, flare-qdb has a scripting interface permitting almost arbitrary manipulation. This can be useful for string decoding, malware unpacking, and general software analysis. Head over to the flare-qdb github page to get started using it.