Normal view

There are new articles available, click to refresh the page.
Before yesterdayThreat Research

FLARE IDA Pro Script Series: MSDN Annotations Plugin for Malware Analysis

11 September 2014 at 22:00

The FireEye Labs Advanced Reverse Engineering (FLARE) Team continues to share knowledge and tools with the community. We started this blog series with a script for Automatic Recovery of Constructed Strings in Malware. As always, you can download these scripts at the following location: https://github.com/fireeye/flare-ida. We hope you find all these scripts as useful as we do.

 

Motivation

 

During my summer internship with the FLARE team, my goal was to develop IDAPython plug-ins that speed up the reverse engineering workflow in IDA Pro. While analyzing malware samples with the team, I realized that a lot of time is spent looking up information about functions, arguments, and constants at the Microsoft Developer Network (MSDN) website. Frequently switching to the developer documentation can interrupt the reverse engineering process, so we thought about ways to integrate MSDN information into IDA Pro automatically. In this blog post we will release a script that does just that, and we will show you how to use it.

 

Introduction

 

The MSDN Annotations plug-in integrates information about functions, arguments and return values into IDA Pro’s disassembly listing in the form of IDA comments. This allows the information to be integrated as seamlessly as possible. Additionally, the plug-in is able to automatically rename constants, which further speeds up the analyst workflow. The plug-in relies on an offline XML database file, which is generated from Microsoft’s documentation and IDA type library files.

 

Features

 

Table 1 shows what benefit the plug-in provides to an analyst. On the left you can see IDA Pro’s standard disassembly: seven arguments get pushed onto the stack and then the CreateFileA function is called. Normally an analyst would have to look up function, argument and possibly constant descriptions in the documentation to understand what this code snippet is trying to accomplish. To obtain readable constant values, an analyst would be required to research the respective argument, import the corresponding standard enumeration into IDA and then manually rename each value. The right side of Table 1 shows the result of executing our plug-in showing the support it offers to an analyst.

The most obvious change is that constants are renamed automatically. In this example, 40000000h was automatically converted to GENERIC_WRITE. Additionally, each function argument is renamed to a unique name, so the corresponding description can be added to the disassembly.

flare1

Table 1: Automatic labelling of standard symbolic constants

In Figure 1 you can see how the plug-in enables you to display function, argument, and constant information right within the disassembly. The top image shows how hovering over the CreateFileA function displays a short description and the return value. In the middle image, hovering over the hTemplateFile argument displays the corresponding description. And in the bottom image, you can see how hovering over dwShareMode, the automatically renamed constant displays descriptive information.

Functions

flare2

Arguments

flare3

Constants

flare4

Figure 1: Hovering function names, arguments and constants displays the respective descriptions

 

How it works

 

Before the plug-in makes any changes to the disassembly, it creates a backup of the current IDA database file (IDB). This file gets stored in the same directory as the current database and can be used to revert to the previous markup in case you do not like the changes or something goes wrong.

The plug-in is designed to run once on a sample before you start your analysis. It relies on an offline database generated from the MSDN documentation and IDA Pro type library (TIL) files. For every function reference in the import table, the plug-in annotates the function’s description and return value, adds argument descriptions, and renames constants. An example of an annotated import table is depicted in Figure 2. It shows how a descriptive comment is added to each API function call. In order to identify addresses of instructions that position arguments prior to a function call, the plug-in relies on IDA Pro’s markup.

flare5

Figure 2: Annotated import table

Figure 3 shows the additional .msdn segment the plug-in creates in order to store argument descriptions. This only impacts the IDA database file and does not modify the original binary.

flare6

Figure 3: The additional segment added to the IDA database

The .msdn segment stores the argument descriptions as shown in Figure 4. The unique argument names and their descriptive comments are sequentially added to the segment.

flare7

Figure 4: Names and comments inserted for argument descriptions

To allow the user to see constant descriptions by hovering over constants in the disassembly, the plug-in imports IDA Pro’s relevant standard enumeration and adds descriptive comments to the enumeration members. Figure 5 shows this for the MACRO_CREATE enumeration, which stores constants passed as dwCreationDisposition to CreateFileA.

flare8

Figure 5: Descriptions added to the constant enumeration members

 

Preparing the MSDN database file

 

The plug-in’s graphical interface requires you to have the QT framework and Python scripting installed. This is included with the IDA Pro 6.6 release. You can also set it up for IDA 6.5 as described here (http://www.hexblog.com/?p=333).

As mentioned earlier, the plug-in requires an XML database file storing the MSDN documentation. We cannot distribute the database file with the plug-in because Microsoft holds the copyright for it. However, we provide a script to generate the database file. It can be cloned from the git repository at https://github.com/fireeye/flare-ida together with the annotation plug-in.

You can take the following steps to setup the database file. You only have to do this once.

 

     

  1. Download and install an offline version of the MSDN documentationYou can download the Microsoft Windows SDK MSDN documentation. The standalone installer can be downloaded from http://www.microsoft.com/en-us/download/details.aspx?id=18950. Although it is not the newest SDK version, it includes all the needed information and data extraction is straight-forward.As shown in Figure 6, you can select to only install the help files. By default they are located in C:\Program Files\Microsoft SDKs\Windows\v7.0\Help\1033.

     

    flare9

    Figure 6: Installing a local copy of the MSDN documentation

     

  2. Extract the files with an archive manager like 7-zip to a directory of your choice.
  3. Download and extract tilib.exe from Hex-Ray’s download page at https://www.hex-rays.com/products/ida/support/download.shtml 

     

    To allow the plug-in to rename constants, it needs to know which enumerations to import. IDA Pro stores this information in TIL files located in %IDADIR%/til/. Hex-Rays provides a tool (tilib) to show TIL file contents via their download page for registered users. Download the tilib archive and extract the binary into %IDADIR%. If you run tilib without any arguments and it displays its help message, the program is running correctly.

  4. Run MSDN_crawler/msdn_crawler.py <path to extracted MSDN documentation> <path to tilib.exe> <path to til files>

     

     

    With these prerequisites fulfilled, you can run the MSDN_crawler.py script, located in the MSDN_crawler directory. It expects the path to the TIL files you want to extract (normally %IDADIR%/til/pc/) and the path to the extracted MSDN documentation. After the script finishes execution the final XML database file should be located in the MSDN_data directory.

  5.  

 

You can now run our plug-in to annotate your disassembly in IDA.

Running the MSDN annotations plug-in

In IDA, use File - Script file... (ALT + F7) to open the script named annotate_IDB_MSDN.py. This will display the dialog box shown in Figure 7 that allows you to configure the modifications the plug-in performs. By default, the plug-in annotates functions, arguments and rename constants. If you change the settings and execute the plug-in by clicking OK, your settings get stored in a configuration file in the plug-in’s directory. This allows you to quickly run the plug-in on other samples using your preferred settings. If you do not choose to annotate functions and/or arguments, you will not be able to see the respective descriptions by hovering over the element.

flare10

Figure 7: The plug-in’s configuration window showing the default settings

When you choose to use repeatable comments for function name annotations, the description is visible in the disassembly listing, as shown in Figure 8.

flare11

Figure 8: The plug-in’s preview of function annotations with repeatable comments

 

Similar Tools and Known Limitations

 

Parts of our solution were inspired by existing IDA Pro plug-ins, such as IDAScope and IDAAPIHelp. A special thank you goes out to Zynamics for their MSDN crawler and the IDA importer which greatly supported our development.

Our plug-in has mainly been tested on IDA Pro for Windows, though it should work on all platforms. Due to the structure of the MSDN documentation and limitations of the MSDN crawler, not all constants can be parsed automatically. When you encounter missing information you can extend the annotation database by placing files with supplemental information into the MSDN_data directory. In order to be processed correctly, they have to be valid XML following the schema given in the main database file (msdn_data.xml). However, if you want to extend partly existing function information, you only have to add the additional fields. Name tags are mandatory for this, as they get used to identify the respective element.

For example, if the parser did not recognize a commonly used constant, we could add the information manually. For the CreateFileA function’s dwDesiredAccess argument the additional information could look similar to Listing 1.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

<?xml version="1.0" encoding="ISO-8859-1"?>

<msdn>

<functions>

<function>

<name>CreateFileA</name>

<arguments>

<argument>

<name>dwDesiredAccess</name>

<constants enums="MACRO_GENERIC">

<constant>

<name>GENERIC_ALL</name>

<value>0x10000000</value>

<description>All possible access rights</description>

</constant>

<constant>

<name>GENERIC_EXECUTE</name>

<value>0x20000000</value>

<description>Execute access</description>

</constant>

<constant>

<name>GENERIC_WRITE</name>

<value>0x40000000</value>

<description>Write access</description>

</constant>

<constant>

<name>GENERIC_READ</name>

<value>0x80000000</value>

<description>Read access</description>

</constant>

</constants>

</argument>

</arguments>

</function>

</functions>

</msdn>

 

 

Listing 1: Additional information enhancing the dwDesiredAccess argument for the CreateFileA function

 

Conclusion

 

In this post, we showed how you can generate a MSDN database file used by our plug-in to automatically annotate information about functions, arguments and constants into IDA Pro’s disassembly. Furthermore, we talked about how the plug-in works, and how you can configure and customize it. We hope this speeds up your analysis process!

Stay tuned for the FLARE Team’s next post where we will release solutions for the FLARE On Challenge (www.flare-on.com).

 

FLARE Script Series: Recovering Stackstrings Using Emulation with ironstrings

28 February 2019 at 16:30

This blog post continues our Script Series where the FireEye Labs Advanced Reverse Engineering (FLARE) team shares tools to aid the malware analysis community. Today, we release ironstrings: a new IDAPython script to recover stackstrings from malware. The script leverages code emulation to overcome this common string obfuscation technique. More precisely, it makes use of our flare-emu tool, which combines IDA Pro and the Unicorn emulation engine. In this blog post, I explain how our new script uses flare-emu to recover stackstrings from malware. In addition, I discuss flare-emu’s event hooks and how you can use them to easily adapt the tool to your own analysis needs.

Introduction

Analyzing strings in binary files is an important part of malware analysis. Although simple, this reverse engineering technique can provide valuable information about a program’s use and its capabilities. This includes indicators of compromise like file paths and domain names. Especially during advanced analysis, strings are essential to understand a disassembled program’s functionality. Malware authors know this, and string obfuscation is one of the most common anti-analysis techniques reverse engineers encounter.

Due to the prevalence of obfuscated strings, the FLARE team has already developed and shared various tools and techniques to deal with them. In 2014 we published an IDA Pro plugin to automate the recovery of constructed strings in malware. In 2016, we released FLOSS; a standalone open-source tool to automatically identify and decode strings in malware.

Both solutions rely on vivisect, a Python-based program analysis and emulation framework. Although vivisect is a robust tool, it may fail to completely analyze an executable file or emulate its code correctly. And just like any tool, vivisect is susceptible to anti-analysis techniques. With missing, incomplete, or erroneous processing by vivisect, dependent tools cannot provide the best results. Moreover, vivisect does not provide an easy-to-use graphical interface to interactively change and enhance program analysis.

I encountered all these shortcomings recently, when I analyzed a GandCrab ransomware sample (version 5.0.4, SHA256 hash: 72CB1061A10353051DA6241343A7479F73CB81044019EC9A9DB72C41D3B3A2C7). The malware contains various anti-analysis techniques to hinder disassembly and control-flow analysis. Before I could perform any efficient reverse engineering in IDA Pro, I had to overcome these hurdles. I used IDAPython to remove various anti-analysis instruction patterns which then allowed the disassembler to successfully identify all functions in the binary. Many of the recovered functions contained obfuscated strings. Unfortunately, my changes did not propagate to vivisect, because it performs its own independent analysis on the original binary. Consequently, vivisect still failed to recognize most functions correctly and I couldn’t use one of our existing solutions to recover the obfuscated strings.

While I could have tried to feed my patches in IDA Pro back to vivisect or to create a modified binary, I instead created a new IDAPython script that does not depend on vivisect. Thus, circumventing the mentioned shortcomings. It uses IDA Pro’s program analysis and Unicorn’s emulation engine. The easy integration of these two tools is powered by flare-emu.

Using IDA Pro instead of vivisect resolves multiple limitations of our previous implementations. Now changes that users make in their IDB file, e.g. by patching instructions to manually enhance analysis, are immediately available during emulation. Moreover, the tool more robustly supports different architectures including x86, AMD64, and ARM.

Stackstrings: An Example

The disassembly listing in Figure 1 shows an example string obfuscation from the sample I analyzed. The malware creates a string at run-time by moving each character into adjacent stack addresses (gray highlights). Finally, the sample passes the string’s starting offset as an argument to the InternetOpen API call (blue highlight). Manually following these memory moves and restoring strings by hand is a very cumbersome process. Especially if malware complicates value assignments using additional instructions like illustrated below.


Figure 1: Disassembly listing showing stackstring creation and usage

Because malware often uses stack memory to create such strings, Jay Smith coined the term stackstrings for this anti-analysis technique. Note that malware can also construct strings in global memory. Our new script handles both cases; strings constructed on the stack and in global memory.

ironstrings: Stackstring Recovery Using flare-emu

The new IDAPython script is an evolution of our existing solutions. It combines FLOSS’s stackstring recovery algorithm and functionality from our IDA Pro plugin. The script relies on IDA Pro’s program analysis and emulates code using Unicorn. The combination of both tools is powered by flare-emuFe, short for flare-emu, is the chemical symbol for iron and hence the script is named ironstrings.

To recover stackstrings, ironstrings enumerates all disassembled functions in a program except for library and thunk functions as identified by IDA Pro. For each function, the script emulates various code paths through the function and searches for stackstrings based on two heuristics:

  1. Before all call instructions in the function. As stackstrings are often constructed and then passed to other functions, i.e. Windows APIs like CreateFile or InternetOpenUrl.
  2. At the end of a basic block containing more than five memory writes. The number of memory writes is configurable. This heuristic is helpful if the same memory buffer is used multiple times in a function and if the string construction spans multiple basic blocks.

If any of these conditions apply, the script searches the function’s current stack frame for printable ASCII and UTF-16 strings. To detect strings in global memory, the script additionally searches for strings in all memory locations that have been written to.

Using flare-emu Hooks to Recover Stackstrings

If you’re not already familiar with flare-emu, I recommend reading our previous blog post. It discusses some of the interfaces the tool provides. Other helpful resources are the examples and the project documentation available on the flare-emu GitHub.

The stackstrings script uses flare-emu’s iterateAllPaths API. The function iterates multiple code paths through a function. It first finds possible paths from function start to function end. The tool then forces the emulation down all identified code paths independent from the actual program state. This extensive code coverage allows ironstrings to recover strings constructed from many different emulation runs.

A key feature of flare-emu are the various hook functions that get triggered by different emulation events. These hooks, or callbacks, enable the development of very powerful automation tasks. The available hooks are a combination of Unicorn’s standard hooks, e.g., to hook memory access events, and multiple convenience hooks provided by flare-emu. The following section briefly describes the available callbacks in flare-emu and illustrates how the ironstrings script uses them to recover obfuscated strings.

  • instructionHook: This Unicorn standard hook is triggered before an instruction is emulated. ironstrings uses this hook to initiate the stackstrings extraction if a basic block contained enough memory writes, for example.
  • memAccessHook: This Unicorn standard hook is triggered when memory read or write events occur during emulation. In the stackstrings script this function stores data about all memory writes.
  • callHook: This flare-emu hook is activated before each function call. The hook’s return value is ignored. In the stackstrings script this hook triggers the extraction of stackstrings.
  • preEmuCallback: This flare-emu hook is called before each emulation run. It is only available in the iterate and the iterateAllPaths functions. The hook’s return value is ignored. ironstrings does not use this hook.
  • targetCallback: This flare-emu hook gets called whenever one of the specified target addresses is hit. It is only available in the iterate and the iterateAllPaths functions. The hook’s return value is ignored. The stackstrings script does not use this hook.

The code in Figure 2 shows the callback functions that flare-emu’s API currently supports, their signatures, and examples of how to use them. All callbacks receive an argument named hookData. This named dictionary allows the user to provide application specific data to use before, during, and after emulation. Often, this dictionary is named userData in the user-defined callbacks, as in the examples below, due to its naming in Unicorn. ironstrings uses this to access function analysis data and store recovered strings across its various hooks. The dictionary also provides access to the EmuHelper object and emulation meta data.


Figure 2: flare-emu example hook implementations

Installation

Download and install flare-emu as described at the GitHub installation pageironstrings is available, along with our other IDA Pro plugins and scripts, at our ironstrings GitHub page.

Note that both flare-emu and ironstrings were written using the new IDAPython API available in IDA Pro 7.0 and higher. They are not backwards compatible with previous program versions.

Usage and Options

To run the script in IDA Pro, go to File – Script File... (ALT+F7) and select ironstrings.py. The script runs automatically on all functions, prints its results to IDA Pro's output window, and adds comments at the locations where it recovered stackstrings. Figure 3 shows the script’s output of the recovered stackstring locations from the GandCrab sample. Analysis of this malware takes the script about 15 seconds.


Figure 3: Deobfuscated stackstrings and locations where they were identified

Figure 4 shows the disassembly listing of the stackstring creation example discussed at the beginning of this post after running ironstrings.


Figure 4: Commented stackstring after running ironstrings

After analyzing a sample, the script provides a summary and a unique listing of all recovered strings. The output for the ransomware sample is shown in Figure 5. Here the tool failed to analyze two functions due to invalid memory operations during Unicorn’s code emulation.


Figure 5: Script summary and unique string listing

Note that you can modify various options to change the script’s behavior. For example, you can configure the output format at the top of the ironstrings.py file. The script’s README file explains the options in more detail.

Conclusion

This blog post explains how our new IDAPython script ironstrings works and how you can use it to automatically recover stackstrings in IDA Pro. Overcoming anti-analysis techniques is just one of many useful applications of code emulation for malware analysis. This post shows that flare-emu provides the ideal base for this by integrating IDA Pro and Unicorn. The detailed discussion of flare-emu’s hook functions will help you to write your own powerful automation scripts. Please reach out to us with questions, suggestions and feedback via the flare-emu and flare-ida GitHub issue trackers.

❌
❌