Threat Research

FLASHMINGO: The FireEye Open Source Automatic Analysis Tool for Flash

15 April 2019 at 15:00

Adobe Flash is one of the most exploited software components of the last decade. Its complexity and ubiquity make it an obvious target for attackers. Public sources list more than one thousand CVEs being assigned to the Flash Player alone since 2005. Almost nine hundred of these vulnerabilities have a Common Vulnerability Scoring System (CVSS) score of nine or higher.

After more than a decade of playing cat and mouse with the attackers, Adobe is finally deprecating Flash in 2020. To the security community this move is not a surprise since all major browsers have already dropped support for Flash.

A common misconception exists that Flash is already a thing of the past; however, history has shown us that legacy technologies linger for quite a long time. If organizations do not phase Flash out in time, the security threat may grow beyond Flash's end of life due to a lack of security patches.

As malware analysts on the FLARE team, we still see Flash exploits within malware samples. We must find a compromise between the need to analyse Flash samples and the correct amount of resources to be spent on a declining product. To this end we developed FLASHMINGO, a framework to automate the analysis of SWF files. FLASHMINGO enables analysts to triage suspicious Flash samples and investigate them further with minimal effort. It integrates into various analysis workflows as a stand-alone application or can be used as a powerful library. Users can easily extend the tool's functionality via custom Python plug-ins.

Background: SWF and ActionScript3

Before we dive into the inner workings of FLASHMINGO, let’s learn about the Flash architecture. Flash’s SWF files are composed of chunks, called tags, each implementing a specific functionality. Tags are completely independent from each other, allowing for compatibility with older versions of Flash. If a tag is not supported, the software simply ignores it. The main source of security issues is SWF’s scripting language: ActionScript3 (AS3). This scripting language is compiled into bytecode and placed within a Do ActionScript ByteCode (DoABC) tag. If a SWF file contains a DoABC tag, the bytecode is extracted and executed by a proprietary stack-based virtual machine (VM), known as AVM2 in the case of AS3, shipped within Adobe’s Flash player. The design of the AVM2 was based on the Java VM and was similarly plagued by memory corruption and logical issues that allowed malicious AS3 bytecode to execute native code in the context of the Flash player. In the few cases where the root cause of past vulnerabilities was not in the AVM2, ActionScript code was still necessary to put the system in a state suitable for reliable exploitation, for example by grooming the heap before triggering a memory corruption. For these reasons, FLASHMINGO focuses on the analysis of AS3 bytecode.
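To make the tag structure concrete, here is a minimal sketch of walking a SWF tag stream and collecting DoABC payloads. It is not how FLASHMINGO or SWIFFAS do it. It assumes the input is the already decompressed tag stream that follows the SWF header, and the function names iter_tags and extract_doabc_payloads are made up for this illustration. The record header layout and the DoABC tag code (82) follow the public SWF file format specification.

```python
import struct

DOABC_TAG_CODE = 82  # DoABC tag code in the SWF file format specification


def iter_tags(tag_stream):
    """Yield (tag_code, payload) for each tag in a raw SWF tag stream.

    Assumption: tag_stream starts at the first tag, after the SWF header.
    """
    offset = 0
    while offset + 2 <= len(tag_stream):
        # Short record header: upper 10 bits = tag code, lower 6 bits = length.
        (code_and_length,) = struct.unpack_from("<H", tag_stream, offset)
        offset += 2
        tag_code = code_and_length >> 6
        length = code_and_length & 0x3F
        if length == 0x3F:
            # Long record header: the real length follows as a 32-bit value.
            (length,) = struct.unpack_from("<I", tag_stream, offset)
            offset += 4
        yield tag_code, tag_stream[offset:offset + length]
        offset += length
        if tag_code == 0:  # End tag terminates the stream
            break


def extract_doabc_payloads(tag_stream):
    """Return the payload of every DoABC tag, ignoring all other tags."""
    return [payload for code, payload in iter_tags(tag_stream)
            if code == DOABC_TAG_CODE]
```

Because every tag carries its own length, a reader can skip tags it does not understand, which is the property the paragraph above describes.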

Tool Architecture

FLASHMINGO leverages the open source SWIFFAS library to do the heavy lifting of parsing Flash files. All binary data and bytecode are parsed and stored in a large object named SWFObject. This object contains all the information about the SWF relevant to our analysis: a list of tags, information about all methods, strings, constants and embedded binary data, to name a few. It is essentially a representation of the SWF file in an easily queryable format.

FLASHMINGO is a collection of plug-ins that operate on the SWFObject and extract interesting information. Figure 1 shows the relationship between FLASHMINGO, its plug-ins, and the SWFObject.


Figure 1: High level software structure

Several useful plug-ins covering a wide range of common analyses are already included with FLASHMINGO, including:

  • Find suspicious method names. Many samples contain method names used during development, like “run_shell” or “find_virtualprotect”. This plug-in flags samples with methods containing suspicious substrings.
  • Find suspicious constants. The presence of certain constant values in the bytecode may point to malicious or suspicious code. For example, code containing the constant value 0x5A4D may be shellcode searching for an MZ header.
  • Find suspicious loops. Malicious activity often happens within loops. This includes encoding, decoding, and heap spraying. This plug-in flags methods containing loops with interesting operations such as XOR or bitwise AND. It is a simple heuristic that effectively detects most encoding and decoding operations, and otherwise interesting code to further analyse.
  • Retrieve all embedded binary data.
  • A decompiler plug-in that uses the FFDEC Flash Decompiler. This decompiler engine, written in Java, can be used as a stand-alone library. Since FLASHMINGO is written in Python, using this plug-in requires Jython to interoperate between these two languages.
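The first two heuristics in the list are simple enough to sketch in a few lines. The code below is an illustration only, not the plug-ins shipped with FLASHMINGO: the substring list, the function names and the input format (plain lists of method names and integer constants) are assumptions. Only the 0x5A4D example comes from the text above.

```python
# Illustrative substring list; the real plug-in's list is not shown in this post.
SUSPICIOUS_SUBSTRINGS = ("shell", "virtualprotect", "spray")

# Constant taken from the example above; the description is ours.
SUSPICIOUS_CONSTANTS = {0x5A4D: "possible MZ header search"}


def flag_method_names(method_names, substrings=SUSPICIOUS_SUBSTRINGS):
    """Map each suspicious method name to the substrings it contains."""
    flagged = {}
    for name in method_names:
        hits = [s for s in substrings if s in name.lower()]
        if hits:
            flagged[name] = hits
    return flagged


def flag_constants(constants, known=SUSPICIOUS_CONSTANTS):
    """Return the subset of constants that appear in the known-suspicious table."""
    return {value: known[value] for value in constants if value in known}


names = ["run_shell", "find_virtualprotect", "drawButton"]
print(flag_method_names(names))
print(flag_constants([1, 0x5A4D]))
```

Substring matching like this is cheap and noisy, which is acceptable for triage: a hit is a reason to look closer, not a verdict.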

Extending FLASHMINGO With Your Own Plug-ins

FLASHMINGO is very easy to extend. Every plug-in is located in its own directory under the plug-ins directory. At start-up FLASHMINGO searches all plug-in directories for a manifest file (explained later in the post) and registers the plug-in if it is marked as active.

To accelerate development a template plug-in is provided. To add your own plug-in, copy the template directory, rename it, and edit its manifest and code. The template plug-in’s manifest, written in YAML, is shown below:

```
# This is a template for easy development
name: Template
active: no
description: copy this to kickstart development
returns: nothing

```

The most important parameters in this file are name and active. The name parameter is used internally by FLASHMINGO to refer to the plug-in. The active parameter is a Boolean value (yes or no) indicating whether this plug-in should be active or not. By default, all plug-ins (except the template) are active, but there may be cases where a user would want to deactivate a plug-in. The parameters description and returns are simple strings used to display documentation to the user. Finally, plug-in manifests are parsed once at program start. Adding new plug-ins or enabling/disabling plug-ins requires restarting FLASHMINGO.
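The registration step can be pictured with a small sketch. This is not FLASHMINGO's loader: parse_manifest below is a deliberately simplified reader for flat "key: value" lines rather than a real YAML parser, and both function names are hypothetical. It only mirrors the behaviour described above, where a plug-in is registered when its manifest marks it as active.

```python
TEMPLATE_MANIFEST = """\
# This is a template for easy development
name: Template
active: no
description: copy this to kickstart development
returns: nothing
"""


def parse_manifest(text):
    """Parse a flat 'key: value' manifest (a simplification, not full YAML)."""
    manifest = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        manifest[key.strip()] = value.strip()
    # YAML reads yes/no as Booleans; mirror that for the 'active' key only.
    manifest["active"] = str(manifest.get("active", "no")).lower() in ("yes", "true")
    return manifest


def register_active(manifest_texts):
    """Return {name: manifest} for every manifest marked as active."""
    registry = {}
    for text in manifest_texts:
        manifest = parse_manifest(text)
        if manifest["active"]:
            registry[manifest["name"]] = manifest
    return registry


enabled = TEMPLATE_MANIFEST.replace("active: no", "active: yes")
print(sorted(register_active([TEMPLATE_MANIFEST])))  # template is inactive
print(sorted(register_active([enabled])))
```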

Now for the actual code implementing the business logic. The file plugin.py contains a class named Plugin; the only thing that is needed is to implement its run method. Each plug-in receives an instance of a SWFObject as a parameter. The code will interact with this object and return data in a custom format, defined by the user. This way, the user's plug-ins can be written to produce data that can be directly ingested by their infrastructure.

Let's see how easy it is to create plug-ins by walking through one that is included, named binary_data. This plug-in returns all embedded data in a SWF file by default. If the user specifies the optional parameter pattern, the plug-in searches for matches of that byte sequence within the embedded data and returns a dictionary of the embedded data together with the offset at which the pattern was found.

First, we define the optional argument pattern to be supplied by the user (line 2 and line 4):

Afterwards, implement a custom run method and all other code needed to support it:

This is a simple but useful plug-in and illustrates how to interact with FLASHMINGO. The plug-in has a logging facility accessible through the property “ml” (line 2). By default it logs to FLASHMINGO’s main logger. If unspecified, it falls back to a log file within the plug-in’s directory. Line 10 to line 16 show the custom run method, extracting information from the SWF’s embedded data with the help of the custom _inspect_binary_data method. Note the source of this binary data: it is being read from a property named “swf”. This is the SWFObject passed to the plug-in as an argument, as mentioned previously. More complex analysis can be performed on the SWF file contents interacting with this swf object. Our repository contains documentation for all available methods of a SWFObject.
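As a rough sketch of the shape such a plug-in could take, consider the code below. It is not the binary_data plug-in from the repository, and its line numbers do not correspond to the ones cited above. The names ml, swf, pattern, run and _inspect_binary_data come from the text. Everything else is an assumption made for illustration: the FakeSWF stand-in, its binary_data attribute, the constructor signature and the returned dictionary layout. The logger fallback is also simplified to a plain named logger instead of a log file.

```python
import logging


class FakeSWF:
    """Hypothetical stand-in for SWFObject; the attribute name is an assumption."""

    def __init__(self, binary_data):
        self.binary_data = binary_data  # maps an embedded item's name to its bytes


class Plugin:
    def __init__(self, swf=None, ml=None, pattern=None):
        self.swf = swf          # the SWFObject handed over by the framework
        self.pattern = pattern  # optional byte sequence supplied by the user
        # Use the framework's logger if given; otherwise a local one
        # (simplified here, the text describes a per-plug-in log file).
        self.ml = ml if ml is not None else logging.getLogger("binary_data")

    def run(self):
        """Return all embedded data, or only pattern matches if a pattern is set."""
        self.ml.info("Inspecting embedded binary data")
        if self.pattern is None:
            return dict(self.swf.binary_data)
        return self._inspect_binary_data()

    def _inspect_binary_data(self):
        """Return {name: {'data': bytes, 'offset': int}} for each match."""
        found = {}
        for name, data in self.swf.binary_data.items():
            offset = data.find(self.pattern)
            if offset >= 0:
                found[name] = {"data": data, "offset": offset}
        return found


swf = FakeSWF({"blob_a": b"xxMZyy", "blob_b": b"nothing here"})
print(Plugin(swf, pattern=b"MZ").run())
```

Returning plain dictionaries keeps the output easy to serialize, which fits the stated goal of producing data that other infrastructure can ingest directly.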

Conclusion

Even though Flash is set to reach its end of life at the end of 2020 and most of the development community moved away from it long ago, we predict that we’ll see Flash being used as an infection vector for a while. Legacy technologies are juicy targets for attackers due to the lack of security updates. FLASHMINGO provides malware analysts a flexible framework to quickly deal with these pesky Flash samples without getting bogged down in the intricacies of the execution environment and file format.

Find the FLASHMINGO tool on the FireEye public GitHub Repository.

Solving Ad-hoc Problems with Hex-Rays API

10 April 2018 at 15:00

Introduction

IDA Pro is the de facto standard when it comes to binary reverse engineering. Besides being a great disassembler and debugger, it is possible to extend it and include a powerful decompiler by purchasing an additional license from Hex-Rays. The ability to switch between disassembled and decompiled code can greatly reduce the analysis time.

The decompiler (from now on referred to as Hex-Rays) has been around for a long time and has achieved a good level of maturity. However, there seems to be a lack of concise and complete resources on this topic (tutorials or otherwise). In this blog post, we aim to close that gap by showcasing examples where scripting Hex-Rays goes a long way.

Overview of a Decompiler

In order to understand how the decompiler works, it’s helpful to first review the normal compilation process.

Compilation and decompilation center around the concept of an Abstract Syntax Tree (AST). In essence, a compiler takes the source code, splits it into tokens according to a grammar, and groups these tokens into logical expressions. In this phase of the compilation process, referred to as parsing, the code structure is represented as a complex object, the AST. From the AST, the compiler will produce assembly code for the specified platform.

A decompiler takes the opposite route. From the given assembly code, it works back to produce an AST, and from this to produce pseudocode.

Of all the intermediate steps between source code and assembly, we stress the AST because most of the time you spend using the Hex-Rays API will go into reading and/or modifying the Abstract Syntax Tree (or ctree in Hex-Rays terminology).
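If the idea of an AST is new, Python's standard ast module offers a quick way to get a feel for it. This is only an analogy and has nothing to do with the Hex-Rays API: it parses a line of Python into a tree, inspects two nodes, and turns the tree back into source text, much as a decompiler turns its tree into pseudocode. The ast.unparse call requires Python 3.9 or later.

```python
import ast

source = "total = price * 2 + tax"

# Parsing: source text -> tree of nodes.
tree = ast.parse(source)
assign = tree.body[0]

print(type(assign).__name__)        # Assign: the statement node
print(type(assign.value).__name__)  # BinOp: the expression on the right side
print(ast.dump(assign.value.left))  # the nested 'price * 2' sub-expression

# The reverse direction: tree -> source text.
regenerated = ast.unparse(tree)
print(regenerated)
```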

Items, Expressions and Statements

Now we know that Hex-Rays’s ctree is a tree-like data structure. The nodes of this tree are either of type cinsn_t or cexpr_t. We will define these in a moment, but for now it is important to know that both derive from a very basic type, namely the citem_t type, as seen in the following code snippet:

Therefore, all nodes in the ctree will have the op property, which indicates the node type (variable, number, logical expression, etc.).

The type of op (ctype_t) is an enumeration where all constants are named either cit_<xyz> (for statements) or cot_<xyz> (for expressions). Keep this in mind, as it will be very important. A quick way to inspect all ctype_t constants and their values is to execute the following code snippet:
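One way to do this is sketched below. The helper simply filters a module's attributes by the two prefixes. It takes the module as a parameter so that it can be tried outside IDA as well. Inside IDA, with the decompiler available, the ida_hexrays module exposes these constants. The helper name list_ctype_constants is our own.

```python
def list_ctype_constants(module):
    """Return {name: value} for every cit_* and cot_* attribute of `module`."""
    return {
        name: getattr(module, name)
        for name in sorted(dir(module))
        if name.startswith(("cit_", "cot_"))
    }


if __name__ == "__main__":
    try:
        import ida_hexrays  # only importable inside IDA Pro with the decompiler
    except ImportError:
        print("Run this inside IDA to list the real ctype_t constants.")
    else:
        for name, value in list_ctype_constants(ida_hexrays).items():
            print(name, value)
```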

This produces the following output:

Let’s dive a bit deeper and explain the two types of nodes: expressions and statements.

It is useful to think about expressions as “the little logical elements” of your code. They range from simple types such as variables, strings or numerical constants, to small code constructs (assignments, comparisons, additions, logical operations, array indexing, etc.).

These are of type cexpr_t, a large structure containing several members. The members that can be accessed depend on its op value. For example, the member n, which holds the numeric value, only makes sense when dealing with constants.

On the other side, we have statements. These correlate roughly to language keywords (if, for, do, while, return, etc.). Most of them are related to control flow and can be thought of as “the big picture elements” of your code.

Recapitulating, we have seen how the decompiler exposes this tree-like structure (the ctree), which consists of two types of nodes: expressions and statements. In order to extract information from or modify the decompiled code, we have to interact with the ctree nodes via methods dependent on the node type. However, the following question arises: “How do we reach the nodes?”

This is done via a class exposed by Hex-Rays: the tree visitor (ctree_visitor_t). This class has two virtual methods, visit_insn and visit_expr, that are executed when a statement or expression is found while traversing the ctree. We can create our own visitor classes by inheriting from this one and overriding the corresponding methods.
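The visitor pattern itself can be tried without IDA. The sketch below is a toy model, not the Hex-Rays classes: Node and TreeVisitor are stand-ins written for this illustration, and only the hook names (visit_insn, visit_expr) and the cit_/cot_ naming convention are borrowed from the API described above. A subclass overrides the two hooks and counts how many statements and expressions it sees.

```python
class Node:
    """Toy stand-in for a ctree item; `op` follows the cit_/cot_ naming."""

    def __init__(self, op, children=()):
        self.op = op
        self.children = list(children)

    def is_expr(self):
        return self.op.startswith("cot_")


class TreeVisitor:
    """Toy analogue of a tree visitor: subclasses override the visit_* hooks."""

    def visit_insn(self, node):
        return 0  # 0 means "keep going" in this toy model

    def visit_expr(self, node):
        return 0

    def apply_to(self, node):
        # Dispatch on the node kind, then recurse into the children.
        hook = self.visit_expr if node.is_expr() else self.visit_insn
        hook(node)
        for child in node.children:
            self.apply_to(child)


class OpCounter(TreeVisitor):
    def __init__(self):
        self.statements = 0
        self.expressions = 0

    def visit_insn(self, node):
        self.statements += 1
        return 0

    def visit_expr(self, node):
        self.expressions += 1
        return 0


# Toy tree for: if (x == 1) return x;
tree = Node("cit_if", [
    Node("cot_eq", [Node("cot_var"), Node("cot_num")]),
    Node("cit_return", [Node("cot_var")]),
])
counter = OpCounter()
counter.apply_to(tree)
print(counter.statements, counter.expressions)  # 2 4
```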

Example Scripts

In this section, we will use the Hex-Rays API to solve two real-world problems:

  • Identify calls to GetProcAddress to dynamically resolve Windows APIs, assigning the resulting address to a global variable.
  • Display assignments related to stack strings as characters instead of numbers, for easier readability.

GetProcAddress

The first example we will walk through is how to automatically handle renaming global variables that have been dynamically resolved at run time. This is a common technique malware uses to hide its capabilities from static analysis tools. An example of dynamically resolving global variables using GetProcAddress is shown in Figure 1.


Figure 1: Dynamic API resolution using GetProcAddress

There are several ways to rename the global variables, with the simplest being manual copy and paste. However, this task is very repetitive and can be scripted using the Hex-Rays API.

In order to write any Hex-Rays script, it is important to first visualize the ctree. The Hex-Rays SDK includes a sample, sample5, which can be used to view the current function’s ctree. The amount of data shown in a ctree for a function can be overwhelming. A modified version of the sample was used to produce a picture of a sub-ctree for the function shown in Figure 1. The sub-ctree for the single expression 'dword_1000B2D8 = (int)GetProcAddress(v0, "CreateThread");' is shown in Figure 2.


Figure 2: Sub-ctree for GetProcAddress assignment

With knowledge of the sub-ctree in use, we can write a script to automatically rename all the global variables that are being assigned using this method.

The code to automatically rename all the global variables is shown in Figure 3. The code works by traversing the ctree looking for calls to the GetProcAddress function. Once found, the code takes the name of the function being resolved and finds the global variable that is being set. The code then uses the IDA MakeName API to rename the address to the correct function.


Figure 3: Function renaming global variables
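The matching logic can be sketched on a toy tree. The code below is not the script from Figure 3 and does not use the Hex-Rays API: Expr is a made-up node class, and the op labels ("asg", "cast", "call", "global", "str") are illustrative. It only shows the pattern described above: find an assignment whose right side, once casts are stripped, is a call to GetProcAddress with a string as second argument, then pair the global on the left with that string.

```python
class Expr:
    """Toy expression node; field names are illustrative, not the Hex-Rays layout."""

    def __init__(self, op, name=None, value=None, operands=()):
        self.op = op
        self.name = name
        self.value = value
        self.operands = list(operands)


def strip_casts(expr):
    """Skip over any cast nodes wrapping an expression."""
    while expr.op == "cast":
        expr = expr.operands[0]
    return expr


def resolved_api(expr):
    """Return (global_name, api_name) for `global = GetProcAddress(h, "Api")`."""
    if expr.op != "asg":
        return None
    target, source = expr.operands[0], strip_casts(expr.operands[1])
    if target.op != "global" or source.op != "call":
        return None
    callee, *args = source.operands
    if callee.name != "GetProcAddress" or len(args) != 2:
        return None
    if args[1].op != "str":
        return None
    return target.name, args[1].value


def plan_renames(expressions):
    """Collect {current_global_name: resolved_api_name} for all matches."""
    renames = {}
    for expr in expressions:
        match = resolved_api(expr)
        if match is not None:
            renames[match[0]] = match[1]
    return renames


# Toy version of: dword_1000B2D8 = (int)GetProcAddress(v0, "CreateThread");
example = Expr("asg", operands=[
    Expr("global", name="dword_1000B2D8"),
    Expr("cast", operands=[
        Expr("call", operands=[
            Expr("global", name="GetProcAddress"),
            Expr("var", name="v0"),
            Expr("str", value="CreateThread"),
        ]),
    ]),
])
print(plan_renames([example]))
```

Separating the matching from the renaming keeps the sketch testable: inside IDA, the resulting dictionary would be fed to the renaming API mentioned above.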

After the script has been executed, we can see in Figure 4 that all the global variables have been renamed to the appropriate function name.


Figure 4: Global variables renamed

Stack Strings

Our next example is a typical issue when dealing with malware: stack strings. This is a technique aimed at making analysis harder by using arrays of characters instead of strings in the code. An example can be seen in Figure 5; the malware stores each character’s ASCII value on the stack and then references it in the call to sprintf. At first glance, it is very difficult to tell what this string means (unless, of course, you know the ASCII table by heart).


Figure 5: Hex-Rays decompiler output. Stack strings are difficult to read.

Our script will modify these assignments to something more readable. The important part of our code is the ctree visitor mentioned earlier, which is shown in Figure 6.


Figure 6: Custom ctree visitor

The logic implemented here is pretty straightforward. We define our subclass of a ctree visitor (line 1) and override its visit_expr method. This will only kick in when an assignment is found (line 9). Another condition to be met is that the left side of the assignment is a variable and the right side a number (line 15). Moreover, the numeric value must be in the readable ASCII range (lines 20 and 21).

Once this kind of expression is found, we will change the type of the right side from a number to a string (lines 26 to 31), and replace its numerical value by the corresponding ASCII character (line 32).
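The core of this transformation, deciding whether a number is a readable character and rendering it as one, can be tried in isolation. The sketch below is not the visitor from Figure 6. It works on plain (variable, value) pairs, and the function names and the exact range bounds (0x20 to 0x7E) are our own choices for the illustration.

```python
def is_readable_ascii(value):
    """True for printable ASCII, from space (0x20) to tilde (0x7E)."""
    return 0x20 <= value <= 0x7E


def render_assignment(variable, value):
    """Render `variable = value;`, showing readable ASCII values as characters."""
    if not is_readable_ascii(value):
        return "{} = {};".format(variable, value)
    char = chr(value)
    if char in ("'", "\\"):
        char = "\\" + char  # escape characters that would break the literal
    return "{} = '{}';".format(variable, char)


def recover_stack_string(values):
    """Join consecutive readable values, stopping at the first other value."""
    chars = []
    for value in values:
        if not is_readable_ascii(value):
            break
        chars.append(chr(value))
    return "".join(chars)


for variable, value in [("v1", 37), ("v2", 115), ("v3", 0)]:
    print(render_assignment(variable, value))
print(recover_stack_string([37, 115, 0]))
```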

The modified pseudocode after running this script is shown in Figure 7.


Figure 7: Assigned values shown as characters

You can find the complete scripts in our FLARE GitHub repository under decompiler scripts.

Conclusion

These two admittedly simple examples should give you an idea of the power of IDA’s decompiler API. In this post we have covered the foundations of all decompiler scripts: the ctree object, a structure composed of expressions and statements representing every element of the code as well as the relationships between them. By creating a custom visitor we have shown how to traverse the tree and read or modify the code elements, thereby analyzing or modifying the pseudocode.

Hopefully, this post will motivate you to start writing your own scripts. This is only the beginning!

Do you want to learn more about these tools and techniques from FLARE? Then you should take one of our Black Hat classes in Las Vegas this summer! Our offerings include Malware Analysis Crash Course, macOS Malware for Reverse Engineers, and Malware Analysis Master Class.

References

Although written in 2009, one of the best references is still the original article on the Hex-Rays blog.
