In this blog post we will introduce a user-friendly memory scanning Python library that was created out of the need for more control during memory scanning. We will give an overview of how this library works and share the thought process and reasoning behind it. This blog post will not cover the inner workings of memory management on the respective platforms.
Memory Scanning
Memory scanning is the practice of iterating over the different processes running on a computer system and searching through their memory regions for a specific pattern. There can be a myriad of reasons to scan the memory of certain processes. The most common use cases are probably credential access (accessing the memory of the lsass.exe process for example), scanning for possible traces of malware and implants or recovery of interesting data, such as cryptographic material.
If time is as valuable to you as it is to us at Fox-IT, you probably noticed that performing a full memory scan looking for a pattern is a very time-consuming process, to say the least.
Why is scanning memory so time consuming when you know what you are looking for, and, more importantly, how can this scanning process be sped up? While looking into different detection techniques to identify running Cobalt Strike beacons, we noticed something we could easily filter on, speeding up our scanning processes: memory attributes.
Speed up scanning with memory attributes
Memory attributes are comparable to the permission system we all know and love on our regular file and directory structures. The permission system dictates what kind of actions are allowed within a specific memory region and can be changed to different sets of attributes by their respective API calls.
The following memory attributes exist on both the Windows and UNIX platforms:
Read (R)
Write (W)
Execute (E)
The Windows platform has some extra permission attributes, plus quite an extensive list of allocation and protection attributes. These attributes can also be used to filter memory regions when looking for specific patterns, but we will not go into them right now.
So how do we leverage this information about attributes to speed up our scanning processes? It turns out that by filtering the regions to scan based on the memory attributes set for the regions, we can speed up our scanning process tremendously before even starting to look for our specified patterns.
Say for example we are looking for a specific byte pattern of an implant that is present in a certain memory region of a running process on the Windows platform. We already know what pattern we are looking for and we also know that the memory regions used by this specific implant are always set to:
Type    Protection    Initial
PRV     ERW           ERW

Table 1. Example of implant memory attributes that are set
Depending on what is running on the system, filtering on the above memory attributes already rules out a large portion of memory regions for most running processes on a Windows system.
If we take a notepad.exe process as an example, we can see that the different sections of the executable have their respective rights. The .text section of an executable contains executable code and is thus marked with the E permission as its protection:
If we were looking for just the sections and regions that are marked as being executable, we would only need to scan the .text section of the notepad.exe process. If we scan all the regions of every running process on the system, disregarding the memory attributes which are set, scanning for a pattern will take quite a bit longer.
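On Linux, the same filtering idea can be illustrated by parsing `/proc/<pid>/maps` and keeping only regions whose permissions include both write and execute. This is a minimal sketch of the technique, not Skrapa's actual API; the sample maps text below is made up for illustration:

```python
import re

# One row of /proc/<pid>/maps: address range, permissions, offset,
# device, inode and an optional path.
MAPS_LINE = re.compile(
    r"^(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\s+(?P<perms>[rwxps-]{4})"
)

def executable_regions(maps_text):
    """Yield (start, end) address tuples for regions whose permissions
    include both write and execute -- the kind of region an injected
    implant typically lives in."""
    for line in maps_text.splitlines():
        m = MAPS_LINE.match(line)
        if not m:
            continue
        perms = m.group("perms")
        if "w" in perms and "x" in perms:
            yield int(m.group("start"), 16), int(m.group("end"), 16)

sample = """55d0a0c00000-55d0a0c01000 r-xp 00000000 08:01 131 /usr/bin/cat
7f2b3c000000-7f2b3c021000 rwxp 00000000 00:00 0
7ffd1b2e0000-7ffd1b301000 rw-p 00000000 00:00 0 [stack]"""

print(list(executable_regions(sample)))
```

Of the three sample regions, only the anonymous `rwxp` mapping survives the filter; everything else never needs to be pattern-scanned at all.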
Introducing Skrapa
We’ve incorporated the techniques described above into an easy-to-install Python package. The package is designed and tested to work on Linux and Microsoft Windows systems. Some of the notable features include:
Configurable scanning:
Scan all process memory, or specific processes by name or process identifier.
Regex and YARA support.
Support for user callback functions: define custom functions that execute routines when user-specified conditions are met.
Easy to incorporate in bigger projects and scripts due to its easy-to-use API.
The package was designed to be easily extensible by end users, providing an API that can be leveraged to perform more advanced and tailored scanning tasks.
Where to find Skrapa?
The Python library is available on our GitHub, together with some examples showing scenarios on how to use it.
Windows Defender (the antivirus shipped with standard installations of Windows) places malicious files into quarantine upon detection.
Reverse engineering mpengine.dll resulted in finding previously undocumented metadata in the Windows Defender quarantine folder that can be used for digital forensics and incident response.
Existing scripts that extract quarantined files do not process this metadata, even though it could be useful for analysis.
Fox-IT’s open-source digital forensics and incident response framework Dissect can now recover this metadata, in addition to recovering quarantined files from the Windows Defender quarantine folder.
dissect.cstruct allows us to use C-like structure definitions in Python, which enables easy continued research in other programming languages or reverse engineering in tools like IDA Pro.
Want to continue in IDA Pro? Just copy paste the structure definitions!
Introduction
During incident response engagements we often encounter antivirus applications that have rightfully triggered on malicious software that was deployed by threat actors. Most commonly we encounter this for Windows Defender, the antivirus solution that is shipped by default with Microsoft Windows. Windows Defender places malicious files in quarantine upon detection, so that the end user may decide to recover the file or delete it permanently. Threat actors, when faced with the detection capabilities of Defender, either disable the antivirus in its entirety or attempt to evade its detection.
The Windows Defender quarantine folder is valuable from the perspective of digital forensics and incident response (DFIR). First of all, it can reveal information about timestamps, locations and signatures of files that were detected by Windows Defender. Especially in scenarios where the threat actor has deleted the Windows Event logs, but left the quarantine folder intact, the quarantine folder is of great forensic value. Moreover, as the entire file is quarantined (so that the end user may choose to restore it), it is possible to recover files from quarantine for further reverse engineering and analysis.
While scripts already exist to recover files from the Defender quarantine folder, the purpose of much of the contents of this folder was previously unknown. We don’t like big unknowns, so we performed further research into this metadata to see if we could uncover additional forensic traces.
Rather than just presenting our results, we’ve structured this blog to also describe the process of how we got there. Skip to the end if you are interested in the results rather than the technical details of reverse engineering Windows Defender.
In summary, whenever Defender puts a file into quarantine, it does three things: A bunch of metadata pertaining to when, why and how the file was quarantined is held in a QuarantineEntry. This QuarantineEntry is RC4-encrypted and saved to disk in the /ProgramData/Microsoft/Windows Defender/Quarantine/Entries folder.
The contents of the malicious file are stored in a QuarantineEntryResourceData file, which is also RC4-encrypted and saved to disk in the /ProgramData/Microsoft/Windows Defender/Quarantine/ResourceData folder.
Within the /ProgramData/Microsoft/Windows Defender/Quarantine/Resource folder, a Resource file is made. Both from previous research as well as from our own findings during reverse engineering, it appears this file contains no information that cannot be obtained from the QuarantineEntry and the QuarantineEntryResourceData files. Therefore, we ignore the Resource file for the remainder of this blog.
While previous scripts are able to recover some properties from the ResourceData and QuarantineEntry files, large segments of data were left unparsed, which gave us a hunch that additional forensic artefacts were yet to be discovered.
Windows Defender encrypts both the QuarantineEntry and the ResourceData files using a hardcoded RC4 key defined in mpengine.dll. This hardcoded key was initially published by Cuckoo and is paramount for the offline recovery of the quarantine folder.
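The hardcoded key itself is published elsewhere and not reproduced here, but because RC4 is a simple symmetric stream cipher, a plain-Python implementation is enough to follow along with the decryption steps in this post:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling followed by the PRGA keystream XOR.
    RC4 is symmetric, so the same function encrypts and decrypts."""
    S = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm (KSA)
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # pseudo-random generation algorithm (PRGA)
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# Well-known RC4 test vector: key "Key", plaintext "Plaintext".
print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```

Since the key is static and the same for every Windows installation, anyone in possession of a quarantine folder can decrypt its contents offline with a snippet like this.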
Pivoting off public scripts and Bauch’s whitepaper, we loaded mpengine.dll into IDA to further review how Windows Defender places a file into quarantine. Using the PDB available from the Microsoft symbol server, we get a head start with some functions and structures already defined.
Recovering metadata by investigating the QuarantineEntry file
Let us begin with the QuarantineEntry file. From this file, we would like to recover as much of the QuarantineEntry structure as possible, as this holds all kinds of valuable metadata. The QuarantineEntry file is not encrypted as one RC4 cipher stream; it consists of three chunks that are each individually encrypted using RC4.
These three chunks are what we have come to call QuarantineEntryFileHeader, QuarantineEntrySection1 and QuarantineEntrySection2.
QuarantineEntryFileHeader describes the size of QuarantineEntrySection1 and QuarantineEntrySection2, and contains CRC checksums for both sections.
QuarantineEntrySection1 contains valuable metadata that applies to all QuarantineEntryResource instances within this QuarantineEntry file, such as the DetectionName and the ScanId associated with the quarantine action.
QuarantineEntrySection2 denotes the length and offset of every QuarantineEntryResource instance within this QuarantineEntry file so that they can be correctly parsed individually.
A QuarantineEntry has one or more QuarantineEntryResource instances associated with it. This contains additional information such as the path of the quarantined artefact, and the type of artefact that has been quarantined (e.g. regkey or file).
An overview of the different structures within QuarantineEntry is provided in Figure 1:
Figure 1: An example overview of a QuarantineEntry. In this example, two files were simultaneously quarantined by Windows Defender. Hence, there are two QuarantineEntryResource structures contained within this single QuarantineEntry.
As QuarantineEntryFileHeader is mostly a structure that describes how QuarantineEntrySection1 and QuarantineEntrySection2 should be parsed, we will first look into what those two consist of.
QuarantineEntrySection1
When reviewing mpengine.dll within IDA, the contents of both QuarantineEntrySection1 and QuarantineEntrySection2 appear to be determined in the QexQuarantine::CQexQuaEntry::Commit function.
The function receives an instance of the QexQuarantine::CQexQuaEntry class. Unfortunately, the PDB file that Microsoft provides for mpengine.dll does not contain a definition for this structure. Most fields could, however, be derived using the function names in the PDB that are associated with the CQexQuaEntry class:
Figure 2: Functions retrieving properties from QuarantineEntry
The Id, ScanId, ThreatId, ThreatName and Time fields are most important, as these will be written to the QuarantineEntry file.
At the start of the QexQuarantine::CQexQuaEntry::Commit function, the size of Section1 is determined.
Figure 3: Reviewing the decompiled output of CqExQuaEntry::Commit shows the size of QuarantineEntrySection1 being set to the length of ThreatName plus 53.
This sets section1_size to a value of the length of the ThreatName variable plus 53. We can determine what these additional 53 bytes consist of by looking at what values are set in the QexQuarantine::CQexQuaEntry::Commit function for the Section1 buffer.
This took some experimentation and required trying different fields, offsets and sizes for the QuarantineEntrySection1 structure within IDA. After every change, we would review what these changes would do to the decompiled IDA view of the QexQuarantine::CQexQuaEntry::Commit function.
Some trial and error landed us the following structure definition:
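Based on that size arithmetic (52 bytes of fixed fields plus a null-terminated ThreatName, giving length + 53), a reconstruction of the parse might look like the sketch below. The exact field order and widths here are our assumption, derived from the getters in Figure 2:

```python
import struct
import uuid

def parse_section1(buf: bytes) -> dict:
    """Hypothetical parse of a decrypted QuarantineEntrySection1.
    Layout assumption: two 16-byte GUIDs (Id, ScanId), a 64-bit
    timestamp, a 64-bit ThreatId, the always-1 dword, then a
    null-terminated ASCII ThreatName -- 52 fixed bytes + name + null."""
    entry_id = uuid.UUID(bytes_le=buf[0:16])
    scan_id = uuid.UUID(bytes_le=buf[16:32])
    timestamp, threat_id, one = struct.unpack_from("<QQI", buf, 32)
    threat_name = buf[52:].split(b"\x00", 1)[0].decode()
    return {
        "Id": entry_id,
        "ScanId": scan_id,
        "Timestamp": timestamp,
        "ThreatId": threat_id,
        "One": one,
        "ThreatName": threat_name,
    }
```

Note how the 52 fixed bytes plus the name's null terminator reproduce the "length of ThreatName plus 53" seen in the decompiled size calculation.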
While reviewing the final decompiled output (right) for the assembly code (left), we noticed a field always being set to 1:
Figure 4: A field of QuarantineEntrySection1 always being set to the value of 1.
Given that we do not know what this field is used for, we opted to name the field ‘One’ for now. Most likely, it’s a boolean value that is always true within the context of the QexQuarantine::CQexQuaEntry::Commit function.
QuarantineEntrySection2
Now that we have a structure definition for the first section of a QuarantineEntry, we can move on to the second part. QuarantineEntrySection2 holds the number of QuarantineEntryResource objects confined within a QuarantineEntry, as well as the offsets into the QuarantineEntry structure where they are located.
In most scenarios, one threat gets detected at a time, and one QuarantineEntry will be associated with one QuarantineEntryResource. This is not always the case: for example, if one unpacks a ZIP folder that contains multiple malicious files, Windows Defender might place them all into quarantine. Each individual malicious file of the ZIP would then be one QuarantineEntryResource, but they are all confined within one QuarantineEntry.
QuarantineEntryResource
To be able to parse QuarantineEntryResource instances, we look into the CQexQuaResource::ToBinary function. This function receives a QuarantineEntryResource object, as well as a pointer to a buffer to which it needs to write the binary output. If we can reverse the logic within this function, we can convert the binary output back into a parsed instance during forensic recovery.
Looking into the CQexQuaResource::ToBinary function, we see two very similar loops as to what was observed before for serializing the ThreatName of QuarantineEntrySection1. By reviewing various decrypted QuarantineEntry files, it quickly became apparent that these loops are responsible for reserving space in the output buffer for DetectionPath and DetectionType, with DetectionPath being UTF-16 encoded:
Figure 5: Reservation of space for DetectionPath and DetectionType at the beginning of CQexQuaResource::ToBinary
Fields
When reviewing the QexQuarantine::CQexQuaEntry::Commit function, we observed an interesting loop that (after investigating function calls and renaming variables) explains the data that is stored between the DetectionType and DetectionPath:
Figure 6: Alignment logic for serializing Fields
It appears QuarantineEntryResource structures have one or more QuarantineResourceField instances associated with them, with the number of fields associated with a QuarantineEntryResource being stored in a single byte in between the DetectionPath and DetectionType. When saving the QuarantineEntry to disk, fields have an alignment of 4 bytes. We could not find mentions of QuarantineEntryResourceField structures in prior Windows Defender research, even though they can hold valuable information.
The CQExQuaResource class has several different implementations of AddField, accepting different kinds of parameters. Reviewing these functions showed that fields have an Identifier, Type, and a buffer Data with a size of Size, resulting in a simple TLV-like format:
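Parsing such a TLV-like format with the 4-byte alignment described above can be sketched as follows. The exact bit packing of Identifier and Type in the second word is our reconstruction, not a confirmed layout:

```python
import struct

def parse_fields(buf: bytes, count: int):
    """Walk a buffer of TLV-style resource fields.

    Assumed layout per field (our reconstruction):
      uint16 Size                      -- length of Data
      uint16 Identifier:12 | Type:4    -- packed identifier and type
      char   Data[Size]
    Each field starts on a 4-byte boundary.
    """
    fields = []
    offset = 0
    for _ in range(count):
        size, packed = struct.unpack_from("<HH", buf, offset)
        identifier = packed & 0x0FFF   # low 12 bits
        field_type = packed >> 12      # high 4 bits
        data = buf[offset + 4 : offset + 4 + size]
        fields.append((identifier, field_type, data))
        # advance past the data, then round up to the next 4-byte boundary
        offset = (offset + 4 + size + 3) & ~3
    return fields
```

The `(x + 3) & ~3` rounding is the same alignment logic visible in the serialization loop of Figure 6, just applied in reverse when reading.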
To understand what kinds of types and identifiers are possible, we delve further into the different versions of the AddField functions, which all accept a different data type:
Figure 7: Finding different field types based on different implementations of the CqExQuaResource::AddField function
Visiting these functions, we reviewed the Type and Size variables to understand the different possible types of fields that can be set for QuarantineResource instances. This yields the following FIELD_TYPE enum:
As the AddField functions are part of a virtual function table (vtable) of the CQexQuaResource class, they are not called directly (which would yield an xref in IDA), so we cannot trivially find all places where AddField is called. Therefore, we have not exhausted all code paths leading to a call of AddField to identify all possible Identifier values and how they are used. Our research yielded the following field identifiers as the most commonly observed, and of the most forensic value:
Especially CreationTime, LastAccessTime and LastWriteTime can provide crucial data points during an investigation.
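These timestamps are, as is usual on Windows, 64-bit FILETIME values: 100-nanosecond ticks since 1601-01-01 UTC. Converting them to something human-readable is a one-liner worth keeping around:

```python
from datetime import datetime, timezone, timedelta

# 1970-01-01 (the Unix epoch) expressed in FILETIME ticks, as a sanity check.
EPOCH_AS_FILETIME = 116444736000000000

def filetime_to_datetime(ft: int) -> datetime:
    """Convert a 64-bit Windows FILETIME (100ns ticks since 1601-01-01 UTC)
    to a timezone-aware UTC datetime."""
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ft // 10)

print(filetime_to_datetime(EPOCH_AS_FILETIME))  # 1970-01-01 00:00:00+00:00
```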
Revisiting the QuarantineEntrySection2 and QuarantineEntryResource structures
Now that we have an understanding of how fields work and how they are stored within the QuarantineEntryResource, we can derive the following structure for it:
Revisiting the QexQuarantine::CQexQuaEntry::Commit function, we can now understand how this function determines at which offset every QuarantineEntryResource is located within QuarantineEntry. Using these offsets, we will later be able to parse individual QuarantineEntryResource instances. Thus, the QuarantineEntrySection2 structure is fairly straightforward:
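Assuming the straightforward layout described above (a 32-bit resource count followed by one 32-bit offset per resource), parsing this section is a few lines. The field widths are our assumption:

```python
import struct

def parse_section2(buf: bytes):
    """Parse the assumed QuarantineEntrySection2 layout: a uint32 count
    of QuarantineEntryResource instances, followed by one uint32 offset
    per resource pointing at where that resource starts."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offsets = struct.unpack_from(f"<{count}I", buf, 4)
    return list(offsets)

# Two resources, at offsets 12 and 64.
print(parse_section2(b"\x02\x00\x00\x00\x0c\x00\x00\x00\x40\x00\x00\x00"))  # [12, 64]
```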
The last step for recovery of QuarantineEntry: the QuarantineEntryFileHeader
Now that we have a proper understanding of the QuarantineEntry, we want to know how it ends up written to disk in encrypted form, so that we can properly parse the file upon forensic recovery. By inspecting the QexQuarantine::CQexQuaEntry::Commit function further, we can see how it passes QuarantineEntrySection1 and QuarantineEntrySection2 to a function named CUserDatabase::Add.
We noted earlier that the QuarantineEntry contains three RC4-encrypted chunks. The first chunk of the file is created in the CUserDatabase::Add function, and is the QuarantineEntryFileHeader. The second chunk is QuarantineEntrySection1. The third chunk starts with QuarantineEntrySection2, followed by all QuarantineEntryResource structures and their 4-byte-aligned QuarantineEntryResourceField structures.
We knew from Bauch’s work that the QuarantineEntryFileHeader has a static size of 60 bytes, and contains the size of QuarantineEntrySection1 and QuarantineEntrySection2. Thus, we need to decrypt the QuarantineEntryFileHeader first.
Based on Bauch’s work, we started with the following structure for QuarantineEntryFileHeader:
That leaves quite some bytes unknown though, so we went back to trusty IDA. Inspecting the CUserDatabase::Add function helps us further understand the QuarantineEntryFileHeader structure. For example, we can see the hardcoded magic header and footer:
Figure 8: Magic header and footer being set for the QuarantineEntryFileHeader
A CRC checksum calculation can be seen for both the buffer of QuarantineEntrySection1 and QuarantineEntrySection2:
Figure 9: CRC Checksum logic within CUserDatabase::Add
These checksums can be used upon recovery to verify the validity of the file. The CUserDatabase::Add function then writes the three chunks in RC4-encrypted form to the QuarantineEntry file buffer.
Based on these findings of the Magic header and footer and the CRC checksums, we can revise the structure definition for the QuarantineEntryFileHeader:
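Putting the pieces together, a header parse consistent with what we found might look like this. The byte offsets of the individual fields within the 60 bytes are our reconstruction; treat them as an assumption:

```python
import struct

HEADER_LEN = 60  # QuarantineEntryFileHeader has a fixed size of 60 bytes

def parse_file_header(buf: bytes) -> dict:
    """Parse a decrypted QuarantineEntryFileHeader.

    Assumed layout: a 4-byte magic header, 36 bytes we treat as
    unknown/padding, the two section sizes, the two CRC32 checksums,
    and a 4-byte magic footer -- 60 bytes in total.
    """
    if len(buf) != HEADER_LEN:
        raise ValueError("expected exactly 60 bytes")
    magic_header = buf[0:4]
    section1_size, section2_size, section1_crc, section2_crc = struct.unpack_from("<IIII", buf, 40)
    magic_footer = buf[56:60]
    return {
        "MagicHeader": magic_header,
        "Section1Size": section1_size,
        "Section2Size": section2_size,
        "Section1Crc": section1_crc,
        "Section2Crc": section2_crc,
        "MagicFooter": magic_footer,
    }
```

With the two section sizes recovered from the header, the remaining two RC4 chunks can be carved out of the file at offsets 60 and 60 + Section1Size.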
This was the last piece to be able to parse QuarantineEntry structures from their on-disk form. However, we do not want just the metadata: we want to recover the quarantined files as well.
Recovering files by investigating QuarantineEntryResourceData
We can now correctly parse QuarantineEntry files, so it is time to turn our attention to the QuarantineEntryResourceData file. This file contains the RC4-encrypted contents of the file that has been placed into quarantine.
Step one: eyeball hexdumps
Let’s start by letting Windows Defender quarantine a Mimikatz executable and reviewing its output files in the quarantine folder. One would think that merely RC4 decrypting the QuarantineEntryResourceData file would result in the contents of the original file. However, a quick hexdump of a decrypted QuarantineEntryResourceData file shows us that there is more information contained within:
As visible in the hexdump, the MZ value (which is located at the beginning of the buffer of the Mimikatz executable) only starts at offset 0xCC. This gives reason to believe there is potentially valuable information preceding it.
There is also additional information at the end of the ResourceData file:
At the end of the hexdump, we see an additional buffer, which some may recognize as the “Zone Identifier”, or the “Mark of the Web”. As this Zone Identifier may tell you something about where a file originally came from, it is valuable for forensic investigations.
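The Zone.Identifier stream itself is a small INI-style text blob, so once recovered it is trivial to interpret. A ZoneId of 3 corresponds to the Internet zone; newer Windows versions may also record ReferrerUrl and HostUrl. The sample content below is made up for illustration:

```python
import configparser

def parse_zone_identifier(stream_text: str) -> dict:
    """Parse the INI-style Zone.Identifier ("Mark of the Web") stream."""
    parser = configparser.ConfigParser()
    parser.read_string(stream_text)
    # configparser lowercases option names by default.
    return dict(parser["ZoneTransfer"])

sample = "[ZoneTransfer]\nZoneId=3\nHostUrl=https://example.com/mimikatz.exe\n"
print(parse_zone_identifier(sample))
```

A HostUrl pointing at the original download location can be a very direct lead in an investigation.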
Step two: open IDA
To understand where these additional buffers come from and how we can parse them, we again dive into the bowels of mpengine.dll. If we review the QuarantineFile function, we see that it receives a QuarantineEntryResource and QuarantineEntry as parameters. When following the code path, we see that the BackupRead function is called to write to a buffer of which we know that it will later be RC4-encrypted by Defender and written to the quarantine folder:
Figure 10: BackupRead being called within the QuarantineFile function.
Step three: RTFM
A glance at the documentation of BackupRead reveals that this function returns a buffer separated by Win32 stream IDs. The streams stored by BackupRead contain all data streams as well as security data about the owner and permissions of a file. On NTFS file systems, a file can have multiple data attributes or streams: the “main” unnamed data stream and optionally other named data streams, often referred to as “alternate data streams”. For example, the Zone Identifier is stored in a separate Zone.Identifier data stream of a file. It makes sense that a function intended for backing up data preserves these alternate data streams as well.
The fact that BackupRead preserves these streams is also good news for forensic analysis. First of all, malicious payloads can be hidden in alternate data streams. Moreover, alternate data streams such as the Zone Identifier and the security data can help us understand where a file came from and what it contains. We just need to recover the streams as they were saved by BackupRead!
Diving into IDA is not necessary, as the documentation tells us all that we need. For each data stream, the BackupRead function writes a WIN32_STREAM_ID to disk, which denotes (among other things) the size of the stream. Afterwards, it writes the data of the stream to the destination file and continues to the next stream. The WIN32_STREAM_ID structure definition is documented on the Microsoft Learn website:
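Following the documented WIN32_STREAM_ID layout (dwStreamId, dwStreamAttributes, a 64-bit Size, dwStreamNameSize, then the UTF-16 stream name and the stream data), walking the streams is a simple loop. This sketch assumes the records are laid out back-to-back with no padding between them:

```python
import struct

# Stream id values from the Win32 headers (WinBase.h)
BACKUP_DATA = 0x00000001            # the unnamed ("main") data stream
BACKUP_SECURITY_DATA = 0x00000003   # NTFS security descriptor
BACKUP_ALTERNATE_DATA = 0x00000004  # named alternate data streams

def walk_backup_streams(buf: bytes):
    """Iterate over WIN32_STREAM_ID records produced by BackupRead:
    DWORD dwStreamId; DWORD dwStreamAttributes; LARGE_INTEGER Size;
    DWORD dwStreamNameSize; WCHAR cStreamName[]; then Size data bytes."""
    offset = 0
    while offset < len(buf):
        stream_id, _attrs, size, name_size = struct.unpack_from("<IIQI", buf, offset)
        offset += 20  # fixed part of WIN32_STREAM_ID
        name = buf[offset : offset + name_size].decode("utf-16-le")
        offset += name_size
        data = buf[offset : offset + size]
        offset += size
        yield stream_id, name, data
```

Applied to the decrypted ResourceData buffer, the first BACKUP_DATA stream yields the original file contents (the `MZ` at offset 0xCC in the hexdump above), and any BACKUP_ALTERNATE_DATA stream yields artefacts like the Zone.Identifier.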
While reversing parts of mpengine.dll, we came across an interesting looking call in the HandleThreatDetection function. We appreciate that threats must be dealt with swiftly and with utmost discipline, but could not help but laugh at the curious choice of words when it came to naming this particular function.
Figure 11: A function call to SendThreatToCamp, a ‘call’ to action that seems pretty harsh.
Implementing our findings into Dissect
We now have all structure definitions that we need to recover all metadata and quarantined files from the quarantine folder. There is only one step left: writing an implementation.
During incident response, we do not want to rely on scripts scattered across home directories and git repositories. This is why we integrate our research into Dissect.
We can leave all the boring stuff of parsing disks, volumes and evidence containers to Dissect, and write our implementation as a plugin to the framework. Thus, the only thing we need to do is parse the artefacts and feed the results back into the framework.
The dive into Windows Defender of the previous sections resulted in a number of structure definitions that we need to recover data from the Windows Defender quarantine folder. When making an implementation, we want our code to reflect these structure definitions as closely as possible, to make our code both readable and verifiable. This is where dissect.cstruct comes in. It can parse structure definitions and make them available in your Python code. This removes a lot of boilerplate code for parsing structures and greatly enhances the readability of your parser. Let’s review how easily we can parse a QuarantineEntry file using dissect.cstruct:
As you can see, when the structure format is known, parsing it is trivial using dissect.cstruct. The only caveat is that the QuarantineEntryFileHeader, QuarantineEntrySection1 and QuarantineEntrySection2 structures are individually encrypted using the hardcoded RC4 key. Because only the size of QuarantineEntryFileHeader is static (60 bytes), we parse that first and use the information contained in it to decrypt the other sections.
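The overall decrypt-then-parse flow can be sketched as follows. The inline RC4 keeps the example self-contained, the key is a placeholder (the real one is hardcoded in mpengine.dll), and the offsets of the size fields within the header follow our reconstruction above:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    # Minimal RC4; each chunk of the file uses a fresh cipher state.
    S = list(range(256)); j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for b in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(b ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

def read_quarantine_entry(blob: bytes, key: bytes):
    """Decrypt the three chunks of a QuarantineEntry file one by one.
    The first 60 bytes are the file header; it tells us how long the
    (separately encrypted) Section1 and Section2 chunks are. The
    offsets of the two size fields in the header are assumptions."""
    header = rc4(key, blob[:60])
    section1_size = int.from_bytes(header[40:44], "little")
    section2_size = int.from_bytes(header[44:48], "little")
    section1 = rc4(key, blob[60 : 60 + section1_size])
    section2 = rc4(key, blob[60 + section1_size : 60 + section1_size + section2_size])
    return header, section1, section2
```

The important point is that each chunk is decrypted independently: reusing one RC4 keystream across all three chunks would produce garbage for everything after the header.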
To parse the individual fields contained within the QuarantineEntryResource, we have to do a bit more work. We cannot add the QuarantineEntryResourceField directly to the QuarantineEntryResource structure definition within dissect.cstruct, as it currently does not support the type of alignment used by Windows Defender. However, it does support the QuarantineEntryResourceField structure definition, so all we have to do is follow the alignment logic that we saw in IDA:
We can use dissect.cstruct‘s dumpstruct function to visualize our parsing and verify that we are correctly loading all data:
And just like that, our parsing is done. Utilizing dissect.cstruct makes parsing structures much easier to understand and implement. This also facilitates rapid iteration: we have altered our structure definitions dozens of times during our research, which would have been pure pain without having the ability to blindly copy-paste structure definitions into our Python editor of choice.
Implementing the parser within the Dissect framework brings great advantages. We do not have to worry at all about the format in which the forensic evidence is provided. Implementing the Defender recovery as a Dissect plugin means it just works on standard forensic evidence formats such as E01 or ASDF, or against forensic packages such as those produced by KAPE and Acquire, and even on a live virtual machine:
The full implementation of Windows Defender quarantine recovery can be found on GitHub.
Conclusion
We hope to have shown that there can be great benefits to reverse engineering the internals of Microsoft Windows to discover forensic artifacts. By reverse engineering mpengine.dll, we were able to further understand how Windows Defender places detected files into quarantine. We could then use this knowledge to discover (meta)data that was previously not fully documented or understood. The main results are the recovery of more information about the original quarantined file, such as various timestamps and additional NTFS data streams like the Zone.Identifier: information that can be useful in digital forensics and incident response investigations.
The documentation of QuarantineEntryResourceField was not available prior to this research and we hope others can use this to further investigate which fields are yet to be discovered. We have also documented how the BackupRead functionality is used by Defender to preserve the different data streams present in the NTFS file, including the Zone Identifier and Security Descriptor.
When writing our parser, using dissect.cstruct allowed us to tightly integrate our reverse engineering findings into our parsing, enhancing the readability and verifiability of the code. This can in turn help others to pivot off of our research, just like we did when pivoting off of the research of others into the Windows Defender quarantine folder.
This research has been implemented as a plugin for the Dissect framework. This means that our parser can operate independently of the type of evidence it is being run against. This functionality has been added to dissect.target as of January 2nd 2023 and is installed with Dissect as of version 3.4.
Security information and event management (SIEM) tooling allows security teams to collect and analyse logs from a wide variety of sources. In turn this is used to detect and handle incidents. Evidently it is important to ensure that the log ingestion is complete and uninterrupted. Luckily SIEMs offer out-of-the-box solutions and/or capabilities to create custom health monitoring. In this blog post we will take a look at the health monitoring capabilities for log ingestion in Microsoft Sentinel.
Microsoft Sentinel
Microsoft Sentinel is the cloud-native Security information and event management (SIEM) and Security orchestration, automation, and response (SOAR) solution provided by Microsoft. It provides intelligent security analytics and threat intelligence across the enterprise, offering a single solution for alert detection, threat visibility, proactive hunting, and threat response. As a cloud-native solution, it can easily scale to accommodate the growing security needs of an organization and alleviate the cost of maintaining your own infrastructure.
Microsoft Sentinel utilizes Data Connectors to handle log ingestion. It comes with out-of-the-box connectors for Microsoft services, known as service-to-service connectors. Additionally, there are many built-in connectors for third-party services, which use Syslog, Common Event Format (CEF) or REST APIs to connect the data sources to Microsoft Sentinel.
Besides logs from Microsoft services and third-party services, Sentinel can also collect logs from Azure VMs and non-Azure VMs. The log collection is done via the Azure Monitor Agent (AMA) or the Log Analytics Agent (MMA). As a brief aside, it’s important to note that the Log Analytics Agent is on a deprecation path and won’t be supported after August 31, 2024.
The state of the Data Connectors can be monitored with the out-of-the-box solutions or by creating a custom solution.
Microsoft provides two out-of-the-box features to perform health monitoring on the data connectors: the Data collection health monitoring workbook & the SentinelHealth data table.
Using the Data collection health monitoring workbook
The Data collection health monitoring workbook is an out-of-the-box solution that provides insight regarding the log ingestion status, detection of anomalies and the health status of the Log Analytics agents.
The workbook consists of three tabs: Overview, Data collection anomalies & Agents info.
The Overview tab shows the general status of log ingestion in the selected workspace. It contains data such as the Events per Second (EPS), data volume and time of the last log received. For the tab to function, the required Subscription and Workspace have to be selected at the top.
The Data collection anomalies tab provides info for detecting anomalies in the log ingestion process. Each tab in the view presents a specific table. The General tab is a collection of multiple tables.
We’re given a few configuration options for the view:
AnomaliesTimeRange: Define the total time range for the anomaly detection.
SampleInterval: Define the time interval in which data is sampled in the defined time range. Each time sample gets an anomaly score, which is used for the detection.
PositiveAlertThreshold: Define the positive anomaly score threshold.
NegativeAlertThreshold: Define the negative anomaly score threshold.
The view itself contains the expected number of events, the actual number of events & the anomaly score per table. When a significant drop or rise in events is detected, further investigation is advised. The logic behind the view can also be reused to set up alerting when a certain threshold is exceeded.
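To make that alerting decision concrete: it boils down to comparing each table's anomaly score against the two thresholds. Below is a minimal Python sketch of just the thresholding step; the table names and scores are hypothetical, and the real workbook computes the scores in KQL with series_decompose_anomalies.

```python
# Hypothetical per-table anomaly scores, as the workbook would compute them.
anomaly_scores = {
    "SecurityEvent": 1.2,   # within normal bounds
    "Syslog": 6.4,          # significant rise in events
    "SigninLogs": -5.9,     # significant drop in events
}

POSITIVE_ALERT_THRESHOLD = 5.0
NEGATIVE_ALERT_THRESHOLD = -5.0

def tables_to_investigate(scores,
                          upper=POSITIVE_ALERT_THRESHOLD,
                          lower=NEGATIVE_ALERT_THRESHOLD):
    """Return the tables whose anomaly score breaches either threshold."""
    return sorted(table for table, score in scores.items()
                  if score >= upper or score <= lower)

print(tables_to_investigate(anomaly_scores))  # ['SigninLogs', 'Syslog']
```

Tightening the thresholds surfaces more tables for review; loosening them reduces noise.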
The Agents info tab contains information about the health of the AMA and MMA agents installed on your Azure and non-Azure machines. The view allows you to monitor System location, Heartbeat status and latency, Available memory and disk space & Agent operations. There are two tabs in the view to choose between Azure machines only and all machines.
You can find the workbook under Microsoft Sentinel > Workbooks > Templates, then type Data collection health monitoring in the search field. Click View Template to open the workbook. If you plan on using the workbook frequently, hit the Save button so it shows up under My Workbooks.
The SentinelHealth data table
The SentinelHealth data table provides information on the health of your Sentinel resources. The content of the table is not limited to only the data connectors, but also the health of your automation rules, playbooks and analytic rules. Given the scope of this blog post, we will focus solely on the data connector events.
Currently the table has support for the following data connectors:
Amazon Web Services (CloudTrail and S3)
Dynamics 365
Office 365
Microsoft Defender for Endpoint
Threat Intelligence – TAXII
Threat Intelligence Platforms
For the data connectors, there are two types of events: Data fetch status change & Data fetch failure summary.
The Data fetch status change events contain the status of the data fetching and additional information. The status is represented by Success or Failure and depending on the status, different additional information is given in the ExtendedProperties field:
For a Success, the field will contain the destination of the logs.
For a Failure, the field will contain an error message describing the failure. The content of this message depends on the failure type.
These events are logged once an hour while the status is stable (i.e. the status doesn't change from Success to Failure or vice versa). Once a status change is detected, it is logged immediately.
The Data fetch failure summary events are logged once an hour, per connector, per workspace, with an aggregated failure summary. They are only logged when the connector has experienced polling errors during the given hour. The event itself contains additional information in the ExtendedProperties field, such as all the encountered failures and the time period for which the connector’s source platform was queried.
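As a sketch of how these records might be consumed downstream, the Python below classifies hypothetical Data fetch status change rows by their Status and ExtendedProperties. The payload keys shown are illustrative only; the exact properties vary per connector and failure type.

```python
import json

# Hypothetical SentinelHealth rows for "Data fetch status change" events.
rows = [
    {"Status": "Success",
     "ExtendedProperties": json.dumps({"DestinationTable": ["OfficeActivity"]})},
    {"Status": "Failure",
     "ExtendedProperties": json.dumps({"FailureSummary": "401 Unauthorized from source API"})},
]

def describe(row):
    """Summarize a status-change event: destination on success, error on failure."""
    props = json.loads(row["ExtendedProperties"])
    if row["Status"] == "Success":
        return f"OK -> {props.get('DestinationTable')}"
    return f"FAILED: {props.get('FailureSummary')}"

for row in rows:
    print(describe(row))
```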
Using the SentinelHealth data table
Before we can start using the SentinelHealth table, we first have to enable it. Go to Microsoft Sentinel > Settings > Settings tab > Auditing and health monitoring and press Enable to turn on health monitoring.
Once the SentinelHealth table contains data, we can start querying on it. Below you’ll find some example queries to run.
List the latest failure per connector
SentinelHealth
| where TimeGenerated > ago(7d)
| where OperationName == "Data fetch status change"
| where Status == "Failure"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId
Connector status change from Failure to Success
let success_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Success"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
let failure_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Failure"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
success_status
| join kind=inner (failure_status) on SentinelResourceName, SentinelResourceId
| where TimeGenerated > TimeGenerated1
Connector status change from Success to Failure
let success_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Success"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
let failure_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Failure"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
success_status
| join kind=inner (failure_status) on SentinelResourceName, SentinelResourceId
| where TimeGenerated < TimeGenerated1
Custom Solutions
With the help of built-in Azure features and KQL queries, you can create custom solutions. The idea is to create a KQL query and have it executed by an Azure feature, such as Azure Monitor, Azure Logic Apps or a Sentinel analytics rule. Below you'll find two examples of custom solutions.
Log Analytics Alert
For the first example, we'll set up an alert in the Log Analytics workspace that Sentinel runs on. The alert logic will run on a recurring basis and alert the necessary people when it is triggered. For starters, we'll go to the Log Analytics workspace and start the creation of a new alert.
Select Custom log search for the signal and we’ll use the Connector status change from Success to Failure query example as logic.
Set both the aggregation and evaluation period to 1hr, so it doesn’t incur a high monthly cost. Next, attach an email Action Group to the alert, so the necessary people are informed of the failure.
Lastly, give the alert a severity level, name and description to finish off.
Logic App Teams Notification
For the second example, we’ll create a Logic App that will send an overview via Teams of all the tables with an anomalous score.
For starters, we'll create a Logic App and a Workflow inside it.
Inside the Workflow, we’ll design the logic for the Teams Notification. We’ll start off with a Recurrence trigger. Define an interval on which you’d like to receive notifications. In the example, an interval of two days was chosen.
Next, we'll add the Run query and visualize results action. In this action, we have to define the Subscription, Resource Group, Resource Type, Resource Name, Query, Time Range and Chart Type. Define the first parameters to select your Log Analytics workspace, then use the following query, which is based on the logic from the Data Connector workbook. The query looks back on the data of the past two weeks with an interval of one day per data sample; if needed, the time period and interval can be increased or decreased. The UpperThreshold and LowerThreshold parameters can be adapted to make the detection more or less sensitive.
let UpperThreshold = 5.0; // Upper Anomaly threshold score
let LowerThreshold = -5.0; // Lower anomaly threshold score
let TableIgnoreList = dynamic(['SecurityAlert', 'BehaviorAnalytics', 'SecurityBaseline', 'ProtectionStatus']); // select tables you want to EXCLUDE from the results
union withsource=TableName1 *
| make-series count() on TimeGenerated from ago(14d) to now() step 1d by TableName1
| extend (anomalies, score, baseline) = series_decompose_anomalies(count_, 1.5, 7, 'linefit', 1, 'ctukey', 0.01)
| where anomalies[-1] == 1 or anomalies[-1] == -1
| extend Score = score[-1]
| where Score >= UpperThreshold or Score <= LowerThreshold
| where TableName1 !in (TableIgnoreList)
| project TableName=TableName1, ExpectedCount=round(todouble(baseline[-1]),1), ActualCount=round(todouble(count_[-1]),1), AnomalyScore = round(todouble(score[-1]),1)
Lastly, define the Time Range and Chart Type parameter. For Time Range pick Set in query and for Chart Type pick Html Table.
Now that the execution of the query is defined, we can define the sending of a Teams message. Select the Post message in a chat or channel action and configure the action to send the body of the query to a channel/person as Flowbot.
Once the Teams action is defined, the logic app is completed. When the logic app runs, you should expect an output similar to the image below. The parameters in the table can be analysed to detect Data Connector issues.
Conclusion
Monitoring the health of data connectors is a critical part of ensuring an uninterrupted log ingestion process into the SIEM. Microsoft Sentinel offers strong capabilities for monitoring the health of data connectors, enabling security teams to ensure the smooth functioning of log ingestion and to promptly address any issues that arise. The combination of the two out-of-the-box solutions and the flexibility to create custom monitoring solutions makes Microsoft Sentinel a comprehensive and adaptable choice for managing and monitoring security events.
Frederik Meutermans
Frederik is a Senior Security Consultant in the Cloud Security Team. He specializes in the Microsoft Azure cloud stack, with a special focus on cloud security monitoring. He mainly has experience as a security analyst and security monitoring engineer.
In application assessments you have to do the most effective work you can in the time period defined by the client to maximize the assurance you're providing. At IncludeSec we've done a couple of innovative things to improve the overall effectiveness of the work we do, and we're always on the hunt for more ways to squeeze even more value into our assessments by finding more risks and finding them faster. One topic that we revisit frequently to ensure we're doing the best we can to maximize efficiency is static analysis tooling (aka SAST).
Recently we did a comparison of two open-source static analysis tools to automate detection of suspicious or vulnerable code patterns identified during assessments. In this post we discuss the experience of using Semgrep and Brakeman to create the same custom rule within each tool for a client's Ruby on Rails application our team was assessing. We're also very interested in trying out GitHub's CodeQL, but unfortunately its Ruby support is still in development, so that will have to wait for another time.
Semgrep is a pattern-matching tool that is semantically-aware and works with several languages (currently its Ruby support is marked as beta, so it is likely not at full maturity yet). Brakeman is a long-lived Rails-specific static-analysis tool, familiar to most who have worked with Rails security. Going in, I had no experience writing custom rules for either one.
This blog post is specifically about writing custom rules for code patterns that are particular to the project I'm assessing. First though, I want to mention that both tools ship pre-built general rules for use with most Ruby/Rails projects. Brakeman has a fantastic set of built-in rules for Rails projects that has proven very useful on assessments (just make sure the developers of the project haven't disabled rules in config/brakeman.yml, and yes, we have seen developers do this to make SAST warnings go away!). Semgrep has an online registry of user-submitted rules for Ruby that is also handy (especially as examples for writing your own rules), though its current Ruby rule set is not as extensive as Brakeman's. In Brakeman the rules are known as "checks"; for consistency we'll use the term "rules" for both tools.
First custom rule: Verification of authenticated functionality
I chose a simple pattern for the first rule I wanted to make, mainly to familiarize myself with the process of creating rules in both Semgrep and Brakeman. The application had controllers that handle non-API routes. These controllers enforced authentication by adding a before filter: before_action :login_required. Often in Rails projects, this line is included in a base controller class and then skipped with skip_before_filter when authentication isn't required. That was not the case in the webapp I was looking at: the before filter was manually set in every single controller that needed authentication, which seemed error-prone as an overall architectural pattern, but alas it is the current state of the code base.
I wanted to get a list of any non-API controllers that lack this callback so I can ensure no functionality is unintentionally exposed without authentication. API routes handled authentication in a different way so consideration for them was not a priority for this first rule.
Semgrep
I went to the Semgrep website and found that Semgrep has a nice interactive tutorial, which walks you through building custom rules. I found it to be incredibly simple and powerful — after finishing the tutorial in about 20 minutes I thought I had all the knowledge I needed to make the rules I wanted. Although the site also has an online IDE for quick iteration I opted to develop locally, as the online IDE would require submitting our client’s code to a 3rd party which we obviously can’t do for security and liability reasons. The rule would eventually have to be run against the entire codebase anyways.
I encountered a few challenges when writing the rule:
It's a little tricky to find things that do not match a pattern (e.g. the lack of a login_required filter). You can't just search all files for ones that don't match; you need a positive pattern to search for, then exclude the cases matching your negative pattern. I was running into a bug here that made it even harder, but the Semgrep team fixed it when we gave them a heads up!
Matching only classes derived from ApplicationController was difficult because Semgrep doesn’t currently trace base classes recursively, so any that were more than one level removed from ApplicationController would be excluded (for example, if there was a class DerivedController < ApplicationController, it wouldn’t match SecondLevelDerivedController < DerivedController.) The Semgrep team gave me a tip about using a metavariable regex to filter for classes ending in “Controller” which worked for this situation and added no false positives.
My final custom rule for Semgrep follows:
rules:
  - id: controller-without-authn
    patterns:
      - pattern: |
          class $CLASS
            ...
          end
      - pattern-not: |
          class $CLASS
            ...
            before_action ..., :login_required, ...
            ...
          end
      - metavariable-regex:
          metavariable: '$CLASS'
          regex: '.*Controller'
    message: |
      $CLASS does not use the login_required filter.
    severity: WARNING
    languages:
      - ruby
I ran the rule using the following command: semgrep --config=../../../semgrep/ | grep "does not use"
The final grep is necessary because Semgrep prints the matched patterns, which in this case were the entire classes. There's currently no option in Semgrep to show only a list of matching files without the matched patterns themselves. That made it difficult to see the list of affected controllers, so I used grep on the output to filter the patterns out. This rule resulted in 47 identified controllers. Creating the rule originally took about two hours, including going through the tutorial and debugging the issues I ran into; now that those issues are fixed, I expect it would take less time in future iterations.
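A sturdier alternative to grepping the text output is Semgrep's --json flag; a few lines of Python can then reduce the report to the distinct file paths. The results/path fields used below reflect Semgrep's JSON schema at the time of writing, so verify them against your version.

```python
import json

def affected_files(report_text):
    """Extract the distinct file paths from a Semgrep JSON report."""
    report = json.loads(report_text)
    return sorted({result["path"] for result in report.get("results", [])})

# Canned example; in practice, pipe in `semgrep --config=... --json` output.
sample = json.dumps({"results": [
    {"path": "app/controllers/a_controller.rb"},
    {"path": "app/controllers/a_controller.rb"},  # duplicate match, same file
    {"path": "app/controllers/b_controller.rb"},
]})
print(affected_files(sample))
```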
Overall I think the rule is pretty self-explanatory — it finds all files that define a class then excludes the ones that have a login_required before filter. Check out the semgrep tutorial lessons if you’re unsure.
Brakeman
Brakeman has a wiki page describing custom rule building, but it doesn't have a lot of detail about what functionality is available to you. It gives examples of finding particular method calls and checking whether user input finds its way into those calls. There's no example of finding controllers.
Since the page didn't cover what I wanted to do, I headed off to Brakeman's folder of built-in rules on GitHub to see if there were any examples of something similar. There is a CheckSkipBeforeFilter rule which is very similar to what I wanted: it checks whether the login_required callback is skipped with skip_before_filter. As mentioned, the app isn't implemented that way, but it showed me how to iterate over controllers and check before filters.
This got me most of the way there but I also needed to skip API controllers for this particular app’s architecture. After about an hour of tinkering and looking through Brakeman controller tracker code I had the following rule:
require 'brakeman/checks/base_check'

class Brakeman::CheckControllerWithoutAuthn < Brakeman::BaseCheck
  Brakeman::Checks.add self

  @description = "Checks for a controller without authN"

  def run_check
    controllers = tracker.controllers.select do |_name, controller|
      not check_filters controller
    end
    Hash[controllers].each do |name, controller|
      warn :controller => name,
           :warning_type => "No authN",
           :warning_code => :basic_auth_password,
           :message => "No authentication for controller",
           :confidence => :high,
           :file => controller.file
    end
  end

  # Check whether a non-api controller has a :login_required before_filter
  def check_filters controller
    return true if controller.parent.to_s.include? "ApiController"
    controller.before_filters.each do |filter|
      next unless call? filter
      next unless filter.first_arg.value == :login_required
      return true
    end
    return false
  end
end
Running it with brakeman --add-checks-path ../brakeman --enable ControllerWithoutAuthn -t ControllerWithoutAuthn resulted in 43 controllers without authentication — 4 fewer than Semgrep flagged.
Taking a closer look at the controllers that Semgrep flagged and Brakeman did not, I realized the app imports shared functionality from another module, which includes a login_required filter. Therefore, Semgrep had 4 false positives that Brakeman did not flag. Since Semgrep works on individual files, I don't believe there's an easy way to prevent those from being flagged.
Second custom rule: Verification of correct and complete authorization across functionality
The next case I wanted assurance on was vertical authorization at the API layer. ApiControllers in the webapp have a method authorization_permissions() which is called at the top of each derived class with a hash table of function_name/permission pairs. This function saves the passed hash table into an instance variable. ApiControllers have a before filter that, when any method is invoked, will look up the permission associated with the called method in the hash table and ensure that the user has the correct permission before proceeding.
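To make the pattern concrete, here is a hypothetical Python sketch of that authorization scheme (the real app is Ruby; the class and permission names are invented). It deliberately assumes that a method missing from the table falls through unchecked; whether or not the real filter behaves that way, unmapped methods are exactly what the rules need to surface.

```python
# Sketch of the described pattern: each controller registers a
# method -> permission map, and a before-filter consults it on every call.
class ApiController:
    permissions = {}

    @classmethod
    def authorization_permissions(cls, mapping):
        cls.permissions = dict(mapping)

    def __init__(self, user_permissions):
        self.user_permissions = set(user_permissions)

    def invoke(self, method_name):
        required = self.permissions.get(method_name)
        # Assumption: a method with no entry is silently allowed -- this is
        # the gap the custom rules are hunting for.
        if required is not None and required not in self.user_permissions:
            raise PermissionError(f"not authorized for {method_name}")
        return getattr(self, method_name)()

class WidgetsController(ApiController):
    def index(self):
        return "widgets"

    def leak(self):  # never registered below: no authz check at all
        return "secret"

WidgetsController.authorization_permissions({"index": "widgets.read"})

print(WidgetsController(["widgets.read"]).invoke("index"))  # widgets
print(WidgetsController([]).invoke("leak"))                 # secret (unchecked!)
```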
Manual review was required to determine whether any methods had incorrect privileges, as analysis tools can’t understand business logic, but they can find methods entirely lacking authorization control — that was my goal for these rules.
Semgrep
Despite being seemingly a more complex scenario, this was still pretty straightforward in Semgrep:
rules:
  - id: method-without-authz
    patterns:
      - pattern: |
          class $CONTROLLER < ApiController
            ...
            def $FUNCTION
              ...
            end
            ...
          end
      - pattern-not: |
          class $CONTROLLER < ApiController
            ...
            authorization_permissions ... :$FUNCTION => ...
            ...
            def $FUNCTION
              ...
            end
            ...
          end
    message: |
      Detected api controller $CONTROLLER which does not check for authorization for the $FUNCTION method
    severity: WARNING
    languages:
      - ruby
It finds all methods on ApiControllers then excludes the ones that have some authorization applied. Semgrep found seven controllers with missing authorization checks.
Brakeman
I struggled to make this one in Brakeman at first, even thinking it might not be possible. The Brakeman team kindly guided me towards Collection#options which contains all method calls invoked at the class level excluding some common ones like before_filter. The following rule grabs all ApiControllers, looks through their options for the call to authorization_permissions, then checks whether each controller method has an entry in the authorization_permissions hash.
require 'brakeman/checks/base_check'

class Brakeman::CheckApicontrollerWithoutAuthz < Brakeman::BaseCheck
  Brakeman::Checks.add self

  @description = "Checks for an ApiController without authZ"

  def run_check
    # Find all api controllers
    api_controllers = tracker.controllers.select do |_name, controller|
      is_apicontroller controller
    end

    # Go through all methods on all ApiControllers
    # and find if they have any methods that are not in the authorization matrix
    Hash[api_controllers].each do |name, controller|
      perms = controller.options[:authorization_permissions].first.first_arg.to_s
      controller.each_method do |method_name, info|
        if not perms.include? ":#{method_name})"
          warn :controller => name,
               :warning_type => "No authZ",
               :warning_code => :basic_auth_password,
               :message => "No authorization check for #{name}##{method_name}",
               :confidence => :high,
               :file => controller.file
        end
      end
    end
  end

  def is_apicontroller controller
    # Only care about controllers derived from ApiController
    return controller.parent.to_s.include? "ApiController"
  end
end
Using this rule Brakeman found the same seven controllers with missing authorization as Semgrep.
Conclusion
So who is the winner of this showdown? For Ruby, both tools are valuable and there is no definitive winner in our comparison when we're specifically talking about custom rules. Currently I think Semgrep edges out Brakeman a bit for writing quick and dirty custom checks on assessments, as it's faster to get going, but it does have slightly more false positives in our limited comparison testing.
Semgrep rules are fairly intuitive to write and self-explanatory; Brakeman requires digging into its source code to understand its architecture, and you'll need to use existing rules as a guide. After creating a few Brakeman rules it gets a lot easier, but the initial learning curve was a bit higher than with other SAST tools. However, Brakeman has some sophisticated features that Semgrep does not, especially the user-input tracing functionality, which wasn't really shown in these examples. If some dangerous function is identified and you need to see whether any user input gets to it (source/sink flow), that is a great Brakeman use case. Also, Brakeman's default rule set is great, and I use it on every Rails test I do.
Ultimately Semgrep and Brakeman are both great tools with quirks and particular use-cases and deserve to be in your arsenal of SAST tooling. Enormous thanks to both Clint from the Semgrep team and Justin the creator of Brakeman for providing feedback on this post!
One of the things that has always been important in IncludeSec's progress as a company is finding the best talent for the task at hand. We decided early on that if the best Python hacker in the world was not in the US, then we would go find that person and work with them! Or whatever technology the project at hand requires: C, Go, Ruby, Scala, Java, etc.
As it turns out, the best Python hackers (and many other technologies) might actually be in Argentina. We're not the only ones that have noticed this: Immunity Security, IOActive Security, Gotham Digital Science, and many others have a notable presence in Argentina (The NY Times even wrote an article on how great the hackers are there!). We've worked with dozens of amazing Argentinian hackers over the last six years, comprising ~30% of our team, and we've also enjoyed the quality of security conferences like EkoParty in Buenos Aires.
As a small thank you to the entire Argentinian hacker scene, we’re going to do a free training class on May 30/31st 2019 teaching advanced web hacking techniques. This training is oriented towards hackers earlier in their career who have already experienced the world of OWASP top 10 and are looking to take their hacking skills to the next level.
If that sounds like you, you’re living in Argentina, and can make it to Buenos Aires on May 30th & 31st then this might be an awesome opportunity for you!
So Ashley Madison (AM) got hacked. It was first announced about a month ago, and the attackers claimed they'd drop the full monty of user data if the AM website did not cease operations. The AM parent company, Avid Life Media (ALM), did not cease business operations for the site, and true to their word the attackers seem to have leaked everything they promised on August 18th 2015, including:
full database dumps of user data
emails
internal ALM documents
as well as a limited number of user passwords
Back in college I used to do forensics contests for the Honeynet Project and thought this might be a fun nostalgic trip: trying to recreate my pseudo-forensics investigation style on the data within the AM leak.
Disclaimer: I will not be releasing any personal or confidential information within this blog post that may be found in the AM leak. The purpose of this blog post is to provide an honest, holistic forensic analysis and minimal statistical analysis of the data found within the leak. Consider this a journalistic exploration more than anything.
Also note that the credit card files were deleted and not reviewed as part of this write-up.
———–[Grabbing the Leak]
First we go find where on the big bad dark web the release site is located. Thankfully knowing a shady guy named Boris pays off for me, and we find a torrent file for the release of the August 18th Ashley Madison user data dump. The torrent file we found has the following SHA1 hash.
e01614221256a6fec095387cddc559bffa832a19 impact-team-ashley-release.torrent
After extracting all the files we have the following sizes and file hashes for evidence audit purposes:
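As an aside, for anyone re-verifying an archive like this, the per-file hashing step is easy to reproduce with Python's hashlib; streaming reads keep memory use flat on multi-gigabyte dumps.

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA1 hex digest of a file without loading it into memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Sanity check against a well-known value: SHA1 of the empty input.
assert hashlib.sha1(b"").hexdigest() == "da39a3ee5e6b4b0d3255bfef95601890afd80709"
```

Running sha1_of_file("impact-team-ashley-release.torrent") should reproduce the torrent hash shown above.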
The attackers make it clear they have no desire to bridge their dark web identities with their real-life identities and have taken many measures to ensure this does not occur.
The torrent file and messaging were released via the anonymous Tor network through an Onion web server which serves only HTML/TXT content. If the attackers took proper OPSEC precautions while setting up the server, law enforcement and AM may never find them. That being said, hackers have been known to get sloppy and slip up their OPSEC. The two most famous cases of this were when Sabu of Anonymous and, separately, the Dread Pirate Roberts of Silk Road were caught, even though they primarily used Tor for their internet activities.
Within the dump we see that the files are signed with PGP. Signing a file in this manner is a way of saying "I did this", even though we don't know the real-life identity of the person/group claiming to have done it (there is a bunch of crypto and math that makes this possible). As a result we can be more confident that if there are files signed by this PGP key, they were released by the same person/group.
In my opinion, this is done for two reasons. First, the leaker wants to claim responsibility in an identity-attributable manner without revealing their real-life identity. Second, the leaker wishes to dispel statements regarding "false leaks" made by the Ashley Madison team. The AM executive and PR teams have been in crisis communications mode, explaining that there have been many fake leaks.
The “Impact Team” is using the following public PGP key to sign their releases.
Old: Public Key Packet(tag 6)(525 bytes)
Ver 4 - new
Public key creation time - Mon Jul 27 22:15:10 EDT 2015
Pub alg - RSA Encrypt or Sign(pub 1)
RSA n(4096 bits) - ...
RSA e(17 bits) - ...
Old: User ID Packet(tag 13)(36 bytes)
User ID - Impact Team <[email protected]>
Old: Signature Packet(tag 2)(568 bytes)
Ver 4 - new
Sig type - Positive certification of a User ID and Public Key packet(0x13).
Pub alg - RSA Encrypt or Sign(pub 1)
Hash alg - SHA1(hash 2)
Hashed Sub: signature creation time(sub 2)(4 bytes)
Time - Mon Jul 27 22:15:10 EDT 2015
Hashed Sub: key flags(sub 27)(1 bytes)
Flag - This key may be used to certify other keys
Flag - This key may be used to sign data
Hashed Sub: preferred symmetric algorithms(sub 11)(5 bytes)
Sym alg - AES with 256-bit key(sym 9)
Sym alg - AES with 192-bit key(sym 8)
Sym alg - AES with 128-bit key(sym 7)
Sym alg - CAST5(sym 3)
Sym alg - Triple-DES(sym 2)
Hashed Sub: preferred hash algorithms(sub 21)(5 bytes)
Hash alg - SHA256(hash 8)
Hash alg - SHA1(hash 2)
Hash alg - SHA384(hash 9)
Hash alg - SHA512(hash 10)
Hash alg - SHA224(hash 11)
Hashed Sub: preferred compression algorithms(sub 22)(3 bytes)
Comp alg - ZLIB <RFC1950>(comp 2)
Comp alg - BZip2(comp 3)
Comp alg - ZIP <RFC1951>(comp 1)
Hashed Sub: features(sub 30)(1 bytes)
Flag - Modification detection (packets 18 and 19)
Hashed Sub: key server preferences(sub 23)(1 bytes)
Flag - No-modify
Sub: issuer key ID(sub 16)(8 bytes)
Key ID - 0x24373CD574ABAA38
Hash left 2 bytes - e3 95
RSA m^d mod n(4096 bits) - ...
-> PKCS-1
Old: Public Subkey Packet(tag 14)(525 bytes)
Ver 4 - new
Public key creation time - Mon Jul 27 22:15:10 EDT 2015
Pub alg - RSA Encrypt or Sign(pub 1)
RSA n(4096 bits) - ...
RSA e(17 bits) - ...
Old: Signature Packet(tag 2)(543 bytes)
Ver 4 - new
Sig type - Subkey Binding Signature(0x18).
Pub alg - RSA Encrypt or Sign(pub 1)
Hash alg - SHA1(hash 2)
Hashed Sub: signature creation time(sub 2)(4 bytes)
Time - Mon Jul 27 22:15:10 EDT 2015
Hashed Sub: key flags(sub 27)(1 bytes)
Flag - This key may be used to encrypt communications
Flag - This key may be used to encrypt storage
Sub: issuer key ID(sub 16)(8 bytes)
Key ID - 0x24373CD574ABAA38
Hash left 2 bytes - 0b 61
RSA m^d mod n(4095 bits) - ...
-> PKCS-1
We can verify the released files are attributable to the PGP public key in question using the following commands:
$ gpg --import ./74ABAA38.txt
$ gpg --verify ./member_details.dump.gz.asc ./member_details.dump.gz
gpg: Signature made Sat 15 Aug 2015 11:23:32 AM EDT using RSA key ID 74ABAA38
gpg: Good signature from "Impact Team <[email protected]>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 6E50 3F39 BA6A EAAD D81D ECFF 2437 3CD5 74AB AA38
This also tells us at what date the dump was signed and packaged.
———–[Catching the attackers]
The PGP key’s meta-data shows a user ID for the mailtor dark web email service, whose last known location was: http://mailtoralnhyol5v.onion
Don’t bother emailing the email address found in the PGP key, as it does not have a valid MX record. The fact that this exists at all seems to be one of those interesting artifacts of what happens when Internet tools like GPG get used on the dark web.
If the AM attackers were to be caught; here (in no particular order) are the most likely ways this would happen:
The person(s) responsible tells somebody. Nobody keeps something like this a secret; if the attackers tell anybody, they’re likely going to get caught.
If the attackers review email from a web browser, they might be unmasked by federal law enforcement or by private investigation/IR teams hired by AM. The FBI is known to have these capabilities.
If the attackers slip up with their diligence in messaging only via TXT and HTML on the web server. Meta-data sinks ships kids — don’t forget.
If the attackers slip up with their diligence on configuring their server. One bad config of a web server leaks an internal IP, or worse!
If the attackers slipped up during their persistent attack against AM, investigators hired by AM may find evidence leading back to them.
If the attackers have not masked their writing or image-creation style, they may leave a semantic fingerprint from which they can be profiled.
If none of those things happen, I don’t think these attackers will ever be caught. The cyber-crime fighters have a daunting task in front of them; I’ve helped out a couple of FBI and NYPD cyber-crime fighters, and I do not envy the difficult and frustrating job they have — good luck to them! Today we’re living in the Wild West days of the Internet.
———–[Leaked file extraction and evidence gathering]
Now, to document the information seen within this data leak, we proceed with a couple of commands to gather file sizes, and we also check the file hashes to ensure the uniqueness of the files. Finally, we review the meta-data of some of the compressed files. The meta-data shows the time-stamps embedded in the various compressed files. Although meta-data can easily be faked, it usually is not.
Next we’ll extract these files and examine their file size to take a closer look.
$ 7z e ashleymadisondump.7z
Within the extracted 7zip file we find another 7zip file,
“swappernet_User_Table.7z”, which was also extracted.
We now have the following files sizes and SHA1 hashes for evidence
integrity & auditing purposes:
$ du -sh ashleymadisondump/*
68K 20131002-domain-list.xlsx
52K ALMCLUSTER (production domain) computers.txt
120K ALMCLUSTER (production domain) hashdump.txt
68K ALM - Corporate Chart.pptx
256K ALM Floor Plan - ports and names.pdf
8.0M ALM - January 2015 - Company Overview.pptx
1.8M ALM Labs Inc. Articles of Incorporation.pdf
708K announcement.png
8.0K Areas of concern - customer data.docx
8.0K ARPU and ARPPU.docx
940K Ashley Madison Technology Stack v5(1).docx
16K Avid Life Media - Major Shareholders.xlsx
36K AVIDLIFEMEDIA (primary corporate domain) computers.txt
332K AVIDLIFEMEDIA (primary corporate domain) user information and hashes.txt
1.7M Avid Org Chart 2015 - May 14.pdf
24K Banks.xlsx
6.1M Copies of Option Agreements.pdf
8.0K Credit useage.docx
16K CSF Questionnaire (Responses).xlsx
132K Noel's loan agreement.pdf
8.0K Number of traveling man purchases.docx
1.5M oneperday_am_am_member.txt
940K oneperday_aminno_member.txt
672K oneperday.txt
44K paypal accounts.xlsx
372K [email protected]_20101103_133855.pdf
16K q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
8.0K README.txt
8.0K Rebill Success Rate Queries.docx
8.0K Rev by traffic source rebill broken out.docx
8.0K Rev from organic search traffic.docx
4.0K Sales Queries
59M swappernet_QA_User_Table.txt #this was extracted from swappernet_User_Table.7z in the same dir
17M swappernet_User_Table.7z
$ sha1sum ashleymadisondump/*
f0af9ea887a41eb89132364af1e150a8ef24266f 20131002-domain-list.xlsx
30401facc68dab87c98f7b02bf0a986a3c3615f0 ALMCLUSTER (production domain) computers.txt
c36c861fd1dc9cf85a75295e9e7bcf6cf04c7d2c ALMCLUSTER (production domain) hashdump.txt
6be635627aa38462ebcba9266bed5b492a062589 ALM - Corporate Chart.pptx
4dec7623100f59395b68fd13d3dcbbff45bef9c9 ALM Floor Plan - ports and names.pdf
601e0b462e1f43835beb66743477fe94bbda5293 ALM - January 2015 - Company Overview.pptx
d17cb15a5e3af15bc600421b10152b2ea1b9c097 ALM Labs Inc. Articles of Incorporation.pdf
1679eca2bc172cba0b5ca8d14f82f9ced77f10df announcement.png
6a618e7fc62718b505afe86fbf76e2360ade199d Areas of concern - customer data.docx
91f65350d0249211234a52b260ca2702dd2eaa26 ARPU and ARPPU.docx
50acee0c8bb27086f12963e884336c2bf9116d8a Ashley Madison Technology Stack v5(1).docx
71e579b04bbba4f7291352c4c29a325d86adcbd2 Avid Life Media - Major Shareholders.xlsx
ef8257d9d63fa12fb7bc681320ea43d2ca563e3b AVIDLIFEMEDIA (primary corporate domain) computers.txt
ec54caf0dc7c7206a7ad47dad14955d23b09a6c0 AVIDLIFEMEDIA (primary corporate domain) user information and hashes.txt
614e80a1a6b7a0bbffd04f9ec69f4dad54e5559e Avid Org Chart 2015 - May 14.pdf
c3490d0f6a09bf5f663cf0ab173559e720459649 Banks.xlsx
1538c8f4e537bb1b1c9a83ca11df9136796b72a3 Copies of Option Agreements.pdf
196b1ba40894306f05dcb72babd9409628934260 Credit useage.docx
2c9ba652fb96f6584d104e166274c48aa4ab01a3 CSF Questionnaire (Responses).xlsx
0068bc3ee0dfb796a4609996775ff4609da34acb Noel's loan agreement.pdf
c3b4d17fc67c84c54d45ff97eabb89aa4402cae8 Number of traveling man purchases.docx
9e6f45352dc54b0e98932e0f2fe767df143c1f6d oneperday_am_am_member.txt
de457caca9226059da2da7a68caf5ad20c11de2e oneperday_aminno_member.txt
d596e3ea661cfc43fd1da44f629f54c2f67ac4e9 oneperday.txt
37fdc8400720b0d78c2fe239ae5bf3f91c1790f4 paypal accounts.xlsx
2539bc640ea60960f867b8d46d10c8fef5291db7 [email protected]_20101103_133855.pdf
5bb6176fc415dde851262ee338755290fec0c30c q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
5435bfbf180a275ccc0640053d1c9756ad054892 README.txt
872f3498637d88ddc75265dab3c2e9e4ce6fa80a Rebill Success Rate Queries.docx
d4e80e163aa1810b9ec70daf4c1591f29728bf8e Rev by traffic source rebill broken out.docx
2b5f5273a48ed76cd44e44860f9546768bda53c8 Rev from organic search traffic.docx
sha1sum: Sales Queries: Is a directory
0f63704c118e93e2776c1ad0e94fdc558248bf4e swappernet_QA_User_Table.txt
9d67a712ef6c63ae41cbba4cf005ebbb41d92f33 swappernet_User_Table.7z
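The hash listing above is most useful if it can be re-checked later; `sha1sum -c` does exactly that, flagging any file that has changed since acquisition. A minimal sketch against a stand-in file (the file name and contents here are illustrative, not from the leak):

```shell
# Build a stand-in evidence directory, since the real leak files
# can't be reproduced here.
mkdir -p evidence
echo "sample leak file" > evidence/sample.txt

# Record the hashes once, at acquisition time...
sha1sum evidence/* > evidence.sha1

# ...then any later run of "sha1sum -c" confirms nothing changed;
# an altered file would be reported as FAILED.
sha1sum -c evidence.sha1
```

Keeping the `.sha1` manifest alongside the case notes means any later reviewer can confirm the files are bit-for-bit identical to what was originally examined.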
———–[Quick summary of each of the leaked files]
The following files are MySQL data dumps of the main AM database:
member_details.dump.gz
aminno_member.dump.gz
member_login.dump.gz
aminno_member_email.dump.gz
CreditCardTransactions.7z
Also included was another AM database which contains user info (separate from the emails):
am_am.dump.gz
In the top level directory you can also find these additional files:
74ABAA38.txt
Impact Team’s Public PGP key used for signing the releases (The .asc files are the signatures)
ashleymadisondump.7z
This contains various internal and corporate private files.
README
Impact Team’s justification for releasing the user data.
Various .asc files such as “member_details.dump.gz.asc”
These are all PGP signature files to prove that one or more persons who are part of the “Impact Team” attackers released them.
Within the ashleymadisondump.7z we can extract and view the following files:
Number of traveling man purchases.docx
SQL queries to investigate high-travel user’s purchases.
AVIDLIFEMEDIA (primary corporate domain) user information and hashes.txt
AVIDLIFEMEDIA (primary corporate domain) computers.txt
The output of the dnscmd Windows command executed on what appears to be a primary domain controller. The timestamp indicates that the command was run on July 1st 2015. There is also a “pwdump”-style export of 1324 user accounts which appear to be from the ALM domain controller. These passwords will be easy to crack, as NTLM hashes aren’t the strongest.
Noel’s loan agreement.pdf
A promissory note for the CEO to pay back ~3MM in Canadian monies.
Areas of concern – customer data.docx
Appears to be a risk profile of the major security concerns that ALM has regarding their customer’s data. And yes, a major user data dump is on the list of concerns.
Banks.xlsx
A listing of all ALM associated bank account numbers and the biz which owns them.
Rev by traffic source rebill broken out.docx
Rebill Success Rate Queries.docx
Both of these are SQL queries to investigate Rebilling of customers.
README.txt
Impact Team statement regarding their motivations for the attack and leak.
Copies of Option Agreements.pdf
All agreements for what appears to be all of the company’s outstanding options.
paypal accounts.xlsx
Various user/passes for ALM paypal accounts (16 in total)
swappernet_QA_User_Table.txt
swappernet_User_Table.7z
This file is a database export in CSV format. It appears to be from a QA server.
ALMCLUSTER (production domain) computers.txt
The output of the dnscmd Windows command executed on what appears to be a production domain controller. The timestamp indicates that the command was run on July 1st 2015.
ALMCLUSTER (production domain) hashdump.txt
A “pwdump”-style export of 1324 user accounts which appear to be from the ALM domain controller. These passwords will be easy to crack, as NTLM hashes aren’t the strongest.
ALM Floor Plan – ports and names.pdf
Seating map of main office, this type of map is usually used for network deployment purposes.
ARPU and ARPPU.docx
A listing of SQL commands which provide revenue and other macro financial health info.
Presumably these queries would run on the primary DB or a biz intel slave.
Credit useage.docx
SQL queries to investigate credit card purchases.
Avid Org Chart 2015 – May 14.pdf
A per-team organizational chart of what appears to be the entire company.
announcement.png
The graphic created by Impact Team to announce their demand for ALM to shut down its flagship website AM.
[email protected]_20101103_133855.pdf
Contract outlining the terms of a purchase of the biz Seekingarrangement.com
CSF Questionnaire (Responses).xlsx
Company exec Critical Success Factors spreadsheet. Answering questions like “In what area would you hate to see something go wrong?” and the CTO’s response is about hacking.
ALM – January 2015 – Company Overview.pptx
This is a very detailed breakdown of current biz health, marketing spend, and future product plans.
Ashley Madison Technology Stack v5(1).docx
A detailed walk-through of all major servers and services used in the ALM production environment.
oneperday.txt
oneperday_am_am_member.txt
oneperday_aminno_member.txt
These three files have limited leak info as a “teaser” for the .dump files that are found in the highest level directory of the AM leak.
Rev from organic search traffic.docx
SQL queries to explore the revenue generated from search traffic.
20131002-domain-list.xlsx
A list of the 1083 domain names that ALM owns, has owned, or is seeking to own.
Sales Queries/
Empty Directory
ALM Labs Inc. Articles of Incorporation.pdf
The full 109-page Articles of Incorporation, covering every aspect of initial company formation.
ALM – Corporate Chart.pptx
A detailed block diagram defining the relationship between various tax and legal business entity names related to ALM businesses.
Avid Life Media – Major Shareholders.xlsx
A listing of each major shareholder and their equity stake
———–[File meta-data analysis]
First we’ll take a look at the 7zip file in the top level directory.
If we’re to believe this meta-data, the newest file is from July 19th 2015 and the oldest is from October 19th 2012. The timestamp for the file announcement.png shows a creation date of July 10th 2015. This file is the graphical announcement from the leakers. The file swappernet_User_Table.7z
has a timestamp of July 9th 2015. Since this file is a database dump, one might presume that these files were created for the original release and the other files were copied from a file-system that preserves timestamps.
Within that 7zip file we’ve found another which looks like:
Within the ashleymadisondump directory extracted from ashleymadisondump.7z we’ve got
the following file types that we’ll examine for meta-data:
8 txt
8 docx
6 xlsx
6 pdf
2 pptx
1 png
1 7z
The PNG didn’t seem to have any EXIF meta-data, and we’ve already covered the 7z file.
The text files don’t usually yield us anything meta-data-wise.
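A note on where the Office meta-data below comes from: .docx, .xlsx, and .pptx files are ZIP containers, and the creator and timestamp fields live in a docProps/core.xml entry inside each one (exiftool reads these same fields, as does `unzip -p file.docx docProps/core.xml`). A sketch against a stand-in file, since the leaked documents can’t be reproduced here; the creator name and date used are purely illustrative:

```shell
# Build a minimal stand-in .docx (really just a ZIP archive with the
# core-properties entry that holds creator/date meta-data), then read
# that entry back out -- the same data exiftool would report.
python3 - <<'EOF'
import zipfile

with zipfile.ZipFile('standin.docx', 'w') as z:
    z.writestr('docProps/core.xml',
               '<cp:coreProperties>'
               '<dc:creator>Jane Example</dc:creator>'
               '<dcterms:created>2013-09-17T00:00:00Z</dcterms:created>'
               '</cp:coreProperties>')

# Extract the meta-data entry, like "unzip -p standin.docx docProps/core.xml"
print(zipfile.ZipFile('standin.docx').read('docProps/core.xml').decode())
EOF
```

Files that show “No Metadata” below simply have these fields blanked or stripped in their core.xml.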
In the MS Word docx files we have the following meta-data:
Areas of concern – customer data.docx
No Metadata
ARPU and ARPPU.docx
No Metadata
Ashley Madison Technology Stack v5(1).docx
Created by Michael Morris; created and last modified on Sep 17 2013.
Credit useage.docx
No Metadata
Number of traveling man purchases.docx
No Metadata
Rebill Success Rate Queries.docx
No Metadata
Rev by traffic source rebill broken out.docx
No Metadata
Rev from organic search traffic.docx
No Metadata
In the MS Powerpoint pptx files we have the following meta-data:
ALM – Corporate Chart.pptx
Created by “Diana Horvat” on Dec 5 2012 and last updated by “Tatiana Kresling”
on Dec 13th 2012
ALM – January 2015 – Company Overview.pptx
Created by Rizwan Jiwan on Jan 21 2011 and last modified on Jan 20 2015.
In the MS Excel xlsx files we have the following meta-data:
20131002-domain-list.xlsx
Written by Kevin McCall, created and last modified Oct 2nd 2013
Avid Life Media – Major Shareholders.xlsx
Created by Jamal Yehia; created and last modified July 15th 2013
Banks.xlsx
Created by “Elena” and Keith Lalonde, created Dec 15 2009 and last modified Feb 26th 2010
CSF Questionnaire (Responses).xlsx
No Metadata
paypal accounts.xlsx
Created by Keith Lalonde, created Oct 28 2010 and last modified Dec 22nd 2010
q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
No Metadata
And finally within the PDF files we also see additional meta-data:
ALM Floor Plan – ports and names.pdf
Written by Martin Price in MS Visio, created and last modified April 23 2015
ALM Labs Inc. Articles of Incorporation.pdf
Created with DocsCorp Pty Ltd (www.docscorp.com), created and last modified on Oct 17 2012
Avid Org Chart 2015 – May 14.pdf
Created and last modified on May 14 2015
Copies of Option Agreements.pdf
Created with OmniPage CSDK 16 OcrToolkit; created and last modified on Oct 16 2012
Noel’s loan agreement.pdf
Created and last modified on Sep 18 2013
[email protected]_20101103_133855.pdf
Created and last modified on Jul 7 2015
———–[MySQL Dump file loading and evidence gathering]
At this point all of the dump files have been decompressed with gunzip or 7zip. The dump files are standard MySQL backups (a.k.a. dump files), and the info in them implies that they were taken from multiple servers:
$ grep 'MySQL dump' *.dump
am_am.dump:-- MySQL dump 10.13 Distrib 5.5.33, for Linux (x86_64)
aminno_member.dump:-- MySQL dump 10.13 Distrib 5.5.40-36.1, for Linux (x86_64)
aminno_member_email.dump:-- MySQL dump 10.13 Distrib 5.5.40-36.1, for Linux (x86_64)
member_details.dump:-- MySQL dump 10.13 Distrib 5.5.40-36.1, for Linux (x86_64)
member_login.dump:-- MySQL dump 10.13 Distrib 5.5.40-36.1, for Linux (x86_64)
Also within the dump files is info referencing execution from localhost, which implies the attacker was on the database server in question.
Of course, all of this info is just text and can easily be faked, but it’s interesting nonetheless considering the possibility that it might be correct and unaltered.
To load up the MySQL dumps we’ll start with a fresh MySQL database instance
on a decently powerful server and run the following commands:
--As root MySQL user
CREATE DATABASE aminno;
CREATE DATABASE am;
CREATE USER 'am'@'localhost' IDENTIFIED BY 'loyaltyandfidelity';
GRANT ALL PRIVILEGES ON aminno.* TO 'am'@'localhost';
GRANT ALL PRIVILEGES ON am.* TO 'am'@'localhost';
Now back at the command line we’ll execute these to import the main dumps:
$ mysql -D aminno -uam -ployaltyandfidelity < aminno_member.dump
$ mysql -D aminno -uam -ployaltyandfidelity < aminno_member_email.dump
$ mysql -D aminno -uam -ployaltyandfidelity < member_details.dump
$ mysql -D aminno -uam -ployaltyandfidelity < member_login.dump
$ mysql -D am -uam -ployaltyandfidelity < am_am.dump
Now that you’ve got the data loaded up you can recreate some of the findings ksugihara made with his analysis here [Edit: it appears ksugihara has taken this offline; I don’t have a mirror]. We didn’t have much more to add for holistic statistical analysis than what he’s already done, so check out his blog post for more on the primary data dumps. There is still one last database export though…
Within the file ashleymadisondump/swappernet_QA_User_Table.txt we have a final database export, but this one is not in the MySQL dump format. It is instead in CSV format. The file name implies this was an export from a QA Database server.
This file has the following columns (left to right in the CSV):
recid
id
username
userpassword
refnum
disable
ipaddress
lastlogin
lngstatus
strafl
ap43
txtCoupon
bot
Sadly, within the file we see that user passwords are stored in clear text, which is always a bad security practice. At the moment, though, we don’t know if these are actual production user account passwords, and if so, how old they are. My guess is that these are from an old QA server from when AM was a smaller company and hadn’t yet moved to secure password hashing practices like bcrypt.
These commands show us there are 765,607 records in this database export and
only four of them have a blank password. Many of the passwords repeat and
397,974 of the passwords are unique.
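The record and password counts above can be reproduced with a short awk/cut pipeline. A sketch against a three-row stand-in CSV (the real file has 765,607 rows; per the column list above, userpassword is the fourth field, and note the real export may quote or escape fields, which this sketch ignores):

```shell
# Three stand-in rows in the same column order as the export:
# recid,id,username,userpassword,refnum,disable,ipaddress,...
cat > standin.csv <<'EOF'
1,100,alice,hunter2,0,0,10.0.0.1
2,101,bob,hunter2,0,0,10.0.0.2
3,102,carol,,0,0,10.0.0.3
EOF

awk -F',' 'END { print NR }' standin.csv       # total records
awk -F',' '$4 == ""' standin.csv | wc -l       # rows with a blank password
cut -d',' -f4 standin.csv | sort -u | wc -l    # distinct password values
```

Note that the distinct count includes the empty password as one value, so subtract the blank entries if you want only non-empty unique passwords.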
After importing the CSV into MS excel we can use sort and filter to make some
additional statements based on the data.
The only logins with a “lastlogin” value in the year 2015 are from the
following users:
SIMTEST101
SIMTEST130
JULITEST2
JULITEST3
swappernetwork
JULITEST4
HEATSEEKERS
The final and most recent login was from AvidLifeMedia’s office IP range.
275,285 of these users have an entry in the txtCoupon column.
All users with the “bot” column set to TRUE have either passwords