
Memory Scanning for the Masses

25 January 2024 at 00:00

Authors: Axel Boesenach and Erik Schamper

In this blog post we will introduce a user-friendly memory scanning Python library that was created out of the necessity of having more control during memory scanning. We will give an overview of how this library works and share the thought process and reasoning behind it. This blog post will not cover the inner workings of memory management on the respective platforms.

Memory Scanning

Memory scanning is the practice of iterating over the different processes running on a computer system and searching through their memory regions for a specific pattern. There can be a myriad of reasons to scan the memory of certain processes. The most common use cases are probably credential access (accessing the memory of the lsass.exe process for example), scanning for possible traces of malware and implants or recovery of interesting data, such as cryptographic material.

If time is as valuable to you as it is to us at Fox-IT, you probably noticed that performing a full memory scan looking for a pattern is a very time-consuming process, to say the least.

Why is scanning memory so time consuming when you know what you are looking for, and more importantly, how can this scanning process be sped up? While looking into different detection techniques to identify running Cobalt Strike beacons, we noticed something we could easily filter on to speed up our scanning process: memory attributes.

Speed up scanning with memory attributes

Memory attributes are comparable to the permission system we all know and love on our regular file and directory structures. The permission system dictates what kind of actions are allowed within a specific memory region and can be changed to different sets of attributes by their respective API calls.

The following memory attributes exist on both the Windows and UNIX platforms:

  • Read (R)
  • Write (W)
  • Execute (E)

The Windows platform has some extra permission attributes, plus quite an extensive list of allocation[1] and protection[2] attributes. These attributes can also be used to filter when looking for specific patterns within memory regions, but they are not important to go into right now.

So how do we leverage this information about attributes to speed up our scanning processes? It turns out that by filtering the regions to scan based on the memory attributes set for the regions, we can speed up our scanning process tremendously before even starting to look for our specified patterns.

Say for example we are looking for a specific byte pattern of an implant that is present in a certain memory region of a running process on the Windows platform. We already know what pattern we are looking for and we also know that the memory regions used by this specific implant are always set to:

Type    Protection    Initial
PRV     ERW           ERW

Table 1. Example of implant memory attributes that are set

Depending on what is running on the system, filtering on the above memory attributes already rules out a large portion of memory regions for most running processes on a Windows system.

If we take a notepad.exe process as an example, we can see that the different sections of the executable have their respective permissions. The .text section of an executable contains executable code and is thus marked with the E permission as its protection.

If we were looking for just the sections and regions that are marked as being executable, we would only need to scan the .text section of the notepad.exe process. If we scan all the regions of every running process on the system, disregarding the memory attributes which are set, scanning for a pattern will take quite a bit longer.
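To make this concrete, below is a minimal sketch (not Skrapa's actual implementation) of how such a filter can be applied on Windows with Python's ctypes and the documented VirtualQueryEx API: it walks the address space of a single process and keeps only committed private regions with execute/read/write protection, matching the attributes from Table 1. Error handling and 32/64-bit edge cases are intentionally left out.

import ctypes
import ctypes.wintypes as wt

# Constants from the Windows SDK
MEM_COMMIT = 0x1000
MEM_PRIVATE = 0x20000              # "PRV" type from Table 1
PAGE_EXECUTE_READWRITE = 0x40      # "ERW" protection from Table 1
PROCESS_QUERY_INFORMATION = 0x0400
PROCESS_VM_READ = 0x0010


class MEMORY_BASIC_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("BaseAddress", ctypes.c_void_p),
        ("AllocationBase", ctypes.c_void_p),
        ("AllocationProtect", wt.DWORD),
        ("RegionSize", ctypes.c_size_t),
        ("State", wt.DWORD),
        ("Protect", wt.DWORD),
        ("Type", wt.DWORD),
    ]


def candidate_regions(pid: int):
    """Yield (base_address, size) for committed MEM_PRIVATE / ERW regions of a process."""
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid)
    mbi = MEMORY_BASIC_INFORMATION()
    address = 0
    # Walk the address space region by region until VirtualQueryEx stops returning data
    while kernel32.VirtualQueryEx(handle, ctypes.c_void_p(address),
                                  ctypes.byref(mbi), ctypes.sizeof(mbi)):
        if (mbi.State == MEM_COMMIT and mbi.Type == MEM_PRIVATE
                and mbi.Protect == PAGE_EXECUTE_READWRITE):
            yield (mbi.BaseAddress or 0), mbi.RegionSize
        address = (mbi.BaseAddress or 0) + mbi.RegionSize
    kernel32.CloseHandle(handle)

Only the regions that survive this filter need to be read and searched for the byte pattern, which is what makes the approach so much faster than blindly scanning every region of every process.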

Introducing Skrapa

We’ve incorporated the techniques described above into an easy to install Python package. The package is designed and tested to work on Linux and Microsoft Windows systems. Some of the notable features include:

  • Configurable scanning:
    • Scan all process memory, or target specific processes by name or process identifier.
  • Regex and YARA support.
  • Support for user callback functions: define custom functions that execute routines when user-specified conditions are met.
  • Easy to incorporate into bigger projects and scripts thanks to its easy-to-use API.

The package was designed to be easily extensible by end users, providing an API that can be leveraged to perform more advanced and tailored scanning routines.

Where to find Skrapa?

The Python library is available on our GitHub, together with some examples showing scenarios on how to use it.

GitHub: https://github.com/fox-it/skrapa

References

  1. https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc ↩︎
  2. https://learn.microsoft.com/en-us/windows/win32/Memory/memory-protection-constants ↩︎

Reverse, Reveal, Recover: Windows Defender Quarantine Forensics

14 December 2023 at 05:13

Max Groot and Erik Schamper

TL;DR

  • Windows Defender (the antivirus shipped with standard installations of Windows) places malicious files into quarantine upon detection.
  • Reverse engineering mpengine.dll resulted in finding previously undocumented metadata in the Windows Defender quarantine folder that can be used for digital forensics and incident response.
  • Existing scripts that extract quarantined files do not process this metadata, even though it could be useful for analysis.
  • Fox-IT’s open-source digital forensics and incident response framework Dissect can now recover this metadata, in addition to recovering quarantined files from the Windows Defender quarantine folder.
  • dissect.cstruct allows us to use C-like structure definitions in Python, which enables easy continued research in other programming languages or reverse engineering in tools like IDA Pro.
    • Want to continue in IDA Pro? Just copy paste the structure definitions!

Introduction

During incident response engagements we often encounter antivirus applications that have rightfully triggered on malicious software that was deployed by threat actors. Most commonly we encounter this for Windows Defender, the antivirus solution that is shipped by default with Microsoft Windows. Windows Defender places malicious files in quarantine upon detection, so that the end user may decide to recover the file or delete it permanently. Threat actors, when faced with the detection capabilities of Defender, either disable the antivirus in its entirety or attempt to evade its detection.

The Windows Defender quarantine folder is valuable from the perspective of digital forensics and incident response (DFIR). First of all, it can reveal information about timestamps, locations and signatures of files that were detected by Windows Defender. Especially in scenarios where the threat actor has deleted the Windows Event logs, but left the quarantine folder intact, the quarantine folder is of great forensic value. Moreover, as the entire file is quarantined (so that the end user may choose to restore it), it is possible to recover files from quarantine for further reverse engineering and analysis.

While scripts already exist to recover files from the Defender quarantine folder, the purpose of much of the contents of this folder was previously unknown. We don’t like big unknowns, so we performed further research into this metadata to see if we could uncover additional forensic traces.

Rather than just presenting our results, we’ve structured this blog to also describe the process of how we got there. Skip to the end if you are interested in the results rather than the technical details of reverse engineering Windows Defender.

Diving into Windows Defender internals

Existing Research

We started by looking into existing research into the internals of Windows Defender. The most extensive documentation we could find on the structures of Windows Defender quarantine files was Florian Bauchs’ whitepaper analyzing antivirus software quarantine files, but we also looked at several scripts on GitHub.

In summary, whenever Defender puts a file into quarantine, it does three things:

  • A bunch of metadata pertaining to when, why and how the file was quarantined is held in a QuarantineEntry. This QuarantineEntry is RC4-encrypted and saved to disk in the /ProgramData/Microsoft/Windows Defender/Quarantine/Entries folder.
  • The contents of the malicious file are stored in a QuarantineEntryResourceData file, which is also RC4-encrypted and saved to disk in the /ProgramData/Microsoft/Windows Defender/Quarantine/ResourceData folder.
  • Within the /ProgramData/Microsoft/Windows Defender/Quarantine/Resource folder, a Resource file is made. Both from previous research as well as from our own findings during reverse engineering, it appears this file contains no information that cannot be obtained from the QuarantineEntry and the QuarantineEntryResourceData files. Therefore, we ignore the Resource file for the remainder of this blog.

While previous scripts are able to recover some properties from the ResourceData and QuarantineEntry files, large segments of data were left unparsed, which gave us a hunch that additional forensic artefacts were yet to be discovered.

Windows Defender encrypts both the QuarantineEntry and the ResourceData files using a hardcoded RC4 key defined in mpengine.dll. This hardcoded key was initially published by Cuckoo and is paramount for the offline recovery of the quarantine folder.
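For reference, RC4 itself fits in a few lines of Python, and since RC4 is symmetric the same routine both encrypts and decrypts. The sketch below is a generic RC4 implementation; the key is deliberately a placeholder, as the real key bytes have to be taken from mpengine.dll or from the public Cuckoo code.

def rc4_crypt(data: bytes, key: bytes) -> bytes:
    """Generic RC4; with the Defender key this decrypts the quarantine files."""
    # Key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Pseudo-random generation algorithm (PRGA)
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)


# Placeholder only; the hardcoded key is not reproduced here.
DEFENDER_RC4_KEY = b"<hardcoded RC4 key from mpengine.dll>"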

Pivoting off of public scripts and Bauch’s whitepaper, we loaded mpengine.dll into IDA to further review how Windows Defender places a file into quarantine. Using the PDB available from the Microsoft symbol server, we get a head start with some functions and structures already defined.

Recovering metadata by investigating the QuarantineEntry file

Let us begin with the QuarantineEntry file. From this file, we would like to recover as much of the QuarantineEntry structure as possible, as this holds all kinds of valuable metadata. The QuarantineEntry file is not encrypted as one RC4 cipherstream, but consists of three chunks that are each individually encrypted using RC4.

These three chunks are what we have come to call QuarantineEntryFileHeader, QuarantineEntrySection1 and QuarantineEntrySection2.

  • QuarantineEntryFileHeader describes the size of QuarantineEntrySection1 and QuarantineEntrySection2, and contains CRC checksums for both sections.
  • QuarantineEntrySection1 contains valuable metadata that applies to all QuarantineEntryResource instances within this QuarantineEntry file, such as the DetectionName and the ScanId associated with the quarantine action.
  • QuarantineEntrySection2 denotes the length and offset of every QuarantineEntryResource instance within this QuarantineEntry file so that they can be correctly parsed individually.

A QuarantineEntry has one or more QuarantineEntryResource instances associated with it. Each QuarantineEntryResource contains additional information, such as the path of the quarantined artefact and the type of artefact that has been quarantined (e.g. regkey or file).

An overview of the different structures within QuarantineEntry is provided in Figure 1:

Figure 1: An example overview of a QuarantineEntry. In this example, two files were simultaneously quarantined by Windows Defender. Hence, there are two QuarantineEntryResource structures contained within this single QuarantineEntry.

As QuarantineEntryFileHeader is mostly a structure that describes how QuarantineEntrySection1 and QuarantineEntrySection2 should be parsed, we will first look into what those two consist of.

QuarantineEntrySection1

When reviewing mpengine.dll within IDA, the contents of both QuarantineEntrySection1 and QuarantineEntrySection2 appear to be determined in the QexQuarantine::CQexQuaEntry::Commit function.

The function receives an instance of the QexQuarantine::CQexQuaEntry class. Unfortunately, the PDB file that Microsoft provides for mpengine.dll does not contain a layout for this class. Most fields could, however, be derived using the function names in the PDB that are associated with the CQexQuaEntry class:

Figure 2: Functions retrieving properties from QuarantineEntry

The Id, ScanId, ThreatId, ThreatName and Time fields are most important, as these will be written to the QuarantineEntry file.

At the start of the QexQuarantine::CQexQuaEntry::Commit function, the size of Section1 is determined.

Figure 3: Reviewing the decompiled output of CQexQuaEntry::Commit shows the size of QuarantineEntrySection1 being set to the length of ThreatName plus 53.

This sets section1_size to the length of the ThreatName variable plus 53. We can determine what these additional 53 bytes consist of by looking at what values are set in the QexQuarantine::CQexQuaEntry::Commit function for the Section1 buffer.

This took some experimentation and required trying different fields, offsets and sizes for the QuarantineEntrySection1 structure within IDA. After every change, we would review what these changes would do to the decompiled IDA view of the QexQuarantine::CQexQuaEntry::Commit function.

Some trial and error landed us the following structure definition:

struct QuarantineEntrySection1 {
    CHAR Id[16];
    CHAR ScanId[16];
    QWORD Timestamp;
    QWORD ThreatId;
    DWORD One;
    CHAR DetectionName[];
};

While reviewing the final decompiled output (right) for the assembly code (left), we noticed a field always being set to 1:

Figure 4: A field of QuarantineEntrySection1 always being set to the value of 1.

Given that we do not know what this field is used for, we opted to name the field ‘One’ for now. Most likely, it’s a boolean value that is always true within the context of the QexQuarantine::CQexQuaEntry::Commit function.

QuarantineEntrySection2

Now that we have a structure definition for the first section of a QuarantineEntry, we can move on to the second part. QuarantineEntrySection2 holds the number of QuarantineEntryResource objects confined within a QuarantineEntry, as well as the offsets into the QuarantineEntry structure where they are located.

In most scenarios, one threat gets detected at a time, and one QuarantineEntry will be associated with one QuarantineEntryResource. This is not always the case: for example, if one unpacks a ZIP folder that contains multiple malicious files, Windows Defender might place them all into quarantine. Each individual malicious file of the ZIP would then be one QuarantineEntryResource, but they are all confined within one QuarantineEntry.

QuarantineEntryResource

To be able to parse QuarantineEntryResource instances, we look into the CQexQuaResource::ToBinary function. This function receives a QuarantineEntryResource object, as well as a pointer to a buffer to which it needs to write the binary output. If we can reverse the logic within this function, we can convert the binary output back into a parsed instance during forensic recovery.

Looking into the CQexQuaResource::ToBinary function, we see two loops very similar to the one observed before for serializing the ThreatName of QuarantineEntrySection1. By reviewing various decrypted QuarantineEntry files, it quickly became apparent that these loops are responsible for reserving space in the output buffer for DetectionPath and DetectionType, with DetectionPath being UTF-16 encoded:

Figure 5: Reservation of space for DetectionPath and DetectionType at the beginning of CQexQuaResource::ToBinary

Fields

When reviewing the QexQuarantine::CQexQuaEntry::Commit function, we observed an interesting loop that (after investigating function calls and renaming variables) explains the data that is stored between the DetectionType and DetectionPath:

Figure 6: Alignment logic for serializing Fields

It appears QuarantineEntryResource structures have one or more QuarantineResourceField instances associated with them, with the number of fields associated with a QuarantineEntryResource being stored in a single byte in between the DetectionPath and DetectionType. When saving the QuarantineEntry to disk, fields have an alignment of 4 bytes. We could not find mentions of QuarantineEntryResourceField structures in prior Windows Defender research, even though they can hold valuable information.

The CQExQuaResource class has several different implementations of AddField, accepting different kinds of parameters. Reviewing these functions showed that fields have an Identifier, Type, and a buffer Data with a size of Size, resulting in a simple TLV-like format:

struct QuarantineEntryResourceField {
    WORD Size;
    WORD Identifier:12;
    FIELD_TYPE Type:4;
    CHAR Data[Size];
};
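Since Size, Identifier and Type are plain little-endian values (with Identifier and Type packed into a single WORD), a field can also be decoded by hand. The snippet below is an illustrative helper, assuming the identifier occupies the low 12 bits of that WORD as the bitfield order above implies:

import struct

def parse_field(buf: bytes, offset: int):
    """Decode one QuarantineEntryResourceField from a raw, decrypted buffer."""
    size, id_and_type = struct.unpack_from("<HH", buf, offset)
    identifier = id_and_type & 0x0FFF   # low 12 bits: field identifier
    field_type = id_and_type >> 12      # high 4 bits: FIELD_TYPE
    data = buf[offset + 4:offset + 4 + size]
    return identifier, field_type, data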

To understand what kinds of types and identifiers are possible, we delve further into the different versions of the AddField functions, which all accept a different data type:

Figure 7: Finding different field types based on different implementations of the CQexQuaResource::AddField function

Visiting these functions, we reviewed the Type and Size variables to understand the different possible types of fields that can be set for QuarantineResource instances. This yields the following FIELD_TYPE enum:

enum FIELD_TYPE : WORD {
    STRING = 0x1,
    WSTRING = 0x2,
    DWORD = 0x3,
    RESOURCE_DATA = 0x4,
    BYTES = 0x5,
    QWORD = 0x6,
};

As the AddField functions are part of a virtual function table (vtable) of the CQexQuaResource class, we cannot trivially find all places where the AddField function is called, as they are not directly called (which would yield an xref in IDA). Therefore, we have not exhausted all code paths leading to a call of AddField to identify all possible Identifier values and how they are used. Our research yielded the following field identifiers as the most commonly observed, and of the most forensic value:

enum FIELD_IDENTIFIER : WORD {
    CQuaResDataID_File = 0x02,
    CQuaResDataID_Registry = 0x03,
    Flags = 0x0A,
    PhysicalPath = 0x0C,
    DetectionContext = 0x0D,
    Unknown = 0x0E,
    CreationTime = 0x0F,
    LastAccessTime = 0x10,
    LastWriteTime = 0x11,
};

Especially CreationTime, LastAccessTime and LastWriteTime can provide crucial data points during an investigation.
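These timestamp fields appear to hold 64-bit Windows FILETIME values (100-nanosecond intervals since 1601-01-01) in the field’s Data buffer; assuming that interpretation, converting them into a usable datetime looks like this:

import struct
from datetime import datetime, timezone

# Offset between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01),
# expressed in 100-nanosecond intervals.
EPOCH_DELTA = 116444736000000000

def filetime_field_to_datetime(data: bytes) -> datetime:
    """Interpret an 8-byte field value (e.g. CreationTime) as a Windows FILETIME."""
    filetime = struct.unpack("<Q", data)[0]
    return datetime.fromtimestamp((filetime - EPOCH_DELTA) / 10_000_000, tz=timezone.utc)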

Revisiting the QuarantineEntrySection2 and QuarantineEntryResource structures

Now that we have an understanding of how fields work and how they are stored within the QuarantineEntryResource, we can derive the following structure for it:

struct QuarantineEntryResource {
    WCHAR DetectionPath[];
    WORD FieldCount;
    CHAR DetectionType[];
};

Revisiting the QexQuarantine::CQexQuaEntry::Commit function, we can now understand how this function determines at which offset every QuarantineEntryResource is located within QuarantineEntry. Using these offsets, we will later be able to parse individual QuarantineEntryResource instances. Thus, the QuarantineEntrySection2 structure is fairly straightforward:

struct QuarantineEntrySection2 {
    DWORD EntryCount;
    DWORD EntryOffsets[EntryCount];
};

The last step for recovery of QuarantineEntry: the QuarantineEntryFileHeader

Now that we have a proper understanding of the QuarantineEntry, we want to know how it ends up written to disk in encrypted form, so that we can properly parse the file upon forensic recovery. By inspecting the QexQuarantine::CQexQuaEntry::Commit function further, we can see how it ends up passing QuarantineEntrySection1 and QuarantineEntrySection2 to a function named CUserDatabase::Add.

We noted earlier that the QuarantineEntry contains three RC4-encrypted chunks. The first chunk of the file is created in the CUserDatabase::Add function, and is the QuarantineEntryFileHeader. The second chunk is QuarantineEntrySection1. The third chunk starts with QuarantineEntrySection2, followed by all QuarantineEntryResource structures and their 4-byte aligned QuarantineEntryResourceField structures.

We knew from Bauch’s work that the QuarantineEntryFileHeader has a static size of 60 bytes, and contains the size of QuarantineEntrySection1 and QuarantineEntrySection2. Thus, we need to decrypt the QuarantineEntryFileHeader first.

Based on Bauch’s work, we started with the following structure for QuarantineEntryFileHeader:

struct QuarantineEntryHeader {
    char magic[16];
    char unknown1[24];
    uint32_t section1_size;
    uint32_t section2_size;
    char unknown[12];
};

That leaves quite some bytes unknown though, so we went back to trusty IDA. Inspecting the CUserDatabase::Add function helps us further understand the QuarantineEntryHeader structure. For example, we can see the hardcoded magic header and footer:

Figure 8: Magic header and footer being set for the QuarantineEntryHeader

A CRC checksum calculation can be seen for both the QuarantineEntrySection1 and QuarantineEntrySection2 buffers:

Figure 9: CRC Checksum logic within CUserDatabase::Add

These checksums can be used upon recovery to verify the validity of the file. The CUserDatabase::Add function then writes the three chunks in RC4-encrypted form to the QuarantineEntry file buffer.
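Assuming these are standard CRC32 checksums (which is what Python’s zlib computes), validating a decrypted entry during recovery could look like the following sketch, where header is a parsed instance of the revised QuarantineEntryFileHeader structure shown below:

import zlib

def verify_quarantine_entry(header, section1: bytes, section2: bytes) -> bool:
    """Compare the checksums from the file header against the decrypted section buffers.

    A mismatch points at corruption or an incorrect decryption, so a recovery
    tool can warn about it instead of silently producing bad metadata.
    """
    return (zlib.crc32(section1) == header.Section1CRC and
            zlib.crc32(section2) == header.Section2CRC)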

Based on these findings (the magic header and footer and the CRC checksums), we can revise the structure definition for the QuarantineEntryFileHeader:

struct QuarantineEntryFileHeader {
    CHAR MagicHeader[4];
    CHAR Unknown[4];
    CHAR _Padding[32];
    DWORD Section1Size;
    DWORD Section2Size;
    DWORD Section1CRC;
    DWORD Section2CRC;
    CHAR MagicFooter[4];
};

This was the last piece to be able to parse QuarantineEntry structures from their on-disk form. However, we do not want just the metadata: we want to recover the quarantined files as well.

Recovering files by investigating QuarantineEntryResourceData

We can now correctly parse QuarantineEntry files, so it is time to turn our attention to the QuarantineEntryResourceData file. This file contains the RC4-encrypted contents of the file that has been placed into quarantine.

Step one: eyeball hexdumps

Let’s start by letting Windows Defender quarantine a Mimikatz executable and reviewing its output files in the quarantine folder. One would think that merely RC4 decrypting the QuarantineEntryResourceData file would result in the contents of the original file. However, a quick hexdump of a decrypted QuarantineEntryResourceData file shows us that there is more information contained within:

max@dissect $ hexdump -C mimikatz_resourcedata_rc4_decrypted.bin | head -n 20
00000000 03 00 00 00 02 00 00 00 a4 00 00 00 00 00 00 00 |…………….|
00000010 00 00 00 00 01 00 04 80 14 00 00 00 30 00 00 00 |…………0…|
00000020 00 00 00 00 4c 00 00 00 01 05 00 00 00 00 00 05 |….L………..|
00000030 15 00 00 00 a4 14 d2 9b 1a 02 a7 4f 07 f6 37 b4 |………..O..7.|
00000040 e8 03 00 00 01 05 00 00 00 00 00 05 15 00 00 00 |…………….|
00000050 a4 14 d2 9b 1a 02 a7 4f 07 f6 37 b4 01 02 00 00 |…….O..7…..|
00000060 02 00 58 00 03 00 00 00 00 00 14 00 ff 01 1f 00 |..X………….|
00000070 01 01 00 00 00 00 00 05 12 00 00 00 00 00 18 00 |…………….|
00000080 ff 01 1f 00 01 02 00 00 00 00 00 05 20 00 00 00 |………… …|
00000090 20 02 00 00 00 00 24 00 ff 01 1f 00 01 05 00 00 | …..$………|
000000a0 00 00 00 05 15 00 00 00 a4 14 d2 9b 1a 02 a7 4f |……………O|
000000b0 07 f6 37 b4 e8 03 00 00 01 00 00 00 00 00 00 00 |..7………….|
000000c0 00 ae 14 00 00 00 00 00 00 00 00 00 4d 5a 90 00 |…………MZ..|
000000d0 03 00 00 00 04 00 00 00 ff ff 00 00 b8 00 00 00 |…………….|
000000e0 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 |….@………..|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |…………….|
00000100 00 00 00 00 00 00 00 00 20 01 00 00 0e 1f ba 0e |…….. …….|
00000110 00 b4 09 cd 21 b8 01 4c cd 21 54 68 69 73 20 70 |….!..L.!This p|
00000120 72 6f 67 72 61 6d 20 63 61 6e 6e 6f 74 20 62 65 |rogram cannot be|
00000130 20 72 75 6e 20 69 6e 20 44 4f 53 20 6d 6f 64 65 | run in DOS mode|

As visible in the hexdump, the MZ value (which is located at the beginning of the buffer of the Mimikatz executable) only starts at offset 0xCC. This gives reason to believe there is potentially valuable information preceding it.

There is also additional information at the end of the ResourceData file:

max@dissect $ hexdump -C mimikatz_resourcedata_rc4_decrypted.bin | tail -n 10
0014aed0 00 00 00 00 52 00 00 00 00 00 00 00 2c 00 00 00 |….R…….,…|
0014aee0 3a 00 5a 00 6f 00 6e 00 65 00 2e 00 49 00 64 00 |:.Z.o.n.e…I.d.|
0014aef0 65 00 6e 00 74 00 69 00 66 00 69 00 65 00 72 00 |e.n.t.i.f.i.e.r.|
0014af00 3a 00 24 00 44 00 41 00 54 00 41 00 5b 5a 6f 6e |:.$.D.A.T.A.[Zon|
0014af10 65 54 72 61 6e 73 66 65 72 5d 0d 0a 5a 6f 6e 65 |eTransfer]..Zone|
0014af20 49 64 3d 33 0d 0a 52 65 66 65 72 72 65 72 55 72 |Id=3..ReferrerUr|
0014af30 6c 3d 43 3a 5c 55 73 65 72 73 5c 75 73 65 72 5c |l=C:\Users\user\|
0014af40 44 6f 77 6e 6c 6f 61 64 73 5c 6d 69 6d 69 6b 61 |Downloads\mimika|
0014af50 74 7a 5f 74 72 75 6e 6b 2e 7a 69 70 0d 0a |tz_trunk.zip..|

At the end of the hexdump, we see an additional buffer, which some may recognize as the “Zone Identifier”, or the “Mark of the Web”. As this Zone Identifier may tell you something about where a file originally came from, it is valuable for forensic investigations.

Step two: open IDA

To understand where these additional buffers come from and how we can parse them, we again dive into the bowels of mpengine.dll. If we review the QuarantineFile function, we see that it receives a QuarantineEntryResource and QuarantineEntry as parameters. Following the code path, we see that the BackupRead function is called to write to a buffer which we know will later be RC4-encrypted by Defender and written to the quarantine folder:

Figure 10: BackupRead being called within the QuarantineFile function.

Step three: RTFM

A glance at the documentation of BackupRead reveals that this function returns a buffer separated by Win32 stream IDs. The streams stored by BackupRead contain all data streams as well as security data about the owner and permissions of a file. On NTFS file systems, a file can have multiple data attributes or streams: the “main” unnamed data stream and optionally other named data streams, often referred to as “alternate data streams”. For example, the Zone Identifier is stored in a separate Zone.Identifier data stream of a file. It makes sense that a function intended for backing up data preserves these alternate data streams as well.

The fact that BackupRead preserves these streams is also good news for forensic analysis. First of all, malicious payloads can be hidden in alternate data streams. Moreover, alternate data streams such as the Zone Identifier and the security data can help to understand where a file came from and what it contains. We just need to recover the streams as they have been saved by BackupRead!

Diving into IDA is not necessary, as the documentation tells us all that we need. For each data stream, the BackupRead function writes a WIN32_STREAM_ID to disk, which denotes (among other things) the size of the stream. Afterwards, it writes the data of the stream to the destination file and continues to the next stream. The WIN32_STREAM_ID structure definition is documented on the Microsoft Learn website:

typedef struct _WIN32_STREAM_ID {
    STREAM_ID StreamId;
    STREAM_ATTRIBUTES StreamAttributes;
    QWORD Size;
    DWORD StreamNameSize;
    WCHAR StreamName[StreamNameSize / 2];
} WIN32_STREAM_ID;
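With that layout known, walking the stream buffer inside a decrypted ResourceData file is straightforward. The following sketch is a simplified illustration: it assumes the Defender-specific data that precedes the BackupRead buffer (such as the security descriptor block visible in the hexdump above) has already been skipped, and it simply yields every stream it encounters.

import struct
from io import BytesIO

BACKUP_DATA = 0x1            # the unnamed ("main") data stream
BACKUP_SECURITY_DATA = 0x3   # NTFS security data
BACKUP_ALTERNATE_DATA = 0x4  # named streams such as Zone.Identifier


def iter_backup_streams(buf: bytes):
    """Yield (stream_id, stream_name, data) tuples from a BackupRead-style buffer."""
    fh = BytesIO(buf)
    while True:
        header = fh.read(20)  # StreamId, StreamAttributes, Size, StreamNameSize
        if len(header) < 20:
            break
        stream_id, _attributes, size, name_size = struct.unpack("<IIQI", header)
        name = fh.read(name_size).decode("utf-16-le")
        data = fh.read(size)
        yield stream_id, name, data

The BACKUP_DATA stream holds the original quarantined file, while a BACKUP_ALTERNATE_DATA stream named Zone.Identifier carries the Mark of the Web shown in the hexdump above.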

Who slipped this by the code review?

While reversing parts of mpengine.dll, we came across an interesting looking call in the HandleThreatDetection function. We appreciate that threats must be dealt with swiftly and with utmost discipline, but could not help but laugh at the curious choice of words when it came to naming this particular function.
Figure 11: A function call to SendThreatToCamp, a ‘call’ to action that seems pretty harsh.

Implementing our findings into Dissect

We now have all structure definitions that we need to recover all metadata and quarantined files from the quarantine folder. There is only one step left: writing an implementation.

During incident response, we do not want to rely on scripts scattered across home directories and git repositories. This is why we integrate our research into Dissect.

We can leave all the boring stuff of parsing disks, volumes and evidence containers to Dissect, and write our implementation as a plugin to the framework. Thus, the only thing we need to do is parse the artefacts and feed the results back into the framework.

The dive into Windows Defender in the previous sections resulted in a number of structure definitions that we need to recover data from the Windows Defender quarantine folder. When making an implementation, we want our code to reflect these structure definitions as closely as possible, to make our code both readable and verifiable. This is where dissect.cstruct comes in. It can parse structure definitions and make them available in your Python code. This removes a lot of boilerplate code for parsing structures and greatly enhances the readability of your parser. Let’s review how easily we can parse a QuarantineEntry file using dissect.cstruct:

from io import BytesIO
from typing import BinaryIO

from dissect.cstruct import cstruct

defender_def = """
enum FIELD_TYPE : WORD {
    STRING = 0x1,
    WSTRING = 0x2,
    DWORD = 0x3,
    RESOURCE_DATA = 0x4,
    BYTES = 0x5,
    QWORD = 0x6,
};

struct QuarantineEntryFileHeader {
    CHAR MagicHeader[4];
    CHAR Unknown[4];
    CHAR _Padding[32];
    DWORD Section1Size;
    DWORD Section2Size;
    DWORD Section1CRC;
    DWORD Section2CRC;
    CHAR MagicFooter[4];
};

struct QuarantineEntrySection1 {
    CHAR Id[16];
    CHAR ScanId[16];
    QWORD Timestamp;
    QWORD ThreatId;
    DWORD One;
    CHAR DetectionName[];
};

struct QuarantineEntrySection2 {
    DWORD EntryCount;
    DWORD EntryOffsets[EntryCount];
};

struct QuarantineEntryResource {
    WCHAR DetectionPath[];
    WORD FieldCount;
    CHAR DetectionType[];
};

struct QuarantineEntryResourceField {
    WORD Size;
    WORD Identifier:12;
    FIELD_TYPE Type:4;
    CHAR Data[Size];
};
"""

c_defender = cstruct()
c_defender.load(defender_def)


class QuarantineEntry:
    def __init__(self, fh: BinaryIO):
        # Decrypt & parse the header so that we know the section sizes.
        # rc4_crypt is the RC4 helper using the hardcoded key (key not reproduced in this post).
        self.header = c_defender.QuarantineEntryFileHeader(rc4_crypt(fh.read(60)))

        # Decrypt & parse Section 1. This will tell us some information about this quarantine entry.
        # These properties are shared for all quarantine entry resources associated with this quarantine entry.
        self.metadata = c_defender.QuarantineEntrySection1(rc4_crypt(fh.read(self.header.Section1Size)))

        # [...]

        # The second section contains the number of quarantine entry resources contained in this quarantine entry,
        # as well as their offsets. After that, the individual quarantine entry resources start.
        resource_buf = BytesIO(rc4_crypt(fh.read(self.header.Section2Size)))

As you can see, when the structure format is known, parsing it is trivial using dissect.cstruct. The only caveat is that the QuarantineEntryFileHeader, QuarantineEntrySection1 and QuarantineEntrySection2 structures are individually encrypted using the hardcoded RC4 key. Because only the size of QuarantineEntryFileHeader is static (60 bytes), we parse that first and use the information contained in it to decrypt the other sections.

To parse the individual fields contained within the QuarantineEntryResource, we have to do a bit more work. We cannot add the QuarantineEntryResourceField directly to the QuarantineEntryResource structure definition within dissect.cstruct, as it currently does not support the type of alignment used by Windows Defender. However, it does support the QuarantineEntryResourceField structure definition, so all we have to do is follow the alignment logic that we saw in IDA:

# As the fields are aligned, we need to parse them individually
offset = fh.tell()

for _ in range(field_count):
    # Align
    offset = (offset + 3) & 0xFFFFFFFC
    fh.seek(offset)

    # Parse
    field = c_defender.QuarantineEntryResourceField(fh)
    self._add_field(field)

    # Move pointer
    offset += 4 + field.Size

We can use dissect.cstruct‘s dumpstruct function to visualize our parsing to verify if we are correctly loading in all data:


And just like that, our parsing is done. Utilizing dissect.cstruct makes parsing structures much easier to understand and implement. This also facilitates rapid iteration: we have altered our structure definitions dozens of times during our research, which would have been pure pain without having the ability to blindly copy-paste structure definitions into our Python editor of choice.

Implementing the parser within the Dissect framework brings great advantages. We do not have to worry at all about the format in which the forensic evidence is provided. Implementing the Defender recovery as a Dissect plugin means it just works on standard forensic evidence formats such as E01 or ASDF, or against forensic packages the likes of KAPE and Acquire, and even on a live virtual machine:

max@dissect $ target-query ~/Windows10.vmx -q -f defender.quarantine
<filesystem/windows/defender/quarantine/file hostname='DESKTOP-AR98HFK' domain=None ts=2022-11-22 09:37:16.536575+00:00 quarantine_id=b'\xe3\xc1\x03\x80\x00\x00\x00\x003\x12]]\x07\x9a\xd2\xc9' scan_id=b'\x88\x82\x89\xf5?\x9e J\xa5\xa8\x90\xd0\x80\x96\x80\x9b' threat_id=2147729891 detection_type='file' detection_name='HackTool:Win32/Mimikatz.D' detection_path='C:\\Users\\user\\Documents\\mimikatz.exe' creation_time=2022-11-22 09:37:00.115273+00:00 last_write_time=2022-11-22 09:37:00.240202+00:00 last_accessed_time=2022-11-22 09:37:08.081676+00:00 resource_id='9EC21BB792E253DBDC2E88B6B180C4E048847EF6'>
max@dissect $ target-query ~/Windows10.vmx -f defender.recover -o /tmp/ -v
2023-02-14T07:10:20.335202Z [info] <Target /home/max/Windows10.vmx>: Saving /tmp/9EC21BB792E253DBDC2E88B6B180C4E048847EF6.security_descriptor [dissect.target.target]
2023-02-14T07:10:20.335898Z [info] <Target /home/max/Windows10.vmx>: Saving /tmp/9EC21BB792E253DBDC2E88B6B180C4E048847EF6 [dissect.target.target]
2023-02-14T07:10:20.337956Z [info] <Target /home/max/Windows10.vmx>: Saving /tmp/9EC21BB792E253DBDC2E88B6B180C4E048847EF6.ZoneIdentifierDATA [dissect.target.target]

The full implementation of Windows Defender quarantine recovery can be found on GitHub.

Conclusion

We hope to have shown that there can be great benefits to reverse engineering the internals of Microsoft Windows to discover forensic artifacts. By reverse engineering mpengine.dll, we were able to further understand how Windows Defender places detected files into quarantine. We could then use this knowledge to discover (meta)data that was previously not fully documented or understood. The main results of this are the recovery of more information about the original quarantined file, such as various timestamps and additional NTFS data streams, like the Zone.Identifier, which is information that can be useful in digital forensics or incident response investigations.

The documentation of QuarantineEntryResourceField was not available prior to this research and we hope others can use this to further investigate which fields are yet to be discovered. We have also documented how the BackupRead functionality is used by Defender to preserve the different data streams present in the NTFS file, including the Zone Identifier and Security Descriptor.

When writing our parser, using dissect.cstruct allowed us to tightly integrate our findings from reverse engineering into our parsing code, enhancing the readability and verifiability of the code. This can in turn help others to pivot off of our research, just like we did when pivoting off of the research of others into the Windows Defender quarantine folder.

This research has been implemented as a plugin for the Dissect framework. This means that our parser can operate independently of the type of evidence it is being run against. This functionality has been added to dissect.target as of January 2nd 2023 and is installed with Dissect as of version 3.4.

Data Connector Health Monitoring on Microsoft Sentinel

6 December 2023 at 08:00

Introduction

Security information and event management (SIEM) tooling allows security teams to collect and analyse logs from a wide variety of sources. In turn this is used to detect and handle incidents. Evidently it is important to ensure that the log ingestion is complete and uninterrupted. Luckily SIEMs offer out-of-the-box solutions and/or capabilities to create custom health monitoring. In this blog post we will take a look at the health monitoring capabilities for log ingestion in Microsoft Sentinel.

Microsoft Sentinel

Microsoft Sentinel is the cloud-native Security information and event management (SIEM) and Security orchestration, automation, and response (SOAR) solution provided by Microsoft. It provides intelligent security analytics and threat intelligence across the enterprise, offering a single solution for alert detection, threat visibility, proactive hunting, and threat response. As a cloud-native solution, it can easily scale to accommodate the growing security needs of an organization and alleviate the cost of maintaining your own infrastructure.

Microsoft Sentinel utilizes Data Connectors to handle log ingestion. Microsoft Sentinel comes with out-of-the-box connectors for Microsoft services; these are the service-to-service connectors. Additionally, there are many built-in connectors for third-party services, which utilize Syslog, Common Event Format (CEF) or REST APIs to connect the data sources to Microsoft Sentinel.

Besides logs from Microsoft services and third-party services, Sentinel can also collect logs from Azure VMs and non-Azure VMs. The log collection is done via the Azure Monitor Agent (AMA) or the Log Analytics Agent (MMA). As a brief aside, it’s important to note that the Log Analytics Agent is on a deprecation path and won’t be supported after August 31, 2024.

The state of the Data Connectors can be monitored with the out-of-the-box solutions or by creating a custom solution.

Microsoft provides two out-of-the-box features to perform health monitoring on the data connectors: the Data connectors health monitoring workbook and the SentinelHealth data table.

Using the Data connectors health monitoring workbook

The Data collection health monitoring workbook is an out-of-the-box solution that provides insight regarding the log ingestion status, detection of anomalies and the health status of the Log Analytics agents.

The workbook consists of three tabs: Overview, Data collection anomalies & Agents info.

The Overview tab shows the general status of the log ingestion in the selected workspace. It contains data such as the Events per Second (EPS), data volume and time of the last log received. For the tab to function, the required Subscription and Workspace have to be selected at the top.

Data connectors health monitoring workbook - Overview


The Data collection anomalies tab provides info for detecting anomalies in the log ingestion process. Each tab in the view presents a specific table. The General tab is a collection of multiple tables.

We’re given a few configuration options for the view:

  • AnomaliesTimeRange: Define the total time range for the anomaly detection.
  • SampleInterval: Define the time interval in which data is sampled in the defined time range. Each time sample gets an anomaly score, which is used for the detection.
  • PositiveAlertThreshold: Define the positive anomaly score threshold.
  • NegativeAlertThreshold: Define the negative anomaly score threshold.

The view itself contains the expected number of events, the actual number of events and the anomaly score per table. When a significant drop or rise in events is detected, further investigation is advised. The logic behind the view can also be re-used to set up alerting when a certain threshold is exceeded.

Data connectors health monitoring workbook - Data Collection anomalies view

The Agent info tab contains information about the health of the AMA and MMA agents installed on your Azure and non-Azure machines. The view allows you to monitor System location, Heartbeat status and latency, Available memory and disk space & Agent operations. There are two tabs in the view to choose between Azure machines only and all machines.

Data connectors health monitoring workbook - Agent information view

You can find the workbook under Microsoft Sentinel > Workbooks > Templates, then type Data collection health monitoring in the search field. Click View Template to open the workbook. If you plan on using the workbook frequently, hit the Save button so it shows up under My Workbooks.

The SentinelHealth data table

The SentinelHealth data table provides information on the health of your Sentinel resources. The content of the table is not limited to the data connectors; it also covers the health of your automation rules, playbooks and analytics rules. Given the scope of this blog post, we will focus solely on the data connector events.

Currently the table has support for following data connectors:

  • Amazon Web Services (CloudTrail and S3)
  • Dynamics 365
  • Office 365
  • Microsoft Defender for Endpoint
  • Threat Intelligence – TAXII
  • Threat Intelligence Platforms

For the data connectors, there are two types of events: Data fetch status change & Data fetch failure summary.

The Data fetch status change events contain the status of the data fetching and additional information. The status is represented by Success or Failure and depending on the status, different additional information is given in the ExtendedProperties field:

  • For a Success, the field will contain the destination of the logs.
  • For a Failure, the field will contain an error message describing the failure. The content of this message depends on the failure type.

These events will be logged once an hour if the status is stable (i.e. the status doesn’t change from Success to Failure or vice versa). Once a status change is detected, it will be logged immediately.

The Data fetch failure summary events are logged once an hour, per connector, per workspace, with an aggregated failure summary. They are only logged when the connector has experienced polling errors during the given hour. The event itself contains additional information in the ExtendedProperties field, such as all the encountered failures and the time period for which the connector’s source platform was queried.

Using the SentinelHealth data table

Before we can start using the SentinelHealth table, we first have to enable it. Go to Microsoft Sentinel > Settings > Settings tab > Auditing and health monitoring, and press Enable to enable the health monitoring.

Once the SentinelHealth table contains data, we can start querying on it. Below you’ll find some example queries to run.

List the latest failure per connector

SentinelHealth
| where TimeGenerated > ago(7d)
| where OperationName == "Data fetch status change"
| where Status == "Failure"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId

Connector status change from Failure to Success

let success_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Success"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
let failure_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Failure"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
success_status
| join kind=inner (failure_status) on SentinelResourceName, SentinelResourceId
| where TimeGenerated > TimeGenerated1

Connector status change from Success to Failure

let success_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Success"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
let failure_status = SentinelHealth
| where TimeGenerated > ago(1d)
| where OperationName == "Data fetch status change"
| where Status == "Failure"
| summarize TimeGenerated = arg_max(TimeGenerated,*) by SentinelResourceName, SentinelResourceId;
success_status
| join kind=inner (failure_status) on SentinelResourceName, SentinelResourceId
| where TimeGenerated1 > TimeGenerated

Custom Solutions

With the help of built-in Azure features and KQL queries, it is also possible to create custom solutions. The idea is to create a KQL query and then have it executed by an Azure feature, such as Azure Monitor, Azure Logic Apps or a Sentinel analytics rule. Below you’ll find two examples of custom solutions.

Log Analytics Alert

For the first example, we’ll set up an alert in the Log Analytics workspace that Sentinel is running on. The alert logic will run on a recurring basis and alert the necessary people when it is triggered. For starters, we’ll go to the Log Analytics workspace and start the creation of a new alert.

Select Custom log search for the signal and we’ll use the Connector status change from Success to Failure query example as logic.

Log Analytics Alert - Query

Set both the aggregation and evaluation period to 1hr, so it doesn’t incur a high monthly cost. Next, attach an email Action Group to the alert, so the necessary people are informed of the failure.

Log Analytics Alert - Notification Group

Lastly, give the alert a severity level, name and description to finish off.

Log Analytics Alert - Overview

Logic App Teams Notification

For the second example, we’ll create a Logic App that will send an overview via Teams of all the tables with an anomalous score.

For starters, we’ll create a logic app and create a Workflow inside the logic app.

Logic App for Teams Notification - Overview

Inside the Workflow, we’ll design the logic for the Teams Notification. We’ll start off with a Recurrence trigger. Define an interval on which you’d like to receive notifications. In the example, an interval of two days was chosen.

Recurrence settings for Logic App

Next, we’ll add the Run query and visualize results action. In this action, we have to define the Subscription, Resource Group, Resource Type, Resource Name, Query, Time Range and Chart Type. Define the first parameters to select your Log Analytics workspace and then use the following query. The query is based on the logic from the Data Connector Workbook. The query looks back on the data of the past two weeks with an interval of one day per data sample. If needed, the time period and interval can be increased or decreased. The UpperThreshold and LowerThreshold parameters can be adapted to make the detection more or less sensitive.

let UpperThreshold = 5.0; // Upper Anomaly threshold score
let LowerThreshold = -5.0; // Lower anomaly threshold score
let TableIgnoreList = dynamic(['SecurityAlert', 'BehaviorAnalytics', 'SecurityBaseline', 'ProtectionStatus']); // select tables you want to EXCLUDE from the results
union withsource=TableName1 *
| make-series count() on TimeGenerated from ago(14d) to now() step 1d by TableName1
| extend (anomalies, score, baseline) = series_decompose_anomalies(count_, 1.5, 7, 'linefit', 1, 'ctukey', 0.01)
| where anomalies[-1] == 1 or anomalies[-1] == -1
| extend Score = score[-1]
| where Score >= UpperThreshold or Score <= LowerThreshold
| where TableName1 !in (TableIgnoreList)
| project TableName=TableName1, ExpectedCount=round(todouble(baseline[-1]),1), ActualCount=round(todouble(count_[-1]),1), AnomalyScore = round(todouble(score[-1]),1)

Lastly, define the Time Range and Chart Type parameters. For Time Range pick Set in query and for Chart Type pick Html Table.

Now that the execution of the query is defined, we can define the sending of a Teams message. Select the Post message in a chat or channel action and configure the action to send the body of the query to a channel/person as Flowbot.

Teams Notification settings in Logic App

Once the Teams action is defined, the logic app is completed. When the logic app runs, you should expect an output similar to the image below. The parameters in the table can be analysed to detect Data Connector issues.

Example Teams Notification

Conclusion

In conclusion, as stated in the intro, monitoring the health of data connectors is a critical part of ensuring an uninterrupted log ingestion process into the SIEM. Microsoft Sentinel offers great capabilities for monitoring the health of data connectors, thus enabling security teams to ensure the smooth functioning of log ingestion processes and promptly address any issues that may arise. The combination of the two out-of-the-box solutions and the flexibility to create custom monitoring solutions, makes Microsoft Sentinel a comprehensive and adaptable choice for managing and monitoring security events.


Frederik Meutermans

Frederik is a Senior Security Consultant in the Cloud Security Team. He specializes in the Microsoft Azure cloud stack, with a special focus on cloud security monitoring. He mainly has experience as security analyst and security monitoring engineer.

You can find Frederik on LinkedIn.

Custom Static Analysis Rules Showdown: Brakeman vs. Semgrep

In application assessments you have to do the most effective work you can in the time period defined by the client to maximize the assurance you’re providing. At IncludeSec we’ve done a couple of innovative things to improve the overall effectiveness of the work we do, and we’re always on the hunt for more ways to squeeze even more value into our assessments by finding more risks and finding them faster. One topic that we revisit frequently to ensure we’re doing the best we can to maximize efficiency is static analysis tooling (aka SAST).

Recently we did a bit of a comparison of two open source static analysis tools to automate detection of suspicious or vulnerable code patterns identified during assessments. In this post we discuss the experience of using Semgrep and Brakeman to create the same custom rule within each tool for a client’s Ruby on Rails application our team was assessing. We’re also very interested in trying out GitHub’s CodeQL, but unfortunately the Ruby support is still in development, so that will have to wait for another time.

Semgrep is a pattern-matching tool that is semantically aware and works with several languages (currently its Ruby support is marked as beta, so it is likely not at full maturity yet). Brakeman is a long-lived Rails-specific static analysis tool, familiar to most who have worked with Rails security. Going in, I had no experience writing custom rules for either one.

This blog post is specifically about writing custom rules for code patterns that are particular to the project I’m assessing. First though I want to mention that both tools have some pre-built general rules for use with most Ruby/Rails projects — Brakeman has a fantastic set of built-in rules for Rails projects that has proven very useful on assessments (just make sure the developers of the project haven’t disabled rules in config/brakeman.yml, and yes we have seen developers do this to make SAST warnings go away!). Semgrep has an online registry of user-submitted rules for Ruby that is also handy (especially as examples for writing your own rules), but the current rule set for Ruby is not quite as extensive as Brakeman’s. In Brakeman the rules are known as “checks”; for consistency we’ll use the term “rules” for both tools, but you the reader can just keep that fact in mind.

First custom rule: Verification of authenticated functionality

I chose a simple pattern for the first rule I wanted to make, mainly to familiarize myself with the process of creating rules in both Semgrep and Brakeman. The application had controllers that handle non-API routes. These controllers enforced authentication by adding a before filter: before_action :login_required. Often in Rails projects, this line is included in a base controller class, then skipped when authentication isn’t required using skip_before_filter. This was not the case in the webapp I was looking at — the before filter was manually set in every single controller that needed authentication, which seemed error-prone as an overall architectural pattern, but alas it is the current state of the code base.

I wanted to get a list of any non-API controllers that lack this callback so I can ensure no functionality is unintentionally exposed without authentication. API routes handled authentication in a different way so consideration for them was not a priority for this first rule.

Semgrep

I went to the Semgrep website and found that Semgrep has a nice interactive tutorial, which walks you through building custom rules. I found it to be incredibly simple and powerful — after finishing the tutorial in about 20 minutes I thought I had all the knowledge I needed to make the rules I wanted. Although the site also has an online IDE for quick iteration, I opted to develop locally, as the online IDE would require submitting our client’s code to a third party, which we obviously can’t do for security and liability reasons. The rule would eventually have to be run against the entire codebase anyway.

I encountered a few challenges when writing the rule:

  • It’s a little tricky to find things that do not match a pattern (e.g. lack of a login_required filter). You can’t just search all files for ones that don’t match; you have to have a pattern that it does search for, then exclude the cases matching your negative pattern. I was running into a bug here that made it even harder, but the Semgrep team fixed it when we gave them a heads up!
  • Matching only classes derived from ApplicationController was difficult because Semgrep doesn’t currently trace base classes recursively, so any that were more than one level removed from ApplicationController would be excluded (for example, if there was a class DerivedController < ApplicationController, it wouldn’t match SecondLevelDerivedController < DerivedController.) The Semgrep team gave me a tip about using a metavariable regex to filter for classes ending in “Controller” which worked for this situation and added no false positives.

My final custom rule for Semgrep follows:

rules:
- id: controller-without-authn
  patterns:
  - pattern: |
      class $CLASS
        ...
      end
  - pattern-not: |
      class $CLASS
        ...
        before_action ..., :login_required, ...
        ...
      end
  - metavariable-regex:
      metavariable: '$CLASS'
      regex: '.*Controller'  
  message: |
    $CLASS does not use the login_required filter.
  severity: WARNING
  languages:
  - ruby

I ran the rule using the following command: semgrep --config=../../../semgrep/ | grep "does not use"

The final grep is necessary because Semgrep will print the matched patterns which, in this case, were the entire classes. There’s currently no option in Semgrep to show only a list of matching files without the actual matched patterns themselves. That made it difficult to see the list of affected controllers, so I used grep on the output to filter the patterns out. This rule resulted in 47 identified controllers. Creating this rule originally took about two hours, including going through the tutorial and debugging the issues I ran into, but now that the issues are fixed I expect it would take less time in future iterations.

Overall I think the rule is pretty self-explanatory — it finds all files that define a class then excludes the ones that have a login_required before filter. Check out the semgrep tutorial lessons if you’re unsure.

Brakeman

Brakeman has a wiki page which describes custom rule building, but it doesn’t have a lot of detail about what functionality is available to you. It gives examples of finding particular method calls and whether user input finds its way into these calls. There’s no example of finding controllers.

The page didn’t give me any more detail about what I wanted to do, so I headed off to Brakeman’s folder of built-in rules on GitHub to see if there are any examples of something similar there. There is a CheckSkipBeforeFilter rule which is very similar to what I want — it checks whether the login_required callback is skipped with skip_before_filter. As mentioned, the app isn’t implemented that way, but it showed me how to iterate over controllers and check before filters.

This got me most of the way there, but I also needed to skip API controllers for this particular app’s architecture. After about an hour of tinkering and looking through Brakeman’s controller tracker code I had the following rule:

require 'brakeman/checks/base_check'

class Brakeman::CheckControllerWithoutAuthn < Brakeman::BaseCheck
  Brakeman::Checks.add self

  @description = "Checks for a controller without authN"

  def run_check
    controllers = tracker.controllers.select do |_name, controller|
      not check_filters controller
    end
    Hash[controllers].each do |name, controller|
      warn :controller => name,
           :warning_type => "No authN",
           :warning_code => :basic_auth_password,
           :message => "No authentication for controller",
           :confidence => :high,
           :file => controller.file
    end
  end

  # Check whether a non-api controller has a :login_required before_filter
  def check_filters controller
    return true if controller.parent.to_s.include? "ApiController"
    controller.before_filters.each do |filter|
      next unless call? filter
      next unless filter.first_arg.value == :login_required
      return true
    end
    return false
  end
end

Running it with brakeman --add-checks-path ../brakeman --enable ControllerWithoutAuthn -t ControllerWithoutAuthn resulted in 43 controllers without authentication — 4 fewer than Semgrep flagged.

Taking a closer look at the controllers that Semgrep flagged and Brakeman did not, I realized the app imports shared functionality from another module, which includes a login_required filter. Therefore, Semgrep had 4 false positives that Brakeman did not flag. Since Semgrep works on individual files, I don’t believe there’s an easy way to prevent those from being flagged.

Second custom rule: Verification of correct and complete authorization across functionality

The next case I wanted assurance on was vertical authorization at the API layer. ApiControllers in the webapp have a method authorization_permissions() which is called at the top of each derived class with a hash table of function_name/permission pairs. This function saves the passed hash table into an instance variable. ApiControllers have a before filter that, when any method is invoked, will look up the permission associated with the called method in the hash table and ensure that the user has the correct permission before proceeding.
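To make that pattern concrete, here is a rough reconstruction of what such a controller hierarchy might look like. This is my own sketch and not the client’s code; the current_user.has_permission? helper, the WidgetsController example, and the fail-open behavior for unlisted methods are all assumptions made for illustration.

class ApiController < ApplicationController
  # Derived controllers call this at the top of the class with a hash of
  # method_name => required_permission pairs; it is stored on the class.
  def self.authorization_permissions(perms)
    @permissions = perms
  end

  def self.permissions
    @permissions || {}
  end

  # Before any action runs, look up the permission for the invoked method
  # and make sure the current user holds it.
  before_action :check_authorization

  private

  def check_authorization
    required = self.class.permissions[action_name.to_sym]
    # A method with no entry in the table gives the filter nothing to
    # enforce, which is exactly why the rules below hunt for such methods.
    return if required.nil?
    head :forbidden unless current_user.has_permission?(required)
  end
end

class WidgetsController < ApiController
  authorization_permissions :index => :widgets_read, :create => :widgets_write

  def index
    # ...
  end

  def create
    # ...
  end
end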

Manual review was required to determine whether any methods had incorrect privileges, as analysis tools can’t understand business logic, but they can find methods entirely lacking authorization control — that was my goal for these rules.

Semgrep

Despite seeming like a more complex scenario, this was still pretty straightforward in Semgrep:

rules:
- id: method-without-authz
  patterns:
  - pattern: |
      class $CONTROLLER < ApiController
        ...
        def $FUNCTION
          ...
        end
        ...
      end
  - pattern-not: |
      class $CONTROLLER < ApiController
        ...
        authorization_permissions ... :$FUNCTION => ...
        ...
        def $FUNCTION
          ...
        end
        ...
      end
  message: |
    Detected api controller $CONTROLLER which does not check for authorization for the $FUNCTION method
  severity: WARNING
  languages:
  - ruby
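I ran this one the same way as the first rule, grepping for a fragment of this rule’s message instead (the exact grep string is just whatever unique substring of the message you prefer): semgrep --config=../../../semgrep/ | grep "Detected api controller"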

It finds all methods on ApiControllers then excludes the ones that have some authorization applied. Semgrep found seven controllers with missing authorization checks.

Brakeman

I struggled to make this one in Brakeman at first, even thinking it might not be possible. The Brakeman team kindly guided me towards Collection#options which contains all method calls invoked at the class level excluding some common ones like before_filter. The following rule grabs all ApiControllers, looks through their options for the call to authorization_permissions, then checks whether each controller method has an entry in the authorization_permissions hash.

require 'brakeman/checks/base_check'

class Brakeman::CheckApicontrollerWithoutAuthz < Brakeman::BaseCheck
  Brakeman::Checks.add self

  @description = "Checks for an ApiController without authZ"

  def run_check
    # Find all api controllers
    api_controllers = tracker.controllers.select do |_name, controller|
      is_apicontroller controller
    end

    # Go through all methods on all ApiControllers
    # and find if they have any methods that are not in the authorization matrix
    Hash[api_controllers].each do |name, controller|
      perms = controller.options[:authorization_permissions].first.first_arg.to_s

      controller.each_method do |method_name, info|
        if not perms.include? ":#{method_name})"
          warn :controller => name,
               :warning_type => "No authZ",
               :warning_code => :basic_auth_password,
               :message => "No authorization check for #{name}##{method_name}",
               :confidence => :high,
               :file => controller.file
        end
      end
    end
  end

  def is_apicontroller controller
    # Only care about controllers derived from ApiController
    return controller.parent.to_s.include? "ApiController"
  end
end
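Assuming it is invoked the same way as the first check (Brakeman derives the check name from the class name minus the Check prefix), running it looks like: brakeman --add-checks-path ../brakeman --enable ApicontrollerWithoutAuthz -t ApicontrollerWithoutAuthz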

Using this rule Brakeman found the same seven controllers with missing authorization as Semgrep.

Conclusion

So who is the winner of this showdown? For Ruby, both tools are valuable, and there is no definitive winner in our comparison when we’re specifically talking about custom rules. Currently I think Semgrep edges out Brakeman a bit for writing quick-and-dirty custom checks on assessments, as it’s faster to get going, but it did have slightly more false positives in our limited comparison testing.

Semgrep rules are fairly intuitive to write and self-explanatory; Brakeman requires additional understanding of its architecture, gained by digging into its source code and using existing rules as a guide. After creating a few Brakeman rules it gets a lot easier, but the initial learning curve was a bit higher than with other SAST tools. However, Brakeman has some sophisticated features that Semgrep does not, especially the user-input tracing functionality, which wasn’t really shown in these examples. If some dangerous function is identified and you need to see whether any user input reaches it (source/sink flow), that is a great Brakeman use case. Also, Brakeman’s default ruleset is great, and I use it on every Rails test I do.

Ultimately Semgrep and Brakeman are both great tools with quirks and particular use-cases and deserve to be in your arsenal of SAST tooling. Enormous thanks to both Clint from the Semgrep team and Justin the creator of Brakeman for providing feedback on this post!

The post Custom Static Analysis Rules Showdown: Brakeman vs. Semgrep appeared first on Include Security Research Blog.

IncludeSec’s free training in Buenos Aires for our Argentine hacker friends.

29 April 2019 at 20:20

One of the things that has always been important in IncludeSec’s progress as a company is finding the best talent for the task at hand. We decided early on that if the best Python hacker in the world was not in the US then we would go find that person and work with them! The same goes for whatever technology the project at hand calls for: C, Go, Ruby, Scala, Java, etc.

As it turns out the best Python hackers (and many other technologies) might actually be in Argentina. We’re not the only ones that have noticed this. Immunity Security, IOActive Security, Gotham Digital Science, and many others have a notable presence in Argentina (The NY Times even wrote an article on how great the hackers are there!) We’ve worked with dozens of amazing Argentinian hackers over the last six years comprising ~30% of our team and we’ve also enjoyed the quality of the security conferences like EkoParty in Buenos Aires.

As a small thank you to the entire Argentinian hacker scene, we’re going to do a free training class on May 30/31st 2019 teaching advanced web hacking techniques. This training is oriented towards hackers earlier in their career who have already experienced the world of OWASP top 10 and are looking to take their hacking skills to the next level.

If that sounds like you, you’re living in Argentina, and can make it to Buenos Aires on May 30th & 31st then this might be an awesome opportunity for you!

Please fill out the application here if this is something that would be awesome for you. We’ll close the application on May 10th.
https://docs.google.com/forms/d/e/1FAIpQLScrjV8wei7h-AY_kW7QwXZkYPDvSQswzUy6BTT9zg8L_Xejxg/viewform?usp=sf_link

Gracias,

Erik Cabetas
Managing Partner

The post IncludeSec’s free training in Buenos Aires for our Argentine hacker friends. appeared first on Include Security Research Blog.

A light-weight forensic analysis of the AshleyMadison Hack

19 August 2015 at 14:13

———–[Intro]

So Ashley Madison (AM) got hacked; it was first announced about a month ago, and the attackers claimed they’d drop the full monty of user data if the AM website did not cease operations. The AM parent company Avid Life Media (ALM) did not cease business operations for the site, and true to their word it seems the attackers leaked everything they promised on August 18th 2015, including:

  • full database dumps of user data
  • emails
  • internal ALM documents
  • as well as a limited number of user passwords

Back in college I used to do forensics contests for the “Honeynet Project” and thought this might be a fun nostalgic trip to try and recreate my pseudo-forensics investigation style on the data within the AM leak.

Disclaimer: I will not be releasing any personal or confidential information within this blog post that may be found in the AM leak. The purpose of this blog post is to provide an honest holistic forensic analysis and minimal statistical analysis of the data found within the leak. Consider this a journalistic exploration more than anything.

Also note that the credit card files were deleted and not reviewed as part of this write-up.

———–[Grabbing the Leak]

First we go find where on the big bad dark web the release site is located. Thankfully knowing a shady guy named Boris pays off for me, and we find a torrent file for the release of the August 18th Ashley Madison user data dump. The torrent file we found has the following SHA1 hash.
e01614221256a6fec095387cddc559bffa832a19  impact-team-ashley-release.torrent

After extracting all the files we have the following sizes and file hashes for evidence audit purposes:

$  du -sh *
4.0K    74ABAA38.txt
9.5G    am_am.dump
2.6G    am_am.dump.gz
4.0K    am_am.dump.gz.asc
13G     aminno_member.dump
3.1G    aminno_member.dump.gz
4.0K    aminno_member.dump.gz.asc
1.7G    aminno_member_email.dump
439M    aminno_member_email.dump.gz
4.0K    aminno_member_email.dump.gz.asc
111M    ashleymadisondump/
37M     ashleymadisondump.7z
4.0K    ashleymadisondump.7z.asc
278M    CreditCardTransactions.7z
4.0K    CreditCardTransactions.7z.asc
2.3G    member_details.dump
704M    member_details.dump.gz
4.0K    member_details.dump.gz.asc
4.2G    member_login.dump
2.7G    member_login.dump.gz
4.0K    member_login.dump.gz.asc
4.0K    README
4.0K    README.asc

$ sha1sum *
a884c4fcd61e23aecb80e1572254933dc85e2b4a  74ABAA38.txt
e4ff3785dbd699910a512612d6e065b15b75e012  am_am.dump
e0020186232dad71fcf92c17d0f11f6354b4634b  am_am.dump.gz
b7363cca17b05a2a6e9d8eb60de18bc98834b14e  am_am.dump.gz.asc
d412c3ed613fbeeeee0ab021b5e0dd6be1a79968  aminno_member.dump
bc60db3a78c6b82a5045b797e6cd428f367a18eb  aminno_member.dump.gz
8a1c328142f939b7f91042419c65462ea9b2867c  aminno_member.dump.gz.asc
2dcb0a5c2a96e4f3fff5a0a3abae19012d725a7e  aminno_member_email.dump
ab5523be210084c08469d5fa8f9519bc3e337391  aminno_member_email.dump.gz
f6144f1343de8cc51dbf20921e2084f50c3b9c86  aminno_member_email.dump.gz.asc
sha1sum: ashleymadisondump: Is a directory
26786cb1595211ad3be3952aa9d98fbe4c5125f9  ashleymadisondump.7z
eb2b6f9b791bd097ea5a3dca3414a3b323b8ad37  ashleymadisondump.7z.asc
0ad9c78b9b76edb84fe4f7b37963b1d956481068  CreditCardTransactions.7z
cb87d9fb55037e0b1bccfe50c2b74cf2bb95cd6c  CreditCardTransactions.7z.asc
11e646d9ff5d40cc8e770a052b36adb18b30fd52  member_details.dump
b4849cec980fe2d0784f8d4409fa64b91abd70ef  member_details.dump.gz
3660f82f322c9c9e76927284e6843cbfd8ab8b4f  member_details.dump.gz.asc
436d81a555e5e028b83dcf663a037830a7007811  member_login.dump
89fbc9c44837ba3874e33ccdcf3d6976f90b5618  member_login.dump.gz
e24004601486afe7e19763183934954b1fc469ef  member_login.dump.gz.asc
4d80d9b671d95699edc864ffeb1b50230e1ec7b0  README
a9793d2b405f31cc5f32562608423fffadc62e7a  README.asc

———–[Attacker Identity & Attribution]

The attackers make it clear they have no desire to bridge their dark web identities with their real-life identities and have taken many measures to ensure this does not occur.

The torrent file and messaging were released via the anonymous Tor network through an Onion web server which serves only HTML/TXT content. If the attacker took proper OPSEC precautions while setting up the server, law enforcement and AM may never find them. That being said, hackers have been known to get sloppy and slip up on their OPSEC. The two most famous cases of this were Sabu of Anonymous and the Dread Pirate Roberts of Silk Road; both were caught even though they primarily used Tor for their internet activities.

Within the dump we see that the files are signed with PGP. Signing a file in this manner is a way of saying “I did this” even though we don’t know the real-life identity of the person/group claiming to have done it (there is a bunch of crypto and math that makes this possible). As a result we can be more confident that if there are files signed by this PGP key, then they were released by the same person/group.

In my opinion, this is done for two reasons. First, the leaker wants to claim responsibility in an identity-attributable manner without revealing their real-life identity. Second, the leaker wishes to dispel statements regarding “false leaks” made by the Ashley Madison team. The AM executive and PR teams have been in crisis-communications mode, explaining that there have been many fake leaks.

The “Impact Team” is using the following public PGP key to sign their releases.

$ cat ./74ABAA38.txt

-----BEGIN PGP PUBLIC KEY BLOCK-----

Version: GnuPG v1.4.12 (GNU/Linux)




mQINBFW25a4BEADt5OKS5F36aACyyPc4UMZAnhLnbImhxv5A2n7koTKg1QhyA1mI
InLLriKW3GR0Y4Fx+84pvjbYdoJAnuqMemI0oP+2VAJqwC0LYVVcFHKK6ZElYiN8
4/3e5WWYv6vzrHwB+3NbQ1O9bbUjgk9ky2RsdTe+vDBhKwKS0kPSb28h0oMpAs87
pJcgWZ57jjtvyUEIKXQZAqLvFo5xayS8dEp8tRgNLauQ0SafKGsxjW5cRd2Ok3Z5
QtIS44WnYECe3tqqFYSOo4kdHBeswC8zaKapYaNzxsHw9msdZvx/rkrMgXtJye/o
vmf2RdLIcvqK0Nwf1LDLhweCBP61wVn8gWqSrzww+as1ObE6b64hYKHFzdIMcqJ3
sbAErRrfZMqZ6ihWnlSjzDDx2L3n5T16ZIDxGx5Mt0KDYIo8RqDdF+VKLCT7Eq/C
g/Ax+06Eez4rVnY+xeW6Tj+1iBAlrGRIcRHCX89fNwLxr4Bcq/q1KKrCwVsgonBK
+3Mzzs2/b9XQ/Z6bDHFnMWUTDhomBmNcZOz9sHrZZI9XUzx/bfS6CoQ3MIqDhNM+
l7cKZ/Icfs6IDoOsYIS3QeTWC8gv3IBTvtfKFnf1o6JnkP0Qv6SrckslztNA4HDL
2iIMMGs34vDc11ddTzMBBkig1NgtiaHqHhG5T8OoOD9c3hEmTQzir7iCPQARAQAB
tCRJbXBhY3QgVGVhbSA8aW1wYWN0dGVhbUBtYWlsdG9yLm5ldD6JAjgEEwECACIF
AlW25a4CGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJECQ3PNV0q6o445UQ
AKYIVyrpVKKBA4jliarqngKvkEBRd62CXHY42ZdjFmubLvRw5nC0nDdGUyGPRYOl
0RddL2C7ROqW9lCYfNl3BAQYEXMADDjoBMEQkepIxeIVehat46ksbJuFZ0+uI6EB
aVcJCR4S2C+hJP09q9tn/7RKacIolfeT0+s9IteFghKKK0c8Aot52A/hExrqjldo
fsMX6liSFQjDQpPhQpqiAJ8z9N3eeFwcAAc/gqNz9bE0Wug/OXh0OAHUQk3fS57a
uIi8medOr+kAqHziuO79+5Hkachsp+8c58jBtIzZM4bO6e42aEa2yHv0FGG5MhoB
x7MH0ympFdwbgebpF6kpH371GIsJcyumwQ3Yn4Sy2kp2XmB8xOQo2W8tWRtLW1dI
yGAXHXXy5UI5FJek7G1KvQXCy4pa756RGDFiqdqigq0KC27A/at02M8CP6R9RxC9
YSnru0Qrl7JeATekWM3w8sKs8r6yMEDFAcpK2NHaYzF6/o6t/HEqUWD41DZ2cqqg
9i4uoXpkAB3vAG/snNg1B8g89b3vbVUf6hSIcU89G3lgj9hh87Q/TSsISRJ+yq0N
sLEeVmDmOdf+xb44g3RuRJ9yh0h3j8jdQOq0FvvwW3UHKIVDQlFB3kgHY478TCIa
5MMCtMovGv/ukGKlU8aELKV0/sVsliMh8HDdFQICTd0MuQINBFW25a4BEADIh8Vg
tMGfByY/+IgPd9l3u0I4FZLHqKGKOIpfFEeA31jPAhfOqQyBRcnEN/TxLwJ8NLnL
+GdQ+0z1YncZPxpHU/z8zyMwGpZM/hMbkixA9ysyu06S7hna4YMfifT+lOe1lGSo
Tz3Fz1u2OGH+2UzVk5+Rv0FqDl6X1ZoqhMTswzW0jYR7JLLJip5MTMrLD0rSl0b5
a2XvF9Tpjzy9KWubsJk4W7x00Egu2EU9NhEZXaY18H3rxvYgXT7JMjq/y+IUp2Cd
Bv/XCNWmzl66/ZSLC8hzlcxmAYpmBkxafYNdptMeVzsH/xHmN2zSFjuBNx0Mkk+R
TrOxK/boS9onrGsSQ3zItWJAmodo2qYFjlirtu9pURSdYEINNQ5DgWymg43iAIfp
Xp5/yGBj4BlWE80qEAVsBB2BIRs7QHvpd34xETP08dXMsswIrMn/XxvHumyPoimj
mcNvIpvnAZqt6xppo6BSZ3y7MU4cSIRsZzLuSvkwGk97Jv2sMNvXlPRxzpU9ozsI
iYJAk6/n8kbQiTJk/SeiCTbf6e+BzbZbgIE3O9iPKhfW+6zWjC4TL+lBeyWTy1PP
PcQTT+najDqIwysz2BFuPozwuUQsnfQnyRytSjcI5m1fDoYpJPH8NNRIu9lzp+RN
YENVKXiCfnUCMCnSzxP3Kij3Wt227JLZQqnBUQARAQABiQIfBBgBAgAJBQJVtuWu
AhsMAAoJECQ3PNV0q6o4C2EP/29Bis5Skt9NxHVUBpC1OgRL8V+JD5TjNurMT6Pu
E75szLsMZ84z0MQ6n74ADIgEuznPDIa9hMZGK9DwlsQfFOlC/jyTYxSpgAgN6LAl
qoJztVzLRnMd2gZjOj6wajUy616b8u3Q3zovHcEKll5niUyNwHXovZcCzukFqJBF
a3JU/tkPvBuj2PEWf4ytuO6He2ERuSnsi+7mil8rTAAV/PPy7N2R/T7OUa6ERoGg
hqIGythWizRtZBVPRzush+8L181GBU2ps7nJ1resZ7T0OsCFL67J6t8r8IpmjWWt
fiiV05E71UAyNWLOWriS57qAwNcQ0W2UYKkFFKor+oWaBB+hCpvb8Za5867wpH8l
O6gpS/G17e+MKHTn60hw64xIVFJn7pka+OdAINjPRo5B5qVyvM3puEjRepx1piOG
HKOan00quI0dhF2Gia59zrBHK/agdF4FjkJSjER8uf/jJpo184p38zuQ7kyMXUxY
ExpGcVMVjVOoWKVRPGXYEz2nc9HIZ6mHbvhzsWQEAVwwIxZCos5dW1AMW3Otn30A
uFqPsx4jh/ANGhqUASz18bBrZ8DW3zceVs2zelkMpdL0z7ifU/UNn2rtDlpgLwFl
9ggUtPwXnSxqB7doSxfJyPJUum+bZxMb4Iq5BNNa/tme7TeWGl9bmsVwcQXSQlY2
uZnr
=v0qe
-----END PGP PUBLIC KEY BLOCK-----

The key has the following meta-data below.
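(A packet-level view like this can be produced with pgpdump; that’s an assumption on my part about the exact tool used here, but any OpenPGP packet dumper will show the same fields.)

$ pgpdump ./74ABAA38.txt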

Old: Public Key Packet(tag 6)(525 bytes)
        Ver 4 - new
        Public key creation time - Mon Jul 27 22:15:10 EDT 2015
        Pub alg - RSA Encrypt or Sign(pub 1)
        RSA n(4096 bits) - ...
        RSA e(17 bits) - ...
Old: User ID Packet(tag 13)(36 bytes)
        User ID - Impact Team <[email protected]>
Old: Signature Packet(tag 2)(568 bytes)
        Ver 4 - new
        Sig type - Positive certification of a User ID and Public Key packet(0x13).
        Pub alg - RSA Encrypt or Sign(pub 1)
        Hash alg - SHA1(hash 2)
        Hashed Sub: signature creation time(sub 2)(4 bytes)
                Time - Mon Jul 27 22:15:10 EDT 2015
        Hashed Sub: key flags(sub 27)(1 bytes)
                Flag - This key may be used to certify other keys
                Flag - This key may be used to sign data
        Hashed Sub: preferred symmetric algorithms(sub 11)(5 bytes)
                Sym alg - AES with 256-bit key(sym 9)
                Sym alg - AES with 192-bit key(sym 8)
                Sym alg - AES with 128-bit key(sym 7)
                Sym alg - CAST5(sym 3)
                Sym alg - Triple-DES(sym 2)
        Hashed Sub: preferred hash algorithms(sub 21)(5 bytes)
                Hash alg - SHA256(hash 8)
                Hash alg - SHA1(hash 2)
                Hash alg - SHA384(hash 9)
                Hash alg - SHA512(hash 10)
                Hash alg - SHA224(hash 11)
        Hashed Sub: preferred compression algorithms(sub 22)(3 bytes)
                Comp alg - ZLIB <RFC1950>(comp 2)
                Comp alg - BZip2(comp 3)
                Comp alg - ZIP <RFC1951>(comp 1)
        Hashed Sub: features(sub 30)(1 bytes)
                Flag - Modification detection (packets 18 and 19)
        Hashed Sub: key server preferences(sub 23)(1 bytes)
                Flag - No-modify
        Sub: issuer key ID(sub 16)(8 bytes)
                Key ID - 0x24373CD574ABAA38
        Hash left 2 bytes - e3 95
        RSA m^d mod n(4096 bits) - ...
                -> PKCS-1
Old: Public Subkey Packet(tag 14)(525 bytes)
        Ver 4 - new
        Public key creation time - Mon Jul 27 22:15:10 EDT 2015
        Pub alg - RSA Encrypt or Sign(pub 1)
        RSA n(4096 bits) - ...
        RSA e(17 bits) - ...
Old: Signature Packet(tag 2)(543 bytes)
        Ver 4 - new
        Sig type - Subkey Binding Signature(0x18).
        Pub alg - RSA Encrypt or Sign(pub 1)
        Hash alg - SHA1(hash 2)
        Hashed Sub: signature creation time(sub 2)(4 bytes)
                Time - Mon Jul 27 22:15:10 EDT 2015
        Hashed Sub: key flags(sub 27)(1 bytes)
                Flag - This key may be used to encrypt communications
                Flag - This key may be used to encrypt storage
        Sub: issuer key ID(sub 16)(8 bytes)
                Key ID - 0x24373CD574ABAA38
        Hash left 2 bytes - 0b 61
        RSA m^d mod n(4095 bits) - ...
                -> PKCS-1

We can verify the released files are attributable to the PGP public key in question using the following commands:

$ gpg --import ./74ABAA38.txt
$ gpg --verify ./member_details.dump.gz.asc ./member_details.dump.gz
gpg: Signature made Sat 15 Aug 2015 11:23:32 AM EDT using RSA key ID 74ABAA38
gpg: Good signature from "Impact Team <[email protected]>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 6E50 3F39 BA6A EAAD D81D  ECFF 2437 3CD5 74AB AA38

This also tells us at what date the dump was signed and packaged.

———–[Catching the attackers]

The PGP key’s meta-data shows a user ID for the mailtor dark web email service. The last known location of which was:
http://mailtoralnhyol5v.onion

Don’t bother emailing the email address found in the PGP key as it does not have a valid MX record. The fact that this exists at all seems to be one of those interesting artifacts of what happens when Internet tools like GPG get used on the dark web.

If the AM attackers were to be caught; here (in no particular order) are the most likely ways this would happen:

  • The person(s) responsible tells somebody. Nobody keeps something like this a secret, if the attackers tell anybody, they’re likely going to get caught.
  • If the attackers review email from a web browser, they might get revealed via federal law enforcement or private investigation/IR teams hired by AM. The FBI is known to have these capabilities.
  • If the attackers slip up with their diligence in messaging only via TXT and HTML on the web server. Meta-data sinks ships kids — don’t forget.
  • If the attackers slip up with their diligence on configuring their server. One bad config of a web server leaks an internal IP, or worse!
  • The attackers slipped up during their persistent attack against AM and investigators hired by AM find evidence leading back to the attackers.
  • The attackers have not masked their writing or image creation style and leave some semantic fingerprint from which they can be profiled.

If none of those things happen, I don’t think these attackers will ever be caught. The cyber-crime fighters have a daunting task in front of them; I’ve helped out a couple of FBI and NYPD cyber-crime fighters and I do not envy the difficult and frustrating job they have — good luck to them! Today we’re living in the Wild West days of the Internet.

———–[Leaked file extraction and evidence gathering]

Now to document the information seen within this data leak we proceed with a couple of commands to gather the file size and we’ll also check the file hashes to ensure the uniqueness of the files. Finally we review the meta-data of some of the compressed files. The meta-data shows the time-stamp embedded into the various compressed files. Although meta-data can easily be faked, it is usually not.

Next we’ll extract these files and examine their file size to take a closer look.

$ 7z e ashleymadisondump.7z

Within the extracted 7zip file we find another 7zip file, “swappernet_User_Table.7z”, which was also extracted.

We now have the following file sizes and SHA1 hashes for evidence integrity & auditing purposes:

$ du -sh ashleymadisondump/*
68K     20131002-domain-list.xlsx
52K     ALMCLUSTER (production domain) computers.txt
120K    ALMCLUSTER (production domain) hashdump.txt
68K     ALM - Corporate Chart.pptx
256K    ALM Floor Plan - ports and names.pdf
8.0M    ALM - January 2015 - Company Overview.pptx
1.8M    ALM Labs Inc. Articles of Incorporation.pdf
708K    announcement.png
8.0K    Areas of concern - customer data.docx
8.0K    ARPU and ARPPU.docx
940K    Ashley Madison Technology Stack v5(1).docx
16K     Avid Life Media - Major Shareholders.xlsx
36K     AVIDLIFEMEDIA (primary corporate domain) computers.txt
332K    AVIDLIFEMEDIA (primary corporate domain) user information and hashes.txt
1.7M    Avid Org Chart 2015 - May 14.pdf
24K     Banks.xlsx
6.1M    Copies of Option Agreements.pdf
8.0K    Credit useage.docx
16K     CSF Questionnaire (Responses).xlsx
132K    Noel's loan agreement.pdf
8.0K    Number of traveling man purchases.docx
1.5M    oneperday_am_am_member.txt
940K    oneperday_aminno_member.txt
672K    oneperday.txt
44K     paypal accounts.xlsx
372K    [email protected]_20101103_133855.pdf
16K     q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
8.0K    README.txt
8.0K    Rebill Success Rate Queries.docx
8.0K    Rev by traffic source rebill broken out.docx
8.0K    Rev from organic search traffic.docx
4.0K    Sales Queries
59M     swappernet_QA_User_Table.txt  #this was extracted from swappernet_User_Table.7z in the same dir
17M     swappernet_User_Table.7z
$ sha1sum ashleymadisondump/*
f0af9ea887a41eb89132364af1e150a8ef24266f  20131002-domain-list.xlsx
30401facc68dab87c98f7b02bf0a986a3c3615f0  ALMCLUSTER (production domain) computers.txt
c36c861fd1dc9cf85a75295e9e7bcf6cf04c7d2c  ALMCLUSTER (production domain) hashdump.txt
6be635627aa38462ebcba9266bed5b492a062589  ALM - Corporate Chart.pptx
4dec7623100f59395b68fd13d3dcbbff45bef9c9  ALM Floor Plan - ports and names.pdf
601e0b462e1f43835beb66743477fe94bbda5293  ALM - January 2015 - Company Overview.pptx
d17cb15a5e3af15bc600421b10152b2ea1b9c097  ALM Labs Inc. Articles of Incorporation.pdf
1679eca2bc172cba0b5ca8d14f82f9ced77f10df  announcement.png
6a618e7fc62718b505afe86fbf76e2360ade199d  Areas of concern - customer data.docx
91f65350d0249211234a52b260ca2702dd2eaa26  ARPU and ARPPU.docx
50acee0c8bb27086f12963e884336c2bf9116d8a  Ashley Madison Technology Stack v5(1).docx
71e579b04bbba4f7291352c4c29a325d86adcbd2  Avid Life Media - Major Shareholders.xlsx
ef8257d9d63fa12fb7bc681320ea43d2ca563e3b  AVIDLIFEMEDIA (primary corporate domain) computers.txt
ec54caf0dc7c7206a7ad47dad14955d23b09a6c0  AVIDLIFEMEDIA (primary corporate domain) user information and hashes.txt
614e80a1a6b7a0bbffd04f9ec69f4dad54e5559e  Avid Org Chart 2015 - May 14.pdf
c3490d0f6a09bf5f663cf0ab173559e720459649  Banks.xlsx
1538c8f4e537bb1b1c9a83ca11df9136796b72a3  Copies of Option Agreements.pdf
196b1ba40894306f05dcb72babd9409628934260  Credit useage.docx
2c9ba652fb96f6584d104e166274c48aa4ab01a3  CSF Questionnaire (Responses).xlsx
0068bc3ee0dfb796a4609996775ff4609da34acb  Noel's loan agreement.pdf
c3b4d17fc67c84c54d45ff97eabb89aa4402cae8  Number of traveling man purchases.docx
9e6f45352dc54b0e98932e0f2fe767df143c1f6d  oneperday_am_am_member.txt
de457caca9226059da2da7a68caf5ad20c11de2e  oneperday_aminno_member.txt
d596e3ea661cfc43fd1da44f629f54c2f67ac4e9  oneperday.txt
37fdc8400720b0d78c2fe239ae5bf3f91c1790f4  paypal accounts.xlsx
2539bc640ea60960f867b8d46d10c8fef5291db7  [email protected]_20101103_133855.pdf
5bb6176fc415dde851262ee338755290fec0c30c  q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
5435bfbf180a275ccc0640053d1c9756ad054892  README.txt
872f3498637d88ddc75265dab3c2e9e4ce6fa80a  Rebill Success Rate Queries.docx
d4e80e163aa1810b9ec70daf4c1591f29728bf8e  Rev by traffic source rebill broken out.docx
2b5f5273a48ed76cd44e44860f9546768bda53c8  Rev from organic search traffic.docx
sha1sum: Sales Queries: Is a directory
0f63704c118e93e2776c1ad0e94fdc558248bf4e  swappernet_QA_User_Table.txt
9d67a712ef6c63ae41cbba4cf005ebbb41d92f33  swappernet_User_Table.7z

———–[Quick summary of each of the leaked files]

The following files are MySQL data dumps of the main AM database:

  • member_details.dump.gz
  • aminno_member.dump.gz
  • member_login.dump.gz
  • aminno_member_email.dump.gz
  • CreditCardTransactions.7z

Also included was another AM database which contains user info (separate from the emails):

  • am_am.dump.gz

In the top level directory you can also find these additional files:

  • 74ABAA38.txt
    Impact Team’s Public PGP key used for signing the releases (The .asc files are the signatures)
  • ashleymadisondump.7z
    This contains various internal and corporate private files.
  • README
    Impact Team’s justification for releasing the user data.
  • Various .asc files such as “member_details.dump.gz.asc”
    These are all PGP signature files to prove that one or more persons who are part of the “Impact Team” attackers released them.

Within the ashleymadisondump.7z we can extract and view the following files:

  • Number of traveling man purchases.docx
    SQL queries to investigate high-travel user’s purchases.
  • q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
    Per-employee compensation listings.
  • AVIDLIFEMEDIA (primary corporate domain) user information and hashes.txt
  • AVIDLIFEMEDIA (primary corporate domain) computers.txt
    The output of the dnscmd windows command executed on what appears to be a primary domain controller. The timestamp indicates that the command was run on July 1st 2015. There is also a “pwdump” style export of 1324 user accounts which appear to be from the ALM domain controller. These passwords will be easy to crack, as NTLM hashes aren’t the strongest.
  • Noel’s loan agreement.pdf
    A promissory note for the CEO to pay back ~3MM in Canadian monies.
  • Areas of concern – customer data.docx
    Appears to be a risk profile of the major security concerns that ALM has regarding their customer’s data. And yes, a major user data dump is on the list of concerns.
  • Banks.xlsx
    A listing of all ALM associated bank account numbers and the biz which owns them.
  • Rev by traffic source rebill broken out.docx
  • Rebill Success Rate Queries.docx
    Both of these are SQL queries to investigate Rebilling of customers.
  • README.txt
    Impact Team statement regarding their motivations for the attack and leak.
  • Copies of Option Agreements.pdf
    All agreements for what appears all of the company’s outstanding options.
  • paypal accounts.xlsx
    Various user/passes for ALM paypal accounts (16 in total)
  • swappernet_QA_User_Table.txt
  • swappernet_User_Table.7z
    This file is a database export in CSV format. It appears to be from a QA server.
  • ALMCLUSTER (production domain) computers.txt
    The output of the dnscmd windows command executing on what appears to be a production domain controller. The timestamp indicates that the command was run on July 1st 2015.
  • ALMCLUSTER (production domain) hashdump.txt
    A “pwdump” style export of 1324 user accounts which appear to be from the ALM domain controller. These passwords will be easy to crack as NTLM hashes aren’t the strongest.
  • ALM Floor Plan – ports and names.pdf
    Seating map of main office, this type of map is usually used for network deployment purposes.
  • ARPU and ARPPU.docx
    A listing of SQL commands which provide revenue and other macro financial health info.
    Presumably these queries would run on the primary DB or a biz intel slave.
  • Credit useage.docx
    SQL queries to investigate credit card purchases.
  • Avid Org Chart 2015 – May 14.pdf
    A per-team organizational chart of what appears to be the entire company.
  • announcement.png
    The graphic created by Impact Team to announce their demand for ALM to shut down its flagship website AM.
  • [email protected]_20101103_133855.pdf
    Contract outlining the terms of a purchase of the biz Seekingarrangement.com
  • CSF Questionnaire (Responses).xlsx
    Company exec Critical Success Factors spreadsheet. Answering questions like “In what area would you hate to see something go wrong?” and the CTO’s response is about hacking.
  • ALM – January 2015 – Company Overview.pptx
    This is a very detailed breakdown of current biz health, marketing spend, and future product plans.
  • Ashley Madison Technology Stack v5(1).docx
    A detailed walk-through of all major servers and services used in the ALM production environment.
  • oneperday.txt
  • oneperday_am_am_member.txt
  • oneperday_aminno_member.txt
    These three files have limited leak info as a “teaser” for the .dump files that are found in the highest level directory of the AM leak.
  • Rev from organic search traffic.docx
    SQL queries to explore the revenue generated from search traffic.
  • 20131002-domain-list.xlsx
    A list of the 1083 domain names that are, have been, or are sought to be owned by ALM.
  • Sales Queries/
    Empty Directory
  • ALM Labs Inc. Articles of Incorporation.pdf
    The full 109 page Articles of Incorporation, covering every aspect of initial company formation.
  • ALM – Corporate Chart.pptx
    A detailed block diagram defining the relationship between various tax and legal business entity names related to ALM businesses.
  • Avid Life Media – Major Shareholders.xlsx
    A listing of each major shareholder and their equity stake

———–[File meta-data analysis]

First we’ll take a look at the 7zip file in the top level directory.

$ 7z l ashleymadisondump.7z

Listing archive: ashleymadisondump.7z

----

Path = ashleymadisondump.7z

Type = 7z

Method = LZMA

Solid = +

Blocks = 1

Physical Size = 37796243

Headers Size = 1303



   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-07-09 12:25:48 ....A     17271957     37794940  swappernet_User_Table.7z
2015-07-10 12:14:35 ....A       723516               announcement.png
2015-07-01 18:03:56 ....A        51222               ALMCLUSTER (production domain) computers.txt
2015-07-01 17:58:55 ....A       120377               ALMCLUSTER (production domain) hashdump.txt
2015-06-25 22:59:22 ....A        35847               AVIDLIFEMEDIA (primary corporate domain) computers.txt
2015-06-14 21:18:11 ....A       339221               AVIDLIFEMEDIA (primary corporate domain) user information and hashes.txt
2015-07-18 15:23:34 ....A       686533               oneperday.txt
2015-07-18 15:20:43 ....A       959099               oneperday_aminno_member.txt
2015-07-18 19:00:45 ....A      1485289               oneperday_am_am_member.txt
2015-07-19 17:01:11 ....A         6031               README.txt
2015-07-07 11:41:36 ....A         6042               Areas of concern - customer data.docx
2015-07-07 12:14:42 ....A         5907               Sales Queries/ARPU and ARPPU.docx
2015-07-07 12:04:35 ....A       960553               Ashley Madison Technology Stack v5(1).docx
2015-07-07 12:14:42 ....A         5468               Sales Queries/Credit useage.docx
2015-07-07 12:14:43 ....A         5140               Sales Queries/Number of traveling man purchases.docx
2015-07-07 12:14:47 ....A         5489               Sales Queries/Rebill Success Rate Queries.docx
2015-07-07 12:14:43 ....A         5624               Sales Queries/Rev by traffic source rebill broken out.docx
2015-07-07 12:14:42 ....A         6198               Sales Queries/Rev from organic search traffic.docx
2015-07-08 23:17:19 ....A       259565               ALM Floor Plan - ports and names.pdf
2012-10-19 16:54:20 ....A      1794354               ALM Labs Inc. Articles of Incorporation.pdf
2015-07-07 12:04:10 ....A      1766350               Avid Org Chart 2015 - May 14.pdf
2012-10-20 12:23:11 ....A      6344792               Copies of Option Agreements.pdf
2013-09-18 14:39:25 ....A       132798               Noel's loan agreement.pdf
2015-07-07 10:16:54 ....A       380043               [email protected]_20101103_133855.pdf
2012-12-13 15:26:58 ....A        67816               ALM - Corporate Chart.pptx
2015-07-07 12:14:28 ....A      8366232               ALM - January 2015 - Company Overview.pptx
2013-10-07 10:30:28 ....A        67763               20131002-domain-list.xlsx
2013-07-15 15:20:14 ....A        13934               Avid Life Media - Major Shareholders.xlsx
2015-07-09 11:57:58 ....A        22226               Banks.xlsx
2015-07-07 11:41:41 ....A        15703               CSF Questionnaire (Responses).xlsx
2015-07-09 11:57:58 ....A        42511               paypal accounts.xlsx
2015-07-07 12:04:44 ....A        15293               q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
2015-07-18 13:54:40 D....            0            0  Sales Queries
------------------- ----- ------------ ------------  ------------------------
                              41968893     37794940  32 files, 1 folders

If we’re to believe this meta-data, the newest file is from July 19th 2015 and the oldest is from October 19th 2012. The timestamp for the file announcement.png shows a creation date of July 10th 2015. This file is the graphical announcement from the leakers. The file swappernet_User_Table.7z
has a timestamp of July 9th 2015. Since this file is a database dump, one might presume that these files were created for the original release and the other files were copied from a file-system that preserves timestamps.

Within that 7zip file we’ve found another which looks like:

$ 7z l ashleymadisondump/swappernet_User_Table.7z

Listing archive: ./swappernet_User_Table.7z

----

Path = ./swappernet_User_Table.7z

Type = 7z

Method = LZMA

Solid = -

Blocks = 1

Physical Size = 17271957

Headers Size = 158




   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-06-27 18:39:40 ....A     61064200     17271799  swappernet_QA_User_Table.txt
------------------- ----- ------------ ------------  ------------------------
                              61064200     17271799  1 files, 0 folders

Within the ashleymadisondump directory extracted from ashleymadisondump.7z we’ve got
the following file types that we’ll examine for meta-data:

8 txt
8 docx
6 xlsx
6 pdf
2 pptx
1 png
1 7z

The PNG didn’t seem to have any EXIF meta-data, and we’ve already covered the 7z file.

The text files don’t usually yield anything to us meta-data wise.
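The Office and PDF meta-data below can be pulled with a tool such as exiftool (which tool was originally used here is an assumption on my part; exiftool is simply one that reads the author and timestamp fields from docx/xlsx/pptx and PDF files):

$ exiftool ashleymadisondump/*.docx ashleymadisondump/*.pptx ashleymadisondump/*.xlsx ashleymadisondump/*.pdf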

In the MS Word docx files  we have the following meta-data:

  • Areas of concern – customer data.docx
    No Metadata
  • ARPU and ARPPU.docx
    No Metadata
  • Ashley Madison Technology Stack v5(1).docx
    Created Michael Morris, created and last modified on Sep 17 2013.
  • Credit useage.docx
    No Metadata
  • Number of traveling man purchases.docx
    No Metadata
  • Rebill Success Rate Queries.docx
    No Metadata
  • Rev by traffic source rebill broken out.docx
    No Metadata
  • Rev from organic search traffic.docx
    No Metadata

In the MS Powerpoint pptx files we have the following meta-data:

  • ALM – Corporate Chart.pptx
    Created by “Diana Horvat” on Dec 5 2012 and last updated by “Tatiana Kresling”
    on Dec 13th 2012
  • ALM – January 2015 – Company Overview.pptx
    Created Rizwan Jiwan, Jan 21 2011 and last modified on Jan 20 2015.

In the MS Excel xlsx files we have the following meta-data:

  • 20131002-domain-list.xlsx
    Written by Kevin McCall, created and last modified Oct 2nd 2013
  • Avid Life Media – Major Shareholders.xlsx
    Jamal Yehia, created and last modified July 15th 2013
  • Banks.xlsx
    Created by “Elena” and Keith Lalonde, created Dec 15 2009 and last modified Feb 26th  2010
  • CSF Questionnaire (Responses).xlsx
    No Metadata
  • paypal accounts.xlsx
    Created by Keith Lalonde, created Oct 28  2010 and last modified Dec 22nd  2010
  • q2 2013 summary compensation detail_managerinput_trevor-s team.xlsx
    No Metadata

And finally within the PDF files we also see additional meta-data:

  • ALM Floor Plan – ports and names.pdf
    Written by Martin Price in MS Visio, created and last modified April 23 2015
  • ALM Labs Inc. Articles of Incorporation.pdf
    Created with DocsCorp Pty Ltd (www.docscorp.com), created and last modified on Oct 17 2012
  • Avid Org Chart 2015 – May 14.pdf
    Created and last modified on May 14 2015
  • Copies of Option Agreements.pdf
    OmniPage CSDK 16 OcrToolkit, created and last modified on Oct 16 2012
  • Noel’s loan agreement.pdf
    Created and last modified on Sep 18 2013
  • [email protected]_20101103_133855.pdf
    Created and last modified on Jul 7 2015

———–[MySQL Dump file loading and evidence gathering]

At this point all of the dump files have been decompressed with gunzip or 7zip. The dump files are standard MySQL backup files (aka dump files); the info in them implies that they were taken from multiple servers:

$ grep 'MySQL dump' *.dump
am_am.dump:-- MySQL dump 10.13  Distrib 5.5.33, for Linux (x86_64)
aminno_member.dump:-- MySQL dump 10.13  Distrib 5.5.40-36.1, for Linux (x86_64)
aminno_member_email.dump:-- MySQL dump 10.13  Distrib 5.5.40-36.1, for Linux (x86_64)
member_details.dump:-- MySQL dump 10.13  Distrib 5.5.40-36.1, for Linux (x86_64)
member_login.dump:-- MySQL dump 10.13  Distrib 5.5.40-36.1, for Linux (x86_64)

The dump files also reference being executed from localhost, which implies the attacker was on the database server in question.
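That host information comes from the header mysqldump writes at the top of each file, and can be pulled out with a quick grep (the exact header format is an assumption based on typical mysqldump output):

$ grep -- '-- Host:' *.dump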

Of course, all of this info is just text and can easily be faked, but it’s interesting nonetheless considering the possibility that it might be correct and unaltered.

To load up the MySQL dumps we’ll start with a fresh MySQL database instance
on a decently powerful server and run the following commands:

--As root MySQL user
CREATE DATABASE aminno;
CREATE DATABASE am;
CREATE USER 'am'@'localhost' IDENTIFIED BY 'loyaltyandfidelity';
GRANT ALL PRIVILEGES ON aminno.* TO 'am'@'localhost';
GRANT ALL PRIVILEGES ON am.* TO 'am'@'localhost';

Now back at the command line we’ll execute these to import the main dumps:

$ mysql -D aminno -uam -ployaltyandfidelity < aminno_member.dump

$ mysql -D aminno -uam -ployaltyandfidelity < aminno_member_email.dump

$ mysql -D aminno -uam -ployaltyandfidelity < member_details.dump

$ mysql -D aminno -uam -ployaltyandfidelity < member_login.dump

$ mysql -D am -uam -ployaltyandfidelity < am_am.dump

Now that you’ve got the data loaded up you can recreate some of the findings ksugihara made with his analysis here [Edit: It appears ksugihara has taken this offline, I don’t have a mirror]. We didn’t have much more to add for holistic statistical analysis than what he’s already done, so check out his blog post for more on the primary data dumps. There is still one final database export though…

Within the file ashleymadisondump/swappernet_QA_User_Table.txt we have a final database export, but this one is not in the MySQL dump format. It is instead in CSV format. The file name implies this was an export from a QA Database server.

This file has the following columns (left to right in the CSV):

  • recid
  • id
  • username
  • userpassword
  • refnum
  • disable
  • ipaddress
  • lastlogin
  • lngstatus
  • strafl
  • ap43
  • txtCoupon
  • bot

Sadly, within the file we see user passwords in clear text, which is always a bad security practice. At the moment we don’t know if these are actual production user account passwords, and if so, how old they are. My guess is that these are from an old QA server from when AM was a smaller company and hadn’t moved to secure password hashing practices like bcrypt.
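If you would rather query this export in MySQL alongside the other dumps instead of using shell tools, a minimal sketch for loading it looks like the following; the column types are my guesses, only the column order comes from the file itself:

--As the am MySQL user
CREATE TABLE swappernet_qa_users (
  recid INT, id INT, username VARCHAR(255), userpassword VARCHAR(255),
  refnum VARCHAR(64), `disable` TINYINT, ipaddress VARCHAR(46),
  lastlogin VARCHAR(64), lngstatus INT, strafl VARCHAR(64),
  ap43 VARCHAR(64), txtCoupon VARCHAR(64), bot VARCHAR(16)
);
LOAD DATA LOCAL INFILE 'ashleymadisondump/swappernet_QA_User_Table.txt'
INTO TABLE swappernet_qa_users
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

The quick statistics below stick with shell tools.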

These commands show us there are 765,607 records in this database export and only four of them have a blank password. Many of the passwords repeat, and 387,974 of the passwords are unique.

$ cut -d , -f 4 < swappernet_QA_User_Table.txt |wc -l
765607
$ cut -d , -f 4 < swappernet_QA_User_Table.txt | sed '/^\s*$/d' |wc -l
765603
$ cut -d , -f 4 < swappernet_QA_User_Table.txt | sed '/^\s*$/d' |sort -u |wc -l
387974

Next we see the top 25 most frequently used passwords in this database export
using the command:

$ cut -d , -f 4 < swappernet_QA_User_Table.txt |sort|uniq -c |sort -rn|head -25
   5882 123456
   2406 password
    950 pussy
    948 12345
    943 696969
    917 12345678
    902 fuckme
    896 123456789
    818 qwerty
    746 1234
    734 baseball
    710 harley
    699 swapper
    688 swinger
    647 football
    645 fuckyou
    641 111111
    538 swingers
    482 mustang
    482 abc123
    445 asshole
    431 soccer
    421 654321
    414 1111
    408 hunter

After importing the CSV into MS Excel we can use sort and filter to make some additional statements based on the data.

    1. The only logins with a “lastlogin” column entry in the year 2015 are from the
       following users:
       SIMTEST101
       SIMTEST130
       JULITEST2
       JULITEST3
       swappernetwork
       JULITEST4
       HEATSEEKERS
    2. The final and most recent login was from AvidLifeMedia’s office IP range.
    3. 275,285 of these users have an entry in the txtCoupon column.
    4. All users with the “bot” column set to TRUE have either the password
       “statueofliberty” or “cake”.

The post A light-weight forensic analysis of the AshleyMadison Hack appeared first on Include Security Research Blog.
