❌

Normal view

There are new articles available, click to refresh the page.
Before yesterdayPentest/Red Team

Shellcode: Entropy Reduction With Base32 Encoding.

By: odzhan
7 April 2023 at 11:57

Introduction

Compressed, encrypted, and random data all contain a high amount of entropy, which is why many products use entropy analysis to detect malicious code in binaries that have never been examined before.

In a previous post about masking, I suggested using a deterministic random number generator with the Fisher-Yates shuffle to try and scramble data without increasing the entropy that occurs with compression and encryption. Of course, no confidentiality is provided with that approach and may not be desirable.

This got me thinking about what algorithms could be used to reduce entropy and it’s much simpler than I thought. If all we need to do is reduce the number of bits per byte, then simple encoding is the answer. Or am I mistaken? Let me know in the comments.

The disadvantage of using Base32 encoding is obviously an increase in the size of data by approx. 67%. However, if the compression ratio of original data is close to 50-70% (which can be achieved with something like GZip, LZMA or ZPAQ), then the increase after reducing entropy isn’t that bad. We have confidentiality of the data and it looks benign from analysis.

Increasing

Compression algorithms like LZ77, Huffman or Arithmetic coders are designed to remove redundant or repetitive data. Encryption algorithms will use techniques like transposition, byte substitution to make data more unpredictable or seem random. And if they’re good, the original data will appear to be random with no discernible pattern or structure. That makes it difficult to analyse and determine exactly what it is.

Decreasing

Adding redundant data, such as null bytes, can lower the entropy score. However, some tests will disregard zeros because they affect the accuracy of results. The reason for using Base32 and not Base64 is because the latter doesn’t reduce the entropy enough and Base16 is simply going to waste too much space. Base32 isn’t perfect but it’s good enough for demonstrating the idea.

Encoding

There are problems with this code and it’s inadvisable to use it outside this blog. For it to work, inbuf should allow out-of-bound reads up to 5 bytes. Essentially, pad inbuf with at least 5 null bytes. Might seem odd for it to work that way, but it helps later when combining both encoding and decoding. For every encoded byte, two bits will always be zero and you could say that’s a pattern. That can be avoided by flipping bits of some bytes.

size_t
base32_encode(size_t inlen, void *inbuf, void *outbuf) {
    uint8_t *out = (uint8_t*)outbuf;
    uint8_t *in = (uint8_t*)inbuf;
    uint32_t x = 0, z = 0;
    size_t outlen = (inlen + 4) / 5 * 8;
    
    // return size of buffer required?
    if (!outbuf) return outlen;
    
    // encode bytes
    while (outlen) {
        x = (x << 8) | *in++;
        z += 8;
        while (z >= 5) {
            z -= 5;
            *out++ = (x >> z) & 31;
            outlen--;
        }
    }
    // return encoded length
    return (out - (uint8_t*)outbuf);
}

Decoding

This needs to know the original size of data before encoding. That’s not a big issue considering most compression and encryption work the same way. Just include the original length when transporting the encoded data.

void
base32_decode(size_t outlen, void *inbuf, void *outbuf) {
    uint8_t *out = (uint8_t*)outbuf;
    uint8_t *in = (uint8_t*)inbuf;
    uint32_t x = 0, z = 0;
    
    while (outlen) {
        x = (x << 5) | *in++;
        z += 5;
        while (z >= 8) {
            z -= 8;
            *out++ = (x >> z) & 255;
            outlen--;
        }
    }
}

Combined Encoding and Decoding

Well, you may have noticed the similarity between the two by now and thought about combining both. So long as the correct length is calculated for the output, we can indeed combine both and switch between encoding and decoding using a flag.

void
base32(size_t outlen, void *inbuf, void *outbuf, bool encode) {
    uint8_t *out = (uint8_t*)outbuf;
    uint8_t *in = (uint8_t*)inbuf;
    uint32_t x = 0, z = 0;    
    uint8_t wl = 8, rl = 5, m = 255;

    if (encode) {
        rl = 8; // read length
        wl = 5; // write length
        m = 31; // mask
    }
    
    while (outlen) {
        x = (x << rl) | *in++;
        z += rl;
        while (z >= wl) {
            z -= wl;
            *out++ = (x >> z) & m;
            outlen--;
        }
    }
}

Results

To test if the encoding reduces entropy of data, random bytes are generated from the operating system. A Shannon entropy test is used to calculate before and after.

double 
shannon_entropy(void *inbuf, size_t len) {
    uint8_t *in = (uint8_t*)inbuf;
    double entropy = 0;
    uint32_t frequency[256] = {0};

    // Count the frequency of each byte value
    for (size_t i=0; i<len; i++) {
        frequency[in[i]]++;
    }

    // Calculate the entropy
    for (size_t i=0; i<256; i++) {
        if (frequency[i] > 0) {
            double probability = (double) frequency[i] / len;
            entropy -= probability * log2(probability);
        }
    }
    return entropy;
}

Before encoding, the Shannon test scores 7.835897 for 1024 random bytes. After using Base32 to encode it, Shannon reports 4.983226.

Summary

Some of you may be saying: β€œThis isn’t Base32 encoding!” It’s not what’s described in RFC4648 that uses an alphabet as the final step, but they are using the same process. This is only intended to reduce entropy, but wouldn’t take that much for it to work like Base32 described in the RFC. The main difference of course is that padding with the equal sign isn’t used. instead, the code depends on the caller to specify the output length and pad the input buffer for out-of-bound reads.

Finally, the decoding algorithm above can be used to compress strings, but only if each character fits into 5-bits of information. For that reason, it might only be suitable for uppercase or lowercase characters together with a few symbols or digits.

Source code here.

WOW64 Callback Table (FinFisher)

By: odzhan
19 April 2023 at 18:07

Introduction

Ken Johnson (otherwise known as Skywing) first talked about the KiUserExceptionDispatcher back in 2007 . Since then, scattered around the internet are various posts talking about it, but for some reason nobody demonstrating how to use it. It’s been documented that FinFisher misuses the function pointers as part of its virtual machine functionality, so let’s take a look at how to find the table before doing anything creative with it…The code to locate the table didn’t take long and didn’t require looking at FinFisher internals or existing code. It’s a simple heuristic based search.

References

Observations

If you take a look at ntdll!LdrpLoadWow64, that’s called during initialization of a WOW64 process, you’ll see it loading wow64.dll and resolving the address of six exports. This process has been better documented in the posts mentioned above.

  • Wow64LdrpInitialize
  • Wow64PrepareForException
  • Wow64ApcRoutine
  • Wow64PrepareForDebuggerAttach
  • Wow64SuspendLocalThread
  • Wow64SuspendLocalProcess

A closer look at how this works will provide you with an array of function names stored in STRING format and a pointer to a variable that holds each address resolved. The following is my attempt at recreating the same structure.

typedef union _W64_T {
    LPVOID p;
    DWORD64 q;
    LPVOID *pp;
} W64_T;
    
typedef struct _WOW64_CALLBACK {
    STRING Name;
    W64_T  Function;
} WOW64_CALLBACK, *PWOW64_CALLBACK;

//
// Structure based on 64-bit version of NTDLL on Windows 10
//
typedef struct _WOW64_CALLBACK_TABLE {
    WOW64_CALLBACK  Wow64LdrpInitialize;
    WOW64_CALLBACK  Wow64PrepareForException;
    WOW64_CALLBACK  Wow64ApcRoutine;
    WOW64_CALLBACK  Wow64PrepareForDebuggerAttach;
    WOW64_CALLBACK  Wow64SuspendLocalThread;
    WOW64_CALLBACK  Wow64SuspendLocalProcess;
} WOW64_CALLBACK_TABLE, *PWOW64_CALLBACK_TABLE;

WOW64_CALLBACK_TABLE Wow64Table = {
    {RTL_CONSTANT_STRING("Wow64LdrpInitialize"), NULL},
    {RTL_CONSTANT_STRING("Wow64PrepareForException"), NULL},
    {RTL_CONSTANT_STRING("Wow64ApcRoutine"), NULL},
    {RTL_CONSTANT_STRING("Wow64PrepareForDebuggerAttach"), NULL},
    {RTL_CONSTANT_STRING("Wow64SuspendLocalThread"), NULL},
    {RTL_CONSTANT_STRING("Wow64SuspendLocalProcess"), NULL}
    };

Locating Table

There could be a number of ways to do this. In the following example, we search the .rdata section for STRING structures that equal the function pointer we wish to find. Since these strings are constant and unlikely to change, it works reasonably well.

BOOL 
IsReadOnlyPtr(LPVOID ptr) {
    MEMORY_BASIC_INFORMATION mbi;
    
    if (!ptr) return FALSE;
    
    DWORD res = VirtualQuery(ptr, &mbi, sizeof(mbi));
    if (res != sizeof(mbi)) return FALSE;

    return ((mbi.State   == MEM_COMMIT    ) &&
            (mbi.Type    == MEM_IMAGE     ) && 
            (mbi.Protect == PAGE_READONLY));
}

BOOL
GetWow64FunctionPointer(PWOW64_CALLBACK Callback) {
    auto m = (PBYTE)GetModuleHandleW(L"ntdll");
    auto nt = (PIMAGE_NT_HEADERS)(m + ((PIMAGE_DOS_HEADER)m)->e_lfanew);
    auto sh = IMAGE_FIRST_SECTION(nt);
    
    for (DWORD i=0; i<nt->FileHeader.NumberOfSections; i++) {
        if (*(PDWORD)sh[i].Name == *(PDWORD)".rdata") {
            auto rva = sh[i].VirtualAddress;
            auto cnt = (sh[i].Misc.VirtualSize - sizeof(STRING)) / sizeof(ULONG_PTR);
            auto ptr = (PULONG_PTR)(m + rva);
            
            for (DWORD j=0; j<cnt; j++) {
                if (!IsReadOnlyPtr((LPVOID)ptr[j])) continue;
                
                auto api = (PSTRING)ptr[j];

                if (api->Length == Callback->Name.Length && 
                    api->MaximumLength == Callback->Name.MaximumLength) 
                {
                    if (!strncmp(api->Buffer, Callback->Name.Buffer, Callback->Name.Length)) {
                        Callback->Function.p = (PVOID)ptr[j + 1];
                        return TRUE;
                    }
                }
            }
            break;
        }
    }
    return FALSE;
}

void
GetWow64CallbackTable(PWOW64_CALLBACK_TABLE Table) {
    GetWow64FunctionPointer(&Table->Wow64LdrpInitialize);
    GetWow64FunctionPointer(&Table->Wow64PrepareForException);
    GetWow64FunctionPointer(&Table->Wow64ApcRoutine);
    GetWow64FunctionPointer(&Table->Wow64PrepareForDebuggerAttach);
    GetWow64FunctionPointer(&Table->Wow64SuspendLocalThread);
    GetWow64FunctionPointer(&Table->Wow64SuspendLocalProcess);
}

Summary

This type of code isn’t useful to a 32-Bit WOW process without jumping to 64-Bit since the function pointers are stored in the 64-Bit version of NTDLL. There are potentially other uses though like intercepting APCs, anti-debugging and processing exceptions before VEH or SEH, which FinFisher did successfully for many many years….

PoC here

Delegated NT DLL

By: odzhan
13 February 2024 at 20:30

Introduction

redplait and Adam/Hexacorn already documented this in 2017 and 2018 respectively, so it’s not a new discovery. Officially available since RedStone 2 released in April 2017, redplait states it was introduced with insider build 15007 released in January 2017. It has similarities with the WOW64 function table present in AMD64 versions of NTDLL.

References

Observations

Using IDA Pro or Ghidra with support for debugging symbols enabled, the table can be found at ntdll!_LdrpDelegatedNtdllExports

The DLL itself would be found in one of the following paths.

ArchPath
x86C:\Windows\System32\
AMD64C:\Windows\SysWOW64\
ARM64C:\Windows\SysWOW64\

Table

The table containing function pointers on my system appears in the .text section which indicates the DLL was compiled with .rdata and .text merged. Therefore the PoC to locate the table may or may not work for you unless you change the section to .rdata. Safe to assume it’s in read-only memory on some systems but I haven’t checked. It contains the addresses of at least thirteen function pointers located in the .data section. There may be more or less depending on the build.

The interesting thing is that, like the WOW64 table, the NT delegate table provides a simple way to intercept a variety of callbacks in 32-bit mode without the need to overwrite code with inline hooking. Most tools designed to detect malicious hooks look at executable code rather than changes to function pointers that normally reside in read-only or read-write memory.

PoC to find table

Shellcode: Data Masking 2

By: odzhan
29 April 2024 at 08:08

Introduction

This is a quick follow up post to Data Masking that discussed how one might use the Fisher-Yates shuffle and a DRBG to mask shellcode. There’s a lot of ways to mask data that don’t involve using an XOR operation but despite being relatively simple to implement are rarely used. You can use involutions, partial encryption, base encoding, simple arithmetic using addition and subtraction or modular variants. In the past, I’ve found components of block and stream ciphers to be a good source of techniques because the operations need to be invertible. In this post we’ll look at how easy it is to use byte substitution.

Substitution Box

Although substitution ciphers date back to ancient times, DES was first to use fixed s-box arrays for encrypting data. Since then, such non-linear operations have become a standard component of many block ciphers. To implement, we perform the following steps:

  1. Initialize 256-byte array we call β€œs-box” using [0, 255]
  2. Shuffle s-box using random seed.
  3. Create inverse s-box.

This S-Box can then be used for masking data and the inverse can be used for unmasking. The reason we use a random seed to shuffle the s-box is so that the masked data is always different. Here’s a snippet of C code to demonstrate…

//
// simple byte substitution using fisher-yates shuffle and DRBG
//
typedef struct _mask_ctx {
    uint8_t sbox[256], sbox_inv[256];
} mask_ctx;

void
init_mask(mask_ctx *c) {
    uint8_t seed[ENCRYPT_KEY_LEN];
    
    // initialise sbox
    for (int i=0; i<256; i++) {
        c->sbox[i] = (uint8_t)i;
    }
    // initialise seed/key
    random(seed, ENCRYPT_KEY_LEN);

    // shuffle sbox using random seed.
    shuffle(seed, c->sbox, 256);

    // create inverse
    for (int i=0; i<256; i++) {
        c->sbox_inv[c->sbox[i]] = i;
    }
}

// mask buf
void
encode(mask_ctx *c, void *buf, size_t len) {
    uint8_t *x = (uint8_t*)buf;
    
    for (size_t i=0; i<len; i++) {
        x[i] = c->sbox[x[i]];
    }
}

// unmask buf
void
decode(mask_ctx *c, void *buf, size_t len) {
    uint8_t *x = (uint8_t*)buf;
    
    for (size_t i=0; i<len; i++) {
        x[i] = c->sbox_inv[x[i]];
    }
}

void
dump(const char *str, void *buf, size_t len) {
    uint8_t *x = (uint8_t*)buf;
    
    printf("\n%s:\n", str);
    
    for (size_t i=0; i<len; i++) {
        printf(" %02X", x[i]);
    }
}

int
main(int argc, char *argv[]) {
    mask_ctx c;
    uint8_t buf[32];
    
    // using random bytes here for testing..
    random(buf, sizeof(buf));
    
    init_mask(&c);
    dump("raw", buf, sizeof(buf));
    
    encode(&c, buf, sizeof(buf));
    dump("encoded", buf, sizeof(buf));
    
    decode(&c, buf, sizeof(buf));
    dump("decoded", buf, sizeof(buf));
    
    return 0;
}

And here’s the output of the program.

You can simplify the above example by just using the srand(), rand() functions and a modulo operator instead of a DRBG. See the full example here.

Since it’s a bit more complicated than it needs to be. What we can do is repurpose the key initialization of RC4 and derive an inverse lookup table from that. Here’s a little more code to demonstrate.

typedef struct _mask_ctx {
    uint8_t sbox[256];
    uint8_t key[16];
    uint8_t sbox_inv[256];
} mask_ctx;

// initialise using RC4
void
init_mask(mask_ctx *c) {
    // initialise sbox
    for (size_t i=0; i<256; i++) {
        c->sbox[i] = (uint8_t)i;
    }
    // shuffle sbox
    for (size_t i=0, j=0; i<256; i++) {
        j = (j + (c->sbox[i] + c->key[i % 16])) & 255;
        uint8_t t = c->sbox[i] & 255;
        c->sbox[i] = c->sbox[j];
        c->sbox[j] = t;
    }
    // create inverse
    for (size_t i=0; i<256; i++) {
        c->sbox_inv[c->sbox[i]] = i;
    }
}

// mask or unmask
void
mask(uint8_t *sbox, size_t len, void *buf) {
    uint8_t *in = (uint8_t*)buf;
    uint8_t *out = (uint8_t*)buf;
    
    for (size_t i=0; i<len; i++) {
        out[i] = sbox[in[i]];
    }
}

In this case, mask() performs both encoding and decoding depending on the sbox array passed to it. And just for fun the 32-bit assembly code…

;
; Simple obfuscation using byte substitution.
;
    
    bits 32
    
%ifndef BIN
    global _init_mask_x86
    global init_mask_x86
    
    global _mask_x86
    global mask_x86
%endif

    section .text
    
    ;
    ; void init_mask_x86(mask_ctx*c);
    ;
_init_mask_x86:
init_mask_x86:
    pushad
    mov    edi, [esp+32+4]
    push   edi
    pop    esi
    xor    eax, eax          ; i=0
initialise_sbox:
    stosb                    ; c->sbox[i]=i
    inc    al                ; i++
    jnz    initialise_sbox   ; i<256
    cdq                      ; j=0
shuffle_sbox:
    ; j = (j + (c->sbox[i] + c->key[i % 16])) & 255;
    movzx  ecx, al           ; t = i % 16
    and    cl, 15            ;
    
    add    dl, [edi+ecx]     ; j += c->key[i % 16]
    mov    cl, [esi+eax]     ; t = c->sbox[i]
    add    dl, cl            ; j += c->sbox[i]
    xchg   cl, [esi+edx]     ; swap(t, s[j])
    mov    [esi+eax], cl
    
    inc    al                ; i++
    jnz    shuffle_sbox      ; i<256
    add    edi, 16
create_inverse:
    mov    dl, [esi+eax]     ; sbox_inv[sbox[i]] = i
    mov    [edi+edx], al     ; 
    inc    al
    jnz    create_inverse
    
    popad
    ret
    
    ;
    ; void mask_x86(void *sbox, size_t inlen, void *inbuf);
    ;
mask_x86:
_mask_x86:
    pushad
    lea    esi, [esp+32+4]
    lodsd
    xchg   ebx, eax          ; bx = sbox
    lodsd
    xchg   ecx, eax          ; cx = inlen
    lodsd
    xchg   esi, eax          ; si = inbuf
    push   esi
    pop    edi
mask_loop:
    lodsb                    ; al = in[i]
    xlatb                    ; al = sbox[al]
    stosb                    ; out[i] = al
    loop   mask_loop
    popad
    ret

Summary

There’s lots of ways to obfuscate data. Most people will use an XOR but this is a safe indicator of obfuscation and a way to detect it. Using byte substitution only requires a byte-to-byte mapping and probably harder to detect.

❌
❌