Normal view

There are new articles available, click to refresh the page.
Before yesterdaymodexp

Shellcode: Data Masking

By: odzhan
31 July 2022 at 00:01

Introduction

There are more than four ways to mask data, but these are the main ones to focus on in this post.

  1. Lossless Compression
  2. Encryption
  3. Steganography
  4. Shuffling

If we want to detect a compressed or encrypted stream of bytes but can’t rely on a file header for a signature, the best way is by using something like a Chi-Square test. The more uniform the data is, the more likely it is to be compressed or encrypted.

Steganography is better at masking. Some image formats already use lossless compression to reduce the size of files. The PNG format, for example, uses Zlib, and the high compression ratio will result in the file having a high amount of entropy. The GIF format also uses LZW as its compression method but is limited to 256 colours, which results in losing information during the encoding process. Of course, you have the option of parsing GIFs manually, but PNG is probably easier to work with in most image encoding libraries.

Involutions

In mathematics, an involution, or an involutory function, is a function that is its own inverse; For the following instructions, I’m merely using this word to describe what they do in practice. Executed once will mask data, and executing again will unmask. These are very common but also very weak when used alone.

1) eXclusive-OR bitwise operation
    xor   eax, -1

2) Bitwise NOT
    not   eax
    xor   eax, -1

3) Bitwise Negation
    neg   eax
    imul  eax, -1
    
4) Circular Shift
    shrd  eax, eax, 16
    ror   eax, 16
    rol   eax, 16
    ror   al, 4
  
5) Byte Swapping
    bswap eax
    xchg  al, ah

The circular shift and byte swapping operations are much closer to a permutation. They could also be used on large arrays in addition to the shuffling.

Random Shuffling

Let’s imagine you want to shuffle a deck of cards for an online poker game. The shuffling algorithm must be unbiased, and the results can’t be predictable before a game begins. Many who have asked for such an algorithm know of the Fisher-Yates shuffle. It’s an algorithm for generating a random permutation of a finite sequence. It was proposed by Ronald Fisher and Frank Yates in their book Statistical Tables for Biological, Agricultural and Medical Research published in 1939. Richard Durstenfeld modified the algorithm in 1964, and Donald E. Knuth popularised it in his 1968 book The Art of Computer Programming, hence why some refer to it as the Knuth Shuffle.

The following code in C illustrates how one might shuffle a byte array. Here, we’re using the current time as a seed to initialise the PRNG, which wouldn’t be recommended for a poker game. 😀

void
fisher_yates_shuffle(void *inbuf, size_t inlen) {
    uint8_t *in = (uint8_t*)inbuf;
    uint8_t t;
    
    srand(time(0));
    
    for (size_t i = inlen - 1; i > 0; i--) {
        uint32_t j = rand() % (i + 1);
        uint8_t = in[i];
        in[i] = in[j];
        in[j] = t;
    }
}

Obtaining a unique sequence of numbers to shuffle the array is problematic. Most software will use a pseudorandom number generator (PRNG). However, knowing how to generate the same sequence of numbers used to shuffle a deck of cards allows us to determine where every card is and even reverse the process. But that’s precisely what makes Fisher-Yates useful for masking. We want to unshuffle our masked data later; it’s just that rand() isn’t suitable. We need something else.

Keyed/Seeded/Deterministic Shuffling

Apart from rand() being weak for shuffling, unshuffling the array would require starting with the last number returned by it. rand() doesn’t support this type of random access, therefore our unshuffling algorithm would be required to generate the exact same sequence of numbers and store each one in memory before starting to unshuffle. We need a function that can produce deterministic values based on a seed or key. Seeded or keyed shuffling and unshuffling is really what we need.

A PRNG is also a Deterministic Random Bit Generator (DRBG). The DRBG/PRNG-generated sequence is not truly random because an initial value, called the PRNG’s seed (which may include truly random values), entirely determines the output bits generated by it. Therefore, we can replace rand() with a stream cipher like RC4, ChaCha, or a block cipher like AES in Counter (CTR) mode and generate deterministic values.

NIST has defined how to construct a DRBG from CTR mode in SP 800-90Ar1, but it’s unnecessary to use this for masking. Rather than implement a DRBG, we just need to encrypt the range index using a secret key and then derive an unbiased number within that range from the ciphertext. The following code tries to demonstrate how it might be done in practice.

#if defined(_WIN64)
//
// SPECK128-256
//
#define WORDLEN 64
#define PRNG_MAX_INT (INT64_MAX + 1)
#define ENCRYPT_KEY_LEN 32
#define ENCRYPT_BLOCK_LEN 16
#define R(v,n)(((v)>>(n))|((v)<<(64-(n))))
typedef unsigned long long W;

void 
encrypt(void*mk,void*p){
    W k[4],*x=(W*)p,i,t;

    for (i=0; i<4; i++) k[i] = ((W*)mk)[i];

    for (i=0; i<34; i++) {
        x[1] = (R(x[1], 8) + x[0]) ^ k[0],
        x[0] = R(x[0], 61) ^ x[1],
        k[1] = (R(k[1], 8) + k[0]) ^ i,
        k[0] = R(k[0], 61) ^ k[1];
        t = k[1], k[1] = k[2], k[2] = k[3], k[3] = t;
    }
}

#else
//
// SPECK64-128
//
#define WORDLEN 32
#define PRNG_MAX_INT (INT32_MAX + 1)
#define ENCRYPT_KEY_LEN 16
#define ENCRYPT_BLOCK_LEN 8
#define R(v,n)(((v)>>(n))|((v)<<(32-(n))))
typedef unsigned int W;

void 
encrypt(void* mk, void* p) {
    W k[4],*x=(W*)p,i,t;

    for (i=0; i<4; i++) k[i] = ((W*)mk)[i];

    for (i=0; i<27; i++) {
        x[0] = (R(x[0], 8) + x[1]) ^ k[0],
        x[1] = R(x[1], 29) ^ x[0],
        t = k[3],
        k[3] = (R(k[1], 8) + k[0]) ^ i,
        k[0] = R(k[0], 29) ^ k[3],
        k[1] = k[2], k[2]=t;
    }
}
#endif

W
prng_word(void *key, W max) {
    W r, x[2], ctr = 1, d = ((-max) / max) + 1;
    if (d == 0) return 0;
    
    for (;;) {
        x[0] = max;
        x[1] = ctr++;
        
        encrypt(key, x);
        
        r = x[0] / d;
        if (r < max) return r;
    }
}

void
shuffle(void *seed, void *inbuf, size_t inlen) {
    uint8_t *in = (uint8_t*)inbuf;
    
    for (size_t i = inlen - 1; i > 0; i--) {
        uint32_t j = prng_word(seed, (i + 1));
        uint8_t t = in[i];
        in[i] = in[j];
        in[j] = t;
    }
}

void
unshuffle(void *seed, void *inbuf, size_t inlen) {
    uint8_t *in = (uint8_t*)inbuf;
    
    for (size_t i = 0; i < inlen; i++) {
        uint32_t j = prng_word(seed, (i + 1));
        uint8_t t = in[i];
        in[i] = in[j];
        in[j] = t;
    }
}

There are times when elements of the array will remain in the same position after shuffling. This typically happens with small arrays. In that case, something else is required for masking. Now, if you know of a way to fix that, feel free to leave a comment or drop me an email.

Summary

Shuffling doesn’t provide any confidentiality for the masked data like encryption does and doesn’t reduce its size like compression does. However, shuffling a large enough array using a secure cipher and secret key to generate a sequence of numbers can probably make it difficult to recover the original data without the key used to initialise the PRNG. That seems helpful in masking data and better than an XOR. But of course, something like this is in no way intended or implied to be a suitable replacement for encryption and shouldn’t be used for any critical information!

References

Shellcode: Base-N Decoding for Text-Only Compression and Obfuscation

By: odzhan
22 August 2022 at 01:00

Introduction

Concise Binary Object Representation (CBOR), the binary equivalent to JavaScript Object Notation (JSON), is ideal for storing a configuration to a shellcode/stager/loader. I’ve always wanted support for text-only compression to store API strings and URLs. CBOR currently doesn’t support compression, and while Zlib is recommended quite a lot for JSON, it wasn’t designed for short strings/input. A format like CBOR would benefit by supporting text-only compression, encryption and masking natively. In the meantime, however, developers are responsible for implementing those features independently.

Before we cover Base-N decoding, we should talk about some well-known compression algorithms and why they’re unsuitable for short inputs. Huffman encoding, for example, is a lossless compression method that assigns shorter bit strings to a range of bytes. The most frequently used bytes are assigned the least amount of bits, helping reduce the size of the original input. Recovering the original data requires the same bit-to-byte mappings used during encoding. These mappings, also known as “Huffman tables”, are stored with compressed data and can sometimes require more space than the input itself.

LZ encoding also isn’t suitable since it works by storing full strings or a “match reference” that consists of an offset and length to the same range of bytes found earlier. Zlib and LZMA are excellent compression algorithms, but are obviously designed specifically for large data blocks rather than short strings.

In this blog post, we’ll examine how effective it is to use Base-N decoding for text-only compression. It’s similar to Huffman encoding, but without the need for Huffman tables. The results will be compared with some of the following projects designed for compressing short strings:

UniShox2 is considered the best of all and uses a combination of three encoding methods:

  • Entropy coding (Huffman, Arithmetic)
  • Dictionary coder (LZ77,LZ78,LZW)
  • Delta encoding

Applications

In case you’re wondering why on earth compressing short strings would be useful, I’ve copied the following list of applications from the UniShox2 repository for you to consider.

  • Compression for low memory devices such as Arduino and ESP8266
  • Sending messages over Websockets
  • Compression of Chat application text exchange including Emojis
  • Storing compressed text in databases
  • Faster retrieval speed when used as join keys
  • Bandwidth cost saving for messages transferred to and from Cloud infrastructure
  • Storage cost reduction for Cloud databases
  • Some people even use it for obfuscation

Base-64 Encoding

I’ll assume most of you are familiar with Base-64 encoding, but not necessarily how it works internally. It’s a binary-to-text encoding scheme that converts 24-bits of binary to a 32-bit string. It uses 8-Bit ASCII characters to store 6-Bits of binary, which increases the data by approx. 33%. For example, encoding 32 bytes of binary would require 44 bytes of space for the encoded string. To calculate the necessary space, we divide the length of the binary by three and multiply by four. Taking into account any padding, we then align up by four. In C, we can use something like the following:

uint32_t OutLength = (((4 * (InLength / 3)) + 3) & -4).

The following is Base-64 encoding without using a lookup table.

#define ROTL32(v,n)(((v)<<(n))|((v)>>(32-(n))))

void
base64_encode(void *inbuf, int inlen, char *outbuf) {
    uint8_t  *in = (uint8_t*)inbuf;
    char     *out = outbuf;
    int      i;
    uint32_t len=0;

    while (inlen) {
        uint32_t x = 0;
        uint8_t c;
        // read 3 or less bytes. if required, pad with zeros
        for (len=i=0; i<3; i++) {
            x |= (i < inlen) ? in[len++] : 0;
            x <<= 8;
        }
        in += len;
        inlen -= len;
        len = (len * 8 + 4) / 6;
        // encode len bytes.
        for (i=0; i<len; i++) {
            x = ROTL32(x, 6);
            c = x % 64;
            if (c < 26) c += 'A';
            else if (c < 52) c = (c - 26) + 'a';
            else if (c < 62) c = (c - 52) + '0';
            else if (c == 63) c = '+';
            else c = '/';
            *out++ = c;
        }
    }
    // if required, add padding.
    while (len++ < 4) *out++ = '=';
    *out = 0;
}

Base-N Decoding

Since Base-64 encoding will increase the original data by 33%, what prevents us from using Base-64 decoding to reduce the size of arbitrary strings by 25%? The compression ratio upon conversion to binary entirely depends on what characters the string contains, so you’ll get different results depending on the input. However, decoding should always result in some compression of the original string. The following table lists the approximate decrease in space used by a string when using various Base-N decoding.

Base Input Alphabet % Decrease
2 64 x “0” 01 88
10 18446744073709551615 0123456789 50
16 FFFFFFFFFFFFFFFF 0123456789ABCDEF 44
26 THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG ABCDEFGHIJKLMNOPQRSTUVWXYZ
40
36
THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG2 ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 34
52 Thequickbrownfoxjumpsoverthelazydog ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 29
62 Thequickbrownfoxjumpsoverthelazydog2 ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789 25

As you can see, a higher base number results in a lower compression ratio. And, of course, there are more printable characters required for punctuation, which will only decrease it further. My intention here isn’t to compete with or replace existing string compression tools. I’m merely pointing out that anyone can use Base-N decoding to compress strings with little effort. The following code in C can be used as a reference.

Base-N Compression with 64-Bit Integers

//
// Compress string using Base-N decoding.
//
uint64_t
base_n_compress(char str[], char base_tbl[]) {
    uint64_t val = 0, pwr = 1;

    size_t inlen = strlen(str);
    size_t base_n = strlen(base_tbl);

    for (size_t i=0; i<inlen; i++) {
        const char *ptr = strchr(base_tbl, str[i]);
        if (!ptr) return 0;
        int idx = (ptr - base_tbl) + 1;
        val += pwr * idx; 
        pwr *= base_n;
    }
    return val;
}

//
// Decompress string using Base-N encoding.
//
void 
base_n_decompress(uint64_t val, char base_tbl[], char str[]) { 
    size_t   base_n = strlen(base_tbl);
    uint64_t pwr = base_n;
    int      outlen, i;
    
    val--;
    
    for (outlen = 1; val >= pwr; outlen++) {
        val -= pwr; 
        pwr *= base_n;
    }
    
    str[outlen] = 0;

    for (i = 0; i < outlen; i++) { 
        str[i] = base_tbl[val % base_n]; 
        val /= base_n; 
    }
}

The only problem with this code is when the string converted to binary exceeds 2^{64} bits. Then we need to use bignum arithmetic. Of course, you won’t have that problem in some languages that already support multi-precision arithmetic. Getting a Python implementation of the same code without the 2^{64} bits limit is relatively simple.

Base-N Compression with Arbitrary Arithmetic

There are no limits to string compression once we start using bignum arithmetic. However, it makes more sense to use an algorithm designed specifically for large data blocks at some point. To demonstrate how it works with OpenSSL’s BIGNUM implementation. The following two functions work well for strings that might exceed 2^{64} bits. This code resolves the limitations of the previous code.

//
// Compress a string using Base-N decoding.
//
void
base_n_compress(BIGNUM *val, const char *str, const char *base_tbl) {
    size_t inlen = strlen(str);
    BIGNUM *pwr = BN_new();
    BIGNUM *tmp = BN_new();
    BIGNUM *base = BN_new();
    BN_CTX *ctx = BN_CTX_new();
    
    BN_one(pwr);  // pwr = 1
    BN_set_word(base, strlen(base_tbl));
    
    // for each character
    for (size_t i=0; i<inlen; i++) {
        // get index
        uint8_t idx = 0;
        const char *ptr = strchr(base_tbl, str[i]);
        if (!ptr) break;
        idx = (uint8_t)(ptr - base_tbl) + 1;
        
        BN_set_word(tmp, idx);       //
        BN_mul(tmp, tmp, pwr, ctx);  // tmp = pwr * idx
        BN_add(val, val, tmp);       // val += tmp
        BN_mul(pwr, pwr, base, ctx); // pwr *= base
    }
    
    BN_free(pwr);
    BN_free(tmp);
    BN_free(base);
    BN_CTX_free(ctx);
}

//
// Decompress a string using Base-N encoding.
//
std::string
base_n_decompress(BIGNUM *val, char *alpha) {
    const char *base_tbl = alpha;
    size_t outlen;
    std::string str;
    
    BIGNUM *pwr = BN_new();
    BIGNUM *base = BN_new();
    BIGNUM *rem = BN_new();
    BN_CTX *ctx = BN_CTX_new();
    
    BN_sub_word(val, 1);                 // val--;
    BN_set_word(pwr, strlen(base_tbl));  // pwr = strlen(base_tbl)
    BN_set_word(base, strlen(base_tbl)); // base = strlen(base_tbl)
    
    // obtain the length of string
    for (outlen=1; BN_cmp(val, pwr) >= 0; outlen++) {
        BN_sub(val, val, pwr);          // val -= pwr; 
        BN_mul(pwr, pwr, base, ctx);    // pwr *= base
    }
    
    // write string to buffer
    for (size_t i=0; i<outlen; i++) {
        BN_div(val, rem, val, base, ctx); // val % tbl_len
        unsigned long r = BN_get_word(rem);
        str.push_back(base_tbl[r]);
    }
    
    BN_free(pwr);
    BN_free(rem);
    BN_free(base);
    BN_CTX_free(ctx);
    
    return str;
}

Frequency Analysis

Base-N decoding doesn’t choose the length of bit strings optimally. It doesn’t assign the shortest amount of bits to bytes that occur more frequently in the string like Huffman encoding. If we only use a base number equal to the length of unique characters in the string, we can compress it much better. The following code can generate an optimal alphabet based on the string to compress.

//
// Generate an alphabet for optimal compression.
//
void
generate_alphabet(char *alpha, std::string str) {
    std::unordered_map<char, int> freq;
    
    // count frequency of each character in string we want to compress.
    for (const char &c: str) {
        freq[c]++;
    }
    
    // convert map to a vector and sort in ascending order.
    std::vector<std::pair<char, int>> elems(freq.begin(), freq.end());
    std::sort(elems.begin(), elems.end(), [](auto &left, auto &right) {
        return left.second > right.second;
    });

    // save each character to output buffer.
    for (auto &pair : elems) {
        *alpha++ = pair.first;
    }
}

We perform the same tests as before and see a distinct improvement. However, the higher compression ratio is more likely the result of a smaller lookup table/base number rather than sorting the most frequent characters in ascending order.

Base Input Alphabet % Decrease
1 64 x “0” 0 99
9 18446744073709551615 457106938 60
1 FFFFFFFFFFFFFFFF F 94
26 THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG OERUHTNGWBKCIQFXJMPSVLAZYD
40
27
THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG2 OERUTHNGW2BKCIQFXJMPSVLAZYD 39
26 Thequickbrownfoxjumpsoverthelazydog oeruhfwbkciqTngxjmpsvtlazyd 40
28 Thequickbrownfoxjumpsoverthelazydog2 oeruhfwbkciqTngxjmpsvtl2azyd 39

Compared to Other Libraries

The following examples are from the UniShox2 repository. Green columns highlight the best ratio, but these are only preliminary tests. The Base-N decoding uses frequency analysis before compression. I would not want to claim that Base-N compression outperforms UniShox2!

String Size UniShox2 Base-N
Decoding
Shoco
Beauty is not in the face. Beauty is a light in the heart. 58 30 31
46
The quick brown fox jumps over the lazy dog. 44 31 27 38
WRITING ENTIRELY IN BLOCK CAPITALS IS SHOUTING, and it’s rude 61 47 38 58
Rose is a rose is a rose is a rose. 35 12 14 25
039f7094-83e4-4d7f-aa38-8844c67bd82d 36 18 18 36
2021-07-15T16:37:35.897Z 24 9 12
24
(760) 756-7568 14 7 6 14
This is a loooooooooooooooooooooong string 42 15 19 25

Summary

We see that Base-N decoding, which works similar to Huffman encoding, can be effective for compressing and obfuscating short strings. The results are even better when frequency analysis occurs before compression. Shuffling the bits used in the base table makes it possible to have a type of “polymorphic text-to-binary” algorithm. There are limitations, of course, like the need for multi-precision arithmetic when the conversion of string to binary exceeds 2^{32} or 2^{64} bit integers. However, perhaps someone will devise a more optimal algorithm that avoids the need for such.

Shellcode: Entropy Reduction With Base32 Encoding.

By: odzhan
7 April 2023 at 11:57

Introduction

Compressed, encrypted, and random data all contain a high amount of entropy, which is why many products use entropy analysis to detect malicious code in binaries that have never been examined before.

In a previous post about masking, I suggested using a deterministic random number generator with the Fisher-Yates shuffle to try and scramble data without increasing the entropy that occurs with compression and encryption. Of course, no confidentiality is provided with that approach and may not be desirable.

This got me thinking about what algorithms could be used to reduce entropy and it’s much simpler than I thought. If all we need to do is reduce the number of bits per byte, then simple encoding is the answer. Or am I mistaken? Let me know in the comments.

The disadvantage of using Base32 encoding is obviously an increase in the size of data by approx. 67%. However, if the compression ratio of original data is close to 50-70% (which can be achieved with something like GZip, LZMA or ZPAQ), then the increase after reducing entropy isn’t that bad. We have confidentiality of the data and it looks benign from analysis.

Increasing

Compression algorithms like LZ77, Huffman or Arithmetic coders are designed to remove redundant or repetitive data. Encryption algorithms will use techniques like transposition, byte substitution to make data more unpredictable or seem random. And if they’re good, the original data will appear to be random with no discernible pattern or structure. That makes it difficult to analyse and determine exactly what it is.

Decreasing

Adding redundant data, such as null bytes, can lower the entropy score. However, some tests will disregard zeros because they affect the accuracy of results. The reason for using Base32 and not Base64 is because the latter doesn’t reduce the entropy enough and Base16 is simply going to waste too much space. Base32 isn’t perfect but it’s good enough for demonstrating the idea.

Encoding

There are problems with this code and it’s inadvisable to use it outside this blog. For it to work, inbuf should allow out-of-bound reads up to 5 bytes. Essentially, pad inbuf with at least 5 null bytes. Might seem odd for it to work that way, but it helps later when combining both encoding and decoding. For every encoded byte, two bits will always be zero and you could say that’s a pattern. That can be avoided by flipping bits of some bytes.

size_t
base32_encode(size_t inlen, void *inbuf, void *outbuf) {
    uint8_t *out = (uint8_t*)outbuf;
    uint8_t *in = (uint8_t*)inbuf;
    uint32_t x = 0, z = 0;
    size_t outlen = (inlen + 4) / 5 * 8;
    
    // return size of buffer required?
    if (!outbuf) return outlen;
    
    // encode bytes
    while (outlen) {
        x = (x << 8) | *in++;
        z += 8;
        while (z >= 5) {
            z -= 5;
            *out++ = (x >> z) & 31;
            outlen--;
        }
    }
    // return encoded length
    return (out - (uint8_t*)outbuf);
}

Decoding

This needs to know the original size of data before encoding. That’s not a big issue considering most compression and encryption work the same way. Just include the original length when transporting the encoded data.

void
base32_decode(size_t outlen, void *inbuf, void *outbuf) {
    uint8_t *out = (uint8_t*)outbuf;
    uint8_t *in = (uint8_t*)inbuf;
    uint32_t x = 0, z = 0;
    
    while (outlen) {
        x = (x << 5) | *in++;
        z += 5;
        while (z >= 8) {
            z -= 8;
            *out++ = (x >> z) & 255;
            outlen--;
        }
    }
}

Combined Encoding and Decoding

Well, you may have noticed the similarity between the two by now and thought about combining both. So long as the correct length is calculated for the output, we can indeed combine both and switch between encoding and decoding using a flag.

void
base32(size_t outlen, void *inbuf, void *outbuf, bool encode) {
    uint8_t *out = (uint8_t*)outbuf;
    uint8_t *in = (uint8_t*)inbuf;
    uint32_t x = 0, z = 0;    
    uint8_t wl = 8, rl = 5, m = 255;

    if (encode) {
        rl = 8; // read length
        wl = 5; // write length
        m = 31; // mask
    }
    
    while (outlen) {
        x = (x << rl) | *in++;
        z += rl;
        while (z >= wl) {
            z -= wl;
            *out++ = (x >> z) & m;
            outlen--;
        }
    }
}

Results

To test if the encoding reduces entropy of data, random bytes are generated from the operating system. A Shannon entropy test is used to calculate before and after.

double 
shannon_entropy(void *inbuf, size_t len) {
    uint8_t *in = (uint8_t*)inbuf;
    double entropy = 0;
    uint32_t frequency[256] = {0};

    // Count the frequency of each byte value
    for (size_t i=0; i<len; i++) {
        frequency[in[i]]++;
    }

    // Calculate the entropy
    for (size_t i=0; i<256; i++) {
        if (frequency[i] > 0) {
            double probability = (double) frequency[i] / len;
            entropy -= probability * log2(probability);
        }
    }
    return entropy;
}

Before encoding, the Shannon test scores 7.835897 for 1024 random bytes. After using Base32 to encode it, Shannon reports 4.983226.

Summary

Some of you may be saying: “This isn’t Base32 encoding!” It’s not what’s described in RFC4648 that uses an alphabet as the final step, but they are using the same process. This is only intended to reduce entropy, but wouldn’t take that much for it to work like Base32 described in the RFC. The main difference of course is that padding with the equal sign isn’t used. instead, the code depends on the caller to specify the output length and pad the input buffer for out-of-bound reads.

Finally, the decoding algorithm above can be used to compress strings, but only if each character fits into 5-bits of information. For that reason, it might only be suitable for uppercase or lowercase characters together with a few symbols or digits.

Source code here.

WOW64 Callback Table (FinFisher)

By: odzhan
19 April 2023 at 18:07

Introduction

Ken Johnson (otherwise known as Skywing) first talked about the KiUserExceptionDispatcher back in 2007 . Since then, scattered around the internet are various posts talking about it, but for some reason nobody demonstrating how to use it. It’s been documented that FinFisher misuses the function pointers as part of its virtual machine functionality, so let’s take a look at how to find the table before doing anything creative with it…The code to locate the table didn’t take long and didn’t require looking at FinFisher internals or existing code. It’s a simple heuristic based search.

References

Observations

If you take a look at ntdll!LdrpLoadWow64, that’s called during initialization of a WOW64 process, you’ll see it loading wow64.dll and resolving the address of six exports. This process has been better documented in the posts mentioned above.

  • Wow64LdrpInitialize
  • Wow64PrepareForException
  • Wow64ApcRoutine
  • Wow64PrepareForDebuggerAttach
  • Wow64SuspendLocalThread
  • Wow64SuspendLocalProcess

A closer look at how this works will provide you with an array of function names stored in STRING format and a pointer to a variable that holds each address resolved. The following is my attempt at recreating the same structure.

typedef union _W64_T {
    LPVOID p;
    DWORD64 q;
    LPVOID *pp;
} W64_T;
    
typedef struct _WOW64_CALLBACK {
    STRING Name;
    W64_T  Function;
} WOW64_CALLBACK, *PWOW64_CALLBACK;

//
// Structure based on 64-bit version of NTDLL on Windows 10
//
typedef struct _WOW64_CALLBACK_TABLE {
    WOW64_CALLBACK  Wow64LdrpInitialize;
    WOW64_CALLBACK  Wow64PrepareForException;
    WOW64_CALLBACK  Wow64ApcRoutine;
    WOW64_CALLBACK  Wow64PrepareForDebuggerAttach;
    WOW64_CALLBACK  Wow64SuspendLocalThread;
    WOW64_CALLBACK  Wow64SuspendLocalProcess;
} WOW64_CALLBACK_TABLE, *PWOW64_CALLBACK_TABLE;

WOW64_CALLBACK_TABLE Wow64Table = {
    {RTL_CONSTANT_STRING("Wow64LdrpInitialize"), NULL},
    {RTL_CONSTANT_STRING("Wow64PrepareForException"), NULL},
    {RTL_CONSTANT_STRING("Wow64ApcRoutine"), NULL},
    {RTL_CONSTANT_STRING("Wow64PrepareForDebuggerAttach"), NULL},
    {RTL_CONSTANT_STRING("Wow64SuspendLocalThread"), NULL},
    {RTL_CONSTANT_STRING("Wow64SuspendLocalProcess"), NULL}
    };

Locating Table

There could be a number of ways to do this. In the following example, we search the .rdata section for STRING structures that equal the function pointer we wish to find. Since these strings are constant and unlikely to change, it works reasonably well.

BOOL 
IsReadOnlyPtr(LPVOID ptr) {
    MEMORY_BASIC_INFORMATION mbi;
    
    if (!ptr) return FALSE;
    
    DWORD res = VirtualQuery(ptr, &mbi, sizeof(mbi));
    if (res != sizeof(mbi)) return FALSE;

    return ((mbi.State   == MEM_COMMIT    ) &&
            (mbi.Type    == MEM_IMAGE     ) && 
            (mbi.Protect == PAGE_READONLY));
}

BOOL
GetWow64FunctionPointer(PWOW64_CALLBACK Callback) {
    auto m = (PBYTE)GetModuleHandleW(L"ntdll");
    auto nt = (PIMAGE_NT_HEADERS)(m + ((PIMAGE_DOS_HEADER)m)->e_lfanew);
    auto sh = IMAGE_FIRST_SECTION(nt);
    
    for (DWORD i=0; i<nt->FileHeader.NumberOfSections; i++) {
        if (*(PDWORD)sh[i].Name == *(PDWORD)".rdata") {
            auto rva = sh[i].VirtualAddress;
            auto cnt = (sh[i].Misc.VirtualSize - sizeof(STRING)) / sizeof(ULONG_PTR);
            auto ptr = (PULONG_PTR)(m + rva);
            
            for (DWORD j=0; j<cnt; j++) {
                if (!IsReadOnlyPtr((LPVOID)ptr[j])) continue;
                
                auto api = (PSTRING)ptr[j];

                if (api->Length == Callback->Name.Length && 
                    api->MaximumLength == Callback->Name.MaximumLength) 
                {
                    if (!strncmp(api->Buffer, Callback->Name.Buffer, Callback->Name.Length)) {
                        Callback->Function.p = (PVOID)ptr[j + 1];
                        return TRUE;
                    }
                }
            }
            break;
        }
    }
    return FALSE;
}

void
GetWow64CallbackTable(PWOW64_CALLBACK_TABLE Table) {
    GetWow64FunctionPointer(&Table->Wow64LdrpInitialize);
    GetWow64FunctionPointer(&Table->Wow64PrepareForException);
    GetWow64FunctionPointer(&Table->Wow64ApcRoutine);
    GetWow64FunctionPointer(&Table->Wow64PrepareForDebuggerAttach);
    GetWow64FunctionPointer(&Table->Wow64SuspendLocalThread);
    GetWow64FunctionPointer(&Table->Wow64SuspendLocalProcess);
}

Summary

This type of code isn’t useful to a 32-Bit WOW process without jumping to 64-Bit since the function pointers are stored in the 64-Bit version of NTDLL. There are potentially other uses though like intercepting APCs, anti-debugging and processing exceptions before VEH or SEH, which FinFisher did successfully for many many years….

PoC here

Delegated NT DLL

By: odzhan
13 February 2024 at 20:30

Introduction

redplait and Adam/Hexacorn already documented this in 2017 and 2018 respectively, so it’s not a new discovery. Officially available since RedStone 2 released in April 2017, redplait states it was introduced with insider build 15007 released in January 2017. It has similarities with the WOW64 function table present in AMD64 versions of NTDLL.

References

Observations

Using IDA Pro or Ghidra with support for debugging symbols enabled, the table can be found at ntdll!_LdrpDelegatedNtdllExports

The DLL itself would be found in one of the following paths.

ArchPath
x86C:\Windows\System32\
AMD64C:\Windows\SysWOW64\
ARM64C:\Windows\SysWOW64\

Table

The table containing function pointers on my system appears in the .text section which indicates the DLL was compiled with .rdata and .text merged. Therefore the PoC to locate the table may or may not work for you unless you change the section to .rdata. Safe to assume it’s in read-only memory on some systems but I haven’t checked. It contains the addresses of at least thirteen function pointers located in the .data section. There may be more or less depending on the build.

The interesting thing is that, like the WOW64 table, the NT delegate table provides a simple way to intercept a variety of callbacks in 32-bit mode without the need to overwrite code with inline hooking. Most tools designed to detect malicious hooks look at executable code rather than changes to function pointers that normally reside in read-only or read-write memory.

PoC to find table

❌
❌