## Introduction

There are more than four ways to mask data, but these are the main ones to focus on in this post.

- Lossless Compression
- Encryption
- Steganography
- Shuffling

If we want to detect a compressed or encrypted stream of bytes but canβt rely on a file header for a signature, the best way is by using something like a Chi-Square test. The more uniform the data is, the more likely it is to be compressed or encrypted.

Steganography is better at masking. Some image formats already use lossless compression to reduce the size of files. The PNG format, for example, uses Zlib, and the high compression ratio will result in the file having a high amount of entropy. The GIF format also uses LZW as its compression method but is limited to 256 colours, which results in losing information during the encoding process. Of course, you have the option of parsing GIFs manually, but PNG is probably easier to work with in most image encoding libraries.

## Involutions

In mathematics, an involution, or an involutory function, is a function that is its own inverse; For the following instructions, Iβm merely using this word to describe what they do in practice. Executed once will mask data, and executing again will unmask. These are very common but also very weak when used alone.

1) eXclusive-OR bitwise operation xor eax, -1 2) Bitwise NOT not eax xor eax, -1 3) Bitwise Negation neg eax imul eax, -1 4) Circular Shift shrd eax, eax, 16 ror eax, 16 rol eax, 16 ror al, 4 5) Byte Swapping bswap eax xchg al, ah

The circular shift and byte swapping operations are much closer to a permutation. They could also be used on large arrays in addition to the shuffling.

## Random Shuffling

Letβs imagine you want to shuffle a deck of cards for an online poker game. The shuffling algorithm must be unbiased, and the results canβt be predictable before a game begins. Many who have asked for such an algorithm know of the Fisher-Yates shuffle. Itβs an algorithm for generating a random permutation of a finite sequence. It was proposed by Ronald Fisher and Frank Yates in their book Statistical Tables for Biological, Agricultural and Medical Research published in 1939. Richard Durstenfeld modified the algorithm in 1964, and Donald E. Knuth popularised it in his 1968 book The Art of Computer Programming, hence why some refer to it as the Knuth Shuffle.

The following code in C illustrates how one might shuffle a byte array. Here, weβre using the current time as a seed to initialise the PRNG, which wouldnβt be recommended for a poker game.

void fisher_yates_shuffle(void *inbuf, size_t inlen) { uint8_t *in = (uint8_t*)inbuf; uint8_t t; srand(time(0)); for (size_t i = inlen - 1; i > 0; i--) { uint32_t j = rand() % (i + 1); uint8_t = in[i]; in[i] = in[j]; in[j] = t; } }

Obtaining a unique sequence of numbers to shuffle the array is problematic. Most software will use a pseudorandom number generator (PRNG). However, knowing how to generate the same sequence of numbers used to shuffle a deck of cards allows us to determine where every card is and even reverse the process. But thatβs precisely what makes Fisher-Yates useful for masking. We want to unshuffle our masked data later; itβs just that rand() isnβt suitable. We need something else.

## Keyed/Seeded/Deterministic Shuffling

Apart from rand() being weak for shuffling, unshuffling the array would require starting with the last number returned by it. rand() doesnβt support this type of random access, therefore our unshuffling algorithm would be required to generate the exact same sequence of numbers and store each one in memory before starting to unshuffle. We need a function that can produce deterministic values based on a seed or key. Seeded or keyed shuffling and unshuffling is really what we need.

A PRNG is also a Deterministic Random Bit Generator (DRBG). The DRBG/PRNG-generated sequence is not truly random because an initial value, called the PRNGβs seed (which may include truly random values), entirely determines the output bits generated by it. Therefore, we can replace rand() with a stream cipher like RC4, ChaCha, or a block cipher like AES in Counter (CTR) mode and generate deterministic values.

NIST has defined how to construct a DRBG from CTR mode in SP 800-90Ar1, but itβs unnecessary to use this for masking. Rather than implement a DRBG, we just need to encrypt the range index using a secret key and then derive an unbiased number within that range from the ciphertext. The following code tries to demonstrate how it might be done in practice.

#if defined(_WIN64) // // SPECK128-256 // #define WORDLEN 64 #define PRNG_MAX_INT (INT64_MAX + 1) #define ENCRYPT_KEY_LEN 32 #define ENCRYPT_BLOCK_LEN 16 #define R(v,n)(((v)>>(n))|((v)<<(64-(n)))) typedef unsigned long long W; void encrypt(void*mk,void*p){ W k[4],*x=(W*)p,i,t; for (i=0; i<4; i++) k[i] = ((W*)mk)[i]; for (i=0; i<34; i++) { x[1] = (R(x[1], 8) + x[0]) ^ k[0], x[0] = R(x[0], 61) ^ x[1], k[1] = (R(k[1], 8) + k[0]) ^ i, k[0] = R(k[0], 61) ^ k[1]; t = k[1], k[1] = k[2], k[2] = k[3], k[3] = t; } } #else // // SPECK64-128 // #define WORDLEN 32 #define PRNG_MAX_INT (INT32_MAX + 1) #define ENCRYPT_KEY_LEN 16 #define ENCRYPT_BLOCK_LEN 8 #define R(v,n)(((v)>>(n))|((v)<<(32-(n)))) typedef unsigned int W; void encrypt(void* mk, void* p) { W k[4],*x=(W*)p,i,t; for (i=0; i<4; i++) k[i] = ((W*)mk)[i]; for (i=0; i<27; i++) { x[0] = (R(x[0], 8) + x[1]) ^ k[0], x[1] = R(x[1], 29) ^ x[0], t = k[3], k[3] = (R(k[1], 8) + k[0]) ^ i, k[0] = R(k[0], 29) ^ k[3], k[1] = k[2], k[2]=t; } } #endif W prng_word(void *key, W max) { W r, x[2], ctr = 0, d = ((-max) / max) + 1; if (d == 0) return 0; for (;;) { x[0] = max; x[1] = ctr++; encrypt(key, x); r = x[0] / d; if (r < max) return r; } } void shuffle(void *seed, void *inbuf, size_t inlen) { uint8_t *in = (uint8_t*)inbuf; for (size_t i = inlen - 1; i > 0; i--) { uint32_t j = prng_word(seed, (i + 1)); uint8_t t = in[i]; in[i] = in[j]; in[j] = t; } } void unshuffle(void *seed, void *inbuf, size_t inlen) { uint8_t *in = (uint8_t*)inbuf; for (size_t i = 0; i < inlen; i++) { uint32_t j = prng_word(seed, (i + 1)); uint8_t t = in[i]; in[i] = in[j]; in[j] = t; } }

There are times when elements of the array will remain in the same position after shuffling. This typically happens with small arrays. In that case, something else is required for masking. Now, if you know of a way to fix that, feel free to leave a comment or drop me an email.

## Summary

Shuffling doesnβt provide any confidentiality for the masked data like encryption does and doesnβt reduce its size like compression does. However, shuffling a large enough array using a secure cipher and secret key to generate a sequence of numbers can probably make it difficult to recover the original data without the key used to initialise the PRNG. That seems helpful in masking data and better than an XOR. But of course, something like this is in no way intended or implied to be a suitable replacement for encryption and shouldnβt be used for any critical information!

## References

- Counter-based random number generator (CBRNG)
- Efficiently Generating a Number in a Range
- How to Win at Poker, and Other Science Lessons
- The definitive guide to Modulo Bias and how to avoid it!

odzhan