Kernel Karnage – Part 1

21 October 2021 at 15:13

I start the first week of my internship in true spooktober fashion as I dive into a daunting subject that’s been scaring me for some time now: The Windows Kernel.

1. KdPrint(“Hello, world!\n”);

When I finished my previous internship, which was focused on bypassing Endpoint Detection and Response (EDR) software and Anti-Virus (AV) software from a user land point of view, we joked around with the idea that the next topic would be defeating the same problem but from kernel land. At that point in time, I had no experience at all with the Windows kernel and it all seemed very advanced and above my level of technical ability. As I write this blogpost, I have to admit it wasn’t as scary or difficult as I thought it to be; C/C++ is still C/C++ and assembly instructions are still headache-inducing, but comprehensible with the right resources and time dedication.

In this first post, I will lay out some of the technical concepts and ideas behind the goal of this internship, as well as reflect back on my first steps in successfully bypassing/disabling a reputable Anti-Virus product, but more on that later.

2. BugCheck?

To set this rollercoaster in motion, I highly recommend checking out this post in which I briefly covered User Space (and Kernel Space to a certain extent) and how EDRs interact with them.

In short, the Windows OS roughly consists of 2 layers, User Space and Kernel Space.

User Space or user land contains the Windows Native API: ntdll.dll, the WIN32 subsystem: kernel32.dll, user32.dll, advapi.dll,... and all the user processes and applications. When applications or processes need more advanced access or control to hardware devices, memory, CPU, etc., they will use ntdll.dll to talk to the Windows kernel.

The functions contained in ntdll.dll will load a number, called “the system service number”, into the EAX register of the CPU and then execute the syscall instruction (x64-bit), which starts the transition to kernel mode while jumping to a predefined routine called the system service dispatcher. The system service dispatcher performs a lookup in the System Service Dispatch Table (SSDT) using the number in the EAX register as an index. The code then jumps to the relevant system service and returns to user mode upon completion of execution.

Kernel Space or kernel land is the bottom layer in between User Space and the hardware and consists of a number of different elements. At the heart of Kernel Space we find ntoskrnl.exe or as we’ll call it: the kernel. This executable houses the most critical OS code, like thread scheduling, interrupt and exception dispatching, and various kernel primitives. It also contains the different managers such as the I/O manager and memory manager. Next to the kernel itself, we find device drivers, which are loadable kernel modules. I will mostly be messing around with these, since they run fully in kernel mode. Apart from the kernel itself and the various drivers, Kernel Space also houses the Hardware Abstraction Layer (HAL), win32k.sys, which mainly handles the User Interface (UI), and various system and subsystem processes (Lsass.exe, Winlogon.exe, Services.exe, etc.), but they’re less relevant in relation to EDRs/AVs.

Opposed to User Space, where every process has its own virtual address space, all code running in Kernel Space shares a single common virtual address space. This means that a kernel-mode driver can overwrite or write to memory belonging to other drivers, or even the kernel itself. When this occurs and results in the driver crashing, the entire operating system will crash.

In 2005, with the first x64-bit edition of Windows XP, Microsoft introduced a new feature called Kernel Patch Protection (KPP), colloquially known as PatchGuard. PatchGuard is responsible for protecting the integrity of the Window kernel, by hashing its critical structures and performing comparisons at random time intervals. When PatchGuard detects a modification, it will immediately Bugcheck the system (KeBugCheck(0x109);), resulting in the infamous Blue Screen Of Death (BSOD) with the message: “CRITICAL_STRUCTURE_CORRUPTION”.

3. A battle on two fronts

The goal of this internship is to develop a kernel driver that will be able to disable, bypass, mislead, or otherwise hinder EDR/AV software on a target. So what exactly is a driver, and why do we need one?

As stated in the Microsoft Documentation, a driver is a software component that lets the operating system and a device communicate with each other. Most of us are familiar with the term “graphics card driver”; we frequently need to update it to support the latest and greatest games. However, not all drivers are tied to a piece of hardware, there is a separate class of drivers called Software Drivers.

Software drivers run in kernel mode and are used to access protected data that is only available in kernel mode, from a user mode application. To understand why we need a driver, we have to look back in time and take into consideration how EDR/AV products work or used to work.

Obligatory disclaimer: I am by no means an expert and a lot of the information used to write this blog post comes from sources which may or may not be trustworthy, complete or accurate.

EDR/AV products have adapted and evolved over time with the increased complexity of exploits and attacks. A common way to detect malicious activity is for the EDR/AV to hook the WIN32 API functions in user land and transfer execution to itself. This way when a process or application calls a WIN32 API function, it will pass through the EDR/AV so it can be inspected and either allowed, or terminated. Malware authors bypassed this hooking method by directly using the underlying Windows Native API (ntdll.dll) functions instead, leaving the WIN32 API functions mostly untouched. Naturally, the EDR/AV products adapted, and started hooking the Windows Native API functions. Malware authors have used several methods to circumvent these hooks, using techniques such as direct syscalls, unhooking and more. I recommend checking out A tale of EDR bypass methods by @ShitSecure (S3cur3Th1sSh1t).

When the battle could no longer be fought in user land (since Windows Native API is the lowest level), it transitioned into kernel land. Instead of hooking the Native API functions, EDR/AV started patching the System Service Dispatch Table (SSDT). Sounds familiar? When execution from ntdll.dll is transitioned to the system service dispatcher, the lookup in the SSDT will yield a memory address belonging to a EDR/AV function instead of the original system service. This practice of patching the SSDT is risky at best, because it affects the entire operating system and if something goes wrong it will result in a crash.

With the introduction of PatchGuard (KPP), Microsoft made an end to patching SSDT in x64-bit versions of Windows (x86 is unaffected) and instead introduced a new feature called Kernel Callbacks. A driver can register a callback for a certain action. When this action is performed, the driver will receive either a pre- or post-action notification.

EDR/AV products make heavy use of these callbacks to perform their inspections. A good example would be the PsSetCreateProcessNotifyRoutine() callback:

When a user application wants to spawn a new process, it will call the CreateProcessW() function in kernel32.dll, which will then trigger the create process callback, letting the kernel know a new process is about to be created.
Meanwhile the EDR/AV driver has implemented the PsSetCreateProcessNotifyRoutine() callback and assigned one of its functions (0xFA7F) to that callback.
The kernel registers the EDR/AV driver function address (0xFA7F) in the callback array.
The kernel receives the process creation callback from CreateProcessW() and sends a notification to all the registered drivers in the callback array.
The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
The EDR/AV driver function (0xFA7F) instructs the EDR/AV application running in user land to inject into the User Application’s virtual address space and hook ntdll.dll to transfer execution to itself.

With EDR/AV products transitioning to kernel space, malware authors had to follow suit and bring their own kernel driver to get back on equal footing. The job of the malicious driver is fairly straight forward: eliminate the kernel callbacks to the EDR/AV driver. So how can this be achieved?

An evil application in user space is aware we want to run Mimikatz.exe, a well known tool to extract plaintext passwords, hashes, PIN codes and Kerberos tickets from memory.
The evil application instructs the evil driver to disable the EDR/AV product.
The evil driver will first locate and read the callback array and then patch any entries belonging to EDR/AV drivers by replacing the first instruction in their callback function (0xFA7F) with a return RET (0xC3) instruction.
Mimikatz.exe can now run and will call ReadProcessMemory(), which will trigger a callback.
The kernel receives the callback and sends a notification to all the registered drivers in the callback array.
The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
The EDR/AV driver function (0xFA7F) executes the RET (0xC3) instruction and immediately returns.
Execution resumes with ReadProcessMemory(), which will call NtReadVirtualMemory(), which in turn will execute the syscall and transition into kernel mode to read the lsass.exe process memory.

4. Don’t reinvent the wheel

Armed with all this knowledge, I set out to put the theory into practice. I stumbled upon Windows Kernel Ps Callback Experiments by @fdiskyou which explains in depth how he wrote his own evil driver and evilcli user application to disable EDR/AV as explained above. To use the project you need Visual Studio 2019 and the latest Windows SDK and WDK.

I also set up two virtual machines configured for remote kernel debugging with WinDbg

Windows 10 build 19042
Windows 11 build 21996

With the following options enabled:

bcdedit /set TESTSIGNING ON
bcdedit /debug on
bcdedit /dbgsettings serial debugport:2 baudrate:115200
bcdedit /set hypervisorlaunchtype off

To compile and build the driver project, I had to make a few modifications. First the build target should be Debug – x64. Next I converted the current driver into a primitive driver by modifying the evil.inf file to meet the new requirements.

;
; evil.inf
;

[Version]
Signature="$WINDOWS NT$"
Class=System
ClassGuid={4d36e97d-e325-11ce-bfc1-08002be10318}
Provider=%ManufacturerName%
DriverVer=
CatalogFile=evil.cat
PnpLockDown=1

[DestinationDirs]
DefaultDestDir = 12


[SourceDisksNames]
1 = %DiskName%,,,""

[SourceDisksFiles]


[DefaultInstall.ntamd64]

[Standard.NT$ARCH$]


[Strings]
ManufacturerName="<Your manufacturer name>" ;TODO: Replace with your manufacturer name
ClassName=""
DiskName="evil Source Disk"

Once the driver compiled and got signed with a test certificate, I installed it on my Windows 10 VM with WinDbg remotely attached. To see kernel debug messages in WinDbg I updated the default mask to 8: kd> ed Kd_Default_Mask 8.

sc create evil type= kernel binPath= C:\Users\Cerbersec\Desktop\driver\evil.sys
sc start evil

Using the evilcli.exe application with the -l flag, I can list all the registered callback routines from the callback array for process creation and thread creation. When I first tried this I immediately bluescreened with the message “Page Fault in Non-Paged Area”.

5. The mystery of 3 bytes

This BSOD message is telling me I’m trying to access non-committed memory, which is an immediate bugcheck. The reason this happened has to do with Windows versioning and the way we find the callback array in memory.

Locating the callback array in memory by hand is a trivial task and can be done with WinDbg or any other kernel debugger. First we disassemble the PsSetCreateProcessNotifyRoutine() function and look for the first CALL (0xE8) instruction.

Next we disassemble the PspSetCreateProcessNotifyRoutine() function until we find a LEA (0x4C 0x8D 0x2D) (load effective address) instruction.

Then we can inspect the memory address that LEA puts in the r13 register. This is the callback array in memory.

To view the different drivers in the callback array, we need to perform a logical AND operation with the address in the callback array and 0xFFFFFFFFFFFFFFF8.

The driver roughly follows the same method to locate the callback array in memory; by calculating offsets to the instructions we looked for manually, relative to the PsSetCreateProcessNotifyRoutine() function base address, which we obtain using the MmGetSystemRoutineAddress() function.

ULONG64 FindPspCreateProcessNotifyRoutine()
{
	LONG OffsetAddr = 0;
	ULONG64	i = 0;
	ULONG64 pCheckArea = 0;
	UNICODE_STRING unstrFunc;

	RtlInitUnicodeString(&unstrFunc, L"PsSetCreateProcessNotifyRoutine");
    //obtain the PsSetCreateProcessNotifyRoutine() function base address
	pCheckArea = (ULONG64)MmGetSystemRoutineAddress(&unstrFunc);
	KdPrint(("[+] PsSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));

    //loop though the base address + 20 bytes and search for the right OPCODE (instruction)
    //we're looking for 0xE8 OPCODE which is the CALL instruction
	for (i = pCheckArea; i < pCheckArea + 20; i++)
	{
		if ((*(PUCHAR)i == OPCODE_PSP[g_WindowsIndex]))
		{
			OffsetAddr = 0;

			//copy 4 bytes after CALL (0xE8) instruction, the 4 bytes contain the relative offset to the PspSetCreateProcessNotifyRoutine() function address
			memcpy(&OffsetAddr, (PUCHAR)(i + 1), 4);
			pCheckArea = pCheckArea + (i - pCheckArea) + OffsetAddr + 5;

			break;
		}
	}

	KdPrint(("[+] PspSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));
	
    //loop through the PspSetCreateProcessNotifyRoutine base address + 0xFF bytes and search for the right OPCODES (instructions)
    //we're looking for 0x4C 0x8D 0x2D OPCODES which is the LEA, r13 instruction
	for (i = pCheckArea; i < pCheckArea + 0xff; i++)
	{
		if (*(PUCHAR)i == OPCODE_LEA_R13_1[g_WindowsIndex] && *(PUCHAR)(i + 1) == OPCODE_LEA_R13_2[g_WindowsIndex] && *(PUCHAR)(i + 2) == OPCODE_LEA_R13_3[g_WindowsIndex])
		{
			OffsetAddr = 0;

            //copy 4 bytes after LEA, r13 (0x4C 0x8D 0x2D) instruction
			memcpy(&OffsetAddr, (PUCHAR)(i + 3), 4);
            //return the relative offset to the callback array
			return OffsetAddr + 7 + i;
		}
	}

	KdPrint(("[+] Returning from CreateProcessNotifyRoutine \n"));
	return 0;
}

The takeaways here are the OPCODE_*[g_WindowsIndex] constructions, where OPCODE_*[g_WindowsIndex] are defined as:

UCHAR OPCODE_PSP[]	 = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8 };
//process callbacks
UCHAR OPCODE_LEA_R13_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c };
UCHAR OPCODE_LEA_R13_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_R13_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d };
// thread callbacks
UCHAR OPCODE_LEA_RCX_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x48, 0x48, 0x48, 0x48, 0x48 };
UCHAR OPCODE_LEA_RCX_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_RCX_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d };

And g_WindowsIndex acts as an index based on the Windows build number of the machine (osVersionInfo.dwBuildNumer).

To solve the mystery of the BSOD, I compared debug output with manual calculations and found out that my driver had been looking for the 0x00 OPCODE instead of the 0xE8 (CALL) OPCODE to obtain the base address of the PspSetCreateProcessNotifyRoutine() function. The first 0x00 OPCODE it finds is located at a 3 byte offset from the 0xE8 OPCODE, resulting in an invalid offset being copied by the memcpy() function.

After adjusting the OPCODE array and the function responsible for calculating the index from the Windows build number, the driver worked just fine.

6. Driver vs Anti-Virus

To put the driver to the test, I installed it on my Windows 11 VM together with a reputable anti-virus product. After patching the AV driver callback routines in the callback array, mimikatz.exe was successfully executed.

When returning the AV driver callback routines back to their original state, mimikatz.exe was detected and blocked upon execution.

7. Conclusion

We started this first internship post by looking at User vs Kernel Space and how EDRs interact with them. Since the goal of the internship is to develop a kernel driver to hinder EDR/AV software on a target, we have then discussed the concept of kernel drivers and kernel callbacks and how they are used by security software. As a first practical example, we used evilcli, combined with some BSOD debugging to patch the kernel callbacks used by an AV product and have Mimikatz execute undetected.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

Normal view