Among the variety of penetration testing engagements NCC Group delivers, some – often within the gaming industry – require performing the assignment in a blackbox fashion against an obfuscated binary, and the client’s priorities revolve more around evaluating the strength of their obfuscation against content protection violations, rather than exercising the application’s security boundaries.
The following post aims at providing insight into the tools and methods used to conduct those engagements using real-world examples. While this approach allows for describing techniques employed by actual protections, only a subset of the material can be explicitly listed here (see disclaimer for more information).
Unpacking Phase
When first attempting to analyze a hostile binary, the first step is generally to unpack the actual contents of its sections from runtime memory. The standard way to proceed consists of letting the executable run until the unpacking stub has finished deobfuscating, decompressing and/or deciphering the executable’s sections. The unpacked binary can then be reconstructed, by dumping the recovered sections into a new executable and (usually) rebuilding the imports section from the recovered IAT(Import Address Table).
This can be accomplished in many ways including:
Debugging manually and using plugins such as Scylla to reconstruct the imports section
Python scripting leveraging Windows debugging libraries like winappdbg and executable file format libraries like pefile
Intel Pintools dynamically instrumenting the binary at run-time (JIT instrumentation mode recommended to avoid integrity checks)
Expectedly, these approaches can be thwarted by anti-debug mechanisms and various detection mechanisms which, in turn, can be evaded via more debugger plugins such as ScyllaHide or by implementing various hooks such as those highlighted by ICPin. Finally, the original entry point of the application can usually be identified by its immediate calls to canonical C++ language’s internal initialization functions such as _initterm() and _initterm_e.
While the dynamic method is usually sufficient, the below samples highlight automated implementations that were successfully used via a python script to handle a simple packer that did not require imports rebuilding, and a versatile (albeit slower) dynamic execution engine implementation allowing a more granular approach, fit to uncover specific behaviors.
Control Flow Flattening
Once unpacked, the binary under investigation exposes a number of functions obfuscated using control flow graph (CFG) flattening, a variety of antidebug mechanisms, and integrity checks. Those can be identified as a preliminary step by running instrumented under ICPin (sample output below).
Overview
When disassembled, the CFG of each obfuscated function exhibits the pattern below: a state variable has been added to the original flow, which gets initialized in the function prologue and the branching structure has been replaced by a loop of pointer table-based dispatchers (highlighted in white).
Each dispatch loop level contains between 2 and 16 indirect jumps to basic blocks (BBLs) actually implementing the function’s logic.
There are a number of ways to approach this problem, but the CFG flattening implemented here can be handled using a fully symbolic approach that does not require a dynamic engine, nor a real memory context. The first step is, for each function, to identify the loop using a loop-matching algorithm, then run a symbolic engine through it, iterating over all the possible index values and building an index-to-offset map, with the original function’s logic implemented within the BBL-chains located between the blocks belonging to the loop:
Real Destination(s) Recovery
The following steps consist of leveraging the index-to-offset map to reconnect these BBL-chains with each other, and recreate the original control-flow graph. As can be seen in the captures below, the value of the state variable is set using instruction-level obfuscation. Some BBL-chains only bear a static possible destination which can be swiftly evaluated.
For dynamic-destination BBL-chains, once the register used as a state variable has been identified, the next step is to identify the determinant symbols, i.e, the registers and memory locations (globals or local variables) that affect the value of the state register when re-entering the dispatch loop.
This can be accomplished by computing the intermediate language representation (IR) of the assembly flow graph (or BBLs) and building a dependency graph from it. Here we are taking advantage of a limitation of the obfuscator: the determinants for multi-destination BBLs are always contained within the BBL subgraph formed between two dispatchers.
With those determinants identified, the task that remains is to identify what condition these determinants are fulfilling, as well as what destinations in code we jump to once the condition has been evaluated. The Z3 SMT solver from Microsoft is traditionally used around dynamic symbolic engines (DSE) as a means to finding input values leading to new paths. Here, the deobfusactor uses its capabilities to identify the type of comparison the instructions are replacing.
For example, for the equal pattern, the code asks Z3 if 2 valid destination indexes (D1 and D2) exist such that:
If the determinants are equal, the value of the state register is equal to D1
If the determinants are different, the value of the state register is equal to D2
Finally, the corresponding instruction can be assembled and patched into the assembly, replacing the identified patterns with equivalent assembly sequences such as the ones below, where
mod0 and mod1 are the identified determinants
#SREG is the state register, now free to be repurposed to store the value of one of the determinants (which may be stored in memory):
#OFFSET0 is the offset corresponding to the destination index if the tested condition is true
#OFFSET1 is the offset corresponding to the destination index if the tested condition is false
The resulting CFG, since every original block has been reattached directly to its real target(s), effectively separates the dispatch loop from the significant BBLs. Below is the result of this first pass against a sample function:
This approach does not aim at handling all possible theoretical cases; it takes advantage of the fact that the obfuscator only transforms a small set of arithmetic operations.
Integrity Check Removal
Once the flow graph has been unflattened, the next step is to remove the integrity checks. These can mostly be identified using a simple graph matching algorithm (using Miasm’s “MatchGraphJoker” expressions) which also constitutes a weakness in the obfuscator. In order to account for some corner cases, the detection logic implemented here involves symbolically executing the identified loop candidates, and recording their reads against the .text section in order to provide a robust identification.
On the above graph, the hash verification flow is highlighted in yellow and the failure case (in this case, sending the execution to an address with invalid instructions) in red. Once the loop has been positively identified, the script simply links the green basic blocks to remove the hash check entirely.
“Dead” Instructions Removal
The resulting assembly is unflattened, and does not include the integrity checks anymore, but still includes a number of “dead” instructions which do not have any effect on the function’s logic and can be removed. For example, in the sample below, the value of EAX is not accessed between its first assignment and its subsequent ones. Consequently, the first assignment of EAX, regardless of the path taken, can be safely removed without altering the function’s logic.
Using a dependency graph (depgraph) again, but this time, keeping a map of ASM <-> IR (one-to-many), the following pass removes the assembly instructions for which the depgraph has determined all corresponding IRs are non-performative.
Finally, the framework-provided simplifications, such as bbl-merger can be applied automatically to each block bearing a single successor, provided the successor only has a single predecessor. The error paths can also be identified and “cauterized”, which should be a no-op since they should never be executed but smoothen the rebuilding of the executable.
A Note On Antidebug Mechanisms
While a number of canonical anti-debug techniques were identified in the samples; only a few will be covered here as the techniques are well-known and can be largely ignored.
PEB->isBeingDebugged
In the example below, the function checks the PEB for isBeingDebugged (offset 0x2) and send the execution into a stack-mangling loop before continuing execution which is leads to a certain crash, obfuscating context from a naive debugging attempt.
Debug Interrupts
Another mechanism involves debug software interrupts and vectored exception handlers, but is rendered easily comprehensible once the function has been processed. The code first sets two local variables to pseudorandom constant values, then registers a vectored exception handler via a call to AddVectoredExceptionHandler. An INT 0x3 (debug interrupt) instruction is then executed (via the indirect call to ISSUE_INT3_FN), but encoded using the long form of the instruction: 0xCD 0x03.
After executing the INT 0x3 instruction, the code flow is resumed in the exception handler as can be seen below.
If the exception code from the EXCEPTION_RECORD structure is a debug breakpoint, a bitwise NOT is applied to one of the constants stored on stack. Additionally, the Windows interrupt handler handles every debug exception assuming they stemmed from executing the short version of the instruction (0xCC), so were a debugger to intercept the exception, those two elements need to be taken into consideration in order for execution to continue normally.
Upon continuing execution, a small arithmetic operation checks that the addition of one of the initially set constants (0x8A7B7A99) and a third one (0x60D7B571) is equal to the bitwise NOT of the second initial constant (0x14ACCFF5), which is the operation performed by the exception handler.
A variant using the same exception handler operates in a very similar manner, substituting the debug exception with an access violation triggered via allocating a guard page and accessing it (this behavior is also flagged by ICPin).
Rebuilding The Executable
Once all the passes have been applied to all the obfuscated functions, the patches can be recorded, then applied to a free area of the new executable, and a JUMP is inserted at the function’s original offset.
Example of a function before and after deobfuscation:
Obfuscator’s Integrity Checking Internals
It is generally unnecessary to dig into the details of an obfuscator’s integrity checking mechanism; most times, as described in the previous example, identifying its location or expected result is sufficient to disable it. However, this provides a good opportunity to demonstrate the use of a DSE to address an obfuscator’s internals – theoretically its most hardened part.
ICPin output immediately highlights a number of code locations performing incremental reads on addresses in the executable’s .text section. Some manual investigation of these code locations points us to the spot where a function call or branching instruction switches to the obfuscated execution flow. However, there are no clearly defined function frames and the entire set of executed instructions is too large to display in IDA.
In order to get a sense of the execution flow, a simple jitter callback can be used to gather all the executed blocks as the engine runs through the code. Looking at the discovered blocks, it becomes apparent that the code uses conditional instructions to alter the return address on the stack, and hides its real destination with opaque predicates and obfuscated logic.
Starting with that information, it would be possible to take a similar approach as in the previous example and thoroughly rebuild the IR CFG, apply simplifications, and recompile the new assembly using LLVM. However, in this instance, armed with the knowledge that this obfuscated code implements an integrity check, it is advantageous to leverage the capabilities of a DSE.
A CFG of the obfuscated flow can still be roughly computed, by recording every block executed and adding edges based on the tracked destinations. The stock simplifications and SSA form can be used to obtain a graph of the general shape below:
Deciphering The Data Blobs
On a first run attempt, one can observe 8-byte reads from blobs located in two separate memory locations in the .text section, which are then processed through a loop (also conveniently identified by the tracking engine). With the memX symbols representing constants in memory, and blob0 representing the sequentially read input from a 32bit ciphertext blob, the symbolic values extracted from the blobs look as follows, looping 32 times:
Consequently, the 128-bit key can be trivially recovered from the remaining memory locations identified by the symbolic engine.
Extracting The Offset Ranges
With the decryption cipher identified, the next step is to reverse the logic of computing ranges of memory to be hashed. Here again, the memory tracking execution engine proves useful and provides two data points of interest: – The binary is not hashed in a continuous way; rather, 8-byte offsets are regularly skipped – A memory region is iteratively accessed before each hashing
Using a DSE such as this one, symbolizing the first two bytes of the memory region and letting it run all the way to the address of the instruction that reads memory, we obtain the output below (edited for clarity):
The accessed memory’s symbolic addresses alone provide a clear hint at the encoding: only 7 of the bits of each symbolized byte are used to compute the address. Looking further into the accesses, the second byte is only used if the first byte’s most significant bit is not set, which tracks with a simple unsigned integer base-128 compression. Essentially, the algorithm reads one byte at a time, using 7 bits for data, and using the last bit to indicate whether one or more byte should be read to compute the final value.
Identifying The Hashing Algorithm
In order to establish whether the integrity checking implements a known hashing algorithm, despite the static disassembly showing no sign of known constants, a memory tracking symbolic execution engine can be used to investigate one level deeper. Early in the execution (running the obfuscated code in its entirety may take a long time), one can observe the following pattern, revealing well-known SHA1 constants.
Examining the relevant code addresses (as seen in the SSA notation below), it becomes evident that, in order to compute the necessary hash constants, a simple XOR instruction is used with two otherwise meaningless constants, rendering algorithm identification less obvious from static analysis alone.
And the expected SHA1 constants are stored on the stack:
Additionally, the SHA1 algorithm steps can be further observed in the SSA graph, such as the ROTL-5 and ROTL-30 operations, plainly visible in the IL below.
Final Results
The entire integrity checking logic recovered from the obfuscator implemented in Python below was verified to produce the same digest, as when running under the debugger, or a straightforward LLVM jitter. The parse_ranges() function handles the encoding, while the accumulate_bytes() generator handles the deciphering and processing of both range blobs and skipped offset blobs.
Once the hashing of the memory ranges dictated by the offset table has completed, the 64bit values located at the offsets deciphered from the second blob are subsequently hashed. Finally, once the computed hash value has been successfully compared to the valid digest stored within the RWX .text section of the executable, the execution flow is deemed secure and the obfuscator proceeds to decipher protected functions within the .text section.
def parse_ranges(table):
ranges = []
rangevals = []
tmp = []
for byte in table:
tmp.append(byte)
if not byte&0x80:
val = 0
for i,b in enumerate(tmp):
val |= (b&0x7F)<<(7*i)
rangevals.append(val)
tmp = [] # reset
offset = 0
for p in [(rangevals[i], rangevals[i+1]) for i in range(0, len(rangevals), 2)]:
offset += p[0]
if offset == 0xFFFFFFFF:
break
ranges.append((p[0], p[1]))
offset += p[1]
return ranges
def accumulate_bytes(r, s):
# TEA Key is 128 bits
dw6 = 0xF866ED75
dw7 = 0x31CFE1EF
dw4 = 0x1955A6A0
dw5 = 0x9880128B
key = struct.pack('IIII', dw6, dw7, dw4, dw5)
# Decipher ranges plaintext
ranges_blob = pe[pe.virt2off(r[0]):pe.virt2off(r[0])+r[1]]
ranges = parse_ranges(Tea(key).decrypt(ranges_blob))
# Decipher skipped offsets plaintext (8bytes long)
skipped_blob = pe[pe.virt2off(s[0]):pe.virt2off(s[0])+s[1]]
skipped_decrypted = Tea(key).decrypt(skipped_blob)
skipped = sorted( \
[int.from_bytes(skipped_decrypted[i:i+4], byteorder='little', signed=False) \
for i in range(0, len(skipped_decrypted), 4)][:-2:2] \
)
skipped_copy = skipped.copy()
next_skipped = skipped.pop(0)
current = 0x0
for rr in ranges:
current += rr[0]
size = rr[1]
# Get the next 8 bytes to skip
while size and next_skipped and next_skipped = 0
yield blob
current = next_skipped+8
next_skipped = skipped.pop(0) if skipped else None
blob = pe[pe.rva2off(current):pe.rva2off(current)+size]
yield blob
current += len(blob)
# Append the initially skipped offsets
yield b''.join(pe[pe.rva2off(rva):pe.rva2off(rva)+0x8] for rva in skipped_copy)
return
def main():
global pe
hashvalue = hashlib.sha1()
hashvalue.update(b'\x7B\x0A\x97\x43')
with open(argv[1], "rb") as f:
pe = PE(f.read())
accumulator = accumulate_bytes((0x140A85B51, 0xFCBCF), (0x1409D7731, 0x12EC8))
# Get all hashed bytes
for blob in accumulator:
hashvalue.update(blob)
print(f'SHA1 FINAL: {hashvalue.hexdigest()}')
return
Disclaimer
None of the samples used in this publication were part of an NCC Group engagement. They were selected from publicly available binaries whose obfuscators exhibited features similar to previously encountered ones.
Due to the nature of this material, specific content had to be redacted, and a number of tools that were created as part of this effort could not be shared publicly.
Despite these limitations, the author hopes the technical content shared here is sufficient to provide the reader with a stimulating read.
Vendor: McAfee
Vendor URL: https://kc.mcafee.com/corporate/index?page=content&id=sb10361
Versions affected: Prior to 7.3.0 HF1
Systems Affected: Windows OSs without NULL page protection
Author: Balazs Bucsay <balazs.bucsay[ at ]nccgroup[.dot.]com> @xoreipeip
CVE Identifier: CVE-2021-23893
Risk: 8.8 - CWE-269: Improper Privilege Management
Summary
McAfee’s Complete Data Protection package contained the Drive Encryption (DE) software. This software was used to transparently encrypt the drive contents. The versions prior to 7.3.0 HF1 had a vulnerability in the kernel driver MfeEpePC.sys that could be exploited on certain Windows systems for privilege escalation or DoS.
Impact
Privilege Escalation vulnerability in a Windows system driver of McAfee Drive Encryption (DE) prior to 7.3.0 could allow a local non-admin user to gain elevated system privileges via exploiting an unutilized memory buffer.
Details
The Drive Encryption software’s kernel driver was loaded to the kernel at boot time and certain IOCTLs were available for low-privileged users.
One of the available IOCTL was referencing an event that was set to NULL before initialization. In case the IOCTL was called at the right time, the procedure used NULL as an event and referenced the non-existing structure on the NULL page.
If the user mapped the NULL page and created a fake structure there that mimicked a real Even structure, it was possible to manipulate certain regions of the memory and eventually execute code in the kernel.
Recommendation
Install or update Disk Encryption 7.3.0 HF1, which has this vulnerability fixed.
Vendor Communication
February 24, 2021: Vulnerability was reported to McAfee
March 9, 2021: McAfee was able to reproduce the crash with the originally provided DoS exploit
October 1, 2021: McAfee released the new version of DE, which fixes the issue
Acknowledgements
Thanks to the Cedric Halbronn for his support during the development of the exploit.
About NCC Group
NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.
An overview of the vulnerability assigned CVE-2021-31956 (NTFS Paged Pool Memory corruption) and how to trigger
An introduction into the Windows Notification Framework (WNF) from an exploitation perspective
Exploit primitives which can be built using WNF
In this article I aim to build on that previous knowledge and cover the following areas:
Exploitation without the CVE-2021-31955 information disclosure
Enabling better exploit primitives through PreviousMode
Reliability, stability and exploit clean-up
Thoughts on detection
The version targeted within this blog was Windows 10 20H2 (OS Build 19042.508). However, this approach has been tested on all Windows versions post 19H1 when the segment pool was introduced.
Exploitation without CVE-2021-31955 information disclosure
I hinted in the previous blog post that this vulnerability could likely be exploited without the usage of the separate EPROCESS address leak vulnerability CVE-2021-31955). This was also realised too by Yan ZiShuang and documented within the blog post.
Typically, for Windows local privilege escalation, once an attacker has achieved arbitrary write or kernel code execution then the aim will be to escalate privileges for their associated userland process or pan a privileged command shell. Windows processes have an associated kernel structure called _EPROCESS which acts as the process object for that process. Within this structure, there is a Token member which represents the process’s security context and contains things such as the token privileges, token types, session id etc.
CVE-2021-31955 lead to an information disclosure of the address of the _EPROCESS for each running process on the system and was understood to be used by the in-the-wild attacks found by Kaspersky. However, in practice for exploitation of CVE-2021-31956 this separate vulnerability is not needed.
This is due to the _EPROCESS pointer being contained within the _WNF_NAME_INSTANCE as the CreatorProcess member:
Therefore, provided that it is possible to get a relative read/write primitive using a _WNF_STATE_DATA to be able to read and{write to a subsequent _WNF_NAME_INSTANCE, we can then overwrite the StateData pointer to point at an arbitrary location and also read the CreatorProcess address to obtain the address of the _EPROCESS structure within memory.
The initial pool layout we are aiming is as follows:
The difficulty with this is that due to the low fragmentation heap (LFH) randomisation, it makes reliably achieving this memory layout more difficult and iteration one of this exploit stayed away from the approach until more research was performed into improving the general reliability and reducing the chances of a BSOD.
As an example, under normal scenarios you might end up with the following allocation pattern for a number of sequentially allocated blocks:
In the absense of an LFH "Heap Randomisation" weakness or vulnerability, then this post explains how it is possible to achieve a "reasonably" high level of exploitation success and what necessary cleanups need to occur in order to maintain system stability post exploitation.
Stage 1: The Spray and Overflow
Starting from where we left off in the first article, we need to go back and rework the spray and overflow.
Firstly, our _WNF_NAME_INSTANCE is 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. As mentioned previously this gets put into a chunk of size 0xC0.
We also need to spray _WNF_STATE_DATA objects of size 0xA0 (which when added with the header 0x10 + the POOL_HEADER (0x10) we also end up with a chunk allocated of 0xC0.
As mentioned within part 1 of the article, since we can control the size of the vulnerable allocation we can also ensure that our overflowing NTFS extended attribute chunk is also allocated within the 0xC0 segment.
However, we cannot deterministically know which object will be adjacent to our vulnerable NTFS chunk (as mentioned above), we cannot take a similar approach of free’ing holes as in the past article and then reusing the resulting holes, as both the _WNF_STATE_DATA and _WNF_NAME_INSTANCE objects are allocated at the same time, and we need both present within the same pool segment.
Therefore, we need to be very careful with the overflow. We make sure that only the following fields are overflowed by 0x10 bytes (and the POOL_HEADER).
In the case of a corrupted _WNF_NAME_INSTANCE, both the Header and RunRef members will be overflowed:
As we don’t know if we are going to overflow a _WNF_NAME_INSTANCE or a _WNF_STATE_DATA first, then we can trigger the overflow and check for corruption by loop through querying each _WNF_STATE_DATA using NtQueryWnfStateData.
If we detect corruption, then we know we have identified our _WNF_STATE_DATA object. If not, then we can repeatedly trigger the spray and overflow until we have obtained a _WNF_STATE_DATA object which allows a read/write across the pool subsegment.
There are a few problems with this approach, some which can be addressed and some which there is not a perfect solution for:
We only want to corrupt _WNF_STATE_DATA objects but the pool segment also contains _WNF_NAME_INSTANCE objects due to needing to be the same size. Using only a 0x10 data size overflow and cleaning up afterwards (as described in the Kernel Memory Cleanup section) means that this issue does not cause a problem.
Occasionally our unbounded _WNF_STATA_DATA containing chunk can be allocated within the final block within the pool segment. This means that when querying with NtQueryWnfStateData an unmapped memory read will occur off the end of the page. This rarely happens in practice and increasing the spray size reduces the likelihood of this occurring (see Exploit Testing and Statistics section).
Other operating system functionality may make an allocation within the 0xC0 pool segment and lead to corruption and instability. By performing a large spray size before triggering the overflow, from practical testing, this seems to rarely happen within the test environment.
I think it’s useful to document these challenges with modern memory corruption exploitation techniques where it’s not always possible to gain 100% reliability.
Overall with 1) remediated and 2+3 only occurring very rarely, in lieu of a perfect solution we can move to the next stage.
Stage 2: Locating a _WNF_NAME_INSTANCE and overwriting the StateData pointer
Once we have unbounded our _WNF_STATE_DATA by overflowing the DataSize and AllocatedSize as described above, and within the first blog post, then we can then use the relative read to locate an adjacent _WNF_NAME_INSTANCE.
By scanning through the memory we can locate the pattern "\x03\x09\xa8" which denotes the start of a _WNF_NAME_INSTANCE and from this obtain the interesting member variables.
The CreatorProcess, StateName, StateData, ScopeInstance can be disclosed from the identified target object.
We can then use the relative write to replace the StateData pointer with an arbitrary location which is desired for our read and write primitive. For example, an offset within the _EPROCESS structure based on the address which has been obtained from CreatorProcess.
Care needs to be taken here to ensure that the new location StateData points at overlaps with sane values for the AllocatedSize, DataSize values preceding the data wishing to be read or written.
In this case the aim was to achieve a full arbitrary read and write but without having the constraints of needing to find sane and reliable AllocatedSize and DataSize values prior to the memory which it was desired to write too.
Our overall goal was to target the KTHREAD structure’s PreviousMode member and then make use of make use of the APIs NtReadVirtualMemory and NtWriteVirtualMemory to enable a more flexible arbitrary read and write.
It helps to have a good understanding of how these kernel memory structure are used to understand how this works. In a massively simplified overview, the kernel mode portion of Windows contains a number of subsystems. The hardware abstraction layer (HAL), the executive subsystems and the kernel. _EPROCESS is part of the executive layer which deals with general OS policy and operations. The kernel subsystem handles architecture specific details for low level operations and the HAL provides a abstraction layer to deal with differences between hardware.
Processes and threads are represeted at both the executive and kernel "layer" within kernel memory as _EPROCESS and _KPROCESS and _ETHREAD and _KTHREAD structures respectively.
The documentation on PreviousMode states "When a user-mode application calls the Nt or Zw version of a native system services routine, the system call mechanism traps the calling thread to kernel mode. To indicate that the parameter values originated in user mode, the trap handler for the system call sets the PreviousMode field in the thread object of the caller to UserMode. The native system services routine checks the PreviousMode field of the calling thread to determine whether the parameters are from a user-mode source."
Looking at MiReadWriteVirtualMemory which is called from NtWriteVirtualMemory we can see that if PreviousMode is not set when a user-mode thread executes, then the address validation is skipped and kernel memory space addresses can be written too:
This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.
So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.
From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.
0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080
We can check this correct (and see we hit our breakpoint in the previous article):
This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.
So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.
So we now know how to calculate the address of the `_KTHREAD` kernel data structure which is associated with our running exploit thread.
At the end of stage 2 we have the following memory layout:
Stage 3 – Abusing PreviousMode
Once we have set the StateData pointer of the _WNF_NAME_INSTANCE prior to the _KPROCESSThreadListHead Flink we can leak out the value by confusing it with the DataSize and the ChangeTimestamp, we can then calculate the FLINK as “FLINK = (uintptr_t)ChangeTimestamp << 32 | DataSize` after querying the object.
This allows us to calculate the _KTHREAD address using FLINK - 0x2f8.
Once we have the address of the _KTHREAD we need to again find a sane value to confuse with the AllocatedSize and DataSize to allow reading and writing of PreviousMode value at offset 0x232.
Allowing the most significant word of the Process pointer shown above to be used as the AllocatedSize and the UserAffinity to act as the DataSize. Incidentally, we can actually influence this value used for DataSize using SetProcessAffinityMask or launching the process with start /affinity exploit.exe but for our purposes of being able to read and write PreviousMode this is fine.
Visually this looks as follows after the StateData has been modified:
This gives a 3 byte read (and up to 0xffff900f bytes write if needed – but we only need 3 bytes), of which the PreviousMode is included (i.e set to 1 before modification):
00 00 01 00 00 00 00 00 00 00 | ..........
Using the most significant word of the pointer with it always being a kernel mode address, should ensure that this is a sufficient AllocatedSize to enable overwriting PreviousMode.
Post Exploitation
Once we have set PreviousMode to 0, as mentioned above, this now gives an unconstrained read/write across the whole kernel memory space using NtWriteVirtualMemory and NtReadVirtualMemory. This is a very powerful method and demonstrates how moving from an awkward to use arbitrary read/write to a better method which enables easier post exploitation and enhanced clean up options.
It is then trivial to walk the ActiveProcessLinks within the EPROCESS, obtain a pointer to a SYSTEM token and replace the existing token with this or to perform escalation by overwriting the _SEP_TOKEN_PRIVILEGES for the existing token using techniques which have been long used by Windows exploits.
Kernel Memory Cleanup
OK, so the above is good enough for a proof of concept exploit but due to the potentially large amount of memory writes needing to occur for exploit success, then it could leave the kernel in a bad state. Also, when the process terminates then certain memory locations which have been overwritten could trigger a BSOD when that corrupted memory is used.
This part of the exploitation process is often overlooked by proof of concept exploit writers but is often the most challenging for use in real world scenario’s (red teams / simulated attacks etc) where stability and reliability are important. Going through this process also helps understand how these types of attacks can also be detected.
This section of the blog describes some improvements which can be made in this area.
PreviousMode Restoration
On the version of Windows tested, if we try to launch a new process as SYSTEM but PreviousMode is still set to 0. Then we end up with the following crash:
More research needs to be performed to determine if this is necessary on prior versions or if this was a recently introduced change.
This can be fixed simply by using our NtWriteVirtualMemory APIs to restore the PreviousMode value to 1 before launching the cmd.exe shell.
StateData Pointer Restoration
The _WNF_STATE_DATAStateData pointer is free’d when the _WNF_NAME_INSTANCE is freed on process termination (incidentially also an arbitrary free). If this is not restored to the original value, we will end up with a crash as follows:
Although we could restore this using the WNF relative read/write, as we have arbitrary read and write using the APIs, we can implement a function which uses a previously saved ScopeInstance pointer to search for the StateName of our targeted _WNF_NAME_INSTANCE object address.
Visually this looks as follows:
Some example code for this is:
/*** This function returns back the address of a _WNF_NAME_INSTANCE looked up by its internal StateName* It performs an _RTL_AVL_TREE tree walk against the sorted tree of _WNF_NAME_INSTANCES. * The tree root is at _WNF_SCOPE_INSTANCE+0x38 (NameSet)**/QWORD*FindStateName(unsigned__int64 StateName){ QWORD* i;// _WNF_SCOPE_INSTANCE+0x38 (NameSet)for (i = (QWORD*)read64((char*)BackupScopeInstance+0x38); ; i = (QWORD*)read64((char*)i +0x8)) {while (1) {if (!i)return0;// StateName is 0x18 after the TreeLinks FLINK QWORD CurrStateName = (QWORD)read64((char*)i +0x18);if (StateName >= CurrStateName)break; i = (QWORD*)read64(i); } QWORD CurrStateName = (QWORD)read64((char*)i +0x18);if (StateName <= CurrStateName)break; }return (QWORD*)((QWORD*)i -2);}
Then once we have obtained our _WNF_NAME_INSTANCE we can then restore the original StateData pointer.
RunRef Restoration
The next crash encountered was related to the fact that we may have corrupted many RunRef from _WNF_NAME_INSTANCE‘s in the process of obtaining our unbounded _WNF_STATE_DATA. When ExReleaseRundownProtection is called and an invalid value is present, we will crash as follows:
To restore these correctly we need to think about how these objects fit together in memory and how to obtain a full list of all _WNF_NAME_INSTANCES which could possibly be corrupt.
Within _EPROCESS we have a member WnfContext which is a pointer to a _WNF_PROCESS_CONTEXT.
As you can see there is a member TemporaryNamesListHead which is a linked list of the addresses of the TemporaryNamesListHead within the _WNF_NAME_INSTANCE.
Therefore, we can calculate the address of each of the _WNF_NAME_INSTANCES by iterating through the linked list using our arbitrary read primitives.
We can then determine if the Header or RunRef has been corrupted and restore to a sane value which does not cause a BSOD (i.e. 0).
An example of this is:
/*** This function starts from the EPROCESS WnfContext which points at a _WNF_PROCESS_CONTEXT* The _WNF_PROCESS_CONTEXT contains a TemporaryNamesListHead at 0x40 offset. * This linked list is then traversed to locate all _WNF_NAME_INSTANCES and the header and RunRef fixed up.**/voidFindCorruptedRunRefs(LPVOID wnf_process_context_ptr){// +0x040 TemporaryNamesListHead : _LIST_ENTRY LPVOID first = read64((char*)wnf_process_context_ptr +0x40); LPVOID ptr; for (ptr = read64(read64((char*)wnf_process_context_ptr +0x40)); ; ptr = read64(ptr)) {if (ptr == first) return;// +0x088 TemporaryNameListEntry : _LIST_ENTRY QWORD* nameinstance = (QWORD*)ptr -17; QWORD header = (QWORD)read64(nameinstance);if (header !=0x0000000000A80903) {// Fix the header up. write64(nameinstance, 0x0000000000A80903);// Fix the RunRef up. write64((char*)nameinstance +0x8, 0); } }}
NTOSKRNL Base Address
Whilst this isn’t actually needed by the exploit, I had the need to obtain NTOSKRNL base address to speed up some examinations and debugging of the segment heap. With access to the EPROCESS/KPROCESS or ETHREAD/KTHREAD, then the NTOSKRNL base address can be obtained from the kernel stack. By putting a newly created thread into the wait state, we can then walk the kernel stack for that thread and obtain the return address of a known function. Using this and a fixed offset we can calculate the NTOSKRNL base address. A similar technique was used within KernelForge.
The following output shows the thread whilst in the wait state:
As there are some elements of instability and non-deterministic elements of this exploit, then an exploit testing framework was developed to determine the effectiveness across multiple runs and on multiple different supported platforms and by varying the exploit parameters. Whilst this lab environment is not fully representative of a long-running operating system with potentially other third party drivers etc installed and a more noisy kernel pool, it gives some indication of this approach is feasible and also feeds into possible detection mechanisms.
The key variables which can be modified with this exploit are:
Spray size
Post-exploitation choices
All these are measured over 100 iterations of the exploit (over 5 runs) for a timeout duration of 15 seconds (i.e. a BSOD did not occur within 15 seconds of an execution of the exploit).
SYSTEM shells – Number of times a SYSTEM shell was launched.
Total LFH Writes – For all 100 runs of the exploit, how many corruptions were triggered.
Avg LFH Writes – Average number of LFH overflows needed to obtain a SYSTEM shell.
Failed after 32 – How many times the exploit failed to overflow an adjacent object of the required target type, by reaching the max number of overflow attempts. 32 was chosen a semi-arbitrary value based on empirical testing and the blocks in the BlockBitmap for the LFH being scanned by groups of 32 blocks.
BSODs on exec – Number of times the exploit BSOD the box on execution.
Unmapped Read – Number of times the relative read reaches unmapped memory (ExpWnfReadStateData) – included in the BSOD on exec count above.
Spray Size Variation
The following statistics show runs when varying the spray size.
Spray size 3000
Result
Run 1
Run 2
Run 3
Run 4
Run 5
Avg
SYSTEM shells
85
82
76
75
75
78
Total LFH writes
708
726
707
678
624
688
Avg LFH writes
8
8
9
9
8
8
Failed after 32
1
3
2
1
1
2
BSODs on exec
14
15
22
24
24
20
Unmapped Read
4
5
8
6
10
7
Spray size 6000
Result
Run 1
Run 2
Run 3
Run 4
Run 5
Avg
SYSTEM shells
84
80
78
84
79
81
Total LFH writes
674
643
696
762
706
696
Avg LFH writes
8
8
9
9
8
8
Failed after 32
2
4
3
3
4
3
BSODs on exec
14
16
19
13
17
16
Unmapped Read
2
4
4
5
4
4
Spray size 10000
Result
Run 1
Run 2
Run 3
Run 4
Run 5
Avg
SYSTEM shells
84
85
87
85
86
85
Total LFH writes
805
714
761
688
694
732
Avg LFG writes
9
8
8
8
8
8
Failed after 32
3
5
3
3
3
3
BSODs on exec
13
10
10
12
11
11
Unmapped Read
1
0
1
1
0
1
Spray size 20000
Result
Run 1
Run 2
Run 3
Run 4
Run 5
Avg
SYSTEM shells
89
90
94
90
90
91
Total LFH writes
624
763
657
762
650
691
Avg LFG writes
7
8
7
8
7
7
Failed after 32
3
2
1
2
2
2
BSODs on exec
8
8
5
8
8
7
Unmapped Read
0
0
0
0
1
0
From this was can see that increasing the spray size leads to a much decreased chance of hitting an unmapped read (due to the page not being mapped) and thus reducing the number of BSODs.
On average, the number of overflows needed to obtain the correct memory layout stayed roughly the same regardless of spray size.
Post Exploitation Method Variation
I also experimented with the post exploitation method used (token stealing vs modifying the existing token). The reason for this is that performing the token stealing method there are more kernel reads/writes and a longer time duration between reverting PreviousMode.
20000 spray size
With all the _SEP_TOKEN_PRIVILEGES enabled:
Result
Run 1
Run 2
Run 3
Run 4
Run 5
Avg
PRIV shells
94
92
93
92
89
92
Total LFH writes
939
825
825
788
724
820
Avg LFG writes
9
8
8
8
8
8
Failed after 32
2
2
1
2
0
1
BSODs on exec
4
6
6
6
11
6
Unmapped Read
0
1
1
2
2
1
Therefore, there is only negligible difference these two methods.
Detection
After all of this is there anything we have learned which could help defenders?
Well firstly there is a patch out for this vulnerability since the 8th of June 2021. If your reading this and the patch is not applied, then there are obviously bigger problems with the patch management lifecycle to focus on
However, there are some engineering insights which can be gained from this and in general detecting memory corruption exploits within the wild. I will focus specifically on the vulnerability itself and this exploit, rather than the more generic post exploitation technique detection (token stealing etc) which have been covered in many online articles. As I never had access to the in the wild exploit, these detection mechanisms may not be useful for that scenario. Regardless, this research should allow security researchers a greater understanding in this area.
The main artifacts from this exploit are:
NTFS Extended Attributes being created and queried.
WNF objects being created (as part of the spray)
Failed exploit attempts leading to BSODs
NTFS Extended Attributes
Firstly, examining the ETW framework for Windows, the provider Microsoft-Windows-Kernel-File was found to expose "SetEa" and "QueryEa" events.
This can be captured as part of an ETW trace:
As this vulnerability can be exploited a low integrity (and thus from a sandbox), then the detection mechanisms would vary based on if an attacker had local code execution or chained it together with a browser exploit.
One idea for endpoint detection and response (EDR) based detection would be that a browser render process executing both of these actions (in the case of using this exploit to break out of a browser sandbox) would warrant deeper investigation. For example, whilst loading a new tab and web page, the browser process "MicrosoftEdge.exe" triggers these events legitimately under normal operation, whereas the sandboxed renderer process "MicrosoftEdgeCP.exe" does not. Chrome while loading a new tab and web page did not trigger either of the events too. I didn’t explore too deeply if there were any render operations which could trigger this non-maliciously but provides a place where defenders can explore further.
WNF Operations
The second area investigated was to determine if there were any ETW events produced by WNF based operations. Looking through the "Microsoft-Windows-Kernel-*" providers I could not find any related events which would help in this area. Therefore, detecting the spray through any ETW logging of WNF operations did not seem feasible. This was expected due to the WNF subsystem not being intended for use by non-MS code.
Crash Dump Telemetry
Crash Dumps are a very good way to detect unreliable exploitation techniques or if an exploit developer has inadvertently left their development system connected to a network. MS08-067 is a well known example of Microsoft using this to identify an 0day from their WER telemetry. This was found by looking for shellcode, however, certain crashes are pretty suspicious when coming from production releases. Apple also seem to have added telemetry to iMessage for suspicious crashes too.
In the case of this specific vulnerability when being exploited with WNF, there is a slim chance (approx. <5%) that the following BSOD can occur which could act a detection artefact:
Under normal operation you would not expect a memcpy operation to fault accessing unmapped memory when triggered by the WNF subsystem. Whilst this telemetry might lead to attack attempts being discovered prior to an attacker obtaining code execution. Once kernel code execution has been gained or SYSTEM, they may just disable the telemetry or sanitise it afterwards – especially in cases where there could be system instability post exploitation. Windows 11 looks to have added additional ETW logging with these policy settings to determine scenarios when this is modified:
This article demonstrates some of the further lengths an exploit developer needs to go to achieve more reliable and stable code execution beyond a simple POC.
At this point we now have an exploit which is much more succesful and less likely to cause instability on the target system than a simple POC. However, we can only get about 90%~ success rate due to the techniques used. This seems to be about the limit with this approach and without using alternative exploit primitives. The article also gives some examples of potential ways to identify exploitation of this vulnerability and detection of memory corruption exploits in general.
Acknowledgements
Boris Larin, for discovering this 0day being exploited within the wild and the initial write-up.
Yan ZiShuang, for performing parallel research into exploitation of this vuln and blogging about it.
Recently I decided to take a look at CVE-2021-31956, a local privilege escalation within Windows due to a kernel memory corruption bug which was patched within the June 2021 Patch Tuesday.
Microsoft describe the vulnerability within their advisory document, which notes many versions of Windows being affected and in-the-wild exploitation of the issue being used in targeted attacks. The exploit was found in the wild by https://twitter.com/oct0xor of Kaspersky.
Kaspersky produced a nice summary of the vulnerability and describe briefly how the bug was exploited in the wild.
As I did not have access to the exploit (unlike Kaspersky?), I attempted to exploit this vulnerability on Windows 10 20H2 to determine the ease of exploitation and to understand the challenges attackers face when writing a modern kernel pool exploits for Windows 10 20H2 and onwards.
One thing that stood out to me was the mention of the Windows Notification Framework (WNF) used by the in-the-wild attackers to enable novel exploit primitives. This lead to further investigation into how this could be used to aid exploitation in general. The findings I present below are obviously speculation based on likely uses of WNF by an attacker. I look forward to seeing the Kaspersky write-up to determine if my assumptions on how this feature could be leveraged are correct!
This blog post is the first in the series and will describe the vulnerability, the initial constraints from an exploit development perspective and finally how WNF can be abused to obtain a number of exploit primitives. The blogs will also cover exploit mitigation challenges encountered along the way, which make writing modern pool exploits more difficult on the most recent versions of Windows.
Future blog posts will describe improvements which can be made to an exploit to enhance reliability, stability and clean-up afterwards.
Vulnerability Summary
As there was already a nice summary produced by Kaspersky it was trivial to locate the vulnerable code inside the ntfs.sys driver’s NtfsQueryEaUserEaList function:
Basically the code above loops through each NTFS extended attribute (Ea) for a file and copies from the Ea Block into the output buffer based on the size of ea_block->EaValueLength + ea_block->EaNameLength + 9.
There is a check to ensure that the ea_block_size is less than or equal to out_buf_length - padding.
The out_buf_length is then decremented by the size of the ea_block_size and its padding.
The padding is calculated by ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;
This is because each Ea Block should be padded to be 32-bit aligned.
Putting some example numbers into this, lets assume the following: There are two extended attributes within the extended attributes for the file.
At the first iteration of the loop we could have the following values:
By looking at the callers for NtfsCommonQueryEa we can see that we can see that NtQueryEaFile system call path triggers this code path to reach the vulnerable code.
The documentation for the Zw version of this syscall function is here.
We can see that the output buffer Buffer is passed in from userspace, together with the Length of this buffer. This means we end up with a controlled size allocation in the kernel space based on the size of the buffer. However, to trigger this vulnerability, we need to trigger an underflow as described as above.
In order to do trigger the underflow, we need to set our output buffer size to be length of the first Ea Block.
Providing we are padding the allocation, the second Ea Block will be written out of bounds of the buffer when the second Ea Block is queried.
The interesting things from this vulnerability from an attacker perspective are:
1) The attacker can control the data which is used within the overflow and the size of the overflow. Extended attribute values do not constrain the values which they can contain. 2) The overflow is linear and will corrupt any adjacent pool chunks. 3) The attacker has control over the size of the pool chunk allocated.
However, the question is can this be exploited reliably in the presence of modern kernel pool mitigations and is this a “good” memory corruption:
So how do we construct a file containing NTFS extended attributes which will lead to the vulnerability being triggered when NtQueryEaFile is called?
The function NtSetEaFile has the Zw version documented here.
The Buffer parameter here is “a pointer to a caller-supplied, FILE_FULL_EA_INFORMATION-structured input buffer that contains the extended attribute values to be set”.
Therefore, using the values above, the first extended attribute occupies the space within the buffer between 0-18.
There is then the padding length of 2, with the second extended attribute starting at 20 offset.
The key thing here is that NextEntryOffset of the first EA block is set to the offset of the overflowing EA including the padding position (20). Then for the overflowing EA block the NextEntryOffset is set to 0 to end the chain of extended attributes being set.
This means constructing two extended attributes, where the first extended attribute block is the size in which we want to allocate our vulnerable buffer (minus the pool header). The second extended attribute block is set to the overflow data.
If we set our first extended attribute block to be exactly the size of the Length parameter passed in NtQueryEaFile then, provided there is padding, the check will be underflowed and the second extended attribute block will allow copy of an attacker-controlled size.
So in summary, once the extended attributes have been written to the file using NtSetEaFile. It is then necessary to trigger the vulnerable code path to act on them by setting the outbuffer size to be exactly the same size as our first extended attribute using NtQueryEaFile.
Understanding the kernel pool layout on Windows 10
The next thing we need to understand is how kernel pool memory works. There is plenty of older material on kernel pool exploitation on older versions of Windows, however, not very much on recent versions of Windows 10 (19H1 and up). There has been significant changes with bringing userland Segment Heap concepts to the Windows kernel pool. I highly recommend reading Scoop the Windows 10 Pool! by Corentin Bayet and Paul Fariello from Synacktiv for a brilliant paper on this and proposing some initial techniques. Without this paper being published already, exploitation of this issue would have been significantly harder.
Firstly the important thing to understand is to determine where in memory the vulnerable pool chunk is allocated and what the surrounding memory looks like. We determine what heap structure in which the chunk lives on from the four “backends”:
Low Fragmentation Heap (LFH)
Variable Size Heap (VS)
Segment Allocation
Large Alloc
I started off using the NtQueryEaFile parameter Length value above of 0x12 to end up with a vulnerable chunk of sized 0x30 allocated on the LFH as follows:
This is due to the size of the allocation fitting being below 0x200.
We can step through the corruption of the adjacent chunk occurring by settings a conditional breakpoint on the following location:
bp Ntfs!NtfsQueryEaUserEaList "j @r12 != 0x180 & @r12 != 0x10c & @r12 != 0x40 '';'gc'" then breakpointing on the memcpy location.
This example ignores some common sizes which are often hit on 20H2, as this code path is used by the system often under normal operation.
It should be mentioned that I initially missed the fact that the attacker has good control over the size of the pool chunk initially and therefore went down the path of constraining myself to an expected chunk size of 0x30. This constraint was not actually true, however, demonstrates that even with more restricted attacker constraints these can often be worked around and that you should always try to understand the constraints of your bug fully before jumping into exploitation
By analyzing the vulnerable NtFE allocation, we can see we have the following memory layout:
This means that chunk size calculation will be, 0x12 + 0x10 = 0x22, with this then being rounded up to the 0x30 segment chunk size.
We can however also adjust both the size of the allocation and the amount of data we will overflow.
As an alternative example, using the following values overflows from a chunk of 0x70 into the adjacent pool chunk (debug output is taken from testing code):
NtCreateFile is located at 0x773c2f20 in ntdll.dll
RtlDosPathNameToNtPathNameN is located at 0x773a1bc0 in ntdll.dll
NtSetEaFile is located at 0x773c42e0 in ntdll.dll
NtQueryEaFile is located at 0x773c3e20 in ntdll.dll
WriteEaOverflow EaBuffer1->NextEntryOffset is 96
WriteEaOverflow EaLength1 is 94
WriteEaOverflow EaLength2 is 59
WriteEaOverflow Padding is 2
WriteEaOverflow ea_total is 155
NtSetEaFileN sucess
output_buf_size is 94
GetEa2 pad is 1
GetEa2 Ea1->NextEntryOffset is 12
GetEa2 EaListLength is 31
GetEa2 out_buf_length is 94
This ends up being allocated within a 0x70 byte chunk:
As you can see it is therefore possible to influence the size of the vulnerable chunk.
At this point, we need to determine if it is possible to allocate adjacent chunks of a useful size class which can be overflowed into, to gain exploit primitives, as well as how to manipulate the paged pool to control the layout of these allocations (feng shui).
Much less has been written on Windows Paged Pool manipulation than Non-Paged pool and to our knowledge nothing at all has been publicly written about using WNF structures for exploitation primitives so far.
WNF Introduction
The Windows Notification Facitily is a notification system within Windows which implements a publisher/subscriber model for delivering notifications.
Great previous research has been performed by Alex Ionescu and Gabrielle Viala documenting how this feature works and is designed.
I don’t want to duplicate the background here, so I recommend reading the following documents first to get up to speed:
Having a good grounding in the above research will allow a better understanding of how WNF related structures used by Windows.
Controlled Paged Pool Allocation
One of the first important things for kernel pool exploitation is being able to control the state of the kernel pool to be able to obtain a memory layout desired by the attacker.
There has been plenty of previous research into non-paged pool and the session pool, however, less from a paged pool perspective. As this overflow is occurring within the paged pool, then we need to find exploit primitives allocated within this pool.
Now after some reversing of WNF, it was determined that the majority of allocations used within this feature use memory from the paged pool.
I started off by looking through the primary structures associated with this feature and what could be controlled from userland.
One of the first things which stood out to me was that the actual data used for notifications is stored after the following structure:
Looking at the function NtUpdateWnfStateData we can see that this can be used for controlled size allocations within the paged pool, and can be used to store arbitrary data.
The following allocation occurs within ExpWnfWriteStateData, which is called from NtUpdateWnfStateData:
This is useful for filling the pool with data of a controlled size and data, and we continue our investigation of the WNF feature.
Controlled Free
The next thing which would be useful from an exploit perspective would be the ability to free WNF chunks on demand within the paged pool.
There’s also an API call which does this called NtDeleteWnfStateData, which calls into ExpWnfDeleteStateData in turn ends up free’ing our allocation.
Whilst researching this area, I was able to reuse the free’d chunk straight away with a new allocation. More investigation is needed to determine if the LFH makes use of delayed free lists as in my case from empirical testing, then I did not seem to be hitting this after a large spray of Wnf chunks.
Relative Memory Read
Now we have the ability to perform both a controlled allocation and free, but what about the data, itself and can we do anything useful with it?
Well, looking back at the structure, you may well have spotted that the AllocatedSize and DataSize are contained within it:
The DataSize is to denote the size of the actual data following the structure within memory and is used for bounds checking within the NtQueryWnfStateData function. The actual memory copy operation takes place in the function ExpWnfReadStateData:
At this point there are many interesting things which can be leaked out, especially considering that the both the NTFS vulnerable chunk and the WNF chunk can be positioned with other interesting objects. Items such as the ProcessBilled field can also be leaked using this technique.
We can also use the ChangeStamp value to determine which of our objects is corrupted when spraying the pool with _WNF_STATE_DATA objects.
Taking a look at the NtUpdateWnfStateData function, we end up with an interesting call: ExpWnfWriteStateData((__int64)nameInstance, InputBuffer, Length, MatchingChangeStamp, CheckStamp);. Below shows some of the contents of the ExpWnfWriteStateData function:
We can see that if we corrupt the AllocatedSize, represented by v12[1] in the code above, so that it is bigger than the actual size of the data, then the existing allocation will be used and a memcpy operation will corrupt further memory.
So at this point its worth noting that the relative write has not really given us anything more than we had already with the NTFS overflow. However, as the data can be both read and written back using this technique then it opens up the ability to read data, modify certain parts of it and write it back.
_POOL_HEADER BlockSize Corruption to Arbitrary Read using Pipe Attributes
As mentioned previously, when I first started investigating this vulnerability, I was under the impression that the pool chunk needed to be very small in order to trigger the underflow, but this wrong assumption lead to me trying to pivot to pool chunks of a more interesting variety. By default, within the 0x30 chunk segment alone, I could not find any interesting objects which could be used to achieve arbitrary read.
Therefore my approach was to use the NTFS overflow to corrupt the BlockSize of a 0x30 sized chunk WNF _POOL_HEADER.
By ensuring that the PoolQuota bit of the PoolType is not set, we can avoid any integrity checks for when the chunk is freed.
By setting the BlockSize to a different size, once the chunk is free’d using our controlled free, we can force the chunks address to be stored within the wrong lookaside list for the size.
Then we can reallocate another object of a different size, matching the size we used when corrupting the chunk now placed on that lookaside list, to take the place of this object.
Finally, we can then trigger corruption again and therefore corrupt our more interesting object.
Initially I demonstrated this being possible using another WNF chunk of size 0x220:
As PipeAttribute chunks are also a controllable size and allocated on the paged pool, it is possible to place one adjacent to either a vulnerable NTFS chunk or a WNF chunk which allows relative write’s.
Using this layout we can corrupt the PipeAttribute‘s Flink pointer and point this back to a fake pipe attribute as described in the paper above. Please refer back to that paper for more detailed information on the technique.
Diagramatically we end up with the following memory layout for the arbitrary read part:
Whilst this worked and provided a nice reliable arbitrary read primitive, the original aim was to explore WNF more to determine how an attacker may have leveraged it.
The journey to arbitrary write
After taking a step back after this minor Pipe Attribute detour and with the realisation that I could actually control the size of the vulnerable NTFS chunks. I started to investigate if it was possible to corrupt the StateData pointer of a _WNF_NAME_INSTANCE structure. Using this, so long as the DataSize and AllocatedSize could be aligned to sane values in the target area in which the overwrite was to occur in, then the bounds checking within the ExpWnfWriteStateData would be successful.
Looking at the creation of the _WNF_NAME_INSTANCE we can see that it will be of size 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. This ends up being put into a chunk of 0xC0 within the segment pool:
We can perform a spray as before using any size of _WNF_STATE_DATA which will lead to a _WNF_NAME_INSTANCE instance being allocated for each _WNF_STATE_DATA created.
Therefore can end up with our desired memory layout with a _WNF_NAME_INSTANCE adjacent to our overflowing NTFS chunk, as follows:
I also made use of CVE-2021-31955 as a quick way to get hold of an EPROCESS address. At this was used within the in the wild exploit. However, with the primitives and flexibility of this overflow, it is expected that this would likely not be needed and this could also be exploited at low integrity.
There are still some challenges here though, and it is not as simple as just overwriting the StateName with a value which you would like to look up.
StateName Corruption
For a successful StateName lookup, the internal state name needs to match the external name queried from.
At this stage it is worth going into the StateName lookup process in more depth.
The key thing to realise here is that whilst Version, LifeTime, DataScope and Sequence are controlled, the Sequence number for WnfTemporaryStateName state names is stored in a global.
As you can see from the below, based on the DataScope the current server Silo Globals or the Server Silo Globals are offset into to obtain v10 and then this used as the Sequence which is incremented by 1 each time.
i[3] in this case is actually the StateName of a _WNF_NAME_INSTANCE structure, as this is outside of the _RTL_BALANCED_NODE rooted off the NameSet member of a _WNF_SCOPE_INSTANCE structure.
Each of the _WNF_NAME_INSTANCE are joined together with the TreeLinks element. Therefore the tree traversal code above walks the AVL tree and uses it to find the correct StateName.
One challenge from a memory corruption perspective is that whilst you can determine the external and internal StateName‘s of the objects which have been heap sprayed, you don’t necessarily know which of the objects will be adjacent to the NTFS chunk which is being overflowed.
However, with careful crafting of the pool overflow, we can guess the appropriate value to set the _WNF_NAME_INSTANCE structure’s StateName to be.
It is also possible to construct your own AVL tree by corrupting the TreeLinks pointers, however, the main caveat with that is that care needs to be taken to avoid safe unlinking protection occurring.
As we can see from Windows Mitigations, Microsoft has implemented a significant number of mitigations to make heap and pool exploitation more difficult.
In a future blog post I will discuss in depth how this affects this specific exploit and what clean-up is necessary.
Security Descriptor
One other challenge I ran into whilst developing this exploit was due the security descriptor.
Initially I set this to be the address of a security descriptor within userland, which was used in NtCreateWnfStateName.
Performing some comparisons between an unmodified security descriptor within kernel space and the one in userspace demonstrated that these were different.
I then attempted to provide the fake the security descriptor with the same values. This didn’t work as expected and NtUpdateWnfStateData was still returning permission denied (-1073741790).
After experimenting some more, patching up a fake security descriptor with the following values worked and the data was successfully written to my arbitrary location:
Initially when testing out the arbitrary write, I was expecting that when I set the StateData pointer to be 0x6161616161616161 a kernel crash near the memcpy location. However, in practice the execution of ExpWnfWriteStateData was found to be performed in a worker thread. When an access violation occurs, this is caught and the NT status -1073741819 which is STATUS_ACCESS_VIOLATION is propagated back to userland. This made initial debugging more challenging, as the code around that function was a significantly hot path and with conditional breakpoints lead to a huge program standstill.
Anyhow, typically after achieving an arbitrary write an attacker will either leverage to perform a data-only based privilege escalation or to achieve arbitrary code execution.
As we are using CVE-2021-31955 for the EPROCESS address leak we continue our research down this path.
To recap, the following steps were needing to be taken:
1) The internal StateName matched up with the correct internal StateName so the correct external StateName can be found when required. 2) The Security Descriptor passing the checks in ExpWnfCheckCallerAccess. 3) The offsets of DataSize and AllocSize being appropriate for the area of memory desired.
So in summary we have the following memory layout after the overflow has occurred and the EPROCESS being treated as a _WNF_STATE_DATA:
We can then demonstrate corrupting the EPROCESS struct:
These approaches and pros and cons have been discussed previously by EDG team members whilst exploiting a vulnerability in KTM.
The next stage will be discussed within a follow-up blog post as there are still some challenges to face before reliable privilege escalation is achieved.
Summary
In summary we have described more about the vulnerability and how it can be triggered. We have seen how WNF can be leveraged to enable a novel set of exploit primitive. That is all for now in part 1! In the next blog I will cover reliability improvements, kernel memory clean up and continuation.