Normal view

There are new articles available, click to refresh the page.
Before yesterdayJust Another Blog

CVE-2022-26809 Reaching Vulnerable Point starting from 0 Knowledge on RPC

By: s1ckb017
17 June 2022 at 00:00

Lately, along to malware analisys activity I started to study/test Windows to understand something more of its internals. The CVE here analyzed, has been a good opportunity to play with RPC and learn new funny things I never touched before and moreover it looked challenging enough to spent time.

This blogpost shows my roadmap to understand and reproduce the vulnerability, the analisys has the purpose just to arrive to a PoC of the vulnerability not to understand every bit of the RPC implementation neither to write an exploit for it.
RPC, for my basic knowlege, was just a way to call procedure remotely, e.g. client wants to execute a procedure in a server and get the result just like a syscall between user space and server space. This method allows in SW design to decouple goals and purpose and often lead to a good segregation of the permission. Despite this highly level, the RPC represented just another attack vector. The RPC protocol is implemented on top of various medium or transport protocols, e.g. pipes, UDP, TCP.
The vulnerability, as reported by Microsoft is an RCE on Remote Procedure Call. It’s not specified where is the bug, neither how to trigger it, so it is required to obtain the patch and diff it with a vulnerable version of the main library that implements the RPC, i.e. rpcrt4.dll.

Below is shown the exactly things I did to arrive to the vulnerable point, so the blogpost is more about finding a way without having complete knowledge of the analyzed object. Indeed the information I got about the RPC internals are shown here in a logic-time order so they are not represented in a structured way, as is usually done at the end of an analisys.

Search for the vulnerability

Before starting with the usual binary diff, it’s required to get the right windows version in order to get the vulnerble library. I was lucky, my FLARE mount Windows 10 version 21H2 that is vulnerable according to the cpe published by the NIST. Before patching the systems it’s required to download the windows symbols (PDB) just to have the Windows symbols in Ghidra.

Symbols Download

The symbols could be downloaded with the following commands, it’s required to have installed windows kits:

    cd "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\"
    .\symchk.exe  /s srv*c:\SYMBOLS*https://msdl.microsoft.com/download/symbols C:\Windows\System32\*.dll  

After, downloaded the symbols I opened the library rpcrt4.dll, that is supposed to be vulnerable, in Ghidra then resolved symbols loading the right PDB and exported the Binary for Bindiff.

The hash of my rpcrt4.dll library is: b35fdb8d452e39cdf4393c09530837eff01d33c7

Since my windows version is:

Vulnerable Windows Version

The patch to download, obtained from Microsoft, is the following:

Windows CVE-2022-26809 Patch

After applying the patch the system contains a new rpcrt4.dll having SHA1: d78a9d416a1187da8550fb0d5a4bace48cfa8179

Windows rpcrt4.dll after patch


In order to load into ghidra the patched library has been necessary to download again the symbols and import the new PDB. After this operation, the binaries, i.e. the vulnerable and the patched one, are exported for bindiff and a binary diff is executed on them.

In the following image it’s clear that the patch introduced new functions and some other are not available anymore.
Diffing Main Info

Indeed, only 97% of similarity with ~640 unmatched functions. Since I am not an expert of the RPC internals, I started investigate the differencies starting from the functions that have an high percent of similarity tending to 100%.

Low diff functions

I was lucky, the fix seems to be in the ProcessReceivedPDU() routine, indeed seems that in the new rpcrt4.dll version has been added a function to check the sum between two elements.


So, seems that an integer overflow was the problem in the library.
Now, it’s required to find the answer to some questions:

  1. Is the fix introduced in other functions?
  2. How the bug could be reached, i.e. what we need to raise it?
  3. How to gain command execution?

The Patch

The fix has been introduced calling a function,i.e. UIntAdd(), to sum two items and check if integer overflow happened. The routine is very basic and it was present already in the vulnerable library but never called.


The fix, as can be shown by call reference has been applied to multiple routines but in the vulnerable code the routine is never used.

UIntAdd() call refs.

At this point, due to multiple usage of UIntAdd() introduced I could assume that the bug was an integer overflow but I was not sure about which routine is effectively reachable in a easy way for a newbe like me.

UIntAdd() call refs.

UIntAdd() call refs.

UIntAdd() call refs.

Search the way to reach the vulnerable point

Since my knowledge in RPC internals tending to 0 then for me the most easier approach was to build an RPC Client/Server example and sets some breakpoint on first instruction of the routines that contain the fix:

  1. void OSF_CCONNECTION: OSF_CCALL::ProcessReceivedPDU(OSF_CCALL *this,void *param_1,int param_2) - offset: BaseLibrary + 0x3ac7c
  2. long OSF_CCALL::GetCoalescedBuffer(OSF_CCALL *this,_RPC_MESSAGE *param_1) - offset BaseLibrary + 0xad10c
  3. long OSF_CCALL::ProcessResponse(OSF_CCALL *this,rpcconn_response *param_1,_RPC_MESSAGE *param_2,int *param_3) - offset: BaseLibrary + 0xae3f8
  4. long OSF_SCALL::GetCoalescedBuffer(OSF_SCALL *this,_RPC_MESSAGE *param_1,int param_2) - offset: BaseLibrary + 0xb35fc

The first example I run to check if the vulnerable code is reached in some way is a basic RPC client server with a Windows NT authentication and multiple kind of protocol selectable by arguments, the example used in this phase is visible at: Github

The github link above contains a basic server client RPC with customizable endpoint/protocol selection and basic authentication WINNT just on the connection, RPC_C_AUTHN_LEVEL_CONNECT.

The supported protocols are:

  1. ncacn_np - named pipes are the medium Doc
  2. ncacn_ip_tcp - TCP/IP is the stack on which the RPC messages are sent, indeed the endpoint is a server and port Doc
  3. ncacn_http - IIS is the protocol family, the endpoint is specified with just a port number Doc
  4. ncadg_ip_udp - UDP/IP is the protocol stack on which RPC messages are sent, this is obsolete.
  5. ncalrpc - the protocol is the local interprocess communication, the endpoint is specified with a string at most 53 bytes long Doc

In my initial tests, I run the example, RPCServerTest.exe 1 5000 using the TCP/IP as protocol in order to inspect packets with wireshark. The RPC is based on some transport protocol that can be different and on top there is of course a common protocol used to call remote procedure, i.e. passing parameters serialized and choosing the routine to execute.

So, I run the example via x64dbg and put the breakpoints in the main functions that were patched, using the following script:

$base_rpcrt4 = rpcrt4:base

$addr = $base_rpcrt4 + 0x3ac7c
lblset $addr, "OSF_SCALL::ProcessReceivedPDU_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xad10c
lblset $addr, "OSF_CCALL::GetCoalescedBuffer_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xae3f8
lblset $addr, "OSF_CCALL::ProcessResponse_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xb35fc
lblset $addr, "OSF_SCALL::GetCoalescedBuffer_start"
bp $addr
log "Put BP on {addr} "

Running the client, RPCClientTest.exe 1 5000, the execution stops at OSF_SCALL::ProcessReceivedPDU().


The only vulnerable function touched with the previous test is: OSF_SCALL::ProcessReceivedPDU() and it is reached when the client ask for executing the hello procedure. Now it’s required to understand what is passed as argument to the function as arguments, just by the name it’s possible to assume that the routine is called to process the Protocol Data Unit received.


In order to understand how the function is called and so which are the parameters passed it’s required to analize the functions in the stack trace. Below is shown how in the previous test is reached the vulnerable routine.

OSF_SCALL::ProcessReceivedPDU(OSF_SCALL *param_1,OSF_SCALL **param_2,byte *param_3,int param_4)
rpcrt4.dll + 0x3ac14 #  OSF_SCALL::BeginRpcCall(longlong *param_1,OSF_SCALL **param_2,OSF_SCALL **param_3)
rpcrt4.dll + 0x3d2ae #  void OSF_SCONNECTION::ProcessReceiveComplete(longlong *param_1,longlong *param_2,OSF_SCALL **param_3,ulonglong param_4)
rpcrt4.dll + 4c99c   #  void DispatchIOHelper(LOADABLE_TRANSPORT *param_1,int param_2,uint param_3,void *param_4, uint param_5,OSF_SCALL **param_6,void *param_7)
other libraries


The param1 of the OSF_SCALL::ProcessReceivedPDU() should be the class instance that maintain the status of the connection. The param2 of the OSF_SCALL::ProcessReceivedPDU() is just the tcp payload received from the client. The param3 of the OSF_SCALL::ProcessReceivedPDU() could be the tcp len field or the fragment len field, in any case seems that both are each time equals. The param4 of the OSF_SCALL::ProcessReceivedPDU() at first look should be something related the auth method choosen.

Unfortunately, seems that both param2 and param3 are passed directly to DispatchIOHelper() and as shown in the stack trace the dispatchIOHelper() is reached probably in some asynchronous way, maybe a thread wait for a messages from clients and start the dispatcher on every message received.

Before exploring the code I created a .h file containing the DCE RPC data structure in order to have a better view on ghidra.

 typedef struct
    char version_major;
    char version_minor;
    char pkt_type;
    char pkt_flags;
    unsigned int data_repres;
    unsigned short fragment_len;
    unsigned short auth_len;
    unsigned int call_id;
    unsigned int alloc_hint;
    unsigned short cxt_id;
    unsigned short opnum;
    char data[256]; // it should be .fragment_len - 24 bytes long
} dce_rpc_t;

Let’s explore the code to understand better what can be done and under which constraints.

The vulnerable code is the following:

In order to reach that part of the code is required to pass some check. It’s required to take this branch, this is taken every time. I don’t care why at the moment.

The following branches should not be taken, so the packet sent should be a REQUEST packet in order to avoid those branches.

It’s required to enter branch at 115 line, this is taken every time, maybe due to authentication.

The branch at line 117 in my tests is never taken.

The branch at line 119 should not be taken otherwise the data sent are overwritten, in order to not enter it’s required that the packet flags is not negative:

    8th bit the highest: Object <-- must be 0
    7th bit            : Maybe
    6th bit            : Did not execute
    5th bit            : Multiplex
    4th bit            : Reserved
    3th bit            : Cancel
    2th bit            : Last fragment
    1th bit            : First fragment

local_res18 is negative only if the highest bit is set, so in order to avoid that branch it’s required to not set Object flag.

It’s required to enter in the branch at line 128 so the packet sent should not be the first fragment and so it should not have the first fragment flag enabled. Since my tests until now have been conducted with this code Github, I decided to implement a client using Impacket library it is found here: Github.

Indeed with the initial test written in C, the al register, BP on base library + 0x3ad0e, containing the packet flags is set to 0x03, i.e. FIRST_FRAG & LAST_FRAG

On the second packet received with the impacket client al register contains 0 because the middle packet is not the first and neither the last of course.

The branch at line 130 is taken anytime, but the one at line 134 is never taken with any client. Not digged inside conditions that I did not need to bypass*

So, it’s required to force the following conditions :

if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0))

To avoid the branch this+0x244 should be not 0 and this + 0x1cc should be 0. this+0x244 is set to 1 in the function void OSF_SCALL::BeginRpcCall(longlong *param_1,dce_rpc_t *tcp_payl,OSF_SCALL **param_3):baselibrary+0x8e806.

I focused on the *plVar3 + 0x38 value because *this +0x244 is zeroed every time in ActivateCall(), called from BeginRpcCall().

OSF_SCONNECTION::LookupBinding() just walks a list until it finds the second parameter that corresponds to context id passed in the dce rpc packet.

OSF_SBINDING * __thiscall OSF_SCONNECTION::LookupBinding(OSF_SCONNECTION *this,ushort param_1)

  SIMPLE_DICT *this_00;
  ulonglong uVar2;
  uint local_res8 [8];
  local_res8[0] = 0;
  this_00 = (SIMPLE_DICT *)(this + 0x80);
  uVar2 = (ulonglong)param_1;
  do {
    pOVar1 = (OSF_SBINDING *)SIMPLE_DICT::Next(this_00,local_res8);
    if (pOVar1 == (OSF_SBINDING *)0x0) {
      return (OSF_SBINDING *)0x0;
  } while (*(int *)(pOVar1 + 8) != (int)uVar2);
  return pOVar1;

So it gets every item from the list, stored at OSF_CONNECTION instance + 0x80.

void * __thiscall SIMPLE_DICT::Next(SIMPLE_DICT *this,uint *param_1)

  void *pvVar1;
  uint uVar2;
  ulonglong uVar3;
  uVar2 = *param_1;
  if (uVar2 < *(uint *)(this + 8)) {
    do {
      uVar3 = (ulonglong)uVar2;
      uVar2 = uVar2 + 1;
      pvVar1 = *(void **)(*(longlong *)this + uVar3 * 8);
      *param_1 = uVar2;
      if (pvVar1 != (void *)0x0) {
        return pvVar1;
    } while (uVar2 < *(uint *)(this + 8));
  *param_1 = 0;
  return (void *)0x0;

The routine that walks the dictionary shows the dictionary implementation, basically it contains the item number at dictionary+0x8 and every item is long just 8 bytes,indeed it is a memory address. The second parameter passed to the Next() routine is just an memory address pointing to a value used to check if the right item is found during the walk. So at the end the vulnerable code is reached if and only if the address returned from the lookup, i.e. *plVar3 + 0x38 has second bit set

      plVar3 = (longlong *)
               OSF_SCONNECTION::LookupBinding((OSF_SCONNECTION *)param_1[0x26],tcp_payl->cxt_id); // get the item
      param_1[0x27] = (longlong)plVar3;
      if (plVar3 != (longlong *)0x0) {
        if ((*(byte *)(*plVar3 + 0x38) & 2) != 0) {
                    /* enter here in order to make param1 + 0x244 != 0 */
          *(undefined4 *)((longlong)param_1 + 0x244) = 1;

The structures links starting from the connection instance structure to the structure containing the flag checked that could lead us to set instance(OSF_SCONNECTION) + 0x244 to 1 is shown below.

typedef struct
  offset 0x130: dictionary_t *dictionary;
  offset 0x244: uint32_t flag; // must be 1 to enter in vulnerable code block in ProcessReceivedPDU() 

typedef struct
  offset 0x80: items_list_t *itemslist;

typedef struct
  offset 0x0: item_t * items[items_no]
  offset 0x8: uint32_t items_no;

typedef struct
  offset 0x0: flags_check_t *flags;

typedef struct
  offset 0x38: uint32_t flag_checked 

At this point it’s important to understand where it creates the dictionary value and where the value is set. I found that the dictionary address changes every connect. So it should be created on the client connection.

To find where the dictionary is assigned to this[0x26], i.e. in the OSF_SCONNECTION instance, I placed a breakpoint on write on the memory address used to store the dictionary address, i.e. this[0x26].

This unveils where the dictionary is inserted into the instance object.

Seems that it is created at: rpcrt4.dll+0x3668f:OSF_SCALL::OSF_SCALL()

Below the stack trace:

rpcrt4.dll+0xd8e55:WS_NewConnection(CO_ADDRESS *param_1,BASE_CONNECTION **param_2)

Since, LookupBinding(dictionary, item), searches the item starting from the param1+0x80 and since there is just one client that is connecting.

Below is shown in the debugger the link that starts from the connection instance to the flag contained in the item returned from lookup.

The dictionary is filled with an item already instantiated, i.e. with flag already set, in the OSF_SCONNECTION::ProcessPContextList() routine. Basically, this routine is reached from with the following stack:

rpcrt4.dll + 0x3e4f5: SIMPLE_DICT::Insert() 
rpcrt4.dll + 0x39473: OSF_SCONNECTION::ProcessPContextList()
rpcrt4.dll + 0x38b31: OSF_SCONNECTION::AssociationRequested()
rpcrt4.dll + 0x3d3b6: OSF_SCONNECTION::processReceiveComplete()

OSF_SCONNECTION::processReceiveComplete() is called every time arrive a packet from the client. The S in front of the namespace means that the routine is used by the RPC servers.

This routine, OSF_SCONNECTION::ProcessPContextList(), store the context id in a new dictionary item, from this routine could be found the exact structure of the dictionary items at instance(OSF_SCONNECTION)[0x26]. Unfortunately the item, i.e. the structure that contains the flag to force is not initialized on every connection.

Indeed I added a break point on the flag memory address to track the instructions overwriting it. I would expect some free/alloc on every connection just like the dictionary, but this never happened. This led me to think that it should be created during RPC Server initialization.

The item address, i.e. the object that contains the flag to force, is retrieved and set in the dictionary at:

0x57b54 RPC_SERVER::FindInterfaceTransfer()

offset: 0x57a6c | mov rsi,qword ptr ds:[GlobalRpcServer]          ; RSI will contain the address of GlobalRPCServer
offset: 0x57aae | mov rdx,qword ptr ds:[rsi+120]                  ; most likely address of items                
offset: 0x57ab9 | mov rdi,qword ptr ds:[rdx+rax*8]                ; get the ith item       
offset: 0x57b30 | call RPC_INTERFACE::SelectTransferSyntax        ; check if the connection match with the interface                         
offset: 0x57b35 | test eax,eax                                    ; if eax == 0 return, interface/item found! 
offset: 0x57b54 | mov qword ptr ds:[rax],rdi                      ; Address of the item with the flag to force is returned writing it in *rax 

Basically, RPC_SERVER::FindInterfaceTransfer() is called by ProcessPContextList() and it is executed most likely to find the RPC interface according to the information sent by the client for example, uuid value.

Below the debugger view on 0x57b54.

Since GlobalRpcServer is fixed address, and it contains the address of the struct that contain the items, I just looked at the write reference on ghidra and found where GlobalRpcServer is filled with the struct address.

wchar_t ** InitializeRpcServer(undefined8 param_1,uchar **param_2,SIZE_T param_3)
  ppwVar24 = (wchar_t **)0x0;
  local_res8[0]._0_4_ = 0;
  ppwVar19 = ppwVar24;
  if (GlobalRpcServer == (RPC_SERVER *)0x0) {
    this = (RPC_SERVER *)AllocWrapper(0x1f0,param_2,param_3);
    ppwVar5 = ppwVar24;
    if (this != (RPC_SERVER *)0x0) {
      param_2 = local_res8;
      ppwVar5 = (wchar_t **)RPC_SERVER::RPC_SERVER(this,(long *)param_2);
      ppwVar19 = (wchar_t **)(ulonglong)(uint)local_res8[0];
    if (ppwVar5 == (wchar_t **)0x0) {
      GlobalRpcServer = (RPC_SERVER *)ppwVar5; 
      return (wchar_t **)0xe;
    GlobalRpcServer = (RPC_SERVER *)ppwVar5; // OFFSET: 0xb3d2; here the global rpc server is set to a new fresh memory.

The routine InitializeRpcServer() is called from a RpcServerUseProtseqEpW() that is directly called by the RPC server! As shown in the debugger view, *GlobalRpcServer + 0x120 is already filled but *(*GlobalRpcServer + 0x120) is equal to NULL.

Setting a breakpoint on memory write of: *(*GlobalRpcServer + 0x120) , I found where the address of the item is set! Below is shown the stack trace:

rpcrt4.dll + 0x3e4f9 at SIMPLE_DICT::Insert() 
rpcrt4.dll + 0xd498  at RPC_INTERFACE * RPC_SERVER::FindOrCreateInterfaceInternal()
rpcrt4.dll + 0xd196  at void RPC_SERVER::RegisterInterface() 
rpcrt4.dll + 0x7271e at void RpcServerRegisterIf()
server rpc function 

From the memory contents shown below:

The *item + 0x38 appears to be NULL, let’s set a breakpoint to that memory address. To resume up, the flag to force is reachable from:

typedef struct
  rpc_interfaces_t **addresses;

typedef struct
  offset 0x120:  item_t **items

typedef struct
  offset 0x38:  uint32_t flag_checked // Flag check in BeginRpcCall()

It was clear that our flag to force has not already set so I set up another write breakpoint on *item + 0x38 to find where it is set! So, I found the point where the flag is modified, as visibile in the debugger.

The code that modifies the flag is reached from this stack trace.

rpcrt4.dll + 0xd30e : RPC_INTERFACE::RegisterTypeManager() 
rpcrt4.dll + 0xd1bf : void RPC_SERVER::RegisterInterface() 
rpcrt4.dll + 0x7271e: void RpcServerRegisterIf() 
RPC server functions

void RpcServerRegisterIf(uint *param_1,uint *uuid,SIZE_T param_3)

              (GlobalRpcServer,param_1,uuid,param_3,0,0x4d2,gMaxRpcSize,(FuncDef2 *)0x0,
               (ushort **)0x0,(RPCP_INTERFACE_GROUP *)0x0);


void RPC_SERVER::RegisterInterface
               (RPC_SERVER *param_1,uint *param_2,uint *param_3,SIZE_T param_4,uint param_5,
               uint param_6,uint param_7,FuncDef2 *param_8,ushort **param_9,
               RPCP_INTERFACE_GROUP *param_10)

  if (param_3 != (uint *)0x0) {
    local_80 = param_3;
  local_58 = ZEXT816(0);
  local_78 = param_2;
  local_60 = param_3;
  puVar6 = RPC_INTERFACE::RegisterTypeManager(pRVar5,local_60,param_4);

RPC_INTERFACE::RegisterTypeManager(RPC_INTERFACE *param_1,undefined4 *param_2,SIZE_T param_3)
  if ((param_2 == (undefined4 *)0x0) ||
     (iVar6 = RPC_UUID::IsNullUuid((RPC_UUID *)param_2), iVar6 != 0)) {
    if ((*(uint *)(param_1 + 0x38) & 1) == 0) {
      *(int *)(param_1 + 200) = *(int *)(param_1 + 200) + 1;
      *(uint *)(param_1 + 0x38) = *(uint *)(param_1 + 0x38) | 1; // SET to 1 the flag!
      *(SIZE_T *)(param_1 + 0x40) = param_3;
      return (undefined4 *)0x0;
    puVar11 = (undefined4 *)0x6b0;

Looking at the stack trace and the code, it’s clear that the RPC server implementation called with RpcServerRegisterIf() without specifying any MgrTypeUuid. At this point it was most likely that I could, from client side, alter in some way that flag that trigger the vulnerable code.

Anyway, backing to the BeginRpcCall(), in order to execute vulnerable code it’s required that the item_t flag value has the second bit set. Since that value is never written, I could not use the debugger anymore to find where the second bit’s flag is set. So, I searched for instruction patterns basing on the information I got:

  1. The value is bitmask flag, this was clear because the values are checked with TEST instruction and because the value, i.e. 1 has been set with an or operator.
  2. The item_t is a kind of structure, most likely the access to the flag member more or less the same, i.e. [register + 0x38]
  3. Flag is 4 byte long

According to these information I searched for instruction like: or dword ptr [anyregs+0x38], 2

          (RPC_INTERFACE *this,_RPC_SERVER_INTERFACE *param_1,RPC_SERVER *param_2,uint param_3,
          uint param_4,uint param_5,FuncDef2 *param_6,void *param_7,long *param_8,
          RPCP_INTERFACE_GROUP *param_9)

  int iVar2;
  int iVar3;
  *(undefined4 *)(this + 8) = 0;
  RtlInitializeCriticalSectionAndSpinCount(this + 0x10,0);
  *(undefined4 *)(this + 0x38) = 0;
  *(undefined4 *)(this + 200) = 0;
  pRVar1 = this + 0x170;
  *(undefined (**) [16])(this + 0xd8) = (undefined (*) [16])(this + 0xe8);
  *(undefined8 *)(this + 0xe0) = 4;
  *(undefined (*) [16])(this + 0xe8) = ZEXT816(0);
  *(undefined (*) [16])(this + 0xf8) = ZEXT816(0);
  *(undefined4 *)(this + 0x14c) = 0;
  *(undefined4 *)(this + 0x150) = 0;
  *(undefined4 *)(this + 0x154) = 0;
  *(undefined4 *)(this + 0x158) = 0;
  *(undefined4 *)(this + 0x15c) = 0;
  *(undefined4 *)(this + 0x160) = 0;
  *(undefined4 *)(this + 0x164) = 0;
  *(undefined4 *)(this + 0x168) = 0;
  *(undefined4 *)(this + 0x16c) = 0;
  RtlInitializeSRWLock(this + 0x1e0);
  *(RPC_INTERFACE **)(this + 0x178) = pRVar1;
  *(RPC_INTERFACE **)pRVar1 = pRVar1;
  *(undefined4 *)(this + 0x180) = 0;
  *(undefined4 *)(this + 0x1e8) = 0;
  *(undefined (**) [16])(this + 0x1f0) = (undefined (*) [16])(this + 0x200);
  *(undefined8 *)(this + 0x1f8) = 4;
  *(undefined (*) [16])(this + 0x200) = ZEXT816(0);
  *(undefined (*) [16])(this + 0x210) = ZEXT816(0);
  *(undefined4 *)(this + 0x220) = 0;
  *(undefined8 *)(this + 0x228) = 0;
  *(RPC_SERVER **)this = param_2;
  *(undefined8 *)(this + 0xd0) = 0;
  *(undefined8 *)(this + 0xb8) = 0;
  *(undefined8 *)(this + 0x230) = 0;
  *(undefined8 *)(this + 0x238) = 0;
  iVar3 = UpdateRpcInterfaceInformation
                    ((longlong)this,(uint *)param_1,(ushort **)(ulonglong)param_3,param_4,param_5,
                     (longlong)param_6,(ushort **)param_7,(longlong)param_9);
  iVar2 = *(int *)param_1;
  *param_8 = iVar3;
  if ((iVar2 == 0x60) && (((byte)param_1[0x58] & 1) != 0)) {
    *(uint *)(this + 0x38) = *(uint *)(this + 0x38) | 2;
  return this;

Seems that the or is executed under some conditions. At this point, it was required to verify this assumption, I set up a breakpoint on those conditions: offset: 0xd87c

rpcrt4.dll + 0xd87c at RPC_INTERFACE::RPC_INTERFACE()
rpcrt4.dll + 0xb82e at InitializeRpcServer()
rpcrt4.dll + x      at PerformRpcInitialization()
rpcrt4.dll + y      at RpcServerUseProtseqEpW()

wchar_t ** InitializeRpcServer(undefined8 param_1,uchar **param_2,SIZE_T param_3)
      this_01 = (RPC_INTERFACE *)
                          (this_00,(_RPC_SERVER_INTERFACE *)&DAT_1800e7470,GlobalRpcServer,1,0x4d2,
                           0x1000,(FuncDef2 *)0x0,(void *)0x0,(long *)local_res8,
                           (RPCP_INTERFACE_GROUP *)0x0);

The parameter on which the condition is check is fixed, i.e. DAT_1800e7470, and never altered.

But, going on with the debugger, I break on the same point. This time with another stack trace that starts exactly in my rpc server test!

rpcrt4.dll + 0xd87c  at RPC_INTERFACE::RPC_INTERFACE()
rpcrt4.dll + 0xd475  at RPC_INTERFACE * RPC_SERVER::FindOrCreateInterfaceInternal()
rpcrt4.dll + 0xd196  at void RPC_SERVER::RegisterInterface()
rpcrt4.dll + 0x7271e at void RpcServerRegisterIf()
RPCServerTest.exe: main

          (RPC_SERVER *param_1,_RPC_SERVER_INTERFACE *param_2,ulonglong param_3,uint param_4,
          uint param_5,FuncDef2 *param_6,void *param_7,long *param_8,int *param_9,
          RPCP_INTERFACE_GROUP *param_10)

  pRVar5 = (RPC_INTERFACE *)AllocWrapper(0x240,p_Var8,uVar10);
  uVar2 = (uint)p_Var8;
  this = pRVar9;
  if (pRVar5 != (RPC_INTERFACE *)0x0) {
    p_Var8 = param_2;
    this = (RPC_INTERFACE *)
    uVar2 = (uint)p_Var8;

void RPC_SERVER::RegisterInterface
               (RPC_SERVER *param_1,uint *param_2,uint *param_3,SIZE_T param_4,uint param_5,
               uint param_6,uint param_7,FuncDef2 *param_8,ushort **param_9,
               RPCP_INTERFACE_GROUP *param_10)

  pRVar5 = FindOrCreateInterfaceInternal
                       (param_1,(_RPC_SERVER_INTERFACE *)param_2,(ulonglong)param_5,param_6,local_84
                        ,param_8,param_9,(long *)&local_80,(int *)&local_78,local_68);

void RpcServerRegisterIf(uint *param_1,uint *uuid,SIZE_T param_3)
  MUTEX *pMVar1;
  void *in_R9;
  undefined4 in_stack_ffffffffffffffc8;
  undefined4 in_stack_ffffffffffffffcc;
                    /* 0x726c0  1483  RpcServerRegisterIf */
  if ((RpcHasBeenInitialized != 0) ||
     (pMVar1 = PerformRpcInitialization
                          (void *)CONCAT44(in_stack_ffffffffffffffcc,in_stack_ffffffffffffffc8)),
     (int)pMVar1 == 0)) {
              (GlobalRpcServer,param_1,uuid,param_3,0,0x4d2,gMaxRpcSize,(FuncDef2 *)0x0,
               (ushort **)0x0,(RPCP_INTERFACE_GROUP *)0x0);

Basically, that conditions failed because of the first parameter passed by my rpc server test application to RpcServerRegisterIf().

At this point I looked at my code to find which kind of configuration I used and how I could change it in My test server app.

API prototype > 
  RPC_STATUS RpcServerRegisterIf(
    UUID          *MgrTypeUuid,
    RPC_MGR_EPV   *MgrEpv

My RpcServerRegisterIf call > 
   status = RpcServerRegisterIf(interfaces_v1_0_s_ifspec,

Value of interfaces_v1_0_s_ifspec >
  static const RPC_SERVER_INTERFACE interfaces___RpcServerInterface =
    0x04000000 // param_1[0x58]

Of course, the value of interfaces_v1_0_s_ifspec is equal to the one shown in the last debugger image! Since, the failing check is done on param_1[0x58] & 1, I changed 0x04000000 to 0x04000001 in order to have the vulnerable code executing!

Pay attention that this is not triggered by the client, in the previous screen the server just configure itself, no client connected to it. Finally in the BeginRpcCall() the check is satisfied.

Finally, the check inside the ProcessReceivedPDU() is satisfied, but the vulnerable code still not reached because of:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?)
          if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0)) { // not enter here!
          else if (pkt_tyoe == 0) { // packet is a request
                    /* should not enter here  */
            if (*(int *)(this + 0x214) == 0) {
            goto _OSF_SCALL::DispatchRPCCall // that handle the request
            //vulnerable code

The problem now is to find under which conditions (this + 0x214) is set. I already saw that in the ActivateCall() routine called every time an RPC communication starts *(this + 0x214) is set to 0. For this, for sure on first call that value cannot be different by 0.

I decided to dig a bit in the code just to understand better how the messages are handled. The messages are processed according to the packet type defined in the DCERPC field,Reference, in the routine void OSF_SCONNECTION::ProcessReceiveComplete().

void OSF_SCONNECTION::ProcessReceiveComplete{

  if (pkt_type == 0) { // packet is a request
    if (*(int *)(this + 3) == 0) { // not investigated
        uVar6 = tcp_received->call_id;
        if (*(int *)((longlong)this + 0x17c) < (int)uVar6) { //should enter here if the call id is new, i.e. the packet refer to a new request
          ppvVar9 = (LPVOID *)OSF_SCALL::OSF_SCALL(puVar11,(longlong)this,(long *)pdVar18); // maybe instantiate the object to represent the call request
          iVar4 = OSF_SCALL::BeginRpcCall((longlong *)ppvVar9,tcp_received,ppOVar20);
          ppvVar9 = (LPVOID *)FindCall((OSF_SCONNECTION *)this,uVar6); // the call id has already view in the communication there should be an object, FIND IT!
          if (ppvVar9 == (LPVOID *)0x0) { // not found this is a new call id request, should have first fragment set and not reuse already used call ids?
            uVar6 = tcp_received->call_id;
            if (((int)uVar6 < *(int *)((longlong)this + 0x17c)) ||
               ((tcp_received->pkt_flags & 1U) == 0)) { // should not be 
              uVar6 = *(int *)((longlong)this + 0x17c) - uVar6;
              goto reach_fault?;
            goto LAB_18009016d;
call_processReceivedPDU: // call id object found just process new message
          uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)ppvVar9,tcp_received,uVar13,0);
        uVar6 = tcp_received->call_id;
        if ((*(byte *)&tcp_received->data_repres & 0xf0) != 0x10) {
          uVar6 = uVar6 >> 0x18 | (uVar6 & 0xff0000) >> 8 | (uVar6 & 0xff00) << 8 | uVar6 << 0x18;
        if ((tcp_received->pkt_flags & 1U) == 0) {
          if (*(int *)((longlong)this + 0x1c) != 0) {
            uVar6 = *(int *)((longlong)this + 0x17c) - uVar6;
            if (0x95 < uVar6) goto goto_SendFAULT;
            goto LAB_180090122;
        else if (*(int *)((longlong)this + 0x1c) != 0) { // this+0x1c could be a flag true or false that says if the call id is new or not
          *(undefined4 *)((longlong)this + 0x1c) = 0;
          *(uint *)((longlong)this + 0x17c) = uVar6;
          *(uint *)(this[0xf] + 0x1c8) = uVar6;
          iVar4 = OSF_SCALL::BeginRpcCall((longlong *)this[0xf],tcp_received,ppOVar20);
          goto LAB_18003d2b5;
        uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)this[0xf],tcp_received,uVar13,0);
        iVar4 = (int)uVar8;
  if (tcp_received->pkt_type == '\x0b') { // Executed when the client send a BIND request
      if (this[9] != 0) {
        uVar3 = 0;
        goto LAB_18008ff47;
      iVar4 = AssociationRequested((OSF_SCONNECTION *)this,tcp_received,uVar13,uVar6);

BeginRpcCall() ends in calling ProcessReceivedPDU() that is the primary function doing the packet parse job. ProcessReceivedPDU() when arrive the first and last fragment request:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
        *(uint *)(this + 0x1d8) = local_res8[0]; // set the fragment len in the object
        *(char **)pdVar1 = tcp_payload->data;
        tcp_payload = pdVar14;
                    /* if last frag set with first frag */
        *(undefined4 *)(this + 0x21c) = 3;
        if (((byte)this[0x2e0] & 4) != 0) {
          *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
               *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
          *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
        plVar13 = (longlong *)((ulonglong)tcp_payload & 0xffffffffffffff00 | (ulonglong)pkt_tyoe);
        uVar9 = DispatchRPCCall((longlong *)this,plVar13,puVar17); // process the remote call
        return uVar9;

ProcessReceivedPDU() when arrives the first fragment fragment request with last fragment flag unset:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
          /* it is not a the last frag */
          uVar5 = tcp_payload->alloc_hint; // client suggest to the server how many bytes it will need to handle the full request
          if (uVar5 == 0) {
            *(uint *)(this + 0x248) = local_res8[0];
            uVar5 = local_res8[0]; // set to the frag len if alloc hint is 0
          else {
            *(uint *)(this + 0x248) = uVar5;
          puVar17 = (uint *)(ulonglong)uVar5;
      /* Could not allocate more than: (this +0x138)+0x148 = 0x400000 */
          if (*(uint *)(**(longlong **)(this + 0x138) + 0x148) <= uVar5 &&
              uVar5 != *(uint *)(**(longlong **)(this + 0x138) + 0x148)) { // check to avoid possible integer overflow
            piVar11 = local_80;
            uVar18 = 0x1072;
            local_80[0] = 3;
            local_78 = uVar5;
            goto goto_adderror_and_send_fault;
          *(undefined4 *)(this + 0x1d8) = 0;
          pdVar14 = pdVar1;
      /* allocate buffer for all the fragments data */
          lVar7 = GetBufferDo((OSF_SCALL *)tcp_payload->data,(void **)pdVar1,uVar5,0,0, in_stack_ffffffffffffff60);
          if (lVar7 == 0) goto do_;
          pdVar14 = (dce_rpc_t *)0xe;
          goto cleanupAndSendFault;

long __thiscall
          (OSF_SCALL *this,void **param_1,uint len,int param_3,uint param_4,ulong param_5)

  long lVar1;
  OSF_SCALL *pOVar2;
  OSF_SCALL *dst;
  ulonglong uVar3;
  OSF_SCALL *local_res8;
  local_res8 = this;
  lVar1 = OSF_SCONNECTION::TransGetBuffer((OSF_SCONNECTION *)this,&local_res8,len + 0x18);
  if (lVar1 == 0) {
    if (param_3 == 0) {
      *param_1 = local_res8 + 0x18;
    else {
      uVar3 = (ulonglong)param_4;
      dst = local_res8 + 0x18;
      pOVar2 = dst;
      if ((longlong)*param_1 - 0x18U != 0) {
        BCACHE::Free((ulonglong)pOVar2,(longlong)*param_1 - 0x18U,uVar3);
      *param_1 = dst;
    lVar1 = 0;
  else {
    lVar1 = 0xe;
  return lVar1;

If the fragment received is the first but not the last then a new memory is allocated using GetBufferDo() routine. It’s interesting the check on the allocation_hint/fragment len that cannot be greater or equal to 0x400000, this is very important because of the strange implementation of GetBufferDo() that allocate len + 0x18 bytes.

If the packet is not the first fragment and not the last fragment:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
                    /* should enter this branch */
        if (*(void **)pdVar1 != (void *)0x0) {
          uVar5 = local_res8[0];
          if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0)) { // !RPC_INTERFACE_HAS_PIPES
            if ((*(int *)(this + 0x214) == 0) || (*(int *)(this + 0x1cc) != 0)) {
/* fragment len + current copied */
              uVar5 = local_res8[0] + *(uint *)(this + 0x1d8);
/* alloc_hint <= fragment length  */
              if (*(uint *)(this + 0x248) <= uVar5 && uVar5 != *(uint *)(this + 0x248)) {
                if (*(int *)(this + 0x1cc) != 0) goto LAB_18008ea09;
                *(uint *)(this + 0x248) = uVar5;
/* overflow check 0x40000  */
                puVar17 = (uint *)(**(OSF_SCALL ***)(this + 0x138) + 0x148);
                if (*puVar17 <= uVar5 && uVar5 != *puVar17) {
                  piVar11 = local_50;
                  uVar18 = 0x1073;
                  local_50[0] = 3;
                  local_48 = uVar5;
                  goto goto_adderror_and_send_fault;
// reallocate because fragment len is greater than the alloc hint used during first fragment! 
                lVar7 = GetBufferDo(\**(OSF_SCALL ***)(this + 0x138),(void **)pdVar1,uVar5,1,
                                    *(uint *)(this + 0x1d8),in_stack_ffffffffffffff60);
                if (lVar7 != 0) goto fail;
              uVar5 = local_res8[0];
              puVar17 = (uint *)(ulonglong)local_res8[0];
//copy fragment data into the alloced buffer
              memcpy((void *)((longlong)*(void **)pdVar1 + (ulonglong)*(uint *)(this + 0x1d8)),tcp_payload->data,local_res8[0]);
              *(uint *)(this + 0x1d8) = *(int *)(this + 0x1d8) + uVar5;
              if ((pkt_flags & 2) == 0) {
                return 0; // if it is not the last fragment return 0
              goto dispatchRPCCall; // dispatch the request only if it is the last fragment
      else if (pkt_tyoe == 0) {
    /* here if the packet type is 0, i.e. request */
        if (*(int *)(this + 0x214) == 0) {
          uVar15 = GetBufferDo(\**(OSF_SCALL ***)(this + 0x138),(void **)pdVar1,uVar15,1, *(uint *)(this + 0x1d8),in_stack_ffffffffffffff60);
          pdVar14 = (dce_rpc_t *)(ulonglong)uVar15;
          if (uVar15 != 0) goto cleanupAndSendFault;
              memcpy((void *)((ulonglong)*(uint *)(this + 0x1d8) + *(longlong *)(this + 0xe0)), tcp_payload->data,uVar5);
              *(uint *)(this + 0x1d8) = *(int *)(this + 0x1d8) + uVar5;
              (\**(code \**)(**(longlong **)(this + 0x130) + 0x40))();
              if (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) {
                return 0;
              if ((pkt_flags & 2) == 0) {
                *(undefined4 *)(this + 0x230) = 0;
              else {
                *(undefined4 *)(this + 0x21c) = 3;
                if (((byte)this[0x2e0] & 4) != 0) {
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
              plVar13 = (longlong *)0x0;
              goto Dispatch_rpc_call;
          /* store frag len */
          uVar5 = local_res8[0];
          iVar8 = QUEUE::PutOnQueue((QUEUE *)(this + 600),tcp_payload->data,local_res8[0]);
          if (iVar8 == 0) {
          /* integer overflow! */
          *(uint *)(this + 0x24c) = *(int *)(this + 0x24c) + uVar5;

Basically, the middle fragments are just copied in the alloced buffer under some conditions.

The DispatchRPC() seems to be the dispatcher of the full request, indeed it calls then the routine to handle the request, unmarshalling parameters and executing the
procedure remotely invoked. It is interesting that:

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
  if (*(int *)((longlong)param_1 + 0x1cc) < 1) {
    *(undefined4 *)((longlong)param_1 + 0x214) = 1;

*(int *)((longlong)param_1 + 0x1cc seems to be zero so ` *(undefined4 *)((longlong)param_1 + 0x214) = 1;` this instruction takes place.

It’s possible to force the DispatchRpcCall() and still accept fragments for the same call id? The response to this question is yes!

Looking at ProcessReceivedPDU() exists a path that leads to execute the DispatchRpcCall() without sending any last fragment. The constraints:

  1. Server configured with RPC_INTERFACE_HAS_PIPES -> int at offset 0x58 in the rpc interface defined in the server has second bit to 1
  2. Client sends a first fragment, that initiliaze the call id object
  3. Client sends the second fragment without first and last flag enabled, this ends to call DispatchRpcCall() that sets param_1 + 0x214 to 1.
  4. Client alloc hint sent in the first fragment must be equal to the copied buffer len at the end of second fragment.

This will lead the code to enter DispatchRpcCall() and make nexts client’s fragments to be handled by the buggy code.

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
            else if (pkt_tyoe == 0) {
                    /* here if the packet type is 0, i.e. request */
            if (*(int *)(this + 0x214) == 0) {
              puVar17 = (uint *)(ulonglong)local_res8[0];
              uVar15 = *(uint *)(this + 0x1d8) + local_res8[0];
              if (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) { // cheat on this to make fragments appear as the last!
                // 0x1d8 contain already copied bytes until this request!
                // 0x248 contains the alloc hint!
                return 0;
              if ((pkt_flags & 2) == 0) {
                *(undefined4 *)(this + 0x230) = 0;
              else {
                *(undefined4 *)(this + 0x21c) = 3;
                if (((byte)this[0x2e0] & 4) != 0) {
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
              plVar13 = (longlong *)0x0;
              goto Dispatch_rpc_call;

At this point, it’s possible to write a PoC following the previous constraints, the PoC has been tested against my RPC Server using ncacn_np protocol.

With the poc seems to be impossible to trigger the vulnerability due some checks in ProcessReceivedPDU(): Basically the check, that makes impossible to overflow the integer, is executed on the queue messages size.

              iVar8 = QUEUE::PutOnQueue((QUEUE *)(this + 600),tcp_payload->data,local_res8[0]);
              if (iVar8 == 0) {
                    /* integer overflow! */
                *(uint *)(this + 0x24c) = *(int *)(this + 0x24c) + uVar5;
                if (((pkt_flags & 2) != 0) &&
                   (*(undefined4 *)(this + 0x21c) = 3, ((byte)this[0x2e0] & 4) != 0)) {
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
                if (*(longlong *)(this + 0x18) != 0) {
                  if ((*(uint *)(this + 0x250) == 0) ||
                     ((*(uint *)(this + 0x24c) < *(uint *)(this + 0x250) &&
                      (*(int *)(this + 0x21c) != 3)))) {
                    if ((3 < *(int *)(this + 0x264)) &&
                       ((*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0 &&
                        (uVar9 = 0, *(int *)(this + 0x21c) != 3)))) {
                      *(undefined4 *)(this + 0x2ac) = 1;
                      uVar9 = uVar19; // critical instruction that makes processreceivepdu return 1
                  else {
                    *(undefined4 *)(this + 0x250) = 0;
                    SCALL::IssueNotification((SCALL *)this,2);
                  return uVar9;
                if (((3 < *(int *)(this + 0x264)) &&
                    (*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0)) &&
                   (*(int *)(this + 0x21c) != 3)) {
                  *(undefined4 *)(this + 0x2ac) = 1;
                  uVar9 = uVar19; // critical instruction that makes processreceivepdu return 1; My code entered heres
                SetEvent(*(HANDLE *)(this + 0x2c0));
                return uVar9; // 

When ProcessReceivedPDU() returns 1, it stops to listen for new messages so the client cannot trigger vulnerability anymore.

        uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)this[0xf],tcp_received,uVar13,0);
        iVar4 = (int)uVar8;
      if (iVar4 == 0) {
        (**(code **)(*this + 0x38))(); -> OSF_SCONNECTION::TransAsyncReceive() continues to read from the descriptor new packets
      goto LAB_18003d2c5;

In order to not enter the critical paths that lead the server stopping listen for new packets, in the same connection of course, it’s required that one of these should not be satisfied:

3 < *(int *)(this + 0x264) // number of queued packets
*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0 // don't know
*(int *)(this + 0x21c) != 3  // can be unsatisfied on the packet with last fragmente flag enabled

*(this + 0x21c)  is set to 3 if the packets has the last fragment enabled.

The number of queued packets is incremented on a PutOnQueue() and decremented on a TakeOffQueue(), at most the queue could contain 4 packets. To me this was so strange because the application was multithread but looked like a monothread just because I did’t see any TakeOffQueue() called during my experiments.

int __thiscall QUEUE::PutOnQueue(QUEUE *this,void *param_1,uint param_2)
  *(int *)(this + 0xc) = iVar6 + 1;
  **(void ***)this = param_1;
  *(uint *)(*(longlong *)this + 8) = param_2;
  return 0;

undefined8 QUEUE::TakeOffQueue(longlong *param_1,undefined4 *param_2)
  int iVar1;
  iVar1 = *(int *)((longlong)param_1 + 0xc);
  if (iVar1 == 0) {
    return 0;
  *(int *)((longlong)param_1 + 0xc) = iVar1 + -1;
  *param_2 = *(undefined4 *)(*param_1 + -8 + (longlong)iVar1 * 0x10);
  return *(undefined8 *)(*param_1 + (longlong)*(int *)((longlong)param_1 + 0xc) * 0x10);

In general my thought on the server status was the following:

  1. A Thread running somewhere in the DispatcRpcCall() after the packet that forced to call DispatchRpcCall() has been received, i.e. constrants number 3.
  2. A Thread continuing to read from the medium, did not know from.

So, I inspected better DispatcRpcCall().

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
    uVar3 = RpcpConvertToLongRunning(this,param_2,param_3);
    bVar7 = (int)uVar3 != 0;
    if (bVar7) {
      uVar1 = 0xe;
    else {
      this = (longlong *)param_1[0x26];
      uVar1 =  OSF_SCONNECTION::TransAsyncReceive(); // this should wake up a thread to handle new incoming packet
    if (uVar1 == 0) {
      pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      if (*(int *)(pOVar6 + 0x18) == 0) { // this is the (this + 0x130) + 0x18)
      pcVar4 = OSF_SCALL::RemoveReference(); // calling this means that the connection is over!
    else {
      uVar3 = param_1[0x1c] - 0x18;
      if (uVar3 != 0) {
      if (bVar7) {
      else {
      pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      if (*(int *)(pOVar6 + 0x18) == 0) {
        pOVar6 = (OSF_SCONNECTION *)param_1[0x26];

int OSF_SCONNECTION::TransAsyncReceive(longlong *param_1,undefined8 param_2,undefined8 param_3)

  int iVar1;
  if (*(int *)(param_1 + 5) == 0) {
                    /* CO_Recv */
    iVar1 = CO_Recv(); // return true if a packet has been read
    if (iVar1 != 0) {
      if ((*(int *)(param_1 + 3) != 0) && (*(int *)((longlong)param_1 + 0x1c) == 0)) {
        OSF_SCALL::WakeUpPipeThreadIfNecessary((OSF_SCALL *)param_1[0xf],0x6be);
      *(undefined4 *)(param_1 + 5) = 1;
  else {
    iVar1 = -0x3ffdeff8;
  return iVar1;

ulonglong CO_Recv(PSLIST_HEADER param_1,undefined8 param_2,SIZE_T param_3)

  uint uVar1;
  ulonglong uVar2;
  if ((*(int *)&param_1[4].field_0x8 != 0) &&
     (*(int *)&param_1[4].field_0x8 == *(int *)&param_1[4].field_0x4)) {
    uVar1 = RPC_THREAD_POOL::TrySubmitWork(CO_RecvInlineCompletion,param_1);
    uVar2 = (ulonglong)uVar1;
    if (uVar1 != 0) {
      uVar2 = 0xc0021009;
    return uVar2;
  uVar2 = CO_SubmitRead(param_1,param_2,param_3); // ends in calling NMP_CONNECTION::Receive
  return uVar2;

According to my breakpoints, TransAsyncReceive() is executed and this led the server in waiting for a new packet after the packet that led to DispatchRpcCall(). If it, TransAsyncReceive(), returns 0 then the packet should has been read and handled by another thread, indeed it proceds calling DispatchHelper(). At this point, I imagined two threads:

  1. A thread started from TransAsyncReceive() that handle new packet and lead to enter in the ProcessReceivedPDU() continues to read until it not returns 1.
  2. A Thread that enters in DispatchHelper().

So, I started to track the memory at (this + 0x130) + 0x18 to identify where it’s written to 1. Because if it is not 0 then the condition that lead ProcessReceivedPDU() returning 1 is not satisfied anymore then it continue to read and because if it is 0 then a TakeOffQueue() is executed from DispatchRpcCall().

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
      if (*(int *)(pOVar6 + 0x18) == 0) {

void __thiscall OSF_SCONNECTION::DispatchQueuedCalls(OSF_SCONNECTION *this)
  longlong *plVar1;
  undefined4 local_res8 [2];
  while( true ) {
    RtlEnterCriticalSection(this + 0x118);
    plVar1 = (longlong *)QUEUE::TakeOffQueue((longlong *)(this + 0x1b8),local_res8); // decrement no. queued messaeges 
    if (plVar1 == (longlong *)0x0) break;

(this + 0x130) + 0x18 is set to 1, during the association request at offset rpcrt4.dll+0x38b99, just if the packet flag multiplexed is not set.

void OSF_SCONNECTION::AssociationRequested
               (OSF_SCONNECTION *param_1,dce_rpc_t *tcp_received,uint param_3,int param_4)

        if ((tcp_received->pkt_flags & 0x10U) == 0) {
          *(undefined4 *)(param_1 + 0x18) = 1;
          *(byte *)((longlong)pppppvVar21 + 3) = *(byte *)((longlong)pppppvVar21 + 3) | 3;

So, setting the RPC_HAS_PIPE in the server interface and specifying MULTIPLEX flag during the connection, it is under client control, will lead the code into the vulnerable point. At this point the debugger enters in the OSF_SCALL::GetCoalescedBuffer() that is another vulnerable call.

The code enter here from this stack trace:

rpcrt4.dll+0xb404d in OSF_SCALL::Receive()
rpcrt4.dll+0x4b297 in RPC_STATUS I_RpcReceive()
rpcrt4.dll+0x82422 in NdrpServerInit()
rpcrt4.dll+0x1c35e in NdrStubCall2()
rpcrt4.dll+0x57832 in DispatchToStubInCNoAvrf()
rpcrt4.dll+0x39e00 in RPC_INTERFACE::DispatchToStubWorker()
rpcrt4.dll+0x39753 in DispatchToStub()
rpcrt4.dll+0x3a2bc in OSF_SCALL::DispatchHelper()

So using this PoC we enter every n packets in the coalesced buffer triggering the second integer overflow.

The routine OSF_SCALL::GetCoalescedBuffer() is reached just because the server has RPC_HAS_PIPE configured. Analyzing the stack trace from the last function before OSF_SCALL::GetCoalescedBuffer() I searched some constraints I satisfied that led me to the vulnerable code.

long __thiscall OSF_SCALL::Receive(OSF_SCALL *this,_RPC_MESSAGE *param_1,uint param_2)
      while (iVar3 = *(int *)(this + 0x21c), iVar3 != 1) { // this+ 0x21c seems not set to 1 in ProcessReceivedPDU, did not found where it is
        if (iVar3 == 2) {
          return *(long *)(this + 0x20);
        if (iVar3 == 3) { // it is equal to 3 on the last frag packet, from ProcessReceivedPDU
          lVar2 = GetCoalescedBuffer(this,param_1,iVar4);
          return lVar2; // last frag received -> no more packets will be waited!
        // if the total queued packets length is less or equal to the allocation hint received than wait for new data
        if (*(uint *)(this + 0x24c) <= (uint)*(ushort *)(*(longlong *)(this + 0x130) + 0xb8) { 
          EVENT::Wait((EVENT *)(this + 0x2c0),-1); // wait for other packets
        else { 
          lVar2 = GetCoalescedBuffer(this,param_1,iVar4); // run coalesced buffer
          if (lVar2 != 0) { // error
            return lVar2;

// (uint)*(ushort *)(*(longlong *)(this + 0x130) + 0xb8 is set during the association request
void OSF_SCONNECTION::AssociationRequested
               (OSF_SCONNECTION *param_1,dce_rpc_t *tcp_received,uint param_3,int param_4)
    uVar18 = *(ushort *)((longlong)&tcp_received->alloc_hint + 2);
    if ((*(byte *)&tcp_received->data_repres & 0xf0) != 0x10) {
      uVar26 = *(uint *)&tcp_received->cxt_id;
      uVar23 = uVar23 >> 8 | uVar23 << 8;
      uVar18 = uVar18 >> 8 | uVar18 << 8;
      *(ushort *)((longlong)&tcp_received->alloc_hint + 2) = uVar18;
      *(uint *)&tcp_received->cxt_id =
           uVar26 >> 0x18 | (uVar26 & 0xff0000) >> 8 | (uVar26 & 0xff00) << 8 | uVar26 << 0x18;
      *(ushort *)&tcp_received->alloc_hint = uVar23;
    puVar24 = (undefined4 *)(ulonglong)uVar23;
    uVar14 = (ushort)*(undefined4 *)(*(longlong *)(param_1 + 0x20) + 0x4c);
    uVar3 = 0xffff;
    if (uVar23 != 0xffff) {
      uVar3 = uVar23;
    if (uVar3 <= uVar18) {
      uVar18 = uVar3;
    if (uVar18 <= uVar14) {
      uVar14 = uVar18;
    *(ushort *)(param_1 + 0xb8) = uVar14 & 0xfff8; // set equal to the allocation hint & 0xfff8 of the first association packet!

Basically from OSF_SCALL::Receive() the code continuosly waits for new packets until a last fragment packet is received or any error is found.

GetCoalescedBuffer() is called if the last fragment is received or if the total len accumulated during the ProcessReceivedPDU() is greater than the alloc_hint, for these condition every n packets it is called.

So, just increasing the fragment length in the PoC it’s possible to trigger GetCoalescedBuffer() without setting MULTIPLEX. Moreover, OSF_SCALL::Receive() is reached because Message->RpcFlags & 0x8000 == 0

RPC_STATUS I_RpcReceive(PRPC_MESSAGE Message,uint Size)
    if ((Message->RpcFlags & 0x8000) == 0) {
                    /* get  OSF_SCALL::Receive address */
      pcVar5 = *(code **)(*plVar7 + 0x38);
    else {
                    /* get NDRServerInitializeMarshall address */
      pcVar5 = *(code **)(*plVar7 + 0x48);
                    /* call coalesced buffer */
    uVar2 = (*pcVar5)();

void NdrpServerInit()
    if (param_5 == 0) {
      if ((*(byte *)(*(ushort **)(puVar2 + 0x10) + 2) & 8) == 0) {

        if ((param_2->RpcFlags & 0x1000) == 0) {  // During my tests this value was 0 each time, didnt searched when this could fail
        param_2->RpcFlags = 0x4000;               // makes to call GetCoalescedBuffer()
        puStackY96 = (undefined *)0x180082427;
        exception = I_RpcReceive(param_2,0);
        if (exception != 0) {
                    /* WARNING: Subroutine does not return */
          puStackY96 = &UNK_180082432;
        param_1->Buffer = (uchar *)param_2->Buffer;
        puVar10 = (uchar *)param_2->Buffer;
        param_1->BufferStart = puVar10;
        param_1->BufferEnd = puVar10 + param_2->BufferLength;
        param_3 = pMVar20;

GetCoalescedBuffer() is used to merge queued buffers to the last received packet that has not been queued, indeed it allocates the total length needed and copy there every queued message then.

long __thiscall OSF_SCALL::GetCoalescedBuffer(OSF_SCALL *this,_RPC_MESSAGE *param_1,int param_2)
  iVar5 = *(int *)(this + 0x24c);
  if (iVar5 != 0) {
    if (uVar3 != 0) {
  /* integer overflow */
      iVar5 = iVar5 + *(int *)(param_1 + 0x18);
    lVar2 = OSF_SCONNECTION::TransGetBuffer((OSF_SCONNECTION *)this_00,&local_res8,iVar5 + 0x18);
    if (lVar2 == 0) {
      dst_00 = (void *)((longlong)local_res8 + 0x18);
      dst = dst_00;
      local_res8 = dst_00;
      if ((uVar3 != 0) && (*(void **)(param_1 + 0x10) != (void *)0x0)) {
        memcpy(dst_00,*(void **)(param_1 + 0x10),*(uint *)(param_1 + 0x18));
        local_res8 = (void *)((ulonglong)*(uint *)(param_1 + 0x18) + (longlong)dst_00);
        OSF_SCONNECTION::TransFreeBuffer() // free last message PDU received
      while (src = (void *)QUEUE::TakeOffQueue((longlong *)(this + 600),(undefined4 *)&local_res8), src != (void *)0x0) { 
        // loop to consume the queue merging the queued packets
        uVar4 = (ulonglong)local_res8 & 0xffffffff;
                    /* OSF_SCONNECTION::TransFreeBuffer  */
        OSF_SCONNECTION::TransFreeBuffer() // free PDU buffer queued
        dst = (void *)((longlong)dst + uVar4);
      *(undefined4 *)(this + 0x24c) = 0;

At this point from the new constraints found merging to the old ones it’s possible to resume the conditions that lead to vulnerable point:

  1. Server configured with RPC_INTERFACE_HAS_PIPES -> integer at offset 0x58 in the rpc interface defined in the server has second bit to 1 - **Not under attacker control **
  2. Client sends a first fragment, that initiliaze the call id object and satisfies the condition to call directly DispatchRpcCall(), sending fragment long just as the alloc hint set in this way: (*(int *)(this + 0x1d8) != *(int *)(this + 0x248) in ProcessReceivedPDU() lead to execute DispatchRpcCall() and so param_1 + 0x214 is set to 1 - Under attacker control
  3. Client continues to send middle fragments until: iVar5 = iVar5 + *(int *)(param_1 + 0x18); overflows and lead memcpy() to overflow the buffer - Under attacker control

Bad points:

  1. fragment length it’s two bytes long, very hard to exploit because it’s time and memory consuming - Under attacker control, Highly constrained
  2. I found no way to cheat on the fragment length value and sent bytes, i.e. they should be coherent between each other.

These lead to send a large number of fragments and consequentely the exploitation could be impossible due to system memory, i.e. the server just before request that overlow the integer should be able, in the coalesce buffer, to allocate 4gb of memory. If the memory is not allocated then an exception is raised. Anyway, assuming that the rpc server could allocate more than 4GB, there is the problem of the time. Indeed the packet’s number to send is very big and the processing time of the packets increase enormously according to the total length of the queued data.

My final is available here PoC, below is shown in the debugger the breakpoint on the vulnerable add operation in the function to coalesce buffers.

Discover an AntiDebug feature: a newbie approach

By: s1ckb017
30 July 2022 at 00:00

This blogpost shows the way I understod where the anti debug check were implemented with a basic, I would say, tending to 0, knowledge on Windows Internals. I started from a problem: the target process I wanted to attach was protected by a kind of watchdog. Did not know if the watchdog was in kernel space or in user space. During this journey, I learnt something about Windows internals, x64dbg/titan engine internals and improved windbg usage.

Problem: Access DENIED to process

I wanted to debug a service running with elevated privilege, i.e. SYSTEM, but this seems to be impossible even using x32dbg run as SYSTEM user.

Running x32dbg with system privileges and attaching to the process results in an ACCESS DENIED error. The PID to debug is 4056.

So, to understand where the debugger received the ACCESS DENIED, I started digging into the procedure used by the debugger to attach and debug a process.

First thing, I did, is to put a bp on ntOpenProcess(). I was pretty sure that ntOpenProcess() was the first involved routine. In the following image is clear that the debugger under debug is opening the target process.

The ntOpenProcess() returns a valid handle, as shown in the image, this suggested that the process is correctly opened.

So, then I cross searched in titan engine and in the x64 debugger source code where this call happens and what there is after. After some research I found that the failing function was DbgUiDebugActiveProcess_(IN HANDLE Process)), i.e. the systemcall NtDebugActiveProcess() returned ACCESS DENIED

This is a function exported by ntdll.dll and ends directly in kernel space without any intermediate. Moreover, checking the target binary statically seems that did not exists any anti debug trick implemented inside. </br>

` OpenProcess() returns no error + NtDebugActiveProcess() return ACCESS DENIED + No anti debug in target binary -> check most likely in kernel space `
Just to see if some of these functions is called and cause an access denied.

Test case

At this point, I focused my research on kernel space in order to check why that routine return ACCESS DENIED even though the process has been correctly opened. If exists some anti debug trick in kernel space that would not allow to debug a process then it would deny any injection too. So, to have a better test case I wrote a small snippet in C that open a process and inject a DLL.

	HANDLE h = OpenProcess(PROCESS_ALL_ACCESS, false, pId);
	if (h) {
		printf("[+] Process %d Opened handle no. %08x\n", pId, (unsigned int)h);
		LPVOID LoadLibAddr = (LPVOID)GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
		printf("[+] LoadLibraryA address: %p\n", LoadLibAddr);
		LPVOID dereercomp = VirtualAllocEx(h, NULL, strlen(dllName)+1, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
		if (dereercomp == NULL)
			fprintf(stderr, "[-] Impossible allocate new memory in the target process %d\n", GetLastError());
			return 1;
		printf("[+] Alloced space on: %p\n", dereercomp);
		if (!WriteProcessMemory(h, dereercomp, dllName, strlen(dllName), NULL)) {
			fprintf(stderr, "[-] Impossible writing in process memory %d\n", GetLastError());
			return 1;
		else {
			printf("[+] Process memory correctly written\n");
		HANDLE asdc = CreateRemoteThread(h, NULL, NULL, (LPTHREAD_START_ROUTINE)LoadLibAddr, dereercomp, 0, NULL);
		printf("[+] Remote Thread Output: %d %d\n", asdc, GetLastError());
		WaitForSingleObject(asdc, INFINITE);
		VirtualFreeEx(h, dereercomp, strlen(dllName), MEM_RELEASE);
		return 0;

Running the test case against the process pid from an elevated powershell results in ACCESS DENIED too on VirtualAllocEx() routine.

Windows Kernel Debugging: searching for the anti debug

At this point, I started to debug the windows kernel using windbg, so I decided, firstly to inspect a bit the target process.

kd> !process 0xfd8
Searching for Process with Cid == fd8
PROCESS ffffc20239807080
    SessionId: 0  Cid: 0fd8    Peb: 00256000  ParentCid: 0270
    DirBase: 1ce063000  ObjectTable: ffff89020ea49180  HandleCount: 357.
    VadRoot ffffc2023704ee50 Vads 163 Clone 0 Private 1876. Modified 125. Locked 0.
    DeviceMap ffff890205035ae0
    Token                             ffff89020e225060
    ElapsedTime                       17:06:01.856
    UserTime                          00:00:00.312
    KernelTime                        00:00:00.187
    QuotaPoolUsage[PagedPool]         628552
    QuotaPoolUsage[NonPagedPool]      23512
    Working Set Sizes (now,min,max)  (56976, 50, 345) (227904KB, 200KB, 1380KB)
    PeakWorkingSetSize                84807
    VirtualSize                       322 Mb
    PeakVirtualSize                   419 Mb
    PageFaultCount                    126060
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      11044

According to flags, BeingDebugged and NtGlobalFlag, contained in the process environment block data structure, seems that the process is not currently under debug of another process.

A great article on this topic is provided by CheckPoint Research

kd> .process /p ffffc20239807080; !peb 00256000
Implicit process is now ffffc202`39807080
.cache forcedecodeuser done
PEB at 0000000000256000
    InheritedAddressSpace:    No
    ReadImageFileExecOptions: No
    BeingDebugged:            No
    ImageBaseAddress:         0000000000400000
    NtGlobalFlag:             0
    NtGlobalFlag2:            0
    Ldr                       00007fff9f8ba4c0
    Ldr.Initialized:          Yes
    Ldr.InInitializationOrderModuleList: 00000000005627a0 . 0000000000562ff0
    Ldr.InLoadOrderModuleList:           0000000000562910 . 0000000000562fd0
    Ldr.InMemoryOrderModuleList:         0000000000562920 . 0000000000562fe0

kd> dt _peb 0x0256000
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0 ''
   +0x003 BitField         : 0 ''
   +0x0b8 NumberOfProcessors : 1
   +0x0bc NtGlobalFlag     : 0
   +0x0c0 CriticalSectionTimeout : _LARGE_INTEGER 0xffffe86d`079b8000
   +0x0c8 HeapSegmentReserve : 0x100000
   +0x0d0 HeapSegmentCommit : 0x2000
   +0x0d8 HeapDeCommitTotalFreeThreshold : 0x10000
   +0x0e0 HeapDeCommitFreeBlockThreshold : 0x1000
   +0x0e8 NumberOfHeaps    : 2
   +0x0ec MaximumNumberOfHeaps : 0x10
   +0x0f0 ProcessHeaps     : 0x00007fff`9f8b8d40  -> 0x00000000`00560000 Void
   +0x7c4 NtGlobalFlag2    : 0

At this point, I checked the eprocess structure.

kd> dt ffffc20239807080  _eprocess
   +0x000 Pcb              : _KPROCESS
   +0x438 ProcessLock      : _EX_PUSH_LOCK
   +0x440 UniqueProcessId  : 0x00000000`00000fd8 Void
   +0x448 ActiveProcessLinks : _LIST_ENTRY [ 0xffffc202`39bc84c8 - 0xffffc202`370b54c8 ]
   +0x458 RundownProtect   : _EX_RUNDOWN_REF
   +0x460 Flags2           : 0xd000
   +0x464 Flags            : 0x144d0c01
   +0x464 CreateReported   : 0y1
   +0x464 NoDebugInherit   : 0y0
   +0x464 ProcessExiting   : 0y0
   +0x464 ProcessDelete    : 0y0
   +0x464 ManageExecutableMemoryWrites : 0y0
   +0x464 VmDeleted        : 0y0
   +0x464 OutswapEnabled   : 0y0
   +0x464 Outswapped       : 0y0
   +0x464 FailFastOnCommitFail : 0y0
   +0x464 Wow64VaSpace4Gb  : 0y0
   +0x464 AddressSpaceInitialized : 0y11
   +0x464 SetTimerResolution : 0y0
   +0x464 BreakOnTermination : 0y0
   +0x464 DeprioritizeViews : 0y0
   +0x464 WriteWatch       : 0y0
   +0x464 ProcessInSession : 0y1
   +0x464 OverrideAddressSpace : 0y0
   +0x464 HasAddressSpace  : 0y1
   +0x464 LaunchPrefetched : 0y1
   +0x464 Background       : 0y0
   +0x464 VmTopDown        : 0y0
   +0x464 ImageNotifyDone  : 0y1
   +0x464 PdeUpdateNeeded  : 0y0
   +0x464 VdmAllowed       : 0y0
   +0x464 ProcessRundown   : 0y0
   +0x464 ProcessInserted  : 0y1
   +0x464 DefaultIoPriority : 0y010
   +0x464 ProcessSelfDelete : 0y0
   +0x464 SetTimerResolutionLink : 0y0
   +0x468 CreateTime       : _LARGE_INTEGER 0x01d879aa`94a2629d
   +0x550 Peb              : 0x00000000`00256000 _PEB
   +0x558 Session          : 0xffffd280`69c5b000 _MM_SESSION_SPACE
   +0x560 Spare1           : (null) 
   +0x568 QuotaBlock       : 0xfffff806`12053800 _EPROCESS_QUOTA_BLOCK
   +0x570 ObjectTable      : 0xffff8902`0ea49180 _HANDLE_TABLE
   +0x578 DebugPort        : (null) 
   +0x580 WoW64Process     : 0xffffc202`350fed50 _EWOW64PROCESS

The DebugPort is null so seems there is no debug object linked to the target process.

At this point, I set a breakpoint on virtual alloc system call handler in kernel space when the handle is equal to the handle number used by the test case, i.e. 0xc4. In this way, I could stop the OS when the VirtualAllocEx() called by the test case enter in kernel space.

` bp /w β€œ@ecx == 0xc4” nt!NtAllocateVirtualMemory `

So, I traced every instruction until return of the nt!NtAllocateVirtualMemory() using the command: ta fffff806`11ab6940.

RAX at the end contain the value of the ACCESS DENIED error, i.e. 0x00000000c0000022.

Looking the trace from the end, has been identified where the return value is set.

fffff806`119f61b8 8bc7            mov     eax,edi
fffff806`119f61ba e944fdffff      jmp     nt!ObpReferenceObjectByHandleWithTag+0x233 (fffff806`119f5f03)
fffff806`119f5f03 4c8b742448      mov     r14,qword ptr [rsp+48h]
fffff806`119f5f08 488b6c2450      mov     rbp,qword ptr [rsp+50h]
fffff806`119f5f0d 4883c458        add     rsp,58h
fffff806`119f5f11 415f            pop     r15
fffff806`119f5f13 415d            pop     r13
fffff806`119f5f15 415c            pop     r12
fffff806`119f5f17 5f              pop     rdi
fffff806`119f5f18 5e              pop     rsi
fffff806`119f5f19 5b              pop     rbx
fffff806`119f5f1a c3              ret
fffff806`11ab73d3 448bf0          mov     r14d,eax
fffff806`11ab73d6 85c0            test    eax,eax
fffff806`11ab73d8 0f880e841500    js      nt!MiAllocateVirtualMemoryPrepare+0x1588fc (fffff806`11c0f7ec)
fffff806`11c0f7ec 488b442448      mov     rax,qword ptr [rsp+48h]
fffff806`11c0f7f1 4885c0          test    rax,rax
fffff806`11c0f7f4 740d            je      nt!MiAllocateVirtualMemoryPrepare+0x158913 (fffff806`11c0f803)
fffff806`11c0f803 418bc6          mov     eax,r14d
fffff806`11c0f806 e9327aeaff      jmp     nt!MiAllocateVirtualMemoryPrepare+0x34d (fffff806`11ab723d)
fffff806`11ab723d 488b9c24b8000000 mov     rbx,qword ptr [rsp+0B8h]
fffff806`11ab7245 4883c460        add     rsp,60h
fffff806`11ab7249 415f            pop     r15
fffff806`11ab724b 415e            pop     r14
fffff806`11ab724d 415d            pop     r13
fffff806`11ab724f 415c            pop     r12
fffff806`11ab7251 5f              pop     rdi
fffff806`11ab7252 5e              pop     rsi
fffff806`11ab7253 5d              pop     rbp
fffff806`11ab7254 c3              ret
fffff806`11ab688a 8bd8            mov     ebx,eax
fffff806`11ab688c 89442474        mov     dword ptr [rsp+74h],eax
fffff806`11ab6890 85c0            test    eax,eax
fffff806`11ab6892 7861            js      nt!NtAllocateVirtualMemory+0x1d5 (fffff806`11ab68f5)
fffff806`11ab68f5 85db            test    ebx,ebx
fffff806`11ab68f7 7855            js      nt!NtAllocateVirtualMemory+0x22e (fffff806`11ab694e)
fffff806`11ab694e 4883bc240001000000 cmp   qword ptr [rsp+100h],0
fffff806`11ab6957 0f84578d1500    je      nt!NtAllocateVirtualMemory+0x158f94 (fffff806`11c0f6b4)
fffff806`11c0f6b4 ff05eeee4300    inc     dword ptr [nt!MiState+0x1f28 (fffff806`1204e5a8)]
fffff806`11c0f6ba e93a72eaff      jmp     nt!NtAllocateVirtualMemory+0x1d9 (fffff806`11ab68f9)
fffff806`11ab68f9 4983fe02        cmp     r14,2
fffff806`11ab68fd 0f83bc8d1500    jae     nt!NtAllocateVirtualMemory+0x158f9f (fffff806`11c0f6bf)
fffff806`11ab6903 488b8c2498000000 mov     rcx,qword ptr [rsp+98h]
fffff806`11ab690b 4885c9          test    rcx,rcx
fffff806`11ab690e 7532            jne     nt!NtAllocateVirtualMemory+0x222 (fffff806`11ab6942)
fffff806`11ab6910 85db            test    ebx,ebx
fffff806`11ab6912 780e            js      nt!NtAllocateVirtualMemory+0x202 (fffff806`11ab6922)
fffff806`11ab6922 8bc3            mov     eax,ebx
fffff806`11ab6924 4c8d9c2480010000 lea     r11,[rsp+180h]
fffff806`11ab692c 498b5b38        mov     rbx,qword ptr [r11+38h]
fffff806`11ab6930 498b7348        mov     rsi,qword ptr [r11+48h]
fffff806`11ab6934 498be3          mov     rsp,r11
fffff806`11ab6937 415f            pop     r15
fffff806`11ab6939 415e            pop     r14
fffff806`11ab693b 415d            pop     r13
fffff806`11ab693d 415c            pop     r12
fffff806`11ab693f 5f              pop     rdi
fffff806`11ab6940 c3              ret

Basically, edi is set into eax at nt!ObpReferenceObjectByHandleWithTag+0x4e8 and even though there are many register movements at the end nt!NtAllocateVirtualMemory+0x220 has in RAX the value contained by edi at nt!ObpReferenceObjectByHandleWithTag+0x4e8.

Looking further back, it’s clear that edi is set to the ACCESS DENIED value.

fffff806`119f5e99 0f85ee020000    jne     nt!ObpReferenceObjectByHandleWithTag+0x4bd (fffff806`119f618d)
fffff806`119f618d bf220000c0      mov     edi,0C0000022h
fffff806`119f6192 8b9424b0000000  mov     edx,dword ptr [rsp+0B0h]
fffff806`119f6199 488d4b30        lea     rcx,[rbx+30h]
fffff806`119f619d e8ee20c1ff      call    nt!ObfDereferenceObjectWithTag (fffff806`11608290)
fffff806`119f61a2 80bc24b800000000 cmp     byte ptr [rsp+0B8h],0
fffff806`119f61aa 0f858bb61e00    jne     nt!ObpReferenceObjectByHandleWithTag+0x1ebb6b (fffff806`11be183b)
fffff806`119f61b0 498bcf          mov     rcx,r15
fffff806`119f61b3 e8584ec1ff      call    nt!KeLeaveCriticalRegionThread (fffff806`1160b010)
fffff806`119f61b8 8bc7            mov     eax,edi

Microsoft says that RDI register should be preserved by the callee and indeed it is through the calls to call nt!ObfDereferenceObjectWithTag and call nt!KeLeaveCriticalRegionThread.

RDI, is preserved in nt!ObfDereferenceObjectWithTag, even if it is never touched, pushing it on the stack and popping it at the end.

fffff806`11608290 48895c2408      mov     qword ptr [rsp+8],rbx
fffff806`11608295 4889742410      mov     qword ptr [rsp+10h],rsi
fffff806`1160829a 57              push    rdi
fffff806`1160829b 4883ec30        sub     rsp,30h
fffff806`1160829f 833d6a2daf0000  cmp     dword ptr [nt!ObpTraceFlags (fffff806`120fb010)],0
fffff806`116082a6 488bf1          mov     rsi,rcx
fffff806`116082a9 0f8525f22000    jne     nt!ObfDereferenceObjectWithTag+0x20f244 (fffff806`118174d4)
fffff806`116082af 48c7c3ffffffff  mov     rbx,0FFFFFFFFFFFFFFFFh
fffff806`116082b6 f0480fc15ed0    lock xadd qword ptr [rsi-30h],rbx
fffff806`116082bc 4883eb01        sub     rbx,1
fffff806`116082c0 7e14            jle     nt!ObfDereferenceObjectWithTag+0x46 (fffff806`116082d6)
fffff806`116082c2 488bc3          mov     rax,rbx
fffff806`116082c5 488b5c2440      mov     rbx,qword ptr [rsp+40h]
fffff806`116082ca 488b742448      mov     rsi,qword ptr [rsp+48h]
fffff806`116082cf 4883c430        add     rsp,30h
fffff806`116082d3 5f              pop     rdi
fffff806`116082d4 c3              ret

In the nt!KeLeaveCriticalRegionThread(), RDI is preserved since it is never touched.

fffff806`1160b010 4883ec28        sub     rsp,28h
fffff806`1160b014 668381e401000001 add     word ptr [rcx+1E4h],1
fffff806`1160b01c 750c            jne     nt!KeLeaveCriticalRegionThread+0x1a (fffff806`1160b02a)
fffff806`1160b01e 488d8198000000  lea     rax,[rcx+98h]
fffff806`1160b025 483900          cmp     qword ptr [rax],rax
fffff806`1160b028 7506            jne     nt!KeLeaveCriticalRegionThread+0x20 (fffff806`1160b030)
fffff806`1160b02a 4883c428        add     rsp,28h
fffff806`1160b02e c3              ret

At this point, it’s required to understand why nt!ObpReferenceObjectByHandleWithTag+0x4bd is executed!

fffff806`119f62f0 c3              ret
fffff806`119f5dbf 488bf8          mov     rdi,rax
fffff806`119f5dc2 4885c0          test    rax,rax
fffff806`119f5dc5 0f840b040000    je      nt!ObpReferenceObjectByHandleWithTag+0x506 (fffff806`119f61d6)
fffff806`119f5dcb 0f0d08          prefetchw [rax]
fffff806`119f5dce 488b08          mov     rcx,qword ptr [rax]
fffff806`119f5dd1 488b6808        mov     rbp,qword ptr [rax+8]
fffff806`119f5dd5 48896c2438      mov     qword ptr [rsp+38h],rbp
fffff806`119f5dda 48894c2430      mov     qword ptr [rsp+30h],rcx
fffff806`119f5ddf 4c8b742430      mov     r14,qword ptr [rsp+30h]
fffff806`119f5de4 49f7c6feff0100  test    r14,1FFFEh
fffff806`119f5deb 0f84ff010000    je      nt!ObpReferenceObjectByHandleWithTag+0x320 (fffff806`119f5ff0)
fffff806`119f5df1 41f6c601        test    r14b,1
fffff806`119f5df5 0f8451030000    je      nt!ObpReferenceObjectByHandleWithTag+0x47c (fffff806`119f614c)
fffff806`119f5dfb 498d5efe        lea     rbx,[r14-2]
fffff806`119f5dff 488bcd          mov     rcx,rbp
fffff806`119f5e02 498bc6          mov     rax,r14
fffff806`119f5e05 488bd5          mov     rdx,rbp
fffff806`119f5e08 f0480fc70f      lock cmpxchg16b oword ptr [rdi]
fffff806`119f5e0d 4c8bf0          mov     r14,rax
fffff806`119f5e10 4889442430      mov     qword ptr [rsp+30h],rax
fffff806`119f5e15 488bea          mov     rbp,rdx
fffff806`119f5e18 4889542438      mov     qword ptr [rsp+38h],rdx
fffff806`119f5e1d 0f8558030000    jne     nt!ObpReferenceObjectByHandleWithTag+0x4ab (fffff806`119f617b)
fffff806`119f5e23 488bd8          mov     rbx,rax
fffff806`119f5e26 48d1eb          shr     rbx,1
fffff806`119f5e29 6683fb10        cmp     bx,10h
fffff806`119f5e2d 0f84e4030000    je      nt!ObpReferenceObjectByHandleWithTag+0x547 (fffff806`119f6217)
fffff806`119f5e33 488bd8          mov     rbx,rax
fffff806`119f5e36 48c1fb10        sar     rbx,10h
fffff806`119f5e3a 4883e3f0        and     rbx,0FFFFFFFFFFFFFFF0h
fffff806`119f5e3e 833dcb51700000  cmp     dword ptr [nt!ObpTraceFlags (fffff806`120fb010)],0
fffff806`119f5e45 0f852eb91e00    jne     nt!ObpReferenceObjectByHandleWithTag+0x1ebaa9 (fffff806`11be1779)
fffff806`119f5e4b 488b9424a0000000 mov     rdx,qword ptr [rsp+0A0h]
fffff806`119f5e53 4c8d0da6a1a0ff  lea     r9,[nt!VrpRegistryString <PERF> (nt+0x0) (fffff806`11400000)]
fffff806`119f5e5a 488bc3          mov     rax,rbx
fffff806`119f5e5d 48c1e808        shr     rax,8
fffff806`119f5e61 324318          xor     al,byte ptr [rbx+18h]
fffff806`119f5e64 3205c2687000    xor     al,byte ptr [nt!ObHeaderCookie (fffff806`120fc72c)]
fffff806`119f5e6a 4885d2          test    rdx,rdx
fffff806`119f5e6d 0f8423010000    je      nt!ObpReferenceObjectByHandleWithTag+0x2c6 (fffff806`119f5f96)
fffff806`119f5e73 384228          cmp     byte ptr [rdx+28h],al
fffff806`119f5e76 0f851a010000    jne     nt!ObpReferenceObjectByHandleWithTag+0x2c6 (fffff806`119f5f96)
fffff806`119f5e7c 8b8c2498000000  mov     ecx,dword ptr [rsp+98h]
fffff806`119f5e83 81e5ffffff01    and     ebp,1FFFFFFh
fffff806`119f5e89 80bc24a800000000 cmp     byte ptr [rsp+0A8h],0
fffff806`119f5e91 7414            je      nt!ObpReferenceObjectByHandleWithTag+0x1d7 (fffff806`119f5ea7)
fffff806`119f5e93 8bc5            mov     eax,ebp
fffff806`119f5e95 f7d0            not     eax
fffff806`119f5e97 85c1            test    ecx,eax
fffff806`119f5e99 0f85ee020000    jne     nt!ObpReferenceObjectByHandleWithTag+0x4bd (fffff806`119f618d)
fffff806`119f618d bf220000c0      mov     edi,0C0000022h

So the code arrive to nt!ObpReferenceObjectByHandleWithTag+0x4bd because ` ecx != eax , eax = not(ebp & 0x1FFFFFF) ` and ` rbp = [rax+8] where rax = nt!ExpLookupHandleTableEntry() `. ecx is read from the stack at [rsp+98h].

ExpLookupHandleTableEntry() returns the table entry from the handle number, in our case it would be 0xc4, and the handle table of the process that is calling VirtualAlloc().

At this, point let’s put a breakpoint on ExpLookupHandleTableEntry() return.

RAX = 0xffff89020dd77310 and *RAX+8 = 0x1ff7d6. Analyzing the handle entry table and the handle, i.e. index 0xc4, something strange appear.

kd> dt nt!_HANDLE_TABLE_ENTRY 0xffff89020dd77310
   +0x000 VolatileLowValue : 0n-4466944656595156999
   +0x000 LowValue         : 0n-4466944656595156999
   +0x000 InfoTable        : 0xc2023980`7050fff9 _HANDLE_TABLE_ENTRY_INFO
   +0x008 HighValue        : 0n2095062
   +0x008 NextFreeHandleEntry : 0x00000000`001ff7d6 _HANDLE_TABLE_ENTRY
   +0x008 LeafHandleValue  : _EXHANDLE
   +0x000 RefCountField    : 0n-4466944656595156999
   +0x000 Unlocked         : 0y1
   +0x000 RefCnt           : 0y0111111111111100 (0x7ffc)
   +0x000 Attributes       : 0y000
   +0x000 ObjectPointerBits : 0y11000010000000100011100110000000011100000101 (0xc2023980705)
   +0x008 GrantedAccessBits : 0y0000111111111011111010110 (0x1ff7d6)
   +0x008 NoRightsUpgrade  : 0y0
   +0x008 Spare1           : 0y000000 (0)
   +0x00c Spare2           : 0
kd> !handle 0xc4

PROCESS ffffc2023726c080
    SessionId: 1  Cid: 142c    Peb: 00839000  ParentCid: 113c
    DirBase: 1b1d57000  ObjectTable: ffff89020ea4cdc0  HandleCount:  55.
    Image: ShellCodeInjector.exe

Handle table at ffff89020ea4cdc0 with 55 entries in use

00c4: Object: ffffc20239807080  GrantedAccess: 001ff7d6 (Protected) (Audit) Entry: ffff89020dd77310
Object: ffffc20239807080  Type: (ffffc202314ac380) Process
    ObjectHeader: ffffc20239807050 (new version)
        HandleCount: 11  PointerCount: 354527

HighValue == 2095062 == 0x1ff7d6 at ffff89020dd77310+8 that is the granted access mask to the object, i.e. the target process. The access mask is incoherent with the one used in the test case, i.e. PROCESS_ALL_ACCESS == 0x01fffff. This could lead to access denied on virtual alloc call. The access mask means, that PROCESS_TERMINATE | PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_SUSPEND_RESUME are not granted so it’s impossible to allocate memory in the process!

We have at least two ways to bypass this:

  1. Trying to bypass this resetting granted access in the object handle.
  2. Understand where the access mask is modified and disable that feature.

The fastest way doing this is to intercept where the mask is changed in order to disable this once and not every time I have to attach to the process.

So, I put a breakpoint on the open process kernel implementation, i.e. nt!PsOpenProcess. bp nt!PsOpenProcess ".if ( poi(@r9) != 0xfd8) {gc}"

Let’s track the code flow until the end.

fffff806`11a87ea5 e886fdffff      call    nt!ObpCallPreOperationCallbacks (fffff806`11a87c30)

So likely, a callback has been registered for process handle operations via ObRegisterCallbacks().

fffff806`11a87d33 488b4908        mov     rcx,qword ptr [rcx+8]
fffff806`11a87d37 e81480d7ff      call    nt!guard_dispatch_icall (fffff806`117ffd50)
fffff806`117ffd50 4c8b1d111ba000  mov     r11,qword ptr [nt!guard_icall_bitmap (fffff806`12201868)]
fffff806`117ffd57 4885c0          test    rax,rax
fffff806`117ffd5a 0f8d7a000000    jge     nt!guard_dispatch_icall+0x8a (fffff806`117ffdda)
fffff806`117ffd60 4d85db          test    r11,r11
fffff806`117ffd63 741c            je      nt!guard_dispatch_icall+0x31 (fffff806`117ffd81)
fffff806`117ffd81 4c8b1dc0ce8f00  mov     r11,qword ptr [nt!retpoline_image_bitmap (fffff806`120fcc48)]
fffff806`117ffd88 4c8bd0          mov     r10,rax
fffff806`117ffd8b 4d85db          test    r11,r11
fffff806`117ffd8e 742e            je      nt!guard_dispatch_icall+0x6e (fffff806`117ffdbe)
fffff806`117ffdbe 0faee8          lfence
fffff806`117ffdc1 ffe0            jmp     rax

The function called that implements the antidebug is quite simple and basically get the process opened pid against a list of pids to check. Indeed the anti debug module obtains the opened process pid via PsGetProcessID()

fffff806`1166ab30                 mov     rax,qword ptr [rcx+440h]
fffff806`1166ab37                 ret

Check it against a list of pids contained by the anti debug module self.

test    rax,rax
je      fffff806`17481093
je      fffff806`17481093
lea     rbx,[fffff806`174969da0]
cmp     rax,qword ptr [rbx]

If the process pid opened is in the list then the access mask is checked to have the PROCESS_SUSPEND_RESUME bit enabled if so it is zeroed.

Disable PROCESS_SUSPEND_RESUME permission 
bt      dword ptr [rdx+4], 0Bh
jae     FFFFF80617481081
btr     dword ptr [rdx], 0Bh

More over the access mask is computed doing multiple and operation that disable other things like:

and     dword ptr [rdx], 0FFFFFFFEh
dword ptr [rdx], 0FFFFFFF7h
dword ptr [rdx], 0FFFFFFDFh

At this point, I just changed the list of the pids in order to not enter in the code block that alters the access mask, ed fffff806`174969da0 0. Finally, I bypassed the anti debug trick.

Finally, a consideration is required: I did a mess just to found that the handle’s rights were different by the one I asked for.